The reason for the discrepancy is that the XFRA history database includes lots of securities whose exchange is something other than XFRA. As a result, querying get_securities(exchanges='XFRA')
doesn't return every security in the XFRA history database. For this exchange, you'll have to query the price databases to get all the sids, rather than going straight to the securities master. Because it's so much data, you probably need to query year by year. Something like this will work:
import pandas as pd
from quantrocket import get_prices
from quantrocket.master import get_securities
securities = pd.DataFrame()
for year in range(2007, 2025):
print(f"querying XFRA for {year}")
prices = get_prices("edi-xfra-1d", start_date=f"{year}-01-01", end_date=f"{year}-12-31", fields="Close")
some_securities = get_securities(sids=prices.columns.tolist())
securities = pd.concat([
securities,
some_securities
]).drop_duplicates()
This will yield total and active numbers in line with the vendor stats on the website.