EDI data discrepancy

I downloaded the EDI data pack for 10 exchanges but for some reason I'm not getting the whole dataset for the XFRA exchange. The number of active listings showing in the securities master for XFRA is only 2845 but the vendor shows 16407 active listings for XFRA:

image

I tried collecting listings again for XFRA but it's returning that listings are up to date.

image

The active listings for the other 9 exchanges in the securities master are in line with the vendor.

Any ideas why this is happening?

Thanks

The reason for the discrepancy is that the XFRA history database includes lots of securities whose exchange is something other than XFRA. As a result, querying get_securities(exchanges='XFRA') doesn't return every security in the XFRA history database. For this exchange, you'll have to query the price databases to get all the sids, rather than going straight to the securities master. Because it's so much data, you probably need to query year by year. Something like this will work:

import pandas as pd
from quantrocket import get_prices
from quantrocket.master import get_securities

securities = pd.DataFrame()

for year in range(2007, 2025):
    print(f"querying XFRA for {year}")
    
    prices = get_prices("edi-xfra-1d", start_date=f"{year}-01-01", end_date=f"{year}-12-31", fields="Close")
    
    some_securities = get_securities(sids=prices.columns.tolist())
    
    securities = pd.concat([
        securities,
        some_securities
    ]).drop_duplicates()

This will yield total and active numbers in line with the vendor stats on the website.

Makes sense and your solution worked. Thanks!