Real-time vs. history db prices not matching

I have a few questions regarding real-time and history database prices.

q-1. Is it possible to collect historical data in real-time databases? If not, how can I have the same data in both the history and real-time databases?

I tried collecting data for the history and real-time databases with the following db configs.

Real-time tick db config:

{'universes': ['ibkr-fx-eurusd-tick'],
 'vendor': 'ibkr',
 'fields': ['BidPrice', 'AskPrice']}

Real-time aggregate db config:

{'tick_db_code': 'ibkr-fx-eurusd-tick',
 'bar_size': '1m',
 'fields': ['AskPriceClose', 'BidPriceClose']}

History db config:

{'universes': ['idealpro-fx'],
 'vendor': 'ibkr',
 'bar_size': '1 min',
 'bar_type': 'ASK',
 'shard': 'off',
 'fields': ['Open',
  'High',
  'Low',
  'Close',
  'Volume',
  'Wap',
  'TradeCount',
  'DayHigh',
  'DayLow',
  'DayVolume']}

I collected data for 2024-03-29 and compared the prices with the infer_timezone=False parameter passed. Ideally both databases should have the same prices, but I'm observing the following:

q-2. There are fewer rows in the history database.
q-3. Prices differ for many rows in the history vs. real-time database.

The fact that you would get slightly different ask prices when sourcing data from two different IBKR API endpoints does not seem very surprising to me, and I doubt there is a way around it. Understanding exactly why you get slightly different results would require knowing exactly how IBKR processes and stores incoming ticks in its historical database, and that information is not available. IBKR does tell us that the real-time ticks they send back are sampled, so it's possible that the historical data comes from the non-sampled (i.e. more complete) data stream, but that's just speculation.

I suggest a pragmatic approach. Collect the history data and backtest with that. Also collect the real-time data (on a forward basis) and as you collect enough of that, backtest with that too. Compare the backtests and pick the one with the worse results for the purpose of decision-making. Some introduction of noise into your backtests (caused by using different data sources) can actually be a good way to reduce the risk of overfitting.
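Concretely, the "pick the worse one" rule amounts to something like this (the Sharpe ratios are hypothetical numbers, purely for illustration):

```python
# Hypothetical performance metrics from the two backtests (illustrative only)
sharpe_from_history_db = 1.4
sharpe_from_realtime_db = 1.1

# Base go/no-go decisions on the more pessimistic of the two results
decision_sharpe = min(sharpe_from_history_db, sharpe_from_realtime_db)
```

If the strategy still looks acceptable under the worse of the two data sources, it is less likely that its apparent edge is an artifact of one particular data feed.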