Easier steps for re-collecting missing historical data

We're working with ES futures data from IBKR. When we originally collected data, it looks like some days were skipped for the ESZ0 contract (Thursdays between 7/23 and 10/1, inclusive). Using some other threads in the forum, we put this workaround together:

  1. Identify missing data and contract SID.
  2. Drop last saved date for the contract:
jupyter:/var/lib/quantrocket/quantrocket.v2.history.es-fut-1min.sqlite $ sqlite3 quantrocket.v2.history.es-fut-1min.sqlite 'DELETE FROM PenultimateMaxDate WHERE Sid = "QF000000023069"'
  3. Re-collect history for missing contract:
$ quantrocket history collect 'es-fut-1min' --sids 'QF000000023069'
  4. (optional) Sync DBs in S3:
$ quantrocket db s3push --services 'history'
$ quantrocket db s3pull --services 'history' -f
  5. Drop, recreate and ingest bundle:
$ quantrocket zipline drop-bundle 'es-fut-1min-bundle' --confirm-by-typing-bundle-code-again 'es-fut-1min-bundle'
$ quantrocket zipline create-bundle-from-db 'es-fut-1min-bundle' --from-db 'es-fut-1min' --calendar 'GLOBEX'
$ quantrocket zipline ingest 'es-fut-1min-bundle'
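
For anyone who would rather do step 2 from a notebook than from the sqlite3 shell, here is a minimal sketch in Python. It assumes only what the command above shows: a table named PenultimateMaxDate with a Sid column. The function name, the MaxDate column, the sample rows, and the in-memory database are all made up for the demo. Against the real history database you would connect to the .sqlite file instead, and it is probably wise to back it up first.

```python
import sqlite3


def clear_penultimate_max_date(conn: sqlite3.Connection, sid: str) -> int:
    """Delete the saved PenultimateMaxDate row(s) for one Sid.

    Returns the number of rows deleted, so the caller can confirm
    the Sid was actually present.
    """
    # Parameter binding avoids shell/SQL quoting problems with the Sid string.
    cur = conn.execute("DELETE FROM PenultimateMaxDate WHERE Sid = ?", (sid,))
    conn.commit()
    return cur.rowcount


# Demo on a throwaway in-memory database. The MaxDate column and its values
# are hypothetical stand-ins; the real table's other columns are not shown
# in this thread.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE PenultimateMaxDate (Sid TEXT, MaxDate TEXT)")
demo.executemany(
    "INSERT INTO PenultimateMaxDate VALUES (?, ?)",
    [("QF000000023069", "2020-10-01"), ("QF000000026993", "2020-12-17")],
)

deleted = clear_penultimate_max_date(demo, "QF000000023069")
remaining = [row[0] for row in demo.execute("SELECT Sid FROM PenultimateMaxDate")]
```

Checking the returned row count is a cheap guard against a mistyped Sid, which would otherwise delete nothing without any error.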

Our specific feedback is that it should be possible to force re-collection of history without modifying the sqlite DB directly. Also, ingesting the bundle didn't pick up the missing data, so we had to drop and recreate the bundle to fix that issue.

There’s not an easier way to do that because QuantRocket shouldn’t miss data. Are you sure that happened and can you provide more information about what happened? If IBKR had data that QuantRocket missed, that’s a serious bug and should be reported. If IBKR was missing the data for that period, QuantRocket can’t fix that but it would still be good to know if you saw that happen.

100% sure there was a problem. We saw data missing from multiple Thursdays between July and early October. Re-running data collection didn't pull in the missing data until we cleared the PenultimateMaxDate value.

If you see missing data again, please check whether the missing data is available in TWS or is also missing from TWS. I can't reproduce this so I can't say what may have happened.

We're seeing this again unfortunately. Collecting data for ES is successful after multiple retries, but there's a gap in the data for this contract:

[image showing the gap in the data]

It seems like a whole trading week of data is missing, but I can view it in TWS.

Can you download and post the log output for the data collection run where you saw missing dates?

quantrocket flightlog get -d -m quantrocket_history_1 history.log

I'm still not able to reproduce this, so I need more to go on. I get data for those dates:

jupyter:/codeload $ quantrocket history get es-1min -i QF000000026993 -s 2020-12-11 -e 2020-12-17 -t 15:59:00 -f Date | csvlook -I
| Sid            | Date                      |
| -------------- | ------------------------- |
| QF000000026993 | 2020-12-11T15:59:00-06:00 |
| QF000000026993 | 2020-12-14T15:59:00-06:00 |
| QF000000026993 | 2020-12-15T15:59:00-06:00 |
| QF000000026993 | 2020-12-16T15:59:00-06:00 |
| QF000000026993 | 2020-12-17T15:59:00-06:00 |
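
A check like the one above can be automated so gaps are spotted without eyeballing the table. The sketch below is only an illustration: it assumes the same query without the csvlook pipe yields plain CSV with a header row containing a Date column in ISO 8601 format, as the table suggests. The function name is made up, and it treats every Monday through Friday as a trading day, so exchange holidays would show up as false positives.

```python
import csv
import io
from datetime import date, datetime, timedelta
from typing import List


def missing_weekdays(csv_text: str, start: date, end: date) -> List[date]:
    """Return weekdays in [start, end] that have no row in the Date column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {datetime.fromisoformat(row["Date"]).date() for row in reader}

    missing = []
    day = start
    while day <= end:
        # weekday() < 5 means Monday through Friday; holidays are not handled.
        if day.weekday() < 5 and day not in seen:
            missing.append(day)
        day += timedelta(days=1)
    return missing


# The five rows from the output above, as raw CSV.
full_csv = """Sid,Date
QF000000026993,2020-12-11T15:59:00-06:00
QF000000026993,2020-12-14T15:59:00-06:00
QF000000026993,2020-12-15T15:59:00-06:00
QF000000026993,2020-12-16T15:59:00-06:00
QF000000026993,2020-12-17T15:59:00-06:00
"""

# Same data with the 2020-12-15 row removed to simulate a gap.
gappy_csv = "\n".join(
    line for line in full_csv.splitlines() if "2020-12-15" not in line
)

no_gaps = missing_weekdays(full_csv, date(2020, 12, 11), date(2020, 12, 17))
one_gap = missing_weekdays(gappy_csv, date(2020, 12, 11), date(2020, 12, 17))
```

Here no_gaps comes back empty and one_gap contains only 2020-12-15.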

You could also enable the IB Gateway API logs in case that provides clues, but this would have to be done in advance of data collection (won't work for past runs). You would want to select the IB Gateway option to include market data in the API log file, as this will reveal exactly what IB Gateway is sending.