Minute data ingest error + missing data

I have QR running on two machines. The first, which I use for live trading, ingests minute data every morning before the market opens. On this machine, this morning's ingestion appeared to work correctly, but several US equities were missing entire days of minute data from 2021-02-02 to 2021-02-08 (e.g., CPRI). Other assets (e.g., SPY) were not missing this data.

I noticed the missing data in zipline live data pulls using data.history and verified with quantrocket.get_prices that the data was missing from the bundle itself. I tried re-ingesting the bundle after the market closed, but it reported that the bundle was already up to date.
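
To check every sid at once rather than spot-checking one symbol at a time, the minute prices returned by quantrocket.get_prices can be scanned for sessions with no bars. A minimal pandas sketch, assuming the DataFrame is indexed by (Field, Date, Time) with one column per sid, which is how I understand minute-level get_prices output to be shaped; find_missing_sessions and its expected_sessions argument (for example, the exchange's trading days) are hypothetical names, so verify the layout against your own output first.

```python
import pandas as pd


def find_missing_sessions(prices, expected_sessions, field="Close"):
    """Return {sid: [missing session dates]} for sids lacking data on some sessions.

    Assumes (hypothetically) that `prices` has a MultiIndex of (Field, Date, Time)
    and one column per sid.
    """
    # Selecting one field drops the Field level, leaving a (Date, Time) index.
    closes = prices.loc[field]
    # A session counts as present if the sid has at least one non-null bar that day.
    has_data = closes.notna().groupby(level="Date").any()
    # Sessions absent from the index entirely are treated as missing for every sid.
    has_data = has_data.reindex(
        pd.DatetimeIndex(expected_sessions), fill_value=False
    ).astype(bool)
    missing = {}
    for sid in has_data.columns:
        dates = has_data.index[~has_data[sid].values]
        if len(dates):
            missing[sid] = list(dates)
    return missing
```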

To investigate, I switched to my second machine, which I use for backtesting and which does not ingest the bundle every morning. I tried ingesting the bundle there (I had redownloaded the entire bundle just two days ago because of the ‘May 8 2020’ error) and got this:

quantrocket_zipline_1|ingesting 7 2021-02-08 minute prices for sid FIBBG00YHL8F17 (9317 of 9664)
quantrocket_zipline_1|ingesting 10 2021-02-08 minute prices for sid FIBBG00YHLR8D9 (9318 of 9664)
quantrocket_zipline_1|ingesting 8 2021-02-08 minute prices for sid FIBBG00YHLTL02 (9319 of 9664)
quantrocket_zipline_1|ingesting 72 2021-02-08 minute prices for sid FIBBG00YHM5WR3 (9320 of 9664)
quantrocket_zipline_1|ingesting 272 2021-02-08 minute prices for sid FIBBG00YHMN5W6 (9321 of 9664)
quantrocket_zipline_1|ingesting 7 2021-02-08 minute prices for sid FIBBG00YHMRCD8 (9322 of 9664)
quantrocket_zipline_1|ingesting 26 2021-02-08 minute prices for sid FIBBG00YHW2GC5 (9323 of 9664)
quantrocket_zipline_1|Exception in thread zipline_minute_ingester_3:
quantrocket_zipline_1|Traceback (most recent call last):
quantrocket_zipline_1|  File "sym://qrocket_qrzipline_bundles_usstock_usstock_py", line 519, in wrapper
quantrocket_zipline_1|  File "sym://qrocket_qrzipline_bundles_usstock_usstock_py", line 817, in _ingest_minute_prices_for_sid
quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/data/minute_bars.py", line 731, in write_sid
quantrocket_zipline_1|    self._write_cols(sid, dts, cols, invalid_data_behavior)
quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/data/minute_bars.py", line 792, in _write_cols
quantrocket_zipline_1|    self.pad(sid, day_before_input)
quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/data/minute_bars.py", line 660, in pad
quantrocket_zipline_1|    new_last_date, date)
quantrocket_zipline_1|AssertionError: new_last_date=NaT != date=2021-02-05 00:00:00+00:00
quantrocket_zipline_1|During handling of the above exception, another exception occurred:
quantrocket_zipline_1|Traceback (most recent call last):
quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/threading.py", line 916, in _bootstrap_inner
quantrocket_zipline_1|    self.run()
quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/threading.py", line 864, in run
quantrocket_zipline_1|    self._target(*self._args, **self._kwargs)
quantrocket_zipline_1|  File "sym://qrocket_qrzipline_bundles_usstock_usstock_py", line 521, in wrapper
quantrocket_zipline_1|NameError: name 'self' is not defined

This is the first time I’ve ever had an error ingesting the bundle. Any idea what’s happening?

Slight correction to the above: the ingest error is happening on both machines and is probably the cause of the missing data. The ingest process aborts before a few hundred securities are updated.
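
Because the log reports progress as (N of M), the number of securities that were never reached can be estimated from the progress lines before the exception. A small standard-library sketch; the regular expression is written against the log format shown above, and summarize_ingest_log is a hypothetical helper. Since several ingester threads run in parallel, it keeps the highest position seen, so the count is only approximate.

```python
import re

# Matches progress lines such as:
# "ingesting 26 2021-02-08 minute prices for sid FIBBG00YHW2GC5 (9323 of 9664)"
PROGRESS_RE = re.compile(
    r"ingesting (\d+) (\d{4}-\d{2}-\d{2}) minute prices for sid (\S+) \((\d+) of (\d+)\)"
)


def summarize_ingest_log(lines):
    """Return the furthest progress reached, or None if no progress lines are found."""
    highest = None  # (position, total, sid) with the largest position seen so far
    for line in lines:
        match = PROGRESS_RE.search(line)
        if not match:
            continue
        position, total = int(match.group(4)), int(match.group(5))
        if highest is None or position > highest[0]:
            highest = (position, total, match.group(3))
    if highest is None:
        return None
    position, total, sid = highest
    return {
        "last_sid": sid,
        "position": position,
        "total": total,
        "not_reached": total - position,  # approximate: threads run concurrently
    }
```

For the excerpt above, this gives position 9323 of 9664, roughly 341 securities not reached, which matches a few hundred.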

Did you try re-running the ingestion? Unfortunately, I’m not able to reproduce these issues. Minute data for CPRI is present in the bundle for the dates you indicate, and an incremental ingestion of the 2021-02-08 minute prices, as in your log output, runs without issue. I’m not sure what’s different about your situation.

Yes, the same error happens every day. The only thing I can think of is that the ingestion may not be compatible with a newer version of numpy I installed in the zipline container. (I had to upgrade it to install the version of scikit-learn I need, which I realize is a challenge given the outdated zipline environment.) Is the ingestion dependent on specific versions of numpy or other libraries?

Also, this is quite difficult for me to debug, because I can only try the ingestion once a day. Once the error occurs, I can’t try again, because the bundle is still marked as up to date (even though several hundred securities haven’t been ingested). Is there a way to force just the daily ingestion without reloading the entire bundle?
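
The behavior described here, where a worker thread dies but the bundle is still recorded as up to date, is consistent with worker exceptions never reaching the code that records completion; in the traceback above, the wrapper's own error handler raised a NameError while handling the AssertionError. The following is a generic sketch of a safer pattern, not QuantRocket's implementation: ingest_all and the per-sid ingest_sid callable are hypothetical, and it relies on concurrent.futures re-raising each worker's exception from Future.result(), so the caller can see exactly which sids failed.

```python
from concurrent.futures import ThreadPoolExecutor


def ingest_all(sids, ingest_sid, max_workers=4):
    """Run ingest_sid(sid) for every sid and return (succeeded, failed).

    ingest_sid is a hypothetical per-security ingest function supplied by the caller.
    failed is a list of (sid, repr(exception)) pairs.
    """
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(ingest_sid, sid): sid for sid in sids}
        for future, sid in futures.items():
            try:
                future.result()  # re-raises any exception from the worker thread
            except Exception as exc:
                failed.append((sid, repr(exc)))
            else:
                succeeded.append(sid)
    return succeeded, failed
```

A caller would write its up-to-date marker only when failed is empty, so the next run would retry the sids that did not complete.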

Zipline is highly sensitive to library versions, so if you change them, I would not expect everything to just work. The versions the container ships with are the only ones we can officially support.

If you choose to customize versions, before reporting an issue, please make sure you can reproduce the issue in the container as we provide it, so we’re not spending time trying to track down an issue that is specific to your customizations.
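
To see exactly what has been customized, one option is to save pip freeze output from the stock zipline container and from the modified one, then compare the pinned versions. A minimal standard-library sketch; parse_freeze and diff_versions are hypothetical helpers, and entries without == pins (editable installs, direct URLs) are skipped.

```python
def parse_freeze(text):
    """Parse `pip freeze` output into {normalized_package_name: version}."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and unpinned entries such as "-e ..." or "pkg @ file://..."
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        # Normalize case and underscores so "scikit_learn" and "scikit-learn" match.
        versions[name.strip().lower().replace("_", "-")] = version.strip()
    return versions


def diff_versions(stock, custom):
    """Return {name: (stock_version, custom_version)} for packages that differ."""
    changed = {}
    for name in sorted(set(stock) | set(custom)):
        if stock.get(name) != custom.get(name):
            changed[name] = (stock.get(name), custom.get(name))
    return changed
```

Calling diff_versions(parse_freeze(stock_text), parse_freeze(custom_text)) lists every package whose version differs; in this thread, that list would include the upgraded numpy.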

OK, I’ll test the ingestion tomorrow with a freshly created zipline container.

If I have to keep the libraries as they ship in the zipline container, then I assume I’ll need to use a separate container to run computations with newer versions of scikit-learn and expose an endpoint that zipline can call.

Would you recommend using the satellite container for this? If so, is there a built-in API framework, or should I spin up something like Flask? I’ve read the docs on satellite.execute_command, but the disk I/O doesn’t seem conducive to being called several times a minute.
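
To avoid per-call disk I/O, a separate container can keep the model loaded in memory and answer requests over HTTP. Flask would work; this sketch uses only the Python standard library (http.server, json; ThreadingHTTPServer needs Python 3.7+, which a separate container can have). The /predict path, the JSON shape, and the predict() placeholder are hypothetical; in practice predict() would call a fitted scikit-learn estimator loaded once at startup.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def predict(features):
    """Placeholder model: replace with a scikit-learn estimator loaded once at startup."""
    return sum(features)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            result, status = {"prediction": predict(payload["features"])}, 200
        except (ValueError, KeyError, TypeError) as exc:
            # Malformed JSON, missing "features", or non-numeric input.
            result, status = {"error": str(exc)}, 400
        body = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging; remove to log every request


def make_server(host="0.0.0.0", port=5000):
    return ThreadingHTTPServer((host, port), PredictHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

If the container joins the same Docker Compose network as QuantRocket, other containers can reach it by its service name, so the model stays in memory between calls.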

Ingestion worked normally today, so it probably was a problem with numpy versioning.

It was also fairly straightforward to set up a new container with newer libraries within the quantrocket network and call it for scikit-learn computes, thanks for the documentation!
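
On the zipline side, a call made several times a minute should fail fast rather than stall the algorithm if the service is slow or down. A minimal client sketch using urllib.request from the standard library (also available in the zipline container's Python 3.6); the URL http://sklearn:5000/predict, the sklearn service name, and the return-a-default fallback are assumptions for a hypothetical service, not part of QuantRocket.

```python
import json
import urllib.request

# Hypothetical URL: a Compose service named "sklearn" on the shared Docker network.
PREDICT_URL = "http://sklearn:5000/predict"


def remote_predict(features, url=PREDICT_URL, timeout=2.0, default=None):
    """POST features as JSON and return the prediction, or `default` on any failure."""
    body = json.dumps({"features": list(features)}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))["prediction"]
    except (OSError, ValueError, KeyError, TypeError):
        # OSError covers URLError, HTTPError, refused connections, and timeouts;
        # the others cover malformed or unexpected JSON. Fall back instead of raising.
        return default
```

Returning a default keeps the algorithm running when the service is unavailable; whether to skip trading or reuse the last prediction in that case is a strategy decision.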