Since 2.5, zipline history (1d) on a rebuilt 1m bundle crashes zipline

Consider this SID:

Sid,Symbol,Exchange,Country,Currency,SecType,Etf,Timezone,Name,PriceMagnifier,Multiplier,Delisted,DateDelisted,LastTradeDate,RolloverDate
FIBBG000BW5YW1,USG,XNYS,US,USD,STK,0,America/New_York,"USG CORP",1,1,1,2019-04-23,,

I have a pipeline that returns this ticker as a candidate to buy.

Then, if you call data.history at day level (so it invokes resample), it crashes zipline completely. Something seems to be wrong in resample, as it cannot find the dates:


File "sym://qrocket_qrzipline_backtest_py", line 167, in backtest_algo
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/algorithm.py", line 675, in run
        quantrocket_zipline_1|    for perf in self.get_generator():
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/gens/tradesimulation.py", line 205, in transform
        quantrocket_zipline_1|    for capital_change_packet in every_bar(dt):
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/gens/tradesimulation.py", line 133, in every_bar
        quantrocket_zipline_1|    handle_data(algo, current_data, dt_to_use)
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/utils/events.py", line 218, in handle_data
        quantrocket_zipline_1|    dt,
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/utils/events.py", line 237, in handle_data
        quantrocket_zipline_1|    self.callback(context, data)
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/algorithm.py", line 485, in handle_data
        quantrocket_zipline_1|    self._handle_data(self, data)
        quantrocket_zipline_1|  File "QualUptrend.DEF", line 766, in handle_data
        quantrocket_zipline_1|  File "QualUptrend.DEF", line 566, in trade
        quantrocket_zipline_1|  File "zipline/_protocol.pyx", line 121, in zipline._protocol.check_parameters.__call__.assert_keywords_and_call (zipline/_protocol.c:3824)
        quantrocket_zipline_1|  File "zipline/_protocol.pyx", line 711, in zipline._protocol.BarData.history (zipline/_protocol.c:9253)
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/data/data_portal.py", line 967, in get_history_window
        quantrocket_zipline_1|    field, data_frequency)
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/data/data_portal.py", line 806, in _get_history_daily_window
        quantrocket_zipline_1|    assets, days_for_window, end_dt, field_to_use, data_frequency
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/data/data_portal.py", line 850, in _get_history_daily_window_data
        quantrocket_zipline_1|    assets, end_dt)
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/data/resample.py", line 440, in closes
        quantrocket_zipline_1|    asset, dt, 'close')
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/data/dispatch_bar_reader.py", line 97, in get_value
        quantrocket_zipline_1|    return r.get_value(asset, dt, field)
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/data/minute_bars.py", line 1167, in get_value
        quantrocket_zipline_1|    value = self._open_minute_file(field, sid, force_reload=force_reload)[minute_pos]
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/data/minute_bars.py", line 1098, in _open_minute_file
        quantrocket_zipline_1|    mode='r',
        quantrocket_zipline_1|  File "bcolz/carray_ext.pyx", line 1068, in bcolz.carray_ext.carray.__cinit__
        quantrocket_zipline_1|  File "bcolz/carray_ext.pyx", line 1273, in bcolz.carray_ext.carray._open_carray
        quantrocket_zipline_1|  File "bcolz/carray_ext.pyx", line 751, in bcolz.carray_ext.chunks.__cinit__
        quantrocket_zipline_1|  File "bcolz/carray_ext.pyx", line 767, in bcolz.carray_ext.chunks.read_chunk
        quantrocket_zipline_1|ValueError: chunkfile /var/lib/quantrocket/zipline/data/usstock-1min/2020-01-01T00;00;00/minute_equities.bcolz/01/82/018234.bcolz/close/data/__21.blp not found
        quantrocket_zipline_1|
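For context on why a '1d' history call ends up reading minute files: in a minute-frequency backtest the current day's daily bar does not exist yet, so zipline builds it from that day's minute bars (the traceback goes from _get_history_daily_window_data through zipline/data/resample.py into the minute bar reader). The pandas sketch below only illustrates that aggregation idea. It is not zipline's implementation, it buckets by UTC calendar day rather than by exchange session (a simplifying assumption), and the name minute_to_daily is hypothetical.

```python
import pandas as pd


def minute_to_daily(minute_bars: pd.DataFrame) -> pd.DataFrame:
    """Aggregate OHLCV minute bars (DatetimeIndex) into one bar per calendar day.

    Illustration only: zipline's own logic lives in zipline/data/resample.py.
    """
    daily = minute_bars.resample("1D").agg(
        {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
    )
    # Days without any minute bars get NaN prices (and 0 volume from "sum"); drop them.
    return daily.dropna(subset=["close"])
```

If any minute file needed for today's bar cannot be read, this kind of aggregation cannot complete, which matches the crash above even though the request was for daily data.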

Can you provide code and dates to reproduce this? I get the expected NaNs in a research environment:

>>> from zipline.research import get_data, sid
>>> asset = sid('FIBBG000BW5YW1')
>>> data = get_data('2021-04-01 12:00:00')
>>> data.history(asset, 'close', 3, '1d')
2021-03-30 00:00:00+00:00   NaN
2021-03-31 00:00:00+00:00   NaN
2021-04-01 00:00:00+00:00   NaN
Freq: C, Name: Equity(FIBBG000BW5YW1 [USG]), dtype: float64

It doesn't go wrong in research on price, but I can reproduce it on volume. It has to do with the volume of some stocks (not all), and it fails in the slippage calculation (algo.set_slippage(slippage.VolumeShareSlippage(volume_limit=1.0, price_impact=0.01))).

Also, when running the algo and there is no data, it doesn't return NaN as it does in research; instead it raises the error above.
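Why the slippage model touches volume at all: VolumeShareSlippage caps each fill at a fraction (volume_limit) of the bar's volume and moves the fill price by volume_share squared times price_impact times the close, so it has to read the current bar's volume for every open order. Below is a simplified, single-order sketch of that rule. It is an illustration under assumptions, not zipline's class: it omits the per-bar bookkeeping across multiple orders, and volume_share_fill is a hypothetical name.

```python
import math


def volume_share_fill(order_amount, bar_volume, bar_close,
                      volume_limit=0.05, price_impact=0.01):
    """Simplified single-order version of the volume-share slippage idea.

    Returns (filled_shares, fill_price), or (0, None) if nothing can fill.
    """
    # No usable volume for this bar means no fill.
    if not bar_volume or math.isnan(bar_volume):
        return 0, None
    # Fill at most volume_limit * bar_volume shares.
    max_shares = volume_limit * bar_volume
    filled = int(min(max_shares, abs(order_amount)))
    if filled < 1:
        return 0, None
    volume_share = min(filled / bar_volume, volume_limit)
    # Buys fill above the close, sells below it.
    impact = volume_share ** 2 * math.copysign(price_impact, order_amount) * bar_close
    return int(math.copysign(filled, order_amount)), bar_close + impact
```

In a real backtest the volume comes from data.current(asset, "volume"), the exact call that fails in the traceback above.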

Consider this date (from the "Happy Signal day" log line): '2010-01-04 14:32:00+00:00'

Consider the equities I want to buy:
{Equity(FIBBG000BLQ5V6 [HGSI]): 0.066796051312861468, Equity(FIBBG000BCQZS4 [AXP]): 0.055884167687848284, Equity(FIBBG000BG14P4 [CREE]): 0.092865427099946024, Equity(FIBBG000BV75B7 [TIF]): 0.080959603112773462, Equity(FIBBG000CX0P89 [GPN]): 0.07857843831533895, Equity(FIBBG000BLBXT4 [HOT]): 0.036604303353267344, Equity(FIBBG000P1F8Q7 [HSP]): 0.08671007978351386, Equity(FIBBG000CWKRM9 [MRX]): 0.073816108720469925}

This will cause the following error:

Traceback (most recent call last):
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/data/minute_bars.py", line 1093, in _open_minute_file
        quantrocket_zipline_1|    carray = self._carrays[field][sid]
        quantrocket_zipline_1|KeyError: 8370
        quantrocket_zipline_1|
        quantrocket_zipline_1|During handling of the above exception, another exception occurred:
        quantrocket_zipline_1|
        quantrocket_zipline_1|Traceback (most recent call last):
        quantrocket_zipline_1|  File "sym://qrocket_app_py", line 807, in post
        quantrocket_zipline_1|  File "sym://qrocket_qrzipline_backtest_py", line 167, in backtest_algo
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/algorithm.py", line 675, in run
        quantrocket_zipline_1|    for perf in self.get_generator():
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/gens/tradesimulation.py", line 205, in transform
        quantrocket_zipline_1|    for capital_change_packet in every_bar(dt):
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/gens/tradesimulation.py", line 119, in every_bar
        quantrocket_zipline_1|    blotter.get_transactions(current_data)
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/finance/blotter/simulation_blotter.py", line 345, in get_transactions
        quantrocket_houston_1|172.18.0.4 - - [06/Apr/2021:06:31:10 +0000] "POST /zipline/backtests/QualUptrend.DEF?capital_base=1000000&start_date=2010-01-01&end_date=2012-03-01&progress=M HTTP/1.1" 500 179 "-" "python-urllib3/1.26.3"
        quantrocket_zipline_1|    slippage.simulate(bar_data, asset, asset_orders):
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/finance/slippage.py", line 162, in simulate
        quantrocket_zipline_1|    volume = data.current(asset, "volume")
        quantrocket_zipline_1|  File "zipline/_protocol.pyx", line 121, in zipline._protocol.check_parameters.__call__.assert_keywords_and_call (zipline/_protocol.c:3824)
        quantrocket_zipline_1|  File "zipline/_protocol.pyx", line 347, in zipline._protocol.BarData.current (zipline/_protocol.c:5387)
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/data/data_portal.py", line 524, in get_spot_value
        quantrocket_zipline_1|    data_frequency,
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/data/data_portal.py", line 473, in _get_single_asset_value
        quantrocket_zipline_1|    return self._get_minute_spot_value(asset, field, dt)
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/data/data_portal.py", line 696, in _get_minute_spot_value
        quantrocket_zipline_1|    return reader.get_value(asset.sid, dt, column)
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/data/dispatch_bar_reader.py", line 97, in get_value
        quantrocket_zipline_1|    return r.get_value(asset, dt, field)
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/data/minute_bars.py", line 1167, in get_value
        quantrocket_zipline_1|    value = self._open_minute_file(field, sid, force_reload=force_reload)[minute_pos]
        quantrocket_zipline_1|  File "/opt/conda/lib/python3.6/site-packages/zipline/data/minute_bars.py", line 1098, in _open_minute_file
        quantrocket_zipline_1|    mode='r',
        quantrocket_zipline_1|  File "bcolz/carray_ext.pyx", line 1068, in bcolz.carray_ext.carray.__cinit__
        quantrocket_zipline_1|  File "bcolz/carray_ext.pyx", line 1273, in bcolz.carray_ext.carray._open_carray
        quantrocket_zipline_1|  File "bcolz/carray_ext.pyx", line 751, in bcolz.carray_ext.chunks.__cinit__
        quantrocket_zipline_1|  File "bcolz/carray_ext.pyx", line 767, in bcolz.carray_ext.chunks.read_chunk
        quantrocket_zipline_1|ValueError: chunkfile /var/lib/quantrocket/zipline/data/usstock-1min/2020-01-01T00;00;00/minute_equities.bcolz/00/83/008370.bcolz/volume/data/__21.blp not found

So it looks to me like something goes wrong when the slippage model asks for the volume in 1m mode.
To test this hypothesis I ran:

from zipline.research import get_data, sid
asset = sid('FIBBG000BLQ5V6')
data = get_data('2010-01-04 14:32:00+00:00')
data.history(asset, 'volume', 3, '1m')

I get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/envs/zipline/lib/python3.6/site-packages/zipline/data/minute_bars.py in _open_minute_file(self, field, sid)
   1073         try:
-> 1074             carray = self._carrays[field][sid]
   1075         except KeyError:

KeyError: 8370

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-25-4a1f9ae0d051> in <module>()
      2 asset = sid('FIBBG000BLQ5V6')
      3 data = get_data('2010-01-04 14:32:00+00:00')
----> 4 data.history(asset, 'volume', 3, '1m')

/opt/conda/envs/zipline/lib/python3.6/site-packages/zipline/_protocol.pyx in zipline._protocol.check_parameters.__call__.assert_keywords_and_call()

/opt/conda/envs/zipline/lib/python3.6/site-packages/zipline/_protocol.pyx in zipline._protocol.BarData.history()

/opt/conda/envs/zipline/lib/python3.6/site-packages/zipline/data/data_portal.py in get_history_window(self, assets, end_dt, bar_count, frequency, field, data_frequency, ffill)
    972             else:
    973                 df = self._get_history_minute_window(assets, end_dt, bar_count,
--> 974                                                      field)
    975         else:
    976             raise ValueError("Invalid frequency: {0}".format(frequency))

/opt/conda/envs/zipline/lib/python3.6/site-packages/zipline/data/data_portal.py in _get_history_minute_window(self, assets, end_dt, bar_count, field_to_use)
    904             assets,
    905             field_to_use,
--> 906             minutes_for_window,
    907         )
    908 

/opt/conda/envs/zipline/lib/python3.6/site-packages/zipline/data/data_portal.py in _get_minute_window_data(self, assets, field, minutes_for_window)
   1061                                                    minutes_for_window,
   1062                                                    field,
-> 1063                                                    False)
   1064 
   1065     def _get_daily_window_data(self,

/opt/conda/envs/zipline/lib/python3.6/site-packages/zipline/data/history_loader.py in history(self, assets, dts, field, is_perspective_after)
    547                                              dts,
    548                                              field,
--> 549                                              is_perspective_after)
    550         end_ix = self._calendar.searchsorted(dts[-1])
    551 

/opt/conda/envs/zipline/lib/python3.6/site-packages/zipline/data/history_loader.py in _ensure_sliding_windows(self, assets, dts, field, is_perspective_after)
    429                 adj_dts = prefetch_dts
    430             prefetch_len = len(prefetch_dts)
--> 431             array = self._array(prefetch_dts, needed_assets, field)
    432 
    433             if field == 'sid':

/opt/conda/envs/zipline/lib/python3.6/site-packages/zipline/data/history_loader.py in _array(self, dts, assets, field)
    593             dts[0],
    594             dts[-1],
--> 595             assets,
    596         )[0]

/opt/conda/envs/zipline/lib/python3.6/site-packages/zipline/data/dispatch_bar_reader.py in load_raw_arrays(self, fields, start_dt, end_dt, sids)
    118                                                 end_dt,
    119                                                 sid_groups[t])
--> 120             for t in asset_types if sid_groups[t]}
    121 
    122         results = []

/opt/conda/envs/zipline/lib/python3.6/site-packages/zipline/data/dispatch_bar_reader.py in <dictcomp>(.0)
    118                                                 end_dt,
    119                                                 sid_groups[t])
--> 120             for t in asset_types if sid_groups[t]}
    121 
    122         results = []

/opt/conda/envs/zipline/lib/python3.6/site-packages/zipline/data/minute_bars.py in load_raw_arrays(self, fields, start_dt, end_dt, sids)
   1272 
   1273             for i, sid in enumerate(sids):
-> 1274                 carray = self._open_minute_file(field, sid)
   1275                 values = carray[start_idx:end_idx + 1]
   1276                 if indices_to_exclude is not None:

/opt/conda/envs/zipline/lib/python3.6/site-packages/zipline/data/minute_bars.py in _open_minute_file(self, field, sid)
   1077                 carray = self._carrays[field][sid] = bcolz.carray(
   1078                     rootdir=self._get_carray_path(sid, field),
-> 1079                     mode='r',
   1080                 )
   1081             except IOError:

bcolz/carray_ext.pyx in bcolz.carray_ext.carray.__cinit__()

bcolz/carray_ext.pyx in bcolz.carray_ext.carray._open_carray()

bcolz/carray_ext.pyx in bcolz.carray_ext.chunks.__cinit__()

bcolz/carray_ext.pyx in bcolz.carray_ext.chunks.read_chunk()

ValueError: chunkfile /var/lib/quantrocket/zipline/data/usstock-1min/2020-01-01T00;00;00/minute_equities.bcolz/00/83/008370.bcolz/volume/data/__21.blp not found

Same error: bcolz cannot locate the data for sid 8370.
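Since only some sids are affected, one way to see which of the candidate equities hit the error is to loop over them in research and catch it. A minimal sketch, assuming the data object returned by zipline.research.get_data as used above; find_unreadable_assets is a hypothetical helper, and it catches only ValueError because that is the exception that surfaces in the traceback (the KeyError is handled inside zipline).

```python
def find_unreadable_assets(data, assets, field="volume", bar_count=3, frequency="1m"):
    """Return {asset: error message} for assets whose history lookup raises ValueError.

    `data` is assumed to expose history(asset, field, bar_count, frequency),
    like the object returned by zipline.research.get_data.
    """
    failures = {}
    for asset in assets:
        try:
            data.history(asset, field, bar_count, frequency)
        except ValueError as exc:  # e.g. "chunkfile ... not found"
            failures[asset] = str(exc)
    return failures

# Example (QuantRocket research environment):
#   from zipline.research import get_data, sid
#   data = get_data('2010-01-04 14:32:00+00:00')
#   sids = ['FIBBG000BLQ5V6', 'FIBBG000BCQZS4', 'FIBBG000BG14P4', 'FIBBG000BV75B7',
#           'FIBBG000CX0P89', 'FIBBG000BLBXT4', 'FIBBG000P1F8Q7', 'FIBBG000CWKRM9']
#   print(find_unreadable_assets(data, [sid(s) for s in sids]))
```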

This is pretty critical, as I can't run any backtests anymore; every one of them fails at some point.

I made a test file:

import zipline.api as algo
from zipline.finance import slippage, commission

COMM = 0.0004  # commission per dollar traded


def initialize(context):
    algo.set_commission(us_equities=commission.PerDollar(cost=COMM))
    algo.set_slippage(slippage.VolumeShareSlippage(volume_limit=0.05, price_impact=0.01))
    # Benchmark: FIBBG000BVPJT8 (PDP); alternatives: FIBBG00K26BFK1 (VFMO), FIBBG000BDTBL9 (SPY)
    algo.set_benchmark(algo.sid('FIBBG000BVPJT8'))

    # Trade one minute after the open every day
    algo.schedule_function(trade, algo.date_rules.every_day(), algo.time_rules.market_open(minutes=1))


def before_trading_start(context, data):
    # Fixed sid under test (HGSI)
    context.tradesid = algo.sid('FIBBG000BLQ5V6')


def trade(context, data):
    print(" ! Happy Signal day : {} ".format(algo.get_datetime()))
    # An open order makes the slippage model read this minute's volume
    algo.order_target_percent(context.tradesid, 0.1)

Run with:

from quantrocket.zipline import backtest, ZiplineBacktestResult

algo = "test"
file_result = "test_results.csv"  # output path for the backtest results
backtest(
    algo,
    capital_base=1000000,
    start_date="2010-01-01",
    end_date="2012-03-01",
    progress="M",
    filepath_or_buffer=file_result)

Thank you for the example code. Investigating...

This should now be fixed if you do another force update:

$ BUNDLE_NAME=usstock-1min
$ curl -X POST "http://houston/zipline/ingestions/$BUNDLE_NAME?force=true"

Postmortem: When the rebuilt bundle was pushed to S3, a few of the metadata files from the old bundle stuck around and thus were synced down with the new bundle. This caused a mismatch between the data layout bcolz expected based on the metadata files, and the actual data layout.
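To check whether a local bundle still has this kind of metadata/data mismatch, one could compare the number of chunks a carray's metadata implies with the chunk files actually on disk. This is a hypothetical diagnostic, not a QuantRocket or bcolz tool. It assumes the usual bcolz on-disk layout (meta/sizes and meta/storage as JSON files holding "shape" and "chunklen", chunks stored as data/__<n>.blp, the naming seen in the error messages above) and checks only full chunks, ignoring any leftover.

```python
import json
import os


def missing_chunk_files(carray_dir):
    """List full-chunk files implied by a bcolz carray's metadata but absent on disk.

    Assumes meta/sizes and meta/storage are JSON files and chunks are
    named data/__<n>.blp. Only full chunks (length // chunklen) are checked.
    """
    with open(os.path.join(carray_dir, "meta", "sizes")) as f:
        length = json.load(f)["shape"][0]
    with open(os.path.join(carray_dir, "meta", "storage")) as f:
        chunklen = json.load(f)["chunklen"]
    data_dir = os.path.join(carray_dir, "data")
    expected = ["__{}.blp".format(i) for i in range(length // chunklen)]
    return [name for name in expected
            if not os.path.exists(os.path.join(data_dir, name))]

# Example, using the per-field carray directory from the error above:
#   missing_chunk_files("/var/lib/quantrocket/zipline/data/usstock-1min/"
#                       "2020-01-01T00;00;00/minute_equities.bcolz/00/83/008370.bcolz/volume")
```

A non-empty result would point to stale metadata of the kind described in the postmortem.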

My apologies for the regression.