Zipline live minute data not mapping correctly

Hi Brian,

I've been trying to figure out why my minute-frequency Zipline strategy works fine in backtest but not in live trading. I've found that the minute data is not being passed correctly to the 'data' object during the live strategy run.

I discovered this by writing the price values from data.current to file each minute in the handle_data method. A screenshot of this file output is here:

To check that the data was being received correctly from IBKR, I queried the aggregate database that is supposed to be mapped to zipline. There, I see that all the data exists and looks correct. A screenshot of this download is here:

As you can see, most of the data passed to the 'data' object in live trading is missing (NaN) and, when present, is inaccurate. This is the case for all the assets I had under watch (roughly a couple dozen).

My implementation of the realtime data collection in before_trading_start is verbatim from the user guide. Specifically:

if context.arena == 'trade':

    # Request tick data for all assets under watch + SPY 
    sids = [asset.real_sid for asset in context.assets] + ['FIBBG000BDTBL9']
    
    if sids:
        collect_market_data('all-stk-etf-tick-2',
                            sids=sids,
                            until='16:01:00 America/New_York')
        
    # Assign fields for zipline to recognize
    set_realtime_db('usstock-tick-1min-2',
                    fields={'close': 'LastPriceClose',
                            'open': 'LastPriceOpen',
                            'high': 'LastPriceHigh',
                            'low': 'LastPriceLow',
                            'volume': 'VolumeClose'})

TL;DR: the data appears to be received and aggregated correctly locally, but it is not being fed/mapped to the live Zipline strategy correctly. I'm not sure how to debug this further. Any ideas?

Additionally, the 'VolumeClose' field in the aggregate db appears to be cumulative volume for the day. The zipline 'volume' field is supposed to be volume traded in the single minute. Is this something that needs to be manually changed in the mapping?

Thanks,
Paul.

NaNs can happen because sometimes the stock hasn’t yet traded for that minute (but has by the time you later query the aggregate db). This is explained in the docs, and the solution is to request the price field, which is forward-filled.

With regard to Volume, you have a choice with IBKR realtime data. You can use LastSizeSum which is the volume for that minute, but because IBKR data is sampled, it won't reflect every trade. Or you can use VolumeClose which gives you the session volume, and you can do a diff in your code to get the minute volume. There's a new section about this in the docs since this is confusing and complicated.
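
The diff approach for VolumeClose could look roughly like the sketch below. It is plain Python, not part of the QuantRocket API: `minute_volumes` is a hypothetical helper, and it assumes you have already collected the cumulative session volume for each minute, starting from the session open, into a list with NaN for minutes that had no sample.

```python
import math

def minute_volumes(session_volumes):
    """Convert cumulative session volume (e.g. VolumeClose) to per-minute volume.

    session_volumes: floats, one per minute from the session open,
    NaN where no sample exists. Returns a list of the same length.
    """
    result = []
    last_valid = 0.0  # assumption: cumulative volume is zero before the first trade
    for value in session_volumes:
        if math.isnan(value):
            # No reading this minute: report NaN rather than guessing zero.
            result.append(math.nan)
            continue
        # The volume since the last valid reading lands on this minute.
        result.append(value - last_valid)
        last_valid = value
    return result

volumes = minute_volumes([100.0, 250.0, math.nan, 400.0])
# volumes[0] is 100.0, volumes[1] is 150.0, volumes[2] is NaN, volumes[3] is 150.0
```

Note the design choice: after a NaN minute, the next valid minute carries all the volume accumulated since the last valid reading. On a pandas Series, `Series.diff()` performs the same consecutive subtraction, though it leaves the first element as NaN.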

Thanks for the guidance on volume, I'll try implementing the diff.

As for the nans from tick-sampling at the beginning of the minute, this approach would seem to generate spotty and inaccurate price data. Even for a highly liquid security like SPY, ~50% of minutes return NaN for OHLC.

Any strategy that involves monitoring key price levels (e.g. resistance/support) is infeasible if the 'high' and 'low' come back as nan. (The forward-filled 'price' may not have broken a key support level, but the 'low' may have.) Also, any indicator that relies on these fields, like ATR which uses high, low, and closing price for each interval, is unusable.
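
To make the ATR point concrete, here is a minimal sketch of the standard true range calculation that ATR averages. `true_range` is an illustrative helper, not a Zipline function; it returns NaN whenever any input is NaN, which is why a missing 'high' or 'low' leaves no usable value for that minute.

```python
import math

def true_range(high, low, prev_close):
    """True range for one bar: max of high-low, |high-prev_close|, |low-prev_close|."""
    if any(math.isnan(x) for x in (high, low, prev_close)):
        # Python's max() gives order-dependent results with NaN, so bail out explicitly.
        return math.nan
    return max(high - low, abs(high - prev_close), abs(low - prev_close))

complete_bar = true_range(101.0, 99.0, 100.0)     # all three inputs present
missing_high = true_range(math.nan, 99.0, 100.0)  # 'high' came back as NaN
```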

Even when the full OHLC set is returned, it's not accurate for the full minute (because as you say in the docs, it happens at the beginning of the minute).

More importantly, this approach differs significantly from the expected data feed in a zipline backtest, which is the full OHLC price set for the previous complete minute. Any backtested strategy code will have been written assuming this is the data that's fed through the 'data' object.

How can we access the last complete minute price data during live trading? Is there a workaround via query of the agg database?

Paul.

To answer my own question (lol), a workaround I'm trying now is to call data.history and read the last complete minute from the second-to-last row.

However, I do want to point out that live and backtest currently behave differently in a way that will almost certainly lead to hard-to-catch errors in strategy code.

Specifically, suppose we query the 'data' object like so:

hist = data.history(asset, ['open', 'high', 'low', 'close', 'volume'], 5, '1m')

To access the OHLC of the most recent complete minute in backtest (e.g. to access the data from the 9:30am minute at 9:31am backtest time), we would use:

last_minute = hist.iloc[-1]
last_high = last_minute['high']

# Can also use current()
last_high = data.current(asset, 'high')

However, in live trading, the same data would be:

last_minute = hist.iloc[-2]
last_high = last_minute['high']

# Cannot use current(), will return incomplete minute data, possibly nan
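
One way to keep a single code path across both modes is to pick the row offset from the arena. This is only a sketch: `last_complete_offset` is a hypothetical helper, it assumes `context.arena == 'trade'` identifies live trading (as in the before_trading_start snippet earlier in this thread), and it assumes live history ends with the partial current minute as described above.

```python
def last_complete_offset(arena):
    """Return the iloc offset of the most recent complete minute in a history frame.

    arena: the value of context.arena ('trade' is assumed to mean live trading).
    """
    # Assumption: in live trading the final row is the partial current minute,
    # so the last complete minute sits one row earlier.
    return -2 if arena == 'trade' else -1

# Usage inside handle_data (hist comes from data.history as above):
#   last_minute = hist.iloc[last_complete_offset(context.arena)]
#   last_high = last_minute['high']
```

If live behavior is ever aligned with backtest, the helper collapses to always returning -1 and only this one function needs to change.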

As a suggestion for future releases, perhaps you might consider wiring the 'data' object in live trading to provide exactly the same output as in backtest? This would minimize rewriting and debugging.

You’re right, to mirror the backtest it should be returning the previous completed minute’s data, not the current partially completed minute’s data. Thanks for catching that. We’ll get a fix out for that.


Thanks, I appreciate your responsiveness.

Another smaller issue I experienced happens when I store a list of assets in 'context' (e.g. context.assets). When the context is re-read each day after initialization, the 'symbol' property of every asset in that list has changed to 'USD' (I'm guessing that when the context is dumped, it stores the currency field in place of the symbol).

Fortunately the sid field is correctly pickled, and I've worked around it by looking up each asset with sid() each day in before_trading_start.

It's best to avoid storing asset objects in context. See the 2.1.0 release notes for an explanation of the reason. It’s better to store the string sids if you need to store something. Storing an asset directly will raise an exception, but the check doesn't currently cover lists of assets, so your list slipped past it.
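
As a rough sketch of that pattern: keep only the string sids on context and resolve them to assets again each day. The helpers below are hypothetical and written against plain Python objects so they run standalone; `lookup` stands in for whatever sid-to-asset function the algo uses (such as the sid() lookup mentioned earlier in this thread), and `real_sid` is the attribute used in the first post.

```python
from types import SimpleNamespace

def remember_sids(context, assets):
    """Store only the string sids on context, never the asset objects."""
    context.sids = [asset.real_sid for asset in assets]

def resolve_assets(context, lookup):
    """Rebuild the asset list from the stored sids, e.g. in before_trading_start."""
    return [lookup(s) for s in context.sids]

# Standalone demonstration with stand-in objects (not real Zipline assets):
context = SimpleNamespace()
remember_sids(context, [SimpleNamespace(real_sid='FIBBG000BDTBL9')])
assets = resolve_assets(context, lookup=lambda s: SimpleNamespace(real_sid=s))
```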

Version 2.3.2, which aligns Zipline's live trading behavior with backtesting, is now available:

