Error ingesting ASX data - bin edges must be unique

Hi there!

I'm having some issues ingesting the data from ASX into Zipline.

So far I have followed the tutorials and docs, but when I try to send the data to Zipline using quantrocket.zipline.ingest_bundle, I get this output:

It seems all observations are NaN. In any case, I tried running the up_minus_down algorithm (just in case, I updated the zipline container to its latest version first), and I get this error:

ValueError: bin edges must be unique

Any hint will be much appreciated.
JJD

I think the NaNs during ingestion are probably harmless. The backtest error may be because your universe is too small; check the logs to see if it's related to the pipeline. The algo's pipeline uses deciles, and with a small universe it may not be possible to split the data into that many bins. Try again with at least 30-50 stocks in your universe.
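
For reference, the "bin edges must be unique" message in the thread title is what pandas raises when it can't form distinct quantile bins, and I believe zipline's decile classifier uses pandas' qcut under the hood. A minimal reproduction (values made up for illustration):

import pandas as pd

# Too few distinct values to form 10 distinct decile edges
values = pd.Series([1.0, 1.0, 1.0, 1.0, 2.0])
pd.qcut(values, 10)  # raises ValueError: Bin edges must be unique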

Actually, the universe has 135 stocks.

I'm thinking it has to do with the zipline calendar and/or timezone issues. What do you think?

You're right, the error message is ultimately calendar-related. The pipeline is calculated each day, and if the NYSE calendar thinks it's a trading day but all the values are NaN that day because it was actually a holiday on the ASX (or another non-NYSE exchange), the pipeline will choke with this error. To be precise, this can happen due to any gap in the data, not just holidays.

There are a couple solutions.

First, I think NaN values should be filled during ingestion: NaN volume should be filled with 0, and NaN open/high/low/close should be forward-filled from the last available close. This solution will work for any data gaps regardless of whether they're calendar-related. We're about to release an update with this behavior.
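
Roughly, the fill logic will look like the following sketch in pandas (the column names are just for illustration; this isn't the exact ingestion code):

import pandas as pd

def fill_gaps(df):
    # df: one security's daily bars with open/high/low/close/volume columns
    df = df.copy()
    df["volume"] = df["volume"].fillna(0)      # missing volume -> 0
    df["close"] = df["close"].ffill()          # carry the last close forward
    for col in ("open", "high", "low"):
        df[col] = df[col].fillna(df["close"])  # fill OHL from that close
    return df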

Second, it would be good for zipline to support more calendars, as having a correct calendar is still preferable to forward-filling. We plan to add more calendars in the future, but you could implement one sooner by looking at the existing calendar implementations and making a pull request to our fork of Zipline (or to Zipline itself, but we'll probably merge it faster than they will). Zipline has a page documenting custom calendar creation.
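
To give a flavor of what's involved, here's a skeleton along the lines of the existing calendar classes. The session times follow the ASX's regular 10:00am-4:00pm Sydney hours, but the holiday rules shown are just two illustrative examples, not the full set:

from datetime import time

from pandas.tseries.holiday import Holiday, AbstractHolidayCalendar
from pytz import timezone
from zipline.utils.calendars import TradingCalendar

class ASXExchangeCalendar(TradingCalendar):

    @property
    def name(self):
        return "ASX"

    @property
    def tz(self):
        return timezone("Australia/Sydney")

    @property
    def open_time(self):
        # First minute of the session, following the NYSE calendar's convention
        return time(10, 1)

    @property
    def close_time(self):
        return time(16)

    @property
    def regular_holidays(self):
        # Illustrative only -- a real calendar needs the complete holiday set
        return AbstractHolidayCalendar(rules=[
            Holiday("New Year's Day", month=1, day=1),
            Holiday("Australia Day", month=1, day=26),
        ])

You'd then register the class with zipline.utils.calendars.register_calendar so bundles can reference it by name.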

Regarding the NaNs during ingestion, I can confirm that these are usually harmless and expected, and are unrelated to the backtest/pipeline error. Zipline warns about any missing days for any symbol in your data, so anytime the data for a symbol doesn't go back as far as the calendar, you'll get these warnings.

The forward-fill behavior for ingestion is now available by updating the zipline service to 1.1.1.11.

zipline:
  image: 'quantrocket/zipline:1.1.1.11'

Then redeploy the zipline service (for a standard Docker Compose deployment, something like docker-compose -p quantrocket up -d zipline).

Thank you Brian!

IMO, an unexpected NaN can't be harmless, especially since we're working with software that will soon be managing real money.

I'm already working on the ASX calendar (for 2018 at least); as soon as it is ready I'll share it.

I updated Zipline, and the up_minus_down algorithm is still not running. It seems there are some securities that don't have any price data at all, not even enough to seed the forward fill. I'll analyze the data to check whether that's the case.
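
Something like this should reveal which securities are entirely empty (toy data here standing in for my real frame of closes, one column per security):

import numpy as np
import pandas as pd

# Toy stand-in: one column of daily closes per security
closes = pd.DataFrame({
    "ABC": [10.0, 10.1, np.nan, 10.2],
    "XYZ": [np.nan, np.nan, np.nan, np.nan],  # no data at all
})

# A security whose entire history is NaN has nothing to seed a forward fill
empty = closes.columns[closes.isnull().all()]
print(list(empty))  # ['XYZ']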

JJD

Looking forward to the ASX calendar!

Regarding NaNs, to be clear, what I said is that expected NaNs are harmless. NaNs are expected whenever price series for two securities with different start dates are combined into a shared date index. If one security goes back to 1995 and the other IPO'ed in 2015, the newer one will have NaNs for dates before 2015. This will result in lots of warnings from zipline when it ingests your data. Unexpected NaNs are of course another matter. Certainly, knowing the pitfalls and limitations of your dataset and designing your algorithm to be resilient to them is a critical part of the process.
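
A toy illustration of what I mean (tickers and dates made up):

import pandas as pd

# One security with a long history, one that "IPO'ed" later
old = pd.Series(1.0, index=pd.date_range("1995-01-02", periods=5))
ipo = pd.Series(2.0, index=pd.date_range("1995-01-04", periods=3))

# Aligning them on a shared date index produces NaNs for the newer
# security's pre-listing dates -- the expected, harmless kind
prices = pd.DataFrame({"OLD": old, "IPO": ipo})
print(prices)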

My hunch is that your latest error can/should be handled in the algo code itself. The demo algo is meant to get you started but I wouldn't consider it highly robust, although I do hope most users will be able to run it without issue. I've personally run it on US and Japanese stocks. I would start printing and recording values and see what that reveals. And do share anything you discover that might make the demo algo more robust!
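
For example, something along these lines inside the algo (the pipeline name "pipe" and the column "returns" are placeholders for whatever the demo algo actually registers):

from zipline.api import pipeline_output, record

def before_trading_start(context, data):
    output = pipeline_output("pipe")  # placeholder pipeline name
    # How many securities have no usable value today?
    print("rows: %d, nan rows: %d"
          % (len(output), output["returns"].isnull().sum()))
    # Dropping empty rows up front may keep the decile split from choking
    context.output = output.dropna()

def handle_data(context, data):
    # record()-ed values are saved with the backtest results for inspection
    record(universe_size=len(context.output))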