Historical price data from IB is available further back than what's in my database

I have a DB with several ETFs, I don't have a start date set in the DB configuration and everything was working fine. However, I noticed last week that the pricing data for some ETFs was only starting as of June 4 2019. If I create a new DB with the same universe and same configurations as the one I already have and re-download the data, I see the pricing that goes back all the way to May 2003 when the ETF launched.

I've noticed this for 2 ETFs on 2 different DBs (none had start dates configured). The ETFs in question are conid: 37080143 and conid: 27443032.

Any idea as to what might be going on? It may be related to the issue I had with the futures head timestamps before.

Btw, the DBs in question are for ADJUSTED_LAST bar types. I have 2 other DBs for the same ETFs with same configurations but for MIDPOINT and TRADES bar types and those are fine.

It's likely that IB provides deeper data for those ETFs now than when you first collected them. One reason this can happen is that when a security switches to a different primary listing exchange, IB often no longer returns the history before the change, but in some cases they might later add it back (perhaps based on complaints). Thus the head timestamp would be later and then be earlier. (Head timestamps are stored by conid and bar type.) There are probably other reasons why IB might backfill data sometimes.

The solution is to purge the head timestamp for that bar type:

sqlite3 /var/lib/quantrocket/quantrocket.history.availability.sqlite 'DELETE FROM HeadTimestamp WHERE ConId = <conid> and BarType = "ADJUSTED_LAST"'

If you want to then collect the new data into an existing database, you can purge the security from the database:

sqlite3 /var/lib/quantrocket/quantrocket.history.<your-db>.sqlite 'DELETE FROM PenultimateMaxDate WHERE ConId = <conid>'
sqlite3 /var/lib/quantrocket/quantrocket.history.<your-db>.sqlite 'DELETE FROM MarketData WHERE ConId = <conid>'

Unlike the similar issue for back month futures, I don't think this is a good candidate for QuantRocket to handle automatically. We expect back month futures with no historical data to eventually have data, so it makes sense for QuantRocket to re-check the head timestamp for those instruments. But we don't expect IB to backfill data with any rhyme or reason, and it would be too much of a performance hit to always re-check head timestamps for the rare times something like this happens.

This worked. However, I don't think it's because there's more info now than before. In fact the way I noticed was because when I ran a script it was giving back results that made no sense. And then I figured out it was because there was only a month's worth of data when it was assuming it was more like a year's worth of data. I've been running this same script weekly since before June 4, so the data was there before.

I've noticed this on a couple more ETFs. I thought maybe it had something to do with the exchange. All except 1 of the missing ETFs are traded on the LSEETF exchange. But not all the LSEETF ETFs have missing data. Also the ETF which I first noticed trades on ARCA. Also, not all are missing from the same dates, some are since June 4th, others June 17th. I know for a fact all these ETFs had data before since I created the DB several months ago and was using it without any problems.

For this DB it isn't a such a big deal since its relatively small and I noticed there was missing data. What worries me is that for large universes, this may happen and scripts may run with missing info without the user noticing.

If I look at another problem ETF, ConId 59262747, I see that the HeadTimeStamp for Adjusted Last is 2009-04-08 and the PenultimateMaxDate is 2019-06-25. But it only has price data since June 4th.

This is likely a bug. Thanks for reporting. Here's what I think is happening.

Background

To detect price adjustments due to splits, dividends, etc., the history service compares the incoming price data to what's already stored in the database, using the record with the penultimate max date as the point of comparison. If the incoming and existing records differ, the history service deletes the entire price history for that conid and re-queues the conid, causing the full and correctly adjusted history to be collected from IB.

Before version 1.3, the history service determined the penultimate max date by directly querying the price data in the MarketData table. For large intraday databases this was too slow, so version 1.3 introduced a new PenultimateMaxDate table in which is stored the penultimate max date for each security. The table is updated each time data is collected and is faster to consult during data collection.

Issue

Some digging has revealed that a bug was introduced in version 1.3 in that the history service does not delete the record from PenultimateMaxDate at the time it deletes the price history from the MarketData table.

Impact

Normally this is a harmless and transient issue because the PenultimateMaxDate table is not consulted when the entire history is re-collected. Internally, the history service knows to collect the entire history and does so, then updates the PenultimateMaxDate with the correct date, and everything is fine.

However, what would happen if the data collection is interrupted for some reason? The interruption could either be due to the user cancelling the collection or because a certain class of re-tryable errors occurred (certain kinds of IB Gateway timeouts or transient OS write errors), causing the data collection to fail and automatically restart. In that case the internal "memory" is lost and the history service must re-consult the (incorrect) PenultimateMaxDate table to know how much data to collect. This would result in only collecting history back to the penultimate max date instead of the full history.

The limited circumstances in which this bug can cause problems probably explains why it's gone undetected since version 1.3.

Solution

A fixed is applied in quantrocket/history:1.7.1. The PenultimateMaxDate table will now be correctly cleared at the same time the price data is deleted when adjustments are detected. See how to update.

This fixes the problem going forward but you also need to force re-collection for affected securities in existing databases.

To see how many securities may be affected in a given database, you can look for securities whose min date is after the date you created the database (note that this approach casts a wide net and might catch false positives like IPOs with genuinely limited data):

$ DB_CREATION_DATE=2019-01-01 # approximate date db was created
$ DB_CODE=my-db
$ sqlite3 /var/lib/quantrocket/quantrocket.history.$DB_CODE.sqlite "SELECT COUNT(*) FROM (SELECT ConId FROM MarketData GROUP BY ConId HAVING MIN(Date) >= '$DB_CREATION_DATE')"

Then delete the history and penultimate max date for those securities:

$ sqlite3 /var/lib/quantrocket/quantrocket.history.$DB_CODE.sqlite "DELETE FROM PenultimateMaxDate WHERE ConId IN (SELECT ConId FROM MarketData GROUP BY ConId HAVING MIN(Date) >= '$DB_CREATION_DATE')"
$ sqlite3 /var/lib/quantrocket/quantrocket.history.$DB_CODE.sqlite "DELETE FROM MarketData WHERE ConId IN (SELECT ConId FROM MarketData GROUP BY ConId HAVING MIN(Date) >= '$DB_CREATION_DATE')"

Then collect data as normal.

Great, thank you for the quick fix!

This topic was automatically closed after 5 days. New replies are no longer allowed.