This is likely a bug. Thanks for reporting. Here's what I think is happening.
Background
To detect price adjustments due to splits, dividends, etc., the history service compares the incoming price data to what's already stored in the database, using the record with the penultimate max date as the point of comparison. If the incoming and existing records differ, the history service deletes the entire price history for that conid and re-queues the conid, causing the full and correctly adjusted history to be collected from IB.
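The comparison described above can be sketched roughly as follows. This is only an illustration, not the history service's actual code: the three-column `MarketData` schema, the `Close` column, and the helper names `adjustment_detected` and `handle_incoming` are assumptions made for the example, and a real check would likely compare more fields and allow for rounding.

```python
import sqlite3

# Hypothetical minimal schema; the real MarketData table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MarketData (ConId INTEGER, Date TEXT, Close REAL)")
conn.executemany(
    "INSERT INTO MarketData VALUES (?, ?, ?)",
    [(1, "2019-01-02", 100.0), (1, "2019-01-03", 102.0), (1, "2019-01-04", 104.0)],
)

def adjustment_detected(conn, conid, incoming):
    """Compare incoming closes (dict of date -> close) with the stored record
    at the penultimate max date; a mismatch suggests an adjustment."""
    row = conn.execute(
        "SELECT Date, Close FROM MarketData WHERE ConId = ? "
        "ORDER BY Date DESC LIMIT 1 OFFSET 1",
        (conid,),
    ).fetchone()
    if row is None:
        return False  # fewer than two stored records, nothing to compare
    date, stored_close = row
    return date in incoming and incoming[date] != stored_close

def handle_incoming(conn, conid, incoming, queue):
    """On a mismatch, drop the stored history and re-queue the conid."""
    if adjustment_detected(conn, conid, incoming):
        conn.execute("DELETE FROM MarketData WHERE ConId = ?", (conid,))
        queue.append(conid)
        return True
    return False

queue = []
# Incoming data after a hypothetical 2-for-1 split: earlier closes are halved.
split_adjusted = {"2019-01-03": 51.0, "2019-01-04": 52.0, "2019-01-07": 53.0}
handle_incoming(conn, 1, split_adjusted, queue)
print(queue)
```

Here the stored close for 2019-01-03 no longer matches the incoming value, so the conid's history is deleted and the conid is queued for a full re-collection.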
Before version 1.3, the history service determined the penultimate max date by directly querying the price data in the MarketData table. For large intraday databases this was too slow, so version 1.3 introduced a new PenultimateMaxDate table, which stores the penultimate max date for each security. The table is updated each time data is collected and is faster to consult during data collection.
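To make the two approaches concrete, here is a small sketch of both lookups side by side. The schemas (including the `Date` column on `PenultimateMaxDate`) and the function names are assumptions for illustration, and the sketch assumes one row per conid per date.

```python
import sqlite3

# Hypothetical minimal schemas for both tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MarketData (ConId INTEGER, Date TEXT, Close REAL)")
conn.execute("CREATE TABLE PenultimateMaxDate (ConId INTEGER PRIMARY KEY, Date TEXT)")

def penultimate_from_prices(conn, conid):
    """Pre-1.3 style: scan the price data itself (slow on large tables)."""
    row = conn.execute(
        "SELECT Date FROM MarketData WHERE ConId = ? "
        "ORDER BY Date DESC LIMIT 1 OFFSET 1",
        (conid,),
    ).fetchone()
    return row[0] if row else None

def penultimate_from_lookup(conn, conid):
    """1.3+ style: read one row from the small lookup table."""
    row = conn.execute(
        "SELECT Date FROM PenultimateMaxDate WHERE ConId = ?", (conid,)
    ).fetchone()
    return row[0] if row else None

def store_prices(conn, conid, rows):
    """Insert (date, close) rows, then refresh the lookup table."""
    conn.executemany(
        "INSERT INTO MarketData VALUES (?, ?, ?)",
        [(conid, date, close) for date, close in rows],
    )
    conn.execute(
        "INSERT OR REPLACE INTO PenultimateMaxDate VALUES (?, ?)",
        (conid, penultimate_from_prices(conn, conid)),
    )

store_prices(conn, 1, [("2019-01-02", 100.0), ("2019-01-03", 101.0), ("2019-01-04", 102.0)])
print(penultimate_from_prices(conn, 1), penultimate_from_lookup(conn, 1))
```

As long as the lookup table is refreshed on every write, both functions agree; the lookup simply avoids scanning the price data each time.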
Issue
Some digging has revealed that version 1.3 introduced a bug: the history service does not delete the record from PenultimateMaxDate when it deletes the price history from the MarketData table.
Impact
Normally this is a harmless and transient issue because the PenultimateMaxDate table is not consulted when the entire history is re-collected. Internally, the history service knows to collect the entire history and does so, then updates the PenultimateMaxDate table with the correct date, and everything is fine.
However, what happens if the data collection is interrupted? The interruption could be due either to the user cancelling the collection or to a certain class of retryable errors (certain kinds of IB Gateway timeouts or transient OS write errors) that cause the data collection to fail and automatically restart. In that case the internal "memory" is lost and the history service must re-consult the (incorrect) PenultimateMaxDate table to know how much data to collect. The result is that history is only collected back to the penultimate max date instead of the full history.
The limited circumstances in which this bug can cause problems probably explain why it's gone undetected since version 1.3.
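The failure mode can be reproduced in miniature. The sketch below is a simplified model, not the service's real logic: the schemas and the `collection_start` helper are hypothetical, and "restart" is modeled simply as consulting the lookup table with no other state available.

```python
import sqlite3

# Hypothetical minimal schemas, pre-populated as if data had been collected.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MarketData (ConId INTEGER, Date TEXT, Close REAL)")
conn.execute("CREATE TABLE PenultimateMaxDate (ConId INTEGER PRIMARY KEY, Date TEXT)")
conn.executemany(
    "INSERT INTO MarketData VALUES (1, ?, ?)",
    [("2019-01-02", 100.0), ("2019-01-03", 101.0), ("2019-01-04", 102.0)],
)
conn.execute("INSERT INTO PenultimateMaxDate VALUES (1, '2019-01-03')")

def collection_start(conn, conid):
    """Date to collect back to after a restart; None means full history."""
    row = conn.execute(
        "SELECT Date FROM PenultimateMaxDate WHERE ConId = ?", (conid,)
    ).fetchone()
    return row[0] if row else None

# An adjustment is detected and only the price history is deleted,
# which models the behavior described for versions 1.3 through 1.7.0.
conn.execute("DELETE FROM MarketData WHERE ConId = 1")

# The collection is then interrupted and restarts, so the in-memory
# knowledge that a full re-collection is needed is gone.
stale_start = collection_start(conn, 1)
print(stale_start)
```

Because the lookup row survived the delete, the restarted collection sees a start date rather than `None` and would fetch only a sliver of history for a security whose stored prices are now empty.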
Solution
A fix is applied in quantrocket/history:1.7.1. The PenultimateMaxDate table will now be correctly cleared at the same time the price data is deleted when adjustments are detected. See how to update.
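As a rough illustration of what clearing both tables together looks like (not the actual patch; the schemas and the `clear_history` name are assumptions), the two deletes can share one transaction so that either both take effect or neither does:

```python
import sqlite3

def clear_history(conn, conid):
    """Delete a conid's prices and its lookup row in a single transaction."""
    with conn:  # sqlite3 commits on success and rolls back on an exception
        conn.execute("DELETE FROM MarketData WHERE ConId = ?", (conid,))
        conn.execute("DELETE FROM PenultimateMaxDate WHERE ConId = ?", (conid,))

# Hypothetical minimal schemas with two securities.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MarketData (ConId INTEGER, Date TEXT, Close REAL)")
conn.execute("CREATE TABLE PenultimateMaxDate (ConId INTEGER PRIMARY KEY, Date TEXT)")
conn.executemany(
    "INSERT INTO MarketData VALUES (?, ?, ?)",
    [(1, "2019-01-02", 100.0), (1, "2019-01-03", 101.0),
     (2, "2019-01-02", 50.0), (2, "2019-01-03", 51.0)],
)
conn.executemany(
    "INSERT INTO PenultimateMaxDate VALUES (?, ?)",
    [(1, "2019-01-02"), (2, "2019-01-02")],
)

clear_history(conn, 1)
remaining = conn.execute("SELECT ConId FROM PenultimateMaxDate").fetchall()
print(remaining)
```

With the lookup row gone, a restarted collection for that conid has no stale date to fall back on.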
This fixes the problem going forward, but you also need to force re-collection for affected securities in existing databases.
To see how many securities may be affected in a given database, you can look for securities whose min date is on or after the date you created the database (note that this approach casts a wide net and might catch false positives like IPOs with genuinely limited data):
$ DB_CREATION_DATE=2019-01-01 # approximate date db was created
$ DB_CODE=my-db
$ sqlite3 /var/lib/quantrocket/quantrocket.history.$DB_CODE.sqlite "SELECT COUNT(*) FROM (SELECT ConId FROM MarketData GROUP BY ConId HAVING MIN(Date) >= '$DB_CREATION_DATE')"
Then delete the history and penultimate max date for those securities:
$ sqlite3 /var/lib/quantrocket/quantrocket.history.$DB_CODE.sqlite "DELETE FROM PenultimateMaxDate WHERE ConId IN (SELECT ConId FROM MarketData GROUP BY ConId HAVING MIN(Date) >= '$DB_CREATION_DATE')"
$ sqlite3 /var/lib/quantrocket/quantrocket.history.$DB_CODE.sqlite "DELETE FROM MarketData WHERE ConId IN (SELECT ConId FROM MarketData GROUP BY ConId HAVING MIN(Date) >= '$DB_CREATION_DATE')"
Then collect data as normal.
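If you'd rather script the cleanup, the same count-then-delete steps can be sketched in Python with the standard `sqlite3` module. The `purge_affected` helper and the minimal schemas are assumptions for the example. Note that in the shell commands above the order matters: PenultimateMaxDate has to be cleaned first, because the subquery identifies affected securities from MarketData. The sketch sidesteps that by selecting the conids once up front.

```python
import sqlite3

AFFECTED_SQL = "SELECT ConId FROM MarketData GROUP BY ConId HAVING MIN(Date) >= ?"

def purge_affected(conn, db_creation_date, dry_run=True):
    """Return conids whose earliest price is on or after db_creation_date.
    With dry_run=False, also delete their rows from both tables."""
    conids = sorted(r[0] for r in conn.execute(AFFECTED_SQL, (db_creation_date,)))
    if not dry_run and conids:
        params = [(c,) for c in conids]
        with conn:  # one transaction for both deletes
            conn.executemany("DELETE FROM PenultimateMaxDate WHERE ConId = ?", params)
            conn.executemany("DELETE FROM MarketData WHERE ConId = ?", params)
    return conids

# Demo on an in-memory database with hypothetical minimal schemas.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MarketData (ConId INTEGER, Date TEXT, Close REAL)")
conn.execute("CREATE TABLE PenultimateMaxDate (ConId INTEGER PRIMARY KEY, Date TEXT)")
conn.executemany(
    "INSERT INTO MarketData VALUES (?, ?, ?)",
    [(1, "2015-06-01", 10.0), (1, "2019-03-04", 12.0),   # full history
     (2, "2019-03-01", 20.0), (2, "2019-03-04", 21.0)],  # truncated history
)
conn.executemany(
    "INSERT INTO PenultimateMaxDate VALUES (?, ?)",
    [(1, "2015-06-01"), (2, "2019-03-01")],
)

candidates = purge_affected(conn, "2019-01-01")             # dry run: report only
deleted = purge_affected(conn, "2019-01-01", dry_run=False)
print(candidates, deleted)
```

Against a real database you would pass a connection opened on the file used in the commands above, for example `sqlite3.connect("/var/lib/quantrocket/quantrocket.history.my-db.sqlite")`, and review the dry-run list before deleting.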