Possible bug when fetching prices in Python with Timestamp

Hi,
When using the get_prices function in Python, there seems to be a bug if you use pd.Timestamp to specify a start date.

If i use CLI to get prices for NVDA (FIBBG000BBJQV0) from 2020-09-03 until 2020-10-02, the price in the first row (2020-09-03) is correct (134.9609).

$ quantrocket history get usstock-1d -s 2020-09-30 -e 2020-10-02 -i
 FIBBG000BBJQV0 -f Close

Sid,Date,Close
FIBBG000BBJQV0,2020-09-30,134.9609
FIBBG000BBJQV0,2020-10-01,135.7987
FIBBG000BBJQV0,2020-10-02,130.2903

But when i use get_prices function in Python by specifying start_date with pandas Timestamp, the price in the first row (2020-09-03) is incorrect (541.2200)

from quantrocket import get_prices

get_prices('usstock-1d', sids='FIBBG000BBJQV0', start_date=pd.Timestamp(2020, 9, 30), end_date=pd.Timestamp(2020, 10, 2), fields=['Close'])

Sid               FIBBG000BBJQV0
Field Date                      
Close 2020-09-30        541.2200
      2020-10-01        135.7987
      2020-10-02        130.2903

If i use string to specify start_date, the price in the first row (2020-09-03) is correct (134.9609).

from quantrocket import get_prices

get_prices('usstock-1d', sids='FIBBG000BBJQV0', start_date='2020-09-30', end_date
=pd.Timestamp(2020, 10, 2), fields=['Close'])

Sid               FIBBG000BBJQV0
Field Date                      
Close 2020-09-30        134.9609
      2020-10-01        135.7987
      2020-10-02        130.2903

Am I missing something here?

I’m not able to reproduce getting 541.22 as the 2020-09-30 price. If I run the query with Timestamps, I get correct prices, it’s just that the start date is not inclusive:

get_prices('usstock-1d', sids='FIBBG000BBJQV0', start_date=pd.Timestamp(2020, 9, 30), end_date=pd.Timestamp(2020, 10, 2), fields=['Close'])

Sid               FIBBG000BBJQV0
Field Date                      
Close 2020-10-01        135.7987
      2020-10-02        130.2903

The reason 2020-09-30 get omitted from the query result when using a Timestamp is that the Timestamp gets serialized to a datetime: ‘2020-09-30T00:00:00’, while the dates for an end-of-day database are stored in the database as dates without times, i.e. “2020-09-30”. The dates are compared lexically, and “2020-09-30” is less than ‘2020-09-30T00:00:00’, even though, as a date, it is identical. This isn’t ideal; if you query an end-of-day database with a datetime string, the history service should probably drop the time component so that the returned date range is as expected. For now, you can either use strings or use pd.Timestamp(...).date().

As for the 541 price you show in your output, I can’t reproduce it so I’m not sure what to say about that one.

I think I know why you can't reproduce the problem I reported: Pandas Timestamp awares timezones, and my timezone is different from yours. The following code should reproduce it.

import pandas as pd
from quantrocket import get_prices

start_date = pd.Timestamp('2020-09-30', tz='Asia/Seoul')
end_date = pd.Timestamp('2020-10-02', tz='Asia/Seoul')
get_prices('usstock-1d', sids='FIBBG000BBJQV0', start_date=start_date, end_date=end_date, fields=['Close'])

Of course, it's US equities and I could always specify US timezone or use a string date, but it seems like a potential flaw as an investment environment if it could result in false prices depending on what timezone you're running in.

If this can be fixed, that would be great.

I don't think timezone is the issue. I get the same resulting when specifying Asia/Seoul, and pandas won't default the timezone to your local timezone; it will only apply a timezone if you explicitly specify it and will be tz-naive otherwise.

Is it possible you have a history database called 'usstock-1d' and also have a Zipline bundle named 'usstock-1d'? That could cause what you're seeing. get_prices queries both history databases and Zipline bundles that match the name you specify. As I described above, when you use Timestamps, the history database will (currently) exclude the start date from the results. But this doesn't affect the Zipline bundle, so it would return data for the start date, and get_prices will stitch together the results from the two data sources. The reason for the different price levels (135 vs 541) is because the two data sources apply splits differently: the history service returns data fully adjusted as of today, while the Zipline bundle returns data fully adjusted as of the end date of the query, as described under the Adjustments section in the docs.

If this is what's happening, the solution is to use a different name for the Zipline bundle, e.g. 'usstock-1d-bundle'.

1 Like

Wow, that's true, I renamed the zipline bundle like you said and that fixed the problem. I didn't know that the get_prices queries both history DB and Zipline bundle, and that they handle stock split prices differently. In addition, lexical comparison of date... It's a bit tricky, but I've learned.

Thank you so much as always!