Use usstock-1min to create 5-minute bars for backtesting

Is there a way to create 5-minute bars using the usstock-1min DB? I've considered resampling, and I also wondered whether aggregate DBs can be created for non-tick DBs.

Thanks!

The best approach would be to load the data into pandas, resample it, then store the resampled data in a custom database, which you can then use for backtesting. You could create a custom script to keep the custom database up to date with new data. This would work best if you're working with a selection of symbols; the custom database size will likely be an issue if you try to resample the entire usstock-1min bundle.

Would you happen to have any snippets on how to accomplish this? The date time multi-index format of usstock-1min is something relatively new to me and is causing some friction!

Thanks!

You can convert the (Date, Time) MultiIndex into a single-level datetime index:

closes.index = pd.to_datetime(
    closes.index.get_level_values("Date").astype(str) + ' ' + closes.index.get_level_values("Time")).tz_localize("America/New_York")

Then you can resample:

closes.resample('5min').last()

# highs.resample('5min').max()
# etc
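Putting the two steps together, here is a minimal sketch on synthetic data. The tickers AAA and BBB and their prices are made up for illustration, and the frame is only assumed to mimic the (Date, Time) layout discussed above, with Time held as strings:

```python
import pandas as pd

# Synthetic stand-in for 1-minute closes: rows indexed by (Date, Time),
# one column per symbol. Tickers and prices are hypothetical.
minutes = pd.date_range("2023-01-03 09:30", periods=10, freq="1min")
index = pd.MultiIndex.from_arrays(
    [minutes.normalize(), minutes.strftime("%H:%M:%S")],
    names=["Date", "Time"],
)
closes = pd.DataFrame(
    {
        "AAA": [100.0 + i for i in range(10)],
        "BBB": [200.0 + i for i in range(10)],
    },
    index=index,
)

# Flatten the MultiIndex into a single tz-aware DatetimeIndex.
# Here the Date level is a DatetimeIndex, so strftime gives "YYYY-MM-DD".
dates = closes.index.get_level_values("Date").strftime("%Y-%m-%d")
times = closes.index.get_level_values("Time")
closes.index = pd.to_datetime(dates + " " + times).tz_localize("America/New_York")

# Resample to 5-minute bars. For closes, keep the last value in each bin;
# use .first() for opens, .max() for highs, .min() for lows, .sum() for volume.
closes_5min = closes.resample("5min").last()
print(closes_5min)
```

With pandas' default settings for this frequency, each 5-minute bar is labeled by the start of its bin, so the 09:30 bar covers the 1-minute bars from 09:30 through 09:34.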

Hey Brian,

I'm working through setting up the custom DB and found an error in the documentation here:

In the second example for intraday data sets, tz is not a parameter of the pd.to_datetime function:

Thanks for catching that, it should be pd.to_datetime(...).tz_localize('America/New_York') if the data doesn't already have tz offsets.
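To illustrate the difference, here is a small sketch with made-up timestamps. It only assumes standard pandas behavior: tz_localize attaches a timezone to naive timestamps, while timestamps that already carry offsets are better parsed with utc=True and then converted:

```python
import pandas as pd

# Naive timestamps (no offset in the strings): attach the exchange timezone.
naive = pd.to_datetime(["2023-01-03 09:30:00", "2023-01-03 09:35:00"])
localized = naive.tz_localize("America/New_York")

# Timestamps that already include an offset: parse as UTC, then convert.
# Calling tz_localize on tz-aware data would raise a TypeError.
aware = pd.to_datetime(["2023-01-03 09:30:00-05:00"], utc=True)
converted = aware.tz_convert("America/New_York")

print(localized)
print(converted)
```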

I am noticing that the Zipline bundle for usstock-1min doesn't have after-hours market data and jumps from the end-of-day close to the open of the next trading day.
For example:

In backtesting, this lets the indicators continue to calculate without resetting.

In my newly created 5-minute DB, it looks like the resample added a bunch of empty rows for these missing hours.

I am okay with dropping the rows with no data before ingesting into the database going forward, but my ideal would be to have the overnight data, as it could change how our indicators calculate.

Let me know if that is possible and if it isn't, if dropping the blank rows would be ideal.

Thanks!

The bundle covers regular trading hours only, so your best bet is to drop the rows outside regular trading hours.
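As a rough sketch of that cleanup, on synthetic data with a hypothetical ticker AAA and an assumed 09:30 to 15:59 session, resampling across two sessions produces all-NaN rows for the overnight gap, which can be dropped before loading the custom database:

```python
import pandas as pd

# Two synthetic sessions of 1-minute closes (09:30 through 15:59 each day),
# already on a flat tz-aware index. Ticker and values are hypothetical.
tz = "America/New_York"
day1 = pd.date_range("2023-01-03 09:30", "2023-01-03 15:59", freq="1min", tz=tz)
day2 = pd.date_range("2023-01-04 09:30", "2023-01-04 15:59", freq="1min", tz=tz)
idx = day1.append(day2)
closes = pd.DataFrame({"AAA": [float(i) for i in range(len(idx))]}, index=idx)

# resample() builds a continuous 5-minute grid from the first to the last
# timestamp, so the overnight gap shows up as rows of NaN.
resampled = closes.resample("5min").last()
empty_rows = int(resampled.isna().all(axis=1).sum())

# Drop rows where every column is NaN, leaving only bars that contain data.
bars = resampled.dropna(how="all")
print(f"{empty_rows} empty rows dropped, {len(bars)} bars kept")
```

Note that dropna(how="all") also removes any in-session bar in which no symbol traded. If you would rather keep those, filtering by clock time with between_time("09:30", "15:55") is an alternative for a regular full-day session.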

Do get_prices or Moonshot add rows for the missing times? So, if my bundle only has 9:30-5 pm EST trading data, will it wrap the 0-9:29 and 5:01-12:59 times around the dataframe?

If so, should I be forward filling my dataframe to carry over the end of day data points to the beginning of the next day?

No, it doesn’t add extended hours.