Hi Brian,
Since re-downloading the new zipline minute bundle, I've been experiencing out-of-memory failures during my longer-running backtests. After profiling, I isolated the growth in memory usage to data.history() calls.
Whenever a history call is made for a new batch of stocks, the memory used by the backtest process grows, and that memory is not released for the duration of the backtest. You can replicate with the following backtest code:
import pandas as pd
import psutil
from zipline.api import sid

def initialize(context):
    context.dayCount = 0
    return

def before_trading_start(context, data):
    context.dayCount += 1
    context.minCount = 0
    # Get 50 random asset sids from securities master
    securities = pd.read_csv('/codeload/ib_master.csv')
    context.sids = securities[(securities['Delisted']==0) & (securities['SecType']=='STK')]['Sid'].sample(n=50)
    context.assets = []
    for s in context.sids:
        try:
            context.assets.append(sid(s))
        except Exception as e:
            print('Exception at sid {}: {}'.format(s, e))
    return

def handle_data(context, data):
    context.minCount += 1
    # Log memory only around the first history call of each day
    if context.minCount == 1:
        print('MEM USAGE BEFORE PULL (DAY {}):'.format(context.dayCount), psutil.Process().memory_percent())
    # The history call itself runs every minute
    daily_panel = data.history(context.assets, ["price", "open", "high", "low", "close", "volume"], 150, "1d")
    if context.minCount == 1:
        print('MEM USAGE AFTER PULL (DAY {}):'.format(context.dayCount), psutil.Process().memory_percent())
        print('-------')
The output looks something like this:
quantrocket_zipline_1|MEM USAGE BEFORE PULL (DAY 1): 1.7565882120200094
quantrocket_zipline_1|MEM USAGE AFTER PULL (DAY 1): 2.53061302150199
quantrocket_zipline_1|-------
quantrocket_zipline_1|MEM USAGE BEFORE PULL (DAY 2): 2.5621292061117003
quantrocket_zipline_1|MEM USAGE AFTER PULL (DAY 2): 3.2926607050990113
quantrocket_zipline_1|-------
quantrocket_zipline_1|MEM USAGE BEFORE PULL (DAY 3): 3.321810736529819
quantrocket_zipline_1|MEM USAGE AFTER PULL (DAY 3): 3.9942861059729267
quantrocket_zipline_1|-------
quantrocket_zipline_1|MEM USAGE BEFORE PULL (DAY 4): 4.022216470816672
quantrocket_zipline_1|MEM USAGE AFTER PULL (DAY 4): 4.680665674508553
quantrocket_zipline_1|-------
quantrocket_zipline_1|MEM USAGE BEFORE PULL (DAY 5): 4.725159111604615
quantrocket_zipline_1|MEM USAGE AFTER PULL (DAY 5): 5.452958557436905
quantrocket_zipline_1|-------
quantrocket_zipline_1|MEM USAGE BEFORE PULL (DAY 6): 5.487938595153874
quantrocket_zipline_1|MEM USAGE AFTER PULL (DAY 6): 6.2009800752827005
quantrocket_zipline_1|-------
quantrocket_zipline_1|MEM USAGE BEFORE PULL (DAY 7): 6.2403509127130965
quantrocket_zipline_1|MEM USAGE AFTER PULL (DAY 7): 6.932218980890507
This continues to grow until an error like this is thrown:
quantrocket_flightlog_1|2021-04-22 23:53:33 quantrocket.zipline: ERROR the system killed the worker handling the request, likely an Out Of Memory error; please add more memory or try a smaller request
It's worth mentioning that even though the code above pulls data every minute of the trading day, memory usage grows almost entirely in the first minute, when the new assets are pulled for the first time.
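To put a rough number on the growth, here is a small sketch that works only from the seven days of log output above. The days_until helper and the 50% threshold in the last line are hypothetical, purely for illustration. I don't know at what percentage the worker actually gets killed, and the linear extrapolation is an assumption.

```python
# Memory percentages copied from the log output above:
# one (before pull, after pull) pair per backtest day.
readings = [
    (1.7565882120200094, 2.53061302150199),
    (2.5621292061117003, 3.2926607050990113),
    (3.321810736529819, 3.9942861059729267),
    (4.022216470816672, 4.680665674508553),
    (4.725159111604615, 5.452958557436905),
    (5.487938595153874, 6.2009800752827005),
    (6.2403509127130965, 6.932218980890507),
]

# Growth across the first history() call of each day.
pull_deltas = [after - before for before, after in readings]
mean_pull_delta = sum(pull_deltas) / len(pull_deltas)

# Average day-over-day growth between consecutive "after pull" readings.
after_values = [after for _, after in readings]
daily_growth = (after_values[-1] - after_values[0]) / (len(after_values) - 1)


def days_until(threshold_pct, current_pct, growth_per_day):
    """Linear estimate of days until memory_percent reaches threshold_pct.

    Hypothetical helper: assumes growth stays constant, which may not hold.
    """
    return (threshold_pct - current_pct) / growth_per_day


print('Mean growth per first pull: {:.3f} points'.format(mean_pull_delta))
print('Mean growth per day: {:.3f} points'.format(daily_growth))
# 50.0 is an arbitrary example threshold, not a known OOM limit.
print('Days to reach 50%: {:.0f}'.format(days_until(50.0, after_values[-1], daily_growth)))
```

Both averages come out at roughly 0.7 percentage points, so nearly all of each day's growth happens in that first pull.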
Obviously this puts an untenable cap on the duration of any backtest and destabilizes the system. Any thoughts on why this is happening and how to work around it?
Thanks,
Paul.