I have successfully loaded about a year of CustomData using the documented procedures and can load them using the zipline pipeline interface.
However, there seems to be a major performance issue if any of the defined fields is of type "str", which in the CustomFundamentals class must be declared with the 'object' type as required. The performance hit is so significant that I am not sure I can use CustomFundamentals for any data type besides 'float'.
Any insight into this would be greatly appreciated.
Example timing:
top_decile = (dollar_volume_decile.eq(9))
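# Import path assumed from the QuantRocket-style Zipline custom database API
from zipline.pipeline.data.db import Database, Column

class CustomFundamentals(Database):
    CODE = "testset-fundamentals"
    LOOKBACK_WINDOW = 180
    market_cap_cmpt = Column(float)
    entval_cmpt = Column(float)
    RBICSFocus_l2_name = Column(object)  # string field, declared as object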
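# Import paths assumed; adjust to your Zipline installation
from zipline.pipeline import Pipeline
from zipline.research import run_pipeline

pipe = Pipeline(columns={
    'market_cap_cmpt': CustomFundamentals.market_cap_cmpt.latest,
    'entval_cmpt': CustomFundamentals.entval_cmpt.latest,
    # 'RBICSFocus_l2_name': CustomFundamentals.RBICSFocus_l2_name.latest
}, screen=(top_decile)
)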
from datetime import datetime
#import timeit
start = datetime.now()
data = run_pipeline(pipe, start_date='2021-04-26', end_date='2021-04-26')
end = datetime.now()
print("Time taken:", end-start)
Time taken: 0:00:03.371350
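As a side note on the measurement itself, a single wall-clock reading can mix one-time costs (such as cache warm-up) with the steady-state cost of the call. The following is only a minimal sketch of repeating the measurement with `time.perf_counter`; `timed` and `fake_pipeline_run` are hypothetical names introduced here for illustration, and the stand-in workload would be replaced by the actual `run_pipeline(...)` call.

```python
import time

def timed(fn, *args, repeats=3, **kwargs):
    """Call fn `repeats` times; return (last_result, list_of_durations_in_seconds)."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return result, durations

# Hypothetical stand-in workload so the sketch runs on its own.
# In the post this would be: run_pipeline(pipe, start_date=..., end_date=...)
def fake_pipeline_run(n):
    return sum(range(n))

result, durations = timed(fake_pipeline_run, 100_000, repeats=3)
print("Seconds per run:", [round(d, 4) for d in durations])
```

If the first run is much slower than the later ones, part of the cost is a one-time setup; if all runs are equally slow, the cost is paid on every call.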
Now the same pipe with the text column uncommented:
pipe = Pipeline(columns={
    'market_cap_cmpt': CustomFundamentals.market_cap_cmpt.latest,
    'entval_cmpt': CustomFundamentals.entval_cmpt.latest,
    'RBICSFocus_l2_name': CustomFundamentals.RBICSFocus_l2_name.latest
}, screen=(top_decile)
)
from datetime import datetime
#import timeit
start = datetime.now()
data = run_pipeline(pipe, start_date='2021-04-26', end_date='2021-04-26')
end = datetime.now()
print("Time taken:", end-start)
Time taken: 0:01:50.254969
This version is unusable, as the call takes far too long for a single day's worth of data.
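Since the float columns load quickly, one possible workaround (not a confirmed fix) would be to encode the text field as numeric codes before loading it, declare the column as `float`, and decode the codes after the pipeline returns. The sketch below uses `pandas.factorize` on made-up placeholder labels; whether this fits the custom database's loading procedure is an assumption, and the sample values are hypothetical.

```python
import pandas as pd

# Made-up placeholder labels standing in for the RBICSFocus_l2_name values
names = pd.Series(["Sector A", "Sector B", None, "Sector A", "Sector C"],
                  name="RBICSFocus_l2_name")

# factorize assigns an integer code per distinct label; missing values get -1
codes, uniques = pd.factorize(names)

# Store the codes as floats, with NaN for missing, so the column can be float
float_codes = pd.Series(codes, index=names.index, dtype=float).where(codes >= 0)

# After the pipeline returns, map the float codes back to the original labels
decoded = float_codes.map(lambda c: uniques[int(c)] if pd.notna(c) else None)

print(float_codes.tolist())
print(decoded.tolist())
```

Note that `factorize` assigns codes in order of first appearance, so the code-to-label mapping would need to be saved and reused across loads for the codes to stay consistent.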