[Feature Request] Cash Flow Statement Quarterly Data

Costantino · February 21, 2018, 12:12pm

The interim data are almost always quarterly, except for the Cash Flow Statement.
In this case the IB Reuters Reports use cumulative periods: 3M, 6M, 9M.

In order to have data that are comparable, if I set 'quarterly' in the query, the period must be the same for each record.

This is expecially important for strategies migrated from Quantopian to Quantrocket. In Quantopian are the Fundamental Data always quarterly.

Brian · February 21, 2018, 11:19pm

This is probably best handled by an additional parameter to filter by statement lengths. Looking at the entire dataset, the statement lengths can vary widely both for the Cash Flow statements and Income statements. For quarterly statements the statement length might be 3M, 13W, 14W, depending on the company.

Costantino · February 22, 2018, 9:01am

I think the best thing would be to standardize the data, in order to get comparable periods for all the companies and the Income and Cash Flow statements.
Instead of interim (or in addition to), we need the new parameter 'quarterly'.

Another problem with the Cash Flow statements from IB/Reuters is that the interim data are cumulated: 3M, 6M, 9M and 12M. Therefore to get the data only for the last 3 months like in the Income Statement, one has to subtract the value from the previous period (if period > 3M).

Costantino · February 23, 2018, 3:52pm

I forget to mention, that we also need a Zipline-Extensions, for example
from zipline_extensions.pipeline.data import ReutersQuartelyFinancials

Soon or later, I'd like to migrate my Quantopian algorithms to Quanrocket and as Quantopian uses quartely data for the Fundamentals Pipeline, the new extension ReutersQuartelyFinancials would make such a migration a lot easier.

Costantino · February 26, 2018, 4:59pm

To get the most recent measure also the timeframe Trailing Twelve Months (TTM) should be added (of course only for the Income and the Cash Flow Statements).

As my algos use maily quarterly and TTMs data, actually QuantRocket is unfortunately no option for me.
Any plan to add support for "homogeneous timeframes" (annual, quartely and TTM) in fundamentals?

Thanks!
Costantino

Brian · February 27, 2018, 1:45pm

It's likely that we will integrate new datasets, including fundamentals. It's probably less likely that we would modify the timeframes of the Reuters dataset, as doing so would blur the line between the software (QuantRocket) and the data vendor (Reuters) and I'm not sure blurring that line is a good idea.

Costantino · February 28, 2018, 12:45pm

I understand your point, but it's not really necessary to "blur the line". I mean you can save the Reuters data in the database as they are and then leave the user the option to "normalise" them in the query result (or manage a second db with normalised data to speed up the queries).

Except for the annual data, the interim data (which are the most interessant, because more recent) are practically unusable because of a lot of different periods lenghts and units.

For example in my NYSE db for Income Statement there are 13 differents periods!

f = io.StringIO()
download_reuters_financials(["RTLR"],f, universes=["nyse"], interim=True)
financials = pd.read_csv(f, parse_dates=["SourceDate", "FiscalPeriodEndDate"])
(financials['StatementPeriodLength'].map(str) + financials['StatementPeriodUnit']).unique()

array(['12W', '16W', '3M', '6M', '13W', '12M', '14W', '5M', '4M', '1M',
'52W', '2M', '9M'], dtype=object)

For the Cash Flow Statement even worse with 23 different periods!:
['16W', '28W', '40W', '52W', '12M', '3M', '6M', '9M', '26W', '39W', '13W', '12W', '24W', '36W', '53W', '5M', '4M', '7M', '8M', '27W', '10M', '2M', '15M']

Actually is very difficult to perform a simple query like for example "give me the last 4 quarterly result for Total Revenue".

I think the point should be addressed, because it could be the "added value" provided by Quantrocket.
As a programmer I've already written a Java Service that download the Reuters data from IB in XML format and put them in a relational database but the difficult task (and therefore the "added value") would be in the post-process/homogenization of data. That "added value" would contribute to justify a paid subscription for QuantRocket.

Another use case is the following one: I test my algo in Quantopian because of their free long fundamentals data history and then trade it live with QuantRocket. Actually it's not possible without a lot of work, because the Quantopian data are quarterly and yours (in the simplest case) annually.

QuantRocket looks very promising to me and I hope you will consider my request constructively.
We can also ask to the other members of the QuantRocket community, which is their opinion about this point.

Costantino · February 28, 2018, 1:39pm

FYI I understand your concerns... I don't said it's an easy task, maybe it's even not doable...
Below the frequency distribution of the interim timeframes Reuters fundamentals... really crazy
Looking forward to hearing your thoughts!

Income Statements
3M	8244
13W	614
6M	395
12W	34
12M	26
14W	16
16W	14
9M	6
4M	3
1M	2
2M	2
52W	1
5M	1


Cash Flow Statements
12M	2629
9M	2279
6M	2021
3M	1830
39W	188
52W	149
13W	145
26W	143
53W	16
40W	11
16W	7
28W	6
12W	4
7M	4
24W	3
36W	3
10M	2
5M	2
8M	2
15M	1
27W	1
2M	1
4M	1

Brian · February 28, 2018, 2:02pm

I hear you. Normalization of interim periods isn't out of the question, but some of the considerations are:

Right now we still have a lot of work to do to support live trading and real time data. In order of priorities, doing anything about interim fundamentals would be at least a number of months away.
Priority depends on demand. If it turns out everyone requires normalized quarterly fundamentals for QuantRocket to be usable, obviously it will have higher priority. If lots of people don't use fundamentals or can work with annual periods or are willing to normalize within their own code, then it will have lower priority.
Since QuantRocket is not the data vendor, it may be better to crowdsource/opensource the normalization rules; i.e. at the very least let pioneering users explore the data and help influence how to safely and correctly normalize it.
In the end, it may (or may not) be easier/better to integrate a different fundamentals dataset that already has normalized quarterly periods.

Costantino · March 2, 2018, 12:46pm

Hi Brian,

I really appreciate your approach to the question and I agree with you that implementing the live trading and follow the road map should the remain the priority.

I think, it's a good idea to opensource the normalization rules: I candidate myself to explore the data and trying to write a ruleset to correctly normalize it.
I've already started a jupyter notebook to this aim (I coudn't attach it to this post)

Let me know how you would like to proceed!

P.S.: In my background I'm mainly a java programmer but enthusiastic to improve my python knowledge

Costantino · March 2, 2018, 2:24pm

Before I can share with you the complete Jupyter Notebook (or shoud I use pastebin.com?), here are the most important code snippets:

# Convert all periods to weeks
cas_weekly = cas.loc[cas['StatementPeriodUnit']=='M']['StatementPeriodLength']
cas_weekly *= 4
cas.update(cas_weekly)
cas['StatementPeriodUnit']='W'

def normalise(financials, conid, length=12):
    data = financials[financials['ConId']==conid]
    for coa in data['CoaCode'].unique():
        coa_data = data[data['CoaCode'] == coa].sort_values('FiscalPeriodEndDate', ascending=False)    
        temp = coa_data[['Amount', 'StatementPeriodLength']]
        temp = temp - temp.shift(-1)
        temp = temp[temp['StatementPeriodLength'] == length]
        coa_data.update(temp)
        coa_data = coa_data[coa_data['StatementPeriodLength'] == length]
        data.update(coa_data)
    data = data[data['StatementPeriodLength'] == length]
    return data

At least for the Cash Flow Statements works quite well, the only problem is for ADR Stocks.
Is there a way to filter them out?

P.S.: I get a SettingWithCopyWarning data.update(coa_data) but I don't undestand why

mach8 · March 3, 2018, 2:01pm

Most ADRs have "ADR" in the LongName field in the securities master. So the best way to filter them is to export your universe to CSV, open it in Excel or similar, filter and delete the ADR rows, then re-create your universe from the filtered CSV.

Brian · March 5, 2018, 9:46pm

@Costantino it should now be possible to upload .ipynb files.

Costantino · March 7, 2018, 1:48pm

Here the .ipynb files use to test the normalisation:
Fundamentals CAS.ipynb (262.3 KB)
Fundamentals INC.ipynb (197.0 KB)

After excluding companies with ADR in the name and no sector, there were still a few names that couldn't be normalised because of missing data. For the Cash Flow Statement:

KONINKLIJKE PHILIPS NVR- NY
CHINA EASTERN AIRLINES-ADS
UNILEVER N V -NY SHARES
TSAKOS ENERGY NAVIGATION LTD
AMIRA NATURE FOODS LTD
GLOBANT SA
PAGSEGURO DIGITAL LTD
HUAMI CORPORATION
CORP AMERICA AIRPORTS SA
FOUR SEASONS EDUCATION CAYMA

For the Income Statements:

CHINA EASTERN AIRLINES-ADS
UNILEVER N V -NY SHARES
AMIRA NATURE FOODS LTD
CORP AMERICA AIRPORTS SA

Costantino · June 14, 2018, 3:34pm

Hi,

I've just seen the new page about Reuters: Reuters Financials Data Guide
Nice!

There is any update regarding the problem with the differents timeframe discussed in this threads?

Thanks!