Retrieving historical data from Alpaca/Polygon.io

"Vendor lockin" is a mischaracterization. QR does a lot more than provide you a data file. It also curates the data, updates it daily, keeps track of security master integration, makes the data available for backtesting, etc. I suspect that's why they don't yet offer the ability to ingest arbitrary third party data, especially at minute resolutions. Again, that's a lot of functionality to support. I also hope 3rd party ingestion will be supported at some point in the future, but even if it is, I'd be surprised if it came with second resolution support.

Having QR automatically manage all the data and broker interaction for me is already worth the license in my opinion. And they're completely upfront about what data support comes along with the license.

You seem to be forgetting that QR already supports 1-second data from IB. It's brutally slow to fetch because IB severely limits how fast you can fetch compared to Polygon, but clearly QR already supports 1-second resolution; it's doing it with IB, after all.

In fact, all that would be needed would be to 1) allow Polygon to populate the master database, and 2) allow fetching from Polygon in addition to IB, in the same manner IB allows it. Beyond that, since 1s resolution is already supported by the underlying tech and proven with IB, it should be perfectly feasible.

My guess is that the reason we don't see third-party data ingestion is that, while it would normally be trivial to implement, it is more difficult to implement in a way that preserves QR's vendor lock-in (that is, allowing it while still being able to prevent free users from using it). It's hard to see any reason other than vendor lock-in for not supporting it when 1s resolution is already supported from IB.

IB only provides the last 6 months for any bar size <= 30s. So yes, current infrastructure supports it, for very short backtesting.

I see I misunderstood what you meant by vendor lock-in. You mean not giving the product away for free. In which case, I am all for vendor lock-in. QR wouldn't exist if it was free. Note that Quantopian no longer exists. It was free.

No, vendor lock-in does not mean "not giving away the product for free." In this case the lock-in appears to be done with the intent of not giving it away for free, but that is not what vendor lock-in means.

Vendor lock-in here means they require you to use their data and don't let you ingest third-party data. It is vendor lock-in regardless of whether the product is given away for free or not. It does appear that, to provide third-party ingestion, the creator would have to open up the ability for someone to illegally hack the system and use it for free, but this is secondary to the vendor lock-in, not synonymous with it.

I don't understand your argument. The free version can expose (or not) any functionality QR chooses to. Everything is hackable, with enough effort, since we get the code locally (one of the key reasons for using QR, btw. If you really have alpha, you're crazy uploading raw strategy code to third parties in my opinion). But it's a lot easier just to pay the license fee. If your trading doesn't cover the cost of a QR license, you probably shouldn't be trading :)

Anyway, I think we've pretty much exhausted the subject. Good luck with your trading!

I am only trying to understand the reasoning here; without the owner/developer stating it, I can only speculate. But it does appear that the effort to obfuscate is focused around the master DB, and thus the reason third-party ingestion is blocked is that it would have to expose some of what has been obfuscated.

Regardless of anything else, the crux of the problem is simple: there is no third-party ingestion and no native support for historical Polygon data. Whatever the reason, for me personally those are must-haves to continue using the service. They aren't available, so I will have to stop using the service and seek other solutions.

This is not meant as an attack on the author or you, or as animosity in any way. It's a simple statement of needs that aren't satisfied here but would be by open source platforms (though we would have to reinvent some of what QR does to get there with our own system, and that's fine). It is what it is. If the author wishes to provide third-party ingestion or Polygon historical data in the near future, I'd be happy to stick around; if not, best of luck to him, and I wish him all the success in the world.

We actually implemented a Polygon historical data integration in a development branch earlier this year but abandoned it when we obtained the minute US stock dataset. The abandoned integration mirrors the Interactive Brokers integration, collecting data from the API and saving it to a SQLite database. Basically exactly what you're asking for.
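For readers wondering what "collecting data from the API and saving it to a SQLite database" might look like, here is a minimal sketch. It is not QuantRocket's abandoned integration; the table layout and the function names `polygon_aggs_to_rows` and `save_bars` are hypothetical, the bar fields (`t`, `o`, `h`, `l`, `c`, `v`) are assumed to follow Polygon's aggregates response, and the sample values are made up for illustration.

```python
import sqlite3


def polygon_aggs_to_rows(ticker, payload):
    """Flatten an aggregates-style payload into (ticker, ts_ms, o, h, l, c, v) tuples.

    Assumes each bar is a dict with keys t (epoch milliseconds), o, h, l, c, v.
    A payload without a "results" list (e.g. an error response) yields no rows.
    """
    bars = payload.get("results") or []
    return [(ticker, b["t"], b["o"], b["h"], b["l"], b["c"], b["v"]) for b in bars]


def save_bars(conn, rows):
    """Insert or overwrite bars, keyed on (ticker, ts_ms), so re-fetching is idempotent."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS bars (
               ticker TEXT NOT NULL,
               ts_ms  INTEGER NOT NULL,
               open REAL, high REAL, low REAL, close REAL, volume REAL,
               PRIMARY KEY (ticker, ts_ms)
           )"""
    )
    conn.executemany("INSERT OR REPLACE INTO bars VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


# Made-up sample payload standing in for an API response (no network call here).
sample = {
    "results": [
        {"t": 1602682200000, "o": 121.0, "h": 121.5, "l": 120.9, "c": 121.2, "v": 1000},
        {"t": 1602682260000, "o": 121.2, "h": 121.4, "l": 121.0, "c": 121.1, "v": 800},
    ]
}

conn = sqlite3.connect(":memory:")
save_bars(conn, polygon_aggs_to_rows("AAPL", sample))
save_bars(conn, polygon_aggs_to_rows("AAPL", sample))  # second load overwrites, no duplicates
print(conn.execute("SELECT COUNT(*) FROM bars").fetchone()[0])  # prints 2
```

The composite primary key is one possible design choice: it keeps repeated downloads of the same window from inflating the database, which matters given the performance concerns about storing large amounts of data in SQLite.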

I don’t plan to make this feature generally available because the only purpose would be for collecting historical second or tick data, and I don’t want to mislead customers by implying that they can save massive amounts of data to a SQLite database and have good performance. Many trading platforms overpromise and underdeliver, and I think that's a bad business strategy.

That said, if you are interested in using this integration, we could explore the possibility of dusting it off and making it available on a custom basis for an additional fee, with the understanding that you would be responsible for loading a small enough amount of data to maintain the level of performance you require. Reach out privately if you’re interested.

As for loading arbitrary custom data, that is frequently requested, and QuantRocket will surely support it in some capacity in the future, but we're not there yet. The challenge is that many users like QuantRocket because it makes working with data easy, but it is basically not possible to create an API that can deliver that same ease of use against "arbitrary" data. An API must be designed in tandem with the underlying data, not operating blindly. This doesn't mean a platform can't support custom data and still deliver good user experience, but it means there is no one-size-fits-all solution and we are still sorting out the best mix of solutions.

That's fair, and I'd be happy to take any risk. We only need small, randomized windows at any one time, so it should work well. I would just need to know what the fee would be so I can bring that idea back to the team and see what they think, but it sounds like it should work for me, thanks. Just let me know.

On a related note, I was hoping we could use Polygon to import their available fundamental data. It's not nearly as rich as Sharadar or Morningstar, but it's included with an Alpaca subscription and covers many of the key data points, such as market cap.

I second that as a nice-to-have as well. While not a deal breaker for me, that would be a welcome addition.

@Brian Any idea when you can get back to me on a time frame and price on this?

I'm going to need the historical Polygon downloader as well if the usstock data set doesn't include extended hours in its intraday data. This is a must for me.

@the, it looks like this discussion is dead on arrival because Polygon does not offer second-level aggregation in their historical data endpoint:

$ curl 'https://api.polygon.io/v2/aggs/ticker/AAPL/range/1/second/2020-10-14/2020-10-14?apiKey=***'
{"status":"ERROR","error":"Invalid time span. The only supported resolutions are minute|hour|day|week|month|quarter|year"}

@Brian Do you have the advanced-level subscription ($500 a month)? Or are you testing on the free tier?

As far as I know, second-level data isn't provided at the free tier or even the lowest paid tier. It is only available with the higher-priced subscriptions.

Can you post the output you get from the above query? The error message matches their documentation, which indicates that aggregates are only available down to minute level. They provide historical ticks or historical aggregates (bars) of 1 minute or larger, but not historical second bars AFAIK. Historical ticks are served by a different API endpoint than the one we have the draft integration for.
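To avoid spending a request on a resolution the endpoint rejects, a client could check the time span before building the URL. The sketch below is only an illustration: the helper name `build_aggs_url` is hypothetical, the URL pattern is copied from the curl command above, and the list of supported time spans is taken verbatim from the error message quoted in this thread, so it may not match what Polygon supports today.

```python
from urllib.parse import quote, urlencode

# Resolutions listed in the error message quoted above (assumed, may be outdated).
SUPPORTED_TIMESPANS = ("minute", "hour", "day", "week", "month", "quarter", "year")


def build_aggs_url(ticker, multiplier, timespan, start, end, api_key):
    """Build an aggregates URL, rejecting unsupported time spans before any request is made."""
    if timespan not in SUPPORTED_TIMESPANS:
        raise ValueError(
            f"unsupported time span {timespan!r}; expected one of {', '.join(SUPPORTED_TIMESPANS)}"
        )
    path = (
        f"/v2/aggs/ticker/{quote(ticker, safe='')}"
        f"/range/{int(multiplier)}/{timespan}/{start}/{end}"
    )
    return "https://api.polygon.io" + path + "?" + urlencode({"apiKey": api_key})


# A minute-level request builds normally; "KEY" is a placeholder, not a real key.
print(build_aggs_url("AAPL", 1, "minute", "2020-10-14", "2020-10-14", "KEY"))

# A second-level request fails locally instead of returning an API error.
try:
    build_aggs_url("AAPL", 1, "second", "2020-10-14", "2020-10-14", "KEY")
except ValueError as exc:
    print(exc)
```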

Ahh, it is possible it was historical ticks and not historical aggregates; I need to check. But yeah, I will run it in a bit just to make certain. I was fairly certain it was 1 second.

@Brian OK, just tested. Seems you're right: I was getting tick data, not second data, and even with my paid account that endpoint won't provide second data.

Is ingesting tick data as historical data an option?

In the meantime, can you tell me which data set you support that has minute-level access to TQQQ and SQQQ? Sharadar seems to only have daily data. With EDI I can't even get an account (they don't have any sort of public pricing model or clear setup). IBKR does provide it (which is where I'm getting it from now), but I find that using IBKR even for minute- and daily-level data causes QR to become unstable. If I queue up stocks to pull down from IBKR and something else triggers an IBKR query, the whole system becomes unstable (even other containers, not sure why) and I have to restart. It's making the system unusable at this point (and no, I have no second data on QR right now).

Tick data could possibly be loaded as custom data, once that’s supported (hopefully soon).

Any ballpark figure on when that could be supported?

Not in the 2.4 release, which will be out in a week or two, but potentially in the release after that. Subject to change.

@Brian this would be huge to get added. I've since added a Polygon downloader to my local installation, and it's been a pain, to say the least. At this point I'm only using Polygon to download the extended-hours minute data, and then I'm stitching that data together with the intraday minute data from the us-stock data set at the time I request it within a backtest. While this totally works, I would really prefer to use Polygon as my primary data provider for all historical and real-time data.
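The stitching step described above could look roughly like the following. This is a guess at the general shape, not the poster's actual code: the function name `stitch_minute_bars` is hypothetical, bars are assumed to be dicts with an epoch-millisecond `t` key, the sample values are made up, and preferring the regular-session source on overlapping timestamps is an assumption about what one would want.

```python
def stitch_minute_bars(regular_session, extended_hours):
    """Merge two lists of minute bars into one list sorted by timestamp.

    When both sources contain a bar for the same timestamp, the
    regular-session bar wins (an assumed preference, easy to flip).
    """
    by_ts = {bar["t"]: bar for bar in extended_hours}
    by_ts.update({bar["t"]: bar for bar in regular_session})
    return [by_ts[ts] for ts in sorted(by_ts)]


# Made-up bars: one pre-market minute, one overlapping minute, one regular minute.
extended = [
    {"t": 1602682140000, "c": 120.8, "src": "extended"},
    {"t": 1602682200000, "c": 121.3, "src": "extended"},
]
regular = [
    {"t": 1602682200000, "c": 121.2, "src": "regular"},
    {"t": 1602682260000, "c": 121.1, "src": "regular"},
]

for bar in stitch_minute_bars(regular, extended):
    print(bar["t"], bar["src"], bar["c"])
```

Keying on the timestamp keeps the merge linear in the number of bars apart from the final sort, which should be adequate for the small windows discussed earlier in the thread.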

Maybe taking this a step further, you could simply make it easy to integrate any custom pricing data provider so that you don't have to build integrations for each one. In my case, for example, my plan is to fall back to iqfeed.net if Polygon.io goes away for any reason, as I know they're still a funded startup.
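One way to picture a pluggable provider setup with a fallback is sketched below. Nothing here reflects an existing QuantRocket, Polygon, or IQFeed API: `fetch_with_fallback`, `AllProvidersFailed`, and the two stub providers are hypothetical names, and each provider is assumed to be a plain callable taking `(ticker, start, end)` and returning a list of bars.

```python
class AllProvidersFailed(Exception):
    """Raised when every configured provider raised an error."""


def fetch_with_fallback(providers, ticker, start, end):
    """Try each (name, fetch) pair in order and return (name, bars) from the first that succeeds."""
    errors = []
    for name, fetch in providers:
        try:
            return name, fetch(ticker, start, end)
        except Exception as exc:  # broad on purpose: any provider failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub providers standing in for real integrations (no network access).
def primary_stub(ticker, start, end):
    raise ConnectionError("primary provider unavailable")


def backup_stub(ticker, start, end):
    return [{"ticker": ticker, "date": start, "close": 100.0}]  # made-up bar


name, bars = fetch_with_fallback(
    [("primary", primary_stub), ("backup", backup_stub)],
    "TQQQ", "2020-10-14", "2020-10-14",
)
print(name, len(bars))  # prints: backup 1
```

Treating a provider as a callable with one agreed signature is the simplest form of the idea; a real integration would also need to normalize each provider's fields and time zones before the data could be used interchangeably.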