Bug Fix - utf-8 error from loading master file

Hi Team,

I’m not sure what the correct way to submit a pull request for a bug fix is. Please advise on the proper way to do it.

I encounter the error below when running this Python snippet to load the master file:

from finclab.master import get_securities
df_master = get_securities(exchanges=['XNYS', "XSHE"], fields="*")

Error Message: 'utf-8' codec can't decode byte 0xe2 in position 1023: unexpected end of data

It is caused by the string below; note the byte '\xe2' at the end:
b'ww.sec.gov/cgi-bin/br … \xe2'
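For context, 0xe2 is the lead byte of a three-byte UTF-8 sequence, so a buffer that ends right after it cannot be decoded on its own. Below is a minimal sketch that reproduces the same kind of error message with a made-up string. The euro sign is only an illustration, not the actual character in the master file.

```python
# Illustrative data only: "€" encodes to the three bytes e2 82 ac.
data = "price €".encode("utf-8")  # b'price \xe2\x82\xac'

# Cut the buffer so that it ends on the lead byte 0xe2,
# as if a chunk boundary fell in the middle of the character.
chunk = data[:-2]  # b'price \xe2'

try:
    chunk.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Prints the codec error, including "unexpected end of data".
print(error)
```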

To fix this error, the below suggestion would work:

**Target File: quantrocket/cli/utils/files.py**
**Target Line: 30**

FROM:
chunk = chunk.decode("utf-8")
TO:
chunk = chunk.decode("utf-8", "ignore")
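One caveat: "ignore" silently drops the split character instead of recovering it. If the file is read in fixed-size chunks (an assumption; I have not confirmed how files.py reads the data), an incremental decoder from the standard library's codecs module could hold the partial bytes until the next chunk arrives. A rough sketch follows. decode_chunks and the sample chunks are hypothetical, not code from quantrocket.

```python
import codecs


def decode_chunks(chunks):
    """Decode an iterable of byte chunks without losing split characters."""
    # The incremental decoder buffers an incomplete trailing sequence
    # and completes it when the next chunk is fed in.
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True raises if the stream itself ends mid-character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# Illustrative data: split "€" (e2 82 ac) across two chunks.
data = "price € per unit".encode("utf-8")
chunks = [data[:7], data[7:]]  # first chunk ends with b'\xe2'

# Per-chunk decoding with "ignore" loses the euro sign entirely.
lossy = "".join(c.decode("utf-8", "ignore") for c in chunks)

print(repr(lossy))
print(repr(decode_chunks(chunks)))
```

With "ignore", the lead byte at the end of the first chunk and the two continuation bytes at the start of the second are all discarded, so the character disappears from the output without any error.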

Best regards,

Peter

Can you provide a specific Sid and field that has the problematic string?

The error is triggered while loading the row below. I believe the Sid is one of the fields:

b'0,OTCM,"BECTON DICKINSON AND CO","Health Care",PUBLIC,"Preferred Stock","Surgical and Medical Instruments and Apparatus",3841,Manufacturing,"Surgical, Medical, And Dental Instruments And Supplies","Measuring, Analyzing, And Controlling Instruments; Photographic, Medical And Optical Goods; Watches And Clocks",BDXA\r\nFIBBG00GPWC5T0,SWP,OTCM,US,USD,STK,0,America/New_York,"STANLEY BLACK & DECKER I",1,1,1,2020-05-14,0000093556,"United States of America","United States of America",USD,2018-08-28,Delisted,2017-05-22,Inactive,"Household Products - Durables",0,US,US,39193,"Stanley Black & Decker Inc",2020-05-14,SBDKU,OTCM,PSGM,America/New_York,"Stanley Black & Decker Inc",XNYS,2017-05-12,"2020-05-18 04:47:18",5139129,STP,"Stapled Security",Units,"Hand and Edge Tools, Except Machine Tools and Handsaws",3423,Manufacturing,"Cutlery, Handtools, And General Hardware","Fabricated Metal Products, Except Machinery And Transportation Equipment","","NEW YORK",BBG00GPWC5T0,0,Pfd,"STANLEY BLACK & DECKER I","SWK 5 \xe2\x85'

I’m not able to reproduce an issue with this sid, or, so far, any other sid. If you have example code to reproduce the issue, please post it. I noticed you’re importing from finclab which is non-standard, so I’m not sure what you’re doing but please see if you can re-create the issue in a standard JupyterLab console or notebook.