Restarting Docker in OCI

I'm experiencing connection issues with my Jupyter Notebook running in OCI, and backtests that previously completed successfully in OCI now fail. The error messages indicate a memory issue.

How do I restart / reboot Docker inside of OCI to clear memory? FYI, I use the free instance and have 24 GB of memory. Any other ideas are welcome.

Thank you.

You would generally want to run

docker --context cloud stats

to see which container is using lots of memory (assuming "cloud" is your docker context name for OCI), then restart one or more containers with

docker --context cloud compose restart zipline # or jupyter, moonshot, etc

However, if you're getting a pandas error message that mentions memory, that's different and would imply that you're loading too much data or have a bug in your code.
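For a quick one-shot view before restarting anything, docker stats can take a single snapshot instead of the live-updating stream. A minimal sketch, assuming your docker context for OCI is named "cloud"; the command is built and printed here so you can paste it into your own terminal:

```shell
# Build a one-shot "docker stats" command (no live stream) with a compact
# output format. .Name, .MemUsage, and .MemPerc are standard docker stats
# format fields; "cloud" is an assumed context name.
FORMAT='table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}'
CMD="docker --context cloud stats --no-stream --format '$FORMAT'"
echo "$CMD"   # paste the printed command into your terminal
```

The --no-stream flag makes the command exit after one sample, which is handy for eyeballing (or logging) which container is the memory hog.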

I received the error below. Is it normal for a cloud instance running Docker to gradually use up memory over time that needs to be cleaned up?

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
Cell In[110], line 2
      1 from quantrocket.zipline import backtest
----> 2 backtest("Growth_Factor_Strategy",
      3          progress="M", # Use for long running test. 'D'= daily, 'W'=weeky, 'M'=monthly, 'Q'=quarterly,'A'=annually
      4          start_date="2009-01-01", end_date="2025-01-01",
      5          filepath_or_buffer="Strategy_results.csv") 

File /opt/conda/lib/python3.11/site-packages/quantrocket/zipline.py:863, in backtest(strategy, data_frequency, capital_base, bundle, start_date, end_date, progress, params, filepath_or_buffer)
    859     _params["progress"] = progress
    861 response = houston.post("/zipline/backtests/{0}".format(strategy), params=_params, timeout=60*60*96)
--> 863 houston.raise_for_status_with_json(response)
    865 filepath_or_buffer = filepath_or_buffer or sys.stdout
    866 write_response_to_filepath_or_buffer(filepath_or_buffer, response)

File /opt/conda/lib/python3.11/site-packages/quantrocket/houston.py:225, in Houston.raise_for_status_with_json(response)
    223     e.json_response = {}
    224     e.args = e.args + ("please check the logs for more details",)
--> 225 raise e

File /opt/conda/lib/python3.11/site-packages/quantrocket/houston.py:217, in Houston.raise_for_status_with_json(response)
    212 """
    213 Raises 400/500 error codes, attaching a json response to the
    214 exception, if possible.
    215 """
    216 try:
--> 217     response.raise_for_status()
    218 except requests.exceptions.HTTPError as e:
    219     try:

File /opt/conda/lib/python3.11/site-packages/requests/models.py:1021, in Response.raise_for_status(self)
   1016     http_error_msg = (
   1017         f"{self.status_code} Server Error: {reason} for url: {self.url}"
   1018     )
   1020 if http_error_msg:
-> 1021     raise HTTPError(http_error_msg, response=self)

HTTPError: ('400 Client Error: BAD REQUEST for url: http://houston/zipline/backtests/Growth_Factor_Strategy?start_date=2009-01-01&end_date=2025-01-01&progress=M', {'status': 'error', 'msg': 'the system killed the worker handling the request, likely an Out Of Memory error; please add more memory or try a smaller request'})

Thanks Brian.

@Brian I did discover that I was running out of memory, so I restarted my OCI Docker instance, which freed up a lot of memory. However, when I attempted to run a backtest in the OCI environment, I received the error below:

HTTPError: ('400 Client Error: BAD REQUEST for url: http://houston/zipline/backtests/Growth_Factor_Strategy?start_date=2009-01-01&end_date=2025-01-01&progress=M', {'status': 'error', 'msg': 'no active software subscription found, can only use the free usstock bundle'})

How do I reactivate my subscription?

Thanks.

Some of my suggestions here might help if you're using Jupyter notebooks inside VS Code; my memory was eaten up quickly that way. You can SSH into the OCI instance and look at the PIDs consuming memory beyond the backtest itself (which seems like the bulk of your problem). My OCI instance hovers around 7-8 GB used just idling, so I typically have about 15 GB to work with. Beyond that, for large backtests I use segmented backtests, but that's in Moonshot.
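For reference, the segmented approach splits the backtest date range into chunks so peak memory stays bounded. A hypothetical sketch of the CLI invocation (strategy code and flag spellings are illustrative; verify the exact flags with `quantrocket moonshot backtest --help` on your installation):

```shell
# Hypothetical: run a Moonshot backtest in annual segments ("A") to limit peak
# memory. "my-strategy" and the output filename are placeholders; the command
# is printed rather than executed so you can paste it into a JupyterLab Terminal.
CMD="quantrocket moonshot backtest my-strategy -s 2009-01-01 -e 2025-01-01 --segment A -o results.csv"
echo "$CMD"
```

Note this applies to Moonshot backtests; check the docs for whether your Zipline workflow offers an equivalent, since the original out-of-memory error above came from a Zipline backtest.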


Does the "no active software subscription" message persist today? You can run quantrocket license get in a JupyterLab Terminal to see if your license shows up as expected. If not, you can re-enter it.

Thank you @kevinkurek. I'll check it out!

@Brian the "no active software subscription" message did NOT persist today. Thank you.

Can I schedule a periodic restart of Docker in Cron for my OCI instance? If yes, what would that look like?

I'm concerned it may not be straightforward: when I was manually doing the restart via SSH into my OCI instance, I hit a "ModuleNotFoundError: No module named 'distutils'", which led to some version-compatibility gymnastics. Grok ultimately helped me complete the restart, but I'm concerned that scheduling it could be problematic if I'm going to run into version-compatibility issues on a regular basis.

If I just need to periodically run a manual restart I can do that but it's definitely a bit inconvenient.

Any guidance would be great. Thanks.

@kevinkurek I don't know if this is related, but I received a message from Oracle over the weekend apologizing for any disruptions or performance issues experienced in OCI by users of the "US East (Ashburn)" region (which I use). I had been running OCI without issue for 3 months, but then attempted a backtest and ran into memory issues in Docker on Monday.

@clehman7 no, I don't use OCI's US East region, but I appreciate the insight!
Some have recommended generating a swap file for instances that were hitting out-of-memory errors, but that wasn't my exact problem.

$ ssh [email protected]
$ lscpu                                  # CPU count
$ free -h                                # verify no swap yet, e.g. Mem: 23Gi
$ sudo fallocate -l 4G /swapfile         # create swapfile
$ sudo chmod 600 /swapfile               # set permissions
$ sudo mkswap /swapfile                  # format swap
$ sudo swapon /swapfile                  # enable swap
$ echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab # persist across reboots
$ swapon --show                          # confirm swap
$ free -h                                # verify swap, e.g. Mem: 23Gi, Swap: 4.0Gi

You can clear memory usage by restarting Docker containers from your local machine (with docker compose restart <service>). That would be the recommended way, rather than SSHing into your Oracle instance and restarting the entire Docker service (although that's possible).

Before you schedule Docker service restarts (which would be done on the host system crontab), I would suggest monitoring docker stats to see which container is using excessive memory, then troubleshooting that particular container. That’s a more targeted solution.
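If, after narrowing down the culprit, you still want a scheduled restart, a single crontab entry on the OCI host is enough; a docker compose restart doesn't involve the Python tooling that triggered the distutils error. A minimal sketch, where the compose project path, service names, and schedule are all placeholders:

```shell
# Hypothetical crontab entry: restart the zipline and jupyter containers every
# Sunday at 03:00 on the OCI host. /home/ubuntu is a placeholder for wherever
# your docker-compose.yml lives. The line is printed, not installed.
CRON_LINE='0 3 * * 0 cd /home/ubuntu && docker compose restart zipline jupyter'
echo "$CRON_LINE"
# To install on the host (run once, after SSHing in):
#   ( crontab -l 2>/dev/null; echo "$CRON_LINE" ) | crontab -
```

The cron fields are minute, hour, day-of-month, month, and day-of-week (0 = Sunday); adjust the schedule to whatever cadence your memory usage warrants.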