IBKR fails after a few days on several deployments

Very odd behaviour: after a while (a few days of running), IBKR doesn't start anymore, invariably with the error below:

ibg1:
  msg: '500 Server Error: INTERNAL SERVER ERROR for url: http://houston/ibg1/gateway?wait=True:
    b''{"status": "error", "msg": "IB Gateway should be running but is not accepting
    connections. Please open the IB Gateway GUI to troubleshoot, see http://qrok.it/h/ibgui
    for help. IBC log output is below: privateLabel = ib\\n\\njava.version = 1.8.0_152\\n\\ninstaller.groupId
    = \\n\\njava.ext.dirs = /opt/i4j_jres/1.8.0_152-tzdata2019c/lib/ext:/usr/java/packages/lib/ext\\n\\nsun.boot.class.path
    = /opt/i4j_jres/1.8.0_152-tzdata2019c/lib/resources.jar:/opt/i4j_jres/1.8.0_152-tzdata2019c/lib/rt.jar:/opt/i4j_jres/1.8.0_152-tzdata2019c/lib/sunrsasign.jar:/opt/i4j_jres/1.8.0_152-tzdata2019c/lib/jsse.jar:/opt/i4j_jres/1.8.0_152-tzdata2019c/lib/jce.jar:/opt/i4j_jres/1.8.0_152-tzdata2019c/lib/charsets.jar:/opt/i4j_jres/1.8.0_152-tzdata2019c/lib/jfr.jar:/opt/i4j_jres/1.8.0_152-tzdata2019c/classes\\n\\njava.vendor
    = Oracle Corporation\\n\\nfile.separator = /\\n\\ntwslaunch.autoupdate.serviceImpl
    = com.ib.tws.twslaunch.install4j.Install4jAutoUpdateService\\n\\njava.vendor.url.bug
    = http://bugreport.sun.com/bugreport/\\n\\ninstall4jType = ${installer:installerType}\\n\\nsun.io.unicode.encoding
    = UnicodeLittle\\n\\nsun.cpu.endian = little\\n\\nsun.locale.formatasdefault =
    true\\n\\ninstallerVersion = 2.82.6\\n\\nsun.cpu.isalist = \\n\\n------------------------------------------------------------\\n\\n2023-05-02
    04:20:32:921 IBC: Using default main window manager: null\\n\\n2023-05-02 04:20:32:922
    IBC: Using default config dialog manager\\n\\n2023-05-02 04:20:32:952 IBC: CommandServer
    is starting with port 7460\\n\\n2023-05-02 04:20:33:066 IBC: CommandServer listening
    on addresses: 172.18.0.17,127.0.0.1; port: 7460\\n\\n2023-05-02 04:20:33:066 IBC:
    CommandServer started and is ready to accept commands\\n\\nException in thread
    \\"main\\" java.lang.reflect.InvocationTargetException\\n\\n\\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native
    Method)\\n\\n\\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\\n\\n\\tat
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\\n\\n\\tat
    java.lang.reflect.Method.invoke(Method.java:498)\\n\\n\\tat sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:388)\\n\\n\\tat
    sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:401)\\n\\nCaused
    by: java.awt.AWTError: Can\''t connect to X11 window server using \'':0\'' as
    the value of the DISPLAY variable.\\n\\n\\tat sun.awt.X11GraphicsEnvironment.initDisplay(Native
    Method)\\n\\n\\tat sun.awt.X11GraphicsEnvironment.access$200(X11GraphicsEnvironment.java:65)\\n\\n\\tat
    sun.awt.X11GraphicsEnvironment$1.run(X11GraphicsEnvironment.java:115)\\n\\n\\tat
    java.security.AccessController.doPrivileged(Native Method)\\n\\n\\tat sun.awt.X11GraphicsEnvironment.<clinit>(X11GraphicsEnvironment.java:74)\\n\\n\\tat
    java.lang.Class.forName0(Native Method)\\n\\n\\tat java.lang.Class.forName(Class.java:264)\\n\\n\\tat
    java.awt.GraphicsEnvironment.createGE(GraphicsEnvironment.java:103)\\n\\n\\tat
    java.awt.GraphicsEnvironment.getLocalGraphicsEnvironment(GraphicsEnvironment.java:82)\\n\\n\\tat
    sun.awt.X11.XToolkit.<clinit>(XToolkit.java:126)\\n\\n\\tat java.lang.Class.forName0(Native
    Method)\\n\\n\\tat java.lang.Class.forName(Class.java:264)\\n\\n\\tat java.awt.Toolkit$2.run(Toolkit.java:860)\\n\\n\\tat
    java.awt.Toolkit$2.run(Toolkit.java:855)\\n\\n\\tat java.security.AccessController.doPrivileged(Native
    Method)\\n\\n\\tat java.awt.Toolkit.getDefaultToolkit(Toolkit.java:854)\\n\\n\\tat
    ibcalpha.ibc.IbcTws.createToolkitListener(Unknown Source)\\n\\n\\tat ibcalpha.ibc.IbcTws.initialize(Unknown
    Source)\\n\\n\\tat ibcalpha.ibc.Agent.premain(Unknown Source)\\n\\n\\t... 6 more\\n\\nFATAL
    ERROR in native method: processing of -javaagent failed\\n"}\n'''

When I open the VNC, it is black (no connection).

When I run in a new environment (move to a different installation), I get the same error after a few days. Is the environment auto-updating Java, and could that be causing the error? @Brian I'm down to our last environment, which I expect to fail in a few days. This is starting to become urgent for me.

Is it possible the system is under load or using lots of memory at the time this happens? You can run docker stats to see. That would seem like the most plausible explanation to me.

When you start IB Gateway, the container starts xvfb which acts as a virtual display for IB Gateway. It sounds like maybe that is not successfully starting. If you list the processes in the ibg container by running

docker compose exec ibg1 ps auwx

after you've started IB Gateway, you should see something like

/usr/bin/Xvfb :0 -ac -screen 0 1280x800x16 +extension RANDR

in the output, and my guess is you will not when this happens. Normally that kind of hit-and-miss misbehavior of something not starting would point to load (most often high memory usage).
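To make that check repeatable, here is a minimal sketch of a shell helper. It only inspects a process listing passed on stdin, so it makes no assumptions about the container beyond the Xvfb command line shown above. The function name xvfb_running is hypothetical, and the ibg1 service name in the usage comment is taken from the error message.

```shell
# Hypothetical helper: reads `ps auwx` output on stdin and succeeds
# (exit status 0) only if a line mentioning Xvfb is present.
xvfb_running() {
    grep -q 'Xvfb'
}

# Usage (assumes the service is named ibg1, as in the error above;
# -T disables pseudo-TTY allocation so the output can be piped):
#   docker compose exec -T ibg1 ps auwx | xvfb_running \
#       && echo "Xvfb is running" \
#       || echo "Xvfb is NOT running"
```

If this reports that Xvfb is not running right after you start IB Gateway, that would be consistent with the virtual display failing to come up.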

As a stopgap when it happens, re-deploying to a new environment is the most radical step. Restarting containers is a less aggressive approach that would normally work too. The typical first step should be to run docker stats, see whether certain containers are using lots of memory, and if so restart them with docker compose restart <container>. If you're not sure what counts as a lot of memory per container, post the output of docker stats here.
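If it helps to narrow down the docker stats output, the following is a rough sketch that lists only the containers above a memory percentage you choose. The function name high_mem_containers and the threshold of 25 are purely illustrative assumptions, not recommended values; the function just filters "name percentage" lines from stdin.

```shell
# Hypothetical helper: reads lines of "<name> <mem%>" on stdin
# (for example "houston 12.50%") and prints the names whose memory
# percentage is strictly greater than the threshold passed as $1.
high_mem_containers() {
    awk -v limit="$1" '{
        pct = $2
        sub(/%/, "", pct)              # strip the trailing percent sign
        if (pct + 0 > limit + 0) print $1
    }'
}

# Usage (the threshold 25 is an arbitrary example value):
#   docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' \
#       | high_mem_containers 25
```

Note that docker stats reports container names, which may differ from the service names that docker compose restart expects, so it is safer to read the list and restart the matching services by hand than to pipe it straight into a restart command.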

QuantRocket doesn't update any libraries except when you choose to switch to a new version, so it shouldn't have anything to do with that.