I am running a Node.js application in cluster mode: pm2 start server.js -i max

Assume I have 2 cluster instances (0 and 1). Our application is a simple Node.js backend running Express. Occasionally, though, an unhandled error causes the app to go down, so PM2 needs to restart it. I am noticing that whenever PM2 restarts the app in cluster mode, it temporarily brings ALL instances down and restarts them all, but I only need the worker process that errored to restart, not the ones that did not error.

Our app needs a few seconds to "reboot" before it can accept connections (it needs to connect to the DB), so we don't want a restarting instance to be marked 'online' until about 30 seconds after the restart. We tried using --listen-timeout 30000, but all instances still appear to restart.
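As I understand PM2's graceful-start documentation, --listen-timeout on its own only bounds how long PM2 waits for the app to start listening. With --wait-ready (wait_ready: true in an ecosystem file), PM2 instead waits up to that timeout for the process to call process.send('ready'). That affects when an instance counts as online, not which instances get restarted. A minimal sketch of that startup order follows; connectToDb and listen are hypothetical stand-ins for the app's real DB client and Express app.listen, injected so the sequence can be exercised without PM2.

```javascript
// Hypothetical startup sequence for server.js: connect to the DB first,
// start listening second, and only then tell PM2 the instance is ready.
// `connectToDb` and `listen` are placeholders (assumptions), not the app's code.
async function startWhenReady(connectToDb, listen, notify = defaultNotify) {
  await connectToDb(); // may take several seconds in the real app
  await listen();      // accept connections only once the DB is reachable
  notify('ready');     // with --wait-ready, PM2 waits for this message
}

// process.send only exists when the process has an IPC channel (e.g. under PM2);
// when run directly with `node server.js`, the signal is skipped.
function defaultNotify(message) {
  if (typeof process.send === 'function') {
    process.send(message);
  }
}
```

In a real server.js, listen would wrap app.listen(port, callback) in a Promise, and the app would be started with something like pm2 start server.js -i max --wait-ready --listen-timeout 30000.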

Notes:

  • Running on Node 18.7.0
  • Nginx reverse proxy

To test this, I created a fake endpoint that throws a hardcoded unhandled error. Whenever I hit this endpoint and then re-check with pm2 status, the restart count increases for all instances, not just the one that crashed.
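A minimal sketch of such a test endpoint, written against Node's built-in http module (an assumption; the question uses Express) so it runs without dependencies. One detail worth knowing: in Express 4, a synchronous throw inside a route handler is caught by Express and answered with a 500, so it does not crash the process. Scheduling the throw asynchronously (here with setImmediate) lets it escape as a genuine uncaught exception. The /crash path is hypothetical.

```javascript
const http = require('http');

// Hypothetical request handler with a deliberately broken route.
function handler(req, res) {
  if (req.url === '/crash') {
    // Throw outside the request's call stack so nothing can catch it:
    // Node reports an uncaught exception and the worker process exits.
    setImmediate(() => {
      throw new Error('intentional unhandled error');
    });
    return; // no response is sent; the process goes down first
  }
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  // Including the pid makes it easy to see which worker answered.
  res.end(`ok from pid ${process.pid}\n`);
}

const server = http.createServer(handler);
// Under PM2 this would be started with something like: server.listen(3000);
```

Comparing the pids returned by / before and after hitting /crash is one way to check whether only the crashed worker came back with a new pid.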

We encountered this issue on PM2 5.2.0 and also tried updating to 5.3.0, which made no difference.

The app has try/catch error handlers, but the point is that PM2 isn't behaving as we'd expect: on the off chance we miss an error, we need PM2 to restart only the failing instance rather than bringing everything down.
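Node already exits with a non-zero code on an uncaught exception (and, since Node 15, on an unhandled promise rejection), which is what triggers PM2's restart. As a diagnostic, a last-resort hook can log which instance is going down before it exits, so the PM2 logs show whether only one instance actually crashed. A minimal sketch, assuming PM2 sets NODE_APP_INSTANCE in cluster mode (as its documentation describes); the log and exit parameters are injectable only so the hook can be tested.

```javascript
// Hypothetical last-resort hook: log which PM2 instance is crashing, then exit
// with a non-zero code so the process manager restarts that process.
function installCrashLogger(log = console.error, exit = (code) => process.exit(code)) {
  // PM2 sets NODE_APP_INSTANCE in cluster mode; fall back when run directly.
  const id = `instance ${process.env.NODE_APP_INSTANCE ?? 'n/a'}, pid ${process.pid}`;

  process.on('uncaughtException', (err) => {
    log(`[${id}] uncaught exception: ${err && err.stack ? err.stack : err}`);
    exit(1); // a listener suppresses Node's default exit, so exit explicitly
  });

  process.on('unhandledRejection', (reason) => {
    log(`[${id}] unhandled rejection: ${reason && reason.stack ? reason.stack : reason}`);
    exit(1);
  });
}
```

It would be called once near the top of server.js, e.g. installCrashLogger();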
