Wanted to start a discussion to see if anyone has ideas on how to improve the reliability of the RPC server.
The recent issues with batch requests left the RPC server in a state where it stops responding and does not recover on its own. PM2 does not detect the problem, so it never restarts the service automatically. The node operator has to notice that something is wrong and run pm2 restart manually. Not ideal when it happens in the middle of the night.
How can we make it more resilient? Can the node detect the problem and restart itself? Currently, the death of the RPC server does not crash the whole service, so PM2/docker cannot know if it needs to be restarted.
This may need more investigation to understand why the server stops responding without throwing an exception.
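One possible direction, as a rough sketch only: an in-process watchdog that probes the RPC endpoint on a timer and, after several consecutive failures, exits the process so that PM2 (which restarts exited processes by default) or Docker can bring it back up. Every name here (startWatchdog, httpProbe, nextFailureCount), the URL, and the thresholds are hypothetical and not taken from the codebase. It also assumes the event loop is still running when the RPC server hangs; if the event loop itself is blocked, an external health check would be needed instead.

```typescript
// Hypothetical watchdog sketch. Nothing here comes from the project's code.

interface WatchdogOptions {
  probe: () => Promise<boolean>; // resolves true if the RPC server answered in time
  intervalMs: number;            // how often to probe
  maxFailures: number;           // consecutive failures tolerated before acting
  onUnhealthy: () => void;       // e.g. () => process.exit(1) so PM2 restarts us
}

// Pure helper: consecutive-failure count after one probe result.
function nextFailureCount(current: number, healthy: boolean): number {
  return healthy ? 0 : current + 1;
}

// Starts the periodic probe and returns a function that stops it.
function startWatchdog(opts: WatchdogOptions): () => void {
  let failures = 0;
  let probing = false;
  const timer = setInterval(async () => {
    if (probing) return; // previous probe still pending, skip this tick
    probing = true;
    let healthy = false;
    try {
      healthy = await opts.probe();
    } catch {
      healthy = false; // connection errors and timeouts count as failures
    } finally {
      probing = false;
    }
    failures = nextFailureCount(failures, healthy);
    if (failures >= opts.maxFailures) {
      clearInterval(timer);
      opts.onUnhealthy();
    }
  }, opts.intervalMs);
  return () => clearInterval(timer);
}

// Example probe using the global fetch available in Node 18+.
// The request is aborted if no response arrives within timeoutMs.
function httpProbe(url: string, timeoutMs: number): () => Promise<boolean> {
  return async () => {
    const controller = new AbortController();
    const t = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const res = await fetch(url, { signal: controller.signal });
      return res.ok;
    } finally {
      clearTimeout(t);
    }
  };
}

// Hypothetical usage (URL and numbers are placeholders):
// startWatchdog({
//   probe: httpProbe("http://127.0.0.1:8080/", 2000),
//   intervalMs: 10000,
//   maxFailures: 3,
//   onUnhealthy: () => process.exit(1),
// });
```

Exiting on repeated failures only treats the symptom. It would still be worth logging each failed probe so the underlying cause can be tracked down.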
Has the cause of the RPC dying been identified? That might be the more important problem to solve. I agree that relying on routine restarts might not be a good path, since it usually means less attention gets paid to the node itself...
Monitoring of the RPC server could be improved so that PM2 knows which components of an app have died. I'm not sure whether this is easy to do within a single app on PM2, in the way systemctl allows dependencies between processes and other services.