Set memory and cpu limit to last-50k #306
OK, so the MemoryPressure complaint is coming from the Node level, so #298 should have been applied at that level, not to a deployment. However, "Unlike pods and services, a node is not inherently created by Kubernetes: it is created externally by cloud providers like Google Compute Engine, or it exists in your pool of physical or virtual machines." (https://kubernetes.io/docs/concepts/architecture/nodes/#manual-node-administration)

My best guess: I think this is meant to be configured from our GKE setup rather than from within the deployment itself.

So my questions before proceeding are:
cc @fevo1971: Sorry to bombard you with more Nodewatcher DevOps ruckus, but I'd really appreciate your expert eyes on this when you get a chance, so that we can get our dashboards cluster to something more stable. Thanks!
Will take some time later today to figure out what's happening here and what limits we are actually running into. Will keep you posted!
Thank you! 🙏
Yes, you are right: the final limit here is the memory of the node the pod is running on, in our case ~2.7 GB. We can set the limits in the container spec:
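For illustration, a minimal sketch of what such a resources section could look like; the container name and the numbers below are placeholders, not the actual values from #298/#299:

```yaml
# Illustrative fragment of the nodewatcher Deployment's container spec;
# container name and numbers are placeholders, not the values from #298/#299.
spec:
  template:
    spec:
      containers:
        - name: nodewatcher       # assumed container name
          resources:
            requests:
              memory: "1Gi"       # what the scheduler reserves on the node
              cpu: "500m"
            limits:
              memory: "2Gi"       # the container is OOM-killed above this
              cpu: "1"
```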
When I look at the chart, it looks to me like the app is leaking memory, gets killed (at ~1.8 GB), and gets restarted (until it allocates the ~2 GB again), and so on. Do we know for sure that this is not a bug in the software? From what I understand, the app reads data from the RPC node and writes it into the database; is it expected to require 2+ GB of memory? And if so, do we have an idea of what the required limit is, so we can set up a new node pool accordingly?
My "findings", based mostly on what you can read at https://tech.residebrokerage.com/debugging-node-js-memory-problems-d450787d9253: I have a local node with ~700 blocks, with activity such as governance proposals.
We can see the following (comparing snapshots 1 and 3):

I then did the same, removing any task other than

Another experiment: my Kusama node got stuck at block 700, so I let nodewatcher sync up to block 700 and just left it at "Waiting for finalization or a max lag of 1 blocks.", while taking heap snapshots every minute or so. Note that this is a heap without any task at all (not even

The heap grows more slowly than with tasks, but most importantly, as soon as it is just waiting, the GC seems to do a big job. I did the same test with all tasks again, and we can see the same behaviour:

My goal was to see if I could reproduce our problem easily --> definitely.
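For reference, one way heap snapshots like the ones described above can be triggered on a running Node process is the built-in `v8.writeHeapSnapshot()`. The signal-handler wiring below is only a sketch and is not part of the nodewatcher codebase:

```ts
// Sketch: dump a V8 heap snapshot whenever the process receives SIGUSR2,
// so successive snapshots can be compared in Chrome DevTools' Memory tab.
// Requires Node >= 11.13 for v8.writeHeapSnapshot(); this handler is
// illustrative only and does not exist in nodewatcher.
import { writeHeapSnapshot } from "v8";

process.on("SIGUSR2", () => {
  // Writes Heap-<timestamp>.heapsnapshot into the working directory and
  // returns the generated filename.
  const file = writeHeapSnapshot();
  console.log(`Heap snapshot written to ${file}`);
});
```

With something like this in place, `kill -USR2 <pid>` produces a snapshot on demand without attaching a debugger.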
Just realized, we're launching our node with
Also, `ts-node` is not recommended for use in production (TypeStrong/ts-node#104). We should build with `tsc` and run `node lib/index.js` instead.
It should be so, I think, but looking at the console, the nodewatcher deployment hasn't used more than 1 GB of memory in the past 30 days...
The same as what was done for the nodewatcher deployment (#298 and #299) should be done for the job:
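As a sketch only, the equivalent resources block on the job could look roughly like this; the job name, container name, image, and numbers are all assumptions, not the actual manifest:

```yaml
# Sketch of a Job manifest with the same kind of resource limits;
# all names, the image, and the numbers are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: last-50k                  # assumed job name, taken from the PR title
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: last-50k
          image: nodewatcher:latest   # placeholder image reference
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"           # keep below the node's ~2.7 GB allocatable
              cpu: "1"
```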
2 pods were recently evicted, with the following in their describe output:

and the other:

Edit: the jobs' logs show nothing (one has no logs at all; the other has no error and just stops logging tasks abruptly).