use random:uniform instead of os:pid when constructing node name in nodetool #868
…in append_node_suffix
Borrowing from https://github.com/basho/node_package/blob/4.0/priv/base/nodetool#L195, this will help reduce the risk of hitting the atom table limit, as was reported by one of our customers who was calling riak-admin continuously and frequently enough to trigger the atom table overflow.
This looks good, thanks. I'll probably merge soon. But I wanted to note that in OTP 23+ nodetool is no longer used and this issue does not exist. Obviously still worth fixing for those using pre-23, just wanted to mention it :)
@tsloughter Indeed, I read that note and briefly wondered whether it was worth bothering. But, on reflection, it seems it still is :)
Hm, the shelltestrunner tests and the tests on Windows fail.
Should `random` be replaced by `rand`, as `random` is deprecated?
Yes, definitely should replace with the newer stuff where available.
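To make the `random` versus `rand` point concrete, here is a minimal sketch of how a bounded random suffix could be generated with `rand` where it is available, falling back to `random` otherwise. The module name `suffix_sketch`, the helper `bounded_suffix/0`, and the bound of 1000 are assumptions for illustration; they are not taken from this PR or from node_package.

```erlang
-module(suffix_sketch).
-export([append_node_suffix/2, bounded_suffix/0]).

%% Hypothetical upper bound on distinct suffixes (an assumption, not from the PR).
-define(MAX_SUFFIX, 1000).

%% Pick an integer in 1..?MAX_SUFFIX, preferring rand over the deprecated random.
bounded_suffix() ->
    case code:ensure_loaded(rand) of
        {module, rand} ->
            rand:uniform(?MAX_SUFFIX);
        {error, _} ->
            %% Only reached on OTP releases that predate the rand module.
            _ = random:seed(os:timestamp()),
            random:uniform(?MAX_SUFFIX)
    end.

%% Build the atom NodeSuffixN or NodeSuffixN@Host from a name given as a string.
append_node_suffix(Name, Suffix) ->
    N = bounded_suffix(),
    case string:tokens(Name, "@") of
        [Node, Host] ->
            list_to_atom(lists:concat([Node, Suffix, N, "@", Host]));
        [Node] ->
            list_to_atom(lists:concat([Node, Suffix, N]))
    end.
```

With a bounded range, at most `?MAX_SUFFIX` distinct names can ever be produced per target, whereas a pid-based suffix keeps yielding new names over time. The trade-off is that two simultaneous calls can occasionally pick the same suffix.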
Instances of nodetool generate random node name suffixes so that multiple calls can run simultaneously in parallel. However, each time nodetool connects to the target node, a new atom is created on the latter. If this happens frequently enough and/or for long enough, it will eventually crash the node when it hits the atom table limit.

As a workaround, if the caller can guarantee that calls are serialized and isolated in time, defining an env variable $NODETOOL_NODE_PREFIX will produce identical atoms for the node name prefix, thus avoiding the generation of new atoms.

The proposed change is complementary to erlware#868, aiming to address the issue, reported by one of our customers, in which a riak node hit the atom table limit (yes, all of 1M+ entries) and crashed. A postmortem showed the table filled with `[email protected]`, accumulated over a period of time as a result of calls to `riak admin status` every 5 min.

Note that I did not attempt to make the equivalent changes in extended_bin_windows, as it's not clear to me what they would be (my knowledge of scripting in Windows is some 30 years old).
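The workaround described above could look roughly like the following sketch. It assumes that, when `NODETOOL_NODE_PREFIX` is set, its value is used verbatim as the local part of the nodetool node name; the module name `prefix_sketch`, the function names, and the `_maint_` fallback are hypothetical and may differ from what the actual change does.

```erlang
-module(prefix_sketch).
-export([nodetool_name/1]).

%% Local part of the nodetool node name. With NODETOOL_NODE_PREFIX set, every
%% call yields the same string; otherwise fall back to a per-process suffix.
%% (Assumed behaviour, for illustration only.)
local_part(Node) ->
    case os:getenv("NODETOOL_NODE_PREFIX") of
        false  -> Node ++ "_maint_" ++ os:getpid();
        ""     -> Node ++ "_maint_" ++ os:getpid();
        Prefix -> Prefix
    end.

%% TargetName is the target node name as a string, with or without "@Host".
nodetool_name(TargetName) ->
    case string:tokens(TargetName, "@") of
        [Node, Host] -> list_to_atom(local_part(Node) ++ "@" ++ Host);
        [Node]       -> list_to_atom(local_part(Node))
    end.
```

Because the resulting name is identical on every call, the target node only ever sees one atom for it. This is only safe when calls never overlap, since two nodes with the same name cannot be connected to the target at the same time.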
@tsloughter After the approval, what is the current state of this PR? Is there anything I can do to help?
Hey, sorry about that. I don't know what the hell is going on with CI... there is at least 1 other PR that should be passing CI but isn't, and that I also want to merge and cut a release with.
Could you repush so it kicks off CI again? There isn't even a "rerun" option anywhere like there usually is...
Once it's in, there's a more substantial #871.