cluster: sometimes debug-port value for child process can be broken #1524
In a long-running cluster whose worker processes restart periodically, the --debug-port value may fall outside the valid range [1024, 65535].
Is there a good way to test this? |
May be we can start every test with For example this in v1.x branch will fail:
|
@Olegas that probably wouldn't be accepted. Could we spin off a child process with those arguments (same file) and test it that way? |
Ok, I'll test it this way. |
@Olegas sorry if I was vague in my last response, but it'd be spawning a new child process of the same file (kind of like this one). |
@brendanashworth please take a look, I've added a test case. |
@brendanashworth also I've noted |
@Olegas yes, nothing new should be going on to v1.x at this stage, we've moved to master |
@@ -282,7 +283,7 @@ function masterInit() {
  function createWorkerProcess(id, env) {
  var workerEnv = util._extend({}, process.env);
  var execArgv = cluster.settings.execArgv.slice();
- var debugPort = process.debugPort + id;
+ var debugPort = (process.debugPort + id) % debugPortRange + 1024;
It would be a little neater to turn the number literal into a constant and use it in the range calculation above.
Aside, it's something of a (not consistently enforced) convention to name constants kFoo, i.e.:
const kDebugPortStart = 1024;
const kDebugPortRange = (65536 - kDebugPortStart);
@bnoordhuis Fixed |
}
} else {
// iojs --debug-port=65535 test-cluster-debugport-overflow.js master
spawn(process.argv[0], ['--debug-port=65535', __filename, 'master']).on('close', function(code){
Style nits: line > 80 columns and missing space before brace. If you assign the arguments to a var args first, you stay under the limit.
Ditto for the line in lib/cluster.js; eyeballing it, it looks to be just over 80 columns.
LGTM sans style errors. |
@bnoordhuis fixed. I split the line in lib/cluster.js into two statements.
@@ -285,6 +287,8 @@ function masterInit() {
  var debugPort = process.debugPort + id;
  var hasDebugArg = false;

+ debugPort = debugPort % kDebugPortRange + kDebugPortStart;
I missed it earlier today but there is a subtle change in behavior here that's also a potential bug.
Say debugPort equals 12345. That's within the legal range but your commit changes it to 12345 % 65536 + 1024 = 13369.
Worse, with debugPort == 65000, it's changed to 65000 % 65536 + 1024 = 66024 and that's not a legal port number.
The second is not the case, because kDebugPortRange is not 65536; it is 64512.
debugPort equals 12345 only in the cluster's master. The first worker created will receive 12346. After my fix it will receive 12346 % kDebugPortRange + 1024. Yes, the behaviour has changed, but I'm wondering: was it documented? One has no control over a worker's debugPort value, does one?
Sorry, valid point but consider debugPort == 64511: 64511 % (65536 - 1024) + 1024 = 65536.
Apropos the behavior change, I think node-inspector assumes consecutive ports.
Ok, I'll rewrite this another way.
Can I use generators here? :)
@bnoordhuis 64511 % (65536 - 1024) + 1024 == 65535.
range == 64512
x % range ∈ [ 0, range )
port == x % range + 1024;
port ∈ [ 0 + 1024, range + 1024 )
port ∈ [ 1024, 64512 + 1024 )
port ∈ [ 1024, 65536 )
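The interval argument above can also be checked by brute force. The following is a minimal sketch, not the code from the PR: `wrapPort` is a hypothetical helper name, and the constants follow the `kDebugPortStart` / `kDebugPortRange` naming suggested earlier in the review.

```javascript
'use strict';

// Constants named after the kFoo convention suggested in the review.
const kDebugPortStart = 1024;
const kDebugPortRange = 65536 - kDebugPortStart; // 64512

// Hypothetical helper: the wrapping formula under discussion.
function wrapPort(x) {
  return x % kDebugPortRange + kDebugPortStart;
}

// Evaluate the formula for every input in [0, 2 * 65536) and record the
// smallest and largest result.
let min = Infinity;
let max = -Infinity;
for (let x = 0; x < 2 * 65536; x++) {
  const port = wrapPort(x);
  if (port < min) min = port;
  if (port > max) max = port;
}

console.log(min, max);        // 1024 65535
console.log(wrapPort(64511)); // 65535
console.log(wrapPort(12345)); // 13369
```

The last line illustrates the behavior change raised in the review: the result always stays in range, but an input that was already a legal port is still remapped.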
Can I use generators here?
Fwiw the answer is yes if it's nicer. :)
@bnoordhuis I've rewritten it from scratch. I don't think we need the worker ID to generate the debug port. We just need a monotonic sequence within the allowed range.
if (cluster.isMaster) {
  cluster.fork().on('exit', function(code) {
    process.exit(code);
  })
style: semicolon
I was pinged to this by @cjihrig on my related PR at nodejs/node-v0.x-archive#14816; thanks, Colin. Looking forward to a resolution however it comes. I believe this commit will result in debug port collisions in some circumstances (after nextDebugPort has wrapped around and started doubly assigning the port numbers of long-lived processes). |
Before this, the worker would die because the debug port couldn't be listened on because it's out of range, right? After this PR, the worker will die whenever it happens to be using a debug port that is already in use by another process? That's an improvement, because the former problem is unresolvable, whereas the second will probably be better when the next replacement worker starts: it might get an unused port.
I guess there is no reasonable way to modify the debug protocol to include the target pid, tunnel multiple debug sessions to the master's debug port, and have it then redirect the session to an ephemeral-but-known debug port of its worker? Since we are now maintaining a fork of the debug protocol, this might be more approachable now.
I reiterate nodejs/node-v0.x-archive#14816 (comment) And in response to nodejs/node-v0.x-archive#14816 (comment), @cjihrig, I'm not certain what, if anything, would break if the worker ID for a new fork was always the lowest free. It would certainly break all kinds of unit tests (mine, also io.js I assume). It's a bit subtle figuring out when a worker ID is "unused", as well. At some points ("point" meaning "versions of node"), in master, cluster workers are removed from
Despite that, I like the idea, but I worry it's a breaking change: worker IDs are never recycled now, and it's easy to have code that relies on this. Having an auxiliary
What if we maintain a list of used debug ports in the master and check the getDebugPort() value against it?
Just pick the next one in case of a conflict.
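To make the proposal concrete, here is a minimal sketch of such bookkeeping. `DebugPortAllocator` and its methods are hypothetical names, not part of the cluster module, and the sketch assumes the master's own port already lies within [1024, 65535]. It only knows about ports the master itself handed out, so a port held by an unrelated process on the machine would still collide.

```javascript
'use strict';

const kDebugPortMin = 1024;
const kDebugPortMax = 65535;

// Hypothetical allocator kept in the master: remembers which debug ports it
// has handed to live workers and skips those when choosing the next one.
class DebugPortAllocator {
  constructor(masterPort) {
    this.last = masterPort;
    this.inUse = new Set([masterPort]);
  }

  // Returns the next port not recorded as in use, wrapping from
  // kDebugPortMax back to kDebugPortMin. Throws if the range is exhausted.
  acquire() {
    const total = kDebugPortMax - kDebugPortMin + 1;
    let candidate = this.last;
    for (let i = 0; i < total; i++) {
      candidate = candidate >= kDebugPortMax ? kDebugPortMin : candidate + 1;
      if (!this.inUse.has(candidate)) {
        this.inUse.add(candidate);
        this.last = candidate;
        return candidate;
      }
    }
    throw new Error('no free debug port');
  }

  // Call when a worker exits so that its port can be reused later.
  release(port) {
    this.inUse.delete(port);
  }
}

const allocator = new DebugPortAllocator(65533);
const first = allocator.acquire();  // 65534
const second = allocator.acquire(); // 65535
const third = allocator.acquire();  // 1024 (wrapped around)
allocator.release(first);
console.log(first, second, third);
```

In the common case the loop ends on its first iteration, so the per-fork cost is a single `Set` lookup. The scan only grows when many consecutive ports are still held by long-lived workers.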
@sam-github, unless I'm missing something, I don't see how this patch will result in a worker dying upon being handed a duplicate debug port, because, as you said, which ports are in use aren't tracked. Unless dupe debug ports will reliably and mercifully cause crashes in some way I haven't tracked down, they'll run with possibly incorrect (and developer-insanity-producing) behavior. If I can get a little time I'll just check this by manually assigning the same debug port to multiple workers.
@Olegas, I think that would be great, but I have two reservations: One, it's not necessarily trivial to keep track of what ports are in use and guarantee that there are no collisions. I could be wrong about that, but I feel a bit more confident in my second reservation: that it might be difficult to ensure no collisions in a timely manner, since, after all, this is overhead that will be incurred on every call to fork.
TL;DR
This PR does not change any behavior; it only fixes a bug. AFAICS there is no need to change any docs. And it is not an imaginary bug: I hit it in a production environment.
Details...
Before, in Node 0.10.x, every process used port 5858 for the debugger session by default. All was fine until we started using a cluster. When a cluster is used, we can't connect to a specific worker because they all share the same debug port. So, the fix was introduced - All went fine until a long-living cluster started to recycle its children for some reason (some internal logic, or an error and a process crash).
The debug port must be in the range from 1024 to 65535. When a cluster stays alive for a long time and its worker processes keep dying and restarting, after a while the cluster can't start a new worker, because it tries to start it with a debug port outside the allowed range. This can't be fixed except by a full cluster restart.
With this PR, debug port numbers are generated cyclically: starting from 5859 up to 65535, then again from 1024 to 65535, and so on.
I don't need to go through 60000 debug ports to find the one I need. Every worker has a PID. In my app, it is exposed as a special response header (X-Worker-ID). Knowing the PID, I can look up the process and its args. Once I have the args, I know the port number.
I think that extending the debug protocol is not a good idea here, because it is a far more complex task and it puts knowledge of clustering mechanisms into debugging.
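The cyclic numbering described above can be sketched with a generator. This is an illustration under stated assumptions, not the code from the PR: `debugPorts` is a hypothetical name, and the master is assumed to listen on the default port 5858, so the first worker gets 5859.

```javascript
'use strict';

const kDebugPortMin = 1024;
const kDebugPortMax = 65535;

// Hypothetical generator: yields the ports after `masterPort`, counting
// upward, and wraps to kDebugPortMin once kDebugPortMax has been handed out.
function* debugPorts(masterPort) {
  let port = masterPort;
  while (true) {
    port = port >= kDebugPortMax ? kDebugPortMin : port + 1;
    yield port;
  }
}

const ports = debugPorts(5858);
console.log(ports.next().value); // 5859

const nearEnd = debugPorts(65533);
console.log(nearEnd.next().value); // 65534
console.log(nearEnd.next().value); // 65535
console.log(nearEnd.next().value); // 1024
```

The sequence never leaves the legal range, but it does not know which ports are still held by live workers, which is the collision concern raised earlier in the thread.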
This is a change in behaviour: the debug port is no longer base + worker ID, it now wraps. Please document.
Ok. Is https://github.com/nodejs/io.js/blob/master/doc/api/debugger.markdown the right place for it?
I guess so. Though I don't see any docs at all for the new debug options. Maybe add a section after advanced usage. |
@sam-github fwiw, I'm regularly going through over 60,000 workers in a production environment, never attaching a debugger, and this is still a nuisance. Killing and restarting master on debug port out of bounds error is the workaround. The fact that this is a problem in environments in which the debugger (and thus debug port) will never be used makes me question the approach of attempting to retain unique debug port, since a correct implementation will be slow, and forking is already painful. That said, this PR isn't correct insofar as retaining unique debug port, but it changes crashing behavior into rarely-colliding debug port behavior, it would solve my particular problem, and I'd welcome it with open arms and a Ritter Sport. |
This may sound like a crazy idea, but what if we just didn't set |
@cjihrig, that's what my open PR on Node does. I don't need that flag, and I don't think failing to set it is a crazy idea. |
I'm proposing either removing these lines, or putting them behind a flag. |
@bajtos do you have thoughts? I'm ok with not setting it unless debug was enabled by CLI, but it means that only one node process can be debugged at a time. It's an open question how many people are benefitting from debuggable cluster workers, though.
@sam-github I don't have a strong opinion; I think this is really a matter of picking the lesser evil. If we decide to disable auto-increment, then we should take care to correctly handle the case where the master process is started in debug mode and cluster workers inherit the debug mode flag too. IIRC, in v0.10, running I think @cjihrig proposal (#1524 (comment)) is a good solution:
|
@Olegas Ah, I misread. You are now incrementing the supplied port number by 1, and loop only when the port number reaches 65535. I don't think that anything significantly better could be done here, though we could try to first populate the 49152-65535 port range, then fallback to 49151-1024 (decreasing). What do you think? |
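The ordering suggested here could look like the sketch below. `preferredDebugPorts` is a hypothetical name, and the sketch ignores the master's own port for simplicity, which the comment above leaves unspecified. The 49152-65535 block is the IANA dynamic/private port range.

```javascript
'use strict';

// Hypothetical ordering: the dynamic/private range first (ascending), then
// the remaining ports from 49151 down to 1024.
function* preferredDebugPorts() {
  for (let port = 49152; port <= 65535; port++) yield port;
  for (let port = 49151; port >= 1024; port--) yield port;
}

const order = Array.from(preferredDebugPorts());
console.log(order.length);            // 64512
console.log(order[0]);                // 49152
console.log(order[16383]);            // 65535
console.log(order[16384]);            // 49151
console.log(order[order.length - 1]); // 1024
```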
Worker debug ports are no longer set by default as of 309c0f4. |
Is this still an issue then? |
Technically, yes. You can still run into the same issue, but you would have to do so after explicitly turning on debugging. |
I think we can close this now. And I plan to add some docs to cluster about this "issue" |
Okay, closing per #1524 (comment). |
Currently, each cluster worker is assigned an ever increasing --debug-port argument. A long running cluster application that does not use the debugger can run into errors related to the port range.
This commit mitigates the problem by only setting the debug port if the master is started with debug arguments, or the user explicitly defines debug arguments for the worker. This commit also adds a new debug port offset counter that is only incremented when a worker is created that utilizes debugging.
Fixes: nodejs/node-v0.x-archive#8159
Refs: #1524
PR-URL: #1949
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Oleg Elifantiev <[email protected]>
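The behavior described in this commit message can be sketched as follows. This is a simplified illustration, not the code in lib/cluster.js: `createExecArgvBuilder`, `hasDebugArg`, and the regular expression are assumptions made for the example, and the sketch appends a `--debug-port` argument rather than rewriting existing ones.

```javascript
'use strict';

// Matches --debug, --debug-brk and --debug-port, with or without "=value".
// The exact pattern is an assumption made for this sketch.
const kDebugArgRegex = /^--debug(-brk|-port)?(=|$)/;

function hasDebugArg(args) {
  return args.some((arg) => kDebugArgRegex.test(arg));
}

// Hypothetical factory: returns a function that computes a worker's execArgv.
// The offset counter advances only when a worker actually uses debugging.
function createExecArgvBuilder(masterExecArgv, masterDebugPort) {
  let debugPortOffset = 1;
  return function buildWorkerExecArgv(workerExecArgv) {
    const execArgv = workerExecArgv.slice();
    if (hasDebugArg(masterExecArgv) || hasDebugArg(execArgv)) {
      execArgv.push('--debug-port=' + (masterDebugPort + debugPortOffset));
      debugPortOffset++;
    }
    return execArgv;
  };
}

// Master started without debug arguments: workers get no debug port.
const plain = createExecArgvBuilder([], 5858);
console.log(plain(['--harmony'])); // [ '--harmony' ]

// Master started with --debug: each worker gets the next port.
const debugged = createExecArgvBuilder(['--debug'], 5858);
console.log(debugged([])); // [ '--debug-port=5859' ]
console.log(debugged([])); // [ '--debug-port=5860' ]
```

Because non-debugged workers no longer consume an offset, an application that never enables the debugger can restart workers indefinitely without approaching the top of the port range.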