-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allocation API: fix "no path to region" errors for non-global regions #24644
Conversation
effbc77
to
c6fa86f
Compare
c6fa86f
to
a11aab8
Compare
Another way I considered fixing this would be to have the That would prevent this class of bug in general at the cost of making the internals a lot more fiddly, so I thought that might not be the best approach. I'm definitely open to being argued otherwise though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we update this comment?
Couldn't link to it in the review.
That comment expressed the correct intent but incorrectly described the behavior before this bug fix. It's correct now. |
In #16872 we added support for unix domain sockets, but this required mutating the `Config` when parsing the address so as to remove the port number. In #23785 we fixed a bug where if the configuration was used across multiple clients that mutation would happen multiple times and the address would be incorrectly parsed. When making `alloc log`, `alloc fs`, or `alloc exec` calls where we have line-of-sight to the client, we attempt to make a HTTP API call directly to the client node. So we create a new API client from the same configuration and then set the address. But in this case we copy the private `url` field and that causes the URL parsing to be skipped for the new client. This results in the region always being set to the string literal `"global"` (because of mTLS handling code introduced all the way back in 4d3b75d), unless the user has set the region specifically. This fails with an error "no path to region" when the cluster isn't non-global and requests are sent to a non-leader. Arguably the "right" way of fixing this would be for `ClientConfig` not to change the API client's region to `"global"` in the first place, but as this is a public API and extremely longstanding behavior, it could potentially be a breaking change for some downstream consumers. Instead, we'll avoid copying the private `url` field so that the new address is re-parsed. Fixes: #24635 Fixes: #24609 Ref: #16872 Ref: #23785 Ref: 4d3b75d
a11aab8
to
ed755f5
Compare
Sorry @mismithhisler I didn't noticed that a file was in conflict because of a move. Can you 👍 again? |
In #16872 we added support for unix domain sockets, but this required mutating the `Config` when parsing the address so as to remove the port number. In #23785 we fixed a bug where if the configuration was used across multiple clients that mutation would happen multiple times and the address would be incorrectly parsed. When making `alloc log`, `alloc fs`, or `alloc exec` calls where we have line-of-sight to the client, we attempt to make a HTTP API call directly to the client node. So we create a new API client from the same configuration and then set the address. But in this case we copy the private `url` field and that causes the URL parsing to be skipped for the new client. This results in the region always being set to the string literal `"global"` (because of mTLS handling code introduced all the way back in 4d3b75d), unless the user has set the region specifically. This fails with an error "no path to region" when the cluster isn't non-global and requests are sent to a non-leader. Arguably the "right" way of fixing this would be for `ClientConfig` not to change the API client's region to `"global"` in the first place, but as this is a public API and extremely longstanding behavior, it could potentially be a breaking change for some downstream consumers. Instead, we'll avoid copying the private `url` field so that the new address is re-parsed. Fixes: #24635 Fixes: #24609 Ref: #16872 Ref: #23785 Ref: 4d3b75d
…ddress (#24644) (#24682) In #16872 we added support for unix domain sockets, but this required mutating the `Config` when parsing the address so as to remove the port number. In #23785 we fixed a bug where if the configuration was used across multiple clients that mutation would happen multiple times and the address would be incorrectly parsed. When making `alloc log`, `alloc fs`, or `alloc exec` calls where we have line-of-sight to the client, we attempt to make a HTTP API call directly to the client node. So we create a new API client from the same configuration and then set the address. But in this case we copy the private `url` field and that causes the URL parsing to be skipped for the new client. This results in the region always being set to the string literal `"global"` (because of mTLS handling code introduced all the way back in 4d3b75d), unless the user has set the region specifically. This fails with an error "no path to region" when the cluster isn't non-global and requests are sent to a non-leader. Arguably the "right" way of fixing this would be for `ClientConfig` not to change the API client's region to `"global"` in the first place, but as this is a public API and extremely longstanding behavior, it could potentially be a breaking change for some downstream consumers. Instead, we'll avoid copying the private `url` field so that the new address is re-parsed. Fixes: #24635 Fixes: #24609 Ref: #16872 Ref: #23785 Ref: 4d3b75d Co-authored-by: Tim Gross <[email protected]>
In #24644 we set the region to "e2e" but forgot to setup the TLS certificate names appropriately. Swap the region out for a variable instead.
In #24644 we set the region to "e2e" but forgot to setup the TLS certificate names appropriately. Swap the region out for a variable instead.
In #24644 we set the region to "e2e" but forgot to setup the TLS certificate names appropriately. Swap the region out for a variable instead.
In #24644 we set the region to "e2e" but forgot to setup the TLS certificate names appropriately. Swap the region out for a variable instead.
In #24644 we set the region to "e2e" but forgot to setup the TLS certificate names appropriately. Swap the region out for a variable instead. Co-authored-by: Tim Gross <[email protected]>
In #16872 we added support for unix domain sockets, but this required mutating the
Config
when parsing the address so as to remove the port number. In #23785 we fixed a bug where if the configuration was used across multiple clients (as in the autoscaler) that mutation would happen multiple times and the address would be incorrectly parsed.When making
alloc log
,alloc fs
, oralloc exec
calls where we have line-of-sight to the client, we attempt to make a HTTP API call directly to the client node. So we create a new API client from the same configuration and then set the address. But in this case we copy the privateurl
field and that causes the URL parsing to be skipped for the new client.This results in the region always being set to the string literal
"global"
(because of mTLS handling code introduced all the way back in 4d3b75d), unless the user has set the region specifically. This fails with an error "no path to region" when the cluster isn't non-global and requests are sent to a non-leader.Arguably the "right" way of fixing this would be for
ClientConfig
not to change the API client's region to"global"
in the first place, but as this is a public API and extremely longstanding behavior, it could potentially be a breaking change for some downstream consumers. Instead, we'll avoid copying the privateurl
field so that the new address is re-parsed.Fixes: #24635
Fixes: #24609
Ref: #16872
Ref: #23785
Ref: 4d3b75d
Ref: https://hashicorp.atlassian.net/browse/NET-11858
Testing & Reproduction steps
To reproduce, stand up a cluster with
region = "example"
with at least one client, in an environment where you have line-of-sight to all nodes (ex. local development environment should work fine). Deploy a job to that client and runnomad alloc logs :alloc_id
.