Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: recover from client ACL lookup failures #6423

Merged
merged 1 commit into from
Oct 15, 2019
Merged

Conversation

notnoop
Copy link
Contributor

@notnoop notnoop commented Oct 4, 2019

This fixes a bug in the CLI handling of node lookup failures when
querying allocation and FS endpoints.

Allocation and FS endpoint are handled by the client; one can query the
relevant client directly, or query a server to have it forwarded
transparently to relevant client. Querying the client directly is
benefecial to avoid loading servers with IO.

As an optimization, the CLI attempts to query the client directly, but
then falls back to using server forwarding path if it encounters network
or connection errors (e.g. clients are locked down or in a separate
inaccessible network).

Here, we fix a bug where if the CLI fails to find to lookup the client
details because it lacks ACL capability or other unexpected reasons, the
CLI will not go through fallback path.

Fixes #5754

This fixes a bug in the CLI handling of node lookup failures when
querying allocation and FS endpoints.

Allocation and FS endpoint are handled by the client; one can query the
relevant client directly, or query a server to have it forwarded
transparently to relevant client.  Querying the client directly is
benefecial to avoid loading servers with IO.

As an optimization, the CLI attempts to query the client directly, but
then falls back to using server forwarding path if it encounters network
or connection errors (e.g. clients are locked down or in a separate
inaccessible network).

Here, we fix a bug where if the CLI fails to find to lookup the client
details because it lacks ACL capability or other unexpected reasons, the
CLI will not go through fallback path.
@preetapan preetapan added this to the 0.10.1 milestone Oct 4, 2019
@cgbaker
Copy link
Contributor

cgbaker commented Oct 4, 2019

this resolves #5754, correct?

@notnoop
Copy link
Contributor Author

notnoop commented Oct 4, 2019

Yes, thanks @cgbaker for the reference!

@notnoop notnoop requested a review from nickethier October 15, 2019 13:03
errCh <- err
return nil, nil
}
nodeClient, _ := a.client.GetNodeClientWithTimeout(alloc.NodeID, ClientConnTimeout, q)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comment about logging this error

@@ -244,6 +175,40 @@ func (a *AllocFS) Stream(alloc *Allocation, path, origin string, offset int64,
return frames, errCh
}

func queryClientNode(c *Client, alloc *Allocation, reqPath string, q *QueryOptions, customizeQ func(*QueryOptions)) (io.ReadCloser, error) {
nodeClient, _ := c.GetNodeClientWithTimeout(alloc.NodeID, ClientConnTimeout, q)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way we could debug log this error? I see a future engineer trying to debug why an api call isn't able to hit the client directly and this error would probably be the key bit of knowledge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how to best handle it. We currently don't log anything in api package, and introducing logging might be confusing for cli tools, specially if eventually requests succeeds through server.

@notnoop notnoop merged commit bf91e83 into master Oct 15, 2019
@notnoop notnoop deleted the b-direct-node-failure branch October 15, 2019 21:10
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 29, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ACL tokens of type 'client' are unable to use the nomad logs command?
4 participants