Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

agent: ignore websocket statuses 1000, 1001 and 1005 correctly #19172

Merged
merged 2 commits into from
Nov 27, 2023

Conversation

pkazmierczak
Copy link
Contributor

These are "close" messages and not actual errors.

Fixes #19164.

These are "close" messages and not actual errors. Fixes #19164.
Copy link
Contributor

@lgfa29 lgfa29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a lot of experience with WS, but I trust in you and @philrenaud testing this 😄

Just missing a CHANGELOG entry.

"close 1005", // CLOSED_NO_STATUS
}, func(s string) bool { return strings.Contains(codedErr.Error(), s) })
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting the following when stopping/cancelling an action from the UI on this branch:

{
  "@level": "info",
  "@message": "task exec session starting",
  "@module": "client",
  "@timestamp": "2023-11-26T23:38:31.084156-05:00",
  "action": "echo time",
  "alloc_id": "faa9dbd2-828f-e586-a04c-b0e7923073d9",
  "command": null,
  "exec_id": "f45b2b00-adc1-c9da-252e-01ec12181de5",
  "task": "task",
  "tty": true
}
{
  "@level": "info",
  "@message": "task exec session ended with an error",
  "@module": "client",
  "@timestamp": "2023-11-26T23:38:35.181010-05:00",
  "code": 500,
  "error": "msgpack encode error: io: read/write on closed pipe"
}
{
  "@level": "debug",
  "@message": "alloc exec channel closed with error",
  "@module": "http",
  "@timestamp": "2023-11-26T23:38:35.181245-05:00",
  "error": "websocket: close 1005 (no status)"
}
{
  "@level": "debug",
  "@message": "request complete",
  "@module": "http",
  "@timestamp": "2023-11-26T23:38:35.181884-05:00",
  "duration": 4105460042,
  "method": "GET",
  "path": "/v1/job/actions-demo/action?namespace=default&action=echo%20time&allocID=faa9dbd2-828f-e586-a04c-b0e7923073d9&task=task&group=group&tty=true&ws_handshake=true"
}

This is a nice improvement over what I used to get:

{
  "@level": "info",
  "@message": "task exec session starting",
  "@module": "client",
  "@timestamp": "2023-11-26T23:43:43.882529-05:00",
  "action": "echo time",
  "alloc_id": "a190347c-dd97-27e5-bb9a-5aed4dff33cb",
  "command": null,
  "exec_id": "f120709a-6adc-535c-4fe5-722490312ed3",
  "task": "task",
  "tty": true
}
{
  "@level": "info",
  "@message": "task exec session ended with an error",
  "@module": "client",
  "@timestamp": "2023-11-26T23:43:47.969123-05:00",
  "code": 500,
  "error": "msgpack encode error: io: read/write on closed pipe"
}
{
  "@level": "debug",
  "@message": "alloc exec channel closed with error",
  "@module": "http",
  "@timestamp": "2023-11-26T23:43:47.969413-05:00",
  "error": "websocket: close 1005 (no status)"
}
{
  "@level": "error",
  "@message": "request failed",
  "@module": "http",
  "@timestamp": "2023-11-26T23:43:47.969760-05:00",
  "code": 500,
  "error": "websocket: close 1005 (no status)",
  "method": "GET",
  "path": "/v1/job/actions-demo/action?namespace=default&action=echo%20time&allocID=a190347c-dd97-27e5-bb9a-5aed4dff33cb&task=task&group=group&tty=true&ws_handshake=true"
}

Copy link
Contributor

@philrenaud philrenaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works! The non-error error is now reported as a non-error :)

Thanks, @pkazmierczak !

@pkazmierczak pkazmierczak merged commit 742651f into main Nov 27, 2023
19 checks passed
@pkazmierczak pkazmierczak deleted the b-websocket-close-messages branch November 27, 2023 08:33
tgross added a commit that referenced this pull request Feb 9, 2024
In #19172 we added a check on websocket errors to see if they were one of
several benign "close" messages. This change inadvertently assumed that other
messages used for close would not implement `HTTPCodedError`. When errors like
the following are received:

> msgpack decode error [pos 0]: io: read/write on closed pipe"

they are sent from the inner loop as though they were a "real" error, but the
channel is already being closed with a "close" message.

This allowed many more attempts to pass thru a previously-undiscovered race
condition in the two goroutines that stream RPC responses to the websocket. When
the input stream returns an error for any reason (for example, the command we're
executing has exited), it will unblock the "outer" goroutine and cause a write
to the websocket. If we're concurrently writing the "close error" discussed
above, this results in a panic from the websocket library.

This changeset includes two fixes:
* Catch "closed pipe" error correctly so that we're not sending unnecessary
  error messages.
* Move all writes to the websocket into the same response streaming
  goroutine. The main handler goroutine will block on a results channel, and the
  response streaming goroutine will send on that channel with the final error when
  it's done so it can be reported to the user.
tgross added a commit that referenced this pull request Feb 12, 2024
In #19172 we added a check on websocket errors to see if they were one of
several benign "close" messages. This change inadvertently assumed that other
messages used for close would not implement `HTTPCodedError`. When errors like
the following are received:

> msgpack decode error [pos 0]: io: read/write on closed pipe"

they are sent from the inner loop as though they were a "real" error, but the
channel is already being closed with a "close" message.

This allowed many more attempts to pass thru a previously-undiscovered race
condition in the two goroutines that stream RPC responses to the websocket. When
the input stream returns an error for any reason (for example, the command we're
executing has exited), it will unblock the "outer" goroutine and cause a write
to the websocket. If we're concurrently writing the "close error" discussed
above, this results in a panic from the websocket library.

This changeset includes two fixes:
* Catch "closed pipe" error correctly so that we're not sending unnecessary
  error messages.
* Move all writes to the websocket into the same response streaming
  goroutine. The main handler goroutine will block on a results channel, and the
  response streaming goroutine will send on that channel with the final error when
  it's done so it can be reported to the user.
tgross added a commit that referenced this pull request Feb 12, 2024
In #19172 we added a check on websocket errors to see if they were one of
several benign "close" messages. This change inadvertently assumed that other
messages used for close would not implement `HTTPCodedError`. When errors like
the following are received:

> msgpack decode error [pos 0]: io: read/write on closed pipe"

they are sent from the inner loop as though they were a "real" error, but the
channel is already being closed with a "close" message.

This allowed many more attempts to pass thru a previously-undiscovered race
condition in the two goroutines that stream RPC responses to the websocket. When
the input stream returns an error for any reason (for example, the command we're
executing has exited), it will unblock the "outer" goroutine and cause a write
to the websocket. If we're concurrently writing the "close error" discussed
above, this results in a panic from the websocket library.

This changeset includes two fixes:
* Catch "closed pipe" error correctly so that we're not sending unnecessary
  error messages.
* Move all writes to the websocket into the same response streaming
  goroutine. The main handler goroutine will block on a results channel, and the
  response streaming goroutine will send on that channel with the final error when
  it's done so it can be reported to the user.
tgross added a commit that referenced this pull request Feb 12, 2024
In #19172 we added a check on websocket errors to see if they were one of
several benign "close" messages. This change inadvertently assumed that other
messages used for close would not implement `HTTPCodedError`. When errors like
the following are received:

> msgpack decode error [pos 0]: io: read/write on closed pipe"

they are sent from the inner loop as though they were a "real" error, but the
channel is already being closed with a "close" message.

This allowed many more attempts to pass thru a previously-undiscovered race
condition in the two goroutines that stream RPC responses to the websocket. When
the input stream returns an error for any reason (for example, the command we're
executing has exited), it will unblock the "outer" goroutine and cause a write
to the websocket. If we're concurrently writing the "close error" discussed
above, this results in a panic from the websocket library.

This changeset includes two fixes:
* Catch "closed pipe" error correctly so that we're not sending unnecessary
  error messages.
* Move all writes to the websocket into the same response streaming
  goroutine. The main handler goroutine will block on a results channel, and the
  response streaming goroutine will send on that channel with the final error when
  it's done so it can be reported to the user.
tgross added a commit that referenced this pull request Feb 12, 2024
…6.x (#19950)

In #19172 we added a check on websocket errors to see if they were one of
several benign "close" messages. This change inadvertently assumed that other
messages used for close would not implement `HTTPCodedError`. When errors like
the following are received:

> msgpack decode error [pos 0]: io: read/write on closed pipe"

they are sent from the inner loop as though they were a "real" error, but the
channel is already being closed with a "close" message.

This allowed many more attempts to pass thru a previously-undiscovered race
condition in the two goroutines that stream RPC responses to the websocket. When
the input stream returns an error for any reason (for example, the command we're
executing has exited), it will unblock the "outer" goroutine and cause a write
to the websocket. If we're concurrently writing the "close error" discussed
above, this results in a panic from the websocket library.

This changeset includes two fixes:
* Catch "closed pipe" error correctly so that we're not sending unnecessary
  error messages.
* Move all writes to the websocket into the same response streaming
  goroutine. The main handler goroutine will block on a results channel, and the
  response streaming goroutine will send on that channel with the final error when
  it's done so it can be reported to the user.

Co-authored-by: Tim Gross <[email protected]>
tgross added a commit that referenced this pull request Feb 12, 2024
…5.x (#19949)

In #19172 we added a check on websocket errors to see if they were one of
several benign "close" messages. This change inadvertently assumed that other
messages used for close would not implement `HTTPCodedError`. When errors like
the following are received:

> msgpack decode error [pos 0]: io: read/write on closed pipe"

they are sent from the inner loop as though they were a "real" error, but the
channel is already being closed with a "close" message.

This allowed many more attempts to pass thru a previously-undiscovered race
condition in the two goroutines that stream RPC responses to the websocket. When
the input stream returns an error for any reason (for example, the command we're
executing has exited), it will unblock the "outer" goroutine and cause a write
to the websocket. If we're concurrently writing the "close error" discussed
above, this results in a panic from the websocket library.

This changeset includes two fixes:
* Catch "closed pipe" error correctly so that we're not sending unnecessary
  error messages.
* Move all writes to the websocket into the same response streaming
  goroutine. The main handler goroutine will block on a results channel, and the
  response streaming goroutine will send on that channel with the final error when
  it's done so it can be reported to the user.

Co-authored-by: Tim Gross <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.5.x backport to 1.5.x release line backport/1.6.x backport to 1.6.x release line
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Errors don't surface to websockets for alloc exec/actions
3 participants