failed to exec into task: malformed HTTP response #10922

Closed
scalp42 opened this issue Jul 21, 2021 · 12 comments · Fixed by #10958

Comments

Contributor

scalp42 commented Jul 21, 2021

Nomad version

Nomad v1.1.2

Operating system and Environment details

MacOS / Ubuntu

Issue

We just upgraded Nomad from 1.0.4 to 1.1.2 and we can't alloc exec into tasks anymore.

Nomad Server logs (if appropriate)

Can't find any error on server side.

Nomad Client logs (if appropriate)

failed to exec into task: malformed HTTP response "\x00\x00\x12\x04\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x80\x00\x04\x00\x01\x00\x00\x00\x05\x00\xff\xff\xff\x00\x00\x04\b\x00\x00\x00\x00\x00\u007f\xff\x00\x00\x00\x00\b\a\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01"
Contributor Author

scalp42 commented Jul 21, 2021

Just tested using client v1.1.1 and it's working fine.

$> which nomad; nomad version ; nomad alloc exec -job prod-platform-task-scheduler sh
/usr/local/bin/nomad
Nomad v1.1.2
failed to exec into task: malformed HTTP response "\x00\x00\x12\x04\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x80\x00\x04\x00\x01\x00\x00\x00\x05\x00\xff\xff\xff\x00\x00\x04\b\x00\x00\x00\x00\x00\u007f\xff\x00\x00\x00\x00\b\a\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01"

With Nomad client v1.1.1:

$> ./nomad-1.1.1.sh alloc exec -job prod-platform-task-scheduler sh
#

@johnalotoski

Confirmed -- works on Nomad 1.1.1, fails to exec on 1.1.2:

# Nomad v1.1.1
$ nomad --version
Nomad v1.1.1

$ nomad exec -task $TASK -namespace $NS $ALLOC /bin/bash
bash-4.4$

# Nomad v1.1.2
$ nomad --version
Nomad v1.1.2

$ nomad exec -task $TASK -namespace $NS $ALLOC /bin/bash
failed to exec into task: unexpected EOF

Contributor

notnoop commented Jul 27, 2021

Thanks for reporting the bug. I'll look into it this week.

Contributor

notnoop commented Jul 27, 2021

So I have tried to reproduce this without much luck sadly. I would appreciate more context here:

  • Which versions of the servers, clients, and CLI are involved? Is there a version mismatch between the components?
  • Is a proxy or TLS involved? What configuration is used?

In my setup, I have tried a 3-server, 2-client cluster, all running 1.1.2, with and without mutual TLS.

Thanks again!


evandam commented Jul 27, 2021

Hi @notnoop, we're seeing this with Nomad v1.1.2 (60638a086ef9630e2a9ba1e237e8426192a44244) for both the server and client/CLI.

I see what you mean; it's hard for me to reproduce as well. Running Nomad locally, or logging into a Nomad client and running nomad alloc exec ... there, works fine, but it seemingly fails from a remote client. We're seeing the issue only when accessing a Nomad server behind an AWS ALB.

CLI v1.1.2 (macOS) -> AWS ALB -> Nomad server v1.1.2 (Ubuntu 18.04)

I can confirm this does work when the CLI version is v1.1.1 and Nomad server is still 1.1.2.

I'll continue to poke around a bit to see if I can find a way to reliably reproduce the issue.

Contributor

notnoop commented Jul 27, 2021

@evandam That is very useful info! My hunch is that it's related to the change that attempts HTTP/2 connections in https://github.com/hashicorp/nomad/pull/10778/files#r653842473 .

Can you try compiling latest main branch but with the following diff:

diff --git a/api/api.go b/api/api.go
index d1f985dbe..a69c32ead 100644
--- a/api/api.go
+++ b/api/api.go
@@ -248,6 +248,7 @@ func defaultHttpClient() *http.Client {
        transport.TLSClientConfig = &tls.Config{
                MinVersion: tls.VersionTLS12,
        }
+       transport.ForceAttemptHTTP2 = false

        return httpClient
 }

As a matter of convenience, I have built binaries as CI artifacts:

I'm also going to test having Nomad behind AWS ALB as well. Thanks for the pointers.


evandam commented Jul 27, 2021

@notnoop this works perfectly with the macOS binary you built 😄

Contributor

notnoop commented Jul 27, 2021

That's excellent to hear! I'll research the cause of http/2 being problematic here - and we will have a release with a fix very soon! Thank you so much for the quick validation!

Contributor Author

scalp42 commented Jul 27, 2021

thanks a lot @notnoop 💛

Contributor

notnoop commented Jul 28, 2021

A quick update: I have reproduced the failure with an ALB as well. I found that the websocket libraries don't support HTTP/2 (gorilla/websocket#417), and they don't seem to fall back to HTTP/1 gracefully. I'm inclined to roll back to defaulting to HTTP/1 after all. Thanks for your patience.

@surajthakur

Thanks @notnoop for the update.
Linking my comments from here: #10657
I found that it's not the ACL issue. I have two projects on two different cloud providers with similar setups (using HAProxy as the load balancer). One worked after changing the ACL to allow the nodeID permission, but the other didn't, and I could not determine the cause.

notnoop pushed a commit that referenced this issue Jul 28, 2021
* api: revert to defaulting to http/1

PR #10778 incidentally changed the api http client to connect with
HTTP/2 first. However, the websocket libraries used in `alloc exec`
features don't handle http/2 well, and don't downgrade to http/1
gracefully.

The switch was incidental and not requested by users.
Furthermore, api consumers can opt in to forcing http/2 by setting
custom http clients.

Fixes #10922
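
The opt-in mentioned in the commit message could look roughly like the sketch below. It uses only the Go standard library; wiring the resulting client into the Nomad api package is not shown, and the helper names (negotiatedProto, newLocalServer) are hypothetical. A local httptest server that offers HTTP/2 stands in for a real endpoint, so the example can run without a cluster.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// negotiatedProto issues one GET against srv using a transport with a custom
// TLS config and returns the protocol of the response (resp.Proto).
func negotiatedProto(srv *httptest.Server, forceHTTP2 bool) (string, error) {
	pool := x509.NewCertPool()
	pool.AddCert(srv.Certificate())
	transport := &http.Transport{
		TLSClientConfig:   &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12},
		ForceAttemptHTTP2: forceHTTP2, // the opt-in switch
	}
	defer transport.CloseIdleConnections()

	resp, err := (&http.Client{Transport: transport}).Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return resp.Proto, nil
}

// newLocalServer starts a throwaway HTTPS test server that offers HTTP/2.
func newLocalServer() *httptest.Server {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintln(w, "ok")
		}))
	srv.EnableHTTP2 = true
	srv.StartTLS()
	return srv
}

func main() {
	srv := newLocalServer()
	defer srv.Close()

	for _, force := range []bool{false, true} {
		proto, err := negotiatedProto(srv, force)
		if err != nil {
			fmt.Println("request failed:", err)
			continue
		}
		fmt.Printf("ForceAttemptHTTP2=%v -> %s\n", force, proto)
	}
}
```

With a custom TLS config, the transport should stay on HTTP/1.1 unless ForceAttemptHTTP2 is set, which is the behavior the revert relies on.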
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Oct 17, 2022