Cannot change ingress container from http to tcp (or vice versa) when using Consul Service Mesh #14802
Comments
Hi @brian-athinkingape! I was able to reproduce the behavior you're seeing exactly. Thank you so much for providing a solid minimal example, it really helps a lot!
The tl;dr is that you've hit a known design issue between Consul and Nomad around gateways, which is described by my colleague @shoenig in #8647 (comment). There's a workaround roughly described in hashicorp/consul#10308 (comment). I'm going to show that workaround first and then get into the nitty-gritty of why this is happening below.
Workaround
Read the current
Transform this into:
{
  "Kind": "ingress-gateway",
  "Name": "test-ingress",
  "TLS": {
    "Enabled": false
  }
}
Write the new config and delete the
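The transform step can be sketched in a few lines of Python. This is only an illustration, not an official tool: it assumes the entry you read back (for example with `consul config read -kind ingress-gateway -name test-ingress`) is JSON carrying extra fields such as `Listeners`, `CreateIndex`, and `ModifyIndex`. The listener port and the service name `test-service` below are made up for the example, and `strip_to_minimal_entry` is a hypothetical helper name.

```python
import json


def strip_to_minimal_entry(entry: dict) -> dict:
    """Reduce an ingress-gateway config entry to Kind, Name, and TLS only.

    Everything else (listeners, index metadata) is dropped, which matches
    the shape of the transformed JSON shown in the workaround.
    """
    if entry.get("Kind") != "ingress-gateway":
        raise ValueError("expected an ingress-gateway config entry")
    return {
        "Kind": entry["Kind"],
        "Name": entry["Name"],
        # Fall back to TLS disabled if the entry has no TLS block (assumption).
        "TLS": entry.get("TLS") or {"Enabled": False},
    }


# Hypothetical example of what the current entry might look like.
current = {
    "Kind": "ingress-gateway",
    "Name": "test-ingress",
    "TLS": {"Enabled": False},
    "Listeners": [
        {"Port": 8080, "Protocol": "http", "Services": [{"Name": "test-service"}]}
    ],
    "CreateIndex": 10,
    "ModifyIndex": 12,
}

minimal = strip_to_minimal_entry(current)
print(json.dumps(minimal, indent=2))
```

The printed JSON is what you would save to a file and hand to `consul config write`.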
Now the second job works:
Reproduction
Running job2 hits the error you reported:
A clue to what's going on is that job2 isn't registered at all, which means that it's happening in the initial job submission and not part of allocation setup after we've scheduled the workload. That narrows down the behavior to this block.
I was a little confused by why we'd be doing this in the job register code path at all and not on the client node after an allocation is placed, but then I did some digging and found this comment #8647 (comment) from my colleague @shoenig which discusses the "multi-writer" problem we have. Ultimately Consul owns the configuration entry and it's global, so multiple Nomad clusters could be writing to it. One way to imagine the problem is to consider what would happen if you ran both job1 and job2 at the same time! We wouldn't have any way of updating Consul correctly in this case.
So ultimately this issue is a duplicate of #8647 and something we need to fix, which I realize isn't very satisfying in the short term.
A challenging part of figuring out what to do as an operator is that the Consul CLI and UI aren't super clear on the data you need. The ingress gateway isn't exposed in the
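To make the "multi-writer" problem described above a bit more concrete, here is a toy model in Python. It is not Nomad's or Consul's actual code: `ConfigStore` and `register_job` are hypothetical names, and the store is just a dictionary keyed by kind and name, standing in for Consul's global config entries.

```python
class ConfigStore:
    """Toy stand-in for a global config entry store (not the Consul API)."""

    def __init__(self):
        self.entries = {}

    def write(self, kind, name, entry):
        # A plain overwrite: whoever writes last wins.
        self.entries[(kind, name)] = entry


def register_job(store, job_name, gateway_name, protocol):
    """Pretend job registration that writes the gateway's config entry."""
    store.write(
        "ingress-gateway",
        gateway_name,
        {"Protocol": protocol, "WrittenBy": job_name},
    )


store = ConfigStore()
register_job(store, "job1", "test-ingress", "http")
register_job(store, "job2", "test-ingress", "tcp")

# Both jobs targeted the same global entry, so only one protocol survives.
final = store.entries[("ingress-gateway", "test-ingress")]
print(final)
```

With a single shared entry there is no version that satisfies both jobs, which is the situation the comment asks you to imagine when job1 and job2 run at the same time.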
Although this is technically a duplicate, there could be unique bits to it. I'm going to keep this open and mark it for roadmapping, and crosslink to it from #8647.
Thanks, we used the workaround to resolve the issue on our production system for now. Looking forward to when this can be resolved!
Nomad version
Nomad v1.3.5 (1359c25)
Operating system and Environment details
Ubuntu 22.04 on AWS (on a fresh EC2 instance), amd64
Consul v1.13.2
Revision 0e046bbb
Build Date 2022-09-20T20:30:07Z
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
Docker version 20.10.18, build b40c2f6
Issue
If I run an ingress container with the http protocol, I'm unable to edit it to use tcp even after I stop the job. Even if I run nomad system gc and nomad system reconcile summaries, it still doesn't work. I'm also unable to edit the consul config to use
If I swap all instances of http and tcp I get the same errors.
Reproduction steps
Expected Result
I should be able to run job2 as normal.
Actual Result
Job file (if appropriate)
proxy-defaults.hcl
service-defaults.hcl
job1.nomad:
job2.nomad: