Problem with nomad job deployment (raw_exec mode, v1.0.1) #9700
This information should be in the Nomad server logs. Can you maybe explain why you weren't able to get any usable logs? You should be able to fetch them via
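For reference, since the command mentioned above was not captured: agent logs are usually pulled from the systemd journal on each node or streamed live over the API. A minimal sketch (unit names taken from the configuration shared later in this thread):

```sh
# Pull recent agent logs from the journal on each node
journalctl -u nomad-server --no-pager --since "1 hour ago"
journalctl -u nomad-client --no-pager --since "1 hour ago"

# Or stream logs from a running agent over the API
nomad monitor -log-level=DEBUG
```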
I'm going to close this issue as it's been a while since we heard from you. But if you do get those logs, please feel free to re-open!
Hi Tim, sorry for the late response. The issue is still present. Perhaps I am missing some deployment "flag" when using the SDK API, or something like that?
I checked the log. Nothing (empty). How do I debug this situation? Any help is appreciated! Thank you. Added UI screenshot and job definition. The job definition (from Nomad's point of view) is as follows:
Are you redirecting the logs somewhere then? Can you share your systemd unit file and server/client config?
Hi Tim, I didn't mean "there is no log at all", sorry for that (my mistake). The configuration (server, client):
Client (agent -client) file: /etc/nomad-client/nomad-client.hcl
systemd: /usr/lib/systemd/system/nomad-client.service
EnvironmentFile=/etc/sysconfig/nomad-client
Every node has a different "dc".
node1:
node2:
node3:
Nomad Server (agent -server) file: /etc/nomad-server/nomad-server.hcl
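The contents of the two .hcl files were not included above. Purely as an illustration of their usual shape (all values here are assumptions, not the poster's actual configuration), a client config with raw_exec enabled and a per-node datacenter might look like this:

```hcl
# /etc/nomad-client/nomad-client.hcl — illustrative sketch only
datacenter = "dc1"                     # node1; node2/node3 each use their own "dc"
data_dir   = "/var/lib/nomad-client"   # assumed path

client {
  enabled = true
  options = {
    "driver.raw_exec.enable" = "1"     # raw_exec for a non-privileged user, as described in the issue
  }
}
```

The server file would typically carry `log_level` plus a `server` stanza with `enabled = true` and `bootstrap_expect = 3` for a 3-node cluster.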
Clean start (data/alloc deleted). nomad-client log (3-node cluster, same output on each):
Microservice deployment: admin:4646/ui/jobs
nomad-client log (after deployment)
Monitor (on node1): http://localhost:4646/ui/clients/183f311e-5494-d894-f495-9dabafded5be/monitor?level=trace
Please check it here: https://pastebin.com/xRCdEphS
PCMMManager job definition from admin UI: please check it here: https://pastebin.com/Kq0zXAwi
From Wireshark (request/response):
REQUEST
Check the body: https://pastebin.com/Fx9L9eyD
RESPONSE
Thank you for your time and effort. Regards,
The logs I was looking for were the server logs. You've posted one of those in the pastebin, the relevant section of which is here: server log
I don't see the plan created by the evaluation here. Also, HCL is a much nicer way to read the jobspec, but something that jumps out at me from the JSON is that you've got
Please keep in mind that everything works as expected in version 0.9.6.
The datacenter option is something I added yesterday (this behavior was detected before that). Usually, when some constraint prevents a deployment, it is visible and straightforward via 'nomad status <jobname>'.
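For context, the usual CLI checks when a job sits in pending look something like this (IDs below are placeholders to substitute):

```sh
nomad job status PCMMManager        # summary, placements, and recent evaluations
nomad eval status <evaluation-id>   # why a plan was blocked or a placement failed
nomad alloc status <allocation-id>  # per-allocation events, if any allocation was created
```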
Response from the server:
There is no evaluation, as far as I can see:
In my opinion, the most important thing is to get some usable information from the backend. Thank you for your time and effort. Regards,
It would help if you provided the jobspec that you initially had problems with, rather than one where you can't isolate the problem to the upgrade.
That's from an eval you just ran, and not one that's been GC'd? If so, that's strange... do you have logs from the rest of the servers? The logs you have here are only for one of the servers, and it doesn't look like it's the one where the plan was applied (I'd expect to see the number of allocations that were placed).
No additional logs (that's all I can get). OK, I removed the spread stanza completely from the job (builder) and created a new job: PCMMManagerNoSpreadStanza
Node01 - trace
Node02 - trace
Node03 - trace
Job definition
From the Java deployer (via nomad-sdk):
I checked the "evaluationID" after a few second, immediate after deployment
Exactly, it has not been GC'd, no doubt about it. So again, the same behavior and result: Status = pending. Code snippet from the Java client using nomad-sdk:
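The snippet itself was not captured here, so the following is only a rough sketch of what a register-and-poll flow with nomad-java-sdk 0.11.x generally looks like; class and method names are assumptions to verify against the SDK javadoc, not the poster's actual code:

```java
import com.hashicorp.nomad.apimodel.Evaluation;
import com.hashicorp.nomad.apimodel.Job;
import com.hashicorp.nomad.javasdk.EvaluationResponse;
import com.hashicorp.nomad.javasdk.NomadApiClient;
import com.hashicorp.nomad.javasdk.NomadApiConfiguration;

public class DeployerSketch {
    public static void main(String[] args) throws Exception {
        NomadApiConfiguration config = new NomadApiConfiguration.Builder()
                .setAddress("http://127.0.0.1:4646")   // assumed agent address
                .build();
        NomadApiClient client = new NomadApiClient(config);

        Job job = buildPcmmManagerJob();               // hypothetical helper; the real jobspec is in the pastebins above

        // Register the job; the response value is the evaluation ID.
        EvaluationResponse registration = client.getJobsApi().register(job);
        String evalId = registration.getValue();

        // Poll the evaluation to see whether it stays pending and why.
        Evaluation eval = client.getEvaluationsApi().info(evalId).getValue();
        System.out.println("evaluation " + evalId + ": " + eval.getStatus()
                + " (" + eval.getStatusDescription() + ")");
    }

    private static Job buildPcmmManagerJob() {
        // Placeholder only — the actual PCMMManager job definition is linked above.
        return new Job().setId("PCMMManager").setName("PCMMManager");
    }
}
```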
Thank you for your time and effort! Regards,
@ivanprostran I think you might have copied the wrong logs... all 3 nodes are showing identical logs. Also, it looks like you're running the server and client together in the same agent? It shouldn't matter for purposes of this issue, but just to be clear that's not a recommended production configuration.
I opened it from the GUI (3 tabs, 3 clients, DEBUG). The setup is: 3-node cluster
Regards,
Here is the process list (same output for both node2/node3):
Please pull the logs from the journal.
Clean start (alloc/data empty), node01:
journalctl -u nomad-server
journalctl -u nomad-client
After job deployment (no change in the log!!)
@ivanprostran we need the log info from all the servers in the cluster, and we need them at the debug level so we can see what's going on.
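For what it's worth, a sketch of the usual way to get debug-level logs from every server (these are standard Nomad agent options, though the exact suggestion above was cut off):

```sh
# Either set log_level = "DEBUG" in each server's .hcl, or start the agent with the flag:
nomad agent -config=/etc/nomad-server/nomad-server.hcl -log-level=DEBUG

# Then collect the journal from each of the three server nodes and attach all of them:
journalctl -u nomad-server --since "30 minutes ago" --no-pager > node01-nomad-server.log
```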
Hi Tim, this is a testing environment for the moment. Thank you for your time and effort. Regards,
Problem is solved. Thank you for your time and effort.
Nomad version
nomad-sdk version 0.11.3.0
Server(agent) version: Nomad v1.0.1 (c9c68aa)
Operating system and Environment details
/etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
Linux blade1.lab.bulb.hr 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Issue
A recent update from Nomad v0.9.6 to Nomad v1.0.1 breaks job deployment. Unfortunately, I couldn't get any usable info from the Nomad agent about the "pending or dead" status. I also checked the trace monitor from the web UI, but without success.
Could you please give some advice on how to get the reject/pending reason from the agent?
I use the "raw_exec" driver (non-privileged user, "driver.raw_exec.enable" = "1"). For deployment I use nomad-sdk (version 0.11.3.0).
Reproduction steps
Job deployment via the nomad-sdk API fails 100% of the time (pending or dead status, checked from the web UI or command line).
Job file
You can find the job definition (from Nomad's point of view) here:
https://pastebin.com/ZXiaM9RW
Job status:
Nomad Client logs (if appropriate)
Nomad Server logs (if appropriate)
Unfortunately I couldn't get any usable logs.
Thank you for your effort and time.
Regards,
Ivan