Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Display OS name in nomad node status command. #12388

Merged
merged 2 commits into from
Mar 28, 2022

Conversation

shishir-a412ed
Copy link
Contributor

@shishir-a412ed shishir-a412ed commented Mar 25, 2022

I would really like the ability to display OS name when listing the nodes in my cluster using nomad node status command 🙂
In heterogeneous clusters, we have machines running on different OS e.g. ubuntu, centos, RHEL and having that information available in CLI output makes it easy to awk and grep.

Keeping this as a separate flag for the following reasons:

  • Maintaining backward compatibility and not break nomad users who have scripts built on top of existing output.
  • Ability to opt-in, if you need this info rather than forced into it. Since this requires a call to Node Info which could get computationally expensive if there are a lot of nodes.

Sample output:

$ nomad node status
ID        DC   Name   Class   Drain  Eligibility  Status
f73e3993  dc1  linux  <none>  false  eligible     ready
$ nomad node status -os
ID        DC   Name   Class   OS      Drain  Eligibility  Status
f73e3993  dc1  linux  <none>  ubuntu  false  eligible     ready
$ nomad node status -verbose
ID                                    DC   Name   Class   Address    Version    Drain  Eligibility  Status
f73e3993-4979-cb60-0bf9-e07a405e38b0  dc1  linux  <none>  127.0.0.1  1.2.6-dev  false  eligible     ready

Let me know if there are any man pages, UI docs that needs to be added.

Signed-off-by: Shishir Mahajan [email protected]

@mikenomitch
Copy link
Contributor

This seems like a nice addition to me.

Could you add this to the Status Options in website/content/docs/commands/node/status.mdx?

Signed-off-by: Shishir Mahajan <[email protected]>
@shishir-a412ed
Copy link
Contributor Author

This seems like a nice addition to me.

Could you add this to the Status Options in website/content/docs/commands/node/status.mdx?

Fixed.

Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @shishir-a412ed!

Example output for Linux:

$ nomad node status
ID        DC   Name           Class   Drain  Eligibility  Status
708b459c  dc1  nomad-client0  <none>  false  eligible     ready
$ nomad node status -os
ID        DC   Name           Class   OS      Drain  Eligibility  Status
708b459c  dc1  nomad-client0  <none>  ubuntu  false  eligible     ready

For macOS:

$ nomad node status
ID        DC   Name                   Class   Drain  Eligibility  Status
4963ada5  dc1  timgross-C02XW9DSJHD2  <none>  false  eligible     ready
$ nomad node status -os
ID        DC   Name                   Class   OS      Drain  Eligibility  Status
4963ada5  dc1  timgross-C02XW9DSJHD2  <none>  darwin  false  eligible     ready

And against a custom node where I intentionally broke the attribute, to demonstrate it handles this case gracefully:

$ nomad node status -os
darwin
ID        DC   Name                   Class   OS      Drain  Eligibility  Status
5f21316d  dc1  timgross-C02XW9DSJHD2  <none>  <none>  false  eligible     ready

@tgross tgross merged commit eea1b1f into hashicorp:main Mar 28, 2022
@tgross tgross added this to the 1.3.0 milestone Mar 28, 2022
@tgross tgross mentioned this pull request Mar 28, 2022
@shishir-a412ed
Copy link
Contributor Author

@tgross @mikenomitch

@chuckyz made a nice suggestion about including os in -json, and we can do:

nomad node status -json | jq '.[]|select(.os=="ubuntu")'

I am digging through the code, but let me know if you guys have suggestions/concerns?

@tgross
Copy link
Member

tgross commented Mar 28, 2022

Hi @shishir-a412ed I think what makes that tricky is that (as you've no doubt noticed in your PR here), the API output for /v1/nodes doesn't include the node attributes the way that /v1/node/:node_id does. The Attributes field in the /v1/node/:node_id can be quite long, but I don't know that we want to have one-off attributes added either. Going to ping @schmichael and @DerekStrickland for their thoughts on the shape of the API there.

@DerekStrickland
Copy link
Contributor

@tgross My first thought is that maybe we could have a repeatable or CSV flag of -attribute(s) that includes arbitrary node attributes dynamically in the output. I think functionally that would work and lead to less special case handling. IIRC I've seen both approaches already in the API. Not certain which approach the CLI takes but multiple -attribute flags feels cleaner though more verbose.

@chuckyz
Copy link
Contributor

chuckyz commented Mar 28, 2022

I'll chime in with my 2c. There's a few things like evals in the CLI where -json more or less equals "just give me heckin' everything."

That's why I feel that just wildly adding an Os field into the json output is probably sensible as a default. I really like @DerekStrickland 's suggestion too though!

@shishir-a412ed shishir-a412ed deleted the node_status_os branch March 28, 2022 18:58
@roberteckert
Copy link

While talking w/ @shishir-a412ed about this change a few days ago, we discussed the possibility of adding a flag of this very nature, but ultimately decided to keep the change more limited in scope. For my additional two more cents, I also think it's a good idea.

@shishir-a412ed
Copy link
Contributor Author

@tgross My first thought is that maybe we could have a repeatable or CSV flag of -attribute(s) that includes arbitrary node attributes dynamically in the output. I think functionally that would work and lead to less special case handling. IIRC I've seen both approaches already in the API. Not certain which approach the CLI takes but multiple -attribute flags feels cleaner though more verbose.

I have a little contrarian view to this, and I feel having a generic flag for something like -attribute or -metadata can open a Pandora's box and users can easily shoot themselves in the foot by overcrowding the output. Another thing I have seen in docker (moby/moby project) is something like --storage-opts became a dumping ground for adding all kind of storage related options. Having said that, It does give operator flexibility as long as they know exactly what they are doing!

I ll leave this open if someone is really interested in picking this up. Regarding a CSV flag vs multiple -attribute flags, I am personally in favor of multiple -attribute flags (in spite of the fact they are verbose). E.g if you see urfave/cli golang CLI library, they support multiple flags. IIRC, you guys use an internal CLI library that was written by Mitchell, so not sure what options it supports.

@tgross
Copy link
Member

tgross commented Mar 30, 2022

I think @shishir-a412ed is right to worry that we effectively end up trying to reimplement graphQL or similar the more customizable we try to make this output. I think it's worth noting that there's a distinction between the output of nomad node status and nomad node status :node_id. The "detail view" API has all the fields already.

@shishir-a412ed
Copy link
Contributor Author

shishir-a412ed commented Mar 30, 2022

@tgross @DerekStrickland Will you guys be open to -q/--quiet flag to nomad node status command. Docker uses this in a lot of places and it comes pretty handy! e.g.

  • docker ps -q - Just list the container IDs
  • docker images -q - Just list the image IDs
  • docker volume ls -q - Just list the volume IDs

Then I can do e.g. docker inspect $(docker ps -q) to iterate over and inspect the running containers.

The general idea behind -q (the quiet flag) is: I want to list only the IDs so I can iterate over them and do something. E.g. in nomad scenario, I want to list the IDs using nomad node status -q so that I can then iterate over them and call nomad node status :node_id to get the full node object.

Let me know your thoughts?

@tgross
Copy link
Member

tgross commented Mar 31, 2022

I kind of like that idea. We'd want it only for the various "list" commands, and it'd be incompatible with flags like -verbose or -json, so we'd need to assert that at the time we parse args.

@DerekStrickland
Copy link
Contributor

I like it too. I was literally wishing for this yesterday with some allocation testing I was doing.

@shishir-a412ed
Copy link
Contributor Author

@tgross @DerekStrickland Opened a PR here: #12426

@cnwobi
Copy link

cnwobi commented Apr 2, 2022

👋🏿 I want to start contributing to Nomad, but I can't seem to find the resources to start. Is there a Slack channel? Is there a need for more contributors?

@DerekStrickland
Copy link
Contributor

Hi @cnwobi! Contributions are always welcome! Here is the in repo contributing readme, and here is a link to a forum thread with some extra details about how to get a Vagrant cluster setup and running.

We don't have a public Slack channel, but we are always watching issues closely. I'd recommend you look at the backlog for existing issues that aren't assigned and find one that is relevant to what you are doing with Nomad so that you can work on an item that you already have some context around. If you see an issue you want to work on, add a comment stating that and someone should reply and let you know if the issue is something we think would be good for Nomad and whether we are already planning to work on it internally. That way you end up doing working on something more likely to get merged and you don't duplicate effort.

Thank you for your interest in contributing! We're excited to see what you come up with.

@shishir-a412ed
Copy link
Contributor Author

@tgross @DerekStrickland I was just testing this with our main production cluster, and looks like on large clusters (~7000 nodes) nomad node status -os could take ~35 mins to run, which will be just unacceptable to the user!

On average, each API call takes ~300 milliseconds:

Calling Nodes.Info()
0.339178478
Calling Nodes.Info()
0.271110939
Calling Nodes.Info()
0.282213931

We are looking at (300 ms * 7000 no of nodes) / 1000 (seconds) / 60 (minutes) = 35 minutes.

There are two ways we can approach this:

  1. Process this in batches using goroutines and sync.WaitGroup to reduce total computation time.
  2. Add os to NodeListStub struct so Nodes.List() just return the os info in one single call.

I think (2) is more efficient. Let me know what you guys think?

@tgross
Copy link
Member

tgross commented Apr 5, 2022

I agree (2) is the way to go here. But I'm hesitant to start adding random attributes to the NodeListStub. Maybe we have a map of "important" attributes there?

@shishir-a412ed
Copy link
Contributor Author

shishir-a412ed commented Apr 5, 2022

@tgross I was just suggesting os. I still feel os is a key attribute, and worth exposing. We are already using the Attributes map to fetch Version when building NodeListStub so looks like we already have access to Attributes

@shishir-a412ed
Copy link
Contributor Author

shishir-a412ed commented Apr 5, 2022

So something like this will work.

$ nomad node status -os
ID        DC   Name   Class   OS      Drain  Eligibility  Status
2c642a5c  dc1  linux  <none>  ubuntu  false  eligible     ready

@tgross
Copy link
Member

tgross commented Apr 6, 2022

I was just suggesting os. I still feel os is a key attribute, and worth exposing.

I don't disagree, I'm more looking for long-term maintenance concerns here. Once we've added one field from the attributes it's easier to justify adding more in the future and then we end up with a mishmash of some attributes in an "attributes" map and some attributes in fields. So adding a map of "select attributes" lets us manage future sprawl in the API.

@shishir-a412ed
Copy link
Contributor Author

@tgross I tried to create a more generic version here. If you can take a look, when you get a chance.

Though the map is getting set on the server side:

    2022-04-06T20:27:46.793Z [INFO]  client: node registration complete
map[os.name:ubuntu]

It's not being available on client side.

$ nomad node status -os
HELLO HELLO node attributes.
&{Address:127.0.0.1 ID:492826c1-35f2-f6fd-0ecb-754dd1b80225 Attributes:map[] Datacenter:dc1 Name:linux NodeClass: Version:1.2.6-dev Drain:false SchedulingEligibility:eligible Status:ready StatusDescription: Drivers:map[docker:0xc0003cbd00 exec:0xc0003cbd40 java:0xc0003cbdc0 mock_driver:0xc0003cbd80 qemu:0xc0003cbcc0 raw_exec:0xc0003cbc80] NodeResources:<nil> ReservedResources:<nil> LastDrain:<nil> CreateIndex:7 ModifyIndex:9}

ID        DC   Name   Class   OS      Drain  Eligibility  Status
492826c1  dc1  linux  <none>  <none>  false  eligible     ready

I probably missed something trivial. Is the map being local map a problem here? I was hoping that nomad will take care of serialization/de-serialization when passing data between client and server. If there is a better place to initialize the map to make it work, let me know.

@tgross
Copy link
Member

tgross commented Apr 7, 2022

I was hoping that nomad will take care of serialization/de-serialization when passing data between client and server.

Hi @shishir-a412ed The structs in nomad/structs get serialized to message pack and are passed around in agent-to-agent RPCs, and then whichever agent your API client is talking to converts that into the api/structs package and serializes that into JSON. (There are a few cases where this conversion between the two is "manual", like in command/agent/job_endpoint.go, and we're planning on making those automatic when we have a minute.)

But in that patch you're creating the &NodeStub adding the map, and the rebinding the s var to entirely new &NodeStub, which throws away the you made before. So that's not going to make it down to the API. Otherwise that more-or-less looks right. Probably a good idea to open a draft PR (feel free to ping me and @DerekStrickland on it) once you've got it smoke-tested.

@shishir-a412ed
Copy link
Contributor Author

@tgross Thanks! I have opened a PR here: #12497.
Added you and @DerekStrickland for review. Let me know if there are any docs that needs to be updated.

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

7 participants