Backport of ui: handle node pool requests to older regions into release/1.6.x #18024

Conversation

hc-github-team-nomad-core
Contributor

Backport

This PR is auto-generated from #18021 to be assessed for backporting due to the inclusion of the label backport/1.6.x.

The text below is copied from the body of the original PR.


When accessing a region running a version of Nomad without node pool support, an error was thrown because the request is handled by the nodes endpoint, which fails because it assumes `pools` is the node ID.

Fixes #18007

lgfa29 and others added 30 commits July 11, 2023 10:59
Document and test that if a namespace does not provide an `allow` or
`deny` list then those are treated as `nil` and behave differently from
an empty list (`[]string{}`).
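As a rough illustration of the distinction being documented, a `nil` list and an explicitly empty list can carry different meanings when a rule is evaluated. The struct and helper below are hypothetical stand-ins, not Nomad's actual ACL types:

```go
package main

import "fmt"

// NamespaceCapabilities is a hypothetical stand-in for a namespace rule with
// an allow list; the real Nomad types differ.
type NamespaceCapabilities struct {
	Allow []string // nil means "not specified"; []string{} means "allow nothing"
}

// allowed reports whether a capability passes the allow list.
func (n NamespaceCapabilities) allowed(capability string) bool {
	if n.Allow == nil {
		// List not provided: fall back to the default behaviour.
		return defaultAllowed(capability)
	}
	// List provided (possibly empty): only listed capabilities pass.
	for _, c := range n.Allow {
		if c == capability {
			return true
		}
	}
	return false
}

func defaultAllowed(capability string) bool { return capability == "read" }

func main() {
	unset := NamespaceCapabilities{}                  // Allow == nil
	empty := NamespaceCapabilities{Allow: []string{}} // explicit empty list
	fmt.Println(unset.allowed("read")) // true: default behaviour applies
	fmt.Println(empty.allowed("read")) // false: an empty list denies everything
}
```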
A user cannot be set for raw_exec tasks, because doing so does not work
with the 0700 root-owned client data directory that we set up in the e2e
cluster in accordance with the Nomad hardening guide.
* docs: add plugin docs for pledge task driver

Add pledge driver to the set of Community drivers.

* docs: cr feedback
This adds a quick smoke test of our binaries to verify we haven't exceeded the
maximum GLIBC version (2.17) during linking, which would break our ability to
execute on EL7 machines.
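One way such a smoke test might be implemented is to scan the dynamic symbol table of the built binary with `objdump -T` and fail on any `GLIBC_x.y` requirement newer than 2.17. This is a hedged sketch, not necessarily how the repository's actual check works:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"regexp"
	"strconv"
)

// Newest glibc symbol version we allow; EL7 ships glibc 2.17.
const maxMajor, maxMinor = 2, 17

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: check_glibc <binary>")
		os.Exit(2)
	}

	// Dump the dynamic symbol table of the binary under test.
	out, err := exec.Command("objdump", "-T", os.Args[1]).Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "objdump failed:", err)
		os.Exit(1)
	}

	// Find every GLIBC_<major>.<minor> version reference.
	re := regexp.MustCompile(`GLIBC_(\d+)\.(\d+)`)
	for _, m := range re.FindAllSubmatch(out, -1) {
		major, _ := strconv.Atoi(string(m[1]))
		minor, _ := strconv.Atoi(string(m[2]))
		if major > maxMajor || (major == maxMajor && minor > maxMinor) {
			fmt.Fprintf(os.Stderr, "binary requires GLIBC_%d.%d, newer than %d.%d\n",
				major, minor, maxMajor, maxMinor)
			os.Exit(1)
		}
	}
	fmt.Println("ok: no glibc symbol newer than 2.17")
}
```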
* e2e: setup nomad for pledge driver

* e2e: add some e2e tests for pledge task driver
Support for Unix domain sockets (UDS) was added in Windows 10.
An ACL policy with a block that has no label generates unexpected results.
For example, a policy such as this:

```
namespace {
  policy = "read"
}
```

is applied to a namespace called `policy`, instead of the documented
behaviour of applying it to the `default` namespace.

This happens because of the way HCL1 decodes blocks. Since it doesn't
know whether a block is expected to have a label, it applies the `key` tag to
the content of the block and, in the example above, the first key is
`policy`, so it sets that as the `namespace` block label.

Since this happens internally in the HCL decoder it's not possible to
detect the problem externally.

Fixing the problem inside the decoder is challenging because the JSON
and HCL parsers generate different ASTs, which makes it impossible to
distinguish a JSON tree from an invalid HCL tree within the decoder.

The fix in this commit consists of manually parsing the policy after
decoding and clearing labels that were not set in the file. This allows the
validation rules to consistently catch and return any errors, whether
the policy is invalid HCL or JSON.
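As a rough sketch of the mechanism described above (this is not the actual Nomad implementation, just a minimal illustration against the HCL1 AST), a block without a label shows up with a single key, which can be detected after decoding:

```go
package main

import (
	"fmt"

	"github.com/hashicorp/hcl"
	"github.com/hashicorp/hcl/hcl/ast"
)

// namespaceBlocksLabeled reports whether every `namespace` block in the raw
// policy source carries an explicit label. In the HCL1 AST, a block without a
// label has a single key ("namespace") instead of two.
func namespaceBlocksLabeled(raw string) (bool, error) {
	file, err := hcl.Parse(raw)
	if err != nil {
		return false, err
	}
	root, ok := file.Node.(*ast.ObjectList)
	if !ok {
		return false, fmt.Errorf("unexpected root node type")
	}
	for _, item := range root.Items {
		if len(item.Keys) == 1 && item.Keys[0].Token.Text == "namespace" {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	ok, err := namespaceBlocksLabeled(`namespace { policy = "read" }`)
	// false <nil>: the block had no label, so any label the decoder invented
	// must be cleared before validation runs.
	fmt.Println(ok, err)
}
```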
ACL permission checks for the search endpoints are done in three passes. The
first (the `sufficientSearchPerms` method) exists for performance and coarsely
rejects requests based on the passed-in context parameter if the user has no
permissions on any object in that context. The second (the
`filteredSearchContexts` method) filters out contexts based on whether the user
has permissions either to the requested namespace or again by context (to catch
the "all" context). Finally, when iterating over the objects available, we do
the usual filtering in the iterator.

Internal testing found several bugs in this filtering:
* CSI plugins can be searched by any authenticated user.
* Variables can be searched if the user has `job:read` permissions to the
  variable's namespace instead of `variable:list`.
* Variables cannot be searched by wildcard namespace.

This is an information leak of the plugin names and variable paths, which we
don't consider privileged information but intended to protect anyway.

This changeset fixes these bugs by ensuring CSI plugins are filtered in the
first- and second-pass ACL filters, and changes variables to check `variable:list`
in the second-pass filter unless the wildcard namespace is passed (at which point
we fall back to filtering in the iterator).

Fixes: CVE-2023-3300
Fixes: #17906
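A minimal sketch of the second-pass decision for the variables context described above, using hypothetical interface and method names rather than Nomad's actual ACL API:

```go
package main

import "fmt"

// aclChecker is a hypothetical stand-in for Nomad's ACL object; the real
// method names differ.
type aclChecker interface {
	AllowVariableList(namespace string) bool
}

// filterVariablesContext keeps the variables context only if the user can list
// variables in the requested namespace; with a wildcard namespace the decision
// is deferred to the per-object iterator filter.
func filterVariablesContext(acl aclChecker, namespace string) (include, deferToIterator bool) {
	if namespace == "*" {
		return true, true
	}
	return acl.AllowVariableList(namespace), false
}

type allowNothing struct{}

func (allowNothing) AllowVariableList(string) bool { return false }

func main() {
	include, deferred := filterVariablesContext(allowNothing{}, "default")
	fmt.Println(include, deferred) // false false: context filtered out up front
}
```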
lgfa29 and others added 18 commits July 19, 2023 10:55
Before this commit, it was only used for fingerprinting, but not
for CPU stats on nodes or tasks. This meant that if the
auto-detection failed, setting `cpu_total_compute` didn't resolve
the issue.

This issue was most noticeable on ARM64, where auto-detection
always failed.
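The idea behind the fix can be summarized in a few lines; the function and field names here are illustrative, not Nomad's actual config or stats code:

```go
package main

import "fmt"

// totalCompute prefers the operator-supplied cpu_total_compute override for
// both fingerprinting and CPU stats, falling back to the auto-detected value
// only when no override is configured.
func totalCompute(configured, detected uint64) uint64 {
	if configured > 0 {
		return configured
	}
	return detected
}

func main() {
	// On ARM64, auto-detection could fail and report 0; the override now wins.
	fmt.Println(totalCompute(8000, 0)) // 8000
	fmt.Println(totalCompute(0, 2400)) // 2400
}
```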
* volume-status: show the namespace the volume belongs to
"Ineligible" and "Draining" are not determined by the node status, but
are rather inferred from other fields.
Apply the same logic as Consul service health checks when building the
HTTP URL so that query params in `path` are preserved.
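A sketch of how such a URL can be built with `net/url` so that query parameters embedded in `path` survive; the real implementation in Nomad/Consul may differ:

```go
package main

import (
	"fmt"
	"net/url"
)

// checkURL builds an HTTP health-check URL from a scheme, host, and the
// check's path field, preserving any query parameters embedded in the path.
func checkURL(scheme, host, path string) (string, error) {
	rel, err := url.Parse(path) // e.g. "/health?verbose=true"
	if err != nil {
		return "", err
	}
	u := url.URL{
		Scheme:   scheme,
		Host:     host,
		Path:     rel.Path,
		RawQuery: rel.RawQuery, // kept instead of being dropped
	}
	return u.String(), nil
}

func main() {
	s, _ := checkURL("http", "10.0.0.1:8080", "/health?verbose=true")
	fmt.Println(s) // http://10.0.0.1:8080/health?verbose=true
}
```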
Add new `nomad.client.allocs.memory.max_allocated` metric to report the
value of the task `memory_max` resource value.
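For illustration, a gauge with that name could be emitted through the go-metrics library Nomad uses for telemetry; the label set and units below are assumptions, not the exact values Nomad reports:

```go
package main

import metrics "github.com/armon/go-metrics"

// emitMaxAllocatedMemory publishes a nomad.client.allocs.memory.max_allocated
// style gauge for a task. Labels and units are illustrative only.
func emitMaxAllocatedMemory(allocID, task string, memoryMaxMB int64) {
	metrics.SetGaugeWithLabels(
		[]string{"client", "allocs", "memory", "max_allocated"},
		float32(memoryMaxMB), // assuming MB here; the real units may differ
		[]metrics.Label{
			{Name: "alloc_id", Value: allocID},
			{Name: "task", Value: task},
		},
	)
}

func main() {
	emitMaxAllocatedMemory("example-alloc-id", "web", 512)
}
```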
* e2e: add tests for using private registry with podman driver

This PR adds e2e tests that stand up a private Docker registry
and have podman tasks run a container from an image in that private
registry.

Tests
 - user:password set in task config
 - auth_soft_fail works for public images when auth is set in driver
 - credentials helper is set in driver auth config
 - config auth.json file is set in driver auth config

* packer: use nomad-driver-podman v0.5.0

* e2e: eliminate unnecessary chmod

Co-authored-by: Daniel Bennett <[email protected]>

* cr: no need to install nomad twice

* cl: no need to install docker twice

---------

Co-authored-by: Daniel Bennett <[email protected]>
… job and back to the index (#17915)

* Boot the user off the job if it gets deleted

* de-yoink

* watching the job watcher

* Unload record so history.back has to refire a (failing) request

* Acceptance tests for boot-out and notification
The CSI specification says that we "SHOULD" send no more than one in-flight
request per *volume* at a time, with an allowance for losing state
(ex. leadership transitions) which the plugins "SHOULD" handle gracefully. We
mostly successfully serialize node and controller RPCs for the same volume,
except when Nomad clients are lost. (See also
container-storage-interface/spec#512)

These concurrency requirements in the spec fall short because Storage Provider
APIs aren't necessarily safe to call concurrently on the same host even for
_different_ volumes. For example, concurrently attaching AWS EBS volumes to an
EC2 instance results in a race for device names, which results in failure to
attach (because the device name is taken already and the API call fails) and
confused results when releasing claims. So in practice many CSI plugins rely on
k8s-specific sidecars for serializing storage provider API calls globally. As a
result, we have to be much more conservative about concurrency in Nomad than the
spec allows.

This changeset includes four major changes to fix this:
* Add a serializer method to the CSI volume RPC handler. When the RPC handler
  makes a destructive CSI Controller RPC, we send the RPC through this serializer
  so that only one RPC is sent at a time. Any other RPCs in flight will block
  (a sketch of this idea follows below).
* Ensure that requests go to the same controller plugin instance whenever
  possible by sorting by lowest client ID out of the plugin instances.
* Ensure that requests go to _healthy_ plugin instances only.
* Ensure that requests for controllers can go to a controller on any _live_
  node, not just ones eligible for scheduling (which CSI controllers don't care
  about)

Fixes: #15415
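A minimal sketch of the serializer idea from the first bullet above: destructive controller RPCs are funneled through a per-key lock so only one is in flight at a time. The real Nomad implementation differs, and the choice of key (plugin vs. volume) here is an assumption:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// rpcSerializer hands out one mutex per key so that calls sharing a key run
// strictly one at a time, while calls for different keys stay concurrent.
type rpcSerializer struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newRPCSerializer() *rpcSerializer {
	return &rpcSerializer{locks: map[string]*sync.Mutex{}}
}

// do blocks until any in-flight call for the same key completes, then runs fn.
func (s *rpcSerializer) do(key string, fn func() error) error {
	s.mu.Lock()
	l, ok := s.locks[key]
	if !ok {
		l = &sync.Mutex{}
		s.locks[key] = l
	}
	s.mu.Unlock()

	l.Lock()
	defer l.Unlock()
	return fn()
}

func main() {
	s := newRPCSerializer()
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.do("controller-plugin-aws-ebs", func() error {
				fmt.Println("destructive controller RPC", i, "running")
				time.Sleep(10 * time.Millisecond)
				return nil
			})
		}(i)
	}
	wg.Wait()
}
```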
When accessing a region running a version of Nomad without node pool support, an
error was thrown because the request is handled by the nodes endpoint,
which fails because it assumes `pools` is the node ID.
@hc-github-team-nomad-core force-pushed the backport/b-ui-node-pool-error-handling/mentally-immortal-muskrat branch from eee7115 to 503fdb1 on July 21, 2023 13:17
@hc-github-team-nomad-core merged commit f3474a6 into release/1.6.x on Jul 21, 2023
@hc-github-team-nomad-core force-pushed the backport/b-ui-node-pool-error-handling/mentally-immortal-muskrat branch 2 times, most recently from d097b51 to 9a0f8d4 on July 21, 2023 13:17
@vercel bot temporarily deployed to Preview – nomad-storybook-and-ui on July 21, 2023 13:22 (Inactive)
@vercel bot temporarily deployed to Preview – nomad on July 21, 2023 13:26 (Inactive)
github-actions bot commented Jul 21, 2023

Ember Test Audit comparison

|          | release/1.6.x | d097b51 | change |
|----------|---------------|---------|--------|
| passes   | 1498          | 1498    | 0      |
| failures | 1             | 1       | 0      |
| flaky    | 0             | 0       | 0      |
| duration | 000ms         | 000ms   | -000ms |
