Backport of allocrunner: prevent panic on network manager into release/1.4.x #16926


hc-github-team-nomad-core
Contributor

Backport

This PR is auto-generated from #16921 to be assessed for backporting due to the inclusion of the label backport/1.4.x.

The below text is copied from the body of the original PR.


Check the task group network length before trying to access the first element.

I haven't been able to reproduce the problem but the fix seems clear enough.

Closes #16863
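
The guard described above can be sketched as follows. This is a minimal illustration, not the actual allocrunner change: `Network`, `TaskGroup`, and `networkMode` are hypothetical stand-ins for the real types and call site.

```go
package main

import "fmt"

// Network and TaskGroup are simplified, hypothetical stand-ins for the
// job structs; they are not Nomad's real types.
type Network struct {
	Mode string
}

type TaskGroup struct {
	Networks []*Network
}

// networkMode returns the mode of the first network in the task group,
// or "" when the group declares no networks. Checking the length first
// avoids an index-out-of-range panic on an empty slice.
func networkMode(tg *TaskGroup) string {
	if tg == nil || len(tg.Networks) == 0 {
		return ""
	}
	return tg.Networks[0].Mode
}

func main() {
	fmt.Println(networkMode(&TaskGroup{}) == "")
	fmt.Println(networkMode(&TaskGroup{Networks: []*Network{{Mode: "bridge"}}}))
}
```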

angrycub and others added 30 commits February 10, 2023 15:33
The plugin loader loads task and device driver plugins which are not
used on server nodes.
* Add information about templating using `env` function to refer to environment variables.
New IDE from jetbrains gets its own config directory.
Made a breaking change in go-set (String() signature), need to update
both these dependencies together and also fix a thing in structs.go
Bumps [json5](https://github.com/json5/json5) from 1.0.1 to 1.0.2.
- [Release notes](https://github.com/json5/json5/releases)
- [Changelog](https://github.com/json5/json5/blob/main/CHANGELOG.md)
- [Commits](json5/json5@v1.0.1...v1.0.2)

---
updated-dependencies:
- dependency-name: json5
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
#15470)

Bumps [decode-uri-component](https://github.com/SamVerschueren/decode-uri-component) from 0.2.0 to 0.2.2.
- [Release notes](https://github.com/SamVerschueren/decode-uri-component/releases)
- [Commits](SamVerschueren/decode-uri-component@v0.2.0...v0.2.2)

---
updated-dependencies:
- dependency-name: decode-uri-component
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [github.com/docker/cli](https://github.com/docker/cli) from 20.10.23+incompatible to 23.0.1+incompatible.
- [Release notes](https://github.com/docker/cli/releases)
- [Commits](docker/cli@v20.10.23...v23.0.1)

---
updated-dependencies:
- dependency-name: github.com/docker/cli
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.52.0 to 1.53.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.52.0...v1.53.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [github.com/containernetworking/plugins](https://github.com/containernetworking/plugins) from 1.1.1 to 1.2.0.
- [Release notes](https://github.com/containernetworking/plugins/releases)
- [Commits](containernetworking/plugins@v1.1.1...v1.2.0)

---
updated-dependencies:
- dependency-name: github.com/containernetworking/plugins
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
#16059)

Bumps [github.com/shirou/gopsutil/v3](https://github.com/shirou/gopsutil) from 3.22.12 to 3.23.1.
- [Release notes](https://github.com/shirou/gopsutil/releases)
- [Commits](shirou/gopsutil@v3.22.12...v3.23.1)

---
updated-dependencies:
- dependency-name: github.com/shirou/gopsutil/v3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…func `nodeCSIVolumeNames` (#16138)

* Fix unbold header and remove unused var in func
Signed-off-by: dttung2905 <[email protected]>

* Add CHANGELOG file
Signed-off-by: dttung2905 <[email protected]>

* Apply suggestions from review <Charlie Voiselle>

---------

Signed-off-by: dttung2905 <[email protected]>
Co-authored-by: Charlie Voiselle <[email protected]>
Co-authored-by: Tim Gross <[email protected]>
This PR fixes the CNI plugin fingerprinter to take into account the fact
that the cni_path config can be a multi-path (e.g. `/foo:/bar:/baz`).

Accumulate plugins from each of the possible path elements. If scanning
any of the named directories fails, the fingerprinter fails.

Fixes #16083

No CL/BP - has not shipped yet.
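
A rough sketch of the multi-path scan described above, assuming the path list uses the OS list separator (`:` on Unix). `scanPluginDirs` is a hypothetical helper written for illustration; it is not the fingerprinter's real code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// scanPluginDirs is a hypothetical helper, not Nomad's fingerprinter.
// It splits a multi-path string on the OS list separator and collects
// the names of non-directory entries found in every directory. Any
// directory that cannot be read makes the whole scan fail.
func scanPluginDirs(multiPath string) ([]string, error) {
	var plugins []string
	for _, dir := range filepath.SplitList(multiPath) {
		entries, err := os.ReadDir(dir)
		if err != nil {
			return nil, fmt.Errorf("failed to scan %q: %w", dir, err)
		}
		for _, e := range entries {
			if !e.IsDir() {
				plugins = append(plugins, e.Name())
			}
		}
	}
	return plugins, nil
}

func main() {
	// The path below is only an example value.
	plugins, err := scanPluginDirs("/opt/cni/bin")
	fmt.Println(plugins, err)
}
```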
* Warn when Items key isn't directly accessible

Go template requires that map keys are alphanumeric for direct access
using the dotted reference syntax. This warns users when they create
keys that run afoul of this requirement.

- cli: use regex to detect invalid identifiers in var keys
- test: fix slash in escape test case
- api: share warning formatting function between API and CLI
- ui: warn if var key has characters other than _, letter, or number
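
A minimal sketch of the kind of check described above. The pattern is an assumption based on the "`_`, letter, or number" wording and is limited to ASCII; the regex actually used by the CLI is not shown here, and `invalidKeys` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// validKey matches keys made only of ASCII letters, digits, and
// underscores. This pattern is an assumption, not Nomad's actual regex.
var validKey = regexp.MustCompile(`^[a-zA-Z0-9_]+$`)

// invalidKeys returns, in sorted order, the keys that could not be
// reached with dotted template syntax under the assumption above.
func invalidKeys(items map[string]string) []string {
	var bad []string
	for k := range items {
		if !validKey.MatchString(k) {
			bad = append(bad, k)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	items := map[string]string{"db_pass": "x", "my-key": "y"}
	for _, k := range invalidKeys(items) {
		fmt.Printf("warning: key %q is not directly accessible in templates\n", k)
	}
}
```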

---------
Co-authored-by: Charlie Voiselle <[email protected]>
Co-authored-by: Luiz Aoqui <[email protected]>
…16151)

* artifact: protect against unbounded artifact decompression

Starting with 1.5.0, set default values for artifact decompression limits.

artifact.decompression_size_limit (default "100GB") - the maximum amount of
data that will be decompressed before triggering an error and cancelling
the operation

artifact.decompression_file_count_limit (default 4096) - the maximum number
of files that will be decompressed before triggering an error and
cancelling the operation.

* artifact: assert limits cannot be nil in validation
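
To illustrate the idea behind the two limits, here is a small sketch of a counter that aborts once either bound is exceeded. It is not how Nomad or its artifact library enforces the limits; `limiter` and `addFile` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errSizeLimit = errors.New("decompression size limit exceeded")
	errFileLimit = errors.New("decompression file count limit exceeded")
)

// limiter is a hypothetical tracker for the two bounds described above.
type limiter struct {
	maxBytes, maxFiles int64
	bytes, files       int64
}

// addFile records one decompressed file of the given size and returns
// an error as soon as either limit is exceeded.
func (l *limiter) addFile(size int64) error {
	l.files++
	if l.files > l.maxFiles {
		return errFileLimit
	}
	l.bytes += size
	if l.bytes > l.maxBytes {
		return errSizeLimit
	}
	return nil
}

func main() {
	// Small example limits, chosen only for the demonstration.
	l := &limiter{maxBytes: 100, maxFiles: 2}
	fmt.Println(l.addFile(60))
	fmt.Println(l.addFile(60))
}
```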
The panic bug for upgrades with older servers that shipped in 1.4.0 was fixed in
1.4.1, which makes the versions described in the warning in the upgrade guide
misleading. Clarify the upgrade guide.
* docs: remove cores/memory beta label, update driver cpu docs

* docs: fixup cr stuff
This PR wraps the cgroups.IsCgroup2UnifiedMode() helper method from
runc in a defer/recover block because it might panic in some cases.

Upstream fix in: opencontainers/runc#3745

Closes #16179
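
The defer/recover pattern mentioned above looks roughly like this. `safeCall` is a hypothetical wrapper, and `fn` stands in for any helper that may panic; the real change wraps the runc helper directly.

```go
package main

import "fmt"

// safeCall runs fn and converts a panic into an error, returning false
// as the fallback value. fn stands in for a helper that may panic.
func safeCall(fn func() bool) (result bool, err error) {
	defer func() {
		if r := recover(); r != nil {
			result = false
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	return fn(), nil
}

func main() {
	ok, err := safeCall(func() bool { panic("boom") })
	fmt.Println(ok, err)
}
```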
The `nomad fmt -check` command incorrectly writes to file because we didn't
return before writing the file on a diff. Fix this bug and update the command
internals to differentiate between the write-to-file and write-to-stdout code
paths, which are activated by different combinations of options and flags.

The docstring for the `-list` and `-write` flags is also unclear and can be
easily misread to be the opposite of the actual behavior. Clarify this and fix
up the docs to match.

This changeset also refactors the tests quite a bit so as to make the test
outputs clear when something is incorrect.
….18 (#16198)

Bumps [github.com/containerd/containerd](https://github.com/containerd/containerd) from 1.6.12 to 1.6.18.
- [Release notes](https://github.com/containerd/containerd/releases)
- [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md)
- [Commits](containerd/containerd@v1.6.12...v1.6.18)

---
updated-dependencies:
- dependency-name: github.com/containerd/containerd
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The RPC TLS enforcement test was frequently failing with broken connections. The
most likely cause was that the tests started to run before the server had
started its RPC server. Wait until it self-elects to ensure that the RPC server
is up. This seems to have corrected the error; I ran this 3 times without a
failure (even accounting for `gotestsum` retries).

Also, fix a minor test bug that didn't impact the test but showed an incorrect
usage for `Status.Ping`.
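
Waiting for a condition such as leader election before a test proceeds can be sketched with a small polling helper. `waitUntil` is a hypothetical function, not the test helper Nomad uses, and the condition in `main` is a placeholder.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitUntil polls cond every interval until it returns true or the
// timeout elapses. It is a hypothetical helper for illustration.
func waitUntil(cond func() bool, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for condition")
		}
		time.Sleep(interval)
	}
}

func main() {
	calls := 0
	// Placeholder condition: pretend the server becomes ready on the
	// third check.
	err := waitUntil(func() bool { calls++; return calls >= 3 },
		time.Second, 10*time.Millisecond)
	fmt.Println(calls, err)
}
```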
* api: return error on parse failure

* docs: clarify anonymous policy with task api
shoenig and others added 19 commits April 12, 2023 14:13
* [no ci] deps: update docker to 23.0.3

This PR brings our docker/docker dependency (which is hosted at github.com/moby/moby)
up to 23.0.3 (forward about 2 years). Refactored our use of docker/libnetwork to
reference the package in its new home, which is docker/docker/libnetwork (it is
no longer an independent repository). Some minor nearby test case cleanup as well.

* add cl
The `-deadline` and `-force` flags for the `nomad node drain` command only cause
the draining to ignore the `migrate` block's healthy deadline, max parallel,
etc. These flags don't have anything to do with the `kill_timeout` or
`shutdown_delay` options of the jobspec.

This changeset fixes the skipped E2E tests so that they validate the intended
behavior, and updates the docs for more clarity.
* docs: add node meta command docs

Fixes #16758

* it helps if you actually add the files to git

* fix typos and examples vs usage
If an allocation is slow to stop because of `kill_timeout` or `shutdown_delay`,
the node drain is marked as complete prematurely, even though drain monitoring
will continue to report allocation migrations. This impacts the UI or API
clients that monitor node draining to shut down nodes.

This changeset updates the behavior to wait until the client status of all
drained allocs is terminal before marking the node as done draining.
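
The completion condition described above can be sketched as a simple check over allocation client statuses. `Alloc`, `isClientTerminal`, and `drainDone` are simplified, hypothetical names, and the set of terminal statuses listed is an assumption for this example.

```go
package main

import "fmt"

// Alloc is a simplified, hypothetical stand-in for an allocation.
type Alloc struct {
	ID           string
	ClientStatus string
}

// isClientTerminal reports whether a client status is terminal. The
// statuses listed here are assumed for illustration.
func isClientTerminal(status string) bool {
	switch status {
	case "complete", "failed", "lost":
		return true
	default:
		return false
	}
}

// drainDone reports true only when every drained alloc has reached a
// terminal client status, so a slow-stopping alloc keeps the drain open.
func drainDone(allocs []Alloc) bool {
	for _, a := range allocs {
		if !isClientTerminal(a.ClientStatus) {
			return false
		}
	}
	return true
}

func main() {
	allocs := []Alloc{{"a1", "complete"}, {"a2", "running"}}
	fmt.Println(drainDone(allocs)) // false: a2 is still running
}
```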
)

* Remove the newline after .hbs copyright headers

* Trying with the whitespace control char
Adds a new configuration to clients to optionally allow them to drain their
workloads on shutdown. The client sends the `Node.UpdateDrain` RPC targeting
itself and then monitors the drain state as seen by the server until the drain
is complete or the deadline expires. If it loses connection with the server, it
will monitor local client status instead to ensure allocations are stopped
before exiting.
Examples in the documentation frequently include tokens, including Vault tokens
which end up triggering GitHub's secret scanner. Remove these from consideration
so that we don't get false positive reports.
This PR eliminates code specific to looking up and caching the uid/gid/user.User
object associated with the nobody user in an init block. This code existed before
adding the generic users cache and was meant to optimize the one search path we
knew would happen often. Now that we have the cache, it seems reasonable to eliminate
this init block and use the cache instead like for any other user.

Also fixes a constraint on the podman (and other) drivers, where building without
CGO became problematic on some OSes like Fedora IoT where the nobody user cannot
be found with the pure-Go standard library.

Fixes github.com/hashicorp/nomad-driver-podman/issues/228
* Upgrade from hashicorp/go-msgpack v1.1.5 to v2.1.0

Fixes #16808

* Update hashicorp/net-rpc-msgpackrpc to v2 to match go-msgpack

* deps: use go-msgpack v2.0.0

go-msgpack v2.1.0 includes some code changes that we will need to
investigate further to assess their impact on Nomad, so keeping this
dependency on v2.0.0 for now since it's a no-op.

---------

Co-authored-by: Luiz Aoqui <[email protected]>
* Honor value for distinct_hosts constraint
* Add test for feasibility checking for `false`
---------
Co-authored-by: Michael Schurter <[email protected]>
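
Honoring the constraint's value might look roughly like the sketch below. `distinctHostsEnabled` is a hypothetical function; treating an empty or unparsable value as enabled is an assumption made for this example, not something the commit message states.

```go
package main

import (
	"fmt"
	"strconv"
)

// distinctHostsEnabled interprets the value attached to a
// distinct_hosts constraint. Assumptions in this sketch: an empty value
// means enabled, and an unparsable value also falls back to enabled.
func distinctHostsEnabled(value string) bool {
	if value == "" {
		return true
	}
	enabled, err := strconv.ParseBool(value)
	if err != nil {
		return true
	}
	return enabled
}

func main() {
	fmt.Println(distinctHostsEnabled("false")) // false
	fmt.Println(distinctHostsEnabled(""))      // true
}
```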
Remove unneeded service injection. This service is not being used in
this controller and currently only exists in `main`, causing
`release/1.5.x` to break.
@hc-github-team-nomad-core hc-github-team-nomad-core requested a review from a team April 18, 2023 20:39
@hc-github-team-nomad-core hc-github-team-nomad-core force-pushed the backport/fix-network-manager/widely-smiling-narwhal branch from 721eb8c to 446348b Compare April 18, 2023 20:39
@hc-github-team-nomad-core hc-github-team-nomad-core requested a review from a team as a code owner April 18, 2023 20:39
@hc-github-team-nomad-core hc-github-team-nomad-core requested review from sarahethompson and emilymianeil and removed request for a team April 18, 2023 20:39
@hc-github-team-nomad-core hc-github-team-nomad-core merged commit f832aad into release/1.4.x Apr 18, 2023
@hc-github-team-nomad-core hc-github-team-nomad-core force-pushed the backport/fix-network-manager/widely-smiling-narwhal branch from 33b86b4 to 2f35300 Compare April 18, 2023 20:39
@hc-github-team-nomad-core hc-github-team-nomad-core deleted the backport/fix-network-manager/widely-smiling-narwhal branch April 18, 2023 20:39
@hashicorp-cla

hashicorp-cla commented Apr 18, 2023

CLA assistant check

Thank you for your submission! We require that all contributors sign our Contributor License Agreement ("CLA") before we can accept the contribution. Read and sign the agreement

Learn more about why HashiCorp requires a CLA and what the CLA includes


14 out of 15 committers have signed the CLA.

  • hc-github-team-nomad-core
  • hashicorp-copywrite[bot]
  • Juanadelacuesta
  • the-nando
  • jrasell
  • shoenig
  • tgross
  • angrycub
  • gulducat
  • pkazmierczak
  • schmichael
  • philrenaud
  • IamTheFij
  • lgfa29
  • NOBLES5E
