Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: fix some failing long-running tests related to password requirements #9421

Merged
merged 12 commits into from
Jun 12, 2024

Conversation

jesse-amano-hpe
Copy link
Contributor

@jesse-amano-hpe jesse-amano-hpe commented May 24, 2024

Ticket

Description

Addressing CircleCI test failures on Slurm clusters, Debian package install, and e2e "local" and perf tests.

Test Plan

CI tests pass

Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.

@jesse-amano-hpe jesse-amano-hpe requested a review from a team as a code owner May 24, 2024 07:27
@cla-bot cla-bot bot added the cla-signed label May 24, 2024
Copy link

netlify bot commented May 24, 2024

Deploy Preview for determined-ui ready!

Name Link
🔨 Latest commit 2f79c70
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/6669f8c5c323d300083a3189
😎 Deploy Preview https://deploy-preview-9421--determined-ui.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link
Contributor

@stoksc stoksc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see znode tests are still failing too. i'm running https://app.circleci.com/pipelines/github/determined-ai/determined/56052/workflows/4c820a30-5f45-407c-b11b-115cfc9aead6 to see if it fixes it. if it does i can merge into your branch.

- run:
name: Set initial user password
command: |
echo "export INITIAL_USER_PASSWORD=${INITIAL_USER_PASSWORD}" >> "$BASH_ENV"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this do anything?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sets the variable in the environment of future commands run within the same job.

@@ -1932,6 +1933,11 @@ jobs:
- install-devcluster
- start-devcluster:
target-stage: db
- run: |
sudo mkdir -p /etc/systemd/system/determined-master.service.d
echo "[Service]" | sudo tee /etc/systemd/system/determined-master.service.d/password.override.conf >/dev/null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why tee -a and throwing away the duped output >/dev/null instead of than just >>

Suggested change
echo "[Service]" | sudo tee /etc/systemd/system/determined-master.service.d/password.override.conf >/dev/null
echo "[Service]" >>/etc/systemd/system/determined-master.service.d/password.override.conf

even better might be a heredoc to write the entire conf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shell pipes and redirects (like >>) create new processes, but there isn't a way to apply a temporary owner to them; sudo echo narf >> poit runs echo as a superuser, but opens the poit file with the current session's user, which in this case wouldn't have permission to write to /etc/systemd

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to fail in weird ways if sudo is configured to require a password.

But more importantly, why not do the whole file in one block so there's less potential "use -a except on the first line" confusion? Either create it as a template file and sed the password into the template while outputting to the override location, or perhaps printf so you don't have nested quote problems?

printf '[Service]\nEnvironment="DET_SECURITY_INITIAL_USER_PASSWORD=%s"\n' "$INITIAL_USER_PASSWORD" \
| sudo tee /etc/systemd/system/determined-master.service.d/password.override.conf > /dev/null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, ...drat, wish I'd thought of that. I'll give that a try when I can and maybe patch it in as a separate very-smol PR

@@ -103,7 +105,9 @@ def resource_manager(
def test_cluster_down() -> None:
name = "test_cluster_down"

with resource_manager(Resource.CLUSTER, name, {}, ["no-gpu"]):
with resource_manager(
Resource.CLUSTER, name, {"initial-user-password": conf.USER_PASSWORD}, ["no-gpu"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's probably worth just adding this to

@contextlib.contextmanager
def resource_manager(
    resource: Resource,
    name: str,
    kwflags: Optional[Dict[str, str]] = None,
    boolean_flags: Optional[List[str]] = None,
    positional_arguments: Optional[List[str]] = None,
) -> Iterator[None]:
+    kwflags.update({"initial-user-password": conf.USER_PASSWORD})
    """Context manager to bring resources up and down."""
    resource_up(resource, name, kwflags, boolean_flags, positional_arguments)
    try:
        yield
    finally:
        resource_down(resource, name)

tools/slurm/scripts/slurmcluster.yaml Show resolved Hide resolved
Copy link

codecov bot commented May 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 48.99%. Comparing base (e3d01c1) to head (2f79c70).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9421   +/-   ##
=======================================
  Coverage   48.98%   48.99%           
=======================================
  Files        1238     1238           
  Lines      160518   160518           
  Branches     2783     2783           
=======================================
+ Hits        78629    78638    +9     
+ Misses      81714    81705    -9     
  Partials      175      175           
Flag Coverage Δ
backend 43.87% <ø> (+0.01%) ⬆️
harness 63.85% <ø> (ø)
web 44.13% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

see 5 files with indirect coverage changes

@jesse-amano-hpe jesse-amano-hpe force-pushed the jta/fix-ci-slurm-e2e-initial-password branch from c9101b0 to 2f79c70 Compare June 12, 2024 19:36
@jesse-amano-hpe jesse-amano-hpe enabled auto-merge (squash) June 12, 2024 20:40
@jesse-amano-hpe jesse-amano-hpe merged commit f0d26db into main Jun 12, 2024
119 of 122 checks passed
@jesse-amano-hpe jesse-amano-hpe deleted the jta/fix-ci-slurm-e2e-initial-password branch June 12, 2024 20:55
Copy link
Contributor

@dannysauer dannysauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me from an infra perspective. My only note is that the variable and config refers to initial "user" password - is that setting the admin and determined user both to the same initial password? I don't remember how that works. :) As long as that's the case, I'm happy. My other in-line note is more trivial style opinion than anything functional.

@jesse-amano-hpe
Copy link
Contributor Author

This looks fine to me from an infra perspective. My only note is that the variable and config refers to initial "user" password - is that setting the admin and determined user both to the same initial password? I don't remember how that works. :) As long as that's the case, I'm happy. My other in-line note is more trivial style opinion than anything functional.

Yes, it sets the admin and determined user passwords to the same value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants