cml-runner times out when trying to check status of spawned instance #536

Closed
thatGreekGuy96 opened this issue May 19, 2021 · 12 comments
Labels: bug (Something isn't working), p0-critical (Max priority, ASAP)

Comments

@thatGreekGuy96

Hey everyone!
We've been using CML for about a month now to deploy EC2 runners and run tests on them. Today we ran into a weird problem where the deploy runners command times out because it cannot talk to the EC2 instances. I'm attaching the logs below.

As far as I can tell, the runners are getting deployed and have public IPv4 addresses assigned to them. However, when I try to connect using EC2 Instance Connect, I get this error:

image

Does anyone know what could be going on here? Has something been updated on the CML side that we should know about? Any help would be greatly appreciated!

2_Deploy Cloud Instances (1).txt

@0x2b3bfa0
Member

0x2b3bfa0 commented May 19, 2021

systemd[1]: Starting cml service...
cml.sh[3019]: /usr/bin/cml.sh: 14: exec: cml-runner: not found
systemd[1]: cml.service: Main process exited, code=exited, status=127/n/a
systemd[1]: cml.service: Failed with result 'exit-code'.
systemd[1]: Failed to start cml service.
cloud-init[2049]: Job for cml.service failed because the control process exited with error code.

@0x2b3bfa0
Member

👋🏼 Welcome, @thatGreekGuy96! As far as I can tell, we haven't introduced any breaking change. At least, not deliberately.

That said, it looks like the machine image doesn't have CML available in the executable path. Which AWS region are you on?
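
The `cml-runner: not found` message and `status=127` in the log are what a shell reports when an executable is missing from `PATH`. A minimal sketch of checking for that on an instance, assuming shell access to it; `is_on_path` is a hypothetical helper, not part of CML:

```shell
# Hypothetical helper: succeeds only if the named command resolves on PATH.
is_on_path() {
    command -v "$1" >/dev/null 2>&1
}

if is_on_path cml-runner; then
    echo "cml-runner found at: $(command -v cml-runner)"
else
    echo "cml-runner is not on PATH: $PATH"
fi

# Reproduce the failure mode with a command name that is assumed not to exist:
# a shell that cannot find the command it is asked to exec exits with 127.
missing_status=0
sh -c 'exec no-such-command-for-demo' 2>/dev/null || missing_status=$?
echo "exit status for a missing command: $missing_status"
```

If the first check fails on a freshly started runner, the problem is most likely in how the package was installed rather than in the service unit itself.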

0x2b3bfa0 added the bug (Something isn't working) and awaiting-response (Waiting for user feedback) labels on May 19, 2021
0x2b3bfa0 self-assigned this on May 19, 2021
@DavidGOrtega
Contributor

DavidGOrtega commented May 19, 2021

@0x2b3bfa0 we have removed the await inside CML, so it now depends exclusively on TF 🤔

@thatGreekGuy96
Author

> 👋🏼 Welcome, @thatGreekGuy96! As far as I can tell, we haven't introduced any breaking change. At least, not deliberately.
>
> That said, it looks like the machine image doesn't have CML available in the executable path. Which AWS region are you on?

We're in eu-west-2. As I said in my initial post, everything was working fine until this morning.

@thatGreekGuy96
Author

Update: I've retried, this time also adding `--cloud-startup-script=$(echo 'sudo apt update && sudo apt-get install --yes ec2-instance-connect' | base64 --wrap 0)` to the `cml-runner` command. I can now connect using EC2 Instance Connect, but the GitHub Action still fails.
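
For reference, the value passed to `--cloud-startup-script` here is just the script text encoded as single-line Base64. A small sketch of the encoding step with a round-trip check, assuming GNU coreutils `base64` (which provides `--wrap`); the variable names are illustrative:

```shell
# Startup script from the comment, kept in a variable for readability.
startup_script='sudo apt update && sudo apt-get install --yes ec2-instance-connect'

# Encode it on a single line; --wrap 0 disables the default line wrapping.
encoded=$(printf '%s\n' "$startup_script" | base64 --wrap 0)

# Decode it again to confirm the round trip is lossless.
decoded=$(printf '%s' "$encoded" | base64 --decode)

echo "encoded: $encoded"
echo "decoded: $decoded"
```

The `encoded` value is what ends up after `--cloud-startup-script=` on the `cml-runner` command line.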

If you have indeed made updates, is there a way for me to explicitly specify which version of the cml-runner command I'm using, so that I can roll back to the one that was working for us?

@0x2b3bfa0
Member

> @0x2b3bfa0 we have removed the await inside CML, so it now depends exclusively on TF 🤔

Nothing to do with this; please refer to the error message above.

@thatGreekGuy96
Author

@0x2b3bfa0 here is the AMI used by the instances, in case it helps:

image

DavidGOrtega added the p0-critical (Max priority, ASAP) label on May 19, 2021
@DavidGOrtega
Contributor

@thatGreekGuy96 we have identified the potential issue. Hopefully it will be fixed within the next hour.

0x2b3bfa0 removed the awaiting-response (Waiting for user feedback) label on May 19, 2021
@DavidGOrtega
Contributor

@thatGreekGuy96 the issue should now be fixed. Could you please confirm?
Thank you for your support 🙏

@thatGreekGuy96
Author

Yup, it seems to be working now! Out of curiosity, what was the problem?

@0x2b3bfa0
Member

@thatGreekGuy96, the cloud runner initialization script was installing the @dvcorg/cml package from an unprivileged user account which didn't have enough permissions to install some native addons required by the mmmagic package — which relies on libmagic under the hood.

We fixed it with the iterative/terraform-provider-iterative@5f62a02 commit.

gyp WARN EACCES current user ("nobody") does not have permission to access the dev dir "/root/.cache/node-gyp/12.20.1"
gyp WARN EACCES attempting to reinstall using temporary dev dir "/usr/lib/node_modules/@dvcorg/cml/node_modules/mmmagic/.node-gyp"
gyp WARN install got an error, rolling back install
gyp ERR! configure error
gyp ERR! stack Error: EACCES: permission denied, mkdir '/usr/lib/node_modules/@dvcorg/cml/node_modules/mmmagic/.node-gyp'
gyp ERR! System Linux 5.4.0-1035-aws
gyp ERR! command "/usr/bin/node" "/usr/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /usr/lib/node_modules/@dvcorg/cml/node_modules/mmmagic
gyp ERR! node -v v12.20.1
gyp ERR! node-gyp -v v5.1.0
gyp ERR! not ok
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] install: `node-gyp rebuild`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] install script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
npm ERR! A complete log of this run can be found in:
npm ERR! /root/.npm/_logs/2021-05-19T08_25_12_292Z-debug.log
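
The `EACCES` lines in this log come down to the installing user (`nobody`) being unable to create directories under the global `node_modules` tree. A rough sketch of checking for that before an install; `can_write_dir` is a hypothetical helper, and the path is the one that appears in the log:

```shell
# Hypothetical helper: succeeds if the path is a directory the current user can write to.
can_write_dir() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Global module directory seen in the gyp log; it may differ on other systems.
global_modules=/usr/lib/node_modules

if can_write_dir "$global_modules"; then
    echo "$(id -un) can write to $global_modules"
else
    echo "$(id -un) cannot write to $global_modules; native addons built there would likely fail with EACCES"
fi
```

Running the check as the same user that performs the install is what matters, since a privileged shell will report the directory as writable.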

@0x2b3bfa0
Member

> If you have indeed made updates, is there a way for me to explicitly specify which version of the cml-runner command I'm using, so that I can roll back to the one that was working for us?

We update CML regularly, but we always preserve backwards compatibility; that's why we don't provide any mechanism for users to pin specific versions. Apart from this unfortunate incident, our releases should be pretty stable. 🤞🏼 😅

3 participants