cml-runner times out when trying to check status of spawned instance #536

thatGreekGuy96 · 2021-05-19T09:50:57Z

Hey everyone!
We've been using CML for about a month now to deploy EC2 runners and run tests on them. Today we ran into a weird problem: the deploy runners command times out because it cannot talk to the EC2 instances. I'm attaching the logs below.

As far as I can tell, the runners are getting deployed and have public IPv4 addresses assigned to them. However, when I try to connect using EC2 Instance Connect, I get this error:

Does someone know what could be going on here? Has something been updated on the cml side that we should know about? Any help would be greatly appreciated!

2_Deploy Cloud Instances (1).txt


0x2b3bfa0 · 2021-05-19T09:55:37Z

systemd[1]: Starting cml service...
cml.sh[3019]: /usr/bin/cml.sh: 14: exec: cml-runner: not found
systemd[1]: cml.service: Main process exited, code=exited, status=127/n/a
systemd[1]: cml.service: Failed with result 'exit-code'.
systemd[1]: Failed to start cml service.
cloud-init[2049]: Job for cml.service failed because the control process exited with error code.

0x2b3bfa0 · 2021-05-19T09:59:47Z

👋🏼 Welcome, @thatGreekGuy96! As far as I can tell, we haven't introduced any breaking change. At least, not deliberately.

That said, it looks like the machine image doesn't have CML available in the executable path. Which AWS region are you on?
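For context, the `status=127` in the log above is the conventional shell exit status for "command not found", which matches the `exec: cml-runner: not found` line. A minimal sketch of how one might check this from a shell on the instance, assuming you can log in to it; the `check_on_path` helper is hypothetical and not part of CML:

```shell
#!/bin/sh
# Hypothetical helper (not part of CML): report whether a command
# can be resolved through the current PATH.
check_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found in PATH" >&2
        return 127  # same status the shell uses for "command not found"
    fi
}

# On an affected instance this would be expected to report "not found".
check_on_path cml-runner || echo "exit status: $?"
```

Running the same check for `node` and `npm` can help tell a missing package apart from a missing runtime.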

DavidGOrtega · 2021-05-19T10:05:43Z

@0x2b3bfa0 we have removed the await inside CML depending now exclusively on TF 🤔

thatGreekGuy96 · 2021-05-19T10:06:11Z

> 👋🏼 Welcome, @thatGreekGuy96! As far as I can tell, we haven't introduced any breaking change. At least, not deliberately.
>
> That said, it looks like the machine image doesn't have CML available in the executable path. Which AWS region are you on?

We're in eu-west-2. As I said in my initial post, everything was working fine until this morning.

thatGreekGuy96 · 2021-05-19T10:09:49Z

Update: I've retried, this time also adding --cloud-startup-script=$(echo 'sudo apt update && sudo apt-get install --yes ec2-instance-connect' | base64 --wrap 0) to the cml-runner command. I can now connect using EC2 Instance Connect, but the GitHub Action still fails.

If you have indeed made updates, is there a way for me to explicitly specify which version of the cml-runner command I'm using? So that I can roll back to the one that was working for us.
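The `--cloud-startup-script` value in the command above is passed Base64-encoded, with `--wrap 0` keeping the encoded output on a single line. A small sketch of the encode/decode round trip, assuming GNU coreutils `base64`; the variable names are illustrative only:

```shell
#!/bin/sh
# The startup script from the comment above, as plain text.
script='sudo apt update && sudo apt-get install --yes ec2-instance-connect'

# Encode on a single line (GNU base64; --wrap 0 disables line wrapping).
encoded=$(printf '%s\n' "$script" | base64 --wrap 0)

# Decode again to confirm the round trip before passing the value on.
decoded=$(printf '%s' "$encoded" | base64 --decode)

echo "encoded: $encoded"
echo "decoded: $decoded"
```

Decoding the value locally like this is a quick way to confirm that the instance will receive the script you intended.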

0x2b3bfa0 · 2021-05-19T10:14:58Z

> @0x2b3bfa0 we have removed the await inside CML depending now exclusively on TF 🤔

Nothing to do with this; please refer to the error message above.

thatGreekGuy96 · 2021-05-19T10:21:21Z

@0x2b3bfa0 here is the AMI used by the instances in case it helps

DavidGOrtega · 2021-05-19T11:13:51Z

@thatGreekGuy96 we have identified the potential issue. Hopefully it will be fixed within the next hour.

DavidGOrtega · 2021-05-19T11:39:43Z

@thatGreekGuy96 the issue should now be fixed. Could you please confirm on your side?
Thank you for your support 🙏

thatGreekGuy96 · 2021-05-19T12:48:05Z

Yup, it seems to be working now! Out of curiosity, what was the problem?

0x2b3bfa0 · 2021-05-19T15:45:34Z

@thatGreekGuy96, the cloud runner initialization script was installing the @dvcorg/cml package from an unprivileged user account that didn't have enough permissions to build some native addons required by the mmmagic package, which relies on libmagic under the hood. The installation failed, so cml-runner never ended up in the executable path.

We fixed it with the iterative/terraform-provider-iterative@5f62a02 commit.

gyp WARN EACCES current user ("nobody") does not have permission to access the dev dir "/root/.cache/node-gyp/12.20.1"
gyp WARN EACCES attempting to reinstall using temporary dev dir "/usr/lib/node_modules/@dvcorg/cml/node_modules/mmmagic/.node-gyp"
gyp WARN install got an error, rolling back install
gyp ERR! configure error
gyp ERR! stack Error: EACCES: permission denied, mkdir '/usr/lib/node_modules/@dvcorg/cml/node_modules/mmmagic/.node-gyp'
gyp ERR! System Linux 5.4.0-1035-aws
gyp ERR! command "/usr/bin/node" "/usr/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /usr/lib/node_modules/@dvcorg/cml/node_modules/mmmagic
gyp ERR! node -v v12.20.1
gyp ERR! node-gyp -v v5.1.0
gyp ERR! not ok
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! mmmagic@… install: `node-gyp rebuild`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the mmmagic@… install script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
npm ERR! A complete log of this run can be found in:
npm ERR! /root/.npm/_logs/2021-05-19T08_25_12_292Z-debug.log
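The `EACCES` lines in the log come down to the installing user (`nobody`) not being allowed to create directories under the global `node_modules` tree. A rough sketch of a pre-flight check for that condition; the `can_write_dir` helper is hypothetical, and the path is taken from the log rather than from any CML documentation:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the directory exists and the
# current user may create entries in it.
can_write_dir() {
    [ -d "$1" ] && [ -w "$1" ] && [ -x "$1" ]
}

target=/usr/lib/node_modules   # global module directory seen in the log

if can_write_dir "$target"; then
    echo "uid $(id -u) can write to $target"
else
    echo "uid $(id -u) cannot write to $target; native addon builds there would likely fail"
fi
```

This only approximates what node-gyp needs, but it is usually enough to spot a privilege mismatch before running the install.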

0x2b3bfa0 · 2021-05-19T15:48:06Z

> If you have indeed made updates, is there a way for me to explicitly specify which version of the cml-runner command I'm using? So that I can roll back to the one that was working for us.

We update CML regularly while always preserving backwards compatibility; that's why we don't provide any mechanism for users to pin specific versions. Apart from this unfortunate incident, our releases should be pretty stable. 🤞🏼 😅

0x2b3bfa0 added the bug (Something isn't working) and awaiting-response (Waiting for user feedback) labels May 19, 2021

0x2b3bfa0 self-assigned this May 19, 2021

DavidGOrtega added the p0-critical (Max priority, ASAP) label May 19, 2021

0x2b3bfa0 assigned DavidGOrtega May 19, 2021

0x2b3bfa0 removed the awaiting-response (Waiting for user feedback) label May 19, 2021

0x2b3bfa0 closed this as completed May 19, 2021

0x2b3bfa0 mentioned this issue Apr 13, 2022

Accept scripts without requiring additional Base64 encoding iterative/terraform-provider-iterative#91

Closed

0x2b3bfa0 mentioned this issue Jun 21, 2023

[Snyk] Security upgrade semver from 7.3.7 to 7.5.2 #1389

Closed
