cml-runner times out when trying to check status of spawned instance #536
Comments
systemd[1]: Starting cml service...
cml.sh[3019]: /usr/bin/cml.sh: 14: exec: cml-runner: not found
systemd[1]: cml.service: Main process exited, code=exited, status=127/n/a
systemd[1]: cml.service: Failed with result 'exit-code'.
systemd[1]: Failed to start cml service.
cloud-init[2049]: Job for cml.service failed because the control process exited with error code. |
👋🏼 Welcome, @thatGreekGuy96! As far as I can tell, we haven't introduced any breaking change, at least not deliberately. That said, it looks like the machine image doesn't have CML available in the executable path. Which AWS region are you on? |
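For anyone reading along: the "not in the executable path" diagnosis follows from `status=127` in the systemd log, which by shell convention means the command could not be found. Here is a minimal sketch of pulling that out of a log automatically. The helper name `explain_exit_status` and the lookup table are illustrative assumptions, not part of CML.

```python
import re

# Conventional shell exit statuses; systemd reports them as status=N.
SHELL_EXIT_MEANINGS = {
    126: "command found but not executable",
    127: "command not found on PATH",
}

# Matches the "Main process exited" line that systemd writes to the journal.
STATUS_RE = re.compile(r"code=exited, status=(\d+)")


def explain_exit_status(log_lines):
    """Return (status, meaning) for the first systemd exit line, or None.

    Hypothetical helper for illustration only.
    """
    for line in log_lines:
        match = STATUS_RE.search(line)
        if match:
            status = int(match.group(1))
            meaning = SHELL_EXIT_MEANINGS.get(status, "application-specific failure")
            return status, meaning
    return None


# Lines copied from the log posted in this issue.
log = [
    "systemd[1]: Starting cml service...",
    "cml.sh[3019]: /usr/bin/cml.sh: 14: exec: cml-runner: not found",
    "systemd[1]: cml.service: Main process exited, code=exited, status=127/n/a",
]
print(explain_exit_status(log))
```

A check like this only points at the class of failure; finding out why `cml-runner` was missing still means reading the instance's initialization log.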
@0x2b3bfa0 we have removed the await inside CML, which now depends exclusively on TF 🤔 |
We're in |
Update: I've retried, while also adding If you have indeed made updates, is there a way for me to explicitly specify which version of the cml-runner command I'm using? So that I can roll back to the one that was working for us. |
Nothing to do with this; please refer to the error message above. |
@0x2b3bfa0 here is the AMI used by the instances in case it helps |
@thatGreekGuy96 we have identified the potential issue. hopefully it will be fixed within the next hour |
@thatGreekGuy96 I can confirm that the issue should be now fixed. Could you please confirm it? |
Yup, it seems to be working now! Out of curiosity, what was the problem? |
@thatGreekGuy96, the cloud runner initialization script was installing the We fixed it with the iterative/terraform-provider-iterative@5f62a02 commit.
gyp WARN EACCES current user ("nobody") does not have permission to access the dev dir "/root/.cache/node-gyp/12.20.1"
gyp WARN EACCES attempting to reinstall using temporary dev dir "/usr/lib/node_modules/@dvcorg/cml/node_modules/mmmagic/.node-gyp"
gyp WARN install got an error, rolling back install
gyp ERR! configure error
gyp ERR! stack Error: EACCES: permission denied, mkdir '/usr/lib/node_modules/@dvcorg/cml/node_modules/mmmagic/.node-gyp'
gyp ERR! System Linux 5.4.0-1035-aws
gyp ERR! command "/usr/bin/node" "/usr/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /usr/lib/node_modules/@dvcorg/cml/node_modules/mmmagic
gyp ERR! node -v v12.20.1
gyp ERR! node-gyp -v v5.1.0
gyp ERR! not ok
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] install: `node-gyp rebuild`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] install script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
npm ERR! A complete log of this run can be found in:
npm ERR! /root/.npm/_logs/2021-05-19T08_25_12_292Z-debug.log |
We update CML regularly, but always preserving backwards compatibility; that's why we don't provide any mechanism for users to pin specific versions. Apart from this unfortunate incident, our releases should be pretty stable. 🤞🏼 😅 |
Hey everyone!
We've been using cml for about a month now to deploy EC2 runners and run tests on them. Today we ran into a weird problem: the deploy runners command times out because it cannot talk to the EC2 instances. I'm attaching the logs below. As far as I can tell, the runners are getting deployed and have public IPv4 addresses assigned to them. However, when I try to connect using EC2 connect, I get this error:
Does someone know what could be going on here? Has something been updated on the cml side that we should know about? Any help would be greatly appreciated!
2_Deploy Cloud Instances (1).txt