
Job fails, logs inaccessible: "The specified log stream does not exist." #12

Closed
chrisgorgo opened this issue Aug 27, 2017 · 11 comments

@chrisgorgo
Contributor

https://openneuro.org/datasets/ds001038/versions/00001?app=BARACUS&version=3&job=da564531-76cd-402f-809b-37994f1c3223

"download all logs" fails with "Failed - Server problem"

Originally reported by @poldrack

@chrisgorgo chrisgorgo added the bug label Aug 27, 2017
@poldrack

poldrack commented Aug 27, 2017 via email

@chrisgorgo
Contributor Author

Original error:

{"original":null,"response":{"req":{"_query":["root=true"],"method":"GET","url":"https://openneuro.org/crn/logs/BARACUS/550dc4be-f272-40e1-84c9-6eef6a9bae53/b5e72d25-351a-401e-be6d-e034efe3046f?root=true","header":{"Authorization":"ya29.Gl20BCi9fjeHC-wQvvHqHmzJ1bvuRLvyF1MgcyVd4XjMp99s16TRRQeMGKgjaXHaCJrOU6YA1-wtqwhj_L2H5qv58y6Xwvz5-1t4I5tLV4q21h4pC7jfVZJMFxHS2kI"},"_header":{"authorization":"ya29.Gl20BCi9fjeHC-wQvvHqHmzJ1bvuRLvyF1MgcyVd4XjMp99s16TRRQeMGKgjaXHaCJrOU6YA1-wtqwhj_L2H5qv58y6Xwvz5-1t4I5tLV4q21h4pC7jfVZJMFxHS2kI"},"_callbacks":{"$end":[null]},"xhr":{},"_timeout":0},"xhr":{},"text":"{\"error\":\"The specified log stream does not exist.\"}","statusText":"","statusCode":500,"status":500,"statusType":5,"info":false,"ok":false,"clientError":false,"serverError":true,"error":{"status":500,"method":"GET","url":"https://openneuro.org/crn/logs/BARACUS/550dc4be-f272-40e1-84c9-6eef6a9bae53/b5e72d25-351a-401e-be6d-e034efe3046f?root=true"},"accepted":false,"noContent":false,"badRequest":false,"unauthorized":false,"notAcceptable":false,"notFound":false,"forbidden":false,"headers":{"date":"Sun, 27 Aug 2017 18:28:32 GMT","etag":"W/\"34-SEhvhGl5EQrBtWE07r+Bilrc4yc\"","server":"nginx/1.13.3","access-control-allow-origin":"*","x-powered-by":"Express","strict-transport-security":"max-age=31536000","access-control-allow-methods":"GET, POST, OPTIONS, PUT, PATCH, DELETE","content-type":"application/json; charset=utf-8","status":"500","access-control-allow-headers":"content-type, Authorization","content-length":"52"},"header":{"date":"Sun, 27 Aug 2017 18:28:32 GMT","etag":"W/\"34-SEhvhGl5EQrBtWE07r+Bilrc4yc\"","server":"nginx/1.13.3","access-control-allow-origin":"*","x-powered-by":"Express","strict-transport-security":"max-age=31536000","access-control-allow-methods":"GET, POST, OPTIONS, PUT, PATCH, DELETE","content-type":"application/json; charset=utf-8","status":"500","access-control-allow-headers":"content-type, Authorization","content-length":"52"},"type":"application/json","charset":"utf-8","body":{"error":"The specified log stream does not exist."}},"status":500}

@chrisgorgo
Contributor Author

This seems to affect logs from all jobs, which makes it a high-priority bug.

@nellh nellh self-assigned this Aug 27, 2017
@nellh
Contributor

nellh commented Aug 27, 2017

It looks like the log stream names created by Batch have changed: "FreeSurfer/eb251a5f-6314-457f-a36f-d11665451ddb/bf4fc87d-2e87-4309-abd9-998a0de708de" is now "FreeSurfer/default/bf4fc87d-2e87-4309-abd9-998a0de708de".

The logs are not lost, but our handler will need to be updated to support both formats.
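One way to support both formats is a lookup with a fallback. The following is a minimal TypeScript sketch, not OpenNeuro's actual handler. It assumes the handler receives the old-style name (`<app>/<id>/<taskId>`) and that the trailing task ID is unchanged between formats, as in the FreeSurfer example. It tries the requested stream first and, only if that stream is missing, retries with the middle segment replaced by `default`. `StreamNotFoundError`, `LogFetcher`, `alternateStreamNames` and `fetchLogsWithFallback` are hypothetical names. The real CloudWatch Logs call is left to an injected fetcher, so the sketch makes no claims about any client API.

```typescript
// Hypothetical sketch, not OpenNeuro's actual handler: every name here is illustrative.

// Raised by a fetcher when the requested log stream does not exist.
class StreamNotFoundError extends Error {
  readonly streamName: string;
  constructor(streamName: string) {
    super(`The specified log stream does not exist: ${streamName}`);
    this.name = "StreamNotFoundError";
    this.streamName = streamName;
    // Keep instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, StreamNotFoundError.prototype);
  }
}

// Any function that returns the log lines of one stream (e.g. a wrapper around the log store).
type LogFetcher = (streamName: string) => Promise<string[]>;

// Returns the requested name, plus "<app>/default/<taskId>" when the request
// uses the older "<app>/<id>/<taskId>" form (assumed layout).
function alternateStreamNames(requested: string): string[] {
  const names = [requested];
  const parts = requested.split("/");
  if (parts.length === 3 && parts[1] !== "default") {
    names.push(`${parts[0]}/default/${parts[2]}`);
  }
  return names;
}

// Tries each candidate in order; only a missing stream moves on to the next one.
async function fetchLogsWithFallback(
  requested: string,
  fetchStream: LogFetcher,
): Promise<string[]> {
  let lastError: unknown;
  for (const name of alternateStreamNames(requested)) {
    try {
      return await fetchStream(name);
    } catch (err) {
      if (!(err instanceof StreamNotFoundError)) {
        throw err; // permissions, throttling, etc. should still surface
      }
      lastError = err;
    }
  }
  throw lastError;
}

// Usage with an in-memory stand-in for the log store.
const streams = new Map<string, string[]>([
  ["FreeSurfer/default/bf4fc87d-2e87-4309-abd9-998a0de708de", ["line 1", "line 2"]],
]);
const fakeFetcher: LogFetcher = async (name) => {
  const events = streams.get(name);
  if (events === undefined) {
    throw new StreamNotFoundError(name);
  }
  return events;
};
const legacyName =
  "FreeSurfer/eb251a5f-6314-457f-a36f-d11665451ddb/bf4fc87d-2e87-4309-abd9-998a0de708de";

// Resolves to ["line 1", "line 2"] via the "default" fallback.
fetchLogsWithFallback(legacyName, fakeFetcher).then((events) => console.log(events));
```

Injecting the fetcher keeps the fallback logic testable with an in-memory map. Only a not-found error triggers the retry, so other failures are not masked as missing logs.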

@chrisgorgo
Contributor Author

It would be great to get a fix rolled out to production soon. We cannot debug jobs at the moment. Thank you!

@chrisgorgo
Contributor Author

FYI: this only seems to be happening on prod, not on dev.

@chrisgorgo
Contributor Author

This seems to have regressed:
https://openneuro.org/datasets/ds001107/versions/00001?app=MRIQC&version=45&job=71f063e2-f2af-4923-94e7-acd0f73cce75

The download link also returns an HTML redirect:
https://openneuro.org/logs/MRIQC/default/aae7ab39-12e2-4206-8864-cfd4b7fa649a.json

Additional error from the dev console:

VM192:1 GET https://openneuro.org/crn/logs/MRIQC/default/aae7ab39-12e2-4206-8864-cfd4b7fa649a?root=true 500 ()

@chrisgorgo chrisgorgo reopened this Oct 11, 2017
@nellh
Contributor

nellh commented Oct 11, 2017

I think this is the memory problem from #131 and not related to this issue.

@chrisgorgo
Contributor Author

MRIQC without the ICA option (which was the case here) has a much lower memory footprint, so I'm surprised it ran out of memory. Could another job on the same node be interfering?

@chrisgorgo
Contributor Author

This has regressed yet again in https://openneuro.org/datasets/ds001091/versions/00001?app=antsCorticalThickness&version=1

Furthermore, the job status visual labels are not showing.
[screenshot]

It also seems that many jobs have failed recently: https://openneuro.org/admin/jobs

@nellh
Contributor

nellh commented Oct 25, 2017

Closing this, as it is different from the issue causing the recent errors (ECS instability), which should now be fixed.

@nellh nellh closed this as completed Oct 25, 2017
nellh added a commit that referenced this issue Mar 10, 2020