collect_mono failed with Read error (39) : premature end #680

Closed
Tracked by #311
eu9ene opened this issue Jun 17, 2024 · 3 comments · Fixed by #738
Labels: taskcluster (Issues related to the Taskcluster implementation of the training pipeline)

@eu9ene
Collaborator

eu9ene commented Jun 17, 2024

https://firefox-ci-tc.services.mozilla.com/tasks/IujzQFOHSyarfuGZTqYOHg/runs/0/logs/public/logs/live.log

[task 2024-06-17T20:26:28.119Z] Decompress:  7/20 files. Current: ...file.16.out.zst : 17 MB...    
[task 2024-06-17T20:26:28.285Z] Decompress:  7/20 files. Current: ...file.16.out.zst : 71 MB...    
[task 2024-06-17T20:26:28.452Z] Decompress:  7/20 files. Current: ...file.16.out.zst : 125 MB...    
[task 2024-06-17T20:26:28.619Z] Decompress:  7/20 files. Current: ...file.16.out.zst : 176 MB...    
[task 2024-06-17T20:26:28.786Z] Decompress:  7/20 files. Current: ...file.16.out.zst : 225 MB...    
[task 2024-06-17T20:26:28.953Z] Decompress:  7/20 files. Current: ...file.16.out.zst : 276 MB...    
[task 2024-06-17T20:26:28.996Z] Decompress:  7/20 files. Current: ...file.16.out.zst : 326 MB...    
[task 2024-06-17T20:26:29.119Z]                                                                                
[task 2024-06-17T20:26:29.119Z] 
[task 2024-06-17T20:26:29.286Z] Decompress:  8/20 files. Current: ...file.17.out.zst : 34 MB...    
[task 2024-06-17T20:26:29.453Z] Decompress:  8/20 files. Current: ...file.17.out.zst : 90 MB...    
[task 2024-06-17T20:26:29.620Z] Decompress:  8/20 files. Current: ...file.17.out.zst : 144 MB...    
[task 2024-06-17T20:26:29.787Z] Decompress:  8/20 files. Current: ...file.17.out.zst : 200 MB...    
[task 2024-06-17T20:26:29.954Z] Decompress:  8/20 files. Current: ...file.17.out.zst : 250 MB...    
[task 2024-06-17T20:26:30.065Z] Decompress:  8/20 files. Current: ...file.17.out.zst : 299 MB...    ches/file.17.out.zst : Read error (39) : premature end 
@eu9ene
Collaborator Author

eu9ene commented Jun 17, 2024

eu9ene added the taskcluster label Jun 25, 2024
@bhearsum
Collaborator

bhearsum commented Jul 9, 2024

Something definitely went wrong with the download of file.17.out.zst here. It, and another file from the log:

https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/EhFkOBKCTOi5Vt5wMLJyOA/artifacts/public/build/file.4.out.zst resolved to 136533813 bytes with sha256 93e25e98fb56696dd562c99bfb63ceac87cd27e1a9b324b6f1089989fae24c05 in 7.914s
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/UBpNNGbKRdK19sBoRLRarA/artifacts/public/build/file.17.out.zst resolved to 133492367 bytes with sha256 c353ff408dbcde93825824c2c935bcc0d0f0d67cf4b97a8589e66a7366b17ffc in 7.970s

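Yet when I download both files myself, file.4.out.zst matches, but file.17.out.zst differs in both size and hash (the task fetched 133492367 bytes; my copy is 136633331 bytes, so the task's copy was 3140964 bytes short):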

~/tmp/2024-07-09 ❯ sha256sum *   
4999a845ad51b64a7041cf2152c4a926c3ba6275bb05cd7cb82644f2e410137d  file.17.out.zst
93e25e98fb56696dd562c99bfb63ceac87cd27e1a9b324b6f1089989fae24c05  file.4.out.zst
~/tmp/2024-07-09 ❯ ls -l
total 266780
-rw-rw-r-- 1 bhearsum bhearsum 136633331 Jun 17 11:16 file.17.out.zst
-rw-rw-r-- 1 bhearsum bhearsum 136533813 Jun 17 11:21 file.4.out.zst

Taskgraph is responsible for these downloads. I've filed taskcluster/taskgraph#538 for this.
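
The manual comparison above can be scripted. The following is a minimal sketch, not taskgraph's code: `sha256_of` and `check_artifact` are hypothetical helper names, and it assumes the expected size and digest are known from a trusted copy of the artifact.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large artifacts are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_artifact(path: Path, expected_size: int, expected_sha256: str) -> list[str]:
    """Return a list of problems; an empty list means size and digest both match."""
    problems = []
    actual_size = path.stat().st_size
    if actual_size != expected_size:
        problems.append(
            f"size mismatch: got {actual_size} bytes, expected {expected_size} "
            f"(off by {abs(expected_size - actual_size)} bytes)"
        )
    actual_sha256 = sha256_of(path)
    if actual_sha256 != expected_sha256:
        problems.append(
            f"sha256 mismatch: got {actual_sha256}, expected {expected_sha256}"
        )
    return problems
```

Taking the manually downloaded file.17.out.zst as the reference (an assumption), calling `check_artifact` on the copy the task fetched with `expected_size=136633331` and the `4999a845...` digest would report both a size and a sha256 mismatch.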

@bhearsum
Collaborator

Upstream issue is fixed; I'll keep this open until we pick up a taskgraph version with the fix.
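
The thread does not describe the upstream fix, so the following is not it. It is only a generic sketch of one way to guard against truncated downloads: verify each payload and retry on mismatch. `fetch_verified` and its `fetch` callable are hypothetical names introduced for illustration.

```python
import hashlib
import time
from typing import Callable


def fetch_verified(
    fetch: Callable[[], bytes],
    expected_size: int,
    expected_sha256: str,
    attempts: int = 3,
    delay_seconds: float = 0.0,
) -> bytes:
    """Call fetch() until the payload matches the expected size and digest.

    `fetch` is a hypothetical zero-argument callable that performs one
    download and returns the raw bytes.
    """
    last_problem = "no attempts made"
    for attempt in range(1, attempts + 1):
        data = fetch()
        if len(data) != expected_size:
            last_problem = (
                f"attempt {attempt}: got {len(data)} bytes, expected {expected_size}"
            )
        elif hashlib.sha256(data).hexdigest() != expected_sha256:
            last_problem = f"attempt {attempt}: sha256 mismatch"
        else:
            return data
        if attempt < attempts:
            time.sleep(delay_seconds)  # pause before retrying
    raise IOError(f"download failed verification ({last_problem})")
```

Passing the download step in as a callable keeps the verification logic independent of any particular HTTP client, which also makes it easy to exercise with a fake that returns a truncated payload first.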

bhearsum added a commit to bhearsum/firefox-translations-training that referenced this issue Jul 16, 2024
This picks up some fixes that are expected to fix mozilla#680.
bhearsum added a commit to bhearsum/firefox-translations-training that referenced this issue Jul 22, 2024
This picks up some fixes that are expected to fix mozilla#680.

I'm picking up other dependency updates as well, most notably to redo (2.x -> 3.x). That major bump is just because it's dropping Python 2.x support, which doesn't affect us.
@bhearsum bhearsum self-assigned this Jul 24, 2024
bhearsum added a commit that referenced this issue Jul 24, 2024
* chore: bump taskgraph to 10.0.1

This picks up some fixes that are expected to fix #680.

I'm picking up other dependency updates as well, most notably to redo (2.x -> 3.x). That major bump is just because it's dropping Python 2.x support, which doesn't affect us.

* fix: ensure mkdir /builds always succeeds in base docker image

This may be implicitly done because it is referenced in a `VOLUME`. See https://taskcluster-taskgraph.readthedocs.io/en/latest/reference/migrations.html#x-10-x.

* fix: don't try to decompress fetched python wheels or npz files