Revert CPU generic workers #629

Merged
merged 2 commits into from
May 27, 2024

Conversation

eu9ene (Collaborator) commented May 24, 2024

I'm reverting it due to many random errors in data/cleaning steps: https://firefox-ci-tc.services.mozilla.com/tasks/groups/BFgoyXuvRHe0kK4J9d-Lgw

Testing it here: https://firefox-ci-tc.services.mozilla.com/tasks/groups/aKUqlLsyR3O3Kaa5TU-LMg

UPD: it works well; all the cleaning steps completed.

eu9ene marked this pull request as ready for review May 24, 2024 17:43
eu9ene requested a review from a team as a code owner May 24, 2024 17:43
eu9ene requested review from jcristau and bhearsum May 24, 2024 17:43

bhearsum (Collaborator) left a comment

For what it's worth, the failures in https://firefox-ci-tc.services.mozilla.com/tasks/groups/BFgoyXuvRHe0kK4J9d-Lgw are caused by exceeding the max run time. Bumping https://github.com/mozilla/firefox-translations-training/blob/56040c94b9a745a9594491b57a373ba231739cc9/taskcluster/kinds/clean-corpus/kind.yml#L66 seems like it would fix them.

Nevertheless, I understand if you want to be cautious here and revert anyways. I'll call out that landing this will lose memory monitoring for CPU workers in case that matters at the moment.

Approving this as it's 100% your call as far as I'm concerned.

eu9ene (Collaborator, Author) commented May 27, 2024

@bhearsum I clearly see a lot of random, weird errors unrelated to max run time when using the new workers, and all of those errors disappeared immediately after switching back to the old ones. I think there's an infrastructure issue, probably a network-related one.

eu9ene merged commit d41f07a into release May 27, 2024
51 checks passed

bhearsum (Collaborator) commented

I filed #640 as a followup.

2 participants