Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use active_run_lock(run) to protect updates of aggregated data. #2020

Merged
merged 1 commit into from
May 18, 2024

Conversation

vdbergh
Copy link
Contributor

@vdbergh vdbergh commented May 18, 2024

We changed the active_run_locks to be reentrant.

We changed the active_run_locks to be reentrant.
@ppigazzini ppigazzini added enhancement server server side changes labels May 18, 2024
Copy link
Collaborator

@ppigazzini ppigazzini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good on DEV

@ppigazzini ppigazzini merged commit a51582a into official-stockfish:master May 18, 2024
19 checks passed
@vdbergh
Copy link
Contributor Author

vdbergh commented May 18, 2024

@ppigazzini Can you check in the server log if there are messages like this

if run["finished"]:
info = "Update_task: task {}/{} belongs to a finished run".format(
run_id, task_id
)
print(info, flush=True)
self.set_inactive_task(task_id, run)
self.buffer(run, True)
return {"task_alive": False, "info": info}

I am trying to debug this https://tests.stockfishchess.org/actions?action=log_message&user=&text=cores

@vondele
Copy link
Member

vondele commented May 18, 2024

Very rare, and not very recent:

$ sudo journalctl -u fishtest@6543 --since "30 days ago" | grep "belongs to a finished run"
May 05 07:23:32 tests.stockfishchess.org pserve[28672]: Update_task: task 663723214b68b70d8580c189/890 belongs to a finished run
May 18 09:32:22 tests.stockfishchess.org pserve[22090]: Update_task: task 66458ee993ce6da3e93b5b90/1079 belongs to a finished run

But matching the two events you highlight

@vdbergh
Copy link
Contributor Author

vdbergh commented May 18, 2024

Thanks. I will fix this.

@vondele
Copy link
Member

vondele commented May 18, 2024

This could possibly happen if the worker sends an update_task, the rundb is updated, but the full transaction fails (e.g. nginx timeout), in which the worker sends the update_task again?

@ppigazzini
Copy link
Collaborator

extended to 180 days ago

$ sudo journalctl -u fishtest@6543 --since "180 days ago" | grep "is not equal to 0"
May 05 07:19:11 tests.stockfishchess.org pserve[28672]: The run object 663723214b68b70d8580c189 does not validate: run['cores'] (value:7) is not equal to 0
May 18 09:28:04 tests.stockfishchess.org pserve[22090]: The run object 66458ee993ce6da3e93b5b90 does not validate: run['cores'] (value:7) is not equal to 0

@vdbergh
Copy link
Contributor Author

vdbergh commented May 18, 2024

This could possibly happen if the worker sends an update_task, the rundb is updated, but the full transaction fails (e.g. nginx timeout), in which the worker sends the update_task again?

No this happens because for performance reasons update_task() and request_task() are not synchronized. But with PR #2020 they are at least partially synchronized so that we can now do something in request_task(). See #2021 .

@vdbergh vdbergh mentioned this pull request May 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement server server side changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants