
Access kolibri's job storage database to reconcile tasks #179

Merged
20 commits merged into develop from task-reconcile on Jan 18, 2024

Conversation

@bjester (Member) commented on Dec 12, 2023

Summary

  • Implements tracking of the WorkManager process, thread, and request IDs in the database: job_storage.jobs.worker_*
  • Makes small simplifications to existing code
  • Refactors task request and query building out of Task.java into a shared utility
  • Adds a SQLite interface for reading and updating a database
  • Adds a SQLite implementation for interacting with a select few columns of the job_storage.jobs table
  • Adds a Sentinel class that can cross-reference jobs in Kolibri's job_storage database with WorkManager's tasks
  • Adds a Reconciler class that can process results from the Sentinel and re-queue tasks for any issues observed
  • Adds a Task.reconcile static method that uses the Sentinel and Reconciler to reconcile tasks (see the sketch after this list)
  • Adds a ReconcileWorker that processes task reconciliation
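
As a rough illustration of how these pieces could fit together, here is a hypothetical sketch of Task.reconcile; the constructors and method names (JobStorage.readwrite, begin, end, process) are assumptions for illustration, not the PR's exact API:

```java
// Hypothetical sketch; the PR's actual constructors and signatures may differ.
public static void reconcile(Context context) {
    JobStorage db = JobStorage.readwrite(context); // Kolibri's job_storage DB
    Sentinel sentinel = new Sentinel(db);
    Reconciler reconciler = new Reconciler(db);

    if (reconciler.begin()) { // exclusive lock; skip if another run holds it
        try {
            // Cross-reference Kolibri's jobs with WorkManager's tasks,
            // then re-queue tasks for any issues the Sentinel observed.
            reconciler.process(sentinel.check());
        } finally {
            reconciler.end(); // release the lock and close the database
        }
    }
}
```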

WIP / TBD

  • The Reconciler implements exclusive locking, since I intended reconciliation to be able to run both in a task and at startup of an app process; currently it only runs in the task, pending the results of QA (a sketch of this locking pattern follows this list)
  • The Reconciler implements a database transaction, but that is currently commented out because it hit a 'database is locked' error too many times
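
One common way to implement this kind of exclusive locking within a single process is a static ReentrantLock with tryLock, as in this minimal sketch (an assumed pattern, not the PR's actual code; a cross-process guarantee would need something like a file lock instead):

```java
import java.util.concurrent.locks.ReentrantLock;

public class Reconciler {
    // One lock shared by every caller in the process.
    private static final ReentrantLock LOCK = new ReentrantLock();

    /** Returns true only for the caller that acquired the lock. */
    public boolean begin() {
        return LOCK.tryLock();
    }

    /** Releases the lock once reconciliation finishes. */
    public void end() {
        LOCK.unlock();
    }
}
```

With this shape, both a startup hook and the ReconcileWorker can call begin(); whichever loses the race simply skips reconciliation instead of blocking.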

@rtibbles (Member) left a comment:

This is all making sense to me so far, even if the precise details escape or confuse me on occasion!

```java
 * @param stateRef The job status in the Kolibri database for which to find jobs
 * @return A future that will complete when all jobs have been checked, with a list of jobs
 */
public CompletableFuture<Bundle[]> check(StateMap stateRef) {
```
@rtibbles (Member) commented:

I definitely find reading overloaded method definitions a bit confusing, but I also get why it's useful; I'm just not used to it.

@bjester (Member, Author) replied:

I definitely went a little overboard with the method overloading, but it is one of my favorite features. I like it because you can break larger functions down into smaller pieces and tie them all together under one name. It's also helpful for making parameters optional, of course.
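
A generic illustration of that second point (not code from this PR): each overload supplies a default and delegates to the most specific version.

```java
// Overloads emulating optional parameters by delegating downward.
static String greet(String name) {
    return greet(name, "Hello"); // default greeting
}

static String greet(String name, String greeting) {
    return greeting + ", " + name + "!";
}
```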

@rtibbles (Member) left a comment:

I think I understand what is happening, and it makes sense to me. The sequential processing does seem like a sure-fire way of preventing weird concurrency issues.

```java
CompletableFuture<List<Result>> chain = CompletableFuture.completedFuture(allResults);

for (Bundle job : jobs) {
    chain = chain.thenComposeAsync((results) -> {
```
@rtibbles (Member) commented:

So, am I reading it correctly that now each job is being processed sequentially here, whereas previously there could be parallel processing of individual jobs?

@bjester (Member, Author) replied:

Yep! thenComposeAsync accepts a callback that itself returns another future, sort of like promise chaining, so everything processes sequentially. This particular change seemed to provide the most improvement, and it makes the logs easier to read too.

My only worry was how we manage the DB connection without a try-with-resources statement, relying instead on the future completing in order to close the connection. Copilot indicated there could be issues mixing futures with try-with-resources. I was also happy that Copilot had nothing to say other than 'no issues could be detected and the code is well structured [...]'. With the addition of some timeouts, it should also prevent any stalling should the issue recur.
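
A minimal, self-contained sketch of the whole pattern being discussed, with hypothetical names (Db, Result, checkJob) standing in for the PR's actual types: jobs fold into a single future chain so they run one at a time, a timeout bounds the run, and the connection closes when the chain settles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

class ReconcileSketch {
    // Hypothetical stand-ins for the PR's actual types.
    interface Db { void close(); }
    static final class Result {
        final String jobId;
        Result(String jobId) { this.jobId = jobId; }
    }

    // Pretend per-job check; the real one cross-references WorkManager/SQLite.
    static CompletableFuture<Result> checkJob(Db db, String jobId) {
        return CompletableFuture.completedFuture(new Result(jobId));
    }

    static CompletableFuture<List<Result>> checkAll(
            Db db, List<String> jobIds, Executor executor) {
        CompletableFuture<List<Result>> chain =
                CompletableFuture.completedFuture(new ArrayList<>());

        for (String jobId : jobIds) {
            // Composing serializes the work: the next job starts only
            // after the previous job's future has completed.
            chain = chain.thenComposeAsync(results ->
                    checkJob(db, jobId).thenApply(result -> {
                        results.add(result);
                        return results;
                    }), executor);
        }

        return chain
                .orTimeout(60, TimeUnit.SECONDS)               // guard against stalls (Java 9+ / desugared)
                .whenComplete((results, error) -> db.close()); // close on success or failure
    }
}
```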

```java
future.addListener(() -> {
    if (future.isCancelled()) {
        Log.i(TAG, "Interrupting python thread");
        threadFuture.cancel(true);
```
@bjester (Member, Author) commented:

Just some minor simplification here as I learned more about how these futures and runnables work.

@rtibbles self-assigned this on Jan 15, 2024
@rtibbles (Member) left a comment:

Squinting, I think I get the high-level picture of the diff here. I am going to check it out locally and read through the code to make sure I have a good picture in my head.

My hope is that some of this code can get trimmed once we move to Chaquopy, so this diff is a bit of a bump that won't be completely permanent.

@rtibbles (Member) left a comment:

There are a couple of issues from manual QA to follow up on here, but it's not yet clear where the issues lie, and this PR is big enough and robust enough for us to merge it and iterate further, both here and in Kolibri.

@rtibbles merged commit 28c9e0a into develop on Jan 18, 2024 (4 checks passed)
@rtibbles deleted the task-reconcile branch on January 18, 2024 at 00:15