job system and scanner job refactoring #102

Closed
aaronleopold opened this issue Mar 18, 2023 · 2 comments
Labels
chore enhancement but more tedious

aaronleopold commented Mar 18, 2023

There are two main areas relating to jobs and the scanner job specifically that need addressing:

  1. Jobs are effectively stateless right now. Sure, they have access to a minimal subset of context, but they are stateless with respect to the job's own state (e.g. what task it is currently working on, how many tasks are left, etc). This lack of state prevents me from properly implementing features like:

    • pausing and resuming jobs
    • persisting job state to disk/db
      • We could probably use some sort of save file? Just serialize the state to a file to save, then read and deserialize it to load
    • handling failures and retries without having to re-do the entire job
      • this would include shutting down the server while a job is running
  2. The current implementation of the scanner and the jobs is not overly flexible and simply won't scale down the road, as exemplified by testers with enormous libraries. It's definitely quick, but I'd rather sacrifice a little speed for memory efficiency on those larger libraries. I think part of the issue is that for really large libraries, there are simply too many threads being spawned, degrading memory efficiency and performance.
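
One possible way to cap the thread count from point 2 is a fixed-size worker pool fed by a channel. This is only a rough sketch using the standard library, not what the scanner currently does: `scan_bounded`, `scan_one`, and `max_workers` are hypothetical names, and `scan_one` stands in for whatever the real per-series work would be.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the real per-series scan.
fn scan_one(path: &str) -> String {
    format!("scanned:{path}")
}

/// Processes `paths` with at most `max_workers` threads,
/// instead of spawning one thread per item.
pub fn scan_bounded(paths: Vec<String>, max_workers: usize) -> Vec<String> {
    let (work_tx, work_rx) = mpsc::channel::<String>();
    let (done_tx, done_rx) = mpsc::channel::<String>();
    // Workers share a single receiver behind a mutex.
    let work_rx = Arc::new(Mutex::new(work_rx));

    let mut handles = Vec::new();
    for _ in 0..max_workers.max(1) {
        let work_rx = Arc::clone(&work_rx);
        let done_tx = done_tx.clone();
        handles.push(thread::spawn(move || loop {
            // The lock is released at the end of this statement,
            // so it is not held while the item is being scanned.
            let next = work_rx.lock().unwrap().recv();
            match next {
                Ok(path) => done_tx.send(scan_one(&path)).unwrap(),
                Err(_) => break, // work channel closed: nothing left to do
            }
        }));
    }
    drop(done_tx); // only the workers hold result senders now

    for path in paths {
        work_tx.send(path).unwrap();
    }
    drop(work_tx); // closing the channel lets the workers exit

    for handle in handles {
        handle.join().unwrap();
    }

    // Completion order is nondeterministic, so sort for a stable result.
    let mut results: Vec<String> = done_rx.iter().collect();
    results.sort();
    results
}
```

Memory use then scales with `max_workers` rather than with library size, at the cost of some throughput on small libraries.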

Brainstorming / Initial Ideas

Job System

I already have the concept of a JobWrapper, which handles the kickoff of the actual job process as well as simple things like tracking duration. I think this should be extended to also manage some sort of state for the job, which could be mutated via on_progress callbacks, perhaps. The state, in general, needs to store the following:

  • What tasks were determined necessary for the job (i.e. a task list)
  • What tasks were completed
  • What tasks have yet to be completed

This needs to be MORE than just a numeric value, as it currently is.
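
To make that concrete, here is a minimal sketch of what such a state could look like, including the save-file idea from above. Everything here is hypothetical: `JobState`, its methods, and the line-based `done`/`todo` save format are made up for illustration, and tasks are assumed to be plain strings without newlines. A real version would likely use a proper serialization format.

```rust
use std::collections::VecDeque;

/// Hypothetical job state. The full task list is `completed` followed by `remaining`.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct JobState {
    completed: Vec<String>,
    remaining: VecDeque<String>,
}

impl JobState {
    /// Builds the state from the tasks determined necessary for the job.
    pub fn new(tasks: impl IntoIterator<Item = String>) -> Self {
        Self {
            completed: Vec::new(),
            remaining: tasks.into_iter().collect(),
        }
    }

    /// The task that would be worked on next, if any.
    pub fn next_task(&self) -> Option<&String> {
        self.remaining.front()
    }

    /// Moves the next task to the completed list. Returns false if none were left.
    pub fn complete_next(&mut self) -> bool {
        match self.remaining.pop_front() {
            Some(task) => {
                self.completed.push(task);
                true
            }
            None => false,
        }
    }

    /// (completed, total): the numeric progress can still be derived.
    pub fn progress(&self) -> (usize, usize) {
        (self.completed.len(), self.completed.len() + self.remaining.len())
    }

    /// Made-up save format: one "done <task>" or "todo <task>" per line.
    pub fn to_save_string(&self) -> String {
        let mut out = String::new();
        for task in &self.completed {
            out.push_str(&format!("done {task}\n"));
        }
        for task in &self.remaining {
            out.push_str(&format!("todo {task}\n"));
        }
        out
    }

    /// Parses the format above. Returns None on any malformed line.
    pub fn from_save_string(saved: &str) -> Option<Self> {
        let mut state = JobState::default();
        for line in saved.lines() {
            let (kind, task) = line.split_once(' ')?;
            match kind {
                "done" => state.completed.push(task.to_string()),
                "todo" => state.remaining.push_back(task.to_string()),
                _ => return None,
            }
        }
        Some(state)
    }
}
```

With something like this, resuming after a pause or a server shutdown would amount to loading the saved state and continuing from `next_task`, rather than redoing the whole job.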

The JobWrapper should also listen for a shutdown signal, which can be one of two kinds:

  1. Cancel all jobs (all jobs get cancelled, should also clear the queue)
  2. Cancel job by ID (only a job by a given ID should be cancelled)
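
A rough sketch of the two signal kinds, assuming each running job gets its own channel receiver and checks it between tasks. `Shutdown`, `run_job`, and the `u32` job ID are hypothetical names for illustration, and clearing the queue on a cancel-all is not modelled here.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// The two kinds of shutdown signal (hypothetical names).
#[derive(Debug, Clone, PartialEq)]
pub enum Shutdown {
    /// Cancel all jobs.
    All,
    /// Cancel only the job with the given ID.
    Job(u32),
}

/// Runs `tasks` for job `id`, checking for a cancel signal before each task.
/// Returns how many tasks finished before completion or cancellation.
pub fn run_job(id: u32, tasks: &[&str], signals: &Receiver<Shutdown>) -> usize {
    let mut done = 0;
    for _task in tasks {
        match signals.try_recv() {
            Ok(Shutdown::All) => return done,
            Ok(Shutdown::Job(target)) if target == id => return done,
            // A signal for another job, or no signal at all: keep going.
            Ok(Shutdown::Job(_))
            | Err(TryRecvError::Empty)
            | Err(TryRecvError::Disconnected) => {}
        }
        done += 1; // the real task work would happen here
    }
    done
}
```

Because the check is non-blocking (`try_recv`), a job only notices a cancellation at task boundaries, which lines up with keeping the state as a list of tasks.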

aaronleopold commented

I also think while I'm tackling this I might introduce a slight language change to the scan modes and how they work:

In-order scanning

In-order scanning processes one series at a time and inserts its media one-by-one as soon as they are discovered. This means that you can access new media files as soon as they are scanned, even if the rest of the series has not been scanned yet.

Parallel scanning

Parallel scanning processes multiple series at once, up to 10 series in a batch. This significantly reduces the overall scanning time, but you may not be able to access some media files until the entire batch is processed.

For example, if you have a new library with 200 series, each containing 50 books, an in-order scan will start from the first series and insert all of its media one at a time before moving on to the next series. On the other hand, a parallel scan will divide the 200 series into two chunks of 100 series and process up to 10 series in parallel for each chunk before inserting all the media in a single batch per chunk.
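
The arithmetic in that example can be sketched as follows. This is only an illustration of the insert pattern, not the scanner's code: `ScanMode` and `insert_sizes` are hypothetical names, the chunk size of 100 is taken from the example above, and the up-to-10-series parallelism within a chunk is not modelled.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ScanMode {
    InOrder,
    Parallel,
}

/// Returns the size of each database insert a scan would perform.
pub fn insert_sizes(
    mode: ScanMode,
    series: usize,
    books_per_series: usize,
    chunk_size: usize,
) -> Vec<usize> {
    match mode {
        // One insert per media file, issued as soon as the file is discovered.
        ScanMode::InOrder => vec![1; series * books_per_series],
        // One batched insert per chunk of series.
        ScanMode::Parallel => {
            let ids: Vec<usize> = (0..series).collect();
            ids.chunks(chunk_size.max(1))
                .map(|chunk| chunk.len() * books_per_series)
                .collect()
        }
    }
}
```

For the 200-series library, `insert_sizes(ScanMode::InOrder, 200, 50, 100)` describes 10,000 single-row inserts, while `ScanMode::Parallel` describes two batched inserts of 5,000 media each.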

@aaronleopold aaronleopold added this to the 0.1.0 milestone Mar 22, 2023
@aaronleopold aaronleopold added the chore enhancement but more tedious label Apr 9, 2023
@github-project-automation github-project-automation bot moved this to Backlog in v0.1.x Jan 20, 2024
@aaronleopold aaronleopold moved this from Backlog to In Progress in v0.1.x Jan 28, 2024
aaronleopold commented

Closing as completed, but it is in experimental

@github-project-automation github-project-automation bot moved this from In Progress to Done in v0.1.x Feb 19, 2024