[Extensions] Enable extensions to send a request to acquire a LockModel from JobScheduler's lock service during their job runner execution #363

joshpalis · 2023-01-24T19:24:19Z

Is your feature request related to a problem?

When a plugin executes its job, Job Scheduler supplies a JobExecutionContext, which carries information about the job (jobId, expected execution time, job document version, job index name) and a lockService object that the plugin can use to acquire a lock for its job execution.

The lock service prevents a ScheduledJobRunner from running more than one job at a time. Take the Anomaly Detector (AD) job as an example:

There is a one-to-one relationship between a detector and a job, so the lock service keeps the detector from running more than one real-time analysis task at a time.

As per the AD API documentation:

When you start a real-time detector, the anomaly detection plugin creates a job or, if the job already exists, updates it.
When you start or restart a real-time detector, the plugin creates a new real-time task that records run-time information like detector configuration snapshot, real-time job states (initializing/running/stopped), init progress, and so on. A single detector can only have one real-time job (job ID is the same as detector ID), but it can have multiple real-time tasks because each restart of a real-time job creates a new real-time task.
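To make the locking behaviour described above concrete, here is a minimal in-memory sketch of a per-job lock. It is an illustration only: `SketchLock` and `InMemoryLockStore` are hypothetical names invented for this example, not Job Scheduler's actual LockModel or lock service, and the real service persists lock entries in an index rather than in a map.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a lock entry; not Job Scheduler's actual LockModel.
final class SketchLock {
    final String jobId;
    final Instant lockTime;
    final long lockDurationSeconds;

    SketchLock(String jobId, Instant lockTime, long lockDurationSeconds) {
        this.jobId = jobId;
        this.lockTime = lockTime;
        this.lockDurationSeconds = lockDurationSeconds;
    }

    // A lock stops blocking new runs once its duration has elapsed.
    boolean isExpired(Instant now) {
        return !now.isBefore(lockTime.plusSeconds(lockDurationSeconds));
    }
}

// Hypothetical in-memory store keyed by jobId (assumed equal to the detector ID for AD).
final class InMemoryLockStore {
    private final Map<String, SketchLock> locks = new ConcurrentHashMap<>();

    // Returns a new lock if no unexpired lock exists for jobId, otherwise empty.
    Optional<SketchLock> acquire(String jobId, long durationSeconds, Instant now) {
        SketchLock candidate = new SketchLock(jobId, now, durationSeconds);
        // compute() runs atomically per key, so two callers cannot both win.
        SketchLock stored = locks.compute(jobId, (id, existing) ->
            (existing == null || existing.isExpired(now)) ? candidate : existing);
        return stored == candidate ? Optional.of(candidate) : Optional.empty();
    }

    // Removes the entry only if it is still the lock the caller holds.
    void release(SketchLock lock) {
        locks.remove(lock.jobId, lock);
    }
}
```

With this sketch, a second `acquire` for the same jobId returns an empty Optional until the first lock is released or expires, which is the "one run per job at a time" guarantee the issue relies on.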

Solution

Now that extensions run in a separate JVM, they need a way to send a request to Job Scheduler to acquire this lock model for use during their job execution. Given that the lock service is used only to keep the ScheduledJobRunner from being executed more than once for a given interval, the right course of action is to have the extension request the lock itself rather than having Job Scheduler send the lock model as part of the runJob invocation.

Tasks:

Register an API endpoint in Job Scheduler to acquire a LockModel entry from the lock service metadata index. The request parameters should include the jobId (in the case of AD, the jobId is the same as the detector ID).
Upon receiving the acquire lock request, Job Scheduler (JS) should use the node client to query the lock service metadata index for an existing entry keyed by the provided jobId. This entry should then be retrieved from the metadata index and sent back to the caller as part of the REST response.
Create multi-node integration tests for this API.
Create REST integration tests for this API.
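The request handling outlined in the first two tasks could look roughly like the sketch below. It is only an illustration under stated assumptions: `GetLockSketchHandler` and `SketchResponse` are hypothetical names, a plain map stands in for the lock service metadata index and the node client, and the status codes and field names are assumptions rather than the behaviour of the API that was eventually merged.

```java
import java.util.Map;

// Hypothetical response holder; the status codes follow common REST conventions.
final class SketchResponse {
    final int status;
    final String body;

    SketchResponse(int status, String body) {
        this.status = status;
        this.body = body;
    }
}

// Hypothetical handler: looks up the lock entry for a jobId and returns it to the caller.
final class GetLockSketchHandler {
    // Stand-in for the lock service metadata index: jobId -> lock fields.
    private final Map<String, Map<String, Object>> lockIndex;

    GetLockSketchHandler(Map<String, Map<String, Object>> lockIndex) {
        this.lockIndex = lockIndex;
    }

    SketchResponse handle(String jobId) {
        if (jobId == null || jobId.isEmpty()) {
            return new SketchResponse(400, "{\"error\":\"jobId is required\"}");
        }
        Map<String, Object> entry = lockIndex.get(jobId);
        if (entry == null) {
            return new SketchResponse(404, "{\"error\":\"no lock entry for jobId\"}");
        }
        return new SketchResponse(200, toJson(entry));
    }

    // Minimal JSON rendering for flat maps; does not escape special characters.
    private static String toJson(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("{");
        String sep = "";
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            sb.append(sep).append('"').append(e.getKey()).append("\":");
            Object v = e.getValue();
            if (v instanceof Number || v instanceof Boolean) {
                sb.append(v);
            } else {
                sb.append('"').append(v).append('"');
            }
            sep = ",";
        }
        return sb.append('}').toString();
    }
}
```

Keeping the lookup behind a single `handle(jobId)` method makes it straightforward to exercise the same paths (missing jobId, unknown jobId, existing entry) from the REST and multi-node integration tests listed above.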


joshpalis added the enhancement (New feature or request) label Jan 24, 2023

joshpalis self-assigned this Jan 24, 2023

joshpalis mentioned this issue Feb 7, 2023

[Extensions] Exposes a GetLock REST API to enable extensions to acquire a lock model for their job execution opensearch-project/job-scheduler#311

Merged

5 tasks

joshpalis closed this as completed Feb 10, 2023
