
Two analyses at a time doesn't work #753

Closed
brianlball opened this issue Apr 9, 2024 · 1 comment

@brianlball (Contributor)

Running two Resque analyses queues in the web-background container can cause ResqueJobs::RunAnalysis jobs to run out of order.
For example, when an LHS analysis is submitted to OSAF, it is actually two jobs: the first with analysis_type='lhs', followed by one with analysis_type='batch_run' (see batch_run_methods). They are Resque.enqueue'd right after each other, but there is no hook or requirement that one runs before the other. With only one Resque worker in the web-background container, only one job in the analyses queue runs at a time, the first being the non-batch_run analysis_type.

If there are two Resque workers in the web-background container (either by changing start-web-background.sh or the helm-chart), batch_run.perform can start before lhs.perform completes, which results in data points being submitted for simulation before they have been created.
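The race can be reproduced without Resque at all. Here is a toy sketch with a shared FIFO queue drained by two worker threads; the job names and sleep durations are illustrative only, not taken from the OSAF codebase:

```ruby
# Two jobs enqueued back-to-back, as described above: the lhs job takes
# time (it has to generate the data points), while batch_run is quick.
queue = Queue.new
queue << [:lhs, 0.2]
queue << [:batch_run, 0.0]

log = Queue.new # records completion order

workers = 2.times.map do
  Thread.new do
    begin
      loop do
        job, duration = queue.pop(true) # non-blocking pop
        sleep(duration)                 # simulate the job's work
        log << job
      end
    rescue ThreadError
      # queue drained; this worker exits
    end
  end
end
workers.each(&:join)
# With two workers, :batch_run completes before :lhs, i.e. it would be
# submitting data points that lhs has not created yet.
```

With a single worker the same queue is drained strictly in enqueue order, which is why the one-worker configuration masks the problem.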

This is how it should work: LHS generates the data points, and batch_run then calls data_point.submit_simulation:

[Image: orig_order]

VS

This is what can happen with two analysis workers right now: batch_run has no data points to submit because lhs has not created them yet:

[Image: 2wb_order]

@brianlball (Contributor, Author)

A solution is to use Resque hooks to set a Redis flag in the first analysis_type:

```ruby
def self.before_perform_assign_started(analysis_id, analysis_type, *)
  if analysis_type != 'batch_run'
    Resque.redis.set("analysis:#{analysis_id}:started", true)
  end
end
```

Then remove it when the job is done:

```ruby
def self.after_perform_remove_started(analysis_id, analysis_type, *)
  if analysis_type != 'batch_run'
    Resque.redis.del("analysis:#{analysis_id}:started")
  end
end
```

And have batch_run check that the right conditions are met before proceeding; if not, then requeue the job without error.
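A minimal, self-contained sketch of that requeue guard follows. A tiny in-memory stub stands in for Resque here so the protocol can be exercised on its own; the hook name, the DontPerform-style skip, and the flag key are assumptions mirroring the comment above, not code from the OSAF repo:

```ruby
# Stub standing in for Resque: a Hash for redis, an Array for the queue,
# and a DontPerform error that aborts a job without treating it as failed
# (Resque's real before_perform hooks honor Resque::Job::DontPerform).
module FakeResque
  class DontPerform < StandardError; end

  @redis = {}
  @queue = []

  class << self
    attr_reader :redis, :queue

    def enqueue(job_class, *args)
      @queue << [job_class, args]
    end
  end
end

# Hypothetical guard for the batch_run job: if the lhs job's flag is
# still set, put the job back on the queue and skip this attempt
# without raising a real error.
module BatchRunGuard
  def self.before_perform_check_dependencies(analysis_id)
    if FakeResque.redis.key?("analysis:#{analysis_id}:started")
      FakeResque.enqueue(self, analysis_id)
      raise FakeResque::DontPerform
    end
  end
end
```

Because the requeued job goes to the back of the queue, batch_run simply keeps deferring until lhs's after_perform hook clears the flag, at which point the guard becomes a no-op and the job proceeds.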
