Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kurtwheeler/fix survey jobs #773

Merged
merged 3 commits into from
Nov 2, 2018
Merged

Kurtwheeler/fix survey jobs #773

merged 3 commits into from
Nov 2, 2018

Conversation

kurtwheeler
Copy link
Contributor

Issue Number

#724 (comment)

Purpose/Implementation Notes

Surveyor jobs were broken because the model didn't change but new fields on the SurveyJob model were referenced. We didn't know this because there was only one unit test for the Foreman involving SurveyJobs and it was so limited it didn't trigger any of the issues.

The surveyor dispatcher was also broken because it referenced a variable that wasn't defined.

I've updated the model, added migrations, changed the way survey jobs are queued to match the pattern of downloader and processor jobs so everything is consistent, added a lot more unit tests to the Foreman for survey jobs, and updated the surveyor dispatcher code to actually work.

Types of changes

  • Bugfix (non-breaking change which fixes an issue)

Functional tests

I temporarily modified the message queue like so:

if job_type in list(SurveyJobTypes):
            nomad_response = nomad_client.job.dispatch_job(nomad_job, meta={"JOB_NAME": job_type.value,
                                                                            "JOB_ID": str(job.id)})
            job.nomad_job_id = nomad_response["DispatchedJobID"]
            job.save()

so that the surveyor disptacher jobs could queue surveyor jobs but those surveyor jobs wouldn't queue additional work themselves.

Then I setup my local nomad like so:

$ ./update_models.sh
$ ./prepare_image.sh -i foreman -d wkurt -s foreman
$ docker push wkurt/dr_foreman:latest
$ DOCKERHUB_REPO=wkurt sudo -E ./run_nomad.sh

And submitted a SURVEYOR_DISPATCHER job like so:

$ nomad job dispatch -meta FILE=s3://data-refinery-test-assets/small_for_staging.txt SURVEYOR_DISPATCHER

I made sure my job queued some work:

$ nomad status | grep dispatch
SURVEYOR_256/dispatch-1541175648-0d7848b6         batch                50        running  2018-11-02T12:20:48-04:00
SURVEYOR_256/dispatch-1541175648-2ee254a4         batch                50        dead     2018-11-02T12:20:48-04:00
SURVEYOR_256/dispatch-1541175648-4e0e70e7         batch                50        running  2018-11-02T12:20:48-04:00
SURVEYOR_256/dispatch-1541175648-6603b19f         batch                50        running  2018-11-02T12:20:48-04:00
SURVEYOR_256/dispatch-1541175648-90e55f18         batch                50        dead     2018-11-02T12:20:48-04:00
SURVEYOR_256/dispatch-1541175648-9751e391         batch                50        dead     2018-11-02T12:20:48-04:00
SURVEYOR_256/dispatch-1541175648-df11bcb2         batch                50        running  2018-11-02T12:20:48-04:00
SURVEYOR_DISPATCHER/dispatch-1541175646-4bc2296d  batch                50        dead     2018-11-02T12:20:46-04:00

And that the jobs it queued completed successfully:

$ nomad status SURVEYOR_256/dispatch-1541175648-9751e391
Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
a65492ee  834913d6  jobs        0        run      complete  12m27s ago  12m18s ago

$ nomad logs a65492ee
2018-11-02 16:20:56,835 local data_refinery_common.message_queue INFO: Queuing DOWNLOADER nomad job to run job GEO with id 1.
2018-11-02 16:20:56,849 local data_refinery_common.message_queue INFO: Queuing DOWNLOADER nomad job to run job GEO with id 2.
2018-11-02 16:20:56,866 local data_refinery_common.message_queue INFO: Queuing DOWNLOADER nomad job to run job GEO with id 3.
2018-11-02 16:20:56,886 local data_refinery_common.message_queue INFO: Queuing DOWNLOADER nomad job to run job GEO with id 4.

Checklist

  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in downstream modules

Copy link
Contributor

@dongbohu dongbohu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left two comments.

@@ -68,6 +68,8 @@ def send_job(job_type: Enum, job) -> bool:
raise ValueError(NONE_JOB_ERROR_TEMPLATE.format(job_type.value, "Processor", job_id))
elif job_type in list(Downloaders):
nomad_job = NOMAD_DOWNLOADER_JOB
elif job_type in list (SurveyJobTypes):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to remove the space after between list and (.

run_nomad.sh Outdated

echo "Registering surveyor_dispatch.nomad$TEST_POSTFIX"
nomad run foreman/nomad-job-specs/surveyor_dispatcher.nomad"$TEST_POSTFIX"
for job_spec in $(ls -1 foreman/nomad-job-specs | grep "\.nomad$TEST_POSTFIX$"); do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line can be simplified into:

for job_spec in foreman/nomad-job-specs/*.nomad$TEST_POSTFIX; do

@kurtwheeler kurtwheeler reopened this Nov 2, 2018
@kurtwheeler kurtwheeler merged commit ba350cd into dev Nov 2, 2018
@kurtwheeler kurtwheeler deleted the kurtwheeler/fix-survey-jobs branch November 2, 2018 17:44
kurtwheeler added a commit that referenced this pull request Jan 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants