Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/job cancellation orchestrator #56

Merged
merged 12 commits into from
Jul 21, 2024

Conversation

williams-jack
Copy link
Collaborator

Feature/Problem Description

The client (aka, a job's creator) should be notified in the following scenarios:

  • When someone has cancelled a job through the deleteJob endpoint.
  • When an image has failed to build and a job config in the holding pen was not able to be enqueued.

Solution (Changes Made)

  • POST requests made to the response endpoint for a job with the GradingJobResult and the job's key.

@williams-jack
Copy link
Collaborator Author

Remaining question: how do we deal with jobs that failed to be enqueued post-image build? Should they occur in their own individual transactions, or should we do them all in a single transaction?

The latter (current implementation) gives us no way to send a failed enqueuing notification back to the client.

@blerner
Copy link
Collaborator

blerner commented Jun 26, 2024

This feels incomplete somehow. "A Dockerfile failed to build" is not the same thing as "A student submission was graded", and I don't think you should shoehorn a human-readable sentence into something that ought to be machine-parseable data. Surely Docker would give some error feedback as to what didn't build, and surely the SHA hash should be sent back... So this shouldn't be a GradingJobResult, but rather a GraderConfigCreationResult, with some new data definition in it.

For the remaining question, I'm not sure I understand. The holding-pen should not release its occupants until the grader is built successfully, and then all the contents should be released together. What "enqueuing notification" do you mean, here?

@williams-jack
Copy link
Collaborator Author

New data definition makes sense to me, I agree it could be much more informative.

To elaborate a bit more on the latter, I'm getting at the fact that once an image has been built, when we release the jobs to the queue (e.g., the handleCompletedImageBuild function call in the service) we are running the createOrUpdateJob functionality on each one individually. The entire collection of these operations is all being run in a single transaction. If one of these fails to be released to the queue, I think it makes sense to notify the client that this has happened.

That said, because they are all being released in a single transaction, this feels like a difficult thing to implement. So really this raises my question, is releasing all held jobs in the pen a single transaction, or should each individual release operation be its own?

@blerner
Copy link
Collaborator

blerner commented Jun 27, 2024

You have two functions named createOrUpdateJob, so I'm not sure which one you mean for me to look at here. (In create-or-update-job.ts and in grading-queue-controller.ts) Either way, I'm not quite sure what your concern is, precisely.

  • There's no real response you can send to Bottlenose (or any other response url), that Bottlenose could do anything with. At best it would have to store the error message in the database somewhere, and then hope the professor looked for it somehow...
  • But better, there should just be a page on Orca to view the holding pen and see why things are stuck in there.
  • And, every time a grader config is created, the holding pen should be examined and emptied if possible.
  • If you're really trying to compute "Of X jobs, M succeeded and N failed", create variables M=0, N=0 outside the transaction, and in the transaction loop over things, save them, and increment M or N accordingly. Then complete the transaction, and report the final M and N values.

@williams-jack williams-jack marked this pull request as ready for review July 4, 2024 19:02
Allows image build service to notify client
when image build fails. Additionally, swapped local port mappings in migration script and init script
to be compatible with the separate postgres instance running for Bottlenose
for local development specifically.
feat: build key for responding with build result
@williams-jack williams-jack merged commit 3f494f8 into main Jul 21, 2024
2 checks passed
@blerner blerner deleted the feat/job-cancellation-orchestrator branch August 22, 2024 22:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants