Deduplication of failed jobs #193

jhkennedy · 2024-11-21T01:26:19Z

When deduplicating, for performance reasons (see #119), we are moving away from querying the HyP3 API for all jobs associated with a reference scene and are instead:

- querying the S3 bucket directly for published scenes, which covers SUCCEEDED jobs (shipped in Release v0.5.8 #184)
- querying the hyp3-its-live DynamoDB table directly, using the status_code index, for PENDING and RUNNING jobs (this should always be a small number of jobs, so the query stays performant)
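
As a rough illustration of the second query, here is a minimal sketch of reading a status index with pagination. It assumes `table` behaves like a boto3 DynamoDB `Table` resource, that the index is literally named `status_code`, and that the indexed attribute is also `status_code`; the function name `find_jobs_with_status` is hypothetical and is not taken from the its-live-monitoring code.

```python
from typing import Any


def find_jobs_with_status(table: Any, status_code: str, index_name: str = "status_code") -> list[dict]:
    """Return every item in the index whose status matches, following pagination.

    `table` is assumed to expose a boto3-style `query(**kwargs)` method.
    The index and attribute names are assumptions, not confirmed by the issue.
    """
    query_args = {
        "IndexName": index_name,
        # Use a name placeholder so the attribute name can never clash with a reserved word.
        "KeyConditionExpression": "#s = :status",
        "ExpressionAttributeNames": {"#s": "status_code"},
        "ExpressionAttributeValues": {":status": status_code},
    }

    jobs: list[dict] = []
    while True:
        response = table.query(**query_args)
        jobs.extend(response["Items"])

        # DynamoDB returns LastEvaluatedKey only when more pages remain.
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            break
        query_args["ExclusiveStartKey"] = last_key

    return jobs
```

With boto3 installed, `table` could come from `boto3.resource("dynamodb").Table(...)`, and the function would be called once for `"PENDING"` and once for `"RUNNING"`. Because a query against an index only reads the items with the requested status, its cost scales with the number of in-progress jobs and not with the size of the table.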

With that strategy, we won't account for FAILED jobs, so we may submit the same job more than once. Generally, we don't see many "unlucky" jobs that experience intermittent failure on all three attempts, so most of these resubmissions are likely to be jobs that will always fail.
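
To make the gap concrete, here is a small sketch of the deduplication step under this strategy. Representing a job as a `(reference, secondary)` tuple and the name `select_jobs_to_submit` are assumptions made for illustration only, not the project's actual data model.

```python
def select_jobs_to_submit(candidates, published, in_progress):
    """Drop candidates that are already published or currently PENDING/RUNNING."""
    already_handled = set(published) | set(in_progress)
    return [pair for pair in candidates if pair not in already_handled]


# Hypothetical scene pairs for one reference scene.
candidates = [("ref", "sec_a"), ("ref", "sec_b"), ("ref", "sec_c")]
published = {("ref", "sec_a")}    # SUCCEEDED, found in the S3 bucket
in_progress = {("ref", "sec_b")}  # PENDING or RUNNING, found via the status index

# ("ref", "sec_c") FAILED earlier, but neither lookup sees it, so it is selected again.
to_submit = select_jobs_to_submit(candidates, published, in_progress)
```

A previously failed pair is indistinguishable from a pair that was never submitted, which is why it gets resubmitted each time the same message arrives.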

Even at a nominal failure rate of 0.5%, the 544-day search window contains a significant number of failed jobs, so searching for FAILED jobs is probably not a good option from a performance standpoint.

Since we shouldn't receive the same message that many times, I don't expect this to cause a significant cost increase (though we'll want to check this assumption at some point), but I feel it's worth documenting.

Going to leave this open and labeled wontfix as a reference.


jhkennedy added the wontfix label Nov 21, 2024

jtherrmann mentioned this issue Nov 26, 2024

Move query_jobs_by_status_code function from its-live-monitoring to HyP3 API/SDK #199

Open
