[dataset] Gracefully handle all-None Chunkset #194
Merged
This updates the behavior of the create items task when the user-provided `create_items` function returns `None` for every asset in the chunkfile.

Previously, we would only write an ndjson file to storage if there were any valid items returned by `create_items`. If the `results` were empty (no items were returned for the entire chunkfile) then we'd skip writing the output file.

This was incompatible with the `ingest-items` task, which assumes the file exists when it comes to reading. It would try to read the file and error, causing the run to fail.

We have (at least) two choices on how to reconcile this:
Option 1 feels much safer. Presumably the user knows what they're doing when they return `None` from their function. By the time we get to `ingest-items`, we have no idea why a file might be missing.
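
To make the failure mode concrete, here is a minimal sketch of one way the writer side could guarantee the file exists. It assumes the reconciliation is "always write the ndjson file, even when it is empty", which is an assumption since the option list is not reproduced above. `write_items_ndjson` and `read_items_ndjson` are hypothetical helpers for illustration, not the project's actual functions, and the sketch writes to a local path rather than to real storage.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List, Optional

Item = Dict[str, Any]


def write_items_ndjson(
    assets: Iterable[str],
    create_items: Callable[[str], Optional[Item]],
    out_path: Path,
) -> int:
    """Hypothetical writer: call create_items per asset and persist the results.

    The file is written even when every call returned None, so a downstream
    reader never has to guess why a file is missing.
    """
    results: List[Item] = []
    for asset in assets:
        item = create_items(asset)
        if item is not None:  # None means "skip this asset"
            results.append(item)

    # Always create the file; an all-None chunkfile yields an empty ndjson.
    with out_path.open("w", encoding="utf-8") as f:
        for item in results:
            f.write(json.dumps(item) + "\n")
    return len(results)


def read_items_ndjson(path: Path) -> List[Item]:
    """Hypothetical reader: assumes the file exists, as the ingest side does."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

With this shape, a chunkfile whose assets all map to `None` produces an empty file, and the reader returns an empty list instead of raising on a missing path.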