-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve handling of unsorted VCF files #636
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/tiledb/cloud/vcf/ingestion.py
Outdated
uri = sort_and_bgzip(uri, tmp_space=tmp_space) | ||
tmp_uris.append(uri) | ||
create_index_file(uri) | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gspowley is it possible to catch only the one (I assume? Could be wrong) specific exception that occurs in create_index_file()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most likely, I'll have to check.
create_index_file(uri) | ||
except Exception as e: | ||
logger.error("%r: %s", uri, e) | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be re-raising here instead of returning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, we want to log the error and proceed. This allows the ingestion to proceed with the good VCF files.
The change to |
I wasn't aware of that PR, but yes the changes do overlap. I recommend we merge this PR first, since it's making functional changes to the logic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for getting us unstuck @gspowley !
yield None | ||
|
||
logger.info(f"Removing {_LOCAL_WHEEL}") | ||
os.remove(_LOCAL_WHEEL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gspowley I think it would be better if we didn't write to our project source during testing. One less thing to worry about if anything goes wrong, is what I'm thinking. How about copying to the built-in pytest tmp_path
and then re-pointing _LOCAL_WHEEL
at that?
I might attack the use of globals to pass these parameters later, after we get ourselves out of this jam.
global _LOCAL_WHEEL | ||
|
||
tmp_path = tmp_path_factory.mktemp("wheel") | ||
_LOCAL_WHEEL = os.path.join(tmp_path, os.path.basename(_LOCAL_WHEEL)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Thanks for accepting my suggestion @gspowley , that's all from me.
sort/bgzip/index
by using oneThreadPool
.Non-related CI changes:
.test_durations
and avoid error splitting pyteststest_wheel.py
to avoid failures due to collisions between concurrent tests.