
Improve handling of unsorted VCF files #636

Merged
gspowley merged 8 commits into main from gspowley/sc-51603/vcf-sort-bgzip-index on Sep 5, 2024


gspowley
Member

gspowley commented Sep 3, 2024

  • Improve handling of unsorted VCF files by not relying on the filename extension.
  • Simplify VCF sort/bgzip/index by using one ThreadPool.
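One way to decide how to treat an input file without trusting its extension is to inspect its first bytes. The sketch below assumes the decision is whether a file is already BGZF-compressed (what bgzip writes and what tabix-style indexing needs); the helper names `is_bgzf` and `needs_bgzip` are hypothetical and not taken from this PR. The check mirrors the BGZF header layout from the SAM/BAM specification: gzip magic, deflate method, the FEXTRA flag, and a `BC` extra subfield.

```python
# Hypothetical helpers: detect BGZF by content instead of by filename extension.
_GZIP_MAGIC = b"\x1f\x8b"


def is_bgzf(path: str) -> bool:
    """Return True if the file at `path` starts with a BGZF block header."""
    with open(path, "rb") as f:
        header = f.read(18)
    return (
        len(header) == 18
        and header[:2] == _GZIP_MAGIC      # gzip ID1, ID2
        and header[2] == 8                 # CM: deflate
        and (header[3] & 0x04) != 0        # FLG: FEXTRA bit set
        and header[12:14] == b"BC"         # SI1, SI2 of the BGZF extra subfield
    )


def needs_bgzip(path: str) -> bool:
    """Decide from the file contents, not the extension, whether to (re)compress."""
    return not is_bgzf(path)
```

A plain gzip file named `*.vcf.gz` fails this check and would be re-compressed, while a BGZF file with a misleading `.vcf` name passes.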

Unrelated CI changes:

  • Add a GitHub Actions (GHA) step that checks for a valid .test_durations file, to avoid errors when splitting the pytest run.
  • Modify test_wheel.py to avoid failures due to collisions between concurrent tests.
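A validity check for `.test_durations` could look roughly like the sketch below. It assumes the file is the JSON object mapping test IDs to durations in seconds that recent pytest-split versions write; the function name `durations_file_is_valid` is hypothetical, and the actual GHA step in this PR may check something different.

```python
import json
import math


def durations_file_is_valid(path: str) -> bool:
    """Return True if `path` holds a non-empty JSON object of test-id -> seconds."""
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError):  # missing file, bad encoding, or malformed JSON
        return False
    if not isinstance(data, dict) or not data:
        return False
    # Every entry must be a string key with a finite, non-negative number.
    return all(
        isinstance(k, str)
        and isinstance(v, (int, float))
        and not isinstance(v, bool)
        and math.isfinite(v)
        and v >= 0
        for k, v in data.items()
    )
```

A CI step could call this and, on `False`, delete or regenerate the file before splitting, rather than letting the split fail.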

Shelnutt2 requested a review from sgillies on September 3, 2024 21:41
Collaborator

sgillies left a comment


@gspowley this is related to #633, yes? It looks like it supersedes some of that PR.

I don't understand the change to test_wheel.py.

I added two questions inline.

    uri = sort_and_bgzip(uri, tmp_space=tmp_space)
    tmp_uris.append(uri)
    create_index_file(uri)
except Exception as e:
Collaborator

@gspowley is it possible to catch only the one (I assume? Could be wrong) specific exception that occurs in create_index_file()?

Member Author

Most likely, I'll have to check.

    create_index_file(uri)
except Exception as e:
    logger.error("%r: %s", uri, e)
    return None
Collaborator

Should we be re-raising here instead of returning?

Member Author

In this case, we want to log the error and proceed. This allows the ingestion to proceed with the good VCF files.
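The log-and-proceed behaviour, combined with running every file through a single thread pool, can be sketched as below. This is a minimal illustration, not the project's code: `prepare` is a hypothetical callable standing in for the per-file sort/bgzip/index steps (e.g. `sort_and_bgzip` followed by `create_index_file`), and `prepare_all` is a hypothetical name.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence

logger = logging.getLogger(__name__)


def _prepare_one(uri: str, prepare: Callable[[str], str]) -> Optional[str]:
    """Run the per-file pipeline; on failure, log and return None."""
    try:
        return prepare(uri)
    except Exception as e:  # broad on purpose: one bad VCF must not stop the batch
        logger.error("%r: %s", uri, e)
        return None


def prepare_all(
    uris: Sequence[str],
    prepare: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Process all URIs on one thread pool; keep successes in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda u: _prepare_one(u, prepare), uris)
        return [r for r in results if r is not None]
```

Failed files simply drop out of the returned list, so ingestion continues with the good VCF files. If the specific exception raised by `create_index_file()` turns out to be known, the `except` clause could be narrowed to it, as suggested in the review.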

Member Author

gspowley commented Sep 3, 2024

The change to test_wheel.py is intended to avoid issues that occur when two different CI jobs run the tests in test_wheel.py at the same time. The current change does not fix the issue; @spencerseale may have some ideas.

Member Author

gspowley commented Sep 3, 2024

> this is related to #633, yes?

I wasn't aware of that PR, but yes the changes do overlap. I recommend we merge this PR first, since it's making functional changes to the logic.

Collaborator

sgillies left a comment

Thanks for getting us unstuck @gspowley !

yield None

logger.info(f"Removing {_LOCAL_WHEEL}")
os.remove(_LOCAL_WHEEL)
Collaborator

@gspowley I think it would be better if we didn't write to our project source during testing; that's one less thing to worry about if anything goes wrong. How about copying to the built-in pytest tmp_path and then re-pointing _LOCAL_WHEEL at that?

I might attack the use of globals to pass these parameters later, after we get ourselves out of this jam.

global _LOCAL_WHEEL

tmp_path = tmp_path_factory.mktemp("wheel")
_LOCAL_WHEEL = os.path.join(tmp_path, os.path.basename(_LOCAL_WHEEL))
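The same idea, giving each run a private copy of the wheel outside the source tree, can be sketched without a global. In the sketch, `tempfile.mkdtemp` is a stand-in for pytest's `tmp_path_factory.mktemp("wheel")`, and `staged_wheel` is a hypothetical name; the path is yielded to the caller instead of rebinding `_LOCAL_WHEEL`.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def staged_wheel(src_wheel: str) -> Iterator[str]:
    """Copy `src_wheel` into a fresh private directory and yield the copy's path.

    Each caller gets its own directory, so concurrent runs never touch the same
    file, and nothing is written next to the project sources.
    """
    tmp_dir = tempfile.mkdtemp(prefix="wheel-")
    try:
        dst = os.path.join(tmp_dir, os.path.basename(src_wheel))
        shutil.copy2(src_wheel, dst)
        yield dst
    finally:
        # Removing the whole private directory replaces the explicit os.remove().
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

Inside a pytest fixture, the body would typically `yield` the staged path to the tests, which keeps the value scoped to the fixture rather than shared through module state.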
Collaborator

sgillies commented Sep 5, 2024

Great! Thanks for accepting my suggestion, @gspowley. That's all from me.

gspowley merged commit 2c6ff5b into main on Sep 5, 2024
18 checks passed
gspowley deleted the gspowley/sc-51603/vcf-sort-bgzip-index branch on September 5, 2024 17:38
sgillies mentioned this pull request on Sep 19, 2024