Gencode refactor to remove gcs #934

Merged
merged 3 commits into from
Oct 31, 2024

@bpblanken (Collaborator) commented Oct 24, 2024

I'm running into weird dependency issues with Dataproc, so I decided to remove our GCS dependency!

Resolves #609

@bpblanken bpblanken requested a review from a team as a code owner October 24, 2024 21:07
raise ValueError(
    'Unexpected number of fields on line in ensemble_to_refseq mapping',
    msg,
Contributor

Extreme nitpick, but you can put the ValueError and msg on the same line here.
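
For illustration, the suggestion might look like the sketch below. It assumes `msg` is a plain string built just before the `raise`; the function name `parse_mapping_line` and the expected field count of 3 are hypothetical and do not come from the PR.

```python
def parse_mapping_line(line: str, expected_fields: int = 3) -> list[str]:
    """Split one tab-separated mapping line and validate its width.

    Hypothetical helper: the name and the default field count are
    assumptions made for this sketch, not taken from the PR.
    """
    fields = line.rstrip('\n').split('\t')
    if len(fields) != expected_fields:
        msg = 'Unexpected number of fields on line in ensemble_to_refseq mapping'
        # The exception and its message fit on a single line.
        raise ValueError(msg)
    return fields
```

Keeping the message in a variable and raising on one line is a common pattern in ruff-linted code, which is likely why `msg` exists as a separate name here.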

response = requests.get(url, stream=True, timeout=10)
gene_symbol_to_gene_id = {}
for line in gzip.GzipFile(fileobj=response.raw):
    line = line.decode('ascii')  # noqa: PLW2901
Contributor

Nice, this became very simple. Is there a downside to taking the pickle file part out of the download process? Do you know why it was there in the first place?
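
As a rough sketch of the pattern under review, the streamed response can be decompressed and parsed line by line without writing anything to disk. The function names and the parsing details (a 9-column GTF layout with `gene_id` and `gene_name` attributes) are assumptions for illustration, not the code from this PR.

```python
import gzip
import io
from typing import BinaryIO


def parse_gene_symbols(raw: BinaryIO) -> dict[str, str]:
    """Build a gene_symbol -> gene_id dict from a gzipped GTF-like stream.

    Assumes the standard 9-column GTF layout; this is a sketch, not the
    parser used in the PR.
    """
    gene_symbol_to_gene_id = {}
    for raw_line in gzip.GzipFile(fileobj=raw):
        line = raw_line.decode('ascii')
        if line.startswith('#'):
            continue  # skip header/comment lines
        fields = line.rstrip('\n').split('\t')
        if len(fields) != 9:
            msg = f'Unexpected number of fields on line: {len(fields)}'
            raise ValueError(msg)
        if fields[2] != 'gene':
            continue  # only gene records are needed for the mapping
        # Attributes look like: gene_id "ENSG..."; gene_name "ABC";
        attrs = {}
        for item in fields[8].strip().rstrip(';').split(';'):
            key, _, value = item.strip().partition(' ')
            attrs[key] = value.strip('"')
        gene_symbol_to_gene_id[attrs['gene_name']] = attrs['gene_id']
    return gene_symbol_to_gene_id


def download_gene_symbols(url: str) -> dict[str, str]:
    """Stream a gzipped file over HTTP and parse it without touching disk."""
    import requests  # third-party; imported lazily so the parser stays stdlib-only

    response = requests.get(url, stream=True, timeout=10)
    response.raise_for_status()
    return parse_gene_symbols(response.raw)
```

Separating the parser from the HTTP call lets it be exercised against an in-memory `io.BytesIO(gzip.compress(...))` buffer in tests. One plausible trade-off of dropping a cached pickle, not confirmed in this thread, is that the file is downloaded and parsed again on every call.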

@bpblanken bpblanken merged commit a515c2c into dev Oct 31, 2024
3 checks passed
@bpblanken bpblanken deleted the benb/gencode_refactor branch October 31, 2024 20:30
bpblanken added a commit that referenced this pull request Nov 5, 2024
* add task to write relatedness check to tsv (#930)

* add task to write relatedness check to tsv

* fix requirements

* relatedness_check_table_path

* add relatedness check file path to metadata.json

* Benb/use metadata as source of family table load (#936)

* use run metadata as source of family table load

* ruff

* Support gcs dirs in rsync (#932)

* Support gcs dirs in rsync

* ws

* Gencode refactor to remove gcs (#934)

* Gencode refactor to remove gcs

* Fix

* additional semi join (#947)

* metadata parameters refactor (#946)

* metadata parameters refactor

* fix missing param

* tweak

* missed one

* last one

* fix test

* last few bugfixes

* fix

* bump

* missed one

* change parameter type due to confusing bug

* push

* enum

* Parse clinvar version from header (#949)

* Parse clinvar version from header

* responses activate

* fix test

* Dependency reordering so that `ValidateCallsetTask` runs before updating the reference data. (#950)

* Parse clinvar version from header

* Dependency reordering for reference data updates and validation

* ruff

* missed one

* Revert relatedness changes

* push

* Fix import issue

* Fix sample type

* ruff

* Fix import mocking

* imports

* responses activate

* fix test

* Tweaks

* comment

* Benb/check parsed clinvar version in complete (#951)

* Parse clinvar version from header

* First pass

* Bump hail tables to https

* correct dataset/dataset types

* Fix clinvar mito

* Fix combined

* Dependency reordering for reference data updates and validation

* ruff

* missed one

* Revert relatedness changes

* push

* Fix import issue

* Fix sample type

* ruff

* Fix import mocking

* imports

* Missed one

* First mocking pass

* Finish mocks in reference data

* responses activate

* ruff

* commas

* fix test

* Update compare_globals.py

* import

---------

Co-authored-by: Julia Klugherz
2 participants