[GEN-863] Add patch release #21

Merged
merged 41 commits into from
Sep 13, 2024

Conversation

Contributor

@thomasyu888 thomasyu888 commented Aug 15, 2024

Create patch release nextflow workflow. Depends on #22

  • This is a Nextflow execution that worked: https://tower.sagebionetworks.org/orgs/Sage-Bionetworks/workspaces/genie-bpc-project/watch/ii9NWcNyVWB53
  • I tested this by running the patch release against 15.0-public to recreate the 15.5-consortium release in staging.
  • I wrote a compare_patch.py, which could really be a general "compare Synapse folders" function. The comparison code is simple; it checks two things:
    • Do all the filenames from the original folder exist in the new folder, and vice versa?
    • Do all the files from the original folder have the same md5 in the new folder?
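
The two checks above can be sketched roughly as follows. This is a minimal illustration, not the actual compare_patch.py: `compare_folders` is a hypothetical helper, and it assumes each folder has already been listed into a plain `{filename: md5}` mapping (fetching that listing from Synapse is left out here).

```python
from typing import Dict, List


def compare_folders(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Compare two {filename: md5} mappings and return report lines.

    Hypothetical helper: the mappings are assumed to be built elsewhere.
    """
    report = [
        f"Number of files in old folder: {len(old)}",
        f"Number of files in new folder: {len(new)}",
    ]
    # Walk the union of filenames so that both directions are checked.
    for name in sorted(set(old) | set(new)):
        if name not in old:
            report.append(f"File not found in old folder: {name}")
        elif name not in new:
            report.append(f"File not found in new folder: {name}")
        elif old[name] != new[name]:
            # Same filename on both sides, but the content hash differs.
            report.append(f"Files are different: {name}")
    return report


if __name__ == "__main__":
    # Made-up filenames and hashes, purely for illustration.
    old_folder = {"a.txt": "111", "b.txt": "222", "old.html": "333"}
    new_folder = {"a.txt": "111", "b.txt": "999", "new.html": "444"}
    print("\n".join(compare_folders(old_folder, new_folder)))
```

Keeping the comparison on plain dictionaries means the same function could be reused for any pair of folders, which is what makes it closer to a generic "compare folders" utility.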

The two folders compared are 15.5-consortium (syn55146141) and 15.6-consortium (staging: syn62069187). Here is the output:

Number of files in old folder: 256
Number of files in new folder: 255
File not found in old folder: 15.6-consortium.html (This is expected, the new release is 15.6-consortium)
File not found in new folder: 15.5-consortium.html (This is expected, the release is 15.6-consortium)
Files are different: data_clinical_patient.txt (This is expected, the NAACCR mappings changed)
Files are different: data_clinical_sample.txt (This is expected, the NAACCR mappings changed)
Files are different: data_guide.pdf (This is expected, the release is 15.6-consortium)
Files are different: meta_study.txt (This is expected, the release is 15.6-consortium)
File not found in new folder: release-notes.pdf (This is expected, the release notes are generated after the fact)
Files are different: samples_to_retract.csv (This is expected, the samples are compared against the latest public release which is different now)

I then manually downloaded the 15.5-consortium and 15.6-consortium sample and patient files, removed the comment headers and checked the md5.

Before removing comments:

  • MD5 (data_clinical_patient.txt) = 64d7ed814cc96febee13338c61e37f39
  • MD5 (data_clinical_sample.txt) = dbe110a3043b222c52fa76865449919a
  • MD5 (patient_old.txt) = 1afaa2843fcc2904f5290b7ab89dcd48
  • MD5 (sample_old.txt) = 329e50a9cea586dab362e185943bb0a4

After removing comments:

  • MD5 (data_clinical_patient.txt) = bef6279d39a62ef55c430f232fce1c12
  • MD5 (data_clinical_sample.txt) = 68869f680f53b71f54886aa0437c9e59
  • MD5 (patient_old.txt) = bef6279d39a62ef55c430f232fce1c12
  • MD5 (sample_old.txt) = 68869f680f53b71f54886aa0437c9e59

This confirms that the files differ only in their comment headers, which changed because the NAACCR mappings were updated.
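
The manual check above could also be scripted. Here is a rough sketch, assuming the comment headers are the lines that start with `#`; the function names `md5_without_comments` and `md5_of_file` are made up for illustration and are not part of this PR.

```python
import hashlib
from typing import Iterable


def md5_without_comments(lines: Iterable[bytes], prefix: bytes = b"#") -> str:
    """Return the md5 hex digest of the lines that do not start with prefix.

    Assumption: comment headers are exactly the lines beginning with "#".
    A data line that happens to start with "#" would be skipped as well.
    """
    digest = hashlib.md5()
    for line in lines:
        if not line.startswith(prefix):
            digest.update(line)
    return digest.hexdigest()


def md5_of_file(path: str) -> str:
    """Hash a file on disk while ignoring its comment header lines."""
    # Binary mode keeps line endings untouched, so the hash stays byte-exact.
    with open(path, "rb") as handle:
        return md5_without_comments(handle)
```

If `md5_of_file` returns the same value for the old and new file while the plain md5 differs, the difference is confined to the comment lines.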

@thomasyu888 thomasyu888 changed the title Add patch release [GEN-863] Add patch release Aug 15, 2024
@thomasyu888 thomasyu888 marked this pull request as ready for review August 15, 2024 16:47
@thomasyu888 thomasyu888 requested a review from a team as a code owner August 15, 2024 16:47
Contributor

@BryanFauble BryanFauble left a comment

Code looks good logically!

modules/patch_release.nf (outdated, resolved)
scripts/patch_release/Dockerfile (resolved)
scripts/patch_release/patch.py (outdated, resolved)
scripts/patch_release/patch.py (outdated, resolved)
scripts/patch_release/patch.py (resolved)
dpulls bot commented Aug 20, 2024

🎉 All dependencies have been resolved !

Contributor

@rxu17 rxu17 left a comment

LGTM! I have a few comments

modules/create_dashboard_html.nf (resolved)
patch_release_main.nf (resolved)
scripts/patch_release/compare_patch.py (outdated, resolved)
Contributor

Did we want to include a new module to call this compare_patch script?

Contributor Author

@thomasyu888 thomasyu888 Aug 27, 2024

@rxu17 I just realized this might not be super useful, because the compare_patch script is more of an integration test: given known data in, it checks that the expected data comes out.

What we probably want as the comparison module is what you wrote for main GENIE and/or BPC.

That said, I still added the comparison code to run when it's a staging run, the only issue is that it only works with the default parameters of the nextflow workflow

Although that makes me wonder if I should add this at all - thoughts?

Contributor

What did you mean by this, with regard to the staging pipeline run?

I still added the comparison code to run when it's a staging run, the only issue is that it only works with the default parameters of the nextflow workflow

Contributor Author

The compare patch code currently compares data against an existing public patch release. So for example, I recreated the 15.X public patch release here, and I compared it against the existing public patch release to see if I made any incorrect code changes.

However, when I create a new patch release, there isn't an existing patch release to compare it to. Therefore, this would act more like an integration test against an old public release (assuming the logic doesn't change).

What we probably want in the future is a comparison of sample count in vs. sample count out.
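
As a rough idea of what a "sample count in vs. sample count out" check might look like, here is a small sketch. It is only an illustration under stated assumptions: the clinical sample file is assumed to be tab-delimited with `#` comment header lines and a `SAMPLE_ID` column, and `sample_ids` and `summarize_counts` are hypothetical helpers that do not exist in this PR.

```python
import csv
import io
from typing import Dict, Iterable, Set


def sample_ids(lines: Iterable[str], id_column: str = "SAMPLE_ID") -> Set[str]:
    """Collect sample IDs from a tab-delimited clinical file.

    Assumptions: lines starting with "#" are comment headers, and the
    first remaining line is the column header containing id_column.
    """
    data_lines = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    return {row[id_column] for row in reader}


def summarize_counts(before: Set[str], after: Set[str]) -> Dict[str, int]:
    """Summarize how the sample set changed between input and output."""
    return {
        "samples_in": len(before),
        "samples_out": len(after),
        "removed": len(before - after),
        # Samples present only in the output would be a surprise for a patch.
        "unexpected_new": len(after - before),
    }


if __name__ == "__main__":
    # Tiny made-up files, purely for illustration.
    old_file = io.StringIO(
        "#Sample Identifier\tCode\n"
        "SAMPLE_ID\tONCOTREE_CODE\n"
        "S1\tLUAD\nS2\tBRCA\nS3\tCOAD\n"
    )
    new_file = io.StringIO("SAMPLE_ID\tONCOTREE_CODE\nS1\tLUAD\nS3\tCOAD\n")
    print(summarize_counts(sample_ids(old_file), sample_ids(new_file)))
```

Unlike the md5 comparison, a check like this would not need an existing patch release to compare against, only the input release and the list of samples expected to be removed.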

@thomasyu888 thomasyu888 merged commit bc01b10 into main Sep 13, 2024
3 checks passed
@thomasyu888 thomasyu888 deleted the add-patch-release branch September 13, 2024 02:51