Allow for hdfs and gcs URI's to be passed to GenomicsDB #5197
@nalinigans There are some test failures in the GenomicsDB cloud tests. See: https://storage.googleapis.com/hellbender-test-logs/build_reports/master_22153.1/tests/test/index.html We'll have to fix these before we can merge.
@droazen, I will put some debug print statements in the two tests that are failing while authenticating with GCS and issue another pull request to the nalinigans_genomicsdb_uri_support branch. I hope that is OK. Thanks.
@droazen, we have a free GCS account, so it is possible that Hadoop requires extra configuration for authenticating/connecting with the HELLBENDER Travis service account. Can anyone help here? This is the code we have for connecting to GCS via Hadoop.
This is the error from the Travis logs:
* Debug Cloud Tests
Codecov Report
@@              Coverage Diff              @@
##             master     #5197     +/-    ##
===============================================
- Coverage      86.785%   86.497%   -0.288%
- Complexity    30025     30053     +28
===============================================
  Files         1838      1841      +3
  Lines         139112    139748    +636
  Branches      15340     15476     +136
===============================================
+ Hits          120729    120878    +149
- Misses        12806     13279     +473
- Partials      5577      5591      +14
@droazen, I got a genomicsdb.jar from @kgururaj and just tried out the GenomicsDB cloud tests. The call stack from my test run in the nalini_new_genomicsdb_jar branch indicates that we do need the fs.gs.project.id Hadoop configuration property set. The Google service account JSON I use for our internal testing has this key, but the Hellbender service account JSON does not. Any ideas on how to get this key for the tests? Would this value be HELLBENDER_TEST_PROJECT? How is it being made available to the Spark cloud tests, for example? I do see it being configured in src/main/java/org/broadinstitute/hellbender/engine/spark/SparkContextFactory.java.
…tute/hellbender/engine/FeatureDataSource.java, src/main/java/org/broadinstitute/hellbender/tools/genomicsdb/GenomicsDBUtils.java and src/test/java/org/broadinstitute/hellbender/tools/genomicsdb/GenomicsDBImportIntegrationTest.java
PR is ready to be merged. The 2 GCS tests in GenomicsDBImportIntegrationTest.java are commented out, but they have been tested with the HELLBENDER test project and GOOGLE_APPLICATION_CREDENTIALS in the nalini_new_genomicsdb_jar branch and will be uncommented as soon as a new GenomicsDB jar is released.
This is exciting! This should simplify our workflows and hopefully speed things up a bit too.
Two minor comments.
src/main/java/org/broadinstitute/hellbender/tools/genomicsdb/GenomicsDBImport.java
src/main/java/org/broadinstitute/hellbender/utils/io/IOUtils.java
@lbergelson is going to try regenerating our service account key and see if that allows us to uncomment the GCS tests in this branch.
…nalinigans_genomicsdb_uri_support
@lbergelson, thanks for regenerating the service account key. All tests are enabled and pass now.
src/main/java/org/broadinstitute/hellbender/engine/FeatureDataSource.java
...est/java/org/broadinstitute/hellbender/tools/genomicsdb/GenomicsDBImportIntegrationTest.java
Looks good -- I'll merge once Travis finishes running on the final version.
@nalinigans Tests passed, but it's worth pointing out that the new
…te#5197) * Allow for hdfs and gcs URI's to be passed to GenomicsDB * Push the URI processing for GenomicsDB to IOUtils
This PR is a finalized version of #5017. I've copied the branch into our repo so that Travis will run cloud tests on it.
Currently, only POSIX filesystem paths can be passed as workspaces and arrays to GenomicsDB via GenomicsDBImport and SelectVariants. This PR allows hdfs and gcs (and emrfs/s3) URIs to be supported as well.
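To illustrate the distinction between a POSIX path and a URI-style workspace, here is a minimal sketch using only the Java standard library. It is not the code from this PR: the class name `WorkspacePathSketch`, the method `isRemoteUri`, and the exact set of schemes (`hdfs`, `gs`, `s3`) are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Set;

// Hypothetical helper, not part of GATK or GenomicsDB.
class WorkspacePathSketch {
    // Assumption: these are the schemes to treat as remote; the real list may differ.
    private static final Set<String> REMOTE_SCHEMES = Set.of("hdfs", "gs", "s3");

    // Returns true when the string parses as a URI whose scheme is in REMOTE_SCHEMES.
    // Strings without a scheme (plain POSIX paths) or that fail to parse return false.
    static boolean isRemoteUri(String path) {
        try {
            String scheme = new URI(path).getScheme();
            return scheme != null && REMOTE_SCHEMES.contains(scheme.toLowerCase());
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] examples = {"/data/workspace", "hdfs://namenode:8020/workspace", "gs://my-bucket/workspace"};
        for (String p : examples) {
            System.out.println(p + " -> " + (isRemoteUri(p) ? "remote URI" : "local path"));
        }
    }
}
```

A caller could use a check like this to decide whether to resolve the workspace through the local filesystem or hand the string through unchanged as a URI.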
Examples
GenomicsDB supports GCS via the Cloud Storage Connector. Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to point to the GCS service account JSON file.
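Since the cloud tests in this PR initially failed during GCS authentication, it can help to verify the credentials variable before running anything. The following is a minimal sketch of such a pre-flight check using only the Java standard library; the class `GcsCredentialsCheck` and the method `resolveKeyFile` are hypothetical names, and the check only confirms that the file exists and is readable, not that the key is valid.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical pre-flight check, not part of GATK or GenomicsDB.
class GcsCredentialsCheck {
    static final String ENV_VAR = "GOOGLE_APPLICATION_CREDENTIALS";

    // Returns the key file path if the value is non-empty and names a readable regular file.
    static Optional<Path> resolveKeyFile(String value) {
        if (value == null || value.isEmpty()) {
            return Optional.empty();
        }
        Path p = Paths.get(value);
        return Files.isRegularFile(p) && Files.isReadable(p) ? Optional.of(p) : Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> key = resolveKeyFile(System.getenv(ENV_VAR));
        System.out.println(key
                .map(p -> "Using service account key: " + p)
                .orElse(ENV_VAR + " is unset or does not point to a readable file"));
    }
}
```

Taking the value as a parameter, rather than reading the environment inside `resolveKeyFile`, keeps the check easy to exercise with arbitrary paths.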