Visualize bulkATACseq? #1334
@ngehlenborg @mccalluc @ilan-gold - Stanford TMC has approved for release a list of eight processed bulk ATAC-seq datasets that I found in the system from a while back. After investigating, @khanshawPSC & @jswelling identified this open item as relating to those datasets. May we publish these datasets, or is there a reason to hold off from a visualization perspective?
(For reference, we are talking about these datasets: https://portal.hubmapconsortium.org/search?mapped_data_types[0]=Bulk%20ATAC-seq%20%5BBWA%20%2B%20MACS2%5D&group_name[0]=Stanford%20TMC&entity_type[0]=Dataset) We could visualize these datasets in Vitessce as is (i.e., no additional processing needed), so that can be added later (see hubmapconsortium/portal-visualization#14). I noticed, however, that the output directories are not properly annotated. For example, QC report files (here: FASTQC HTML reports and ZIP files) are not marked as such, so the "Show QA Files Only" button does not work, and the output file formats are not annotated either (hovering over the "?" icon shows a mostly empty tooltip). Most importantly, it is not possible to figure out which genome build was used for the mapping, i.e., the data can't be interpreted.
Thank you @ngehlenborg - I will redirect to Stanford TMC, @khanshawPSC, and @mruffalo re: the directory and output file format annotation problems.
@khanshawPSC & @mruffalo - what did you find when looking into the problem, and what are the next steps toward publishing these datasets?
Visualization support isn't a blocker for publication, but it may be worth delaying publication so the pipeline can be modified and re-run to write the additional metadata described by @ngehlenborg and @ilan-gold in hubmapconsortium/portal-visualization#14. There isn't yet any consensus about the file format and content of this additional metadata, but we could add another pipeline output file quite easily once the contents are finalized. Alternatively, these datasets can be published as-is now, then re-run in the future once we add the additional metadata for visualization support, assuming API and UI support for dataset versioning.
Closing... but please reopen, and clarify the scope, if I have misunderstood.
currently QA on PROD