New NCBI to AnnData tutorial #4480
@hexhowells Awesome, thanks a lot! Could you put a copy of the input data on Zenodo? We have automations that populate a Galaxy Data Library with tutorial data from Zenodo. This way, learners dealing with poor or slow internet connections can import the data directly from the data library, circumventing their own networks.
I've made a Zenodo record and linked to it from the tutorial. It requires review before being published, but should be good after that!
Looks good! I just had some small word changes.
Question: do the users really need to manually download the data first, or is that part an optional step in the tutorial?
Downloading the data manually is an optional step; however, being able to find and download the relevant data is part of the tutorial for those who may not be familiar with the process.
@hexhowells Agreed, it's still important to show the standard process. The Zenodo copy can be the fallback option. You should also be able to import the tar file from NCBI/GEO directly into Galaxy via URL, then use Galaxy's "Unzip" tool to unpack it into a collection with all the individual files.
@shiltemann I was initially going to use the Unzip tool in Galaxy, but I was getting errors when trying to unzip the .gz files; I'm not entirely sure what is causing the issue.
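For anyone unpacking the download outside Galaxy, a minimal Python sketch of the two-stage unpacking is below. It assumes (not verified here) that the GEO download is a `.tar` archive whose members are individually gzipped text files, e.g. `*_dge.txt.gz`; the function name `unpack_geo_tar` and all paths are hypothetical. It does not diagnose the Galaxy Unzip error itself.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def unpack_geo_tar(tar_path, out_dir):
    """Hypothetical helper: extract a tar archive, then gunzip every *.gz file in it.

    Returns the decompressed file paths, sorted by name.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Stage 1: unpack the outer archive ("r:*" also accepts a compressed tar).
    with tarfile.open(tar_path, "r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons support extraction filters; "data" rejects unsafe paths.
            tar.extractall(out_dir, filter="data")
        else:
            tar.extractall(out_dir)

    # Stage 2: decompress each gzipped member separately and remove the .gz copy.
    decompressed = []
    for gz_path in sorted(out_dir.rglob("*.gz")):
        target = gz_path.with_suffix("")  # "x_dge.txt.gz" -> "x_dge.txt"
        with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        gz_path.unlink()
        decompressed.append(target)
    return sorted(decompressed)
```

Handling the tar and the per-file gzip layers separately mirrors the fact that the archive holds many compressed files rather than being one compressed stream.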
A very useful tutorial! Thanks @hexhowells !!
GSM5353214_PA_AUG_PB_1A_S1.dge.txt
GSM5353215_PA_AUG_PB_1B_S2.dge.txt
GSM5353216_PA_PB1A_Pool_1_3_S50_L002_dge.txt
GSM5353217_PA_PB1A_Pool_2_S107_L004_dge.txt
GSM5353218_PA_PB1B_Pool_1_2_S74_L003_dge.txt
GSM5353219_PA_PB1B_Pool_2_S24_L001_dge.txt
GSM5353220_PA_PB1B_Pool_3_S51_L002_dge.txt
GSM5353221_PA_PB2A_Pool_1_3_S25_L001_dge.txt
GSM5353222_PA_PB2B_Pool_1_3_S52_L002_dge.txt
GSM5353223_PA_PB2B_Pool_2_S26_L001_dge.txt
If we keep the manual download, then please move this step under the "Obtaining the Data" section, so that the order is as follows: get the data into Galaxy (either by manual download or from Zenodo), then look at the metadata, and finally add tags.
This was added later because there are 53 raw files but only 10 are needed for the tutorial; figuring out which files are needed is done in the "Understanding the Data" section. I'll see if I can reorganise it so that it makes more sense.
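As a rough illustration of that selection step, here is a minimal Python sketch that keeps only the files whose leading GSM accession falls in a given range. It assumes the 10 needed files are exactly the contiguous accessions GSM5353214 to GSM5353223 from the file list quoted in this review, and that every raw filename starts with `GSM<number>_`; the helper name is hypothetical.

```python
import re

# Assumed contiguous accession range, taken from the file list quoted in this review.
FIRST_GSM, LAST_GSM = 5353214, 5353223
GSM_PATTERN = re.compile(r"^GSM(\d+)_")


def select_tutorial_files(filenames, first=FIRST_GSM, last=LAST_GSM):
    """Hypothetical helper: keep files whose leading GSM accession is in [first, last]."""
    selected = []
    for name in filenames:
        match = GSM_PATTERN.match(name)
        # Files without a GSM prefix (e.g. a README) are skipped.
        if match and first <= int(match.group(1)) <= last:
            selected.append(name)
    return sorted(selected)
```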
Co-authored-by: Pavankumar Videm <[email protected]>
> - *"Find pattern"*: `batch`
> - *"Replace with"*: `replicate`
>
> 2. {% tool [Cut](Cut1) %} with the following parameters:
I suspect you could speed this section up by removing the 'Cut' step and using the Multi-Join (combine multiple files, Galaxy Version 1.1.1) tool instead, with the initial barcode column as the key, and THEN cutting once to remove it.
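To make the join-then-cut idea concrete, a minimal Python sketch is below: it joins several headerless tab-separated tables on their first (barcode) column and only afterwards drops that key column once. This is an illustration, not the Galaxy Multi-Join tool: it assumes an inner join (barcodes missing from any table are dropped), whereas the Galaxy tool may handle unmatched keys differently. The function names are hypothetical.

```python
import csv
import io


def join_on_barcode(tables):
    """Hypothetical helper: inner-join headerless TSV strings on column 0.

    Rows follow the order of the first table; a barcode absent from any
    other table is dropped (assumed inner-join semantics).
    """
    parsed = [
        [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
        for text in tables
    ]
    # Index the non-key columns of every other table by barcode.
    lookups = [{row[0]: row[1:] for row in rows} for rows in parsed[1:]]

    joined = []
    for row in parsed[0]:
        barcode = row[0]
        if all(barcode in lookup for lookup in lookups):
            merged = list(row)
            for lookup in lookups:
                merged.extend(lookup[barcode])
            joined.append(merged)
    return joined


def drop_barcode_column(rows):
    """The single final 'Cut': keep every column except the barcode key."""
    return [row[1:] for row in rows]
```

Joining once on the key and cutting at the end replaces several per-table cuts with one, which is the speed-up the review suggests.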
Let's now add the replicate column, which tells us which rows are part of pools from the same patient and tumor location.
> <hands-on-title>Create replicate metadata</hands-on-title>
A few screenshots of where in the spreadsheet you get this data from would be good. And if you needed lines from the paper or supplemental data/figures/text to help you decode what the hell was happening, screenshots of those would be good too. That way you walk the user through how you were able to figure this out.
Nice tutorial! I think it just gets a bit long with the commands. Maybe reward the user with some kind of image, info box, or something similar, to give them a sense of satisfaction that the tool with all those parameters they just executed did something big.
>
{: .hands_on}
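For readers who want to see the idea behind the replicate column in code, here is a minimal Python sketch that numbers pooled samples within each patient/tumor-location group and then expands those labels to one value per cell row. The `replicate_N` label format, the group keys (e.g. `PB1A`), and the function names are all hypothetical and not taken from the tutorial or the paper's metadata.

```python
from collections import defaultdict


def number_replicates(samples):
    """Hypothetical helper: label samples sharing a group replicate_1, replicate_2, ...

    `samples` is a list of (sample_id, group) pairs, where `group` stands for a
    patient/tumor-location combination. Numbering follows the input order.
    """
    counters = defaultdict(int)
    labels = {}
    for sample_id, group in samples:
        counters[group] += 1
        labels[sample_id] = f"replicate_{counters[group]}"
    return labels


def replicate_column(cell_samples, labels):
    """Expand per-sample labels to one replicate value per cell row."""
    return [labels[sample_id] for sample_id in cell_samples]
```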
We will now add a column to indicate which sample each row came from, using the sample IDs described earlier.
Such a big command should be rewarded with some kind of image or screenshot or something to engage the user again.
> 3. **Rename** {% icon galaxy-pencil %} output `Specimen Metadata`
>
{: .hands_on}
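A minimal Python sketch of the sample-column step: each cell barcode is paired with the ID of the sample whose matrix it came from. It assumes the barcodes per sample are already available as lists; the function name is hypothetical. Because the same barcode can occur in different samples, the sketch optionally prefixes the barcode with the sample ID so that row names stay unique, which is a common precaution rather than something the tutorial is confirmed to do.

```python
def add_sample_column(barcodes_by_sample, make_unique=True):
    """Hypothetical helper: return (cell_id, sample_id) rows, one per cell.

    `barcodes_by_sample` maps a sample ID (e.g. a GSM accession) to the list of
    cell barcodes read from that sample's matrix. Input order is preserved.
    """
    rows = []
    for sample_id, barcodes in barcodes_by_sample.items():
        for barcode in barcodes:
            # Identical barcodes can appear in different samples, so optionally
            # prefix with the sample ID to keep cell IDs unique.
            cell_id = f"{sample_id}_{barcode}" if make_unique else barcode
            rows.append((cell_id, sample_id))
    return rows
```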
Same comment as above
Co-authored-by: mtekman <[email protected]>
Workflow and tests are missing. @hexhowells do you have a workflow ready?
@pavanvidem Yes, the tutorial was made from this workflow I built: https://usegalaxy.eu/u/hexhowells/h/ncbi
nice addition!
@hexhowells I guess moving the tip snippet out of the "Tag your datasets" tip should fix the linting error.
that hopefully fixes the linting error
Co-authored-by: Pavankumar Videm <[email protected]>
Fixed the links to the first 2 files, tested the tutorial, and added links to an example history and the workflow. Ready to merge! Thanks @hexhowells for this great tutorial!
🎉 !!!
New tutorial that takes raw NCBI data and processes it into an AnnData object with metadata annotations. It still requires some proofreading and minor updates!