add RNASeqTools TSS tutorial #10
base: main
This is a great start! In general, more explanations for the things that functions are doing (at least links to docstrings) would be useful, but we can iterate on that. To me, the biggest things to address are:
- External binaries - I'd like the tutorials to be as self-contained as possible, so using either BinaryBuilder or CondaPkg to manage the environment would be ideal. I can help with the latter; the former is something I'm planning to read up on this summer.
- Using BioJulia packages for functionality where possible. I specifically noticed that the sra-tools thing could probably be replaced with BioServices.jl, but there may be others. And describing both is ideal, since people who are familiar with sra-tools might benefit from a direct comparison of functionality.
# * sra-tools 3.0.3
# * samtools 1.6
# * bwa-mem2 2.2.1
If they're not already, we should get these set up on BinaryBuilder
If these are external dependencies not managed by Julia's package manager, they could be complicated to include.
One possibility is to use CondaPkg, but getting the JLLs would be better, I think.
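Until the binaries are packaged, a notebook could at least fail early when one of them is missing. The following is a minimal sketch using only `Sys.which` from Julia's standard library. The executable names are assumptions (sra-tools ships several programs; `prefetch` and `fastq-dump` are the two mentioned in this thread), and the helper names are hypothetical, not part of RNASeqTools.

```julia
# Executable names are assumptions: sra-tools provides several programs,
# of which this thread mentions prefetch and fastq-dump.
const REQUIRED_BINARIES = ["prefetch", "fastq-dump", "samtools", "bwa-mem2"]

# Return the names that cannot be found on PATH.
# Sys.which returns `nothing` when no matching executable exists.
function missing_binaries(names=REQUIRED_BINARIES)
    return [name for name in names if Sys.which(name) === nothing]
end

# Throw an error that lists every missing binary, or return `nothing` if all are present.
function check_binaries(names=REQUIRED_BINARIES)
    absent = missing_binaries(names)
    isempty(absent) || error("Missing external binaries: " * join(absent, ", "))
    return nothing
end
```

Calling `check_binaries()` at the top of the notebook would report all missing tools at once. It does not verify versions, so it is not a substitute for JLL packages or a CondaPkg environment.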
Having all the binary dependencies packaged would be great! I checked and could find samtools and BWA (but not BWA-MEM2, which is faster) artifacts, but none for sra-tools. I also had a quick look into BioServices.jl and could see a way to search for sra accession numbers, but not to download the actual data.
I will read a bit into BinaryBuilder to see what it takes to add something. I can imagine sra-tools to be a nightmare to put into a package since it has so many individual tools. Maybe prefetch and fastq-dump could be in their own artifacts...
# Our data will be from V. Cholerae, we need a genome and its annotation: First go to [NCBI](https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_013085075.1/),
# after clicking Download, select .fasta and .gff and click Download again. Then extract the content of the archive
# directory /ncbi_dataset/data/GCF_013085075.1/ into the folder containing this notebook. The folder should then
# look like:
Not critical, but we can do all of these things using Julia's Cmd functionality. I can take a stab at this a bit later.
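As a rough illustration of that idea, the manual download-and-extract steps could be scripted with the `Downloads` standard library and a `Cmd`. This is only a sketch: the function names are hypothetical, it assumes an `unzip` executable is available on the system, and the download URL is left as a parameter because no direct link is given in this thread.

```julia
using Downloads

# Build the command that unpacks the archive. Backticks create a Cmd object;
# nothing is executed until it is passed to `run`. Interpolated values are
# passed as single arguments, so paths containing spaces are handled safely.
unzip_cmd(archive, dest) = `unzip -o $archive -d $dest`

# Download the NCBI archive from `url` and unpack it into `dir`.
# Returns the directory that the notebook comment refers to.
function fetch_genome(url::AbstractString, dir::AbstractString=pwd())
    archive = Downloads.download(url, joinpath(dir, "ncbi_dataset.zip"))
    run(unzip_cmd(archive, dir))
    return joinpath(dir, "ncbi_dataset", "data", "GCF_013085075.1")
end
```

Keeping the command construction in its own function makes it possible to inspect or test the command without running it.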
Cool, thanks. While playing with BioServices.jl, I found a way to download annotations and genomes with the EUtils module. Sadly, it does not support gff3, though :(
# download reads and convert them to the fastq format. The reads in sample SRR1602510 come from a TEX-treated
# sample and can be used to identify primary transcription start sites.

download_prefetch(["SRR1602510"], @__DIR__)
Is this defined in your package? I think BioServices.jl has some related functionality.
The function uses prefetch to download the SRA internal format and converts this to a fastq file using fastq-dump. I had another look into BioServices.jl and, as I understand it, the EUtils module can be used to search SRA and also retrieve links to the files, but I could not find anything in there that would allow for a conversion to a readable format. Did I miss it?
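For readers unfamiliar with sra-tools, the two-step process described here can be sketched as follows. This is not the RNASeqTools implementation of `download_prefetch`; the helper names are hypothetical, and the location of the downloaded `.sra` file (`<outdir>/<accession>/<accession>.sra`) is an assumption about how prefetch lays out its output.

```julia
# Step 1: prefetch downloads the run in the SRA internal format.
prefetch_cmd(accession, outdir) = `prefetch $accession --output-directory $outdir`

# Step 2: fastq-dump converts the downloaded file to (gzipped) fastq.
# Assumption: prefetch stored the file at <outdir>/<accession>/<accession>.sra.
function fastq_dump_cmd(accession, outdir)
    sra_file = joinpath(outdir, accession, accession * ".sra")
    return `fastq-dump --gzip --outdir $outdir $sra_file`
end

# Run both steps for every accession; requires sra-tools on PATH.
function download_fastq(accessions, outdir)
    for accession in accessions
        run(prefetch_cmd(accession, outdir))
        run(fastq_dump_cmd(accession, outdir))
    end
    return nothing
end
```

With these helpers, `download_fastq(["SRR1602510"], @__DIR__)` would mirror the call in the notebook, provided the assumptions above hold.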
Co-authored-by: Kevin Bonham <[email protected]>
Thanks a lot already for the review. I will look into packaging the binary dependencies and fix the comment style consistently in the next few days :)
Hi @kescobo, sorry for the long delay, I couldn't find the time to work on this. Today I managed to make an artifact for one of my dependencies, bwa-mem2: https://github.com/maltesie/bwamem2_jll.jl I'm working on setting up binary packages for sra-tools and fastp (nearly through) as well. Will let you know when I'm done :) I didn't adjust any code in RNASeqTools yet to actually use the artifacts, though. I hope I can do this over the weekend.
bwamem2_jll is now a registered binary package and I adjusted RNASeqTools to use it. But I can't get sra-tools to compile with BinaryBuilder. I will check how to use BioServices more for downloading, and otherwise check if I can use CondaPkg for that as you suggested. I hope I can update the notebook during the next week.
Rad! CC @M-PERSIC - might be an opportunity to team up here
Hi @maltesie!
RNASeqTools TSS tutorial notebook
Types of changes
This PR implements the following changes:
📋 Additional detail
The existing RNASeq tutorial is outdated and the new one showcases parts of the RNASeqTools package. The notebook should run by itself.
The tutorial shows how to
☑️ Checklist
docs/src/.
[UNRELEASED] section of the manually curated CHANGELOG.md file for this repository.