add RNASeqTools TSS tutorial #10

Open
wants to merge 4 commits into
base: main

Conversation


@maltesie maltesie commented Apr 18, 2023

RNASeqTools TSS tutorial notebook

Types of changes

This PR implements the following changes:

  • ✨ New feature (A non-breaking change which adds functionality).
  • 🐛 Bug fix (A non-breaking change, which fixes an issue).
  • 💥 Breaking change (fix or feature that would cause existing functionality to change).

📋 Additional detail

The existing RNASeq tutorial is outdated and the new one showcases parts of the RNASeqTools package. The notebook should run by itself.

The tutorial shows how to

  • download reads from SRA
  • align reads using bwa-mem
  • read .bam files and .gff files
  • annotate alignments with features from .gff files
  • compute coverage from alignments
  • compute maxima in the difference along the coverage as putative TSS
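The last two steps can be sketched in plain Julia without any package. This is a minimal illustration, not the RNASeqTools implementation: the function names `coverage` and `putative_tss`, the `min_step` threshold, and the representation of alignments as 1-based inclusive ranges on a single strand are all assumptions made for the example.

```julia
# Per-base coverage from alignment intervals (1-based, inclusive) on one strand.
# Uses a difference array: +1 where a read starts, -1 just past where it ends.
function coverage(intervals::AbstractVector{<:UnitRange{<:Integer}}, seqlen::Integer)
    delta = zeros(Int, seqlen + 1)
    for iv in intervals
        delta[first(iv)] += 1
        delta[last(iv) + 1] -= 1
    end
    return cumsum(delta)[1:seqlen]
end

# Positions where the coverage steps up by at least `min_step` and that step is
# a local maximum of the first difference. Hypothetical threshold, tune per data set.
function putative_tss(cov::AbstractVector{<:Integer}; min_step::Integer = 2)
    d = diff(cov)                       # d[i] = cov[i+1] - cov[i]
    tss = Int[]
    for i in eachindex(d)
        left  = i > 1         ? d[i - 1] : typemin(Int)
        right = i < length(d) ? d[i + 1] : typemin(Int)
        if d[i] >= min_step && d[i] > left && d[i] >= right
            push!(tss, i + 1)           # first base after the step up
        end
    end
    return tss
end

reads = [5:10, 5:12, 5:9, 2:6]          # toy alignments on a 15 bp sequence
cov = coverage(reads, 15)
tss = putative_tss(cov)
```

Three of the four toy reads start at position 5, so the coverage jumps there and that position is reported. A real analysis would treat the two strands separately and handle reads starting at position 1, which this sketch cannot detect.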

☑️ Checklist

  • 🎨 The changes implemented are consistent with the Julia style guide.
  • 📘 I have updated and added relevant docstrings, in a manner consistent with the documentation styleguide.
  • 📘 I have added or updated relevant user and developer manuals/documentation in docs/src/.
  • 🆗 There are unit tests that cover the code changes I have made.
  • 🆗 The unit tests cover my code changes AND they pass.
  • 📝 I have added an entry to the [UNRELEASED] section of the manually curated CHANGELOG.md file for this repository.
  • 🆗 All changes should be compatible with the latest stable version of Julia.
  • 💭 I have commented liberally for any complex pieces of internal code.

@kescobo kescobo linked an issue Apr 19, 2023 that may be closed by this pull request
@kescobo kescobo added the enhancement and Updating / compat (Updating an old tutorial to current versions of packages) labels Apr 19, 2023
Member

@kescobo kescobo left a comment

This is a great start! In general, more explanation of what the functions are doing (or at least links to docstrings) would be useful, but we can iterate on that. To me, the biggest things to address are:

  1. External binaries - I'd like the tutorials to be as self-contained as possible, so using either BinaryBuilder or CondaPkg to manage the environment would be ideal. I can help with the latter; the former is something I'm planning to read up on this summer.
  2. Using BioJulia packages for functionality where possible. I specifically noticed that the sra-tools step could probably be replaced with BioServices.jl, but there may be others. Describing both is ideal, since people who are familiar with sra-tools might benefit from a direct comparison of functionality.

Comment on lines +7 to +9
# * sra-tools 3.0.3
# * samtools 1.6
# * bwa-mem2 2.2.1
Member

If they're not already, we should get these set up on BinaryBuilder

Member

If these are external dependencies not managed by Julia's package manager, they could be complicated to include.

One possibility is to use CondaPkg, but getting the JLLs would be better, I think.
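For comparison, the CondaPkg route might look roughly like the sketch below. The tool versions are the ones listed in the tutorial header; that these exact builds are available from the bioconda channel on a given platform is an assumption, and `TOOLS`, `install_tools`, and `run_in_env` are names invented for this example.

```julia
using CondaPkg

# Tool versions as listed in the tutorial header. Availability of these exact
# versions on bioconda for your platform is an assumption.
const TOOLS = ["sra-tools" => "=3.0.3", "samtools" => "=1.6", "bwa-mem2" => "=2.2.1"]

# Record each tool as a Conda dependency of the active Julia project.
function install_tools(tools = TOOLS)
    for (name, version) in tools
        CondaPkg.add(name; version = version, channel = "bioconda")
    end
end

# Run a command with the Conda environment activated, so its binaries are on PATH.
run_in_env(cmd::Cmd) = CondaPkg.withenv(() -> run(cmd))

# Usage (needs network access and resolves the environment on first call):
#   install_tools()
#   run_in_env(`samtools --version`)
```

The dependencies end up in a `CondaPkg.toml` next to `Project.toml`, so the notebook environment stays reproducible without the reader installing anything by hand.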

Author

Having all the binary dependencies packaged would be great! I checked and found artifacts for samtools and BWA (but not for BWA-MEM2, which is faster), and none for sra-tools. I also had a quick look at BioServices.jl and could see a way to search for SRA accession numbers, but not to download the actual data.

I will read up on BinaryBuilder to see what it takes to add something. I can imagine sra-tools being a nightmare to package since it has so many individual tools. Maybe prefetch and fastq-dump could go in their own artifacts...

RNASeqTools_TSS/rnaseqtools_tss.jl (outdated thread, resolved)
Comment on lines +23 to +26
# Our data will be from V. Cholerae, we need a genome and its annotation: First go to [NCBI](https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_013085075.1/),
# after clicking Download, select .fasta and .gff and click Download again. Then extract the content of the archive
# directory /ncbi_dataset/data/GCF_013085075.1/ into the folder containing this notebook. The folder should then
# look like:
Member

Not critical, but we can do all of these things using Julia's `Cmd` functionality. I can take a stab at this a bit later.
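A rough sketch of what that could look like, using the `Downloads` standard library and backtick commands. The download URL is deliberately left as a parameter because the thread does not give one; `extract_cmd` and `fetch_genome` are hypothetical helper names, and the sketch assumes an `unzip` binary is on the PATH.

```julia
using Downloads

# Build the command that unpacks only the genome directory from the NCBI archive.
# `-j` drops the leading directories so the files land directly in `dest`;
# the wildcard is matched by unzip itself, not by a shell.
function extract_cmd(archive::AbstractString, dest::AbstractString;
                     accession::AbstractString = "GCF_013085075.1")
    member = "ncbi_dataset/data/$accession/*"
    return `unzip -o -j $archive $member -d $dest`
end

# Download the archive from `url` (caller-supplied, not specified in this thread)
# and extract the .fasta and .gff files into `dest`.
function fetch_genome(url::AbstractString, dest::AbstractString = @__DIR__)
    archive = Downloads.download(url, joinpath(dest, "ncbi_dataset.zip"))
    run(extract_cmd(archive, dest))
    return dest
end

cmd = extract_cmd("ncbi_dataset.zip", "data")
```

Because Julia commands are built as argument vectors rather than shell strings, paths with spaces need no quoting and the command can be inspected before it is run.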

Author

Cool, thanks. While playing with BioServices.jl, I found a way to download annotations and genomes with the EUtils module. Sadly, it does not support gff3, though :(
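The EUtils route for the genome sequence might look like the sketch below. It assumes `efetch` from `BioServices.EUtils` forwards its keyword arguments as E-utilities query parameters and returns an HTTP response; the helper names are invented for the example, and no accession is filled in because the thread does not name the nuccore records behind the assembly.

```julia
using BioServices.EUtils

# Fetch one nucleotide record as FASTA text. `accession` is whichever nuccore
# accession belongs to the replicon you need; none is given in this thread.
function fetch_fasta(accession::AbstractString)
    res = efetch(db = "nuccore", id = accession, rettype = "fasta", retmode = "text")
    return String(res.body)
end

# Write the record to `path` (file name chosen by the caller).
function save_fasta(accession::AbstractString, path::AbstractString)
    write(path, fetch_fasta(accession))
    return path
end
```

This only covers the sequence. As noted above, the annotation would still have to come from somewhere else if GFF3 is required.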

# download reads and convert them to the fastq format. The reads in sample SRR1602510 come from a TEX-treated
# sample and can be used to identify primary transcription start sites.

download_prefetch(["SRR1602510"], @__DIR__)
Member

Is this defined in your package? I think BioServices.jl has some related functionality.

Author

The function uses prefetch to download the SRA-internal format and converts it to a fastq file using fastq-dump. I had another look at BioServices.jl and, as I understand it, the EUtils module can be used to search SRA and also retrieve links to the files, but I could not find anything in there that would allow conversion to a readable format. Did I miss it?
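For readers who know sra-tools from the command line, the two steps described here correspond roughly to the commands below. This is a sketch of the idea, not the actual body of `download_prefetch`: the helper names are hypothetical, and the `<dir>/<accession>/<accession>.sra` layout written by prefetch is an assumption based on recent sra-tools versions.

```julia
# Step 1: download the SRA-internal file for one accession into `dir`.
prefetch_cmd(acc::AbstractString, dir::AbstractString) =
    `prefetch $acc --output-directory $dir`

# Step 2: convert the downloaded .sra file to a gzipped fastq file in `dir`.
function fastq_dump_cmd(acc::AbstractString, dir::AbstractString)
    sra = joinpath(dir, acc, acc * ".sra")   # assumed prefetch output layout
    return `fastq-dump --gzip --outdir $dir $sra`
end

# Run both steps for every accession (requires sra-tools on the PATH).
function download_fastq(accessions, dir::AbstractString)
    for acc in accessions
        run(prefetch_cmd(acc, dir))
        run(fastq_dump_cmd(acc, dir))
    end
end

p = prefetch_cmd("SRR1602510", "reads")
f = fastq_dump_cmd("SRR1602510", "reads")
```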

RNASeqTools_TSS/rnaseqtools_tss.jl (two outdated threads, resolved)
maltesie and others added 3 commits April 24, 2023 21:34
Co-authored-by: Kevin Bonham <[email protected]>
Co-authored-by: Kevin Bonham <[email protected]>
Co-authored-by: Kevin Bonham <[email protected]>
@maltesie
Author

Thanks a lot already for the review. I will look into packaging the binary dependencies and fix the comment style consistently in the next few days :)

@maltesie
Author

Hi @kescobo

Sorry for the long delay, I couldn't find the time to work on this. Today I managed to make an artifact for one of my dependencies, bwa-mem2: https://github.com/maltesie/bwamem2_jll.jl

I'm working on setting up binary packages for sra-tools and fastp (nearly through) as well. Will let you know when I'm done :)

I didn't adjust any code in RNASeqTools yet to actually use the artifacts, though. I hope I can do this over the weekend.

@maltesie
Author

bwamem2_jll is now a registered binary package and I adjusted RNASeqTools to use it. But I can't get sra-tools to compile with BinaryBuilder. I will look further into using BioServices for downloading, and otherwise check whether I can use CondaPkg for that, as you suggested. I hope I can update the notebook during the next week.

@kescobo
Member

kescobo commented May 22, 2023

Rad! CC @M-PERSIC - might be an opportunity to team up here

@M-PERSIC
Member

M-PERSIC commented Jul 11, 2023

Hi @maltesie!
Yesterday I announced the BioJuliaDocs initiative, a project for creating a centralized landing page for BioJulia with tutorials and code snippets. This might remove the need for a separate BioTutorials package, as we would include any relevant examples in the site.
I'd love to collaborate with you on updating some examples to be included. You can reach out to me on the BioJuliaDocs repo linked above or we can discuss on Slack at the #biology channel with other community members or even privately at my email address.

Labels
Updating / compat Updating an old tutorial to current versions of packages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update RNAseq tutorial
3 participants