Add support to STAR-mapped RNA-seq bam as input #12

Open
HUNNNGRY opened this issue Jan 5, 2024 · 2 comments

@HUNNNGRY

HUNNNGRY commented Jan 5, 2024

Hi, I know this useful tool was mainly developed for DNA elements. I'm wondering whether it's possible to add support for split reads, such as STAR-mapped RNA-seq BAM files, as input. Say I want to explore the Vplot of these RNA reads around genomic RBP-binding sites. As far as I can tell, the key is locating the true read center (center-of-transcript) of intron-spanning reads instead of just taking the mean of the genomic start and end coordinates (center-of-genomeSpan); i.e., introns should be taken into account when calculating the center-of-transcript.

I think the code below:
https://github.com/js2264/VplotR/blob/6abac9439399b328ad9ed1a417a78188b4a1d068/R/vmat_utils.R#L66C17-L66C49

could be replaced by another function that supports locating the BED12 mid position:
getBed12MidPosGr <- function(gr) {
    # Assumes `gr` holds a single BED12 record as a GRanges object, with its
    # exon blocks in `gr$blocks` (1-based ranges relative to the record start).
    blocks <- gr$blocks[[1]]
    block_sizes <- width(blocks)
    # Spliced length accumulated before each block (leading 0)
    block_sizes_cumsum <- c(0, cumsum(block_sizes))
    # 0-based genomic start of each block
    exon_starts <- (start(gr)[1] - 1) + (start(blocks) - 1)

    # Half of the spliced (intron-free) length
    halfSize <- round(sum(block_sizes) / 2)
    # Index of the block that contains the spliced midpoint
    midIdx <- which(block_sizes_cumsum > halfSize)[1] - 1
    # Offset into that block, mapped back to a 0-based genomic coordinate
    midPos <- halfSize - block_sizes_cumsum[midIdx] + exon_starts[midIdx]
    return(round(midPos))
}
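
To illustrate the difference between the two centers, here is a minimal base-R sketch on made-up numbers. It does not use VplotR or GenomicRanges; the helper name `spliced_mid` and the example coordinates are hypothetical, and the inputs are assumed to follow the BED12 convention (0-based `chromStart`, `blockStarts` as 0-based offsets from it).

```r
# Hypothetical helper: spliced midpoint of one BED12-style record.
# chrom_start:  0-based start of the record
# block_sizes:  BED12 blockSizes
# block_starts: BED12 blockStarts (0-based offsets from chrom_start)
spliced_mid <- function(chrom_start, block_sizes, block_starts) {
    cum_before <- c(0, cumsum(block_sizes))  # spliced length before each block
    half <- sum(block_sizes) %/% 2           # half of the intron-free length
    idx <- which(cum_before > half)[1] - 1   # block containing the midpoint
    chrom_start + block_starts[idx] + (half - cum_before[idx])
}

# Made-up read: an 80 bp block and a 20 bp block separated by a 900 bp intron
chrom_start <- 1000
block_sizes <- c(80, 20)
block_starts <- c(0, 980)

# center-of-genomeSpan: midpoint of the genomic start and end
span_len <- block_starts[2] + block_sizes[2]
span_mid <- chrom_start + span_len %/% 2

# center-of-transcript: midpoint after skipping the intron
mid <- spliced_mid(chrom_start, block_sizes, block_starts)

print(c(span_mid = span_mid, spliced_mid = mid))
```

With these numbers the genomic-span midpoint (1500) lands inside the intron, while the spliced midpoint (1050) stays within the first block.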

@js2264
Owner

js2264 commented Jan 8, 2024

Hi @HUNNNGRY, thanks for your issue. I feel like what you refer to is importing paired-end BAM files as fragments, which is something you can achieve e.g. with importPEBamFiles(), but I don't have much experience with the split-read BAM format, so maybe I'm missing something. Could you please provide a sample dataset (bed12?) so I can see how it is structured?

@HUNNNGRY
Author

Sorry for my delayed response.
Yes, you could consider it an optimization for single-end BAM files with split reads, or BED12, as input.
Here is a link to my example BAM and BED12 files (Google Drive)
