Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"negative length vectors are not allowed" ? #41

Closed
ireneg opened this issue Sep 25, 2016 · 2 comments
Closed

"negative length vectors are not allowed" ? #41

ireneg opened this issue Sep 25, 2016 · 2 comments

Comments

@ireneg
Copy link

ireneg commented Sep 25, 2016

Hi guys,

I'm trying to simulate sequence under a bunch of different conditions. The vast majority of them work, but I've been hitting an uninformative error message with others:

extraVars <- commandArgs(trailingOnly=T)
#extraVars <- c("ortho_FALSE", 0.93, 50) # for testing only

library(polyester)
library(Biostrings)

setwd("~/orthoexon/simNGS/hg38_macFas5")

# FASTA annotation
hsFastaPath <- paste0(getwd(), "/fasta/hg38_ensembl84_orthoexon_", extraVars[2], "pc_", extraVars[1], ".fa")
hsFasta <- readDNAStringSet(hsFastaPath)

# ~20x coverage ----> reads per transcript = transcriptlength/readlength * 20
# here all transcripts will have ~equal FPKM
hsDepth <- round(20 * width(hsFasta) / 100)

# Simulate humans:
dir.create(file.path(paste0(getwd(), "/polyester/", extraVars[1], "_", extraVars[2], "_", extraVars[3], "bp/hg38")), showWarnings = F, recursive=T)
    simulate_experiment(hsFastaPath, reads_per_transcript=hsDepth, num_reps=10, fold_changes=matrix(1, nrow=length(hsFasta)), outdir=paste0(getwd(), "/polyester/", extraVars[1], "_", extraVars[2], "_", extraVars[3], "bp/hg38"), readlen=as.numeric(extraVars[3]), paired=F, seed=round(11051984*as.numeric(extraVars[2])))

Often times, simulate_experiment fails fairly quickly with this error message:

Error in .Call(.NAME, ..., PACKAGE = PACKAGE) : 
  negative length vectors are not allowed

I launch many jobs with the same script, for multiple values of extraVars[2], and only some are repeatedly and predictably failing; extraVars[3] is always either 50 or 100. I realise the problem is probably coming from my fasta input, but the files I use differ by thousands of lines and checking them manually is not really going to work, so any intuition you might have as to what is causing the error would really help me clean them up - thank you!!

(And, not to be captain obvious here, but here are the lengths for one of the failed runs:

summary(width(hsFasta))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     30     293     793    1486    2090   98920 

summary(hsDepth)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    6.0    59.0   159.0   297.1   418.0 19780.0 

there are no obvious negative lengths that I can find.)

Oh, also - for some reason this command yields 20 replicates instead of 10 per run of simulate_experiment. Any idea why? Not that I don't appreciate having twice as many samples as I need, I just couldn't find the answer in the manual.

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.6 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] Biostrings_2.40.2   XVector_0.12.1      IRanges_2.6.1      
[4] S4Vectors_0.10.3    BiocGenerics_0.18.0 polyester_1.8.3    

loaded via a namespace (and not attached):
[1] zlibbioc_1.18.0
@alyssafrazee
Copy link
Owner

The negative length vectors not allowed error message means your R session has run out of memory. You can either try to run the simulation on a machine with more memory, or you could break up the input files into smaller input files and do partial simulations; then concatenate the output.

As for the replicates question, it looks like you're using the current release version (1.8.3), which per the docs on num_reps is the number of replicates in each group -- so if it's a single integer, you're simulating 2 groups of 10 replicates each (20 replicates).

That said, the behavior should be that the length of the vector determines the number of groups, as of a refactor I made in Dec 2014 :/ I will check on why the release version is so outdated. In the meantime, you can install the github version using the install_github function from devtools: install_github('alyssafrazee/polyester').

@ireneg
Copy link
Author

ireneg commented Sep 26, 2016

Thank you so much on both counts! I hadn't even thought about memory but our cluster is very weird in how it assigns it, so I might not be getting nearly as much as I thought, which would explain things, especially since the files that work are subsets of the ones that fail.

And yes, I had looked at your example code here on github and it seemed that you specified two groups explicitly, so that also didn't occur to me. I was just wondering where my extra factor of 2 was coming from!

Thanks again! Back to running all my other simulations now :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants