-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
troublet command error #212
Comments
Make sure the known genotype vcf is on the same reference as the fasta given (like hg19 vs grch38) and that chrom naming convention is the same (chr1 vs 1). This is most likely. Its finding near 0 overlap in variants. Otherwise paste all .err files here. |
Thanks for quick reply! Yes they are both "chr", ucsc hg38 reference. I've LiftoverVcf (picard) from hg19. Please see error files attached. Thanks for your help, greatly appreciated. |
hmm, those arent useful. The problem is that there are essentially no variants detected. I give the known_genotypes vcf to vartrix and it looks in those regions. the command is where final_vcf is a depth filtered (>4 and <100,000) version of known_genotypes vcf. This file should be called common_variants_covered.vcf (same name using common variants or known genotypes). See if it is there and how many non-header lines there are (cat filename | grep -v "#" | wc -l) |
if that vcf has lots of variants in it, you can try to rerun the vartrix command yourself. Possibly there was a problem in the liftover? Idk |
yeah, because even if it only finds ref alleles, vartrix has output for every location in the input vcf which is the depth filtered vcf (using depth from the bam, not from the dp field or something like that in the vcf) |
for the known genotypes loci your bam must not have much coverage or it has way too much coverage |
how big is your bam after remapping? and before remapping (your input bam)? when using common variants or known genotypes, remapping is much less needed and you can use --skip_remap TRUE |
Thank you for this great tool. I am trialing to determine whether we can pool cell-lines for 10X runs and deconvolute using variants. To test I have a xenograft snRNAseq run (10X v3) that is from a single-cell line which I would like to detect using the .vcf from WES of the cell-line. The human population of this xenograft is small and there are only 212 barcodes wihtin my filtered_feature_bc_matrix/barcodes.tsv
I have run (singularity) without specifying "--known_genotypes" or "--known_genotypes_sample_names" and the pipeline runs to completion.
However, when I specify "--known_genotypes" and "--known_genotypes_sample_names" (as below), the pipeline fails. The error message is from the troublets command but I believe something is happening at the vartix step.
There are 4203 variants within my common_variants_covered.vcf. My "--known_genotypes" vcf (WES) has 93219 variants, but the alt.mtx and ref.mtx look strange i.e. dont start from 1 and only have 89 lines.
Scratching my head how to troubleshoot. Any insights would be greatly appreciated.
`# Run script
soup_or_cell_sif="/home/zac/souporcell/souporcell_latest.sif"
num_clusters=4
num_threads_to_use=24
output_dir_name="/data/zac/asap_snrnaseq/souporcell/run_HJ377DRX2_12w_RM35vcf4"
reference_fasta="/data/zac/asap_snrnaseq/reference/hg38.fa"
barcodes_tsv="/data/zac/asap_snrnaseq/snrnaseq/run_HJ377DRX2_12w/outs/filtered_feature_bc_matrix/barcodes.tsv"
bam_file="/data/zac/asap_snrnaseq/snrnaseq/run_HJ377DRX2_12w/outs/possorted_genome_bam.bam"
vcf="/data/zac/asap_snrnaseq/souporcell/merged_ppmi.feb.1.2015_RM35.vcf"
export SINGULARITY_BIND="/data/zac/"
mkdir -p $output_dir_name
cd $output_dir_name
singularity exec $soup_or_cell_sif souporcell_pipeline.py
-i $bam_file
-b $barcodes_tsv
-f $reference_fasta
-t $num_threads_to_use
-o $output_dir_name
-k $num_clusters
--known_genotypes $vcf
--known_genotypes_sample_names 7_21_RM35 PPMI_SI_3868`
`# Error Message
Traceback (most recent call last):
File "/opt/souporcell/souporcell_pipeline.py", line 596, in
doublets(args, ref_mtx, alt_mtx, cluster_file)
File "/opt/souporcell/souporcell_pipeline.py", line 541, in doublets
subprocess.check_call([directory+"/troublet/target/release/troublet", "--alts", alt_mtx, "--refs", ref_mtx, "--clusters", cluster_file], stdout = dub, stderr = err)
File "/usr/local/envs/py36/lib/python3.6/subprocess.py", line 311, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/souporcell/troublet/target/release/troublet', '--alts', '/data/zac/asap_snrnaseq/souporcell/run_HJ377DRX2_12w_RM35vcf4/alt.mtx', '--refs', '/data/zac/asap_snrnaseq/souporcell/run_HJ377DRX2_12w_RM35vcf4/ref.mtx', '--clusters', '/data/zac/asap_snrnaseq/souporcell/run_HJ377DRX2_12w_RM35vcf4/clusters_tmp.tsv']' returned non-zero exit status 101.`
`head alt.mtx
%%MatrixMarket matrix coordinate real general
% written by sprs
4203 212 86
316 5 0
316 10 0
316 17 0
316 24 0
316 33 0
316 36 0
316 43 0
head clusters_tmp.tsv
AAACCCAAGCTGGCCT-1 0 -0.6931472 -0.6931472
AACCAACTCTCTATGT-1 0 -0.6931472 -0.6931472
AACGGGAAGTGATCGG-1 0 -0.6931472 -0.6931472
AACGTCAAGCATTGTC-1 0 -0.6931472 -0.6931472
AAGCCATCAATACGCT-1 0 -0.6931472 -0.6931472
AATGAAGTCGTGGACC-1 0 -0.6931472 -0.6931472
AATTCCTGTTTGACAC-1 0 -0.6931472 -0.6931472
AATTTCCCACGTTGGC-1 0 -0.6931472 -0.6931472
AATTTCCCATTAAAGG-1 0 -0.6931472 -0.6931472
ACAACCACAGGCGATA-1 0 -0.6931472 -0.6931472
head clusters.tsv
barcode status assignment log_prob_singleton log_prob_doublet cluster0 cluster1`
The text was updated successfully, but these errors were encountered: