troublet command error #212

zchatt · 2023-11-10T05:49:56Z

Thank you for this great tool. I am trialing to determine whether we can pool cell-lines for 10X runs and deconvolute using variants. To test I have a xenograft snRNAseq run (10X v3) that is from a single-cell line which I would like to detect using the .vcf from WES of the cell-line. The human population of this xenograft is small and there are only 212 barcodes wihtin my filtered_feature_bc_matrix/barcodes.tsv

I have run (singularity) without specifying "--known_genotypes" or "--known_genotypes_sample_names" and the pipeline runs to completion.

However, when I specify "--known_genotypes" and "--known_genotypes_sample_names" (as below), the pipeline fails. The error message is from the troublets command but I believe something is happening at the vartix step.

There are 4203 variants within my common_variants_covered.vcf. My "--known_genotypes" vcf (WES) has 93219 variants, but the alt.mtx and ref.mtx look strange i.e. dont start from 1 and only have 89 lines.

Scratching my head how to troubleshoot. Any insights would be greatly appreciated.

`# Run script

soup_or_cell_sif="/home/zac/souporcell/souporcell_latest.sif"
num_clusters=4
num_threads_to_use=24
output_dir_name="/data/zac/asap_snrnaseq/souporcell/run_HJ377DRX2_12w_RM35vcf4"
reference_fasta="/data/zac/asap_snrnaseq/reference/hg38.fa"
barcodes_tsv="/data/zac/asap_snrnaseq/snrnaseq/run_HJ377DRX2_12w/outs/filtered_feature_bc_matrix/barcodes.tsv"
bam_file="/data/zac/asap_snrnaseq/snrnaseq/run_HJ377DRX2_12w/outs/possorted_genome_bam.bam"
vcf="/data/zac/asap_snrnaseq/souporcell/merged_ppmi.feb.1.2015_RM35.vcf"
export SINGULARITY_BIND="/data/zac/"

mkdir -p $output_dir_name
cd $output_dir_name

singularity exec $soup_or_cell_sif souporcell_pipeline.py
-i $bam_file
-b $barcodes_tsv
-f $reference_fasta
-t $num_threads_to_use
-o $output_dir_name
-k $num_clusters
--known_genotypes $vcf
--known_genotypes_sample_names 7_21_RM35 PPMI_SI_3868`

`# Error Message

Traceback (most recent call last):
File "/opt/souporcell/souporcell_pipeline.py", line 596, in
doublets(args, ref_mtx, alt_mtx, cluster_file)
File "/opt/souporcell/souporcell_pipeline.py", line 541, in doublets
subprocess.check_call([directory+"/troublet/target/release/troublet", "--alts", alt_mtx, "--refs", ref_mtx, "--clusters", cluster_file], stdout = dub, stderr = err)
File "/usr/local/envs/py36/lib/python3.6/subprocess.py", line 311, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/souporcell/troublet/target/release/troublet', '--alts', '/data/zac/asap_snrnaseq/souporcell/run_HJ377DRX2_12w_RM35vcf4/alt.mtx', '--refs', '/data/zac/asap_snrnaseq/souporcell/run_HJ377DRX2_12w_RM35vcf4/ref.mtx', '--clusters', '/data/zac/asap_snrnaseq/souporcell/run_HJ377DRX2_12w_RM35vcf4/clusters_tmp.tsv']' returned non-zero exit status 101.`

`head alt.mtx

%%MatrixMarket matrix coordinate real general
% written by sprs
4203 212 86
316 5 0
316 10 0
316 17 0
316 24 0
316 33 0
316 36 0
316 43 0

head clusters_tmp.tsv

AAACCCAAGCTGGCCT-1 0 -0.6931472 -0.6931472
AACCAACTCTCTATGT-1 0 -0.6931472 -0.6931472
AACGGGAAGTGATCGG-1 0 -0.6931472 -0.6931472
AACGTCAAGCATTGTC-1 0 -0.6931472 -0.6931472
AAGCCATCAATACGCT-1 0 -0.6931472 -0.6931472
AATGAAGTCGTGGACC-1 0 -0.6931472 -0.6931472
AATTCCTGTTTGACAC-1 0 -0.6931472 -0.6931472
AATTTCCCACGTTGGC-1 0 -0.6931472 -0.6931472
AATTTCCCATTAAAGG-1 0 -0.6931472 -0.6931472
ACAACCACAGGCGATA-1 0 -0.6931472 -0.6931472

head clusters.tsv

barcode status assignment log_prob_singleton log_prob_doublet cluster0 cluster1`

wheaton5 · 2023-11-10T10:34:29Z

Make sure the known genotype vcf is on the same reference as the fasta given (like hg19 vs grch38) and that chrom naming convention is the same (chr1 vs 1). This is most likely. Its finding near 0 overlap in variants. Otherwise paste all .err files here.

zchatt · 2023-11-10T21:20:30Z

Thanks for quick reply! Yes they are both "chr", ucsc hg38 reference. I've LiftoverVcf (picard) from hg19.

Please see error files attached. Thanks for your help, greatly appreciated.

err_files.zip

wheaton5 · 2023-11-10T22:01:17Z

hmm, those arent useful. The problem is that there are essentially no variants detected. I give the known_genotypes vcf to vartrix and it looks in those regions.

the command is
cmd = ["vartrix", "--mapq", "30", "-b", final_bam, "-c", barcodes, "--scoring-method", "coverage", "--threads", str(args.threads),
"--ref-matrix", ref_mtx, "--out-matrix", alt_mtx, "-v", final_vcf, "--fasta", args.fasta]

where final_vcf is a depth filtered (>4 and <100,000) version of known_genotypes vcf. This file should be called common_variants_covered.vcf (same name using common variants or known genotypes). See if it is there and how many non-header lines there are (cat filename | grep -v "#" | wc -l)

wheaton5 · 2023-11-10T22:02:14Z

if that vcf has lots of variants in it, you can try to rerun the vartrix command yourself. Possibly there was a problem in the liftover? Idk

wheaton5 · 2023-11-10T22:03:37Z

yeah, because even if it only finds ref alleles, vartrix has output for every location in the input vcf which is the depth filtered vcf (using depth from the bam, not from the dp field or something like that in the vcf)

wheaton5 · 2023-11-10T22:04:28Z

for the known genotypes loci your bam must not have much coverage or it has way too much coverage

wheaton5 · 2023-11-10T22:05:23Z

how big is your bam after remapping? and before remapping (your input bam)? when using common variants or known genotypes, remapping is much less needed and you can use --skip_remap TRUE

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

troublet command error #212

troublet command error #212

zchatt commented Nov 10, 2023

wheaton5 commented Nov 10, 2023

zchatt commented Nov 10, 2023

wheaton5 commented Nov 10, 2023

wheaton5 commented Nov 10, 2023

wheaton5 commented Nov 10, 2023

wheaton5 commented Nov 10, 2023

wheaton5 commented Nov 10, 2023

troublet command error #212

troublet command error #212

Comments

zchatt commented Nov 10, 2023

wheaton5 commented Nov 10, 2023

zchatt commented Nov 10, 2023

wheaton5 commented Nov 10, 2023

wheaton5 commented Nov 10, 2023

wheaton5 commented Nov 10, 2023

wheaton5 commented Nov 10, 2023

wheaton5 commented Nov 10, 2023