Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

troublet command error #212

Open
zchatt opened this issue Nov 10, 2023 · 7 comments
Open

troublet command error #212

zchatt opened this issue Nov 10, 2023 · 7 comments

Comments

@zchatt
Copy link

zchatt commented Nov 10, 2023

Thank you for this great tool. I am trialing to determine whether we can pool cell-lines for 10X runs and deconvolute using variants. To test I have a xenograft snRNAseq run (10X v3) that is from a single-cell line which I would like to detect using the .vcf from WES of the cell-line. The human population of this xenograft is small and there are only 212 barcodes wihtin my filtered_feature_bc_matrix/barcodes.tsv

I have run (singularity) without specifying "--known_genotypes" or "--known_genotypes_sample_names" and the pipeline runs to completion.

However, when I specify "--known_genotypes" and "--known_genotypes_sample_names" (as below), the pipeline fails. The error message is from the troublets command but I believe something is happening at the vartix step.

There are 4203 variants within my common_variants_covered.vcf. My "--known_genotypes" vcf (WES) has 93219 variants, but the alt.mtx and ref.mtx look strange i.e. dont start from 1 and only have 89 lines.

Scratching my head how to troubleshoot. Any insights would be greatly appreciated.

`# Run script

soup_or_cell_sif="/home/zac/souporcell/souporcell_latest.sif"
num_clusters=4
num_threads_to_use=24
output_dir_name="/data/zac/asap_snrnaseq/souporcell/run_HJ377DRX2_12w_RM35vcf4"
reference_fasta="/data/zac/asap_snrnaseq/reference/hg38.fa"
barcodes_tsv="/data/zac/asap_snrnaseq/snrnaseq/run_HJ377DRX2_12w/outs/filtered_feature_bc_matrix/barcodes.tsv"
bam_file="/data/zac/asap_snrnaseq/snrnaseq/run_HJ377DRX2_12w/outs/possorted_genome_bam.bam"
vcf="/data/zac/asap_snrnaseq/souporcell/merged_ppmi.feb.1.2015_RM35.vcf"
export SINGULARITY_BIND="/data/zac/"

mkdir -p $output_dir_name
cd $output_dir_name

singularity exec $soup_or_cell_sif souporcell_pipeline.py
-i $bam_file
-b $barcodes_tsv
-f $reference_fasta
-t $num_threads_to_use
-o $output_dir_name
-k $num_clusters
--known_genotypes $vcf
--known_genotypes_sample_names 7_21_RM35 PPMI_SI_3868`

`# Error Message

Traceback (most recent call last):
File "/opt/souporcell/souporcell_pipeline.py", line 596, in
doublets(args, ref_mtx, alt_mtx, cluster_file)
File "/opt/souporcell/souporcell_pipeline.py", line 541, in doublets
subprocess.check_call([directory+"/troublet/target/release/troublet", "--alts", alt_mtx, "--refs", ref_mtx, "--clusters", cluster_file], stdout = dub, stderr = err)
File "/usr/local/envs/py36/lib/python3.6/subprocess.py", line 311, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/souporcell/troublet/target/release/troublet', '--alts', '/data/zac/asap_snrnaseq/souporcell/run_HJ377DRX2_12w_RM35vcf4/alt.mtx', '--refs', '/data/zac/asap_snrnaseq/souporcell/run_HJ377DRX2_12w_RM35vcf4/ref.mtx', '--clusters', '/data/zac/asap_snrnaseq/souporcell/run_HJ377DRX2_12w_RM35vcf4/clusters_tmp.tsv']' returned non-zero exit status 101.`

`head alt.mtx

%%MatrixMarket matrix coordinate real general
% written by sprs
4203 212 86
316 5 0
316 10 0
316 17 0
316 24 0
316 33 0
316 36 0
316 43 0

head clusters_tmp.tsv

AAACCCAAGCTGGCCT-1 0 -0.6931472 -0.6931472
AACCAACTCTCTATGT-1 0 -0.6931472 -0.6931472
AACGGGAAGTGATCGG-1 0 -0.6931472 -0.6931472
AACGTCAAGCATTGTC-1 0 -0.6931472 -0.6931472
AAGCCATCAATACGCT-1 0 -0.6931472 -0.6931472
AATGAAGTCGTGGACC-1 0 -0.6931472 -0.6931472
AATTCCTGTTTGACAC-1 0 -0.6931472 -0.6931472
AATTTCCCACGTTGGC-1 0 -0.6931472 -0.6931472
AATTTCCCATTAAAGG-1 0 -0.6931472 -0.6931472
ACAACCACAGGCGATA-1 0 -0.6931472 -0.6931472

head clusters.tsv

barcode status assignment log_prob_singleton log_prob_doublet cluster0 cluster1`

@wheaton5
Copy link
Owner

Make sure the known genotype vcf is on the same reference as the fasta given (like hg19 vs grch38) and that chrom naming convention is the same (chr1 vs 1). This is most likely. Its finding near 0 overlap in variants. Otherwise paste all .err files here.

@zchatt
Copy link
Author

zchatt commented Nov 10, 2023

Thanks for quick reply! Yes they are both "chr", ucsc hg38 reference. I've LiftoverVcf (picard) from hg19.

Please see error files attached. Thanks for your help, greatly appreciated.

err_files.zip

@wheaton5
Copy link
Owner

hmm, those arent useful. The problem is that there are essentially no variants detected. I give the known_genotypes vcf to vartrix and it looks in those regions.

the command is
cmd = ["vartrix", "--mapq", "30", "-b", final_bam, "-c", barcodes, "--scoring-method", "coverage", "--threads", str(args.threads),
"--ref-matrix", ref_mtx, "--out-matrix", alt_mtx, "-v", final_vcf, "--fasta", args.fasta]

where final_vcf is a depth filtered (>4 and <100,000) version of known_genotypes vcf. This file should be called common_variants_covered.vcf (same name using common variants or known genotypes). See if it is there and how many non-header lines there are (cat filename | grep -v "#" | wc -l)

@wheaton5
Copy link
Owner

if that vcf has lots of variants in it, you can try to rerun the vartrix command yourself. Possibly there was a problem in the liftover? Idk

@wheaton5
Copy link
Owner

yeah, because even if it only finds ref alleles, vartrix has output for every location in the input vcf which is the depth filtered vcf (using depth from the bam, not from the dp field or something like that in the vcf)

@wheaton5
Copy link
Owner

for the known genotypes loci your bam must not have much coverage or it has way too much coverage

@wheaton5
Copy link
Owner

how big is your bam after remapping? and before remapping (your input bam)? when using common variants or known genotypes, remapping is much less needed and you can use --skip_remap TRUE

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants