All CBs were filtered #114

Open
bettycatherine opened this issue Aug 18, 2020 · 6 comments

Comments

@bettycatherine

Hi,
I am using Bayesian filtration of UMIs, with -s in droptag and -r in dropEst. Everything seemed fine at first, but dropEst recognized no CBs. The output was as follows:

97052 CBs with more than 20 genes
top CBs:
5836 GCCACATAGTGTTCTAGTCTGTCAGATCAG
5649 GCTAACGAAACTCACCATTGGCTCGATCAG
4813 ACAAGCTAAACTCACCGTGTTCTAACTTGA
4377 GCCACATATGGCTTCAGTGTTCTACCGTCC
4352 GACTAGTACTGTAGCCGAACAGGCCTTGTA
4091 CCTAATCCAAGACGGATGGTGGTAACAGTG
3839 TCCGTCTATTCACGCATGGTGGTACTTGTA
3747 CCTAATCCTGGCTTCAGAGTTAGCTGACCA
3694 CAACCACAAATCCGTCGAGTTAGCTGACCA

Start merge: 20:47:07.
Merge initialized: 20:47:52.
Total 10000 tags processed
Total 20000 tags processed
Total 30000 tags processed
Total 40000 tags processed
Total 50000 tags processed
Total 60000 tags processed
Total 70000 tags processed
Total 80000 tags processed
Total 90000 tags processed
Total 0 cells merged
Total 97052 cells excluded
Merge finished: 20:49:17.
Merge UMIs with N's: 20:49:17.
0 cells processed. Merged 0 UMIs from 0 cells.
UMI merge finished: 20:49:17.
0 cells are considered as real.

0 CBs with more than 20 genes, which have UMIs of the requested type.
no valid CBs found

Done: 20:49:36.
WARNING: filtered cells are empty. Probably, filtration threshold is too strict or you forgot to run 'merge_and_filter'

0 genes
top genes:

Compiling diagnostic stats:
Reads per chromosome per cell;
Fill exon results
Fill intron results
Fill intergenic results
Saturation info;
Mean reads per UMI;
Merge targets;
Completed.

Compiling raw count matrix: 20:49:37.
0 genes, 0 cells.
Done: 20:49:37.
Compiling filtered count matrix: 20:49:37.
0 genes, 0 cells.
Done: 20:49:37.
Reads per UMI per gene;

Writing R data to rsi-hex.rds ...: 20:49:37.
Completed: 20:49:37.
Writing rsi-hex.mtx ...
Completed.

All done: 20:52:33.

I cannot find the "merge_and_filter" command in the dropEst tutorial. Previously I ran the whole pipeline without the Bayesian method and everything went fine, so I am wondering what the problem is.

I am using split-seq data, and here are the commands I used to run droptag and dropEst:

droptag -c split_seq.xml -n mm -S -s mm.hex.R2.fastq.gz mm.hex.R1.fastq.gz

dropest -w -M -b -G 20 -g mm.gtf -c split_seq.xml -r mm.params.gz -o mm-hex mm_merged.bam

I also have two more questions:

  1. There are four cell barcodes in total instead of three in my split-seq data; the last one is the library index. I added all of them to the config file and the barcode file. I don't know whether this is the problem.
  2. I used samtools to merge the SAM files that were split by droptag and aligned by STAR, and I ran dropEst on the merged BAM file. Is this the correct approach?

Thank you so much!

Best,

Xue

@bettycatherine
Author

bettycatherine commented Aug 19, 2020

I also tried the merge without the Bayesian method (without -s and -r), and the results were the same. Is that because I added the library index to the barcode list? Previously I went through the whole process without modifying the barcode list, and it worked fine. I think the CBs were identified correctly, but no cells were considered real:

97052 CBs with more than 20 genes
top CBs:
5836 GCCACATAGTGTTCTAGTCTGTCAGATCAG
5649 GCTAACGAAACTCACCATTGGCTCGATCAG
4813 ACAAGCTAAACTCACCGTGTTCTAACTTGA
4377 GCCACATATGGCTTCAGTGTTCTACCGTCC
4352 GACTAGTACTGTAGCCGAACAGGCCTTGTA
4091 CCTAATCCAAGACGGATGGTGGTAACAGTG
3839 TCCGTCTATTCACGCATGGTGGTACTTGTA
3747 CCTAATCCTGGCTTCAGAGTTAGCTGACCA
3694 CAACCACAAATCCGTCGAGTTAGCTGACCA

Start merge: 15:24:39.
Merge initialized: 15:24:39.
Total 0 cells merged
Total 97052 cells excluded
Merge finished: 15:25:09.
Merge UMIs with N's: 15:25:09.
0 cells processed. Merged 0 UMIs from 0 cells.
UMI merge finished: 15:25:09.
0 cells are considered as real.

This is the modified configuration file:

<config>
    <!-- droptag -->
    <TagsSearch>
        <protocol>split_seq</protocol>
        <MultipleBarcodeSearch>
            <barcode_starts>10 48 86 150</barcode_starts>
            <barcode_lengths>8 8 8 6</barcode_lengths>
            <umi_start>0</umi_start>
            <umi_length>10</umi_length>
        </MultipleBarcodeSearch>

        <Processing>
            <min_align_length>10</min_align_length>
            <reads_per_out_file>10000000</reads_per_out_file>
        </Processing>
    </TagsSearch>

    <!-- dropest -->
    <Estimation>
        <Merge>
            <barcodes_file>/gpfs/home/lvxue/tpa/split-seq/cart2/cart2/4-align/drop/split_seq</barcodes_file>
            <barcodes_type>const</barcodes_type>
            <max_cb_merge_edit_distance>4</max_cb_merge_edit_distance>
            <max_umi_merge_edit_distance>1</max_umi_merge_edit_distance>
            <min_genes_after_merge>100</min_genes_after_merge>
            <min_genes_before_merge>20</min_genes_before_merge>
        </Merge>
    </Estimation>
</config>
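
As a sanity check on the `MultipleBarcodeSearch` block above, here is a minimal sketch that reads `barcode_starts` and `barcode_lengths` and derives the layout they imply. It assumes the start positions are 0-based offsets into the barcode read, which the thread does not confirm; `barcode_layout` is a hypothetical helper, not part of dropEst.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the configuration file posted above.
CONFIG_FRAGMENT = """
<MultipleBarcodeSearch>
    <barcode_starts>10 48 86 150</barcode_starts>
    <barcode_lengths>8 8 8 6</barcode_lengths>
    <umi_start>0</umi_start>
    <umi_length>10</umi_length>
</MultipleBarcodeSearch>
"""


def barcode_layout(xml_text):
    """Return ([(start, end), ...] for each barcode segment, (start, end) for the UMI).

    Assumption: positions are 0-based and `end` is exclusive.
    """
    root = ET.fromstring(xml_text)
    starts = [int(x) for x in root.findtext("barcode_starts").split()]
    lengths = [int(x) for x in root.findtext("barcode_lengths").split()]
    if len(starts) != len(lengths):
        raise ValueError("barcode_starts and barcode_lengths differ in length")
    segments = [(s, s + n) for s, n in zip(starts, lengths)]
    umi_start = int(root.findtext("umi_start"))
    umi = (umi_start, umi_start + int(root.findtext("umi_length")))
    return segments, umi


segments, umi = barcode_layout(CONFIG_FRAGMENT)

# Total length of the concatenated cell barcode.
cb_length = sum(end - start for start, end in segments)

# Shortest barcode read that still covers every configured segment.
min_read_length = max(end for _, end in segments + [umi])

print(segments)         # [(10, 18), (48, 56), (86, 94), (150, 156)]
print(cb_length)        # 30
print(min_read_length)  # 156
```

The concatenated length of 30 agrees with the 30-nt CBs in the "top CBs" log. Under the 0-based assumption, the last segment needs a barcode read of at least 156 nt, so it may be worth checking the actual read length in the barcode FASTQ.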

And this is the modified barcode file:
ACAGTGGT ACTTGATG ATCACGTT CAGATCTG CGATGTTT CTTGTACT GAATCTGT GACCTTAG GACGGATT GAGCCAAT GAGGATGG GAGGTGCT GATAGAGG GATCAGCG GATCTCTT GATTCATC GCAACATT GCAATCCG GCACTGTC GCATGGCT GCCAATGT GCCTGTTC GCTAACTC GCTCCTTG GGAATGAT GGATTAGG GGCTACAG GGTCGTGT GGTGAGTT GTAAGGTG GTACATCT GTCGCTAT GTCTTGGC GTGTCCTT GTGTGTCG GTTAGCCT GTTGTCGG TAACGCTG TAAGCGTT TAAGTTCG TACAGGAT TACCACCA TACCGAGC TACTAGTC TACTTCGG TAGAACAC TAGACGGA TAGCTTGT TAGTCTTG TAGTGACT TATGCCAG TATGTGGC TCAGATTC TCAGGAGG TCATCCTA TCATTGAG TCCAGTCG TCCGTCTT TCCTCAAT TCGAAGTG TCGAGCGT TCGTTAGC TCTACGAC TCTCACGG TCTCGGTT TCTCTTCA TCTGCTGT TGAACTGG TGAAGCCA TGACAGAC TGACCACT TGATACGT TGCATAGT TGCGATCT TGCGTGAA TGCTGATA TGGCTCAG TGGTTGTT TGTACCTT TGTATGCG TGTCTATC TGTGAAGA TGTGGTTG TGTTCTCC TTACTCGC TTAGGCAT TTCAGCTC TTCCATTG TTCCTGCT TTCGCACC TTCTGTGT TTGACTCT TTGCGTAC TTGGAGGT TTGGTATG TTGTTCCA
ACAGTGGT ACTTGATG ATCACGTT CAGATCTG CGATGTTT CTTGTACT GAATCTGT GACCTTAG GACGGATT GAGCCAAT GAGGATGG GAGGTGCT GATAGAGG GATCAGCG GATCTCTT GATTCATC GCAACATT GCAATCCG GCACTGTC GCATGGCT GCCAATGT GCCTGTTC GCTAACTC GCTCCTTG GGAATGAT GGATTAGG GGCTACAG GGTCGTGT GGTGAGTT GTAAGGTG GTACATCT GTCGCTAT GTCTTGGC GTGTCCTT GTGTGTCG GTTAGCCT GTTGTCGG TAACGCTG TAAGCGTT TAAGTTCG TACAGGAT TACCACCA TACCGAGC TACTAGTC TACTTCGG TAGAACAC TAGACGGA TAGCTTGT TAGTCTTG TAGTGACT TATGCCAG TATGTGGC TCAGATTC TCAGGAGG TCATCCTA TCATTGAG TCCAGTCG TCCGTCTT TCCTCAAT TCGAAGTG TCGAGCGT TCGTTAGC TCTACGAC TCTCACGG TCTCGGTT TCTCTTCA TCTGCTGT TGAACTGG TGAAGCCA TGACAGAC TGACCACT TGATACGT TGCATAGT TGCGATCT TGCGTGAA TGCTGATA TGGCTCAG TGGTTGTT TGTACCTT TGTATGCG TGTCTATC TGTGAAGA TGTGGTTG TGTTCTCC TTACTCGC TTAGGCAT TTCAGCTC TTCCATTG TTCCTGCT TTCGCACC TTCTGTGT TTGACTCT TTGCGTAC TTGGAGGT TTGGTATG TTGTTCCA
ACAGTGGT ACTTGATG ATCACGTT CAGATCTG CGATGTTT CTTGTACT GAATCTGT GACCTTAG GACGGATT GAGCCAAT GAGGATGG GAGGTGCT GATAGAGG GATCAGCG GATCTCTT GATTCATC GCAACATT GCAATCCG GCACTGTC GCATGGCT GCCAATGT GCCTGTTC GCTAACTC GCTCCTTG GGAATGAT GGATTAGG GGCTACAG GGTCGTGT GGTGAGTT GTAAGGTG GTACATCT GTCGCTAT GTCTTGGC GTGTCCTT GTGTGTCG GTTAGCCT GTTGTCGG TAACGCTG TAAGCGTT TAAGTTCG TACAGGAT TACCACCA TACCGAGC TACTAGTC TACTTCGG TAGAACAC TAGACGGA TAGCTTGT TAGTCTTG TAGTGACT TATGCCAG TATGTGGC TCAGATTC TCAGGAGG TCATCCTA TCATTGAG TCCAGTCG TCCGTCTT TCCTCAAT TCGAAGTG TCGAGCGT TCGTTAGC TCTACGAC TCTCACGG TCTCGGTT TCTCTTCA TCTGCTGT TGAACTGG TGAAGCCA TGACAGAC TGACCACT TGATACGT TGCATAGT TGCGATCT TGCGTGAA TGCTGATA TGGCTCAG TGGTTGTT TGTACCTT TGTATGCG TGTCTATC TGTGAAGA TGTGGTTG TGTTCTCC TTACTCGC TTAGGCAT TTCAGCTC TTCCATTG TTCCTGCT TTCGCACC TTCTGTGT TTGACTCT TTGCGTAC TTGGAGGT TTGGTATG TTGTTCCA
ATCACG CGATGT TTAGGC TGACCA ACAGTG GCCAAT CAGATC ACTTGA GATCAG TAGCTT GGCTAC CTTGTA AGTCAA AGTTCC ATGTCA CCGTCC

And this is part of the BAM file used in dropEst:
QQYI316289273!TGAAGAGAAACTCACCAGATGTACACTTGA#TTAGACGAGT 0 scaffold0001 584 255 150M * 0 0 CTCCTGAATAGCGTTTGG
QQYI324155127!TGAAGAGAAACTCACCAGATGTACACTTGA#TTAGACGAGT 0 scaffold0001 671 255 150M * 0 0 CTGGTAACACCATCCGAA
QQYI314950512!TGAAGAGAAACTCACCAGATGTACACTTGA#TTAGACGAGT 0 scaffold0001 693 255 150M * 0 0 GAAAACATCTGACCACAT
QQYI316271302!TGAAGAGAAACTCACCAGATGTACACTTGA#TTAGACGAGT 0 scaffold0001 809 255 105M45S * 0 0 CAAGAGCTCTATAACACA
QQYI316539348!TGAAGAGAAACTCACCAGATGTACACTTGA#TTAGACGAGT 0 scaffold0001 817 255 97M53S * 0 0 CTATAACACATTTTAGAA
QQYI316539529!TGAAGAGAAACTCACCAGATGTACACTTGA#TTAGACGAGT 0 scaffold0001 817 255 97M53S * 0 0 CTATAACACATTTTAGAA
QQYI317404648!TGAAGAGAAACTCACCAGATGTACACTTGA#TTAGACGAGT 0 scaffold0001 817 255 97M53S * 0 0 CTATAACACATTTTAGAA

So is there something wrong with the barcode list?
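
One way to probe that question is to split a CB from the BAM read names into its 8/8/8/6 segments and look each one up in the barcode file. The sketch below does this for the first read in the excerpt, and also tries the reverse complement. The 8-mer set is only a small subset of the posted list, the helper names are hypothetical, and the `<id>!<CB>#<UMI>` read-name layout is inferred from the excerpt rather than from dropEst documentation.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def split_read_name(name):
    """Split '<id>!<CB>#<UMI>' (layout inferred from the BAM excerpt)."""
    read_id, rest = name.split("!", 1)
    cb, umi = rest.split("#", 1)
    return read_id, cb, umi


def classify_segments(cb, lengths, whitelists):
    """Cut `cb` into consecutive segments and report how each matches its list."""
    results, pos = [], 0
    for n, whitelist in zip(lengths, whitelists):
        segment = cb[pos:pos + n]
        pos += n
        if segment in whitelist:
            results.append((segment, "as listed"))
        elif reverse_complement(segment) in whitelist:
            results.append((segment, "reverse complement"))
        else:
            results.append((segment, "not found"))
    return results


# Subset of the 8-mers from the posted barcode file (first three lines are identical).
ROUND_BARCODES = {"TCTCTTCA", "GGTGAGTT", "GTACATCT", "TATGTGGC"}
# Fourth line of the posted barcode file (library index).
INDEX_BARCODES = set(
    "ATCACG CGATGT TTAGGC TGACCA ACAGTG GCCAAT CAGATC ACTTGA "
    "GATCAG TAGCTT GGCTAC CTTGTA AGTCAA AGTTCC ATGTCA CCGTCC".split()
)

name = "QQYI316289273!TGAAGAGAAACTCACCAGATGTACACTTGA#TTAGACGAGT"
read_id, cb, umi = split_read_name(name)
report = classify_segments(
    cb, [8, 8, 8, 6], [ROUND_BARCODES, ROUND_BARCODES, ROUND_BARCODES, INDEX_BARCODES]
)
for segment, status in report:
    print(segment, status)
```

For this read, the three 8-nt segments appear in the barcode file only as reverse complements, while the 6-nt index appears as listed. Whether that mixed orientation matters depends on how dropEst orients barcodes for this protocol, which this thread does not establish, so treat the result as something to check rather than a diagnosis.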

@evanbiederstedt
Contributor

What protocol are you using?

@bettycatherine
Author

Split-seq is the protocol I used.

@zhuojiuqingyun

I encountered the same error. Could you tell me how you dealt with it? Thanks very much!

@bettycatherine
Author

> I encountered the same error. Could you tell me how you dealt with it? Thanks very much!

It has been too long for me to remember exactly, but I think something was wrong with the configuration file. If you'd like, I can send you my configuration file, which worked fine.

@zhuojiuqingyun

zhuojiuqingyun commented Oct 25, 2024

> It has been too long for me to remember exactly, but I think something was wrong with the configuration file. If you'd like, I can send you my configuration file, which worked fine.

Thank you very much! Here is my email: [email protected]
