Running match_cell_barcode gives no error and no result. Command: match_cell_barcode /data_RAGE_seq/data1 cell_barcode_stat.txt split_barcode.fastq flame_3M-february-2018.txt 2. The output split_barcode.fastq has size zero and no other files are generated. #12

Open
markme123 opened this issue Jan 10, 2021 · 7 comments

Comments

@markme123

No description provided.

@LuyiTian
Owner

LuyiTian commented Feb 8, 2021

It is hard to tell with such limited information. Is there any output in the terminal?
Usually you would see some stats printed after you run the program. Here is an example.

the first few lines of output:

set UMI length to 10.
First 5 cell barcode:
        AAACCTGCAATCCAAC
        AAACGGGCATACGCCG
        AAACGGGCATTAGGCT
        AAACGGGGTATAGTAG
        AAAGATGCAACACCCG
/stornext/Genomics/data/CLL_venetoclax/FLTseq/HD11/fastq/HD11_pass.fq.gz
forward flanking end: 66        2819
forward flanking end: 67        2486

the last lines:

        24      1117
        32      487
###total read: 56654147
###barcode hm match: 33287709
###barcode match: 3337587
###barcode not match: 20009062
###too short: 19789
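
For anyone comparing their own run against the example above, the summary counters at the end of the log can be pulled out with a few lines of Python. This is only a minimal sketch: it assumes the summary lines keep the "###name: count" shape shown in the example, and the helper names `parse_barcode_stats` and `fraction_of_total` are made up for illustration, not part of FLAMES.

```python
import re

# Matches summary lines such as "###total read: 56654147".
STAT_LINE = re.compile(r"^###\s*(?P<name>[^:]+):\s*(?P<count>\d+)\s*$")


def parse_barcode_stats(log_text):
    """Return {stat name: count} for every '###name: count' line in the log."""
    stats = {}
    for line in log_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            stats[match.group("name").strip()] = int(match.group("count"))
    return stats


def fraction_of_total(stats, total_key="total read"):
    """Express each counter as a fraction of the total read count."""
    total = stats.get(total_key, 0)
    if total == 0:
        return {}
    return {name: count / total
            for name, count in stats.items() if name != total_key}


# The summary lines copied from the example output in this thread.
example_log = """\
###total read: 56654147
###barcode hm match: 33287709
###barcode match: 3337587
###barcode not match: 20009062
###too short: 19789
"""

stats = parse_barcode_stats(example_log)
for name, frac in fraction_of_total(stats).items():
    print(f"{name}: {frac:.1%}")
```

An empty result from `parse_barcode_stats` means the program never reached the summary stage, which is useful to report back when asking for help.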

@yuchen345

yuchen345 commented Mar 21, 2022

Hi, Luyi,
Could you please provide a usage example for match_cell_barcode and explain the input files in more detail?
Does the fastq folder contain Illumina sequencing data or third-generation sequencing data?

Here is my error message: [screenshot, not preserved]

Thanks!
Yuchen

@icanccwhite

I have the same question

@LuyiTian
Owner

LuyiTian commented Apr 12, 2022

Hi @icanccwhite and @yuchen345, you should use long-read fastq data as input. The cell barcode file comes from the short-read data output. From your screenshot it seems the first 5 cell barcodes were printed, so the program is running. Can you check your data path again? I think you need to use an absolute path.
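
One way to rule out path problems before running is to resolve every argument to an absolute path and confirm the input folder actually contains FASTQ files. The sketch below is only an illustration under some assumptions: the argument order is copied from the command in the issue title, the meaning of the last argument is not stated in this thread so it is passed through unchanged, the list of file suffixes is a guess, and `find_fastq_files` and `build_command` are hypothetical helper names.

```python
from pathlib import Path

# Assumed suffixes; the example log in this thread shows a ".fq.gz" input.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def find_fastq_files(fastq_dir):
    """Resolve fastq_dir to an absolute path and list the FASTQ files in it."""
    directory = Path(fastq_dir).expanduser().resolve()
    if not directory.is_dir():
        raise NotADirectoryError(f"{directory} is not a directory")
    files = sorted(p for p in directory.iterdir()
                   if p.is_file() and p.name.endswith(FASTQ_SUFFIXES))
    return directory, files


def build_command(fastq_dir, stat_out, fastq_out, barcode_file, last_arg):
    """Assemble the argument list in the order used in the issue title."""
    directory, files = find_fastq_files(fastq_dir)
    if not files:
        raise FileNotFoundError(f"no FASTQ files found in {directory}")
    return [
        "match_cell_barcode",
        str(directory),
        str(Path(stat_out).resolve()),
        str(Path(fastq_out).resolve()),
        str(Path(barcode_file).resolve()),
        str(last_arg),  # passed through as-is; its meaning is not given here
    ]


# Usage (prints the command instead of running it):
# print(" ".join(build_command("/data_RAGE_seq/data1", "cell_barcode_stat.txt",
#                              "split_barcode.fastq",
#                              "flame_3M-february-2018.txt", 2)))
```

If `build_command` raises, the problem is in the paths rather than in match_cell_barcode itself.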

@yuchen345

yuchen345 commented Apr 28, 2022

Thanks for your reply! @LuyiTian

Here is another error, this time from sc_long_pipeline.py:

### read gene annotation 2022-04-20 20:57:58

remove similar transcripts in gene annotation: Counter({'duplicated_transcripts': 370})
### find isoforms 2022-04-20 20:59:27
GL000219.1
KI270713.1
KI270733.1
GL000194.1
GL000195.1
KI270731.1
20
Traceback (most recent call last):
  File "./sc_long_pipeline.py", line 213, in <module>
    sc_long_pipeline(args)
  File "./sc_long_pipeline.py", line 179, in sc_long_pipeline
    raw_gff3=raw_splice_isoform if config_dict["global_parameters"]["generate_raw_isoform"] else None)
  File "/home/chenz/biosoft/FLAMES/python/sc_longread.py", line 975, in group_bam2isoform
    it_region = bamfile.fetch(ch, bl.s, bl.e)
  File "pysam/libcalignmentfile.pyx", line 1091, in pysam.libcalignmentfile.AlignmentFile.fetch
  File "pysam/libchtslib.pyx", line 685, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid contig 20

Waiting for your reply!

@LuyiTian
Owner

It seems chromosome 20 is not in the pysam dictionary. I would suggest double-checking your genome annotation and making sure you downloaded the fasta and gff/gtf files from the same source. Did you do anything to the genome annotation? Usually chromosome 20 won't be at the end of the chromosome list, but from your output it seems to be.
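
A quick way to check whether the fasta and gff/gtf files agree is to compare the contig names they use. The sketch below uses only the standard library. It assumes that a naming mismatch (for example "20" in the annotation against "chr20" in the genome) is the cause, which is one common reason for this kind of error but is not confirmed for this case. The function names are hypothetical.

```python
import gzip


def _open_text(path):
    """Open plain or gzip-compressed text files transparently."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_contigs(fasta_path):
    """Contig names from FASTA headers (header text up to the first whitespace)."""
    names = []
    with _open_text(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.append(fields[0])
    return names


def annotation_contigs(gff_path):
    """Distinct values of column 1 in a GFF/GTF file, in order of first appearance."""
    seen = {}
    with _open_text(gff_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            seen.setdefault(line.split("\t", 1)[0], None)
    return list(seen)


def missing_from_genome(fasta_path, gff_path):
    """Contigs named in the annotation that have no matching FASTA header."""
    genome = set(fasta_contigs(fasta_path))
    return [name for name in annotation_contigs(gff_path) if name not in genome]


# Usage, with placeholder file names:
# print(missing_from_genome("genome.fa", "annotation.gtf"))
```

An empty list means every annotated contig exists in the genome file. Any names that are printed are the ones a region lookup would be expected to reject, provided the BAM was aligned against that same fasta.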

@yuchen345

yuchen345 commented Jun 13, 2022

Thank you very much, @LuyiTian!

I have a few more questions:

  1. As you said, FLAMES searches in both directions and trims the adapter sequence plus cell barcode/UMI in both directions. How does FLAMES handle UMI assignment when a read is tagged with a UMI that may contain a sequencing error?

  2. I noticed that there is a find_polyT function in match_cell_barcode. Is the polyT sequence omitted from the output fastq.gz file? How do you deal with the polyA sequence on the reverse strand?

  3. Can FLAMES be used with 10X 5' libraries, given that there is a TSO sequence rather than polyT after the cell barcode/UMI?

Looking forward to your reply.

Thanks
