Error generating Kallisto index #257

Closed
Upendra19993 opened this issue Feb 27, 2024 · 3 comments
Labels
Installation Installation-related issues

Comments

@Upendra19993

Hi,

I want to run SQANTI3 on my own dataset. To get familiar with the tool, I first tried it with the example dataset you provide. When I ran the SQANTI3 quality control step, I got an error. The full output is below.

The command I used is: sqanti3_qc.py UHR_chr22.gtf gencode.v38.basic_chr22.gtf GRCh38.p13_chr22.fasta -o UHR_chr22 -d SQANTI3_output --short_reads UHR_chr22_short_reads.fofn --cpus 4 --report both

The job's progress and the error messages are shown below.

(base) [uqwwijes@bun025 SQANTI3_reinstallation_2]$ sqanti3_qc.py UHR_chr22.gtf gencode.v38.basic_chr22.gtf GRCh38.p13_chr22.fasta -o UHR_chr22 -d SQANTI3_output --short_reads UHR_chr22_short_reads.fofn --cpus 4 --report both
Rscript (R) version 4.3.1 (2023-06-16)
ERROR: genome fasta /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_reinstallation_2/GRCh38.p13_chr22.fasta doesn't exist. Abort!
(base) [uqwwijes@bun025 SQANTI3_reinstallation_2]$ cd ..
(base) [uqwwijes@bun025 Exampla_data]$ sqanti3_qc.py UHR_chr22.gtf gencode.v38.basic_chr22.gtf GRCh38.p13_chr22.fasta -o UHR_chr22 -d SQANTI3_output --short_reads UHR_chr22_short_reads.fofn --cpus 4 --report both
Rscript (R) version 4.3.1 (2023-06-16)
Write arguments to /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/UHR_chr22.params.txt...
**** Running SQANTI3...
**** Parsing provided files....
Reading genome fasta /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/GRCh38.p13_chr22.fasta....
Skipping aligning of sequences because GTF file was provided.

Indels will be not calculated since you ran SQANTI3 without alignment step (SQANTI3 with gtf format as transcriptome input).
**** Predicting ORF sequences...
**** Parsing Reference Transcriptome....
**** Parsing Isoforms....
**** Running STAR for calculating Short-Read Coverage.
START running STAR...
Running indexing...
/sw/local/rocky8/noarch/qcif/software/miniconda3/envs/sqanti3_5.2/bin/STAR-avx2 --runThreadN 4 --runMode genomeGenerate --genomeDir /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_index/ --genomeFastaFiles /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/GRCh38.p13_chr22.fasta --outTmpDir /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_index//_STARtmp/
STAR version: 2.7.11b compiled: 2024-01-29T15:15:38+0000 :/opt/conda/conda-bld/star_1706541070242/work/source
Feb 27 12:20:52 ..... started STAR run
Feb 27 12:20:52 ... starting to generate Genome files
!!!!! WARNING: --genomeSAindexNbases 14 is too large for the genome size=50818468, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 11
Feb 27 12:20:53 ... starting to sort Suffix Array. This may take a long time...
Feb 27 12:20:53 ... sorting Suffix Array chunks and saving them to disk...
Feb 27 12:21:02 ... loading chunks from disk, packing SA...
Feb 27 12:21:02 ... finished generating suffix array
Feb 27 12:21:02 ... generating Suffix Array index
Feb 27 12:21:11 ... completed Suffix Array index
Feb 27 12:21:11 ... writing Genome to disk ...
Feb 27 12:21:11 ... writing Suffix Array to disk ...
Feb 27 12:21:11 ... writing SAindex to disk
Feb 27 12:21:11 ..... finished successfully
Indexing done.
Mapping for UHR_Rep1_chr22.R1 : in progress...
Mapping for UHR_Rep1_chr22.R1 : done.
/sw/local/rocky8/noarch/qcif/software/miniconda3/envs/sqanti3_5.2/bin/STAR-avx2 --runThreadN 4 --genomeDir /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_index/ --readFilesIn /scratch/project/qaafi-cnafs/upendra/Sqanti3/Example/UHR_Rep1_chr22.R1.fastq /scratch/project/qaafi-cnafs/upendra/Sqanti3/Example/UHR_Rep1_chr22.R2.fastq --outFileNamePrefix /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping/UHR_Rep1_chr22.R1 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterType BySJout --outSAMunmapped Within --outFilterMultimapNmax 20 --outFilterMismatchNoverLmax 0.04 --outFilterMismatchNmax 999 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --sjdbScore 1 --genomeLoad NoSharedMemory --outSAMtype BAM SortedByCoordinate --twopassMode Basic
STAR version: 2.7.11b compiled: 2024-01-29T15:15:38+0000 :/opt/conda/conda-bld/star_1706541070242/work/source
Feb 27 12:21:12 ..... started STAR run
Feb 27 12:21:12 ..... loading genome
Feb 27 12:21:12 ..... started 1st pass mapping
Feb 27 12:21:36 ..... finished 1st pass mapping
Feb 27 12:21:36 ..... inserting junctions into the genome indices
Feb 27 12:21:44 ..... started mapping
Feb 27 12:22:09 ..... finished mapping
Feb 27 12:22:09 ..... started sorting BAM
Feb 27 12:22:09 ..... finished successfully
Mapping for UHR_Rep2_chr22.R1 : in progress...
Mapping for UHR_Rep2_chr22.R1 : done.
/sw/local/rocky8/noarch/qcif/software/miniconda3/envs/sqanti3_5.2/bin/STAR-avx2 --runThreadN 4 --genomeDir /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_index/ --readFilesIn /scratch/project/qaafi-cnafs/upendra/Sqanti3/Example/UHR_Rep2_chr22.R1.fastq /scratch/project/qaafi-cnafs/upendra/Sqanti3/Example/UHR_Rep2_chr22.R2.fastq --outFileNamePrefix /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping/UHR_Rep2_chr22.R1 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterType BySJout --outSAMunmapped Within --outFilterMultimapNmax 20 --outFilterMismatchNoverLmax 0.04 --outFilterMismatchNmax 999 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --sjdbScore 1 --genomeLoad NoSharedMemory --outSAMtype BAM SortedByCoordinate --twopassMode Basic
STAR version: 2.7.11b compiled: 2024-01-29T15:15:38+0000 :/opt/conda/conda-bld/star_1706541070242/work/source
Feb 27 12:22:10 ..... started STAR run
Feb 27 12:22:10 ..... loading genome
Feb 27 12:22:10 ..... started 1st pass mapping
Feb 27 12:22:28 ..... finished 1st pass mapping
Feb 27 12:22:28 ..... inserting junctions into the genome indices
Feb 27 12:22:36 ..... started mapping
Feb 27 12:22:55 ..... finished mapping
Feb 27 12:22:55 ..... started sorting BAM
Feb 27 12:22:55 ..... finished successfully
Input pattern: /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping/.
The following files found and to be read as junctions:
/scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping/UHR_Rep2_chr22.R1SJ.out.tab
/scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping/UHR_Rep1_chr22.R1SJ.out.tab
6762 junctions read. 2 junctions added to both strands because no strand information from STAR.
Running calculation of TSS ratio
BAM files identified: ['/scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping//UHR_Rep1_chr22.R1Aligned.sortedByCoord.out.bam', '/scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping//UHR_Rep2_chr22.R1Aligned.sortedByCoord.out.bam']
Temp files removed.

**** Performing Classification of Isoforms....
Number of classified isoforms: 3925
**** RT-switching computation....
Full-length read abundance files not provided.
**** Adding TSS ratio data... ****
**** Running Kallisto to calculate isoform expressions.
Running kallisto index /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/kallisto_output/kallisto_corrected_fasta.idx using as reference /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/UHR_chr22_corrected.fasta

**Running Kallisto quantification for UHR_Rep1_chr22.R1 sample

Error: kallisto index file not found /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/kallisto_output/kallisto_corrected_fasta.idx

Usage: kallisto quant [arguments] FASTQ-files

Required arguments:
-i, --index=STRING Filename for the kallisto index to be used for
quantification
-o, --output-dir=STRING Directory to write output to

Optional arguments:
-b, --bootstrap-samples=INT Number of bootstrap samples (default: 0)
--seed=INT Seed for the bootstrap sampling (default: 42)
--plaintext Output plaintext instead of HDF5
--single Quantify single-end reads
--single-overhang Include reads where unobserved rest of fragment is
predicted to lie outside a transcript
--fr-stranded Strand specific reads, first read forward
--rf-stranded Strand specific reads, first read reverse
-l, --fragment-length=DOUBLE Estimated average fragment length
-s, --sd=DOUBLE Estimated standard deviation of fragment length
(default: -l, -s values are estimated from paired
end data, but are required when using --single)
-t, --threads=INT Number of threads to use (default: 1)
--verbose Print out progress information every 1M proccessed reads
Running Kallisto quantification for UHR_Rep2_chr22.R1 sample

Error: kallisto index file not found /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/kallisto_output/kallisto_corrected_fasta.idx

Usage: kallisto quant [arguments] FASTQ-files

Required arguments:
-i, --index=STRING Filename for the kallisto index to be used for
quantification
-o, --output-dir=STRING Directory to write output to

Optional arguments:
-b, --bootstrap-samples=INT Number of bootstrap samples (default: 0)
--seed=INT Seed for the bootstrap sampling (default: 42)
--plaintext Output plaintext instead of HDF5
--single Quantify single-end reads
--single-overhang Include reads where unobserved rest of fragment is
predicted to lie outside a transcript
--fr-stranded Strand specific reads, first read forward
--rf-stranded Strand specific reads, first read reverse
-l, --fragment-length=DOUBLE Estimated average fragment length
-s, --sd=DOUBLE Estimated standard deviation of fragment length
(default: -l, -s values are estimated from paired
end data, but are required when using --single)
-t, --threads=INT Number of threads to use (default: 1)
--verbose Print out progress information every 1M proccessed reads
Traceback (most recent call last):
File "/sw/local/rocky8/noarch/qcif/software/SQANTI3-5.2/sqanti3_qc.py", line 2542, in <module>
main()
File "/sw/local/rocky8/noarch/qcif/software/SQANTI3-5.2/sqanti3_qc.py", line 2525, in main
run(args)
File "/sw/local/rocky8/noarch/qcif/software/SQANTI3-5.2/sqanti3_qc.py", line 1978, in run
exp_dict = expression_parser(expression_files)
File "/sw/local/rocky8/noarch/qcif/software/SQANTI3-5.2/sqanti3_qc.py", line 806, in expression_parser
reader = DictReader(open(exp_file), delimiter='\t')
FileNotFoundError: [Errno 2] No such file or directory: '/scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/kallisto_output/UHR_Rep1_chr22.R1/abundance.tsv'**

Kindly let me know what I can do to fix this issue.

Many thanks,
Upendra.
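
A side note on the STAR warning in the log above (`--genomeSAindexNbases 14 is too large for the genome size=50818468`): the STAR manual suggests scaling this parameter for small genomes as min(14, log2(GenomeLength)/2 - 1). The sketch below only reproduces that arithmetic for the chr22 example; the function name is made up for illustration, and since mapping finished successfully here, the warning does not appear to be related to the kallisto failure.

```python
import math


def suggested_sa_index_nbases(genome_length: int) -> int:
    """Scale --genomeSAindexNbases to genome size: min(14, log2(length)/2 - 1)."""
    return min(14, int(math.log2(genome_length) / 2 - 1))


# Genome size reported by STAR in the log for the chr22 example.
print(suggested_sa_index_nbases(50818468))  # prints 11
```

The result matches the value of 11 that STAR recommends in its warning.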

@carolinamonzo
Contributor

Hi @Upendra19993, that's unfortunate. It looks like your run failed because the kallisto index needed for quantification was never created. This is the root problem that makes everything downstream fail:
Error: kallisto index file not found /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/kallisto_output/kallisto_corrected_fasta.idx

Do you have kallisto installed? Can you try building the kallisto index from the corrected FASTA file by hand?
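
One way to run that check outside of SQANTI3 is sketched below. It is a minimal sketch, not what `sqanti3_qc.py` does internally: the function names `index_command` and `run_index` are hypothetical, and the only kallisto call assumed is the standard `kallisto index -i <index> <fasta>` form.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def index_command(fasta: str, index: str) -> List[str]:
    """Return the kallisto command line that builds `index` from `fasta`."""
    return ["kallisto", "index", "-i", index, fasta]


def run_index(fasta: str, index: str) -> bool:
    """Try to build the index; report whether the index file exists afterwards."""
    if shutil.which("kallisto") is None:
        print("kallisto is not on PATH; is the conda environment active?")
        return False
    result = subprocess.run(index_command(fasta, index),
                            capture_output=True, text=True)
    # Show whatever kallisto printed so a silent failure becomes visible.
    print(result.stdout)
    print(result.stderr)
    return result.returncode == 0 and Path(index).is_file()
```

With the paths from the log, this would be called as `run_index("SQANTI3_output/UHR_chr22_corrected.fasta", "SQANTI3_output/kallisto_output/kallisto_corrected_fasta.idx")`. If it returns `False` even though kallisto is installed, the problem is in the indexing step itself rather than in SQANTI3.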

@eprdz

eprdz commented Mar 4, 2024

I think there is a problem with the latest version of kallisto.
I had a similar problem with kallisto v0.50.1. I downgraded with conda install "bioconda::kallisto<0.50.1", and after that I could build the kallisto index.
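
To see whether an environment already satisfies that pin, something like the following could be used. This is a sketch under assumptions: the helper names are hypothetical, `kallisto version` is assumed to print a line containing the version number, and the bound only mirrors the `<0.50.1` pin above. The thread does not establish which other releases are affected.

```python
import re
import subprocess
from typing import Optional, Tuple

PIN = (0, 50, 1)  # upper bound taken from the conda pin "kallisto<0.50.1"


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first x.y.z version number found in `text`, if any."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def below_pin(text: str) -> bool:
    """True if the version found in `text` is strictly below the pin."""
    version = parse_version(text)
    return version is not None and version < PIN


def installed_kallisto_version() -> str:
    """Return the raw text printed by `kallisto version` (assumed to exist)."""
    result = subprocess.run(["kallisto", "version"],
                            capture_output=True, text=True)
    return result.stdout + result.stderr
```

Calling `below_pin(installed_kallisto_version())` inside the SQANTI3 conda environment would then indicate whether the downgrade has taken effect.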

@Upendra19993
Author

Hi Carolinamonzó and eprdz,

I did have kallisto installed when I ran into this issue. I then tried a different version (kallisto 0.48.0), and it worked: I got the results without any errors. Thank you both!

@carolinamonzo carolinamonzo changed the title Error in running sqanti3 Quality Control step Error generating Kallisto index Mar 5, 2024
@carolinamonzo carolinamonzo added the Installation Installation-related issues label Mar 21, 2024