"result would exceed 2^31-1 bytes" #170

CEPHAS-01 · 2024-10-17T23:45:16Z

Hi and thanks for this beautiful comparative genomics tool.

I was trying out GENESPACE on our HPC system using the human and sheep assemblies from NCBI, but ran into the following error when running parse_annotations: "result would exceed 2^31-1 bytes".

I have checked and I am sure that the machine is a 64-bit architecture. Any suggestions on how to resolve this?

Temitayo

LovellHAGSC · 2024-10-23T19:58:32Z

huh - funny you should mention this ... I just broke the dev version of DEEPSPACE with this same error. This happens when trying to generate an integer larger than 2^31-1 ... for example a position coordinate on a sequence > ~2.1Gb. I can't imagine how this would happen with GENESPACE though. Can you print the exact error and the step it came from?
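For reference, here is a minimal base R sketch of the 2^31-1 limit mentioned above. It only illustrates the integer bound and is not taken from GENESPACE or DEEPSPACE; the variable names are made up for the example.

```r
# Base R integers are 32-bit signed, so the largest value is 2^31 - 1.
int_max <- .Machine$integer.max
print(int_max)

# Adding 1L overflows: R returns NA (with a warning) instead of wrapping.
overflowed <- suppressWarnings(int_max + 1L)
print(is.na(overflowed))

# Doubles hold larger whole numbers exactly, so a coordinate past ~2.1 Gb
# can be stored as numeric rather than integer.
as_double <- as.numeric(int_max) + 1
print(as_double == 2^31)
```

Storing coordinates as doubles is one common way around the integer bound, at the cost of twice the memory per value.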

CEPHAS-01 · 2024-10-23T22:14:43Z

Oh I see

The parse annotation step produced the error.

parsedPaths <- parse_annotations(
  rawGenomeRepo = "genespace/source",
  genomeDirs = c("human", "sheep"),
  genomeIDs = c("human", "sheep"),
  gffString = "gff",
  faString = "fasta",
  genespaceWd = "genespace/workspace")
Error in paste(fa[1:100], collapse = "") :
  result would exceed 2^31-1 bytes

The genomes I am working with are quite large - human ~3GB and sheep ~2.8GB

Perhaps some of the data types need to be changed to increase the storage range.
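As far as I understand, the message in the traceback comes from paste() itself: base R caps a single character string at 2^31-1 bytes, so collapsing a few very long elements into one string can fail regardless of the machine being 64-bit. A rough sketch of how one might check the size before collapsing is below. The helpers `collapsed_bytes` and `fits_in_one_string` are hypothetical and not part of GENESPACE, and `fa` here is a small stand-in for the vector in the traceback.

```r
# Hypothetical helper (not part of GENESPACE): total bytes that
# paste(x, collapse = "") would produce, summed as a double so the
# sum itself cannot overflow a 32-bit integer.
collapsed_bytes <- function(x) {
  sum(as.numeric(nchar(x, type = "bytes")))
}

# Hypothetical helper: TRUE when the collapsed result fits in one R string.
fits_in_one_string <- function(x) {
  collapsed_bytes(x) <= .Machine$integer.max
}

# Small stand-in for the `fa` vector seen in the traceback.
fa <- c("MKTAYIAKQR", "QISFVKSHFS", "RQLEERLGLI")
print(collapsed_bytes(fa))
print(fits_in_one_string(fa))
```

If the first 100 elements of a peptide file do not fit in one string, that suggests the elements are far longer than protein sequences, which would point at the input file rather than the data types.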

LovellHAGSC · 2024-10-23T23:17:12Z

I don't think that's it ... unless all the chromosomes got concatenated. Pine broke it, and it has several chromosomes that are as large as the entire Hg38 human genome.

CEPHAS-01 · 2024-10-23T23:21:18Z

The chromosomes were not concatenated. I used the genome as downloaded from NCBI.

LovellHAGSC · 2024-10-23T23:24:08Z

Can you post the urls to the files you downloaded from ncbi?

CEPHAS-01 · 2024-10-23T23:35:59Z

Sure
Human genome and protein sequence from here: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/
[GCF_000001405.40_GRCh38.p14_genomic.fna.gz]
[GCF_000001405.40_GRCh38.p14_protein.faa.gz]

Sheep genome and protein sequence from here: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/016/772/045/GCF_016772045.2_ARS-UI_Ramb_v3.0/
[GCF_016772045.2_ARS-UI_Ramb_v3.0_genomic.fna.gz]
[GCF_016772045.2_ARS-UI_Ramb_v3.0_protein.faa.gz]

LovellHAGSC · 2024-10-23T23:39:00Z

Did you try to pass these files to parse_annotations?
You want translated_cds.faa.gz and genomic.gff.gz.
See:
https://htmlpreview.github.io/?https://github.com/jtlovell/tutorials/blob/main/genespaceGuide.html

CEPHAS-01 · 2024-10-23T23:45:34Z

Yes, the parse_annotations stage produced the error.
I was using the protein.faa.gz and not the translated_cds.faa.gz. Perhaps this is the reason.
Stepping away from my desk shortly, I will test it with translated_cds.faa.gz and give you feedback.
Thanks!

LovellHAGSC · 2024-10-24T00:34:34Z

It should give a more informative error than that if you gave it the protein fa ... that one just doesn't parse right. I was wondering if you fed the genomic.fna.gz as a gff.
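One way to rule out a mixed-up input is to look at the first few sequence lines of each file before calling parse_annotations. The sketch below is a heuristic only: `guess_fasta_type` is a hypothetical helper, not a GENESPACE function, the 0.9 cutoff is an arbitrary assumption, and it assumes a line-wrapped FASTA like the NCBI downloads. The two temporary files are toy stand-ins for real inputs.

```r
# Hypothetical helper (not part of GENESPACE): guess whether a FASTA file
# holds nucleotide or peptide sequence from its first few sequence lines.
guess_fasta_type <- function(path, n_lines = 200) {
  con <- gzfile(path, open = "rt")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  lines <- readLines(con, n = n_lines)
  # Drop headers and empty lines, keep at most 100 characters per line.
  seq_lines <- lines[!startsWith(lines, ">") & nzchar(lines)]
  if (length(seq_lines) == 0) return("unknown")
  seq_lines <- substr(seq_lines, 1, 100)
  chars <- unlist(strsplit(toupper(paste(seq_lines, collapse = "")), ""))
  # Fraction of characters that are plain nucleotide codes.
  frac_nt <- mean(chars %in% c("A", "C", "G", "T", "N"))
  if (frac_nt > 0.9) "nucleotide" else "peptide"
}

# Toy peptide file.
pep <- tempfile(fileext = ".faa.gz")
con <- gzfile(pep, open = "wt")
writeLines(c(">p1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"), con)
close(con)

# Toy nucleotide file.
nuc <- tempfile(fileext = ".fna.gz")
con <- gzfile(nuc, open = "wt")
writeLines(c(">chr1", "ACGTNNACGTTTGACCA"), con)
close(con)

print(guess_fasta_type(pep))
print(guess_fasta_type(nuc))
```

A file reported as "nucleotide" where a peptide FASTA is expected would be consistent with genomic.fna.gz having been picked up by mistake.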
