"result would exceed 2^31-1 bytes" #170

Open
CEPHAS-01 opened this issue Oct 17, 2024 · 9 comments

Comments

@CEPHAS-01

Hi and thanks for this beautiful comparative genomics tool.

I was trying out GENESPACE on our HPC system using the human and sheep assemblies from NCBI, but ran into the following error when running parse_annotations: "result would exceed 2^31-1 bytes".

I have checked and I am sure that the machine is a 64-bit architecture. Any suggestions on how to resolve this?

Temitayo

@LovellHAGSC
Contributor

Huh, funny you should mention this ... I just broke the dev version of DEEPSPACE with this same error. This happens when trying to generate an integer larger than 2^31-1, for example a position coordinate on a sequence longer than ~2.1 Gb. I can't imagine how this would happen with GENESPACE though. Can you print the exact error and the step it came from?
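
For reference, 2^31-1 is the largest value R's 32-bit integer type can hold, and R also caps a single character string at that many bytes. A minimal base R sketch of where the limit sits (the variable names are illustrative only):

```r
# Largest value R's 32-bit integer type can represent
int_max <- .Machine$integer.max
print(int_max == 2^31 - 1)

# Integer arithmetic past the limit yields NA (with a warning)
overflowed <- suppressWarnings(int_max + 1L)
print(is.na(overflowed))

# Doubles represent whole numbers exactly up to 2^53, so coordinates
# beyond ~2.1 billion fit if they are stored as numeric instead
as_double <- as.numeric(int_max) + 1
print(as_double > int_max)
```

Each `print` call should report TRUE. A coordinate on a sequence longer than about 2.1 Gb cannot be stored as an R integer, but it can be stored as a double.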

@CEPHAS-01
Author

Oh I see

The parse annotation step produced the error.

parsedPaths <- parse_annotations(
  rawGenomeRepo = "genespace/source",
  genomeDirs = c("human", "sheep"),
  genomeIDs = c("human", "sheep"),
  gffString = "gff",
  faString = "fasta",
  genespaceWd = "genespace/workspace")
Error in paste(fa[1:100], collapse = "") :
  result would exceed 2^31-1 bytes
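
The error above is raised by `paste(..., collapse = "")`, so the limit being hit is on the size of a single character string (R strings cannot exceed 2^31-1 bytes) rather than on an integer coordinate. That suggests the first 100 elements of `fa` held far more text than expected. A minimal sketch of a guard that reports the size before collapsing; `safe_collapse` is a hypothetical helper, not part of GENESPACE:

```r
# Hypothetical helper: collapse a character vector into one string, but
# fail early with the actual byte count if R's string limit would be hit.
safe_collapse <- function(lines, max_bytes = .Machine$integer.max) {
  # Sum as doubles so the total itself cannot overflow an integer
  total <- sum(as.numeric(nchar(lines, type = "bytes")))
  if (total > max_bytes) {
    stop(sprintf("collapsed string would be %.0f bytes; limit is %.0f",
                 total, as.numeric(max_bytes)))
  }
  paste(lines, collapse = "")
}

# Small demonstration with an artificially low limit
ok <- safe_collapse(c("MKT", "AIL"))
too_big <- tryCatch(safe_collapse(c("MKT", "AIL"), max_bytes = 5),
                    error = function(e) conditionMessage(e))
```

Printing the byte count in the message makes it easier to see whether the input really is that large or whether the wrong file was read.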

The genomes I am working with are quite large - human ~3GB and sheep ~2.8GB

Perhaps some of the data types need to be changed to increase the storage range.

@LovellHAGSC
Contributor

I don't think that's it ... unless all the chromosomes got concatenated. Pine broke it, and it has several chromosomes that are each as large as the entire Hg38 human genome.

@CEPHAS-01
Author

The chromosomes were not concatenated. I used the genome as downloaded from NCBI.

@LovellHAGSC
Contributor

LovellHAGSC commented Oct 23, 2024

Can you post the URLs to the files you downloaded from NCBI?

@CEPHAS-01
Author

Sure
Human genome and protein sequence from here: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/
[GCF_000001405.40_GRCh38.p14_genomic.fna.gz]
[GCF_000001405.40_GRCh38.p14_protein.faa.gz]

Sheep genome and protein sequence from here: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/016/772/045/GCF_016772045.2_ARS-UI_Ramb_v3.0/
[GCF_016772045.2_ARS-UI_Ramb_v3.0_genomic.fna.gz]
[GCF_016772045.2_ARS-UI_Ramb_v3.0_protein.faa.gz]

@LovellHAGSC
Contributor

Did you try to pass parse_annotations these files? You want the translated_cds.faa.gz and genomic.gff.gz files. See:
https://htmlpreview.github.io/?https://github.com/jtlovell/tutorials/blob/main/genespaceGuide.html

@CEPHAS-01
Author

Yes, the parse_annotations stage produced the error.
I was using the protein.faa.gz and not the translated_cds.faa.gz. Perhaps this is the reason.
Stepping away from my desk shortly, I will test it with translated_cds.faa.gz and give you feedback.
Thanks!

@LovellHAGSC
Contributor

It should give a more informative error than that if you gave it the protein fa ... that one just doesn't parse right. I was wondering if you fed the genomic.fna.gz as a gff.
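
One way to rule out a swapped input (for example a FASTA file sitting where a GFF is expected) is to peek at the first lines of each file before calling parse_annotations. A rough base R sketch; `guess_format` is a hypothetical helper, not part of GENESPACE, and its heuristics (a leading `>` for FASTA, a `##gff-version` header or nine tab-separated columns for GFF) are assumptions:

```r
# Hypothetical helper: guess whether a (possibly gzipped) file looks like
# FASTA or GFF from its first few lines.
guess_format <- function(path, n = 50) {
  con <- gzfile(path, open = "rt")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  lines <- readLines(con, n = n, warn = FALSE)
  lines <- lines[nzchar(lines)]
  if (length(lines) == 0) return("empty")

  # FASTA records begin with a ">" header line
  if (startsWith(lines[1], ">")) return("fasta")

  # GFF3 normally starts with a version pragma; otherwise check that every
  # non-comment line has the nine tab-separated columns GFF uses
  body <- lines[!startsWith(lines, "#")]
  n_cols <- lengths(strsplit(body, "\t", fixed = TRUE))
  if (startsWith(lines[1], "##gff-version") ||
      (length(body) > 0 && all(n_cols == 9))) {
    return("gff")
  }
  "unknown"
}
```

Running this on each file in the human and sheep directories would show quickly whether anything matched by gffString is actually sequence data.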
