sqlite3.DataError: string or blob too big #132

Open
arslan9732 opened this issue Jun 9, 2024 · 2 comments

@arslan9732

I am trying to fine-tune Helixer for a new plant. While converting the GFF3 output by HelixerPost into Helixer's training data format, I got this error:

Traceback (most recent call last):
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlite3.DataError: string or blob too big

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/data/arslan/tool/GeenuFF/import2geenuff.py", line 120, in <module>
    main(args)
  File "/mnt/data/arslan/tool/GeenuFF/import2geenuff.py", line 93, in main
    controller.add_genome(paths.fasta_in, paths.gff_in, genome_args)
  File "/mnt/data/arslan/tool/GeenuFF/geenuff/applications/importer.py", line 875, in add_genome
    self.add_sequences(fasta_path, genome_args)
  File "/mnt/data/arslan/tool/GeenuFF/geenuff/applications/importer.py", line 894, in add_sequences
    self.session.commit()
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 1454, in commit
    self._transaction.commit(_to_root=self.future)
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 832, in commit
    self._prepare_impl()
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 811, in _prepare_impl
    self.session.flush()
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 3449, in flush
    self._flush(objects)
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 3588, in _flush
    with util.safe_reraise():
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 3549, in _flush
    flush_context.execute()
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/unitofwork.py", line 456, in execute
    rec.execute(self)
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/unitofwork.py", line 630, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
    _emit_insert_statements(
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/persistence.py", line 1238, in _emit_insert_statements
    result = connection._execute_20(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1710, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1577, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
    self._handle_dbapi_exception(
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
    util.raise_(
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.DataError: (sqlite3.DataError) string or blob too big
[SQL: INSERT INTO coordinate (sequence, length, seqid, sha1, genome_id) VALUES (?, ?, ?, ?, ?)]
[parameters: ('CCCACTTGCAACCAAACACGGGCACTTGAAAGCATGAGTAATCCAATTCCCAAATACGTTCAATGACCCCAAAATATGACAATTTGGAAAATGCGGGATTTCTATTTTTGGAACTTGAGATATGCACAGATTCAGCTACGAGTGTGACA ... (1853204065 characters truncated) ... CCAAGGCACTAGATGAATTGGAAATATCAAGAATATTCATGTGAAAATCATGAATACACTCATCACCCTTCATCCCGAGATTCCCAAATTTGGTGGTGAGAATTTGAAGTCTTGACATTTTTAATTTTGATTTCCCTTCATGAGTGGTT', 1853204363, 'chr1L', '55cf8a4f2868b7127b10c94200d1c8e29516f0db', 1)]
(Background on this error at: https://sqlalche.me/e/14/9h9h)

Here is the command that I used:

python GeenuFF/import2geenuff.py --fasta genome.fa --gff3 genome.hlx.gff \
  --db-path Vfaba.sqlite3 --log-file my_genome_import.log \
  --species my_genome
@alisandra
Collaborator

Ah, do you have a single chromosome that is longer than $2^{31}-1$, i.e. 2147483647 bases?

If so, this number is unfortunately a limitation of our current implementation.
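To see which sequences are affected, a minimal sketch like the one below can list the length of every record in the FASTA and compare it against a limit. One detail worth checking: the `chr1L` in the traceback is 1853204363 characters, which is below $2^{31}-1$ but above 1,000,000,000 bytes, the default `SQLITE_MAX_LENGTH` documented by SQLite, so the effective limit may depend on how the SQLite library in the environment was compiled. The function names here are illustrative and are not part of GeenuFF; `Connection.getlimit` only exists in Python 3.11 and later, so the sketch falls back to `None` on older versions.

```python
import io
import sqlite3
from typing import Dict, Optional, TextIO


def sqlite_length_limit() -> Optional[int]:
    """Runtime string/BLOB size limit of the linked SQLite library.

    Returns None where Connection.getlimit is unavailable (Python < 3.11).
    """
    if not hasattr(sqlite3.Connection, "getlimit"):
        return None
    con = sqlite3.connect(":memory:")
    try:
        return con.getlimit(sqlite3.SQLITE_LIMIT_LENGTH)
    finally:
        con.close()


def sequence_lengths(handle: TextIO) -> Dict[str, int]:
    """Return {seqid: length} for a FASTA stream without storing the sequences."""
    lengths: Dict[str, int] = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The seqid is the first whitespace-delimited token of the header.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def too_long(lengths: Dict[str, int], limit: int) -> Dict[str, int]:
    """Keep only the sequences whose length exceeds the limit."""
    return {seqid: n for seqid, n in lengths.items() if n > limit}


if __name__ == "__main__":
    # Tiny in-memory FASTA used only for illustration.
    demo = io.StringIO(">chrA some description\nACGT\nAC\n>chrB\nACGTACGTAC\n")
    print(too_long(sequence_lengths(demo), limit=8))
    print("SQLite length limit in this environment:", sqlite_length_limit())
```

For a real genome, the same functions could be called with `open("genome.fa")` in place of the `io.StringIO` demo and the value from `sqlite_length_limit()` (or $2^{31}-1$) as the limit.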

@alisandra
Collaborator

@arslan9732 As a workaround, and only for the sake of fine-tuning, you could split or truncate any chromosomes longer than the above number, then run the final inference (once you have the tuned model) on the original full-length sequences.
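The splitting step could look roughly like the sketch below. It is not a GeenuFF or Helixer utility: the function names, the `_partN` naming scheme and the `max_len` value are all assumptions made for illustration. It rewrites a FASTA so that no record is longer than `max_len` and returns the offset of each piece, because the coordinates in the matching GFF3 would have to be shifted by those offsets (and features that span a cut point handled or dropped) in a separate step that is not shown here.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (seqid, sequence) pairs, holding one record in memory at a time."""
    seqid = None
    parts: List[str] = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if seqid is not None:
                yield seqid, "".join(parts)
            fields = line[1:].split()
            seqid = fields[0] if fields else ""
            parts = []
        elif line and seqid is not None:
            parts.append(line)
    if seqid is not None:
        yield seqid, "".join(parts)


def split_long_sequences(
    src: TextIO, dst: TextIO, max_len: int, width: int = 60
) -> List[Tuple[str, str, int]]:
    """Copy src to dst, cutting records longer than max_len into pieces.

    Returns (new_id, original_id, offset) for every record written, where
    offset is the 0-based start of the piece within the original sequence.
    """
    mapping: List[Tuple[str, str, int]] = []
    for seqid, seq in read_fasta(src):
        needs_split = len(seq) > max_len
        # Records at or under the limit are written once, with their original id.
        starts = range(0, len(seq), max_len) if needs_split else [0]
        for part_no, start in enumerate(starts, 1):
            # Hypothetical naming scheme for the pieces: <seqid>_part<N>.
            new_id = f"{seqid}_part{part_no}" if needs_split else seqid
            chunk = seq[start:start + max_len] if needs_split else seq
            mapping.append((new_id, seqid, start))
            dst.write(f">{new_id}\n")
            for j in range(0, len(chunk), width):
                dst.write(chunk[j:j + width] + "\n")
    return mapping


if __name__ == "__main__":
    # Toy input; a real run would use open("genome.fa") and a max_len
    # safely below the limit discussed above.
    src = io.StringIO(">chr1 toy\nACGTACGTAC\n>chr2\nACG\n")
    dst = io.StringIO()
    for row in split_long_sequences(src, dst, max_len=4):
        print(row)
```

Cutting at fixed positions is the simplest choice but can slice through a gene; picking cut points in intergenic regions would avoid losing those training examples, at the cost of a slightly more involved script.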

@felicitas215 felicitas215 self-assigned this Dec 12, 2024