dvc add no longer working #10692

Open
wappler99 opened this issue Feb 20, 2025 · 17 comments

@wappler99

Bug Report

Issue name

dvc add causes an unexpected error:

Description

I've had DVC running on a Linux server for the past 3 months. Yesterday I set up a new conda environment with DVC 3.59.1 and now when I add data with "dvc add xxx" I get the following error:

"Adding... ERROR: unexpected error - no such column: "size" - should this be a string literal in single-quotes?".

Let me add that I have tried to reset DVC ("dvc init") and I have also deleted the .dvc directory and started from scratch, but the error persists. Maybe I made a mistake deleting the .dvc directory. Is there something else I need to do for a complete dvc reset?

Reproduce

  1. conda install dvc==3.57.0
  2. git init
  3. dvc init
  4. dvc add data/testfile.txt

Expected

DVC adds the file without showing an error.

Environment information

Ubuntu 20.04
DVC 3.57.0

Output of dvc doctor:

$ dvc doctor

DVC version: 3.57.0 (conda)

Platform: Python 3.9.21 on Linux-6.8.0-52-generic-x86_64-with-glibc2.35
Subprojects:
dvc_data = 3.16.9
dvc_objects = 5.1.0
dvc_render = 1.0.2
dvc_task = 0.40.2
scmrepo = 3.3.10
Supports:
http (aiohttp = 3.11.12, aiohttp-retry = 2.8.3),
https (aiohttp = 3.11.12, aiohttp-retry = 2.8.3)
Config:
Global: /home/uli/.config/dvc
System: /etc/xdg/dvc
Cache types: https://error.dvc.org/no-dvc-cache
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/nvme0n1
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/1efc5571a372d9b6285ce7e59990bbae

@wappler99 wappler99 changed the title from "dvc add no longer not working" to "dvc add no longer working" Feb 20, 2025
@kjnam

kjnam commented Feb 21, 2025

Having the same error message while pulling. So far this happens on Windows. My environment is shown below:

DVC version: 3.59.1 (conda)
---------------------------
Platform: Python 3.11.11 on Windows-10-10.0.26100-SP0
Subprojects:
        dvc_data = 3.16.7
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.40.2
        scmrepo = 3.3.10
Supports:
        http (aiohttp = 3.11.12, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.11.12, aiohttp-retry = 2.8.3)

@wappler99
Author

I've upgraded to the latest version (3.59.1), still getting the error when I try to do a dvc add.

DVC version: 3.59.1 (conda)

Platform: Python 3.12.7 on Linux-6.8.0-52-generic-x86_64-with-glibc2.35
Subprojects:
dvc_data = 3.16.9
dvc_objects = 5.1.0
dvc_render = 1.0.2
dvc_task = 0.40.2
scmrepo = 3.3.10
Supports:
http (aiohttp = 3.11.12, aiohttp-retry = 2.8.3),
https (aiohttp = 3.11.12, aiohttp-retry = 2.8.3)
Config:
Global: /home/uli/.config/dvc
System: /etc/xdg/dvc
Cache types: https://error.dvc.org/no-dvc-cache
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/nvme0n1
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/c02b34b775c7476e44bf2ef0e03c61c6

@skshetry
Member

skshetry commented Feb 21, 2025

Can you please try with pip? Also, please share the verbose output by adding --verbose to the end of your command.

@vamshisaideep9

Instead of conda, try it with virtualenv.

@nobutoba

Hi, I ran into the same problem today and found that it's related to libsqlite version 3.49.1. In my case, running:

conda install -c conda-forge libsqlite=3.48.0

fixed the issue. Hope this helps!
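
To confirm which SQLite library a given Python interpreter actually loads (it can differ from the sqlite3 command-line tool on the PATH), a check along these lines may help. It uses only the standard library; the function name is made up for illustration.

```python
import sqlite3


def linked_sqlite_version():
    """Return the version string of the SQLite library loaded by this interpreter."""
    return sqlite3.sqlite_version


print("SQLite library in use:", linked_sqlite_version())
```

Running this inside the conda environment before and after the downgrade should show whether the pinned libsqlite is the one being picked up.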

@kjnam

kjnam commented Feb 21, 2025

@nobutoba Thanks. It seems to resolve the issue. Looks like there is a bug or an incompatibility in the recent version of libsqlite.

@wappler99
Author

wappler99 commented Feb 21, 2025

@skshetry: Thanks for jumping in. I looked at this in more detail last night. The crash happened in site-packages/dvc_data/hashfile/cache.py. The program was trying to insert into the cache db and complained that the column 'size' did not exist. The printed entry for the added file looks as follows: [('/test/data/test.txt', '{"version":1,"checksum":"321037014704241475440851141032219623992","size":5,"hash_info":{"md5":"6137cde4893c59f76f005a8123d8e8e6"}}')].
So something's off with that database and, as @nobutoba pointed out, it seems to be an issue with sqlite. His proposed fix worked for me.

@nobutoba : Thanks, you are a life saver!

@mdekstrand

It doesn't make sense to me to close this issue until the problem is fixed, either in DVC or SQLite — downgrading SQLite works around the problem, but there is still a bug when combining DVC with the most recent version of SQLite, which is a configuration that happens in Conda and I expect would also appear in rolling-release Linux distributions like Arch.

@skshetry skshetry reopened this Feb 24, 2025
@skshetry
Member

skshetry commented Feb 24, 2025

@mdekstrand, I have reopened the issue. If you are experiencing this problem, please provide a verbose traceback so that I can identify where this is happening.

If you are installing DVC from anywhere other than PyPI, I suggest you try installing with uv or pip and see if the issue still happens. There was a bug in a dependency that was fixed recently (iterative/sqltrie#53) and released.

There is also a related bug report: iterative/dvc-data#599, but so far I haven't been able to reproduce it with sqlite3 3.49.1. That seems to come from https://github.com/grantjenks/python-diskcache, but I don't see how the size column can be missing. :(

As another step, try deleting Repo.site_cache_dir (you can find its path in the output of dvc doctor) — the database may have been corrupted somehow.
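
As a small sketch of the step above, the path can be pulled out of saved dvc doctor output by looking for the "Repo.site_cache_dir:" line, which is how it appears in the outputs posted in this thread. The helper name is hypothetical, and the sketch only prints the path rather than deleting anything.

```python
def find_site_cache_dir(doctor_output):
    """Return the Repo.site_cache_dir path found in `dvc doctor` text, or None."""
    prefix = "Repo.site_cache_dir:"
    for line in doctor_output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


# Sample lines copied from the dvc doctor output earlier in this issue.
sample = """Workspace directory: ext4 on /dev/nvme0n1
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/1efc5571a372d9b6285ce7e59990bbae"""

print(find_site_cache_dir(sample))
# prints /var/tmp/dvc/repo/1efc5571a372d9b6285ce7e59990bbae
```

Inspect the printed directory before removing it by hand.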

@mdekstrand

Here's a traceback. This was in CI, so cache was completely clean (except for the local file cache, cached in GHA).

Switching to uv does get a working install, and we have switched this project to uv. I expect that is using whatever sqlite version the Python install was linked against.

Traceback
2025-02-20 15:23:34,081 DEBUG: Preparing to collect status from '/home/runner/work/poprox-recommender-locality/poprox-recommender-locality/.dvc/cache/files/md5'
2025-02-20 15:23:34,081 DEBUG: Collecting status from '/home/runner/work/poprox-recommender-locality/poprox-recommender-locality/.dvc/cache/files/md5'
2025-02-20 15:23:36,269 ERROR: unexpected error - no such column: "size" - should this be a string literal in single-quotes?
Traceback (most recent call last):
  File "/home/runner/work/poprox-recommender-locality/poprox-recommender-locality/.pixi/envs/test/lib/python3.11/site-packages/dvc/cli/__init__.py", line 211, in main
    ret = cmd.do_run()
          ^^^^^^^^^^^^
  File "/home/runner/work/poprox-recommender-locality/poprox-recommender-locality/.pixi/envs/test/lib/python3.11/site-packages/dvc/cli/command.py", line 27, in do_run
    return self.run()
           ^^^^^^^^^^
  File "/home/runner/work/poprox-recommender-locality/poprox-recommender-locality/.pixi/envs/test/lib/python3.11/site-packages/dvc/commands/data_sync.py", line 35, in run
    stats = self.repo.pull(
            ^^^^^^^^^^^^^^^
  File "/home/runner/work/poprox-recommender-locality/poprox-recommender-locality/.pixi/envs/test/lib/python3.11/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/poprox-recommender-locality/poprox-recommender-locality/.pixi/envs/test/lib/python3.11/site-packages/dvc/repo/pull.py", line 30, in pull
    processed_files_count = self.fetch(
                            ^^^^^^^^^^^
  File "/home/runner/work/poprox-recommender-locality/poprox-recommender-locality/.pixi/envs/test/lib/python3.11/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/poprox-recommender-locality/poprox-recommender-locality/.pixi/envs/test/lib/python3.11/site-packages/dvc/repo/fetch.py", line 184, in fetch
    fetch_transferred, fetch_failed = ifetch(
                                      ^^^^^^^
  File "/home/runner/work/poprox-recommender-locality/poprox-recommender-locality/.pixi/envs/test/lib/python3.11/site-packages/dvc_data/index/fetch.py", line 159, in fetch
    fetched += save(
               ^^^^^
  File "/home/runner/work/poprox-recommender-locality/poprox-recommender-locality/.pixi/envs/test/lib/python3.11/site-packages/dvc_data/index/save.py", line 172, in save
    transferred += cache.add(
                   ^^^^^^^^^^
  File "/home/runner/work/poprox-recommender-locality/poprox-recommender-locality/.pixi/envs/test/lib/python3.11/site-packages/dvc_data/hashfile/db/__init__.py", line 119, in add
    self.state.save_many(
  File "/home/runner/work/poprox-recommender-locality/poprox-recommender-locality/.pixi/envs/test/lib/python3.11/site-packages/dvc_data/hashfile/state.py", line 197, in save_many
    return self.hashes.set_many(lst)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/poprox-recommender-locality/poprox-recommender-locality/.pixi/envs/test/lib/python3.11/site-packages/dvc_data/hashfile/cache.py", line 116, in set_many
    self._con.executemany(query, items)
sqlite3.OperationalError: no such column: "size" - should this be a string literal in single-quotes?

2025-02-20 15:23:36,288 DEBUG: link type reflink is not available ([Errno 95] no more link types left to try out)
2025-02-20 15:23:36,288 DEBUG: Removing '/home/runner/work/poprox-recommender-locality/.lp6xqzg6jvvOhuwflIfPSA.tmp'
2025-02-20 15:23:36,288 DEBUG: Removing '/home/runner/work/poprox-recommender-locality/.lp6xqzg6jvvOhuwflIfPSA.tmp'
2025-02-20 15:23:36,288 DEBUG: Removing '/home/runner/work/poprox-recommender-locality/.lp6xqzg6jvvOhuwflIfPSA.tmp'
2025-02-20 15:23:36,288 DEBUG: Removing '/home/runner/work/poprox-recommender-locality/poprox-recommender-locality/.dvc/cache/files/md5/.byYaGb1jhptze1WCh4uMFg.tmp'
2025-02-20 15:23:36,291 DEBUG: Version info for developers:
DVC version: 3.59.1 (conda)
---------------------------
Platform: Python 3.11.11 on Linux-6.8.0-1021-azure-x86_64-with-glibc2.39
Subprojects:
	dvc_data = 3.16.9
	dvc_objects = 5.1.0
	dvc_render = 1.0.2
	dvc_task = 0.40.2
	scmrepo = 3.3.10
Supports:

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
	http (aiohttp = 3.11.12, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.11.12, aiohttp-retry = 2.8.3),
	s3 (s3fs = 2025.2.0, boto3 = 1.36.3)
Config:
	Global: /home/runner/.config/dvc
	System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sda1
Caches: local
Remotes: s3, s3, s3
Workspace directory: ext4 on /dev/sda1
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/0e7b94d77fff1d1d5a21c2b03a6980d6
2025-02-20 15:23:36,292 DEBUG: Analytics is enabled.
2025-02-20 15:23:36,369 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmp3wvfh0pw', '-v']
2025-02-20 15:23:36,377 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmp3wvfh0pw', '-v'] with pid 4104
2025-02-20 15:23:36,378 DEBUG: Removing '/tmp/tmp5ljc62dfdvc-clone'
2025-02-20 15:23:36,396 DEBUG: Removing '/tmp/tmpmotiwnuxdvc-clone'
2025-02-20 15:23:36,398 DEBUG: Removing '/tmp/tmp5gyth2k5dvc-clone'
2025-02-20 15:23:36,400 DEBUG: Removing '/tmp/tmp6j7u3h2odvc-cache'
2025-02-20 15:23:36,400 DEBUG: Removing '/tmp/tmp6od4q00ndvc-cache'
2025-02-20 15:23:36,401 DEBUG: Removing '/tmp/tmpcfx4hzd0dvc-cache'

@skshetry
Member

skshetry commented Feb 24, 2025

Ok, I finally found the root cause: conda-forge/sqlite-feedstock#130, which is a regression in a recent release of sqlite installed from conda. cc @sfinkens.
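
This thread does not say which build setting changed in the conda package. Assuming it is the one that controls double-quoted string literals (SQLite calls it DQS), the sketch below lists the compile-time options of the SQLite library that Python loaded and filters for it. PRAGMA compile_options is a standard SQLite pragma; the function name is made up for illustration, and some builds may report no DQS entry at all.

```python
import sqlite3


def sqlite_compile_options():
    """Return the compile-time options reported by the loaded SQLite library."""
    con = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in con.execute("PRAGMA compile_options")]
    finally:
        con.close()


# Assumption: a DQS entry, if present, reflects how the build treats "..." strings.
dqs_options = [opt for opt in sqlite_compile_options() if opt.startswith("DQS")]
print(sqlite3.sqlite_version, dqs_options or "no explicit DQS compile option")
```

Comparing the output between a conda environment and a uv or pip environment may show where the two builds differ.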

@mdekstrand

Is DVC or one of its dependencies using a double-quoted string literal when it should use a single-quoted one for SQL conformity?

@skshetry
Member

skshetry commented Feb 24, 2025

> Is DVC or one of its dependencies using a double-quoted string literal when it should use a single-quoted one for SQL conformity?

Yes, python-diskcache (which we use for caching hashes) uses double-quoted strings. There's an open PR, grantjenks/python-diskcache#311, but it has been open for almost a year now. :(
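
The difference can be seen with a minimal probe that uses only Python's built-in sqlite3 module. The table below is a hypothetical stand-in, not python-diskcache's real schema. In standard SQL, double quotes mark an identifier, so whether "size" is accepted as a string depends on how the SQLite library was built.

```python
import sqlite3


def probe_quoting():
    """Run the same lookup with a single-quoted and a double-quoted 'size'."""
    con = sqlite3.connect(":memory:")
    try:
        # Hypothetical table for illustration only.
        con.execute("CREATE TABLE settings (key TEXT, value INTEGER)")
        con.execute("INSERT INTO settings VALUES ('size', 0)")

        # Single quotes always denote a string literal.
        portable = con.execute(
            "SELECT value FROM settings WHERE key = 'size'"
        ).fetchall()

        # Double quotes denote an identifier. Builds that keep the legacy
        # fallback treat "size" as a string; stricter builds raise an error.
        try:
            legacy = con.execute(
                'SELECT value FROM settings WHERE key = "size"'
            ).fetchall()
        except sqlite3.OperationalError as exc:
            legacy = exc
        return portable, legacy
    finally:
        con.close()


portable, legacy = probe_quoting()
print("single-quoted:", portable)
print("double-quoted:", legacy)
```

On a build that rejects double-quoted strings, the second line should show an OperationalError like the one reported in this issue, while the single-quoted query works either way.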

@jothsnapraveena

This comment has been minimized.

@jothsnapraveena

This comment has been minimized.

@jothsnapraveena

This comment has been minimized.

@skshetry
Member

skshetry commented Feb 26, 2025

@jothsnapraveena (and anyone else encountering this issue), please check if you are using Python/sqlite from conda. You can either roll back to an older version of sqlite as mentioned in #10692 (comment), or use a different Python installation (e.g. from uv) as mentioned in #10692 (comment).
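
One rough way to check is to look for a conda-meta directory under the interpreter's prefix. This is a common heuristic rather than an official API, and the function name is made up for illustration.

```python
import os
import sqlite3
import sys


def looks_like_conda_env(prefix=sys.prefix):
    """Heuristic: conda environments keep package metadata in <prefix>/conda-meta."""
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


print("interpreter prefix:", sys.prefix)
print("looks like conda:", looks_like_conda_env())
print("SQLite library version:", sqlite3.sqlite_version)
```

If the heuristic reports a conda environment and the SQLite version is 3.49.1, the workarounds above are the ones to try.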

We'll keep this issue open until either conda-forge/sqlite-feedstock#130 or grantjenks/python-diskcache#311 gets resolved.

Please read the discussion above, and avoid commenting unless the above suggested solutions do not work, or you find any new information.

7 participants