
[bug] Setting core.download:parallel causes ERROR: database is locked #13924

Closed
sykhro opened this issue May 19, 2023 · 27 comments · Fixed by #14410 or #14968

@sykhro

sykhro commented May 19, 2023

Environment details

  • Operating System+version: Windows Server 2016, Ubuntu 20.04
    • All machines have >= 48 vCPUs
  • Compiler+version: N/A
  • Conan version: 2.0.4, 2.0.5
  • Python version: 3.11

Steps to reproduce

  1. Set core.download:parallel to some value. In our case, this is global.conf:

     core:non_interactive = True
     core.download:parallel = 10

  2. Invoke conan install ...
  3. Observe sqlite3.OperationalError: database is locked

I should note that conan install is invoked in succession by a Python script. However, no amount of sleep() seems to fix the issue, and unsetting core.download:parallel makes the problem disappear.

Logs

conan install, given a standard conanfile.py and paths to host and build profiles

2023-05-18T18:19:35.7639220Z -------- Downloading 13 packages --------
2023-05-18T18:19:35.7639762Z Downloading binary packages in 10 parallel threads
2023-05-18T18:19:35.7678332Z fmt/9.1.0: Retrieving package 6c6677cbc81464b6b6b1695787db778b0be09550 from remote '[REDACTED]' 
2023-05-18T18:19:35.7679416Z imgui/1.87: Retrieving package 324b368d774fa34a6cdb068fe2ef8192a818cc88 from remote '[REDACTED]' 
2023-05-18T18:19:35.7680340Z json-c/0.15: Retrieving package b9bf0ccbd7c2de257a9ab3cfb4dd16ed65007983 from remote '[REDACTED]' 
2023-05-18T18:19:35.7683493Z libjpeg-turbo/2.1.5@[REDACTED]: Retrieving package 27567f108b4464edaf4634e2589c286d9b0cfd85 from remote '[REDACTED]' 
2023-05-18T18:19:35.7684566Z libiconv/1.17: Retrieving package b9bf0ccbd7c2de257a9ab3cfb4dd16ed65007983 from remote '[REDACTED]' 
2023-05-18T18:19:35.7687209Z dawn/5579@[REDACTED]: Retrieving package 1420f5eb1fd84f274aefcb2a9adeeaefff9d3dea from remote '[REDACTED]' 
2023-05-18T18:19:35.7689954Z libpng/1.6.39: Retrieving package 8e1ee4604f7dcf4947d3e06571faa4a7b589fda5 from remote '[REDACTED]' 
2023-05-18T18:19:35.7690762Z pcre/8.45: Retrieving package e26935b6bdd917b706a0a17334d607698d470cce from remote '[REDACTED]' 
2023-05-18T18:19:35.7694146Z liquidfun/1.1.0@[REDACTED]/dev: Retrieving package 324b368d774fa34a6cdb068fe2ef8192a818cc88 from remote '[REDACTED]' 
2023-05-18T18:19:35.7695056Z implot/0.14@[REDACTED]: Retrieving package ede40434c0b2ae7401db6ecf750d64a4edd6cd14 from remote '[REDACTED]' 
2023-05-18T18:19:35.8830595Z $Downloading conan_package.tgz from ***/api/conan/conan-internal/v2/conans/fmt/9.1.0/_/_/revisions/e747928f85b03f48aaf227ff897d9634/packages/6c6677cbc81464b6b6b1695787db778b0be09550/revisions/79ed374e39171ec952a012f4579051e5/files/conan_package.tgz
2023-05-18T18:19:35.8837174Z Decompressing conan_package.tgz
2023-05-18T18:19:35.8920136Z fmt/9.1.0: Package installed 6c6677cbc81464b6b6b1695787db778b0be09550
2023-05-18T18:19:35.8920790Z fmt/9.1.0: Downloaded package revision 79ed374e39171ec952a012f4579051e5
2023-05-18T18:19:35.8921717Z libzip/1.8.0: Retrieving package c063fc064e9a56ccff6fb501a6d5e5ac74bba67b from remote '[REDACTED]' 
2023-05-18T18:19:36.0107439Z $Downloading conan_package.tgz from ***/api/conan/conan-internal/v2/conans/imgui/1.87/_/_/revisions/5633ebf337730424e894d34203b11b6d/packages/324b368d774fa34a6cdb068fe2ef8192a818cc88/revisions/16d69ff4ccec8fd0ebf8fbb1c78d0d85/files/conan_package.tgz
2023-05-18T18:19:36.0184636Z Decompressing conan_package.tgz
2023-05-18T18:19:36.0484713Z imgui/1.87: Package installed 324b368d774fa34a6cdb068fe2ef8192a818cc88
2023-05-18T18:19:36.0485363Z imgui/1.87: Downloaded package revision 16d69ff4ccec8fd0ebf8fbb1c78d0d85
2023-05-18T18:19:36.0486245Z sdl/2.26.1@[REDACTED]: Retrieving package 4498fafa351ccb53920f61e14181f0b3c16806a5 from remote '[REDACTED]' 
2023-05-18T18:19:36.1200095Z $Downloading conan_package.tgz from ***/api/conan/conan-internal/v2/conans/pcre/8.45/_/_/revisions/563d050c69aaa0a61be46bb09ba5c23c/packages/e26935b6bdd917b706a0a17334d607698d470cce/revisions/e7ec9bddd44826e0057df8f2a29403c9/files/conan_package.tgz
2023-05-18T18:19:36.1214034Z Decompressing conan_package.tgz
2023-05-18T18:19:36.1305277Z pcre/8.45: Package installed e26935b6bdd917b706a0a17334d607698d470cce
2023-05-18T18:19:36.1305906Z pcre/8.45: Downloaded package revision e7ec9bddd44826e0057df8f2a29403c9
2023-05-18T18:19:36.1306752Z freetype/2.13.0: Retrieving package 55351983b5f5d042a5022c172c89e30573d3f532 from remote '[REDACTED]' 
2023-05-18T18:19:36.2918597Z $Downloading conan_package.tgz from ***/api/conan/conan-internal/v2/conans/sdl/2.26.1/[REDACTED]/revisions/97757243b40f008e925d55259af595de/packages/4498fafa351ccb53920f61e14181f0b3c16806a5/revisions/82ce4d7d7e0958818a934da63e291dce/files/conan_package.tgz
2023-05-18T18:19:36.2960636Z Decompressing conan_package.tgz
2023-05-18T18:19:36.3389210Z sdl/2.26.1@[REDACTED]: Package installed 4498fafa351ccb53920f61e14181f0b3c16806a5
2023-05-18T18:19:36.3389905Z sdl/2.26.1@[REDACTED]: Downloaded package revision 82ce4d7d7e0958818a934da63e291dce
2023-05-18T18:19:36.5695960Z Decompressing conan_package.tgz
2023-05-18T18:19:36.5745769Z json-c/0.15: Package installed b9bf0ccbd7c2de257a9ab3cfb4dd16ed65007983
2023-05-18T18:19:36.5746849Z json-c/0.15: Downloaded package revision 6c9a9340ebae4cf0e4994f5a7d527ac6
2023-05-18T18:19:36.6803579Z $Downloading conan_package.tgz from ***/api/conan/conan-internal/v2/conans/freetype/2.13.0/_/_/revisions/a312c096b314c6cfd6b43e290c88928a/packages/55351983b5f5d042a5022c172c89e30573d3f532/revisions/a524e1f748840e13098a6c271742435b/files/conan_package.tgz
2023-05-18T18:19:36.6831777Z Decompressing conan_package.tgz
2023-05-18T18:19:36.7082534Z freetype/2.13.0: Package installed 55351983b5f5d042a5022c172c89e30573d3f532
2023-05-18T18:19:36.7083364Z freetype/2.13.0: Downloaded package revision a524e1f748840e13098a6c271742435b
2023-05-18T18:19:36.8712488Z $Downloading conan_package.tgz from ***/api/conan/conan-internal/v2/conans/libiconv/1.17/_/_/revisions/fa54397801cd96911a8294bc5fc76335/packages/b9bf0ccbd7c2de257a9ab3cfb4dd16ed65007983/revisions/7caa9e5b3af799bceffcdbe1ee870ece/files/conan_package.tgz
2023-05-18T18:19:36.8781076Z Decompressing conan_package.tgz
2023-05-18T18:19:36.8931988Z libiconv/1.17: Package installed b9bf0ccbd7c2de257a9ab3cfb4dd16ed65007983
2023-05-18T18:19:36.8932709Z libiconv/1.17: Downloaded package revision 7caa9e5b3af799bceffcdbe1ee870ece
2023-05-18T18:19:37.0289862Z $Downloading conan_package.tgz from ***/api/conan/conan-internal/v2/conans/libpng/1.6.39/_/_/revisions/a4953955f060811c79abff9fe986b6fe/packages/8e1ee4604f7dcf4947d3e06571faa4a7b589fda5/revisions/b3aaf7d1d260a23aaa89cd37dabf8768/files/conan_package.tgz
2023-05-18T18:19:37.0308358Z Decompressing conan_package.tgz
2023-05-18T18:19:37.0377380Z libpng/1.6.39: Package installed 8e1ee4604f7dcf4947d3e06571faa4a7b589fda5
2023-05-18T18:19:37.0379062Z libpng/1.6.39: Downloaded package revision b3aaf7d1d260a23aaa89cd37dabf8768
2023-05-18T18:19:37.1878249Z $Downloading conan_package.tgz from ***/api/conan/conan-internal/v2/conans/libjpeg-turbo/2.1.5/[REDACTED]/revisions/6fa0c11b0f58c7102a4e73003c6e76e1/packages/27567f108b4464edaf4634e2589c286d9b0cfd85/revisions/a4ee91372a20134eb537382e47732b91/files/conan_package.tgz
2023-05-18T18:19:37.1901411Z Decompressing conan_package.tgz
2023-05-18T18:19:37.2006714Z libjpeg-turbo/2.1.5@[REDACTED]: Package installed 27567f108b4464edaf4634e2589c286d9b0cfd85
2023-05-18T18:19:37.2008866Z libjpeg-turbo/2.1.5@[REDACTED]: Downloaded package revision a4ee91372a20134eb537382e47732b91
2023-05-18T18:19:37.2032496Z ERROR: database is locked
2023-05-18T18:19:37.2033015Z Traceback (most recent call last):
2023-05-18T18:19:37.2033753Z   File "/usr/local/lib/python3.10/dist-packages/conan/cli/cli.py", line 271, in main
2023-05-18T18:19:37.2034308Z     cli.run(args)
2023-05-18T18:19:37.2034934Z   File "/usr/local/lib/python3.10/dist-packages/conan/cli/cli.py", line 171, in run
2023-05-18T18:19:37.2035593Z     command.run(self._conan_api, self._commands[command_argument].parser, args[0][1:])
2023-05-18T18:19:37.2036330Z   File "/usr/local/lib/python3.10/dist-packages/conan/cli/command.py", line 134, in run
2023-05-18T18:19:37.2036931Z     info = self._method(conan_api, parser, *args)
2023-05-18T18:19:37.2037670Z   File "/usr/local/lib/python3.10/dist-packages/conan/cli/commands/install.py", line 84, in install
2023-05-18T18:19:37.2038362Z     conan_api.install.install_binaries(deps_graph=deps_graph, remotes=remotes)
2023-05-18T18:19:37.2039117Z   File "/usr/local/lib/python3.10/dist-packages/conan/api/subapi/install.py", line 22, in install_binaries
2023-05-18T18:19:37.2039747Z     installer.install(deps_graph, remotes)
2023-05-18T18:19:37.2040446Z   File "/usr/local/lib/python3.10/dist-packages/conans/client/installer.py", line 247, in install
2023-05-18T18:19:37.2041165Z     self._download_bulk(install_order)
2023-05-18T18:19:37.2041875Z   File "/usr/local/lib/python3.10/dist-packages/conans/client/installer.py", line 277, in _download_bulk
2023-05-18T18:19:37.2042490Z     thread_pool.map(self._download_pkg, downloads)
2023-05-18T18:19:37.2043066Z   File "/usr/lib/python3.10/multiprocessing/pool.py", line 367, in map
2023-05-18T18:19:37.2043884Z     return self._map_async(func, iterable, mapstar, chunksize).get()
2023-05-18T18:19:37.2044749Z   File "/usr/lib/python3.10/multiprocessing/pool.py", line 774, in get
2023-05-18T18:19:37.2045243Z     raise self._value
2023-05-18T18:19:37.2045768Z   File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
2023-05-18T18:19:37.2046317Z     result = (True, func(*args, **kwds))
2023-05-18T18:19:37.2046864Z   File "/usr/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
2023-05-18T18:19:37.2047381Z     return list(map(*args))
2023-05-18T18:19:37.2048070Z   File "/usr/local/lib/python3.10/dist-packages/conans/client/installer.py", line 288, in _download_pkg
2023-05-18T18:19:37.2048758Z     self._remote_manager.get_package(node.conanfile, node.pref, node.binary_remote)
2023-05-18T18:19:37.2049550Z   File "/usr/local/lib/python3.10/dist-packages/conans/client/remote_manager.py", line 105, in get_package
2023-05-18T18:19:37.2050153Z     pkg_layout = self._cache.get_or_create_pkg_layout(pref)
2023-05-18T18:19:37.2050913Z   File "/usr/local/lib/python3.10/dist-packages/conans/client/cache/cache.py", line 87, in get_or_create_pkg_layout
2023-05-18T18:19:37.2051571Z     return self._data_cache.get_or_create_pkg_layout(ref)
2023-05-18T18:19:37.2052320Z   File "/usr/local/lib/python3.10/dist-packages/conan/internal/cache/cache.py", line 127, in get_or_create_pkg_layout
2023-05-18T18:19:37.2052932Z     return self.get_package_layout(pref)
2023-05-18T18:19:37.2053645Z   File "/usr/local/lib/python3.10/dist-packages/conan/internal/cache/cache.py", line 107, in get_package_layout
2023-05-18T18:19:37.2054256Z     pref_data = self._db.try_get_package(pref)
2023-05-18T18:19:37.2055004Z   File "/usr/local/lib/python3.10/dist-packages/conan/internal/cache/db/cache_database.py", line 73, in try_get_package
2023-05-18T18:19:37.2055621Z     ref_data = self._packages.get(ref)
2023-05-18T18:19:37.2056294Z   File "/usr/local/lib/python3.10/dist-packages/conan/internal/cache/db/packages_table.py", line 62, in get
2023-05-18T18:19:37.2056863Z     r = conn.execute(query)
2023-05-18T18:19:37.2057355Z sqlite3.OperationalError: database is locked
@memsharded
Member

Hi @sykhro

Thanks for your report

I have been trying to reproduce this on Windows 10, Python 3.11, downloading opencv from ConanCenter, which has a bunch of transitive dependencies, and I haven't been able to make it fail (ran it several times)

Some quick questions:

  • Do you see this failing consistently? Is it random or sporadic?
  • Does it happen on both Windows and Linux machines? Same consistent failures on both?
  • Does your Python installation have anything in particular? Is it the mainstream installer from the python.org site?

@AbrilRBS
Member

Note that I have also not been able to reproduce this with a similar setup on macOS

@sykhro
Author

sykhro commented May 19, 2023

> Hi @sykhro
>
> Thanks for your report
>
> I have been trying to reproduce this on Windows 10, Python 3.11, downloading opencv from ConanCenter, which has a bunch of transitive dependencies, and I haven't been able to make it fail (ran it several times)
>
> Some quick questions:
>
>   • Do you see this failing consistently? Is it random or sporadic?
>   • Does it happen on both Windows and Linux machines? Same consistent failures on both?
>   • Does your Python installation have anything in particular? Is it the mainstream installer from the python.org site?

  • It has been happening 100% of the time on Linux and at least 50% of the time on Windows. Conan is called 4 times in a row ({ConanFileA, ConanFileB} x {Debug, Release}), and it always seems to happen past the first run. I haven't seen this happen locally either, so I'm wondering if the high core count might have something to do with it. There's a full 1000ms sleep in between calls.
  • Mainstream installer from python.org for Windows (3.11), official Ubuntu package (3.10). The job runs inside Docker on Linux.

@memsharded
Member

Do you mean that there are parallel OS-level jobs/tasks running over the same cache?
That would be expected, because the Conan cache is not designed for concurrency (https://docs.conan.io/2/knowledge/guidelines.html)

@sykhro
Author

sykhro commented May 19, 2023

No, no parallel jobs. We call this function sequentially for each build config (Debug, Release) and for two different conanfiles.

import os
import subprocess


def _install_from_conanfile(
    path_to_conanfile: str,
    path_to_host_profile: str,
    path_to_build_profile: str,
    deploy_folder=None,
    environment=None,
    build_missing=False,
):
    conan_run_args = [
        "conan",
        "install",
        path_to_conanfile,
        f"--profile:host={path_to_host_profile}",
        f"--profile:build={path_to_build_profile}",
        "--generator=CMakeToolchain",
    ]

    if build_missing:
        conan_run_args.append("--build=missing")

    if deploy_folder:
        # Splice the deploy arguments in just before the last argument
        conan_run_args[-1:-1] = [
            "--deploy",
            "full_deploy",
            "--output-folder",
            deploy_folder,
        ]

    conan_res = subprocess.run(
        conan_run_args, env=environment if environment else os.environ,
    )
    return conan_res
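
For context, a hedged sketch of how a helper like this would be driven sequentially, per the description above (4 calls in a row with a 1000 ms pause); the conanfile paths and profile names here are hypothetical:

import time

for conanfile in ("conanfile_a.py", "conanfile_b.py"):  # hypothetical paths
    for config in ("debug", "release"):
        _install_from_conanfile(
            conanfile,
            path_to_host_profile=f"profiles/host_{config}",    # hypothetical
            path_to_build_profile=f"profiles/build_{config}",  # hypothetical
        )
        time.sleep(1.0)  # the full 1000 ms sleep mentioned earlier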

@memsharded
Member

Then yeah, maybe the high core count is the necessary ingredient to trigger this. Thanks for the feedback.

@DoDoENT
Contributor

DoDoENT commented Aug 1, 2023

The same thing has now happened on my servers. The Docker container that runs the conan commands has --cpus 10, and global.conf defines 10 parallel jobs, but the underlying server actually has 56 vCores (28 real + hyperthreading). The dependency graph wasn't big at all - just 4 packages.

In my case, there was a parallel run, but each Jenkins node has its own Docker container that doesn't share anything.

@memsharded
Member

Thanks @DoDoENT for the extra feedback.
It seems there is enough evidence that concurrent DB access is causing issues; we need to take a deeper look at it, but we haven't had enough time to check it yet

@memsharded
Member

Ok, I have had a very interesting debugging session, resulting in #14410

It turned out that the DB's 1-second timeout was not enough to acquire the locks under excessive concurrency, and that is what produced the error. I didn't expect that timeout to actually be what was raising when there were concurrent accesses and locks, so I spent an hour looking in all the wrong places.

I have been testing with the timeout raised to 10 seconds, and it seems to be working fine now. Thanks again for the feedback!
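
For readers unfamiliar with the mechanism: sqlite3's timeout argument is how long a connection keeps retrying a locked database before giving up with "database is locked". A minimal, self-contained sketch of the behavior the fix adjusts (not Conan's actual code):

import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "cache.sqlite3")

# One connection takes an exclusive write lock and holds it.
writer = sqlite3.connect(db, isolation_level=None)
writer.execute("create table if not exists t (k int)")
writer.execute("begin exclusive")

# A second connection with a short timeout retries for ~1 s, then raises
# sqlite3.OperationalError: database is locked. Raising the timeout (as the
# fix does, from 1 to 10 seconds) simply extends this retry window.
blocked = sqlite3.connect(db, timeout=1)
try:
    blocked.execute("insert into t values (1)")
except sqlite3.OperationalError as exc:
    print(exc)  # database is locked

writer.execute("rollback")  # release the lock; now the insert succeeds
blocked.execute("insert into t values (1)")
blocked.commit()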

@DoDoENT
Contributor

DoDoENT commented Sep 22, 2023

We are still occasionally getting this error on our CI servers, even with Conan 2.0.10 (haven't tested with 2.0.11, but nothing in the release notes indicates any changes there).

@DoDoENT
Contributor

DoDoENT commented Oct 18, 2023

This still happens with Conan 2.0.13 on servers with 56 threads under heavy server load.

@memsharded memsharded reopened this Oct 18, 2023
@memsharded
Member

Let's try raising the timeout to 20 seconds. Is it possible that the servers are just too loaded, so the system becomes a bit too unresponsive?

@DoDoENT
Contributor

DoDoENT commented Oct 19, 2023

> Is it possible that the servers are just too loaded, so the system becomes a bit too unresponsive?

That was the most probable cause in our case.

> Let's try raising the timeout to 20 seconds.

🤔, wouldn't it just be better to add a new config to global.conf that defines this timeout instead of hardcoding it? That way, 10 seconds could remain the default, since it works fine for most users and use cases, and people would have the option to tweak the timeout to their needs. Who says that 20 seconds will always be enough? It will certainly be possible to construct a scenario where it won't be (e.g. extremely heavy load, slow disks, a machine swapping like there is no tomorrow).

@memsharded
Member

Mostly because when this error happens, it is quite difficult to understand it and connect the dots to that conf.

> Who says that 20 seconds will always be enough? It will certainly be possible to construct a scenario where it won't be (e.g. extremely heavy load, slow disks, a machine swapping like there is no tomorrow).

I think that if the system cannot complete a simple DB read/write in 20 seconds, then the problem is a different one, and the system is already degraded to a point where other things can also go wrong. But I get your point: let's give it 20 seconds for the moment, and if it happens again we will consider the conf.

@vient
Contributor

vient commented Oct 19, 2023

Look what I found: python/cpython#70812

It seems this is actually a "bug" in CPython's sqlite3 module. Check the file https://bugs.python.org/file42259/unintuitive_sqlite_behavior.py. There seems to be no option to fix this in <3.12, where the CPython devs added a new autocommit option and deprecated isolation_level. The easiest fix seems to be calling fetchall() after all SELECTs, or you can wrap your SELECTs in BEGIN .. COMMIT.
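
For illustration, a minimal self-contained sketch of the gotcha the CPython issue describes and the fetchall()/close() workaround suggested above (a stand-in example, not Conan's actual code):

import sqlite3

conn = sqlite3.connect("cache.sqlite3")  # stand-in for Conan's cache DB
conn.execute("create table if not exists t (k int)")
conn.execute("insert into t values (1)")
conn.commit()

# The gotcha: a SELECT cursor that is never fully consumed can keep an
# implicit read transaction (and its shared lock) open, so writers on
# other connections see "database is locked".
cur = conn.execute("select k from t")

# The workaround: drain the cursor (and/or close it) so the implicit
# transaction ends and the lock is released.
rows = cur.fetchall()
cur.close()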

@vient
Contributor

vient commented Oct 19, 2023

I'm not 100% sure about this, since you use with db.connection() as conn everywhere - it should explicitly close the connection on exit from the with block, which should also close all outstanding requests on that connection? Maybe not.

Taking a cue from the same unintuitive_sqlite_behavior.py, you may try adding r.close() after all r.fetchone() calls to explicitly close the cursor before closing the connection; hopefully that will help.

@memsharded
Member

> I'm not 100% sure about this, since you use with db.connection() as conn everywhere - it should explicitly close the connection on exit from the with block, which should also close all outstanding requests on that connection? Maybe not.

Yes, exactly, we did this in the past to avoid issues and to make sure everything was a transaction.
Maybe this is excessive and puts a ton of extra load on the sqlite DB, which in turn can cause issues if the system is too loaded? This really only happens when enabling parallelism under heavy load on the system.

The new autocommit from Python 3.12 is not enough, because we will not be able to require Python 3.12 for at least a few years :(, so we should try other approaches.

Adding a cursor close() might be worth trying, especially since it doesn't seem to carry risks. As you investigated it (thanks very much!), would you like to open the PR yourself?
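
For reference, a small sketch of the Python 3.12 API mentioned above, with a fallback for older interpreters (the file name is a stand-in; this is not Conan's code):

import sqlite3
import sys

if sys.version_info >= (3, 12):
    # Python 3.12 added Connection.autocommit, which supersedes the legacy
    # isolation_level and gives PEP 249-compliant transaction behavior;
    # this is the option the linked CPython issue converged on.
    conn = sqlite3.connect("cache.sqlite3", autocommit=True)
else:
    # Closest pre-3.12 equivalent: legacy autocommit mode.
    conn = sqlite3.connect("cache.sqlite3", isolation_level=None)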

@AbrilRBS AbrilRBS reopened this Oct 20, 2023
@memsharded memsharded modified the milestones: 2.0.10, 2.0.14 Oct 20, 2023
@vient
Contributor

vient commented Oct 20, 2023

I think the proper way is to first check that the original script from the Python issue reproduces the problem, then modify it to match the code in Conan and see if it still shows the database locking. I can try to do it later.

@memsharded
Member

Hi @vient

Did you manage to try that? I am following up on the open items for the 2.0.14 release; we'd like to release it next week.

@vient
Contributor

vient commented Oct 26, 2023

Ok, I planned to do it later, but I will try today then.

@vient
Contributor

vient commented Oct 29, 2023

I did not manage to create a reproducer based on that CPython bug when the connection is always closed after execute; it seems this is not related after all.

I noticed one interesting effect, though: making DB access exclusive improves speed a lot in tests with 100+ threads (it's hard to measure the effect with fewer threads). By exclusivity I mean placing the whole BaseDbTable.db_connection body under an acquired threading.Lock instance, so only one DB connection is created at a time.
Maybe you will find this interesting:

  • database operations should be pretty fast compared with other work like downloading and unpacking packages, so making DB access exclusive may not really affect performance
  • it should fix this error, since we will not start connecting until the DB is free, so the timeout should never kick in

Example

from contextlib import contextmanager
import sqlite3
import threading

DB_FILE = "test.db"

_db_lock = threading.Lock()

@contextmanager
def db_connection():
    # Serialize all DB access: only one connection exists at a time, so
    # sqlite's own (slower) lock contention never comes into play.
    with _db_lock:
        connection = sqlite3.connect(DB_FILE, isolation_level=None, timeout=120)
        try:
            yield connection
        finally:
            connection.close()

with db_connection() as conn:
    cursor = conn.cursor()
    cursor.execute("create table if not exists t (k int primary key, v int)")
    with conn:
        cursor.execute("insert or ignore into t (k, v) values (1, 1), (2, 2)")

def fn():
    try:
        with db_connection() as conn:
            # Deliberately query a missing table to exercise the error path.
            conn.execute("select k from t2").fetchone()
    except Exception:
        with db_connection() as conn2:
            conn2.execute("update t set v = 3 where k = ?", (2,))
    with db_connection() as conn3:
        conn3.execute("update t set v = 3 where k = ?", (2,))

threads = []
for _ in range(1000):
    threads.append(threading.Thread(target=fn))

for thr in threads:
    thr.start()

for thr in threads:
    thr.join()

Here I am creating 1000 threads. This version runs in 0.5 s, and that time is stable. Removing with _db_lock from the db_connection function makes it take anywhere from 30 to 60 seconds. I ran it on an idle machine with 16 cores.

@memsharded
Member

> I noticed one interesting effect, though: making DB access exclusive improves speed a lot in tests with 100+ threads (it's hard to measure the effect with fewer threads). By exclusivity I mean placing the whole BaseDbTable.db_connection body under an acquired threading.Lock instance, so only one DB connection is created at a time.

Indeed very interesting. I wouldn't have expected this; I would have trusted that the sqlite3 internals had equally efficient locking. But that is the thing with performance and parallelism: most often intuition doesn't work, and only profiling is useful. Good investigation.

@memsharded
Member

I have run it on Windows, and the savings are not as dramatic, but still large: for example, with 300 threads it goes from 12 seconds to 6 seconds. Still a big win; it seems worth doing.

@vient
Contributor

vient commented Oct 29, 2023

Without diving into the sqlite code, I can only guess that waiting on a single lock is more efficient because the threads wait in a futex syscall and the kernel can efficiently resume one of them. SQLite on Linux uses POSIX file locks via fcntl, which may not be implemented as efficiently? At least the locking logic in sqlite is surely more complicated. Concurrent access may be beneficial when there are a lot of tables or mostly reads, but in this case there are only two tables, recipes and packages, and all threads perform writes.

@memsharded
Member

Closed by #15029

@DoDoENT
Contributor

DoDoENT commented Nov 22, 2024

This has now been replaced with the following error (Conan v2.9.3).

ERROR: Traceback (most recent call last):
  File "/opt/jenkins/root/E0/b/venv/lib/python3.12/site-packages/conan/cli/cli.py", line 307, in main
    cli.run(args)
  File "/opt/jenkins/root/E0/b/venv/lib/python3.12/site-packages/conan/cli/cli.py", line 192, in run
    command.run(self._conan_api, args[0][1:])
  File "/opt/jenkins/root/E0/b/venv/lib/python3.12/site-packages/conan/cli/command.py", line 187, in run
    sub.run(conan_api, parser, *args)
  File "/opt/jenkins/root/E0/b/venv/lib/python3.12/site-packages/conan/cli/command.py", line 205, in run
    info = self._method(conan_api, parent_parser, self._parser, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jenkins/.conan2/extensions/commands/cmd_project.py", line 313, in project_ninja
    built_binaries, presets_info = project_utils.install_deps_and_create_project(
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jenkins/.conan2/extensions/commands/project_utils.py", line 197, in install_deps_and_create_project
    cmake_preset, install_deps_graph = install_single_build_type(
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jenkins/.conan2/extensions/commands/project_utils.py", line 440, in install_single_build_type
    conan_api.install.install_binaries(
  File "/opt/jenkins/root/E0/b/venv/lib/python3.12/site-packages/conan/api/subapi/install.py", line 29, in install_binaries
    installer.install(deps_graph, remotes, install_order=install_order)
  File "/opt/jenkins/root/E0/b/venv/lib/python3.12/site-packages/conans/client/installer.py", line 255, in install
    self._download_bulk(install_order)
  File "/opt/jenkins/root/E0/b/venv/lib/python3.12/site-packages/conans/client/installer.py", line 285, in _download_bulk
    thread_pool.map(self._download_pkg, downloads)
  File "/usr/lib/python3.12/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/multiprocessing/pool.py", line 774, in get
    raise self._value
  File "/usr/lib/python3.12/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
           ^^^^^^^^^^^^^^^^
  File "/opt/jenkins/root/E0/b/venv/lib/python3.12/site-packages/conans/client/installer.py", line 296, in _download_pkg
    self._remote_manager.get_package(node.pref, node.binary_remote)
  File "/opt/jenkins/root/E0/b/venv/lib/python3.12/site-packages/conans/client/remote_manager.py", line 134, in get_package
    pkg_layout = self._cache.create_pkg_layout(pref)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/jenkins/root/E0/b/venv/lib/python3.12/site-packages/conan/internal/cache/cache.py", line 163, in create_pkg_layout
    self._db.create_package(package_path, pref, None)
  File "/opt/jenkins/root/E0/b/venv/lib/python3.12/site-packages/conan/internal/cache/db/cache_database.py", line 91, in create_package
    self._packages.create(path, ref, build_id=build_id)
  File "/opt/jenkins/root/E0/b/venv/lib/python3.12/site-packages/conan/internal/cache/db/packages_table.py", line 80, in create
    with self.db_connection() as conn:
  File "/usr/lib/python3.12/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/opt/jenkins/root/E0/b/venv/lib/python3.12/site-packages/conan/internal/cache/db/table.py", line 26, in db_connection
    assert self._lock.acquire(timeout=10), "Conan failed to acquire database lock"
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Conan failed to acquire database lock

ERROR: Conan failed to acquire database lock

It happens at random, only on servers with a high number of cores (128 in our case), under high server load. Is there a config option to increase this timeout?

@vient
Contributor

vient commented Nov 22, 2024

Not sure this timeout is needed at all. If it is there as a "just-in-case check so Conan does not get stuck forever", it may as well be increased to 10 minutes? IMO that's better than adding a config option.

I think the timeout can be removed, because all DB operations look pretty safe - when they take long, it always seems to be because of high load from other processes, which is normal.
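
For clarity, a small sketch of the pattern under discussion, reconstructed from the traceback above (not Conan's exact code), next to the blocking alternative suggested here:

import threading
from contextlib import contextmanager

_lock = threading.Lock()

@contextmanager
def db_connection_with_timeout():
    # Pattern seen in the traceback: give up after 10 s and abort.
    assert _lock.acquire(timeout=10), "Conan failed to acquire database lock"
    try:
        yield
    finally:
        _lock.release()

@contextmanager
def db_connection_blocking():
    # Suggested alternative: block (or use a much longer timeout) instead of
    # failing, since slow acquisition only reflects high system load.
    with _lock:
        yield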
