-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #167 : API performances via DB redesign #300
Conversation
* CI: Add Docker layer caching for conda store and server * CHORE: Add Hadolint for Dockerfile linting * CI: Add Docker Build Push action and caching * CI: Add setup Buildx action * CI: Add Docker Build Push action and caching for release workflow
…#249) * Adding ability for conda-store server to use proxy X-Forward- headers * Fixing github-actions specification * Black formatting * Spelling
Closes #247 This is a shorter term fix but does seem to effectively limit the number of connections.
* Adding ability to modify and validate specification Closes #117 * Adding stricter validation of schemas * Flake8 error * Adding documentation * Documentation linting errors
* Adding sane defaults to conda-store for environments channels/packages Closes #253 * Black formatting * Adding documentation * Making schema more robust * Black formatting
* For local dev, allow access to all endpoints * Install dev branch of Gator * Adding filesystem to default environment * Setup condarc properly * Making ipykernel always be added to environment Co-authored-by: Chris Ostrouchov <[email protected]>
* Making conda package population independent from package builds Closes #257 * Adding conda to the dependencies
Closes #260 We are now dependant on the rest api
…271) * Adding permissions view to user ui endpoint along with api endpoint * Adding documentation on the new permissions * Black formatting * Split namespace and name in UI for user
Bellow -> Below
Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6. - [Release notes](https://github.com/substack/minimist/releases) - [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6) --- updated-dependencies: - dependency-name: minimist dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
allow cookies cross domain
This weirdly caused the kubernetes example to reload. Fixed.
task_key = f"lock_{self.name}_{channel_name}" | ||
|
||
is_locked = False | ||
lock = conda_store.redis.lock(task_key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pierrotsmnrd redis was removed as a requirement (due to trying to allow for a standalone mode). Is it possible for us to make this a non-requirement for the locking? I know that redis the the correct way to handle this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@costrouc We absolutely need a lock of some sort, to avoid updating a channel twice at the same time to prevent from any error. The algorithm to populate the database is based on this assumption - which I guess is fine considering it's now very fast.
What other options do we have for a lock ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pierrotsmnrd we could also use a filelock which for now will work fine. It will break down when we have multiple celery workers but in that case they should have redis anyways.
@pierrotsmnrd performance is great love how you preserved the api. I don't see any conflicts except for requiring redis. It is completely fine if it is optional. It is possible to check for redis and if it does not exist fallback with to retrying+backoff if bulk inserts fail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two examples fail -- examples/docker and examples/standalone. The docker one is failing due to a permissions issue. This just needs a chmod in the https://github.com/Quansight/conda-store/blob/fix167_api_perf/conda-store-server/Dockerfile#L26-L32.
conda-store-worker_1 | Traceback (most recent call last):
conda-store-worker_1 | File "/opt/conda/envs/conda-store-server/bin/conda-store-worker", line 8, in <module>
conda-store-worker_1 | sys.exit(main())
conda-store-worker_1 | File "/opt/conda/envs/conda-store-server/lib/python3.10/site-packages/traitlets/config/application.py", line 978, in launch_instance
conda-store-worker_1 | app.start()
conda-store-worker_1 | File "/opt/conda-store-server/conda_store_server/worker/app.py", line 81, in start
conda-store-worker_1 | self.conda_store.ensure_directories()
conda-store-worker_1 | File "/opt/conda-store-server/conda_store_server/app.py", line 391, in ensure_directories
conda-store-worker_1 | os.makedirs(self.store_directory, exist_ok=True)
conda-store-worker_1 | File "/opt/conda/envs/conda-store-server/lib/python3.10/os.py", line 225, in makedirs
conda-store-worker_1 | mkdir(name, mode)
conda-store-worker_1 | PermissionError: [Errno 13] Permission denied: '/opt/conda-store/conda-store'
The standalone one is not working due to sqlite not having the alter capability and this needs to be fixed. This can be accomplished with the batch alter seen here https://github.com/Quansight/conda-store/blob/fix167_api_perf/conda-store-server/conda_store_server/alembic/versions/48be4072fe58_initial_schema.py#L161-L169. This uses alter in the postgresql case and copy/drop in the sqlite case where alter is not supported.
age_build", type_="unique")
conda-store-server_1 | File "<string>", line 8, in drop_constraint
conda-store-server_1 | File "<string>", line 3, in drop_constraint
conda-store-server_1 | File "/opt/conda/envs/conda-store-server/lib/python3.10/site-packages/alembic/operations/ops.py", line 218, in drop_constraint
conda-store-server_1 | return operations.invoke(op)
conda-store-server_1 | File "/opt/conda/envs/conda-store-server/lib/python3.10/site-packages/alembic/operations/base.py", line 399, in invoke
conda-store-server_1 | return fn(self, operation)
conda-store-server_1 | File "/opt/conda/envs/conda-store-server/lib/python3.10/site-packages/alembic/operations/toimpl.py", line 184, in drop_constraint
conda-store-server_1 | operations.impl.drop_constraint(
conda-store-server_1 | File "/opt/conda/envs/conda-store-server/lib/python3.10/site-packages/alembic/ddl/sqlite.py", line 90, in drop_constraint
conda-store-server_1 | raise NotImplementedError(
conda-store-server_1 | NotImplementedError: No support for ALTER of constraints in SQLite dialect. Please refer to the batch mode feature which allows for SQLite migrations using a copy-and-move strategy.
conda-store-server_1 | Traceback (most recent call last):
conda-store-server_1 | File "/opt/conda/envs/conda-store-server/bin/conda-store-server", line 8, in <module>
conda-store-server_1 | sys.exit(main())
conda-store-server_1 | File "/opt/conda/envs/conda-store-server/lib/python3.10/site-packages/traitlets/config/application.py", line 977, in launch_instance
conda-store-server_1 | app.initialize(argv)
conda-store-server_1 | File "/opt/conda/envs/conda-store-server/lib/python3.10/site-packages/traitlets/config/application.py", line 110, in inner
conda-store-server_1 | return method(app, *args, **kwargs)
conda-store-server_1 | File "/opt/conda-store-server/conda_store_server/server/app.py", line 173, in initialize
conda-store-server_1 | dbutil.upgrade(self.conda_store.database_url)
conda-store-server_1 | File "/opt/conda-store-server/conda_store_server/dbutil.py", line 97, in upgrade
conda-store-server_1 | check_call(["alembic", "-c", alembic_ini, "upgrade", revision])
conda-store-server_1 | File "/opt/conda/envs/conda-store-server/lib/python3.10/subprocess.py", line 369, in check_call
conda-store-server_1 | raise CalledProcessError(retcode, cmd)
conda-store-server_1 | subprocess.CalledProcessError: Command '['alembic', '-c', '/tmp/tmp1an46k_i/alembic.ini', 'upgrade', 'head']' returned non-zero exit status 1.
|
||
is_locked = False | ||
|
||
if conda_store.redis_url is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative to this pattern and using try/finally is using the https://docs.python.org/3/library/contextlib.html#contextlib.ExitStack. This way you don't have to enter/exit manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me. Thanks for all the work @pierrotsmnrd. I was able to run this locally and it was snappy.
As described in issue #167, the table
conda_package
needs to be split into 2 tables :conda_package
containing the data about the packages, regardless of the builds : name, license, version, summary ..conda_package_build
containing the data about the builds of a conda package : build number, constrains, md5, size, timestamp ...Impacts of splitting this into two tables :
build
) used to be linked to aconda_package
. After this, they need to be associated to aconda_package_build
.conda_package_build
, and oneconda_package_build
can be linked to N builds : this results in a N-N association. the tablebuild_conda_package_build
is used for this purpose.Check the new DB schema :
I'm opening this PR as draft because I need to see how the API tests behave on GH actions, I struggle to make them work properly locally. I expect multiple changes to be needed.
Multiple questions for you @costrouc :
Question 1 : When I started this work, the tables
solve
andsolve_conda_package
were not there yet. I assume thatsolve_conda_package
should point to aconda_package_build
instead of aconda_package
. Do you confirm ? If yes, I'll make the change.Question 2 : inside the table
conda_package
, multiple data columns are still duplicated. For instance, the license, the description, the summary, is always the same for each package, regardless of the channel.Putting these data into as
conda_package_metadata
would make the DB lighter and save space, and improve performances as we would request these metadata only when needed.This can be done later in a separate PR, or now.
Question 3 : check the conflict below between my branch and main, in
orm.py
, that I don't resolve on purpose.The
batch_size
parameter was used to reduce the load on the DB when updating conda channels, right ?Now that we don't insert as much data as before, do you think it's still needed ?
Question 4 : do you have any metric relative to performances so I can compare ? Can we imagine having tests that measure the performance and helps comparing the improvements ? it's hard to do so manually.
TODO :