Cloning git repo 4 times when adding a git dependency #5188

Closed
3 tasks done
yajo opened this issue Feb 11, 2022 · 3 comments · Fixed by #5428
Labels
kind/bug Something isn't working as expected

Comments

yajo commented Feb 11, 2022

  • I am on the latest Poetry version.
  • I have searched the issues of this repo and believe that this is not a duplicate.
  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option).
  • OS version and name: Fedora Silverblue 35
  • Poetry version: 1.1.13
  • Link of a Gist with the contents of your pyproject.toml file: Not needed.

Issue

To reproduce it:

  1. Add a basic pyproject.toml with poetry init
  2. Run poetry add git+https://github.com/odoo/odoo.git#15.0

It takes an impractically long time, because the repo ends up being cloned 4 times.

I installed a local cherry-pick of python-poetry/poetry-core#290 with:

pipx install poetry
pipx inject poetry git+https://github.com/moduon/poetry-core.git@stable-git-clone-blobless

Then, repeat those steps, and it will take about 5 minutes. Still too much.

I have executed this command:

time py-spy record --format speedscope --idle --threads --subprocesses --output ~/Downloads/poetry.speedscope.json.txt poetry add git+https://github.com/odoo/odoo.git#15.0

It produced this trace file, which you can upload to https://www.speedscope.app/ to browse the performance: poetry.speedscope.json.txt

Once you're browsing that, use the top dropdown thread selector and choose these threads:

  • Process 75070 Thread 75070 "" (1/124)
  • Process 75070 Thread 75272 "Thread-5 (_install)" (3/124)

Search for "clone" (use Ctrl+F to open search). You'll see 4 clones highlighted. I put screenshots here to make it easier in case you're not familiar with speedscope:

[Two screenshots: speedscope views with the clone calls highlighted]

You can see that each of those sections takes between 1:00 and 1:20 minutes. Add the normal Poetry operations for solving dependencies and you get the roughly 5 minutes it takes in total.

Of course, without python-poetry/poetry-core#290 it takes forever because Odoo is a huge repo, and cloning it without --filter=blob:none is impractical. Besides, Poetry is cloning the whole repo, not only the selected branch.
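To make the two points above concrete, here is a minimal sketch of a clone command that uses `--filter=blob:none` and restricts the clone to one branch. The helper name `blobless_clone_cmd` is hypothetical and this is not Poetry's or poetry-core's actual code; it only assembles standard `git clone` options.

```python
from typing import List, Optional


def blobless_clone_cmd(url: str, dest: str, branch: Optional[str] = None) -> List[str]:
    """Build a `git clone` command line for a blobless (partial) clone.

    Hypothetical helper for illustration only.
    """
    # --filter=blob:none defers downloading file contents until they are needed.
    cmd = ["git", "clone", "--filter=blob:none"]
    if branch is not None:
        # Fetch only the history of the selected branch, not every branch.
        cmd += ["--single-branch", "--branch", branch]
    # "--" separates options from the positional URL and destination.
    cmd += ["--", url, dest]
    return cmd


print(" ".join(blobless_clone_cmd("https://github.com/odoo/odoo.git", "odoo", "15.0")))
```

The command is returned as a list so it could be passed to `subprocess.run` without shell quoting issues.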

Looking at the code and comparing it with the speedscope graph, I can see the problem:

  1. Each time Poetry calls get_package_from_vcs(), it clones the repo in a different temporary path:

    tmp_dir = Path(mkdtemp(prefix=f"pypoetry-git-{suffix}"))

    That path is then removed:

    safe_rmtree(str(tmp_dir))

  2. When all is solved and finally Poetry wants to install the git dependency inside the venv, it uses a different dir. It is not temporary this time, but surprisingly it will be removed if found:

    src_dir = self._env.path / "src" / package.name
    if src_dir.exists():
        safe_rmtree(str(src_dir))
    src_dir.parent.mkdir(exist_ok=True)
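A small illustration of why the first pattern can never reuse an earlier clone: `mkdtemp` creates a new, uniquely named directory on every call, even with the same prefix. The function name `fresh_clone_dir` and the suffix value are made up for this sketch; only the `mkdtemp` call mirrors the snippet quoted above.

```python
import shutil
from pathlib import Path
from tempfile import mkdtemp


def fresh_clone_dir(suffix: str) -> Path:
    # Same call pattern as the quoted snippet: a new directory per call.
    return Path(mkdtemp(prefix=f"pypoetry-git-{suffix}"))


first = fresh_clone_dir("odoo")
second = fresh_clone_dir("odoo")
try:
    # Distinct paths mean a previous clone cannot be found and reused.
    print(first != second)
finally:
    # Clean up the directories created by this demonstration.
    shutil.rmtree(first)
    shutil.rmtree(second)
```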

So, it's easy to infer where the performance problem comes from.

This is not just a performance problem; it's also a reproducibility problem. With 4 separate clones, a commit can easily land in the repo in the meantime.

I think that Poetry needs to have a proper caching system, and:

  1. On 1st call, save the repo into .cache/pypoetry/some-reproducible-hash
  2. On further calls, if the cache exists, use that instead of cloning again.
  3. On install, just move the cache to the new location.
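A minimal sketch of the three steps above, under stated assumptions: the cache key is a SHA-256 of the URL and ref, and the actual cloning is delegated to a caller-supplied function. All names here (`cache_key`, `get_cached_clone`, `install_from_cache`) are hypothetical and none of this is Poetry's real API.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def cache_key(url: str, ref: str) -> str:
    """Reproducible hash: the same URL and ref always give the same key."""
    return hashlib.sha256(f"{url}#{ref}".encode("utf-8")).hexdigest()


def get_cached_clone(
    url: str, ref: str, cache_root: Path, clone: Callable[[str, str, Path], None]
) -> Path:
    """Steps 1 and 2: clone on the first call, reuse the cache afterwards."""
    target = cache_root / cache_key(url, ref)
    if not target.exists():
        cache_root.mkdir(parents=True, exist_ok=True)
        # Assumption: `clone` either succeeds or leaves no partial directory.
        clone(url, ref, target)
    return target


def install_from_cache(cached: Path, src_dir: Path) -> Path:
    """Step 3: move the cached clone to its final location."""
    src_dir.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(cached), str(src_dir))
    return src_dir


# Demonstration with a fake clone function, so no network access is needed.
clone_calls = []


def fake_clone(url: str, ref: str, target: Path) -> None:
    clone_calls.append((url, ref))
    target.mkdir()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "cache"
    for _ in range(4):
        repo = get_cached_clone("https://example.invalid/repo.git", "15.0", root, fake_clone)
    install_from_cache(repo, Path(tmp) / "venv" / "src" / "repo")
    print(len(clone_calls))  # prints 1: four requests, one clone
```

Passing the clone function in as a parameter keeps the caching logic independent of how the repository is actually fetched.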

Another option:

  1. On 1st call, save the repo into .cache/pypoetry/some-reproducible-hash
  2. On further calls, clone, but using git clone --reference .cache/pypoetry/some-reproducible-hash ...
  3. On the last call (for installing), use git clone --reference .cache/pypoetry/some-reproducible-hash --dissociate ...
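The second option could be sketched as follows. The helper `reference_clone_cmd` is hypothetical; it only assembles the `git clone --reference` and `--dissociate` options named above, and the cache path is the placeholder from the list, not a real location.

```python
from typing import List


def reference_clone_cmd(
    url: str, dest: str, reference: str, dissociate: bool = False
) -> List[str]:
    """Build a `git clone` command that borrows objects from a local cache.

    Hypothetical helper for illustration only.
    """
    # --reference reuses objects already present in the cached repository.
    cmd = ["git", "clone", "--reference", reference]
    if dissociate:
        # --dissociate copies the borrowed objects, so the final clone
        # no longer depends on the cache directory.
        cmd.append("--dissociate")
    cmd += ["--", url, dest]
    return cmd


cache = ".cache/pypoetry/some-reproducible-hash"  # placeholder path from the list above
url = "https://github.com/odoo/odoo.git"

# Intermediate calls: cheap clone that keeps borrowing from the cache.
print(" ".join(reference_clone_cmd(url, "tmp-clone", cache)))
# Final call (install): standalone clone, independent of the cache.
print(" ".join(reference_clone_cmd(url, "src/odoo", cache, dissociate=True)))
```

Only the final clone needs `--dissociate`, because it is the one that must survive if the cache is later removed.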

All of this in addition to merging python-poetry/poetry-core#290.

@moduon MT-83

yajo added the kind/bug (Something isn't working as expected) and status/triage (This issue needs to be triaged) labels Feb 11, 2022
abn (Member) commented Apr 8, 2022

#5428 should improve this situation a bit. @yajo I am not sure how to implement your poetry-core fix in dulwich yet. But even without it, it should be a fair bit faster now.

abn (Member) commented May 2, 2022

Resolved-by: #5428

abn closed this as completed May 2, 2022
mkniewallner removed the status/triage (This issue needs to be triaged) label Jun 11, 2022

github-actions bot commented Mar 1, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions bot locked as resolved and limited conversation to collaborators Mar 1, 2024