How to speed up checkout for big repos? #77

Closed
plokhotnyuk opened this issue Nov 26, 2019 · 11 comments

@plokhotnyuk

I would like to limit the number of cloned branches to a minimum. For example, cloning only the master branch takes ~7s at ~3 MiB/s:

$ time git clone --single-branch --branch master https://github.com/plokhotnyuk/jsoniter-scala.git
Cloning into 'jsoniter-scala'...
remote: Enumerating objects: 260, done.
remote: Counting objects: 100% (260/260), done.
remote: Compressing objects: 100% (221/221), done.
remote: Total 29660 (delta 39), reused 236 (delta 27), pack-reused 29400
Receiving objects: 100% (29660/29660), 13.22 MiB | 2.92 MiB/s, done.
Resolving deltas: 100% (10355/10355), done.
 
real	0m6,484s
user	0m2,249s
sys	0m0,368s

Meanwhile, the checkout action with the following configuration takes more than 1.5 minutes:

      - uses: actions/checkout@v1
        with:
          ref: ${{ github.ref }}
          fetch-depth: 100

Which options can be used to speed it up?

@eyal0

eyal0 commented Nov 30, 2019

Your first command downloads 13 MiB, while the second one downloaded 1.36 GiB, so it makes sense that it takes longer.

The checkout action doesn't have --single-branch so it can't compete.
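The size gap described above can be reproduced on a small scale. The following is a minimal sketch, not taken from the thread: it builds a throwaway local repository (the branch name `feature`, the commit messages, and the temporary paths are all made up for illustration) and compares how many commits a full clone and a `--single-branch` clone end up holding.

```shell
#!/bin/sh
set -eu

# Throwaway "remote": one commit on master, two more on a side branch.
# All names here are hypothetical and exist only for this demo.
work=$(mktemp -d)
git init -q "$work/remote"
cd "$work/remote"
git symbolic-ref HEAD refs/heads/master
commit() {
  git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "$1"
}
commit "master 1"
git checkout -q -b feature
commit "feature 1"
commit "feature 2"
git checkout -q master

# Clone everything, then clone only master, and count the commits each holds.
git clone -q "file://$work/remote" "$work/full"
git clone -q --single-branch --branch master "file://$work/remote" "$work/single"
full_count=$(git -C "$work/full" rev-list --all --count)
single_count=$(git -C "$work/single" rev-list --all --count)
echo "full clone: $full_count commits, single-branch clone: $single_count commits"
```

On a real repository with many long-lived branches, the same difference shows up as extra download volume rather than just a commit count.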

@ericsciple
Contributor

@plokhotnyuk i'm making perf improvements with #70

i'll merge it into master tomorrow and push a tag v2-beta

@ericsciple ericsciple self-assigned this Dec 2, 2019
@ericsciple ericsciple added this to the v2 milestone Dec 2, 2019
@ericsciple
Contributor

it should be faster even than single-branch. it will just fetch the single commit

@plokhotnyuk
Author

I can fetch a single commit by setting the depth to 1...
Please ensure that it will still work for greater values and will fetch tags too.

@ericsciple
Contributor

@plokhotnyuk you can now try out actions/checkout@v2-beta. let me know

@plokhotnyuk
Author

plokhotnyuk commented Dec 3, 2019

@ericsciple It doesn't work for Windows atm:


Run actions/checkout@v2-beta
Added matchers: 'checkout-git'. Problem matchers scan action output for known warning or error strings and report these inline.
Syncing repository: plokhotnyuk/jsoniter-scala
Working directory is 'd:\a\jsoniter-scala\jsoniter-scala'
"C:\Program Files\Git\bin\git.exe" version
git version 2.23.0.windows.1
Removed matchers: 'checkout-git'
##[error]Command failed: rd /s /q "d:\a\jsoniter-scala\jsoniter-scala"
The process cannot access the file because it is being used by another process.

##[error]Node run failed with exit code 1

On other environments the builds also failed, just after updating dependencies, although the checkout itself completed successfully in 4 seconds: https://github.com/plokhotnyuk/jsoniter-scala/pull/434/checks?check_run_id=331500758

@ericsciple
Contributor

@plokhotnyuk sorry the windows issue is fixed now - unfortunately when i originally tested on windows, i had overridden the path input :(

@ericsciple
Contributor

published v2 tag

@plokhotnyuk
Author

plokhotnyuk commented Dec 13, 2019

@ericsciple Thanks, but it doesn't work for me:
https://github.com/plokhotnyuk/jsoniter-scala/runs/346900860

My build requires the last release tag to be available for checking of binary compatibility:
https://github.com/plokhotnyuk/jsoniter-scala/blob/master/build.sbt#L5

@ericsciple
Contributor

@plokhotnyuk tracked in this issue #100

Basically perf was optimized for the mainline scenario (download single commit only).

Currently I'm planning to update the README with scenarios, e.g. fetch all tags. I'm working on those docs today. Checkout v2 leaves the auth token in the git config (removed post-job). So for the tags scenario, the guidance will be something like: add a step with run: git fetch --depth=1 origin +refs/tags/*:refs/tags/*

We may end up adding an input like tags: true to fetch all tags. Not sure yet, gathering feedback based on different scenarios and trying to distill down to one input hopefully. Trying to avoid a bunch of random inputs. For example, see discussion on issue #93 regarding a more generic input like refspec: tags

Let me know what you think.
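To illustrate what the suggested extra fetch step does, here is a minimal local sketch. It assumes the checkout behaves roughly like a shallow, tag-less fetch of one branch tip; that is a simplification for the demo, not a description of the action's internals. The tag name `v1.0.0`, the commit messages, and the temporary paths are made up.

```shell
#!/bin/sh
set -eu

# Throwaway "remote": one tagged commit followed by a newer commit.
# All names here are hypothetical and exist only for this demo.
work=$(mktemp -d)
git init -q "$work/remote"
cd "$work/remote"
git symbolic-ref HEAD refs/heads/master
commit() {
  git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "$1"
}
commit "release"
git tag v1.0.0
commit "next"

# Assumed stand-in for a minimal checkout: only the tip of master, no tags.
git init -q "$work/checkout"
cd "$work/checkout"
git remote add origin "file://$work/remote"
git fetch -q --no-tags --depth=1 origin master
git checkout -q FETCH_HEAD
tags_before=$(git tag | wc -l | tr -d ' ')

# The extra step: an explicit refspec that copies every remote tag locally.
git fetch -q --depth=1 origin '+refs/tags/*:refs/tags/*'
tags_after=$(git tag | wc -l | tr -d ' ')
echo "tags before: $tags_before, after: $tags_after"
```

The explicit `+refs/tags/*:refs/tags/*` refspec is what brings the tags in; the `--depth` value only controls how much history comes along with each fetched ref.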

@plokhotnyuk
Author

@ericsciple Thank you a lot for your support!

The following config works fine and completes both steps in ~4 seconds:

      - uses: actions/checkout@v2
        with:
          fetch-depth: 100
      - name: Fetch tags
        run: git fetch --depth=100 origin +refs/tags/*:refs/tags/*

It didn't work with depth=1 for me.

Using tags to get the version of the latest release is quite handy, especially when multiple branches are maintained.
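One plausible reason a depth of 1 is not enough is sketched below. It assumes the build finds the last release tag by walking history back from HEAD, for example with `git describe`; the thread does not confirm how the build actually does this. In that case the tag is only found if the fetched history reaches from HEAD back to the tagged commit. The helper `nearest_tag`, the tag `v1.0.0`, and the repository layout are made up for the demo.

```shell
#!/bin/sh
set -eu

# Throwaway "remote": a tagged release followed by three newer commits.
# All names here are hypothetical and exist only for this demo.
remote=$(mktemp -d)
git init -q "$remote"
cd "$remote"
git symbolic-ref HEAD refs/heads/master
commit() {
  git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "$1"
}
commit "release"
git tag v1.0.0
commit "work 1"
commit "work 2"
commit "work 3"

# nearest_tag DEPTH: shallow-fetch master and the tags at DEPTH, then print
# the closest tag reachable from HEAD, or "none" if git describe finds nothing.
nearest_tag() {
  checkout_dir=$(mktemp -d)
  git init -q "$checkout_dir"
  (
    cd "$checkout_dir"
    git remote add origin "file://$remote"
    git fetch -q --no-tags --depth="$1" origin master
    git checkout -q FETCH_HEAD
    git fetch -q --depth="$1" origin '+refs/tags/*:refs/tags/*'
    git describe --tags --abbrev=0 2>/dev/null || echo none
  )
}

echo "depth 1:   $(nearest_tag 1)"
echo "depth 100: $(nearest_tag 100)"
```

With depth 1 the tag ref exists locally, but the commits between HEAD and the tagged commit are missing, so nothing connects them. A depth that covers the distance to the last release restores that path.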
