Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File selection for check-file-format.sh seems over-broad and potentially misconfigured #129

Closed
regularfry opened this issue Sep 13, 2023 · 4 comments · Fixed by #130
Closed

Comments

@regularfry
Copy link
Contributor

Here we run a diff to see what files we need to check against editorconfig to make sure everything's aligned.

However there are two problems. The first is that this is a script we run in a pre-commit hook, but it checks all files changed since the divergence with the origin/main branch. That means we run it repeatedly on any file we've ever changed on our current branch, not just the changes we're trying to commit. That the script calls docker run... once per file that's ever changed in a loop makes this slow for even fairly short histories, especially if the upstream has had merges from others since we branched.

The second problem is that it assumes origin/main points at something controlled. In my current checkout, origin points at a fork, and the nhs-england-tools/repository-template repo is called upstream. In my case, it should be comparing to upstream/main. This makes it worse because inevitably my fork is behind, so it adds more files to the changed list.

I think this should just be calling git diff --name-only without the branch comparison. If it's getting called in every git commit, there's no sense in re-checking all the files changed in the branch. We only need to check what's changed in this commit specifically.

regularfry added a commit that referenced this issue Sep 13, 2023
Prior to this patch, the `git diff` command used to identify which files to check for editorconfig compliance caught too many files, and did the expensive part of the check inside the per-file loop.

By lucky chance, the editorconfig-checker container image has a `git` binary installed, so we can move the entire loop inside a single container invocation.

Fixes #129.
@regularfry
Copy link
Contributor Author

On further reflection, I think there's another problem lurking here. We're using git diff to identify the changed files, but unless pre-commit is doing something funky that's not what we want. It ignores staged changes. What I think we actually need is git diff --cached.

@stefaniuk
Copy link
Contributor

Thanks @regularfry for reporting this issue.

With regard to comparing changes to the origin/main by default, this is necessary because it is also run as a GitHub Action where only branches can be compared. The check file format script has been implemented to be consistent in its behaviour both as a commit hook and when run as a GitHub action. If this behaviour causes an issue for a forked repository, a workaround could be to export BRANCH_NAME=upstream/main, commit the changes, then unset BRANCH_NAME.

The command

files=$( (git diff --diff-filter=ACMRT --name-only ${BRANCH_NAME:-origin/main}; git diff --name-only) | sort | uniq )

selects all the staged files and modified files in the working tree. The last part of it, git diff --name-only) | sort | uniq, indeed seems unnecessary. Let's remove it.

What causes the slowness of this script is that a docker container is spun up for each file that has to be checked. Running the git diff from within the container addresses this issue and speeds up the execution. Thanks for the PR.

@regularfry
Copy link
Contributor Author

Responded over here

@stefaniuk
Copy link
Contributor

I think, introducing the mode switch/parameter is a good suggestion.

Let's make it simple - I like your original suggestion of using git from within the container. We should do that. Implementing this approach simplifies the check execution since passing filenames from outside the container might not be as trivial as it seems, especially when considering challenges like handling and escaping spaces in file names.

Here are the modes I envision being implemented:

  • check=all - check all files
  • check=staged-changes - check only the files that are currently staged
  • check=working-tree-changes- check only the files that have been modified in the working tree
  • check=branch - check files changed compared to the canonical branch (origin/main)

Subsequently, the git precommit hook will execute check=staged-changes ./scripts/githooks/check-file-format.sh and the GitHub action will call check=branch ./scripts/githooks/check-file-format.sh

stefaniuk pushed a commit that referenced this issue Sep 23, 2023
Prior to this patch, the `git diff` command used to identify which files to check for editorconfig compliance caught too many files, and did the expensive part of the check inside the per-file loop.

By lucky chance, the editorconfig-checker container image has a `git` binary installed, so we can move the entire loop inside a single container invocation.

Fixes #129.
github-merge-queue bot pushed a commit that referenced this issue Sep 26, 2023
<!-- markdownlint-disable-next-line first-line-heading -->
## Description

Prior to this patch, the `git diff` command used to identify which files
to check for editorconfig compliance caught too many files, and did the
expensive part of the check inside the per-file loop.

By lucky chance, the editorconfig-checker container image has a `git`
binary installed, so we can move the entire loop inside a single
container invocation.

Fixes #129.

## Context

It makes the editorconfig check, which is called in a pre-commit hook,
fast.

## Type of changes

<!-- What types of changes does your code introduce? Put an `x` in all
the boxes that apply. -->

- [ ] Refactoring (non-breaking change)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would change existing
functionality)
- [x] Bug fix (non-breaking change which fixes an issue)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants