feat: Add parallelism to major chunk of hooks. Check `Parallelism` section in README #620

MaxymVlasov · 2024-02-09T21:12:50Z

Put an x into the box if that apply:

This PR introduces breaking change.
This PR fixes a bug.
This PR adds new functionality.
This PR enhances existing functionality.

Reasoning

We have a GH workflow that runs lockflies updates every week (implementation and reasoning here).
It usually takes from 2h 30min to 3h 15min. That was fine for us, till we found that our GH runners, based on AWS EC2s, started silently failing after 30min "without recent logs", and that was fixed by crutch which sends a dummy log every 10min.

However, during the debugging, I spent some time describing why hooks were not utilizing all the provided resources.
And that means a waste of time and money, not only for that corner case but for every huge commit, which can cause opting out by git commit -n of using hooks locally for changes that affect many directories.

Description of your changes

Add per-hook --parallelism-limit setting to --hook-config. Defaults to number of logical CPUs - 1
~~Main implementation uses flock, which not pre-installed in macOS and some ancient BSD/Linux systems.~~
- ~~To maintain backward compatibility, for cases where no flock - fallback to "bash-native locking mechanism" was added.~~
As quick tests show, ~5% of stacks face race condition problem, no matter if any locking mechanism exists or dirs try to init in parallel. I suppose the lock failed as it uses disk when hooks run in memory, so the creation of the lock can take some time as there bunch of caches between Mem and Disk. These milliseconds are enough to allow running a few t init in parallel.
Final implementation uses a retry mechanism for cases when race condition failed to t init directory.

In quick tests, I can say that on big changes:

~~With flock, there 300-500% speed increase for most of the hooks.~~
~~Without flock, there 250-450% speed increase for most of the hooks.~~
Up to 2000% speed increase for terraform_validate, and up to 500% - for other affected hooks.
When --parallelism-limit=1 I observed an insignificant increase in time (about 5-10%) compared to v1.86.0 which has no parallelism at all. This may be the cost of maintaining parallelism or the result of external factors since the tests were not conducted in a vacuum.

For small changes, improvements are less significant.

How can we test changes

There is no easy way to test it, but that could be used as a testing template

repos:
- repo: https://github.com/antonbabenko/pre-commit-terraform
  rev: f8d26f85b054e2842ef1b0c5bcef82d8c8cf1832
  hooks:
    # - id: infracost_breakdown
    #   args:
    #     # - --args=--config-file=test/infracost.yml
    #     - --args=--path=./environment/qa
    #     - --args=--terraform-plan-flags="-var-file=../../test.tfvars"t
    #     - --args=--terraform-plan-flags="-var-file=../../test2.tfvars"
    #     - --hook-config='.totalHourlyCost >= 0.1'
    #     - --hook-config=".totalHourlyCost|tonumber > 1"
    #     - -h ".projects[].diff.totalMonthlyCost|tonumber!=10000"
    #     - -h [.projects[].diff.totalMonthlyCost | select (.!=null) | tonumber] | add > 1000
    #   verbose: true # Always show costs

    # - id: terraform_checkov
    #   args:
    #   #   - --args=--quiet
    #     - --args=--skip-check CKV_SECRET_14
    #     # - --env-vars=ANSI_COLORS_DISABLED=true
    #     # - --hook-config=--parallelism-limit=1


    # - id: terraform_docs
    #   args:
    #   # - --hook-config=--path-to-file=README.md        # Valid UNIX path. I.e. ../TFDOC.md or docs/README.md etc.
    #   # - --hook-config=--add-to-existing-file=true     # Boolean. true or false
    #   - --hook-config=--create-file-if-not-exist=true # Boolean. true or false
    #   # - --args=--config=__GIT_WORKING_DIR__/.terraform-docs.yml
    #   - --args=--config=.terraform-docs.yml
    #   # - --args=--lockfile=false
    #   # - --args=--max-depth=2


    # - id: terraform_fmt
    #   exclude: environment\/.*$


    # - id: terraform_providers_lock
    #   args:
    #   # - --args=-platform=windows_amd64
    #   # - --args=-platform=darwin_amd64
    #   - --args=-platform=linux_amd64

    #   - --hook-config=--mode=always-regenerate-lockfile
    #   # - --hook-config=--parallelism-limit=CPU_NUM*2
    #   verbose: true

    # - id: terraform_tflint
    #   args:
    #     - --args=--config=__GIT_WORKING_DIR__/.config/tflint.hcl
    #     # # - '--args=--config=__GIT_WORKING_DIR__/${CONFIG_FILE}' # export CONFIG_FILE=.tflint.hcl
    #     # - --args=--config=__GIT_WORKING_DIR__/${CONFIG_NAME}.${CONFIG_EXT} # export CONFIG_NAME=.tflint; export CONFIG_EXT=hcl
    #     - --args=--module

    # - id: terraform_tfsec
    #   args:
    #       - --args=--config-file=.tfsec.json
    #       - --args=--force-all-dirs
    #       - --args=--exclude-downloaded-modules
    #       - --args=--concise-output
    #       - --args=--config-file=__GIT_WORKING_DIR__/.tfsec.json

    # - id: terraform_trivy
    #   args:
    #     - --hook-config=--parallelism-limit=CPU_NUM*4


    # - id: terraform_validate
    #   args:
    #     - --tf-init-args=-upgrade
    #     - --tf-init-args=-get=true
    #     - --env-vars=AWS_DEFAULT_REGION="us-west-2"
    #     - --env-vars=AWS_ACCESS_KEY_ID="anaccesskey"
    #     - --env-vars=AWS_SECRET_ACCESS_KEY="asecretkey"
    #     - --hook-config=--retry-once-with-cleanup=true     # Boolean. true or false
    # #    - --hook-config=--parallelism-limit=CPU_NUM*4
    #   verbose: true



    # - id: terraform_validate
    #   args:
    #     - --hook-config=--retry-once-with-cleanup=true


    # - id: terragrunt_fmt


    # - id: terragrunt_validate


    # - id: terrascan
    #   args:
    #     - --args=--non-recursive # avoids scan errors on subdirectories without Terraform config files
    #     - --args=--policy-type aws
    #     # - --hook-config=--parallelism-limit=CPU_NUM*2


    - id: tfupdate
      name: Autoupdate AWS provider versions
      args:

        - --args=provider aws
        - --args=--version '~> 4.2.0'
        - --hook-config=--parallelism-limit=1


    # - id: tfupdate
    #   name: Autoupdate Helm provider versions
    #   args:
    #     - --args=provider helm
    #     - --args=-v 2.5.0
    #     - --args=--ignore-path sdfs/sfdfs

    # - id: tfupdate
    #   name: Update Google provider to specified version
    #   args:
    #     - --args=provider google
    #     - --args=--ignore-path sdfs/sfdfs
    #     - --args=--version ">= 4.18.0, < 5"

    # - id: tfupdate
    #   name: Update Terraform provider to specified version
    #   args:
    #     - --args=terraform
    #     - --args=--version "> 1.1.8"
    #     - --env-vars=AWS_DEFAULT_REGION="us-west-2"
    #     - --env-vars=AWS_ACCESS_KEY_ID="anaccesskey"
    #     - --env-vars=AWS_SECRET_ACCESS_KEY="asecretkey"

As non-zero codes now can popup only in subshells, there is no more reason to keep that option in main shell

This reverts commit b6017f1.

Co-authored-by: George L. Yermulnik <[email protected]>

README.md

hooks/_common.sh

…623) So, I found that `nproc` always shows how many CPUs available is. K8s "limits" and docker `--cpus` are throttling mechanisms, which do not hide the visibility of all cores. There are a few workarounds, but IMO, it is better to implement checks for that than do them >Workaround for docker - set `--cpuset-cpus` >Workaraund for K8s - somehow deal with [kubelet static CPU management policy](https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#cpu-management-policies), as [recommend in Reddit](https://news.ycombinator.com/item?id=25224714) * Send all "colorify" logs through stderr, as make able to add user-facing-logs in functions that also need to return same value to the function-caller. Needed for `common::get_cpu_num` err_msg show up --------- Co-authored-by: George L. Yermulnik <[email protected]>

hooks/terraform_validate.sh

hooks/_common.sh

Details: #620 (review)

README.md

hooks/_common.sh

Co-authored-by: George L. Yermulnik <[email protected]>

antonbabenko

OMG! It seems to work as expected ‼️

yermulnik · 2024-02-17T19:57:35Z

OMG! It seems to work as expected ‼️

Yeah, that's a bit of magic by Max 😄

# [1.87.0](v1.86.1...v1.87.0) (2024-02-17) ### Features * Add parallelism to major chunk of hooks. Check `Parallelism` section in README ([#620](#620)) ([6c6eca4](6c6eca4))

antonbabenko · 2024-02-17T20:42:16Z

This PR is included in version 1.87.0 🎉

MaxymVlasov added 17 commits February 8, 2024 20:59

Init paralelizm

6f44b5d

Remove not more needed.

4011cfc

As non-zero codes now can popup only in subshells, there is no more reason to keep that option in main shell

[WIP] Add debug logs for testing

007b752

If plugin_cache_dir specified - lock it to avoid bug

819d913

fixup flock

30b3069

Test flock on directory rather than stdin

b6017f1

Revert "Test flock on directory rather than stdin"

357b72e

This reverts commit b6017f1.

Try --no-fork option and locking plugin_cache_dir, which more logical

2b04472

Try without --no-fork

92cc899

Add fallback for macOS&Co w/o flock and hook-config=--parallelism_limit

2c52da1

TEST that tha non-flock works

d1a3a70

Add lockdir cleanup after unexcpected end

fd8becf

return flock, testing parallelism

e430641

dbg

467959c

Fix issue when all hook-config ignored except first

91ffb99

Cleanup debug log

6051543

Add docs and rename CPU_NUM to CPU

f8d26f8

MaxymVlasov and others added 2 commits February 13, 2024 03:05

Update test results to new implementation

a427a7c

Apply suggestions from code review

ac4d89f

Co-authored-by: George L. Yermulnik <[email protected]>

MaxymVlasov commented Feb 13, 2024

View reviewed changes

README.md Outdated Show resolved Hide resolved

MaxymVlasov added 2 commits February 13, 2024 03:13

Apply suggestions from code review

7022a29

f

598d16b

MaxymVlasov requested a review from yermulnik February 13, 2024 01:16

yermulnik approved these changes Feb 13, 2024

View reviewed changes

MaxymVlasov requested review from antonbabenko and robinbowes and removed request for robinbowes February 13, 2024 13:33

MaxymVlasov commented Feb 14, 2024

View reviewed changes

README.md Show resolved Hide resolved

MaxymVlasov commented Feb 14, 2024

View reviewed changes

hooks/_common.sh Outdated Show resolved Hide resolved

MaxymVlasov commented Feb 15, 2024

View reviewed changes

hooks/terraform_validate.sh Show resolved Hide resolved

Fix issue with "no logs shown" when something goes wrong (#624)

01d3236

MaxymVlasov commented Feb 15, 2024

View reviewed changes

hooks/_common.sh Outdated Show resolved Hide resolved

Count --parallelism-ci-cpu-cores only in edge-cases

2bf56bc

Details: #620 (review)

yermulnik reviewed Feb 15, 2024

View reviewed changes

README.md Show resolved Hide resolved

README.md Outdated Show resolved Hide resolved

hooks/_common.sh Outdated Show resolved Hide resolved

hooks/_common.sh Show resolved Hide resolved

MaxymVlasov and others added 4 commits February 15, 2024 21:05

Update README.md

60a106d

Co-authored-by: George L. Yermulnik <[email protected]>

Update hooks/_common.sh

89ccdcc

Add more warnings and deal with possible bugs

4cf480f

Improve comment

60bfeea

antonbabenko approved these changes Feb 17, 2024

View reviewed changes

Merge branch 'master' into feat/parallelizm

d84ce5d

MaxymVlasov merged commit 6c6eca4 into master Feb 17, 2024
5 checks passed

MaxymVlasov deleted the feat/parallelizm branch February 17, 2024 20:41

maxbrunet mentioned this pull request Mar 6, 2024

terraform validate: Force rerun t init when plugin cache is enabled and parrallelism conflit happens #640

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add parallelism to major chunk of hooks. Check `Parallelism` section in README #620

feat: Add parallelism to major chunk of hooks. Check `Parallelism` section in README #620

MaxymVlasov commented Feb 9, 2024 •

edited

Loading

antonbabenko left a comment

yermulnik commented Feb 17, 2024

antonbabenko commented Feb 17, 2024

feat: Add parallelism to major chunk of hooks. Check Parallelism section in README #620

feat: Add parallelism to major chunk of hooks. Check Parallelism section in README #620

Conversation

MaxymVlasov commented Feb 9, 2024 • edited Loading

Reasoning

Description of your changes

How can we test changes

antonbabenko left a comment

Choose a reason for hiding this comment

yermulnik commented Feb 17, 2024

antonbabenko commented Feb 17, 2024

feat: Add parallelism to major chunk of hooks. Check `Parallelism` section in README #620

feat: Add parallelism to major chunk of hooks. Check `Parallelism` section in README #620

MaxymVlasov commented Feb 9, 2024 •

edited

Loading