Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checking only links from modified files in a Pull Request? #17

Closed
polarathene opened this issue Feb 27, 2021 · 18 comments
Closed

Checking only links from modified files in a Pull Request? #17

polarathene opened this issue Feb 27, 2021 · 18 comments
Labels
enhancement New feature or request

Comments

@polarathene
Copy link

polarathene commented Feb 27, 2021

This is a feature that github-action-markdown-link-check offers.

Could this action support that as well? (or document advice on how to provide only relevant file paths to check)


Almost 18 months later, I revisited this request, comments of interest are:


Examples - Workflows that can be used currently

The first two support file paths containing spaces.

Without considering file paths containing spaces; if a mistake is made or a third-party contributor provides a file path containing a space, lychee will treat that path as separate files/inputs which would be invalid (or the wrong files to check).

Manually generating the file paths to check

I looked into this extensively, and I'm rather confident of this approach.

Click to view YAML
  • Remove or adjust the --diff-filter option as appropriate to your needs. View the git diff-tree docs linked for other options that may be of interest such as ignoring white-space only changes.
  • The sed command is specifically for lychee-action needs. It folds the multi-line output of git diff-tree (\n / line delimited list of file paths_) into a single line, separating paths with single quotes and later in the lychee-action.args input adding single quotes to the start/end, so that file paths can support spaces when provided as a single line to eval (which lychee-action uses internally). This handling is not required for lychee, which can instead replace sed with xargs -d '\n' lychee --.
jobs:
  build:
    runs-on: ubuntu-latest
    name: Test changed-files
    steps:
      # Compares the default generated "test merge commit" by Github (HEAD) to the base branch (HEAD^1)
      - uses: actions/checkout@v3
        with:
          fetch-depth: 2

      # Gets a list of files from the PR with a git status of Added (A) or Modified (M)
      - name: 'Example - Get a list of changed files to process'
        id: changed-files
        run: |
          CHANGED_FILES=$(git diff-tree --name-only --diff-filter 'AM' -r HEAD^1 HEAD | sed -z "s/\n$//;s/\n/' '/g")
          echo "::set-output name=all_changed_files::${CHANGED_FILES}"

      - name: 'Check Links'
        id: lychee
        uses: lycheeverse/[email protected]
        with:
          args: --verbose --no-progress -- '${{ steps.changed-files.outputs.all_changed_files }}'

Delegate to tj-actions/changed-files action to get the file paths

After this PR is resolved, the pre-processing step for lychee-action can likely be dropped and the output of tj-actions/changed-files provided to a new input for lychee-action that handles the delimited list of paths internally.

Click to view YAML
jobs:
  build:
    runs-on: ubuntu-latest
    name: Test changed-files
    steps:
      # Compares the default generated "test merge commit" by Github of the PR branch into the base branch
      - uses: actions/checkout@v3
        with:
          fetch-depth: 2

      # Gets a list of files from the PR with a git status of Added (A) or Modified (M) | (via outputs.all_changed_files)
      - name: 'Example - Get a list of changed files to process'
        id: get-changed-files
        uses: tj-actions/changed-files@v23
        with:
          separator: ','

      # lychee-action requires pre-processing for file paths with spaces (boundary quotes added in lychee-action input):
      - name: 'Compatibility with eval: Quote wrap paths to support spaces'
        id: changed-files
        run: |
          QUOTE_WRAPPED="$( sed "s/,/' '/g" <<< '${{ steps.get-changed-files.outputs.all_changed_files }}' )"
          echo "::set-output name=all_changed_files::${QUOTE_WRAPPED}"

      - name: 'Check Links'
        id: lychee
        uses: lycheeverse/[email protected]
        with:
          args: --verbose --no-progress -- '${{ steps.changed-files.outputs.all_changed_files }}'

Simpler version when file paths of inputs would never contain spaces

If you are confident that this won't be a concern to support, the extra pre-processing step can be avoided.

NOTE: If adapting to the "manual" example, you must still convert \n to space, unless lychee gets a --inputs-from option, then you can provide a file with git diff-tree output directly (or without writing an actual file).

Click to view YAML
jobs:
  build:
    runs-on: ubuntu-latest
    name: Test changed-files
    steps:
      # Compares the default generated "test merge commit" by Github of the PR branch into the base branch
      - uses: actions/checkout@v3
        with:
          fetch-depth: 2

      # Gets a list of files from the PR with a git status of Added (A) or Modified (M) | (via outputs.all_changed_files)
      - name: 'Example - Get a list of changed files to process'
        id: get-changed-files
        uses: tj-actions/changed-files@v23

      - name: 'Check Links'
        id: lychee
        uses: lycheeverse/[email protected]
        with:
          args: --verbose --no-progress -- ${{ steps.changed-files.outputs.all_changed_files }}
@mre
Copy link
Member

mre commented Feb 27, 2021

Hey that's a good feature to have. I checked the code (https://github.com/gaurav-nelson/github-action-markdown-link-check/blob/master/entrypoint.sh#L155) and it doesn't seem too hard to port.

@polarathene

This comment was marked as resolved.

@mre
Copy link
Member

mre commented Aug 23, 2021

Just to clarify, the diff checking would exactly work like in github-action-markdown-link-check because we'd use the same code. So if that works for you, we should be all set. 😄

@polarathene

This comment was marked as outdated.

@mre mre added the enhancement New feature or request label Feb 4, 2022
@tooomm
Copy link
Contributor

tooomm commented Jun 16, 2022

I like this idea a lot! Is this still something that you plan on adding?

@mre
Copy link
Member

mre commented Jun 21, 2022

Yes, I still plan to add that feature by porting over the code from github-action-markdown-link-check.

@luisorbaiceta
Copy link

This would be amazing! I was doing some testing to add the action to the fastify repo, but it reaches the api rate limit easily for github links

@tooomm
Copy link
Contributor

tooomm commented Jun 21, 2022

This issue and the markdown-link-check action mention "check modified files" in particular.
I was hoping to only "check modified lines/links" or "check changes", so even more narrow. The file basis doesn't help me actually as it's all in one file already. 🙈

Would that also be possible and do you consider this a useful addition?

@polarathene
Copy link
Author

@luisorbaiceta

but it reaches the api rate limit easily for github links

Internal links workaround via local webserver

If the links are internal, you could run a simple webserver (Caddy is quite easy). At least I assume Github has no restrictions for links that aren't external traffic? (I tried looking up docs on rate limits, but only found mention of these applying to APIs or to products/services Github hosts)

If your internal links to test aren't able to be replaced/mapped to localhost for that local webserver instance, then you should be able to add hostname into /etc/hosts and that should work well.


Minimizing requests

Prioritize reachable hosts, then individual links

For external links, I suppose the best options available to minimize requests would be caching the FQDN availability and only checking the first request to each unique FQDN, then afterwards you could check all other links are actually valid, not just the host for it is reachable.

Leverage the response cache

I still haven't got to try lychee out yet, but the devs did add response caching support a while back. If you're hitting the limit due to frequency of runs, leveraging that caching feature across runs should make a big difference :)


Workaround for rate-limit

For restricted network requests, I did see this issue, where they are apparently hitting limits because of some links to Github hosted services, which suggested excluding these with a lychee option. You could always have a scheduled workflow that checks your links daily (or hourly) with the response caching in place for working around that hourly limit (with the cache being a larger TTL like 24 hours).

Assuming responses that get Network error: Too Many Requests response can be tried again by this workflow next time. You could then have any 404's and the like reported via an action that raises an issue listing them. If you have multiple scheduled runs in a day, you could cache those 404 responses separately across runs for the final daily task to report.

You would probably want to have such workflows (or this action in general) be conditionally run only if querying the current available requests for the hour is still sufficient to support any other workflows you have that may need to perform Github API requests.

@polarathene
Copy link
Author

polarathene commented Jun 22, 2022

@tooomm

The file basis doesn't help me actually as it's all in one file already. 🙈
Would that also be possible and do you consider this a useful addition?

Do you really need filtering to individual lines for a single file? If you don't have any problems you're trying to avoid, you'll have better reliability with processing the entire file through lychee from what I found. If you want that functionality, you can implement it in your CI workflow to pass to lychee as shown below.


Here is an example that retrieves the diff from this commit (docs deployment from Github Actions). I haven't tried integrating with the lychee-action or PRs yet (where you'd do the same with the PR diff Github generates)

Line diff scope (individual lines per file):

Click to view details
# Get lychee:
$ docker run --rm -it rust bash
$ cargo install lychee

# Example repo with a branch of deployed docs, clone it to get a diff from desired commit
$ git clone https://github.com/docker-mailserver/docker-mailserver -b gh-pages
$ cd docker-mailserver

# Finally show a specific commit diff (default format is -p for patch), and only show diff for files with `.html` suffix,
# Then filter to addition/modification lines (green lines on github, starts with +),
# And feed these lines into lychee to parse for any links:
$ git show d4ba2b7f5027a2868e19c68fcba41b34b2bad1fd '*.html' | grep '^+' | lychee -
🔍 3 Total ✅ 3 OK 🚫 0 Errors

It successfully parsed the 3 links in files that were changed. lychee - will take links via STDIN, so we can read a commit diff, filter it and send it to lychee this way. I'm not familiar with how lychee is extracting links, I thought I'd have to extract out the href values as I didn't provide a proper HTML doc, but nope lychee is great! 😀

# For context, these are the 3 filtered lines of that commit:
$ git show d4ba2b7f5027a2868e19c68fcba41b34b2bad1fd '*.html' | grep '^+' | grep 'href'

+<p><a href="https://github.com/docker-mailserver/docker-mailserver/blob/master/target/postfix/main.cf">Our default Postfix configuration</a> can easily be extended to add parameters or modify existing ones by providing a <code>docker-data/dms/config/postfix-main.cf</code>. This file uses the same format as Postfix <code>main.cf</code> does (<a href="http://www.postfix.org/postconf.5.html">See official docs</a> for all parameters and syntax rules).</p>
+<p>One can easily increase the <a href="http://www.postfix.org/postconf.5.html#compatibility_level">backwards-compatibility level</a> and set new Postscreen options:</p>

It doesn't have to be an existing commit just a diff, or any other approach you have of collecting only the changed lines and feeding those into lychee.

The merged commit for the PR associated to that docs deployment commit has different files/content but works just as well with the same technique, so it could be used on the source files (in my case PR content differs for external and internal links, with the latter being relative links that the build tool handles):

$ git show 0010786d181a3a6c70856d4c34e46b60ad015b7e '*.md' | grep '^+' | lychee -   
🔍 3 Total ✅ 3 OK 🚫 0 Errors

# Just to show the same links that are found:
$ git show 0010786d181a3a6c70856d4c34e46b60ad015b7e '*.md' | grep '^+' | grep 'http'

+[Our default Postfix configuration](https://github.com/docker-mailserver/docker-mailserver/blob/master/target/postfix/main.cf) can easily be extended to add parameters or modify existing ones by providing a `docker-data/dms/config/postfix-main.cf`. This file uses the same format as Postfix `main.cf` does ([See official docs](http://www.postfix.org/postconf.5.html) for all parameters and syntax rules).
+    One can easily increase the [backwards-compatibility level](http://www.postfix.org/postconf.5.html#compatibility_level) and set new Postscreen options:

I'm assuming a bit more work would be needed to verify link references are valid. The link referenced may not have changed, but there might be a typo in the reference name or the referenced link itself was accidentally not added by the contributor.

The approach above won't work, an example is from this markdown docs commit:

- SASLauthd with LDAP auth (please see the note [down below](#ldap-setup))

...

Add these additions to the `mailserver` service in your [`docker-compose.yml`][github-file-compose]:

The first references a heading (#ldap-setup), the 2nd references a link at the bottom of the document github-file-compose, but that information won't be available to lychee from the changes input. It doesn't appear that this is checked/supported yet, so matters less.

These kind of changes, would be local file issues, which could be handled via a separate pass (or alternatively preferring to check links against build output, not source files). Thus lychee could ignore that link reference syntax for diff-based checks.


Actually... perhaps lychee lacks context for STDIN approach. I didn't see any options to hint content (HTML, Markdown, etc), so while markdown file works with the equivalent content, providing it to lychee via STDIN flops:

echo -e '[test](./relative/path/to/file.md)\n\n[test-ref][ref-file]\n\n[ref-file]: ./relative/path/to/another/file.md' \
  | lychee --dump -

File-level diff scope (file paths of changed files)

Click to view details

If you don't need granular line changes, you can detect which files were changed instead and provide just these files to lychee to look through.

This was a bit confusing for me to figure out how to get the list of file paths as args to lychee, especially if for some reason a space existed in one of the file paths:

# --no-commit-id --name-only will list only file paths involved in the commit.
# --diff-filter 'AM' is filtering the files in the diff to only be ones added (A) or modified (M).
# -r (recursive) is to include files in sub-directories, so that all files belonging to the commit are listed.
# HEAD references the latest commit, but could be a commit hash like previous examples.
# '*.md' like above is to filter files to only those with '.md' extension/suffix, uses gitignore syntax,
# ':!README.md' shows how to add more filter patterns, this one excludes with ':!' the top-level 'README.md' file.
#
# xargs splits the multi-line output by '\n' (defaults don't seem to work well with this git diff-tree output), and
# provides each line to lychee as a separate arg, spaces in filepath will be fine, as if quote wrapped.

git diff-tree --no-commit-id --name-only --diff-filter 'AM' -r HEAD '*.md' | xargs -d '\n' lychee --dump --

This should be more reliable, for other types of link support since actual files will be parsed. Leverage the response cache feature and file-level scope for changes should work very close to changed lines scope. At least assuming your lychee-action workflow is triggered multiple times for PR iterations.

File or line scope for PR branch diff

Click to view details

For PRs, if you're wanting to compare changes from the PR branch to a target branch (eg: master / main), then you'll want to use --merge-base with git diff-tree command I think?

# Example with --merge-base, can reference by commit hashes, branch names, or any "tree-ish" value:
# https://stackoverflow.com/questions/4044368/what-does-tree-ish-mean-in-git/18605496#18605496
# The PR would be the 2nd value, to get it's additions/modifications when added to the 1st (eg: master branch).
# Make sure your checked out branch is the 2nd one being compared, so that the correct content is parsed.
git diff-tree --no-commit-id --name-only --diff-filter 'AM' -r --merge-base master remote/branch-name | xargs -d '\n' lychee --dump --

# Same syntax, but swaps '--name-only' for '-p' and adds a filter with grep to output changed lines:
git diff-tree --no-commit-id -p --diff-filter 'AM' -r --merge-base master some-local-branch | grep '^+ ' | lychee --dump -

Note that if you have lots of filepaths, you may exceed the character limit for args I think. lychee doesn't appear to have support for taking in a list of paths (from STDIN or file) which would probably be a safer approach.


I checked the code (https://github.com/gaurav-nelson/github-action-markdown-link-check/blob/master/entrypoint.sh#L155) and it doesn't seem too hard to port.

I probably should have read through the earlier comments as I see I ended up sharing pretty much the same thing that was linked early on 😅

I wanted to share easier commands, so avoided using a shell script with a loop and array (which would have been easier for me, than trying to resolve the issues I encountered with xargs 😬 ). Hopefully it's still of some use though 👍

@polarathene

This comment was marked as outdated.

@mre
Copy link
Member

mre commented Jun 23, 2022

Thanks for the heads-up @polarathene.
I wonder if anyone tried do something like this:

jobs:
  build:
    runs-on: ubuntu-latest
    name: Test changed-files
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0  # OR "2" -> To retrieve the preceding commit.

      - name: Get changed files
        id: changed-files
        uses: tj-actions/changed-files@v23

      - name: Link Checker
        id: lychee
        uses: lycheeverse/[email protected]
        with:
                args: --verbose --no-progress ${{ steps.changed-files.outputs.all_changed_files }}
        env:
          GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}

⚠️ I did not test this, so there are most likely syntax errors in the above code.

@luisorbaiceta
Copy link

luisorbaiceta commented Jun 24, 2022

fetch-depth: 0 # OR "2" -> To retrieve the preceding commit.

I guess the problem with this in PRs would be to set the fetch depth dynamically to ensure all files modified in the PR are checked

@polarathene

This comment was marked as outdated.

@polarathene
Copy link
Author

polarathene commented Jul 1, 2022

I got it working, but as suspected if the file path has a space, some extra work is needed to be compatible with lychee input args, especially for lychee-action which adds more restrictions to what is valid via the args input (and it's usage of eval under the hood).


Failed attempt with xargs
jobs:
  build:
    runs-on: ubuntu-latest
    name: Test changed-files
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: 'A - Get a list of changed files to process'
        id: changed-files-a
        run: |
          # Gets a diff of files from git with Added (A) or Modified (M) status,
          # Compares the default generated "test merge commit" by Github (HEAD) to the base branch (base.sha == HEAD^1),
          # `sed` then converts from multi-line (`\n`) to single-line delimited by `,` to support file paths containing spaces:
          CHANGED_FILES=$(git diff-tree --name-only --diff-filter 'AM' -r HEAD^1 HEAD | sed -z 's/\n/,/g;s/,$//')
          echo "::set-output name=all_changed_files::${CHANGED_FILES}"

      # Same output as prior step:
      - name: 'B - Get a list of changed files to process'
        id: changed-files-b
        uses: tj-actions/changed-files@v23
        with:
          separator: ','

      - name: Link Checker
        id: lychee
        uses: lycheeverse/[email protected]
        with:
          args: --verbose --no-progress $( xargs -d ',' <<< "${{ steps.changed-files-a.outputs.changed_files }}" )

The usage of xargs is required due to lychee not supporting inputs sourced from a file, the inputs need to be provided with white-space in mind for paths, so , was used as a delimiter to split on (when not using lychee-action, I could split the git diff-tree output on \n instead).

I thought this would work due to eval being used (which AFAIK is advised against):

eval lychee ${FORMAT} --output ${LYCHEE_TMP} ${ARGS}

But... as it turns out this did not work. I realized to make it eval compatible as well, I'd need to quote wrap paths... This was problematic as the inputs are being escaped (possibly to prevent security vulnerabilities described here, but perhaps some malicious file paths with the eval approach could still risk damage?).

I had trouble trying to make that approach work. I gave up and added another intermediate step which works:

jobs:
  build:
    runs-on: ubuntu-latest
    name: Test changed-files
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: 'A - Get a list of changed files to process'
        id: changed-files-a
        run: |
          # Gets a diff of files from git with Added (A) or Modified (M) status,
          # Compares the default generated "test merge commit" by Github (HEAD) to the base branch (base.sha == HEAD^1),
          # `sed` then converts from multi-line (`\n`) to single-line delimited by `,` to support file paths containing spaces:
          CHANGED_FILES=$(git diff-tree --name-only --diff-filter 'AM' -r HEAD^1 HEAD | sed -z 's/\n/,/g;s/,$//')
          echo "::set-output name=all_changed_files::${CHANGED_FILES}"

      # Same output as prior step:
      - name: 'B - Get a list of changed files to process'
        id: changed-files-b
        uses: tj-actions/changed-files@v23
        with:
          separator: ','

      # lychee-action requires pre-processing for file paths with spaces
      # NOTE: Github outputs and ENV do not support `\n`, so rely on single-line string or read/write a file
      - name: 'Compatibility with eval: Quote wrap paths to support spaces'
        id: changed-files
        env:
          # Change between a or b step above:
          changed_files: ${{ steps.changed-files-a.outputs.all_changed_files }}
        run: |
          QUOTE_WRAPPED="'$( sed "s/,/' '/g" <<< '${{ env.changed_files }}' )'"
          echo "::set-output name=all_changed_files::${QUOTE_WRAPPED}"

      - name: Link Checker
        id: lychee
        uses: lycheeverse/[email protected]
        with:
          args: --verbose --no-progress ${{ steps.changed-files.outputs.all_changed_files }}

So not being able to use xargs option rules out the alternative I had in mind with using a file to avoid all the extra transforms to make the paths compatible args (I've sunk a fair amount of time messing with shell incantations to try figure this out, with plenty of failures that other users should not have to go through as well I hope).

I did try messing around with printf '%s' variants with xargs and readarray, but when using xargs -d I would end up with extra \n at the end that I couldn't strip off, and would end up as part of the last item path (if quote wrapping), or it's own item $'\n' with printf '%q', with readarray involved as well I avoided that with printf '%s\n', but it was getting long/ugly and with eval involved, seemed easier to just quote wrap with sed.


It'd probably be better for lychee-action to support a new file-list input that can take a , delimited string as a list of inputs? (simpler to support tj-actions/changed-files or similar, than having lychee support inputs from a file)

I did not have much luck looking up GH actions limits for input lengths which might be a problem on larger projects? There is this limit for job outputs stating 1MB per output. If file paths use 200 characters each, that's still 5k paths, or likewise 50k for path lengths of 20 characters each, maybe not a concern then.

Alternatively not supporting file paths with spaces for lychee-action would avoid the hassle to support above.

@mre
Copy link
Member

mre commented Jul 1, 2022

Thanks for the thorough writeup. To avoid issues with spaces, have you tried terminating the inputs with a NUL terminator?
BurntSushi/ripgrep#273 (comment)

@polarathene
Copy link
Author

polarathene commented Jul 1, 2022

To avoid issues with spaces, have you tried terminating the inputs with a NUL terminator?

Yes, but I don't see how that would help given the input constraints? (not just with spaces, but passing data to an input through from a prior step output/env, and the usage of eval by lychee-action internally)

Like with ripgrep, this isn't as big of an issue with lychee executable directly, it's the other layers of indirection involved. I'd like to also draw your attention to this comment in that referenced issue, regarding passing too many long file paths and/or xargs availability/awareness. The maintainer also changed their opinion after that comment.

Perhaps I'm doing it wrong, but here's what I tried. I also think it highlights why this is frustrating to try workaround/resolve? (I know I've spent far more time looking into this than desirable, hoping that others would not have to go through these hurdles to accomplish the same goal)

Attempts with xargs that failed to support spaces in paths for lychee-action

First example, swapping \n for \0 (NUL terminator?):

$ xargs -t -0 printf "'%s' " <<< $(git diff-tree --name-only --diff-filter 'AM' -r --merge-base dms/master docs/ci-build-catch-failures | tr '\n' '\0')

bash: warning: command substitution: ignored null byte in input
printf "'%s' " '.github/workflows/scripts/docs/build-docs.shdocs/content/config/environment.mddocs/content/config/user-management/accounts.mddocs/content/examples/tutorials/basic-installation.mddocs/content/examples/uses-cases/forward-only-mailserver-with-ldap-authentication.md'$'\n'

# Note `$'\n'` at end results in newline for the last quote, and null (\0) ignored, no separation of lines:
'.github/workflows/scripts/docs/build-docs.shdocs/content/config/environment.mddocs/content/config/user-management/accounts.mddocs/content/examples/tutorials/basic-installation.mddocs/content/examples/uses-cases/forward-only-mailserver-with-ldap-authentication.md
'

xargs without -0 option:

$ xargs -t printf "'%s' " <<< $(git diff-tree --name-only --diff-filter 'AM' -r --merge-base dms/master docs/ci-build-catch-failures | tr '\n' '\0')')

bash: warning: command substitution: ignored null byte in input
printf "'%s' " .github/workflows/scripts/docs/build-docs.shdocs/content/config/environment.mddocs/content/config/user-management/accounts.mddocs/content/examples/tutorials/basic-installation.mddocs/content/examples/uses-cases/forward-only-mailserver-with-ldap-authentication.md

# No `$'\n'` at end to printf now, but null bytes as delimiter still ignored resulting in one big string: 
'.github/workflows/scripts/docs/build-docs.shdocs/content/config/environment.mddocs/content/config/user-management/accounts.mddocs/content/examples/tutorials/basic-installation.mddocs/content/examples/uses-cases/forward-only-mailserver-with-ldap-authentication.md'

Original diff output with \n seems to work (no tr used):

$ xargs -t printf "'%s' " <<< $(git diff-tree --name-only --diff-filter 'AM' -r --merge-base dms/master docs/ci-build-catch-failures)

printf "'%s' " .github/workflows/scripts/docs/build-docs.sh docs/content/config/environment.md docs/content/config/user-management/accounts.md docs/content/examples/tutorials/basic-installation.md docs/content/examples/uses-cases/forward-only-mailserver-with-ldap-authentication.md

# Each filepath is properly quoted now (treated as separate args):
'.github/workflows/scripts/docs/build-docs.sh' 'docs/content/config/environment.md' 'docs/content/config/user-management/accounts.md' 'docs/content/examples/tutorials/basic-installation.md' 'docs/content/examples/uses-cases/forward-only-mailserver-with-ldap-authentication.md'

But actually fails for paths with spaces, as white-space is split on too:

$ xargs -t printf "'%s' " <<< $(echo -e 'file.txt\nanother/file.txt\npath/that has/spaces in it.txt\npath/to/another-file.txt') 

printf "'%s' " file.txt another/file.txt path/that has/spaces in it.txt path/to/another-file.txt

'file.txt' 'another/file.txt' 'path/that' 'has/spaces' 'in' 'it.txt' 'path/to/another-file.txt'

Fixed with \n delimiter:

$ xargs -t -d '\n' printf "'%s' " <<< $(echo -e 'file.txt\nanother/file.txt\npath/that has/spaces in it.txt\npath/to/another-file.txt')

printf "'%s' " file.txt another/file.txt 'path/that has/spaces in it.txt' path/to/another-file.txt

'file.txt' 'another/file.txt' 'path/that has/spaces in it.txt' 'path/to/another-file.txt'

Now change to , delimiter to accommodate limitations of GH Action inputs & eval in lychee-action:

$ xargs -t -d ',' printf "'%s' " <<< $(echo -e 'file.txt,another/file.txt,path/that has/spaces in it.txt,path/to/another-file.txt')

# Notice the `$'\n'` added by xargs:
printf "'%s' " file.txt another/file.txt 'path/that has/spaces in it.txt' 'path/to/another-file.txt'$'\n'

# Broken with newline at last input again:
'file.txt' 'another/file.txt' 'path/that has/spaces in it.txt' 'path/to/another-file.txt
'

Try fix failure by making the trailing '\n' a null byte:

$ xargs -t -d ',' printf "'%s' " <<< $(echo -e 'file.txt,another/file.txt,path/that has/spaces in it.txt,path/to/another-file.txt' | tr '\n' '\0')

bash: warning: command substitution: ignored null byte in input
printf "'%s' " file.txt another/file.txt 'path/that has/spaces in it.txt' 'path/to/another-file.txt'$'\n'

'file.txt' 'another/file.txt' 'path/that has/spaces in it.txt' 'path/to/another-file.txt
'


# With xargs -0:
$ xargs -t -d ',' -0 printf "'%s' " <<< $(echo -e 'file.txt,another/file.txt,path/that has/spaces in it.txt,path/to/another-file.txt' | tr '\n' '\0')

bash: warning: command substitution: ignored null byte in input
printf "'%s' " 'file.txt,another/file.txt,path/that has/spaces in it.txt,path/to/another-file.txt'$'\n'

'file.txt,another/file.txt,path/that has/spaces in it.txt,path/to/another-file.txt
'

Additional reference, this command with -0 added for xargs, the \n remains for each path, at it is treated as a single value/input (keeping the \n delimiter):

$ xargs -t -0 printf "'%s' " <<< $(git diff-tree --name-only --diff-filter 'AM' -r --merge-base dms/master docs/ci-build-catch-failures)

printf "'%s' " '.github/workflows/scripts/docs/build-docs.sh'$'\n''docs/content/config/environment.md'$'\n''docs/content/config/user-management/accounts.md'$'\n''docs/content/examples/tutorials/basic-installation.md'$'\n''docs/content/examples/uses-cases/forward-only-mailserver-with-ldap-authentication.md'$'\n'

'.github/workflows/scripts/docs/build-docs.sh
docs/content/config/environment.md
docs/content/config/user-management/accounts.md
docs/content/examples/tutorials/basic-installation.md
docs/content/examples/uses-cases/forward-only-mailserver-with-ldap-authentication.md
'

To be clear... xargs works great with -d '\n' with the non-action lychee command. It's only with lychee-action that it becomes invalid due to eval and GH actions. For the latter it's easier to use a different delimiter and quote wrap like shown in previous comment.

  • lychee-action could just as easily support another arg for the inputs list with a delimiter, and use xargs -d internally (without eval). - Compatible with tj-actions/changed-files outputs.
  • lychee could take a file with a list of line delimited inputs, and lychee action could likewise support that? - Incompatible with tj-actions/changed-files outputs.

@mre
Copy link
Member

mre commented Sep 5, 2024

Closing this in favor of #238, which is a clever and superior solution in my opinion. It uses lychee's --dump feature + .lycheeignore to ignore the links in the main branch and only checks the new links from a pull request. More info in the linked issue. Thanks for laying the groundwork @polarathene.

@mre mre closed this as completed Sep 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants