Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change xxd dependency from hard to soft #181

Merged
merged 3 commits into from
Oct 8, 2024
Merged

Conversation

andreineculau
Copy link
Collaborator

@andreineculau andreineculau commented Aug 2, 2024

xxd is a dependency for openssl 3+ environments,
but xxd is packaged with vim,
which basically translates to
"transcrypt has a dependency on vim" πŸ™„πŸ€¦β€β™‚οΈ

In order to keep the dependencies slim, and vim is not slim,
here's an attempt to replace xxd with sed and xargs.

Alternatively, perl also seems to work perl -pe "s/([0-9A-Fa-f]{2})/chr(hex(\$1))/eg" instead of xxd -r -p. Maybe there are other/better ways.

sed+xargs, and even perl, are much more easily available, if not readily available, than xxd (vim). Think automation (CI/CD, docker) - you don't install a full-blown editor, but if you need to decrypt with transcrypt, currently you need to install one...

NOTE I am not πŸ’― sure this is safe and that there's no edge cases, but it looks alright e.g.

echo -n "deadbeaf00000bad" | xxd -r -p
echo -n "deadbeaf00000bad" | sed "s/../\\\\x&/g" | xargs -0 printf "%b"
echo -n "deadbeaf00000bad" | perl -pe "s/([0-9A-Fa-f]{2})/chr(hex(\$1))/eg"

@andreineculau andreineculau requested a review from jmurty August 2, 2024 22:32
@andreineculau andreineculau force-pushed the andreineculau-patch-1 branch from 4226ba8 to 054adb3 Compare August 23, 2024 22:05
@jmurty
Copy link
Collaborator

jmurty commented Sep 1, 2024

I would very much like to remove the dependency on xxd for the reasons you point out – it's a pain to install on most platforms because it's often only available with vim.

However xxd does work reliably across platforms, which I'm not confident the alternatives will. I'm not keen to switch to something else to then discover it breaks in different ways on different platforms, each of would need to be debugged and fixed.

For example, the sed + xargs alternative commands in this PR failed the bats tests when run on my Mac running 14.6.1 against OpenSSL 3.2.1. The perl alternative passes tests in this environment, but I'm not sure how consistent perl versions are across platforms, nor if perl is really a better dependency to take than xxd.

I'm interested in exploring alternatives and the sed option is attractive, provided it can be made to work reliably across platforms and versions with the right incantations.

To that end, can you describe in more detail what the sed & xargs commands are doing exactly? Then it would be a matter of finding out whether, and how, we can make them work as we need across all flavours of those utilities across platforms.

@andreineculau
Copy link
Collaborator Author

For example, the sed + xargs alternative commands in this PR failed the bats tests when run on my Mac running 14.6.1 against OpenSSL 3.2.1.

That's really good to know.

I now tested locally with 3.3.1 and got these failing:

  • contexts: encrypt a file in default and 'super-secret' contexts
  • contexts: confirm --list-contexts lists contexts with config status
  • contexts: confirm --list-contexts only lists contexts it should
  • contexts: transcrypt --uninstall leaves decrypted files and repo dirty for all contexts
  • contexts: any one of multiple contexts works in isolation
  • contexts: --upgrade retains all context configs
  • crypt: transcrypt --uninstall leaves decrypted file and repo dirty
  • crypt: transcrypt --upgrade applies new merge driver
  • init: creates .gitattributes

But since the tests are not properly sandboxed, I checked out master (at af3bb46 ) and ran the tests again, only to find the exact same tests failing...

Which tests are failing on your side?

I'm not keen to switch to something else to then discover it breaks in different ways on different platforms, each of would need to be debugged and fixed.

πŸ’― And how do you suggest we reach enough certainty then? Run more granular matrix tests with different arch/openssl/etc versions? Which would those be?

@andreineculau
Copy link
Collaborator Author

can you describe in more detail what the sed & xargs commands are doing exactly?

sed "s/../\\\\x&/g" takes pairs of 2 chars and puts \x in front (escaped hexadecimal byte values). So abcd becomes \xab\xcd.

I don't know if there's wider support for the longer but maybe more common usage with 2 capture groups: sed "s/\(.\)\(.\)/\\\\x\1\2/g", but the behaviour is/should be the same.

printf interprets the escape sequences into binary

@jmurty
Copy link
Collaborator

jmurty commented Sep 9, 2024

The problem on MacOS seems to be that printf doesn't actually print the hex-encoded string as binary, it just strips the backslashes.

$ echo -n "deadbeaf00000bad" | sed "s/../\\\\x&/g"
\xde\xad\xbe\xaf\x00\x00\x0b\xad%

$ echo -n "deadbeaf00000bad" | sed "s/../\\\\x&/g" | xargs -0 printf '%b'
xdexadxbexafx00x00x0bxad%                      

This happens on MacOS for shell versions:

  • zsh 5.9 (arm-apple-darwin21.3.0)
  • GNU bash, version 5.2.21(1)-release (aarch64-apple-darwin23.0.0)

Other times I have looked into an alternative to xxd to convert a hex string to binary bytes I've ended up down a nightmare rabbit hole of hacky bash scripts. But a slow and hacky pure bash script might be the only way to do it if we can't rely on other commonly-available lightweight tools like printf to work consistently across platforms.

@andreineculau
Copy link
Collaborator Author

andreineculau commented Sep 9, 2024

Bingo! My printf is the GNU version from homebrew's coreutils, not the macos /usr/bin/printf. I confirm.

Same with /bin/echo -en. The GNU variant works but not the BSD.

Github is not showing anymore the pre-git-push-force commits (runs here), but I was also aiming for a bash-only solution based on https://stackoverflow.com/questions/42828273/bash-hex-to-ascii-is-possible-without-xxd-or-perl/42828523#42828523. Couldn't get it to work, but I bet the problem is between the chair and the keyboard (actually now that I tested /bin/echo -en I understand why the bash solution in StackOverflow didn't work.

I did also look into https://stackoverflow.com/questions/4614775/converting-hex-to-decimal-in-awk-or-sed

After lots of trial and error, I cannot get a success run. Even my friends Google and ChatGPT are dry.

Could we say that we have a bash function that works with a case statement checking if xxd is available, uses that, checks if perl is available, uses that, otherwise checks if sed+printf are reliable and uses that, otherwise it errors ? That would still mean that in a lot of cases xxd is not needed. Maybe we don't even need perl. Check if sed+printf (actually only printf or even echo) is reliable to convert \xHH to binary, if not default to xxd.

PS: echo -n "deadbeaf00000bad" | sed "s/../\\\\x&/g" | xargs -0 printf "%b" can also be written as printf "$(echo -n "deadbeaf00000bad" | sed "s/../\\\\x&/g")" so that we remove xargs from the picture

@andreineculau andreineculau changed the title remove xxd dependency change xxd dependency from hard to soft Sep 15, 2024
@andreineculau
Copy link
Collaborator Author

Could we say that we have a bash function...

@jmurty See the new changeset

jmurty added a commit that referenced this pull request Oct 6, 2024
…rintf` or `perl` instead

commit e514010
Author: James Murty <[email protected]>
Date:   Sun Oct 6 22:52:33 2024 +1100

    Tweaks

Remove hard dependency on `xxd` which is often a heavy requirement
because it is only available with Vim on some platforms.

Fall back to using `printf` (provided it has full %b support) or
`perl` when either of these are available, and only require that
`xxd` be installed and available when that is the only viable option.

Adapted from #181. Thanks @andreineculau!
@jmurty
Copy link
Collaborator

jmurty commented Oct 6, 2024

I really like this approach, thanks! I've merged it to main in dd778bf with some minor tweaks, mainly changing the ordering to prefer printf to perl because I suspect – but don't know for sure – that printf is likely to be faster.

Encryption speed is a real casualty of our need to force OpenSSL 3+ to include the Salted__ prefix since the hex-to-bin workaround is a required extra step for every clean/encrypt operation.

@jmurty jmurty closed this Oct 6, 2024
jmurty added a commit that referenced this pull request Oct 6, 2024
…rintf` or `perl` instead

Remove hard dependency on `xxd` which is often a heavy requirement
because it is only available with Vim on some platforms.

Fall back to using `printf` (provided it has full %b support) or
`perl` when either of these are available, and only require that
`xxd` be installed and available when that is the only viable option.

Adapted from #181. Thanks @andreineculau!
@andreineculau
Copy link
Collaborator Author

@jmurty I can't help but take notice that this approach of incorporating a PR is akin to appropriating intellectual property!

A human error for sure (and one that can be corrected as well!) but such practices have no place in the OSS world built on contributions and attributions. I do not take offence, but I want to clearly delimit myself from such practices.

@jmurty
Copy link
Collaborator

jmurty commented Oct 6, 2024

Hi @andreineculau I'm truly sorry, I didn't mean to take your contribution in an underhand way.

I'm happy to remedy this in whatever way you think best.

I would like to keep the final merge/commit to main as simple as possible since there were a few changes along the way, which was the only reason I didn't do a normal PR merge in the first place. I've pushed the tweaks I made before my squash merge to your PR branch so we can work from there.

@jmurty jmurty reopened this Oct 7, 2024
@jmurty
Copy link
Collaborator

jmurty commented Oct 7, 2024

Hi @andreineculau I have reset the main branch to the point before my prior CLI squash-merge of this PR.

How would you like me to merge this PR? Would a "Squash and merge" done with the button in GitHub be okay, or would you prefer a different approach?

@andreineculau
Copy link
Collaborator Author

Hey @jmurty ! I really had no intention to cause this much fuss. I should have been clearer it's about doing different next time πŸ™‡

That said, now I feel I need to bring myself up to meet you. I don't know if there are other techniques, but here's 4 alternatives to take a PR "home" in a quick fashion i.e. without holding a contributor's hand

  1. if you have access to the PR's branch, just add commits on top i.e. take over the branch, and then merge the PR
  2. if you don't have access (PR from a forked repo), add suggestions inline to the PR. Not sure if you can commit the suggestion to the contributor's branch (I doubt it, but at least you get one step ahead because you're virtually saying "if you want to merge your PR, this is exactly or very close to what I expect to see in your PR" https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/incorporating-feedback-in-your-pull-request
  3. a more direct alternative if the PR is good enough is just merge the PR as is, and make the necessary changes immediately after. For instance in this case, the PR was 100% functional, but you also wanted to act on a hunch that printf is faster than perl. For all that we know, maybe that doesn't hold, and even if we had 1 commit before, we will end up with another one switching back the order - perl before printf...
  4. a more raw alternative is that you can checkout the PR branch locally - see https://cli.github.com/manual/gh_pr_checkout - and then you squash/fixup and the commit still has git author set to the contributor, but will have you as the committer

Squash/fixup is an orthogonal matter - locally or in the github UI, you can choose to have 1 commit/PR. One could argue that the history is part of the contribution, but if we are to think of "accepting a changeset", than how the changeset is delivered - in 1 or 100 commits - is secondary.
Screenshot 2024-10-07 at 18 23 50

All of the above alternatives attribute the changeset to the author=contributor, while ticking boxes like: squash and move fast.

Apologies if I came in hard, but again it's not about this PR. You didn't need to revert etc.

@jmurty jmurty merged commit 016b2e4 into main Oct 8, 2024
8 checks passed
@jmurty
Copy link
Collaborator

jmurty commented Oct 8, 2024

Hi @andreineculau it's not really a fuss, I just want to get this right. This project and I have benefitted a lot from your contributions over the years, I don't want to spoil that.

I have now squash-merged using GitHub's button.

My mistake was not realising – or noticing – that doing a squash merge of this PR branch locally using the git CLI made me the author of the entire commit, not just its committer. As I've now learned, if you squash merge using the CLI you need to then follow up with a git commit --amend --author command or similar to put the original author metadata back.

The GitHub button does a much better job of keeping attribution for squash merges, and also knows that the PR itself has been merged, so that's the smart option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants