Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove dependency on sed(1) for history processing #167

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jparise
Copy link

@jparise jparise commented Jan 12, 2025

We post-process history 1's output to extract the current command. This processing needs to strip the leading history number, an optional * character indicating whether the entry was modified (or a space), and then a space separating character.

We were previously using sed(1) for this, but we can implement an equivalent transformation using bash's native parameter expansion syntax.

This also results in ~4x reduction in per-prompt command overhead.

export LC_ALL=C
HISTTIMEFORMAT='' builtin history 1 | sed '1 s/^ *[0-9][0-9]*[* ] //'
)
this_command=$(LC_ALL=C HISTTIMEFORMAT='' builtin history 1)
Copy link
Author

@jparise jparise Jan 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also removes the export for LC_ALL, which shouldn't be necessary here. I can move that into it's own proposed change if that's preferred.

@jparise jparise force-pushed the sedless-history branch 2 times, most recently from 0962448 to e4d1398 Compare January 12, 2025 21:13
We post-process `history 1`'s output to extract the current command.
This processing needs to strip the leading history number, an optional
`*` character indicating whether the entry was modified (or a space),
and then a space separating character.

We were previously using sed(1) for this, but we can implement an
equivalent transformation using bash's native parameter expansion
syntax.
bash-preexec.sh Outdated Show resolved Hide resolved
The history number is printed as "%5d" so we need to handle all possible
columns.

    declare -a lines=(
        "10000  history 1"
        " 1000  history 1"
        "    1  history 1"
        " 1000* history 1"
        " 1000   history 1"
    )

    for line in "${lines[@]}"; do
        echo "${line#*[[:digit:]][* ] }"
    done
@jparise
Copy link
Author

jparise commented Jan 13, 2025

With the latest revision, the pattern now yields correct results for a complete set of possible history values:

#!/usr/bin/env bash

declare -a lines=(
  "10000  history 1"
  " 1000  history 1"
  "    1  history 1"
  " 1000* history 1"
  " 1000   history 1"
)

for line in "${lines[@]}"; do
  echo "${line#*[[:digit:]][* ] }"
done

It matches anything up to the last digit as the number prefix.

@jparise jparise requested a review from akinomyoga January 13, 2025 01:00
@jparise
Copy link
Author

jparise commented Jan 13, 2025

We could get more precise (and more closely reproduce the sed pattern) if we (temporarily) used extglob, but that might be overkill for this set of constrained inputs.

Copy link
Contributor

@akinomyoga akinomyoga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your script produces a wrong result for the last input:

$ bash bash-completion-167.sh
history 1
history 1
history 1
history 1
 history 1

though the last input " 1000 history 1" would never happen because of the Bash code. So it anyway works.

We could get more precise (and more closely reproduce the sed pattern) if we (temporarily) used extglob, but that might be overkill for this set of constrained inputs.

I agree. The extglob engine of Bash is slow.

@jparise
Copy link
Author

jparise commented Jan 13, 2025

Your script produces a wrong result for the last input:

$ bash bash-completion-167.sh
history 1
history 1
history 1
history 1
 history 1

That's actually intentional. There's a test to ensure we don't strip whitespace from the command itself:

@test "preexec should not strip whitespace from commands" {
preexec_functions+=(test_preexec_echo)
__bp_interactive_mode
history -s " this command has whitespace "
run '__bp_preexec_invoke_exec'
[ $status -eq 0 ]
[ "$output" == " this command has whitespace " ]
}

I think that's only relevant if ignorespace and ignoreboth is unset in HISTCONTROL.

@akinomyoga
Copy link
Contributor

Ah, I see your point.

@rcaloras
Copy link
Owner

Thanks for the PR @jparise! and @akinomyoga for the review.

@jparise do we know if this reduces the overhead time of running bash-preexec? Would be curious how much this speeds us up.

@steinarvk, @dseomn - Any feedback on this change as folks who've made changes to the current implementation with sed?

@jparise
Copy link
Author

jparise commented Jan 13, 2025

@jparise do we know if this reduces the overhead time of running bash-preexec? Would be curious how much this speeds us up.

In this artificial test, the new (non-sed) approach seems to be ~4× faster:

#!/usr/bin/env bash

old() {
  loop=1
  until [[ $loop -eq 10000 ]]; do
    this_command=$(
      export LC_ALL=C
      HISTTIMEFORMAT='' builtin history 1 | sed '1 s/^ *[0-9][0-9]*[* ] //'
    )
    ((++loop))
  done
}

new() {
  loop=1
  until [[ $loop -eq 10000 ]]; do
    this_command=$(LC_ALL=C HISTTIMEFORMAT='' builtin history 1)
    this_command="${this_command#*[[:digit:]][* ] }"
    ((++loop))
  done
}

echo "Old"
time -p old

echo "New"
time -p new
Old
real 16.98
user 5.47
sys 10.83
New
real 4.36
user 0.71
sys 3.25

@akinomyoga
Copy link
Contributor

If one cares about the overhead, one can also consider using a function substitution (also known as "no-fork command substitution") introduced in Bash 5.3 (which is still beta now, though) with a proper test on BASH_VERSINFO.

#!/usr/bin/env bash

old() {
  loop=1
  until [[ $loop -eq 10000 ]]; do
    this_command=$(
      export LC_ALL=C
      HISTTIMEFORMAT='' builtin history 1 | sed '1 s/^ *[0-9][0-9]*[* ] //'
    )
    ((++loop))
  done
}

new() {
  loop=1
  until [[ $loop -eq 10000 ]]; do
    this_command=$(LC_ALL=C HISTTIMEFORMAT='' builtin history 1)
    this_command="${this_command#*[[:digit:]][* ] }"
    ((++loop))
  done
}

new2() {
  loop=1
  until [[ $loop -eq 10000 ]]; do
    if (( BASH_VERSINFO[0] > 5 || (BASH_VERSINFO[0] == 5 && BASH_VERSINFO[1] >= 3) )); then
      this_command=${ LC_ALL=C HISTTIMEFORMAT='' builtin history 1; }
    else
      this_command=$(LC_ALL=C HISTTIMEFORMAT='' builtin history 1)
    fi
    this_command="${this_command#*[[:digit:]][* ] }"
    ((++loop))
  done
}

time old
time new
time new2
$ bash-5.3-beta gh0167.benchmark.sh

real    0m18.610s
user    0m9.430s
sys     0m12.251s

real    0m11.676s
user    0m4.243s
sys     0m7.507s

real    0m0.629s
user    0m0.578s
sys     0m0.046s

mitchellh added a commit to ghostty-org/ghostty that referenced this pull request Jan 16, 2025
We post-process history 1's output to extract the current command. This
processing needs to strip the leading history number, an optional *
character indicating whether the entry was modified (or a space), and
then a space separating character.

We were previously using sed(1) for this, but we can implement an
equivalent transformation using bash's native parameter expansion
syntax.

This also results in ~4x reduction in per-prompt command overhead.

Upstream: rcaloras/bash-preexec#167
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants