interp: consider adding some coreutils as builtins #93

mvdan · 2017-04-25T13:59:56Z

coreutils as of 8.27 has a total of 103 executables, ranging from wc to test and sort.

Plenty of scripts depend on these. If we really want interp to be platform-independent, we will have to implement some of these as builtins as they won't be available in some systems like Windows (and Mac?).

Bash 4.4 already has a few as builtins:

 $ for c in $(pacman -Ql coreutils | grep '\/bin' | sed 's@.*/@@'); do type -a $c; done | grep builtin
[ is a shell builtin
echo is a shell builtin
false is a shell builtin
printf is a shell builtin
pwd is a shell builtin
test is a shell builtin
true is a shell builtin

We already have all of these except for [ and test - see #92.

I obviously won't implement all 103 in one go, but I could start with the most common and simple. Unfortunately, these being GNU programs they all have tons of options and gotchas, like cat having a dozen options.

One open question is whether these should always be builtins like echo. Other options include:

Use them as fallback only if the system version is not available
Enable them via a flag, like "extended builtins"

For the sake of simplicity, I'm inclined towards them always being builtins. The only downside is scripts that depend on GNU flags that are somewhat obscure. But these scripts wouldn't be portable to begin with.

The text was updated successfully, but these errors were encountered:

mvdan · 2017-04-25T14:07:18Z

A good way to make the list of 103 executable shorter is to filter it through the list of POSIX Shell utilities, which has 160 executables: http://pubs.opengroup.org/onlinepubs/9699919799/idx/utilities.html

Those docs are also a good start, as each command is simpler and has less options - cat only has a single option.

The number of executables that are both in coreutils and in the POSIX utilities list are 60:

$ comm -12 <(pacman -Ql coreutils | grep '\/bin' | sed 's@.*/@@') posix | column -c 70
basename        echo            mv              stty
cat             env             nice            tail
chgrp           expand          nl              tee
chmod           expr            nohup           test
chown           false           od              touch
cksum           fold            paste           tr
comm            head            pathchk         true
cp              id              pr              tsort
csplit          join            printf          tty
cut             link            pwd             uname
date            ln              rm              unexpand
dd              logname         rmdir           uniq
df              ls              sleep           unlink
dirname         mkdir           sort            wc
du              mkfifo          split           who

Some of these like true and echo we already have. And most of the ones we don't yet have are common and seem fairly simple to implement.

mvdan · 2017-04-25T14:19:41Z

@andreynering this might be of interest to you if you want unix scripts to work on Windows

andreynering · 2017-04-25T16:05:09Z

@mvdan Yeah, probably it's a lot of work, but would definitely be helpful.

The most basic buildins would already be very helpful:

rm
cp
mv
mkdir
ls
touch
chmod

mvdan · 2017-04-25T17:01:59Z

Also worth noting that there are some fairly common commands that aren't part of coreutils. sed and grep come to mind, for example.

mvdan · 2018-01-12T11:02:45Z

@andreynering see the issue above - I'm currently looking at piggybacking off someone else's coreutils implementation.

mvdan · 2018-10-28T17:19:01Z

It looks like that coreutils project has lost traction, so it could be years until anyone completes it and makes it importable. I see two alternatives.

Option one: we add an easy way to bundle a coreutils implementation like BusyBox, https://github.com/uutils/coreutils, or any other implementation that can be bundled as a single static binary. Then, one could include said static binary alongside the Go binary, or inside the Go binary as a compressed variable/constant somewhere.

Option two: we implement the most common coreutils tools ourselves.

Option one is much, much less work. I definitely don't want to reimplement coreutils, especially not with all of GNU's knobs and portability nightmares.

I realise that option one won't be easy to set up, may be slower than native Go implementations, and may make binaries larger. However, it's a much saner option in the long run, one can choose the coreutils implementation, and it still solves the portability issue, which is the main concern.

If anyone would like to work on it, please speak up. The API would likely be something like:

// ExecWithBusybox will use a busybox-like binary at path to execute all programs belonging to a
// busybox implementation. For example, executing {"cp", "foo", "bar"} would actually execute
// {"/path/to/busybox", "cp", "foo", "bar"}.
func ExecWithBusybox(path string, next ModuleExec) ModuleExec

Then it would be up to the application to decide how to bundle the busybox binary itself. I don't think that implementation detail belongs in our API.

ebfe · 2018-10-28T17:44:14Z

I'd like to throw in another idea which makes implementing both options viable: Allow custom builtins. API could look something like: ``` type Builtin func(ctxt context.Context, argv []string) error func (r *Runner) SetBuiltin(name string, handler Builtin) (old Builtin) ``` The handler could then exec busybox or call into an coreutils like API.

andreynering · 2018-10-28T18:52:20Z

Is BusyBox compatible with Windows? I don't think so.

Another alternative would be forking the go-coreutils project and make some of the already implemented tools (mv, ls, rm, cp, etc...) importable.

mvdan · 2018-10-28T20:23:10Z

Is BusyBox compatible with Windows? I don't think so.

Good point. However, note that more modern implementations like uutils do support Windows, so that should be a better option if you prioritise portability.

Another alternative would be forking the go-coreutils project

The amount of work required would be non-trivial, and I don't intend to maintain such a project, so I'll give that a pass myself :) It's also not directly related to a shell package, so it doesn't need to live here. Others are welcome to do that work separately, of course.

func (r *Runner) SetBuiltin(name string, handler Builtin) (old Builtin)

I'm starting to think that the current ModuleExec API is enough to do what you want here. For example, all that your wrapper would do is (ignoring the old bit):

func (r *Runner) SetBuiltin(name string, handler Builtin) {
    r.Exec = ModuleExec(func(ctx context.Context, path string, args []string) error {
        if args[0] == name {
            return handler(ctx, args)
        }
        return r.Exec(ctx, path, args)
    }
})

Of course, that doesn't really scale if one has hundreds of overriding layers like these. But one could have a single layer for all the busybox "builtins", which is kinda like what I was suggesting earlier. The earlier version would be like:

func ExecWithBusybox(path string, next ModuleExec) ModuleExec {
    return ModuleExec(func(ctx context.Context, path string, args []string) error {
        switch args[0] {
        case "cp", "all-other-coreutils":
            mc, _ := FromModuleContext(ctx)
            cmd := exec.CommandContext(ctx, path, args)
            cmd.Dir = mc.Dir
            cmd.Stdin = mc.Stdin
            cmd.Stdout = mc.Stdout
            cmd.Stderr = mc.Stderr
            // etc etc
            return cmd.Run()
        }
        return next(ctx, path, args)
    })
}

So, really, both APIs be implemented outside of the interp package. I think ExecWithBusybox is a bit more complex, but they're both just thin wrappers over what one can already do with ModuleExec. So perhaps we shouldn't do anything after all.

ebfe · 2018-10-28T21:36:02Z

> func (r *Runner) SetBuiltin(name string, handler Builtin) (old Builtin) I'm starting to think that the current `ModuleExec` API is enough to do what you want here.

I don't think ModuleExec can override the existing builtins. (Though I'm unsure how useful this is outside maybe trap and adding currently unimplemented ones).

Of course, that doesn't really scale if one has hundreds of overriding layers like these.

I was thinking more of something like this, which doesn't stack up multiple layers: ``` type Runner struct { [...] builtins map[string]Builtin [...] } func (r *Runner) SetBuiltin(name, handler Builtin) (Builtin) { // Or just expose the builtins map. old = r[name] r.builtins[name] = handler return old } ``` and adjusting isBuiltin/builtinCode use the map.

mvdan · 2018-10-28T21:51:31Z

Ah, I had forgotten that builtins were hard-coded into the interpreter.

At first I wanted to leave that to ModuleExec, but I figured that it would make the module too complex. DefaultExec already does enough stuff - imagine if it had to also take care of builtins. I was worried that it would be too hard for others to replace r.Exec, since I imagined that most people didn't care about builtins.

But perhaps we could layer them too. That is, declare DefaultExec as var DefaultExec = ExecWithBuiltins(ExecStartProcess) (where ExecStartProcess would be the current DefaultExec, just using os/exec).

Then it would be possible to bypass the entirety of the builtins - one just has to do r.Exec = ExecStartProcess or r.Exec = myBuiltins(ExecStartProcess). And overriding a single builtin would still be possible, via r.Exec = myCpBuiltin(ExecWithBuiltins(ExecStartProcess))).

This seems like it exposes less API, and by folding builtins into ModuleExec, makes that module more powerful.

andreynering · 2021-09-06T14:04:32Z

I'd like to have more time to help this move forward, but as for everyone, time is limited. Anyway, bringing this some discussion may be helpful.

Someone in the Task repo mentioned that https://github.com/u-root/u-root has many builtins implemented in Go. This folder contains the main packages. In some cases the core implementation was done on a separated package here, which means we could import and use them.

As I said before, having just the basic builtins like cp, mv, rm and mkdir would add a huge value for those on Windows as these are the builtins used 90% of the time, at least on automation tools like Task.

@mvdan What would you expect of someone trying to move this forward? Would it be acceptable to import this project and use it directly on interp/builtin.go? Or maybe you prefer to have an abstraction layer using ModuleExec and move that into another package?

mvdan · 2021-09-07T21:54:51Z

I had seen the u-root coreutils, but I did not know they were exposed as non-main packages too. That sounds wonderful.

Their go.mod is multiple times as big as ours though, so I'm uneasy about adding a direct dependency. Even with Go 1.17's trimmed module graphs, forcing anyone that imports the interpreter to depend on u-root and its dependencies feels overkill.

That said, I wouldn't oppose a subpackage, like mvdan.cc/sh/v3/interp/coreutils exposing a ModuleExec API. That way, it will be possible to import the vanilla interpreter, and thanks to trimmed module graphs, one shouldn't even need to download u-root unless necessary.

mvdan · 2022-10-19T13:51:25Z

I started to look into u-root as a solution, as described above. See u-root/u-root#2527. I'll wait another week or so to see if they get back to me, as it would be best to coordinate the needed changes with them.

mvdan · 2022-10-23T13:09:16Z

But perhaps we could layer them too. That is, declare DefaultExec as var DefaultExec = ExecWithBuiltins(ExecStartProcess) (where ExecStartProcess would be the current DefaultExec, just using os/exec).

Unfortunately, layering like that with the existing ExecHandler API won't work, because the shell needs to know the set of builtins for commands like type or builtin. But perhaps a thought for v4 - rethink the "exec handler" so that it can absorb builtins entirely. Then the user can replace or extend the builtins as they please.

mvdan · 2022-12-15T23:33:47Z

I'm actually not sure why we were talking about builtins above. We can support providing coreutils implemented in Go without treating them as builtins, via ExecHandler. They would not execute a real program, instead running a bit of Go code. They wouldn't work if one ran builtin cp, but I don't think anyone would expect that - they are not actually builtins per the POSIX spec or Bash manual.

Anyway, I made some progress on this over the last couple of weeks. Thanks to @JohnHardy for nudging me along :) I made cp importable in u-root, which unfortunately required a non-trivial refactor, since half of the implementation was in a package main. See https://github.com/mvdan/u-root-coreutils/tree/coreutils.

You can see our side of the changes at https://github.com/mvdan/sh/tree/93-coreutils. The two changes are:

A new package, interp/coreutils, which exposes a func Handle of type interp.ExecHandlerFunc.
Allow chaining multiple exec handlers via interp.ExecHandlers, where one returning interp.SkipHandler means that we run the next one rather than stopping. This is helpful for coreutils, as one may want to attempt to run a bundled coreutil, falling back to executing a program normally.

I'll probably add support for similar chaining in the other handlers, because "run some logic in some cases, fall back to the default logic in others" is a rather common need.

mvdan · 2022-12-15T23:36:33Z

I forgot to mention: I am looking for feedback on this design, as well as the added dependency on u-root, before I spend more time modifying u-root to export more of their coreutils. Note that the added dependency is somewhat large, but it's only pulled in if one explicitly imports the new coreutils package.

cc @andreynering for go-task, as well as @zimbatm @riacataquian @ebfe @theclapp

mvdan · 2022-12-15T23:41:46Z

Err, worth flagging that the u-root Go module currently depends on our module:

https://github.com/u-root/u-root/blob/904692535c70f103396524ae535a2e7bc89cb75a/go.mod#L44

If we upstream our changes in the future, then we'd have a cyclic module dependency. Allowed, but still icky. I might have to make coreutils a nested or separate Go module for the sake of avoiding the cycle. The alternative would be to ask u-root to split their module in two, so that their shell on top of our interpreter isn't in the same module as their coreutils, but that seems somewhat invasive as a request.

mvdan · 2022-12-15T23:44:52Z

Yet another fun quirk: u-root appear to not support Windows at all: https://github.com/mvdan/sh/actions/runs/3708802693/jobs/6286748332

That's probably fine for the project in general, but it would be nice if their coreutils were somewhat portable. That could be more work in upstream, if they'd allow it. It's not great news given that I think the main driver for this feature request is Windows portability.

andreynering · 2022-12-16T11:37:04Z

Thanks @mvdan for taking the time to work on this!

I have the impression that bringing support to Windows should be easy, at least with regard to fixing that particular error.

It's looks to me like just a matter of declaring a _windows.go version of this file:

mvdan · 2022-12-19T12:20:27Z

Yeah, this particular one is easy to fix. The tricky part will be whether upstream cares about testing and supporting these coreutils on Windows. If they do not, it might fall on us to do that, and they might break the coreutils at any point due to lack of CI for Windows.

The tests and examples were already using a form of middlewares. For example, ExampleExecHandler would handle some specific cases, and fall back to DefaultExecHandler. However, this fall back was hard-coded to DefaultExecHandler. The function wasn't a reusable middleware because of that. Instead, borrow the design of middlewares from go-chi: func (mx *Mux) Use(middlewares ...func(http.Handler) http.Handler) In such an API, each middleware is a function which takes "next", the next handler, and returns its own handler. This way, each middleware can choose whether to handle all calls, or just some of them - while passing on the rest to "next". This makes our API more flexible and our tests less awkward. Most importantly, it enables #93, as a coreutils ExecHandler by design will only be able to handle some coreutil commands and nothing else. For #93.

mvdan · 2023-01-15T15:02:34Z

Allow chaining multiple exec handlers via interp.ExecHandlers, where one returning interp.SkipHandler means that we run the next one rather than stopping. This is helpful for coreutils, as one may want to attempt to run a bundled coreutil, falling back to executing a program normally.

This is now #964. I wanted to get it reviewed and merged before I send the first PR for coreutils.

The tests and examples were already using a form of middlewares. For example, ExampleExecHandler would handle some specific cases, and fall back to DefaultExecHandler. However, this fall back was hard-coded to DefaultExecHandler. The function wasn't a reusable middleware because of that. Instead, borrow the design of middlewares from go-chi: func (mx *Mux) Use(middlewares ...func(http.Handler) http.Handler) In such an API, each middleware is a function which takes "next", the next handler, and returns its own handler. This way, each middleware can choose whether to handle all calls, or just some of them - while passing on the rest to "next". This makes our API more flexible and our tests less awkward. Most importantly, it enables #93, as a coreutils ExecHandler by design will only be able to handle some coreutil commands and nothing else. For #93.

mvdan · 2023-01-22T17:22:52Z

The ExecHandlers API is now merged, I improved my u-root soft fork to also pass the current directory, and the 93-coreutils branch is updated with both of those changes. The tests now work :)

Posted another comment at u-root/u-root#2527 (comment) to hopefully start the upstreaming process. They seem open to it.

mvdan · 2023-01-22T18:05:40Z

Also, I'm pretty happy with the coreutils API in this module. You can see it at master...93-coreutils. It's currently one sub-package in the same module, but it might end up being a separate module (see my thoughts in u-root/u-root#2527 (comment)). In any case, @andreynering, I'd like to hear your thoughts as a future API user.

andreynering · 2023-01-23T12:14:04Z

The code looks great to me.

If it helps to prevent any problems, I see no problem in making it a separate module. In this case, it's important to add some documentation with the link to the README, so people know it exists.

mvdan · 2023-01-23T16:42:34Z

A separate package or module is indeed a bit harder to find, so I'd add a link in the interp package godoc. I don't think I want to add more content to the README specific to just one or two packages. The README is large enough as it is :)

leaanthony · 2024-01-29T09:51:56Z

Any news on this? Thanks 🙏

mvdan changed the title ~~interp: add more coreutils as builtins~~ interp: consider adding some coreutils as builtins May 4, 2017

mvdan added the enhancement label May 4, 2017

mvdan mentioned this issue Aug 9, 2017

interp: make pieces of the interpreter modular and replaceable #147

Closed

mvdan mentioned this issue Jan 12, 2018

Make tools importable ericlagergren/go-coreutils#114

Open

andreynering mentioned this issue Apr 13, 2019

Support built-in shell commands for an improved Windows support go-task/task#197

Open

andreynering mentioned this issue Mar 14, 2020

Windows 10 - "sleep": executable file not found in $PATH go-task/task#305

Closed

andreynering mentioned this issue Sep 6, 2021

Command can not be run in Windows go-task/task#319

Closed

andreynering mentioned this issue Nov 21, 2021

Message to the community go-task/task#610

Closed

mvdan mentioned this issue Sep 13, 2022

interp: support all Bourne shell (POSIX) builtins #97

Open

mvdan mentioned this issue Oct 16, 2022

cmds/core: provide importable main-like APIs for coreutils u-root/u-root#2527

Open

mvdan mentioned this issue Jan 15, 2023

interp: add ExecHandlers to support exec middlewares #964

Merged

joerdav mentioned this issue Feb 24, 2023

Support for multiple shells(os specific) joerdav/xc#41

Closed

pd93 mentioned this issue Jul 17, 2024

Errors with taskfile and windows go-task/task#1723

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

interp: consider adding some coreutils as builtins #93

interp: consider adding some coreutils as builtins #93

mvdan commented Apr 25, 2017 •

edited

Loading

mvdan commented Apr 25, 2017 •

edited

Loading

mvdan commented Apr 25, 2017

andreynering commented Apr 25, 2017

mvdan commented Apr 25, 2017

mvdan commented Jan 12, 2018

mvdan commented Oct 28, 2018

ebfe commented Oct 28, 2018 via email

andreynering commented Oct 28, 2018

mvdan commented Oct 28, 2018

ebfe commented Oct 28, 2018 via email

mvdan commented Oct 28, 2018

andreynering commented Sep 6, 2021

mvdan commented Sep 7, 2021

mvdan commented Oct 19, 2022

mvdan commented Oct 23, 2022

mvdan commented Dec 15, 2022

mvdan commented Dec 15, 2022

mvdan commented Dec 15, 2022

mvdan commented Dec 15, 2022

andreynering commented Dec 16, 2022

mvdan commented Dec 19, 2022

mvdan commented Jan 15, 2023

mvdan commented Jan 22, 2023

mvdan commented Jan 22, 2023

andreynering commented Jan 23, 2023

mvdan commented Jan 23, 2023

leaanthony commented Jan 29, 2024

interp: consider adding some coreutils as builtins #93

interp: consider adding some coreutils as builtins #93

Comments

mvdan commented Apr 25, 2017 • edited Loading

mvdan commented Apr 25, 2017 • edited Loading

mvdan commented Apr 25, 2017

andreynering commented Apr 25, 2017

mvdan commented Apr 25, 2017

mvdan commented Jan 12, 2018

mvdan commented Oct 28, 2018

ebfe commented Oct 28, 2018 via email

andreynering commented Oct 28, 2018

mvdan commented Oct 28, 2018

ebfe commented Oct 28, 2018 via email

mvdan commented Oct 28, 2018

andreynering commented Sep 6, 2021

mvdan commented Sep 7, 2021

mvdan commented Oct 19, 2022

mvdan commented Oct 23, 2022

mvdan commented Dec 15, 2022

mvdan commented Dec 15, 2022

mvdan commented Dec 15, 2022

mvdan commented Dec 15, 2022

andreynering commented Dec 16, 2022

mvdan commented Dec 19, 2022

mvdan commented Jan 15, 2023

mvdan commented Jan 22, 2023

mvdan commented Jan 22, 2023

andreynering commented Jan 23, 2023

mvdan commented Jan 23, 2023

leaanthony commented Jan 29, 2024

mvdan commented Apr 25, 2017 •

edited

Loading

mvdan commented Apr 25, 2017 •

edited

Loading