Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interp: consider adding some coreutils as builtins #93

Open
mvdan opened this issue Apr 25, 2017 · 27 comments
Open

interp: consider adding some coreutils as builtins #93

mvdan opened this issue Apr 25, 2017 · 27 comments

Comments

@mvdan
Copy link
Owner

mvdan commented Apr 25, 2017

coreutils as of 8.27 has a total of 103 executables, ranging from wc to test and sort.

Plenty of scripts depend on these. If we really want interp to be platform-independent, we will have to implement some of these as builtins as they won't be available in some systems like Windows (and Mac?).

Bash 4.4 already has a few as builtins:

 $ for c in $(pacman -Ql coreutils | grep '\/bin' | sed 's@.*/@@'); do type -a $c; done | grep builtin
[ is a shell builtin
echo is a shell builtin
false is a shell builtin
printf is a shell builtin
pwd is a shell builtin
test is a shell builtin
true is a shell builtin

We already have all of these except for [ and test - see #92.

I obviously won't implement all 103 in one go, but I could start with the most common and simple. Unfortunately, these being GNU programs they all have tons of options and gotchas, like cat having a dozen options.

One open question is whether these should always be builtins like echo. Other options include:

  • Use them as fallback only if the system version is not available
  • Enable them via a flag, like "extended builtins"

For the sake of simplicity, I'm inclined towards them always being builtins. The only downside is scripts that depend on GNU flags that are somewhat obscure. But these scripts wouldn't be portable to begin with.

@mvdan
Copy link
Owner Author

mvdan commented Apr 25, 2017

A good way to make the list of 103 executable shorter is to filter it through the list of POSIX Shell utilities, which has 160 executables: http://pubs.opengroup.org/onlinepubs/9699919799/idx/utilities.html

Those docs are also a good start, as each command is simpler and has less options - cat only has a single option.

The number of executables that are both in coreutils and in the POSIX utilities list are 60:

$ comm -12 <(pacman -Ql coreutils | grep '\/bin' | sed 's@.*/@@') posix | column -c 70
basename        echo            mv              stty
cat             env             nice            tail
chgrp           expand          nl              tee
chmod           expr            nohup           test
chown           false           od              touch
cksum           fold            paste           tr
comm            head            pathchk         true
cp              id              pr              tsort
csplit          join            printf          tty
cut             link            pwd             uname
date            ln              rm              unexpand
dd              logname         rmdir           uniq
df              ls              sleep           unlink
dirname         mkdir           sort            wc
du              mkfifo          split           who

Some of these like true and echo we already have. And most of the ones we don't yet have are common and seem fairly simple to implement.

@mvdan
Copy link
Owner Author

mvdan commented Apr 25, 2017

@andreynering this might be of interest to you if you want unix scripts to work on Windows

@andreynering
Copy link
Collaborator

@mvdan Yeah, probably it's a lot of work, but would definitely be helpful.

The most basic buildins would already be very helpful:

  • rm
  • cp
  • mv
  • mkdir
  • ls
  • touch
  • chmod

@mvdan
Copy link
Owner Author

mvdan commented Apr 25, 2017

Also worth noting that there are some fairly common commands that aren't part of coreutils. sed and grep come to mind, for example.

@mvdan mvdan changed the title interp: add more coreutils as builtins interp: consider adding some coreutils as builtins May 4, 2017
@mvdan
Copy link
Owner Author

mvdan commented Jan 12, 2018

@andreynering see the issue above - I'm currently looking at piggybacking off someone else's coreutils implementation.

@mvdan
Copy link
Owner Author

mvdan commented Oct 28, 2018

It looks like that coreutils project has lost traction, so it could be years until anyone completes it and makes it importable. I see two alternatives.

Option one: we add an easy way to bundle a coreutils implementation like BusyBox, https://github.com/uutils/coreutils, or any other implementation that can be bundled as a single static binary. Then, one could include said static binary alongside the Go binary, or inside the Go binary as a compressed variable/constant somewhere.

Option two: we implement the most common coreutils tools ourselves.

Option one is much, much less work. I definitely don't want to reimplement coreutils, especially not with all of GNU's knobs and portability nightmares.

I realise that option one won't be easy to set up, may be slower than native Go implementations, and may make binaries larger. However, it's a much saner option in the long run, one can choose the coreutils implementation, and it still solves the portability issue, which is the main concern.

If anyone would like to work on it, please speak up. The API would likely be something like:

// ExecWithBusybox will use a busybox-like binary at path to execute all programs belonging to a
// busybox implementation. For example, executing {"cp", "foo", "bar"} would actually execute
// {"/path/to/busybox", "cp", "foo", "bar"}.
func ExecWithBusybox(path string, next ModuleExec) ModuleExec

Then it would be up to the application to decide how to bundle the busybox binary itself. I don't think that implementation detail belongs in our API.

@ebfe
Copy link
Contributor

ebfe commented Oct 28, 2018 via email

@andreynering
Copy link
Collaborator

Is BusyBox compatible with Windows? I don't think so.

Another alternative would be forking the go-coreutils project and make some of the already implemented tools (mv, ls, rm, cp, etc...) importable.

@mvdan
Copy link
Owner Author

mvdan commented Oct 28, 2018

Is BusyBox compatible with Windows? I don't think so.

Good point. However, note that more modern implementations like uutils do support Windows, so that should be a better option if you prioritise portability.

Another alternative would be forking the go-coreutils project

The amount of work required would be non-trivial, and I don't intend to maintain such a project, so I'll give that a pass myself :) It's also not directly related to a shell package, so it doesn't need to live here. Others are welcome to do that work separately, of course.

func (r *Runner) SetBuiltin(name string, handler Builtin) (old Builtin)

I'm starting to think that the current ModuleExec API is enough to do what you want here. For example, all that your wrapper would do is (ignoring the old bit):

func (r *Runner) SetBuiltin(name string, handler Builtin) {
    r.Exec = ModuleExec(func(ctx context.Context, path string, args []string) error {
        if args[0] == name {
            return handler(ctx, args)
        }
        return r.Exec(ctx, path, args)
    }
})

Of course, that doesn't really scale if one has hundreds of overriding layers like these. But one could have a single layer for all the busybox "builtins", which is kinda like what I was suggesting earlier. The earlier version would be like:

func ExecWithBusybox(path string, next ModuleExec) ModuleExec {
    return ModuleExec(func(ctx context.Context, path string, args []string) error {
        switch args[0] {
        case "cp", "all-other-coreutils":
            mc, _ := FromModuleContext(ctx)
            cmd := exec.CommandContext(ctx, path, args)
            cmd.Dir = mc.Dir
            cmd.Stdin = mc.Stdin
            cmd.Stdout = mc.Stdout
            cmd.Stderr = mc.Stderr
            // etc etc
            return cmd.Run()
        }
        return next(ctx, path, args)
    })
}

So, really, both APIs be implemented outside of the interp package. I think ExecWithBusybox is a bit more complex, but they're both just thin wrappers over what one can already do with ModuleExec. So perhaps we shouldn't do anything after all.

@ebfe
Copy link
Contributor

ebfe commented Oct 28, 2018 via email

@mvdan
Copy link
Owner Author

mvdan commented Oct 28, 2018

Ah, I had forgotten that builtins were hard-coded into the interpreter.

At first I wanted to leave that to ModuleExec, but I figured that it would make the module too complex. DefaultExec already does enough stuff - imagine if it had to also take care of builtins. I was worried that it would be too hard for others to replace r.Exec, since I imagined that most people didn't care about builtins.

But perhaps we could layer them too. That is, declare DefaultExec as var DefaultExec = ExecWithBuiltins(ExecStartProcess) (where ExecStartProcess would be the current DefaultExec, just using os/exec).

Then it would be possible to bypass the entirety of the builtins - one just has to do r.Exec = ExecStartProcess or r.Exec = myBuiltins(ExecStartProcess). And overriding a single builtin would still be possible, via r.Exec = myCpBuiltin(ExecWithBuiltins(ExecStartProcess))).

This seems like it exposes less API, and by folding builtins into ModuleExec, makes that module more powerful.

@andreynering
Copy link
Collaborator

I'd like to have more time to help this move forward, but as for everyone, time is limited. Anyway, bringing this some discussion may be helpful.

Someone in the Task repo mentioned that https://github.com/u-root/u-root has many builtins implemented in Go. This folder contains the main packages. In some cases the core implementation was done on a separated package here, which means we could import and use them.

As I said before, having just the basic builtins like cp, mv, rm and mkdir would add a huge value for those on Windows as these are the builtins used 90% of the time, at least on automation tools like Task.

@mvdan What would you expect of someone trying to move this forward? Would it be acceptable to import this project and use it directly on interp/builtin.go? Or maybe you prefer to have an abstraction layer using ModuleExec and move that into another package?

@mvdan
Copy link
Owner Author

mvdan commented Sep 7, 2021

I had seen the u-root coreutils, but I did not know they were exposed as non-main packages too. That sounds wonderful.

Their go.mod is multiple times as big as ours though, so I'm uneasy about adding a direct dependency. Even with Go 1.17's trimmed module graphs, forcing anyone that imports the interpreter to depend on u-root and its dependencies feels overkill.

That said, I wouldn't oppose a subpackage, like mvdan.cc/sh/v3/interp/coreutils exposing a ModuleExec API. That way, it will be possible to import the vanilla interpreter, and thanks to trimmed module graphs, one shouldn't even need to download u-root unless necessary.

@mvdan
Copy link
Owner Author

mvdan commented Oct 19, 2022

I started to look into u-root as a solution, as described above. See u-root/u-root#2527. I'll wait another week or so to see if they get back to me, as it would be best to coordinate the needed changes with them.

@mvdan
Copy link
Owner Author

mvdan commented Oct 23, 2022

But perhaps we could layer them too. That is, declare DefaultExec as var DefaultExec = ExecWithBuiltins(ExecStartProcess) (where ExecStartProcess would be the current DefaultExec, just using os/exec).

Unfortunately, layering like that with the existing ExecHandler API won't work, because the shell needs to know the set of builtins for commands like type or builtin. But perhaps a thought for v4 - rethink the "exec handler" so that it can absorb builtins entirely. Then the user can replace or extend the builtins as they please.

@mvdan
Copy link
Owner Author

mvdan commented Dec 15, 2022

I'm actually not sure why we were talking about builtins above. We can support providing coreutils implemented in Go without treating them as builtins, via ExecHandler. They would not execute a real program, instead running a bit of Go code. They wouldn't work if one ran builtin cp, but I don't think anyone would expect that - they are not actually builtins per the POSIX spec or Bash manual.

Anyway, I made some progress on this over the last couple of weeks. Thanks to @JohnHardy for nudging me along :) I made cp importable in u-root, which unfortunately required a non-trivial refactor, since half of the implementation was in a package main. See https://github.com/mvdan/u-root-coreutils/tree/coreutils.

You can see our side of the changes at https://github.com/mvdan/sh/tree/93-coreutils. The two changes are:

  • A new package, interp/coreutils, which exposes a func Handle of type interp.ExecHandlerFunc.
  • Allow chaining multiple exec handlers via interp.ExecHandlers, where one returning interp.SkipHandler means that we run the next one rather than stopping. This is helpful for coreutils, as one may want to attempt to run a bundled coreutil, falling back to executing a program normally.

I'll probably add support for similar chaining in the other handlers, because "run some logic in some cases, fall back to the default logic in others" is a rather common need.

@mvdan
Copy link
Owner Author

mvdan commented Dec 15, 2022

I forgot to mention: I am looking for feedback on this design, as well as the added dependency on u-root, before I spend more time modifying u-root to export more of their coreutils. Note that the added dependency is somewhat large, but it's only pulled in if one explicitly imports the new coreutils package.

cc @andreynering for go-task, as well as @zimbatm @riacataquian @ebfe @theclapp

@mvdan
Copy link
Owner Author

mvdan commented Dec 15, 2022

Err, worth flagging that the u-root Go module currently depends on our module:

https://github.com/u-root/u-root/blob/904692535c70f103396524ae535a2e7bc89cb75a/go.mod#L44

If we upstream our changes in the future, then we'd have a cyclic module dependency. Allowed, but still icky. I might have to make coreutils a nested or separate Go module for the sake of avoiding the cycle. The alternative would be to ask u-root to split their module in two, so that their shell on top of our interpreter isn't in the same module as their coreutils, but that seems somewhat invasive as a request.

@mvdan
Copy link
Owner Author

mvdan commented Dec 15, 2022

Yet another fun quirk: u-root appear to not support Windows at all: https://github.com/mvdan/sh/actions/runs/3708802693/jobs/6286748332

That's probably fine for the project in general, but it would be nice if their coreutils were somewhat portable. That could be more work in upstream, if they'd allow it. It's not great news given that I think the main driver for this feature request is Windows portability.

@andreynering
Copy link
Collaborator

Thanks @mvdan for taking the time to work on this!

I have the impression that bringing support to Windows should be easy, at least with regard to fixing that particular error.

It's looks to me like just a matter of declaring a _windows.go version of this file:

@mvdan
Copy link
Owner Author

mvdan commented Dec 19, 2022

Yeah, this particular one is easy to fix. The tricky part will be whether upstream cares about testing and supporting these coreutils on Windows. If they do not, it might fall on us to do that, and they might break the coreutils at any point due to lack of CI for Windows.

mvdan added a commit that referenced this issue Jan 15, 2023
The tests and examples were already using a form of middlewares.
For example, ExampleExecHandler would handle some specific cases,
and fall back to DefaultExecHandler.

However, this fall back was hard-coded to DefaultExecHandler.
The function wasn't a reusable middleware because of that.

Instead, borrow the design of middlewares from go-chi:

	func (mx *Mux) Use(middlewares ...func(http.Handler) http.Handler)

In such an API, each middleware is a function which takes "next",
the next handler, and returns its own handler.
This way, each middleware can choose whether to handle all calls,
or just some of them - while passing on the rest to "next".

This makes our API more flexible and our tests less awkward.
Most importantly, it enables #93, as a coreutils ExecHandler by design
will only be able to handle some coreutil commands and nothing else.

For #93.
@mvdan
Copy link
Owner Author

mvdan commented Jan 15, 2023

Allow chaining multiple exec handlers via interp.ExecHandlers, where one returning interp.SkipHandler means that we run the next one rather than stopping. This is helpful for coreutils, as one may want to attempt to run a bundled coreutil, falling back to executing a program normally.

This is now #964. I wanted to get it reviewed and merged before I send the first PR for coreutils.

mvdan added a commit that referenced this issue Jan 22, 2023
The tests and examples were already using a form of middlewares.
For example, ExampleExecHandler would handle some specific cases,
and fall back to DefaultExecHandler.

However, this fall back was hard-coded to DefaultExecHandler.
The function wasn't a reusable middleware because of that.

Instead, borrow the design of middlewares from go-chi:

	func (mx *Mux) Use(middlewares ...func(http.Handler) http.Handler)

In such an API, each middleware is a function which takes "next",
the next handler, and returns its own handler.
This way, each middleware can choose whether to handle all calls,
or just some of them - while passing on the rest to "next".

This makes our API more flexible and our tests less awkward.
Most importantly, it enables #93, as a coreutils ExecHandler by design
will only be able to handle some coreutil commands and nothing else.

For #93.
mvdan added a commit that referenced this issue Jan 22, 2023
The tests and examples were already using a form of middlewares.
For example, ExampleExecHandler would handle some specific cases,
and fall back to DefaultExecHandler.

However, this fall back was hard-coded to DefaultExecHandler.
The function wasn't a reusable middleware because of that.

Instead, borrow the design of middlewares from go-chi:

	func (mx *Mux) Use(middlewares ...func(http.Handler) http.Handler)

In such an API, each middleware is a function which takes "next",
the next handler, and returns its own handler.
This way, each middleware can choose whether to handle all calls,
or just some of them - while passing on the rest to "next".

This makes our API more flexible and our tests less awkward.
Most importantly, it enables #93, as a coreutils ExecHandler by design
will only be able to handle some coreutil commands and nothing else.

For #93.
@mvdan
Copy link
Owner Author

mvdan commented Jan 22, 2023

The ExecHandlers API is now merged, I improved my u-root soft fork to also pass the current directory, and the 93-coreutils branch is updated with both of those changes. The tests now work :)

Posted another comment at u-root/u-root#2527 (comment) to hopefully start the upstreaming process. They seem open to it.

@mvdan
Copy link
Owner Author

mvdan commented Jan 22, 2023

Also, I'm pretty happy with the coreutils API in this module. You can see it at master...93-coreutils. It's currently one sub-package in the same module, but it might end up being a separate module (see my thoughts in u-root/u-root#2527 (comment)). In any case, @andreynering, I'd like to hear your thoughts as a future API user.

@andreynering
Copy link
Collaborator

The code looks great to me.

If it helps to prevent any problems, I see no problem in making it a separate module. In this case, it's important to add some documentation with the link to the README, so people know it exists.

@mvdan
Copy link
Owner Author

mvdan commented Jan 23, 2023

A separate package or module is indeed a bit harder to find, so I'd add a link in the interp package godoc. I don't think I want to add more content to the README specific to just one or two packages. The README is large enough as it is :)

@leaanthony
Copy link

Any news on this? Thanks 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants