interp: consider adding some coreutils as builtins #93
A good way to make the list of 103 executables shorter is to filter it through the list of POSIX Shell utilities, which has 160 executables: http://pubs.opengroup.org/onlinepubs/9699919799/idx/utilities.html Those docs are also a good starting point, as each command is simpler and has fewer options - The number of executables that are both in
Some of these like
@andreynering this might be of interest to you if you want Unix scripts to work on Windows
@mvdan Yeah, it's probably a lot of work, but it would definitely be helpful. The most basic builtins would already be very helpful:
Also worth noting that there are some fairly common commands that aren't part of coreutils.
@andreynering see the issue above - I'm currently looking at piggybacking off someone else's coreutils implementation.
It looks like that coreutils project has lost traction, so it could be years until anyone completes it and makes it importable. I see two alternatives. Option one: we add an easy way to bundle a coreutils implementation like BusyBox, https://github.com/uutils/coreutils, or any other implementation that can be bundled as a single static binary. Then, one could include said static binary alongside the Go binary, or inside the Go binary as a compressed variable/constant somewhere. Option two: we implement the most common coreutils tools ourselves. Option one is much, much less work. I definitely don't want to reimplement coreutils, especially not with all of GNU's knobs and portability nightmares. I realise that option one won't be easy to set up, may be slower than native Go implementations, and may make binaries larger. However, it's a much saner option in the long run, one can choose the coreutils implementation, and it still solves the portability issue, which is the main concern. If anyone would like to work on it, please speak up. The API would likely be something like:
Then it would be up to the application to decide how to bundle the busybox binary itself. I don't think that implementation detail belongs in our API.
I'd like to throw in another idea which makes implementing both options viable: allow custom builtins.
The API could look something like:
```
type Builtin func(ctxt context.Context, argv []string) error
func (r *Runner) SetBuiltin(name string, handler Builtin) (old Builtin)
```
The handler could then exec busybox or call into a coreutils-like API.
Is BusyBox compatible with Windows? I don't think so. Another alternative would be forking the go-coreutils project and making some of the already implemented tools (mv, ls, rm, cp, etc.) importable.
Good point. However, note that more modern implementations like uutils do support Windows, so that should be a better option if you prioritise portability.
The amount of work required would be non-trivial, and I don't intend to maintain such a project, so I'll give that a pass myself :) It's also not directly related to a shell package, so it doesn't need to live here. Others are welcome to do that work separately, of course.
I'm starting to think that the current
Of course, that doesn't really scale if one has hundreds of overriding layers like these. But one could have a single layer for all the busybox "builtins", which is kinda like what I was suggesting earlier. The earlier version would be like:
So, really, both APIs could be implemented outside the interp package. I think
> func (r *Runner) SetBuiltin(name string, handler Builtin) (old Builtin)
> I'm starting to think that the current `ModuleExec` API is enough to do what you want here.
I don't think `ModuleExec` can override the existing builtins. (Though I'm unsure how useful this is outside of, maybe, `trap` and adding currently unimplemented ones.)
> Of course, that doesn't really scale if one has hundreds of overriding layers like these.
I was thinking more of something like this, which doesn't stack up multiple layers:
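```
type Runner struct {
	// [...]
	builtins map[string]Builtin
	// [...]
}

// SetBuiltin installs handler under name and returns the previous handler, if any.
func (r *Runner) SetBuiltin(name string, handler Builtin) (old Builtin) {
	// Or just expose the builtins map.
	if r.builtins == nil {
		r.builtins = make(map[string]Builtin)
	}
	old = r.builtins[name]
	r.builtins[name] = handler
	return old
}
```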
and adjusting `isBuiltin`/`builtinCode` to use the map.
Ah, I had forgotten that builtins were hard-coded into the interpreter. At first I wanted to leave that to But perhaps we could layer them too. That is, declare Then it would be possible to bypass the entirety of the builtins - one just has to do This seems like it exposes less API, and by folding builtins into
I'd like to have more time to help this move forward, but as for everyone, time is limited. Anyway, bringing some discussion to this may be helpful. Someone in the Task repo mentioned that https://github.com/u-root/u-root has many builtins implemented in Go. This folder contains the As I said before, having just the basic builtins like @mvdan What would you expect of someone trying to move this forward? Would it be acceptable to import this project and use it directly on
I had seen the u-root coreutils, but I did not know they were exposed as non-main packages too. That sounds wonderful. Their go.mod is several times larger than ours, though, so I'm uneasy about adding a direct dependency. Even with Go 1.17's trimmed module graphs, forcing anyone that imports the interpreter to depend on u-root and its dependencies feels like overkill. That said, I wouldn't oppose a subpackage, like
I started to look into u-root as a solution, as described above. See u-root/u-root#2527. I'll wait another week or so to see if they get back to me, as it would be best to coordinate the needed changes with them.
Unfortunately, layering like that with the existing ExecHandler API won't work, because the shell needs to know the set of builtins for commands like
I'm actually not sure why we were talking about builtins above. We can support providing coreutils implemented in Go without treating them as builtins, via Anyway, I made some progress on this over the last couple of weeks. Thanks to @JohnHardy for nudging me along :) I made You can see our side of the changes at https://github.com/mvdan/sh/tree/93-coreutils. The two changes are:
I'll probably add support for similar chaining in the other handlers, because "run some logic in some cases, fall back to the default logic in others" is a rather common need.
I forgot to mention: I am looking for feedback on this design, as well as the added dependency on u-root, before I spend more time modifying u-root to export more of their coreutils. Note that the added dependency is somewhat large, but it's only pulled in if one explicitly imports the new
cc @andreynering for go-task, as well as @zimbatm @riacataquian @ebfe @theclapp
Err, worth flagging that the u-root Go module currently depends on our module: https://github.com/u-root/u-root/blob/904692535c70f103396524ae535a2e7bc89cb75a/go.mod#L44 If we upstream our changes in the future, then we'd have a cyclic module dependency. Allowed, but still icky. I might have to make
Yet another fun quirk: u-root appear to not support Windows at all: https://github.com/mvdan/sh/actions/runs/3708802693/jobs/6286748332 That's probably fine for the project in general, but it would be nice if their coreutils were somewhat portable. That could mean more work upstream, if they'd allow it. It's not great news, given that I think the main driver for this feature request is Windows portability.
Thanks @mvdan for taking the time to work on this! I have the impression that bringing support to Windows should be easy, at least with regard to fixing that particular error. It looks to me like just a matter of declaring a
Yeah, this particular one is easy to fix. The tricky part will be whether upstream cares about testing and supporting these coreutils on Windows. If they do not, it might fall on us to do that, and they might break the coreutils at any point due to lack of CI for Windows.
The tests and examples were already using a form of middleware. For example, `ExampleExecHandler` would handle some specific cases, and fall back to `DefaultExecHandler`. However, this fallback was hard-coded to `DefaultExecHandler`, so the function wasn't a reusable middleware. Instead, borrow the design of middlewares from go-chi: `func (mx *Mux) Use(middlewares ...func(http.Handler) http.Handler)` In such an API, each middleware is a function which takes "next", the next handler, and returns its own handler. This way, each middleware can choose whether to handle all calls, or just some of them, while passing on the rest to "next". This makes our API more flexible and our tests less awkward. Most importantly, it enables #93, as a coreutils ExecHandler by design will only be able to handle some coreutils commands and nothing else. For #93.
This is now #964. I wanted to get it reviewed and merged before I send the first PR for coreutils.
The Posted another comment at u-root/u-root#2527 (comment) to hopefully start the upstreaming process. They seem open to it.
Also, I'm pretty happy with the coreutils API in this module. You can see it at master...93-coreutils. It's currently one sub-package in the same module, but it might end up being a separate module (see my thoughts in u-root/u-root#2527 (comment)). In any case, @andreynering, I'd like to hear your thoughts as a future API user.
The code looks great to me. If it helps prevent any problems, I see no issue with making it a separate module. In that case, it's important to add some documentation with a link in the README, so people know it exists.
A separate package or module is indeed a bit harder to find, so I'd add a link in the interp package godoc. I don't think I want to add more content to the README specific to just one or two packages. The README is large enough as it is :)
Any news on this? Thanks 🙏
coreutils as of 8.27 has a total of 103 executables, ranging from `wc` to `test` and `sort`. Plenty of scripts depend on these. If we really want `interp` to be platform-independent, we will have to implement some of these as builtins, as they won't be available on some systems like Windows (and Mac?).
Bash 4.4 already has a few as builtins:
We already have all of these except for `[` and `test` - see #92.
I obviously won't implement all 103 in one go, but I could start with the most common and simple ones. Unfortunately, these being GNU programs, they all have tons of options and gotchas, like `cat` having a dozen options.
One open question is whether these should always be builtins like `echo`. Other options include:
For the sake of simplicity, I'm inclined towards them always being builtins. The only downside is scripts that depend on somewhat obscure GNU flags. But these scripts wouldn't be portable to begin with.