Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct shell quoting in REPL #13195

Closed
wants to merge 2 commits into from
Closed

Conversation

eschnett
Copy link
Contributor

Fixes #10120.

@StefanKarpinski
Copy link
Member

Uh, wat?

@vtjnash
Copy link
Member

vtjnash commented Sep 18, 2015

Does this work for cases like > redirection and ~ expansion?

@eschnett eschnett changed the title Correct shell quoting WIP: Correct shell quoting Sep 18, 2015
@eschnett
Copy link
Contributor Author

@vtjnash ...no. Oops.

@StefanKarpinski
Copy link
Member

This function is used to display Cmd objects, so gutting it like this is not acceptable.

@eschnett eschnett changed the title WIP: Correct shell quoting Correct shell quoting Sep 18, 2015
@eschnett
Copy link
Contributor Author

Made shell quoting more comfortable so that Cmd objects look nice again.

Corrected an error in the shell parser where backslashes within double quotes would not be properly handled. Added a test for this.

Changed the way the REPL parses shell arguments (see the respective commit). Don't use Julia's shell parser, which is the wrong tool, since Julia's backtick syntax is quite different from shell syntax, and does not support output redirection. Simply split the command via split, and pass it to the shell.

@eschnett eschnett changed the title Correct shell quoting Correct shell quoting in REPL Sep 18, 2015
@eschnett
Copy link
Contributor Author

Here is the use case that is corrected by my quoting changes. It has to do with double quoting, i.e. with using a Cmd in a Cmd.

cmd=`echo (hello)`;
run(cmd)
ecmd=Base.shell_escape(cmd)
cmd2=`sh -c $ecmd`
run(ecmd)

The first run(cmd) properly outputs (hello). The second run(ecmd) is currently broken and outputs an error such as

bash: -c: line 0: syntax error near unexpected token `hello'
bash: -c: line 0: `echo (hello)'

With the quoting changes here, this properly outputs (hello) as well.

Julia's "shell parsing" (as used for backticks) excplicitly states that output redirection, pipes etc. are not supported via their shell metacharacters. Thus parsing a REPL shell command via the backtick mechanism makes it impossible to redirect output, unless shell quoting has loopholes and lets these metacharacters through (which leads to severe problems if one e.g. wants to quote an already-quoted string).

For example, Julia's shell parsing returns the same result for these two different commands:
   echo '>a'
   echo >a
In both cases, the command ["echo", ">a"] is generated, assuming that the character ">" is not special. It is then impossible to extract the original REPL user's intention from this command. Thus the REPL needs to parse shell commands via `split`, which suffices to determine whether the command is `cd`. Everything else is left to the shell.
@StefanKarpinski
Copy link
Member

Honestly, this seems to me to be a quixotic endeavor. Every shell has different quoting rules and significant characters. Julia's Cmd parsing is a minimal but common subset of this. If you're writing shell commands using ; that are much more complex than this, I'm not sure how we can really make it work. The only really feasible approach is probably to do something that avoids escaping altogether.

@eschnett
Copy link
Contributor Author

@StefanKarpinski The semantics of /bin/sh are well-defined, and most shells adhere to it. The escaping rules I've used (single quotes, double quotes, backslashes) should work for all sh-like shells. For example, the configure script produced by autotools use such rules, and they run everywhere. So it's certainly possible.

Also, the corrections (and the example above) are independent of the REPL. This is about the behaviour of Julia's backquotes to create Cmd objects. The REPL just happens to use Cmd object to execute shell commands, and happens to run afoul of this when one uses complex REPL commands.

I wouldn't call it "quixotic", as I'm claiming that it works now...

@tkelman
Copy link
Contributor

tkelman commented Sep 18, 2015

This code also gets used in non-posix shell code paths where the rules are quite different and likely under-tested. Jameson knows more and can judge the correctness there better than I can.

@StefanKarpinski
Copy link
Member

Cmd doesn't execute shell commands – there's no shell involved.

@vtjnash
Copy link
Member

vtjnash commented Sep 18, 2015

I think the second commit is on the right track, but if you would also need to provide a custom parser to find and handle $ interpolation. See the second paragraph at #10120 (comment)

FWIW, I still think we should stop trying to use the shell at all and just directly implement ~ expansion and > redirection into julia's Cmd syntax (ie. directly support sh command syntax, sans alias).

@eschnett
Copy link
Contributor Author

@StefanKarpinski I know that Cmd doesn't invoke a shell, but the REPL wants to call the shell and thus constructs $shell -c some-command.

It seems you're opposed to making shell_escape do actual escaping for a shell. I'll work around the REPL issue differently.

However, I want to add docstring to shell_escape stating that this routine should only be used to print text to humans, and not to pass anything to an actual shell. I think the latter point is important, as wrong quoting to a shell can easily used to attack applications, and Google searches point to this function for shell escaping in Julia. Quoting problems are also easily confusing for people.

(Actually, the first Google hit is https://groups.google.com/forum/#!msg/julia-users/vo-jupVWyhs/2mxhSHWmvgIJ, where you yourself suggest to use Base.shell_escape for quoting, in just the manner that doesn't currently work?)

Maybe we should rename the function from shell_escape to shell_show or show_cmd? The related function shell_parse should then maybe be called parse_cmd or backtick_parse?

Maybe we should introduce the function that I am trying to introduce as posix_shell_escape? On the other hand, that can also easily be an external package (unless the REPL requires it; the REPL uses /bin/sh and requires some of its syntax on non-Windows systems.)

@StefanKarpinski
Copy link
Member

I agree that we should add those features to backtick syntax (which wouldn't always produce single Cmd objects anymore), but that doesn't help this. The ; mode should behave as much like your normal shell as possible, including aliases, etc., which is never going to work by us reimplementing shell features.

@StefanKarpinski
Copy link
Member

It should probably be called cmd_escape instead of shell_escape, although the two are approximately the same. I have no problem with making this escape more thoroughly so that it works for sh but not if it breaks the usability of its primary use case: escaping Cmd objects to print them in backticks.

@eschnett
Copy link
Contributor Author

Regarding reimplementing shell features: I completely agree. Here is the description of my second commit -- I should have copied this into the pull request message:

Don't shell-parse REPL shell commands to make redirection etc. work

Julia's "shell parsing" (as used for backticks) excplicitly states that output redirection, pipes etc. are not supported via their shell metacharacters. Thus parsing a REPL shell command via the backtick mechanism makes it impossible to redirect output, unless shell quoting has loopholes and lets these metacharacters through (which leads to severe problems if one e.g. wants to quote an already-quoted string).

For example, Julia's shell parsing returns the same result for these two different commands:

   echo '>a'
   echo >a

In both cases, the command ["echo", ">a"] is generated, assuming that the character ">" is not special. It is then impossible to extract the original REPL user's intention from this command. Thus the REPL needs to parse shell commands via split, which suffices to determine whether the command is cd. Everything else is left to the shell.

@eschnett
Copy link
Contributor Author

@StefanKarpinski I don't think it breaks printing objects in backticks. I've added more ways to print -- in addition to escaping via single and double quotes, I also added escaping via backslashes. For example, hello world can be quoted as hello\ world. If you prefer to avoid this, then that's easy to do.

@StefanKarpinski
Copy link
Member

It would be totally fine to always escape characters that are special to sh even if we don't consider them special when parsing backtick syntax. So, for example, that would mean that if you input echo >a it would get printed as echo '>a' – that would probably be helpful since it make it clear that backticks are not treating > as special.

Related: we may want to start making unescaped >, <, >>, etc. warnings soon if we want to make them mean something later. Ditto with * and such.

@eschnett
Copy link
Contributor Author

Yes, that's what the new escaping algorithm does. It has a white list of "good characters", and everything else is quoted. It tries quoting in three different ways, and chooses the shortest result.

@kshyatt kshyatt added the REPL Julia's REPL (Read Eval Print Loop) label Sep 18, 2015
if isempty(cmd.exec)
throw(ArgumentError("no cmd to execute"))
elseif cmd.exec[1] == "cd"
cmds = split(line)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this handle paths with spaces, e.g.

shell> cd f\ g
shell> cd 'f g'
shell> cd "f g"

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that part seems pretty broken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, this needs to use Base.shell_split instead.

@StefanKarpinski
Copy link
Member

I don't particularly care for the backslashing. Except for pathological cases, it can only ever be one character shorter and it's always less clear, imo. Otherwise this seems to be on the right track.

@StefanKarpinski
Copy link
Member

@eschnett, how about breaking this PR into two parts:

  1. making shell_escape escape all characters that sh considers special.
  2. changing how ; parsing works.

The first part is pretty uncontroversial and should be easy to merge. The second part needs more work.

@ScottPJones
Copy link
Contributor

I agree with @StefanKarpinski here about the backslashing, everything else fine.

# terminal.
parline = "($line)"
ttyflag = isa(STDIN, TTY) ? "-i" : ""
unixcmd = `$shell $ttyflag -c $parline`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When stdin isn't a TTY, you get $shell '' -c ..., which will probably fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it will. Got confused by string vs. command interpolation.

@eschnett
Copy link
Contributor Author

For backslashes, a good example will be escaping hello'world. You can escape this either as "hello'world" or as hello\'world. I found that using backslashes for quoting leads to particularly more readable results if words get quoted multiple times (e.g. if passed to a shell that calls ssh on a remote system where things are evaluated by a shell again). Otherwise I'm not too fond of it either, especially when used to quote spaces.

I'll prepare two separate pull requests.

@StefanKarpinski
Copy link
Member

Maybe require the backslashing approach to shorter by two characters to win?

@eschnett
Copy link
Contributor Author

I already went for 10 characters (instead of two).

@eschnett
Copy link
Contributor Author

Superseded by PRs #13214 and #13215.

@eschnett eschnett closed this Sep 18, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
REPL Julia's REPL (Read Eval Print Loop)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants