-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor/rename shell_escape_winsomely() and add escaping function for CMD.EXE syntax #38352
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -252,60 +252,103 @@ shell_escape_posixly(args::AbstractString...) = | |
sprint(print_shell_escaped_posixly, args...) | ||
|
||
|
||
function print_shell_escaped_winsomely(io::IO, args::AbstractString...) | ||
first = true | ||
for arg in args | ||
first || write(io, ' ') | ||
first = false | ||
# Quote any arg that contains a whitespace (' ' or '\t') or a double quote mark '"'. | ||
# It's also valid to quote an arg with just a whitespace, | ||
# but the following may be 'safer', and both implementations are valid anyways. | ||
quotes = any(c -> c in (' ', '\t', '"'), arg) || isempty(arg) | ||
quotes && write(io, '"') | ||
backslashes = 0 | ||
for c in arg | ||
if c == '\\' | ||
backslashes += 1 | ||
""" | ||
shell_escape_wincmd(s::AbstractString) | ||
shell_escape_wincmd(io::IO, s::AbstractString) | ||
|
||
The unexported `shell_escape_wincmd` function escapes Windows | ||
`cmd.exe` shell meta characters. It escapes `()!^<>&|` by placing a | ||
`^` in front. An `@` is only escaped at the start of the string. Pairs | ||
of `"` characters and the strings they enclose are passed through | ||
unescaped. Any remaining `"` is escaped with `^` to ensure that the | ||
number of unescaped `"` characters in the result remains even. | ||
|
||
Since `cmd.exe` substitutes variable references (like `%USER%`) | ||
_before_ processing the escape characters `^` and `"`, this function | ||
makes no attempt to escape the percent sign (`%`). | ||
|
||
Input strings with ASCII control characters that cannot be escaped | ||
(NUL, CR, LF) will cause an `ArgumentError` exception. | ||
|
||
With an I/O stream parameter `io`, the result will be written there, | ||
rather than returned as a string. | ||
|
||
See also: [`escape_microsoft_c_args`](@ref), [`shell_escape_posixly`](@ref) | ||
|
||
# Example | ||
```jldoctest | ||
julia> Base.shell_escape_wincmd("a^\\"^o\\"^u\\"") | ||
"a^^\\"^o\\"^^u^\\"" | ||
``` | ||
""" | ||
function shell_escape_wincmd(io::IO, s::AbstractString) | ||
# https://stackoverflow.com/a/4095133/1990689 | ||
occursin(r"[\r\n\0]", s) && | ||
throw(ArgumentError("control character unsupported by CMD.EXE")) | ||
i = 1 | ||
len = ncodeunits(s) | ||
if len > 0 && s[1] == '@' | ||
mgkuhn marked this conversation as resolved.
Show resolved
Hide resolved
|
||
write(io, '^') | ||
end | ||
while i <= len | ||
c = s[i] | ||
if c == '"' && (j = findnext('"', s, nextind(s,i))) !== nothing | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't fully follow https://stackoverflow.com/questions/4094699/how-does-the-windows-command-interpreter-cmd-exe-parse-scripts/4095133#4095133 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The purpose of The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the clarification. Perhaps we should add an additional warning or note to the docstring to something of this effect. In other words, adding to the docstring that this method makes no additional attempts at escaping other control-structure syntax that may require a variety of other, context-dependent escaping rules. Ultimately, I think we still want to use this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nothing after phase 2 on the linked page is related to command line parsing, and there's no escape character to avoid the problems that may get introduced in phase 1. After phase 2 it gets to command line interpretation, and each interpreter then may have their own rules that would be run prior to this (most commonly, The parsing rules are not complex at all, and fit quite compactly in this function. The rules are unusual and a bit awkward (because after the delimiter tokens are parsed, they may be left in the stream). But there's no reason to fear them.
No escaping is quite entirely possible on these to be safe, since
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. By contrast, PowerShell's rules are terrifyingly complex to me, ill-defined, and have numerous known bugs (PowerShell/PowerShell#1995, PowerShell/PowerShell#3049, PowerShell/PowerShell-RFC#90). There may be a simple way through that mess, but given that those issues have been open for a number of years, I'm not particularly hopeful. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @musm I prefer the current approach, where the docstring simply describes the algorithm implemented, without making any promises of what use cases it is or isn’t suitable for:
It is the responsibility of the user to understand The question is: do you want this version of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that document confused me more than helped to be honest, since it's describing what cmd does after receiving the stream. In any case, we aren't trying to parse the command line, but to escape characters to they get parsed correctly. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes let's keep it in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The purpose of documentation is not to be a design document or describe the implementation of an algorithm. The purpose is to describe the usage of it and what it promises. If this promise nothing, than it would not likely be a candidate for merging into the standard library. Fortunately, it can promise useful properties, so we do want to merge it. |
||
write(io, SubString(s,i,j)) | ||
i = j | ||
else | ||
if c in ('"', '(', ')', '!', '^', '<', '>', '&', '|') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Aren't we missing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's fine. I was looking https://ss64.com/nt/syntax-esc.html (which granted may not be 100% correct)
and then down the page
Which kind of worried me since, we do not uniformly "quote strings", only if they contain spaces (see https://github.com/JuliaLang/julia/pull/38352/files#diff-92d17ed4c3ab24cca65d43dd11a040f4f8e96378384fa143e2d418c5b07656d3R341) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Information online about cmd is hilariously, and almost notoriously bad, imo. That page contradict their own list in their example that follows immediately afterwards, in which they show that
|
||
write(io, '^', c) | ||
else | ||
# escape all backslashes and the following double quote | ||
c == '"' && (backslashes = backslashes * 2 + 1) | ||
for j = 1:backslashes | ||
# backslashes aren't special here | ||
write(io, '\\') | ||
end | ||
backslashes = 0 | ||
write(io, c) | ||
end | ||
end | ||
# escape all backslashes, letting the terminating double quote we add below to then be interpreted as a special char | ||
quotes && (backslashes *= 2) | ||
for j = 1:backslashes | ||
write(io, '\\') | ||
end | ||
quotes && write(io, '"') | ||
i = nextind(s,i) | ||
end | ||
return nothing | ||
end | ||
|
||
shell_escape_wincmd(s::AbstractString) = sprint(shell_escape_wincmd, s; | ||
sizehint = 2*sizeof(s)) | ||
|
||
""" | ||
shell_escaped_winsomely(args::Union{Cmd,AbstractString...})::String | ||
|
||
Convert the collection of strings `args` into single string suitable for passing as the argument | ||
string for a Windows command line. Windows passes the entire command line as a single string to | ||
the application (unlike POSIX systems, where the list of arguments are passed separately). | ||
Many Windows API applications (including julia.exe), use the conventions of the [Microsoft C | ||
runtime](https://docs.microsoft.com/en-us/cpp/c-language/parsing-c-command-line-arguments) to | ||
split that command line into a list of strings. This function implements the inverse of such a | ||
C runtime command-line parser. It joins command-line arguments to be passed to a Windows console | ||
application into a command line, escaping or quoting meta characters such as space, | ||
double quotes and backslash where needed. This may be useful in concert with the `windows_verbatim` | ||
flag to [`Cmd`](@ref) when constructing process pipelines. | ||
escape_microsoft_c_args(args::Union{Cmd,AbstractString...}) | ||
escape_microsoft_c_args(io::IO, args::Union{Cmd,AbstractString...}) | ||
|
||
# Example | ||
```jldoctest | ||
julia> println(shell_escaped_winsomely("A B\\", "C")) | ||
"A B\\" C | ||
Convert a collection of string arguments into a string that can be | ||
passed to many Windows command-line applications. | ||
|
||
Microsoft Windows passes the entire command line as a single string to | ||
the application (unlike POSIX systems, where the shell splits the | ||
command line into a list of arguments). Many Windows API applications | ||
(including julia.exe), use the conventions of the [Microsoft C/C++ | ||
runtime](https://docs.microsoft.com/en-us/cpp/c-language/parsing-c-command-line-arguments) | ||
to split that command line into a list of strings. | ||
|
||
This function implements an inverse for a parser compatible with these rules. | ||
It joins command-line arguments to be passed to a Windows | ||
C/C++/Julia application into a command line, escaping or quoting the | ||
meta characters space, TAB, double quote and backslash where needed. | ||
|
||
See also: [`shell_escape_wincmd`](@ref), [`escape_raw_string`](@ref) | ||
""" | ||
shell_escape_winsomely(args::AbstractString...) = | ||
sprint(print_shell_escaped_winsomely, args..., sizehint=(sum(length, args)) + 3*length(args)) | ||
function escape_microsoft_c_args(io::IO, args::AbstractString...) | ||
# http://daviddeley.com/autohotkey/parameters/parameters.htm#WINCRULES | ||
first = true | ||
for arg in args | ||
if first | ||
first = false | ||
else | ||
write(io, ' ') # separator | ||
end | ||
if isempty(arg) || occursin(r"[ \t\"]", arg) | ||
# Julia raw strings happen to use the same escaping convention | ||
# as the argv[] parser in Microsoft's C runtime library. | ||
write(io, '"') | ||
escape_raw_string(io, arg) | ||
musm marked this conversation as resolved.
Show resolved
Hide resolved
|
||
write(io, '"') | ||
else | ||
write(io, arg) | ||
end | ||
end | ||
end | ||
escape_microsoft_c_args(args::AbstractString...) = | ||
sprint(escape_microsoft_c_args, args...; | ||
sizehint = (sum(sizeof.(args)) + 3*length(args))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some additional documentation for users that would like to understand further the guarantees, usage, and gotchas of this function:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I have not myself done the research needed to convince myself that this is all 100% true and accurate, may I suggest you make this extended documentation a separate follow-up PR under your name? This way, I don't end up with any
git blame
for security advice that I haven't personally written or verified.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these are good points to bring up and help understand what's going on, but I'm not sure that the
!important
note is relevant to theshell_escape_wincmd
method? I think this would be good to add somewhere in the docs instead.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're welcome to make it a separate commit or add Co-authored-by to this commit.
This is adding it to the docs. Where else, other than the documentation for
shell_escape_wincmd
, would you suggest putting clarifications on the correct usage ofshell_escape_wincmd
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mgkuhn Do you feel cmofrtable with @vtjnash 's suggestion to add this as a commit with a co-authored?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we squash and merge or do a merge commit if that is the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually prefer "rebase and merge" on github if a PR is tidy, but don't know what's most customary here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer a separate PR for the extended documentation. That would also give me time to run some tests on the claims first before I can provide meaningful feedback. (All my tests so far have been in the ssh scenario, i.e. preparing command-line invocations of programs such as julia.exe with arguments, environment variables, and pushd, as they would be typed into the console keyboard. I have not yet studied what happens or needs to happen with metacharacters when invoking cmd.exe with run, or what escaping mechanisms libuv already offers.)