Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REPL shell mode for Windows #33474

Closed
wants to merge 4 commits into from
Closed

REPL shell mode for Windows #33474

wants to merge 4 commits into from

Conversation

musm
Copy link
Contributor

@musm musm commented Oct 4, 2019

Previous Attempts: #31490 #23703 #21294 #20776 #11478 #33427 #11478

Some of these approaches tried to implement special Windows rule in the shell_escape mechanism, I think some also tried to pass 'raw' strings to the shell, all with varying degrees of success.

Fixes #6316 #23597

References

Powershell:

CMD:

include("regex.jl")
include("shell.jl")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is needed since we use substitution string in the shell.jl file.

base/client.jl Outdated
@@ -31,8 +31,15 @@ stackframe_lineinfo_color() = repl_color("JULIA_STACKFRAME_LINEINFO_COLOR", :bol
stackframe_function_color() = repl_color("JULIA_STACKFRAME_FUNCTION_COLOR", :bold)

function repl_cmd(cmd, out)
shell = shell_split(get(ENV, "JULIA_SHELL", get(ENV, "SHELL", "/bin/sh")))
shell = get(ENV, "JULIA_SHELL", nothing)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we cache this value in a reference? So that we don't have to look it up each time the Julia REPL is entered?

command = Cmd(`$shell /c $(shell_escape_CMDly(shell_escape_winsomely(cmd)))`, windows_verbatim=true)
elseif shell_name in ("powershell", "pwsh")
command = Cmd(`$shell -Command $(shell_escape_PWSHly(shell_escape_winsomely(cmd)))`, windows_verbatim=true)
elseif shell_name == "busybox"
Copy link
Contributor Author

@musm musm Oct 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really necessary ? We don't even ship with busy box anymore...

I think this is from older julia code, so my suggestion is to just remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arg .. looks like it is needed for testing on windows... maybe we should look into WSL for testing on windows instead of downloading busybox?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still something about this rubs me the wrong way. We should still try to remove this and try to work around them in the tests. Thoughts welcome

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make this the else block

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly do you mean by "making this the else block"?
Do you mean :

           if shell_name == "cmd"
                command = _CMD_execute(shell, cmd)
            elseif shell_name in ("powershell", "pwsh")
                command = _powershell_execute(shell, cmd)
            else
                  command  =  `$shell sh -c $(shell_escape_posixly(cmd))`  

Is the assumption here that if you don't want cmd or powershell then you probably want it to default to use a linux-based shell running in a windows environment?

base/shell.jl Outdated
function print_shell_escaped_BATCHly(io::IO, arg::AbstractString)
# see https://www.robvanderwoude.com/variableexpansion.php
# the rule for ! is a bit esoteric, but doesn't hurt to add it
any(in("\r\n"), arg) && throw("Encountered unsupported character by CMD")
Copy link
Contributor Author

@musm musm Oct 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this an ArgumentError error? same for above

base/shell.jl Outdated

function print_shell_escaped_BATCHly(io::IO, arg::AbstractString)
# see https://www.robvanderwoude.com/variableexpansion.php
# the rule for ! is a bit esoteric, but doesn't hurt to add it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's just because folks are bad at it. It seems clearly part of the list to me. Some sources list ( and ), but—as explained in https://stackoverflow.com/questions/4094699/how-does-the-windows-command-interpreter-cmd-exe-parse-scripts—they are not needed. Nor is @, as is claimed by this otherwise fairly good source: http://daviddeley.com/autohotkey/parameters/parameters.htm#WINCMDRULES

base/shell.jl Outdated Show resolved Hide resolved
@musm musm force-pushed the shellmodewin branch 5 times, most recently from 5e5694a to 083cb4a Compare October 11, 2019 23:35
@ViralBShah ViralBShah added the system:windows Affects only Windows label Oct 12, 2019
@musm musm changed the title REPL shell mode for Windows REPL shell mode for Windows WIP Oct 12, 2019
@musm musm changed the title REPL shell mode for Windows WIP REPL shell mode for Windows Oct 14, 2019
base/shell.jl Outdated

function print_shell_escaped_PWSHly(io::IO, arg::AbstractString)
arg = replace(arg, r"[\$#;]" => s"`\0")
arg = replace(arg, r"[\"|\\\\(?=\\\\*(\"|$))]" => s"\\\0")
Copy link
Contributor Author

@musm musm Oct 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://stackoverflow.com/questions/44981177/run-powershell-custom-function-in-new-window/44986341#44986341

It seems like since we are launching the process from a start-process we need to additionally escaping " and \ using this regex, which is described in the excellent stackoverflow answer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That left me even more confused. It ends with but it’s not this simple, and the rules have changed over time.

One thing I’m fairly confident of is that this pwsh function needs us to add \r and \n to the winsomely whitespace list to pass through correctly here (no effect if it doesn’t go to pwsh, so might as well always do it there).

Copy link
Contributor Author

@musm musm Oct 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I’m fairly confident of is that this pwsh function needs us to add \r and \n to the winsomely whitespace list to pass through correctly here (no effect if it doesn’t go to pwsh, so might as well always do it there).

Why's that?

Copy link
Contributor Author

@musm musm Oct 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we needed powershell to work for this example

shell> gci C:\\Program\ Files

Note

julia> Base.shell_escape_winsomely(Base.shell_split("gci C:\\\\Program\\ Files")...) |>print
gci "C:\Program Files"

Running
gci "C:\Program Files" in a powershell environment works. But not in julia unless we
do the additional escaping so that powershell interprets the command as a single string.
arg = replace(arg, r"[\"|\\\\(?=\\\\*(\"|$))]" => s"\\\0")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vtjnash Please see

The string needs escaping, however, in order to be passed to the new Powershell process unmodified.
" instances embedded in the string are stripped, unless they are either represented as " or tripled (""").
(" rather than the usual `" is needed to escape double quotes in this case, because PowerShell expects " when passing a command to the powershell.exe executable.)

I feel confident that we now have a correct escaping solution to powershell and also for CMD

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure what your source was for gci, so I wrote a simple one to be able to inspect what reaches the target application. Save as gci.c, compile with cc gci.c, then run as a.exe.

#include <windows.h>
#include <stdio.h>
int main (int argc, char **argv) {
  printf("%s\n", GetCommandLineA());
  for (int i = 0; i < argc; i++) { printf("%d: %s\n", i, argv[i]); }
}

base/shell.jl Outdated
shell_escape_CMDly(arg::AbstractString) = sprint(print_shell_escaped_CMDly, arg)

function print_shell_escaped_PWSHly(io::IO, arg::AbstractString)
arg = replace(arg, r"[\$#;]" => s"`\0")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\$ prevents powershell variables, ; prevents executing two commands, # escapes comment character

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
arg = replace(arg, r"[\$#;]" => s"`\0")
arg = replace(arg, r"[`\"\$#;|><&(){}=]" => s"`\0")

(eg. see expanded list at https://www.red-gate.com/simple-talk/wp-content/uploads/imported/2264-PS%20Quoting%20Wallchart_1_0_1.pdf, although I haven't yet found the case where = is special, the rest are certainly good here)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, I think it additionally also needs "\r" => "`r", "\n" => "`n", "\t" => "`t"

@musm
Copy link
Contributor Author

musm commented Oct 14, 2019

There was a bug here when someone passed ENV["JULIA_SHELL"] = "", which is fixed here by also defaulting this case to either sh or cmd.

@musm
Copy link
Contributor Author

musm commented Oct 14, 2019

@vtjnash how does this PR look now?

@musm
Copy link
Contributor Author

musm commented Oct 15, 2019

Ok so I feel like I have finally converged to a solution here, and this should be ready for final review.

@musm
Copy link
Contributor Author

musm commented Oct 15, 2019

My only question to @vtjnash is why do you feel like we also need to quote \r and \n in the print_shell_escaped_winsomely function?

@musm
Copy link
Contributor Author

musm commented Oct 15, 2019

Should we default to CMD on Windows OR powershell with -noProfile ?

base/shell.jl Outdated
shell_escape_CMDly(arg::AbstractString) = sprint(print_shell_escaped_CMDly, arg)

function print_shell_escaped_PWSHly(io::IO, arg::AbstractString)
arg = replace(arg, r"[\$#;]" => s"`\0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
arg = replace(arg, r"[\$#;]" => s"`\0")
arg = replace(arg, r"[`\"\$#;|><&(){}=]" => s"`\0")

(eg. see expanded list at https://www.red-gate.com/simple-talk/wp-content/uploads/imported/2264-PS%20Quoting%20Wallchart_1_0_1.pdf, although I haven't yet found the case where = is special, the rest are certainly good here)

base/client.jl Outdated
command = Cmd(`$shell /c $(shell_escape_CMDly(shell_escape_winsomely(cmd)))`, windows_verbatim=true)
elseif shell_name in ("powershell", "pwsh")
# enclose in double quotes to enusre we prevent whitespace normalization
command = Cmd(`$shell -Command "$(shell_escape_PWSHly(shell_escape_winsomely(cmd)))"`, windows_verbatim=true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
command = Cmd(`$shell -Command "$(shell_escape_PWSHly(shell_escape_winsomely(cmd)))"`, windows_verbatim=true)
command = Cmd(`$shell -Command "& $(shell_escape_PWSHly(shell_escape_winsomely(cmd)))"`)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this is only correct so long as powershell is a no-op though. If something is getting intercepted by powershell, we need to use completely different rules. Something like:

Suggested change
command = Cmd(`$shell -Command "$(shell_escape_PWSHly(shell_escape_winsomely(cmd)))"`, windows_verbatim=true)
CommandType = read(`powershell -Command "Get-Command -- $(shell_escape_PWSH_cmdlet_ly(cmd.exec[1])) | Select-Object -ExpandProperty CommandType"`, String)
# TODO: while CommandType == "Alias"; CommandType = ...; end
if CommandType == "Application"
command = Cmd(`$shell -Command "& $(shell_escape_PWSHly(shell_escape_winsomely(cmd)))"`)
else # handle Function and Cmdlet # TODO: what is the proper handling for the other types (ExternalScript, Script, Workflow, Configuration, and Filter)
command = Cmd(`$shell -Command "& $(shell_escape_PWSH_cmdlet_ly(cmd))"`)
end

base/shell.jl Outdated
# Similarily, if the string as a whole or a double-quoted string inside the function body
# ends in a non-empty run of \, that \ would be interpreted
# as a an escape character so \ must be doubled
arg = replace(arg, r"\"|\\(?=\\*(\"|$))" => s"\\\0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is all mistaken and false. I think I understand where it's coming from, it just doesn't match the behavior I observe in my testing for this stage in the pipeline.

Suggested change
arg = replace(arg, r"\"|\\(?=\\*(\"|$))" => s"\\\0")


See also [`shell_escape_CMDly`](@ref).
"""
shell_escape_PWSHly(arg::AbstractString) = sprint(print_shell_escaped_PWSHly, arg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my suggested source for the PWSH_cmdlet_ly function I mention above:

Suggested change
shell_escape_PWSHly(arg::AbstractString) = sprint(print_shell_escaped_PWSHly, arg)
shell_escape_PWSHly(arg::AbstractString) = sprint(print_shell_escaped_PWSHly, arg)
function print_shell_escaped_PWSH_cmdlet_ly(io::IO, args::AbstractString...)
# often the shortest way to escape a powershell string is to double any single quotes and then wrap the whole thing in single quotes (alternatively, we could prefix all the non-word characters with a back-tick and replace newlines with \`r and \`n)
# but skip the escaping for common cases we always know are safe (e.g. so that named parameters are typically still interpreted correctly)
isword(c::AbstractChar) = '0' <= c <= '9' || 'a' <= c <= 'z' || 'A' <= c <= 'Z' || c == '_' || c == '\\' || c == ':' || c == '/' || c == '-'
join(io, (all(isword, arg) ? arg : "'" * replace(arg, "'" => "''") * "'" for arg in args), " ")
end
"""
shell_escape_PWSH_cmdlet_ly(args::AbstractString...) -> String
Escapes special characters so they can be appropriately used with a PowerShell cmdlet (such as `echo`).
See also [`shell_escape_PWSHly`](@ref) and [`shell_escape_winsomely`](@ref).
"""
print_shell_escaped_PWSH_cmdlet_ly(io::IO, args::AbstractString...)

base/shell.jl Outdated
shell_escape_CMDly(arg::AbstractString) = sprint(print_shell_escaped_CMDly, arg)

function print_shell_escaped_PWSHly(io::IO, arg::AbstractString)
arg = replace(arg, r"[\$#;]" => s"`\0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, I think it additionally also needs "\r" => "`r", "\n" => "`n", "\t" => "`t"

base/shell.jl Outdated

function print_shell_escaped_PWSHly(io::IO, arg::AbstractString)
arg = replace(arg, r"[\$#;]" => s"`\0")
arg = replace(arg, r"[\"|\\\\(?=\\\\*(\"|$))]" => s"\\\0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure what your source was for gci, so I wrote a simple one to be able to inspect what reaches the target application. Save as gci.c, compile with cc gci.c, then run as a.exe.

#include <windows.h>
#include <stdio.h>
int main (int argc, char **argv) {
  printf("%s\n", GetCommandLineA());
  for (int i = 0; i < argc; i++) { printf("%d: %s\n", i, argv[i]); }
}

@musm
Copy link
Contributor Author

musm commented Oct 21, 2019

We should probably also allow pwsh on Linux and MacOS since Powershell Core is cross-platform

@musm musm force-pushed the shellmodewin branch 4 times, most recently from 6b46b98 to 0dfcedd Compare October 21, 2019 19:25
@musm musm force-pushed the shellmodewin branch 2 times, most recently from 8f2b093 to 164f274 Compare October 21, 2019 19:29
Introduce REPL shell mode on Windows.

Co-authored-by:    Mustafa M. <[email protected]>
Co-authored-by:    Jameson Nash <[email protected]>
if shell_name == "fish"
shell_escape_cmd = "begin; $(shell_escape_posixly(cmd)); and true; end"
command = `$shell -c $shell_escape_cmd`
elseif shell_name == "pwsh"
command = _powershell_execute(shell, cmd)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this do the wrong escaping on Unix?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm not sure it's worth the effort to support pwsh on Unix. This is only used for interactive mode, after all.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Powershell's handling of arguments is known (PowerShell/PowerShell#1995) to be buggy, so can't reliably be used for launching programs anyways.

Copy link
Member

@stevengj stevengj Nov 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vtjnash, that's argument handling on Windows. Presumably on Unix systems pwsh just uses *argv, no? I guess some amount of escaping is still needed for passing arguments through to subprocesses, but possibly the same problem does not appear on Unix because there is no CommandLineToArgvW layer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, no (e.g. PowerShell/PowerShell#10440)

no CommandLineToArgvW layer

Sadly, it is currently present (although both broken and undocumented—and eventually supposed to be unnecessary)

@musm
Copy link
Contributor Author

musm commented Dec 10, 2019

should we try to get this in, in some form before feature freze?

@StefanKarpinski
Copy link
Member

It is ready for re-review by @vtjnash and/or @stevengj then?

Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove all of the powershell-related content? The rest I think seems ready to go. We should add the powershell portions in a pair of followup PRs (one to add the shell_escape_PWSHly layer and one to use it in the REPL)

command = Cmd(`$shell /c $(shell_escape_CMDly(shell_escape_winsomely(cmd)))`, windows_verbatim=true)
elseif shell_name in ("powershell", "pwsh")
command = Cmd(`$shell -Command $(shell_escape_PWSHly(shell_escape_winsomely(cmd)))`, windows_verbatim=true)
elseif shell_name == "busybox"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make this the else block

run(Cmd(Cmd(["cmd /c \"break | echo \$(shell_escape_CMDly(shell_escape_CMDly(to_print)))"]), windows_verbatim=true))
```
"""
shell_escape_CMDly(arg::AbstractString) = sprint(print_shell_escaped_CMDly, arg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to need tests. Also, can you split this aspect into a separate commit?

@musm
Copy link
Contributor Author

musm commented Dec 11, 2019

Can you remove all of the powershell-related content? The rest I think seems ready to go. We should add the powershell portions in a pair of followup PRs (one to add the shell_escape_PWSHly layer and one to use it in the REPL)

I think I'll just open a seperate PR with the suggestions

function print_shell_escaped_CMDly(io::IO, arg::AbstractString)
any(c -> c in ('\r', '\n'), arg) && throw(ArgumentError("Encountered unsupported character by CMD."))
# include " so to avoid toggling behavior of ^
arg = replace(arg, r"[%!^\"<>&|]" => s"^\0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
arg = replace(arg, r"[%!^\"<>&|]" => s"^\0")
arg = replace(arg, r"[%)!^\"<>&|]" => s"^\0")

Upon closer inspection of https://stackoverflow.com/questions/4094699/how-does-the-windows-command-interpreter-cmd-exe-parse-scripts/4095133#4095133, I realized that ) is a metacharacter even though ( is not. (or we can include both here for symmetry)

@musm
Copy link
Contributor Author

musm commented Jul 16, 2020

I'm looking back at this PR and I really dread to revisit this PR 😆😆.

PowerShell's handling of external arguments is so borked that the only reasonable solution is to use the iep script in PowerShell/PowerShell#1995 (comment) and for newer Powershell versions wait for PowerShell/PowerShell#13068 to be merged

@StefanKarpinski
Copy link
Member

Counter-proposal: do the same thing on Windows as we do everywhere else, i.e. just exec the cmd that's entered, but ship with busybox tools on Windows and put the path to the busybox tools in the PATH before evaluating shell mode commands. I feel like that will make many examples work on Windows in the same way they do on UNIX and doesn't require radically changing the way this works on only one platform.

Also: It would be great (and has been on my todo list for so long) to implement | and other POSIX shell features in Cmd syntax, in which case the shell mode would automatically get completely cross-platform support for a lot of shell-like features. Combine that with having the busybox tools on Windows and you might not even notice that you're not running on UNIX!

@stevengj
Copy link
Member

ship with busybox tools on Windows and put the path to the busybox tools in the PATH before evaluating shell mode commands.

Or just put the path to busybox tools in the PATH for all Cmd evaluations?

@musm
Copy link
Contributor Author

musm commented Apr 28, 2021

Closing as stale. I'll look at opening a new PR when I find a more satisfactory solution. Shipping busybox is reverting back to something that we removed and suboptimal (even though what we have currently is worse). Powershell folks have been trying to work around some issues with calling external commands. We updated some the cmd escaping rules, so I think we can update to that in another PR. We might have to limit the powershell option to newer powershell versions and perhaps use bash through WSL if that's available.

@musm musm closed this Apr 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
system:windows Affects only Windows
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cannot shell out to cmd's builtin commands
5 participants