gh-99442: Fix handling in py.exe launcher when argv[0] does not include a file extension #99542

Merged 2 commits on Nov 18, 2022
11 changes: 10 additions & 1 deletion Lib/test/test_launcher.py
@@ -173,7 +173,7 @@ def find_py(cls):
                     errors="ignore",
                 ) as p:
                     p.stdin.close()
-                    version = next(p.stdout).splitlines()[0].rpartition(" ")[2]
+                    version = next(p.stdout, "\n").splitlines()[0].rpartition(" ")[2]
                     p.stdout.read()
                     p.wait(10)
                 if not sys.version.startswith(version):
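
The changed line relies on the two-argument form of `next`, which returns the default instead of raising `StopIteration` when the iterator is already exhausted. A minimal sketch of that behaviour follows, using `io.StringIO` as a stand-in for `p.stdout`; the helper name `first_line_version` and the sample banner text are made up for illustration and are not part of the test suite.

```python
import io


def first_line_version(stream):
    """Return the last space-separated word of the first line of `stream`.

    `stream` stands in for `p.stdout`. An empty stream falls back to "\n",
    which splits to [""] and therefore yields an empty string.
    """
    return next(stream, "\n").splitlines()[0].rpartition(" ")[2]


# A made-up banner line (not real launcher output) and an empty stream.
print(repr(first_line_version(io.StringIO("Sample Launcher Version 3.12.0\n"))))
print(repr(first_line_version(io.StringIO(""))))
```

Without the default, the second call would raise `StopIteration` before `splitlines()` was ever reached.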
@@ -467,6 +467,15 @@ def test_py3_default_env(self):
         self.assertEqual("3.100-arm64", data["SearchInfo.tag"])
         self.assertEqual("X.Y-arm64.exe -X fake_arg_for_test -arg", data["stdout"].strip())

+    def test_py_default_short_argv0(self):
+        with self.py_ini(TEST_PY_COMMANDS):
+            for argv0 in ['"py.exe"', 'py.exe', '"py"', 'py']:
+                with self.subTest(argv0):
+                    data = self.run_py(["--version"], argv=f'{argv0} --version')
+                    self.assertEqual("PythonTestSuite", data["SearchInfo.company"])
+                    self.assertEqual("3.100", data["SearchInfo.tag"])
+                    self.assertEqual(f'X.Y.exe --version', data["stdout"].strip())
+
     def test_py_default_in_list(self):
         data = self.run_py(["-0"], env=TEST_PY_ENV)
         default = None
@@ -0,0 +1,2 @@
+Fix handling in :ref:`launcher` when ``argv[0]`` does not include a file
+extension.
82 changes: 32 additions & 50 deletions PC/launcher2.c
@@ -491,62 +491,39 @@ dumpSearchInfo(SearchInfo *search)


 int
-findArgumentLength(const wchar_t *buffer, int bufferLength)
+findArgv0Length(const wchar_t *buffer, int bufferLength)
 {
-    if (bufferLength < 0) {
-        bufferLength = (int)wcsnlen_s(buffer, MAXLEN);
-    }
-    if (bufferLength == 0) {
-        return 0;
-    }
-    const wchar_t *end;
-    int i;
-
-    if (buffer[0] != L'"') {
-        end = wcschr(buffer, L' ');
-        if (!end) {
-            return bufferLength;
-        }
-        i = (int)(end - buffer);
-        return i < bufferLength ? i : bufferLength;
-    }
-
-    i = 0;
-    while (i < bufferLength) {
-        end = wcschr(&buffer[i + 1], L'"');
-        if (!end) {
-            return bufferLength;
-        }
-
-        i = (int)(end - buffer);
-        if (i >= bufferLength) {
-            return bufferLength;
-        }
-
-        int j = i;
-        while (j > 1 && buffer[--j] == L'\\') {
-            if (j > 0 && buffer[--j] == L'\\') {
-                // Even number, so back up and keep counting
-            } else {
-                // Odd number, so it's escaped and we want to keep searching
-                continue;
+    // Note: this implements semantics that are only valid for argv0.
+    // Specifically, there is no escaping of quotes, and quotes within
+    // the argument have no effect. A quoted argv0 must start and end
+    // with a double quote character; otherwise, it ends at the first
+    // ' ' or '\t'.
Comment on lines +498 to +500

Technically speaking, any part of argv0 may be quoted. E.g. C:\"path to"\"py.exe" is parsed as C:\path to\py.exe by the crt. Being quoted just prevents tab or space from terminating the argument.
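
The CRT behaviour described in this comment can be modelled in a few lines. This is a simplified sketch of the argv[0] rule only (each double quote toggles a quoted state and is dropped; an unquoted space or tab ends the argument). The function name `crt_argv0` is hypothetical, and this is an illustration, not the C runtime's actual code.

```python
def crt_argv0(command_line: str) -> str:
    """Model of how the C runtime extracts argv[0] from a command line."""
    in_quotes = False
    chars = []
    for ch in command_line:
        if ch == '"':
            # Quotes may appear anywhere; they toggle quoting and are not kept.
            in_quotes = not in_quotes
            continue
        if ch in " \t" and not in_quotes:
            # An unquoted space or tab terminates the argument.
            break
        chars.append(ch)
    return "".join(chars)


# The example from the comment above, with an argument appended.
print(crt_argv0(r'C:\"path to"\"py.exe" --version'))
```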

Member Author

Please see the associated bug (which I just fixed, because I put the wrong ID in the title originally).

We're following CreateProcess rules here, because it's going to be passed to CreateProcess.

Contributor

@eryksun eryksun Nov 18, 2022

@ChrisDenton, The C runtime's more general parsing of argv[0] (i.e. multiple quoted parts terminated by space or tab) is only relevant if the executable path is passed explicitly to CreateProcessW() in lpApplicationName. If CreateProcessW() is called with lpApplicationName as NULL, as the launcher does, then the API parses the command to run from lpCommandLine. If the command line begins with a double quote, the API consumes all characters up to the next double quote or the end of the command line. If the command line doesn't begin with a double quote, the API tokenizes on space and tab characters, repeatedly searching until it finds a non-directory file.

For example, if the command line is r'C:\Program Files\Python311\python.exe', the API will execute "C:\Program.exe" if it exists. If the command line is r'"C:\Program Files\Python311\python.exe"spam', the API will execute "C:\Program Files\Python311\python.exe" if it exists. Note that there's an inconsistency with the C runtime's quoting rules. The CRT always splits arguments on an unquoted space or tab. For example:

>>> script = 'import sys; print(sys.orig_argv)'
>>> subprocess.call(fr'"{sys.executable}"spam -c "{script}"')
['C:\\Program Files\\Python311\\python.exespam', '-c', 'import sys; print(sys.orig_argv)']
0

The API executed "python.exe", not "python.exespam".
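
The `CreateProcessW()` rules described above (with `lpApplicationName` as NULL) can be sketched as a small model. It only lists the names the API would try, in order, based on the description in this comment. It does not touch the file system, and it leaves out details such as appending `.exe` or searching directories. The function name `createprocess_candidates` is hypothetical.

```python
def createprocess_candidates(command_line: str) -> list[str]:
    """List the executable names a CreateProcess-style parse would consider."""
    if command_line.startswith('"'):
        # Quoted: everything up to the next double quote (or the end of
        # the command line) is the one and only candidate.
        closing = command_line.find('"', 1)
        end = closing if closing != -1 else len(command_line)
        return [command_line[1:end]]

    # Unquoted: every space/tab boundary yields a progressively longer
    # candidate; the real API stops at the first one that is a file.
    candidates = []
    for i, ch in enumerate(command_line):
        if ch in " \t" and i > 0 and command_line[i - 1] not in " \t":
            candidates.append(command_line[:i])
    if command_line and command_line[-1] not in " \t":
        candidates.append(command_line)
    return candidates


print(createprocess_candidates(r'C:\Program Files\Python311\python.exe'))
print(createprocess_candidates(r'"C:\Program Files\Python311\python.exe"spam'))
```

In the quoted case the model returns the path between the quotes, whereas a CRT-style parse of the same command line would also absorb the trailing `spam` into argv[0]. That is the inconsistency the comment points out.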

Yes, that inconsistency is what worried me. As far as I understand it, no normalization is being performed in the python code so the inconsistency remains? So the CRT and CreateProcessW will be understanding the arguments differently?

+    int quoted = buffer[0] == L'"';
+    for (int i = 1; bufferLength < 0 || i < bufferLength; ++i) {
+        switch (buffer[i]) {
+        case L'\0':
+            return i;
+        case L' ':
+        case L'\t':
+            if (!quoted) {
+                return i;
             }
-        }
-
-        // Non-escaped quote with space after it - end of the argument!
-        if (i + 1 >= bufferLength || isspace(buffer[i + 1])) {
-            return i + 1;
+            break;
+        case L'"':
+            if (quoted) {
+                return i + 1;
+            }
+            break;
         }
     }
-
     return bufferLength;
 }


 const wchar_t *
-findArgumentEnd(const wchar_t *buffer, int bufferLength)
+findArgv0End(const wchar_t *buffer, int bufferLength)
 {
-    return &buffer[findArgumentLength(buffer, bufferLength)];
+    return &buffer[findArgv0Length(buffer, bufferLength)];
 }
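
For readers who want to experiment with the new rule outside C, here is a rough Python transcription of `findArgv0Length` as it appears in this diff. It is a sketch, not the launcher's code: the name `find_argv0_length` is hypothetical, and because Python strings carry their own length, the `bufferLength < 0` and `L'\0'` cases collapse into the loop bound.

```python
def find_argv0_length(buffer: str) -> int:
    """Return the length of argv0 at the start of `buffer`.

    No quote escaping is modelled. A quoted argv0 runs from the opening
    quote to the next quote; an unquoted one ends at the first space or tab.
    """
    quoted = buffer.startswith('"')
    for i in range(1, len(buffer)):
        ch = buffer[i]
        if ch in " \t":
            if not quoted:
                return i
        elif ch == '"':
            if quoted:
                return i + 1  # include the closing quote
    return len(buffer)


# The four argv0 spellings exercised by test_py_default_short_argv0.
for command_line in ['"py.exe" --version', 'py.exe --version',
                     '"py" --version', 'py --version']:
    n = find_argv0_length(command_line)
    print(repr(command_line[:n]), repr(command_line[n:]))
```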


@@ -562,11 +539,16 @@ parseCommandLine(SearchInfo *search)
         return RC_NO_COMMANDLINE;
     }

-    const wchar_t *tail = findArgumentEnd(search->originalCmdLine, -1);
-    const wchar_t *end = tail;
-    search->restOfCmdLine = tail;
+    const wchar_t *argv0End = findArgv0End(search->originalCmdLine, -1);
+    const wchar_t *tail = argv0End; // will be start of the executable name
+    const wchar_t *end = argv0End; // will be end of the executable name
+    search->restOfCmdLine = argv0End; // will be first space after argv0
     while (--tail != search->originalCmdLine) {
-        if (*tail == L'.' && end == search->restOfCmdLine) {
+        if (*tail == L'"' && end == argv0End) {
+            // Move the "end" up to the quote, so we also allow moving for
+            // a period later on.
+            end = argv0End = tail;
+        } else if (*tail == L'.' && end == argv0End) {
             end = tail;
         } else if (*tail == L'\\' || *tail == L'/') {
             ++tail;
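
The backwards scan in this hunk can be modelled in Python to see how the closing quote and the extension are both dropped from the executable name. This is a sketch under stated assumptions: the hunk is cut off after `++tail;`, so the `break` after the path separator and the step that skips a leading quote are guesses about the hidden code, not something shown in the diff. The function name `executable_stem` is hypothetical, and it takes an argv0 that has already been isolated.

```python
def executable_stem(argv0: str) -> str:
    """Return the bare executable name from an already-isolated argv0."""
    if not argv0:
        return ""
    argv0_end = len(argv0)  # one past the last character of argv0
    end = argv0_end         # will become the end of the executable name
    tail = argv0_end - 1    # will become the start of the executable name
    while tail > 0:         # mirrors: while (--tail != originalCmdLine)
        ch = argv0[tail]
        if ch == '"' and end == argv0_end:
            # Closing quote: move both markers back so that a period
            # found later can still be treated as the extension.
            end = argv0_end = tail
        elif ch == "." and end == argv0_end:
            end = tail      # drop the extension
        elif ch in "\\/":
            tail += 1       # the name starts just after the separator
            break           # ASSUMPTION: not visible in the truncated hunk
        tail -= 1
    if tail == 0 and argv0[0] == '"':
        tail = 1            # ASSUMPTION: skip a leading quote if no separator was found
    return argv0[tail:end]


for argv0 in ['"py.exe"', 'py.exe', '"py"', 'py']:
    print(repr(argv0), "->", repr(executable_stem(argv0)))
```

Under these assumptions all four spellings from the new test reduce to the same name, which matches the test's expectation that each of them behaves identically.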