Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving the standard library implementation on Windows #17

Closed
ChrisDenton opened this issue Jun 20, 2021 · 3 comments
Closed

Improving the standard library implementation on Windows #17

ChrisDenton opened this issue Jun 20, 2021 · 3 comments
Labels
mcp A Major Change Proposal (lightweight RFC) T-libs-api

Comments

@ChrisDenton
Copy link
Member

Recently I've been looking at ways I can improve the standard library implementation on Windows.

There are many issues which have been open for a long time which I believe I can help to fix but I was worried about the disruption doing so may cause. @m-ou-se suggested I create an issue here documenting my current plans in case there were anything the libs team felt needed discussing. Or at least to give everyone a heads up.


A big chunk of the open Windows issues fall into one of two categories:

  • std::env and std::process. Specifically Command but also argument parsing (and #85270 fixes an issue with environment variables). I recently made two other PRs (#86406 and #86467) to address a couple of smaller issues with Command.
  • File paths. This is often an issue because Rust's understanding of Windows paths is limited. Which, to be fair, is quite an involved topic.

There are other issues particularly within std::fs (e.g. #81357 and #29497) and I've suggested a workaround (#86447) for a driver problem that is the root cause of a number of issues.

But anyway, back to the two categories mentioned above.

std::{env, process}

Argument parsing

In Windows, arguments are passed to a program as a string. This string is commonly parsed by the C runtime (CRT) in order to recover the arguments. In 2008 Microsoft subtly changed the way the CRT parses arguments. However, Rust still uses the old method. The change is small, it's centred around how consecutive double-quotes (e.g. "") are handled when encountered in a quoted block. See #44650. Fixing this should be fairly easy but is it technically a breaking a change?


Command uses the CRT rules to build an argument string to pass to an application. However, not every application parses commands according to these rules. Currently there's no good way to bypass Rust's handling. #85832 (by kornelski) seeks to address this. It might also be good to be able to override the zeroth argument, which by convention is set to the name of the binary being run but it needn't be. But I think both of these would be purely additions to the library and not change existing behaviour.

Command executable search

The way Command finds executables is weird. By default it uses the CreateProcessW search sequence. However if you use Commad::env at all then it first attempts to emulate Linux's behaviour, which is rather surprising. See #37519 and #50870. The specific implementation of that could be improved but I question if it's a good idea in the first place.

Also I'm not sure that the default search sequence is good. On internals, a CVE was discussed about Command looking for executables in the current directory. The problem is that CreateProcessW itself is, as with most of the Windows API, concerned with backwards compatibility rather than doing the "best" thing out of the box. Fortunately it provides for using a path to an executable instead of relying on its search.

It was brought up in that thread that the way of resolving the executable name could be made configurable, which is a good idea. However, I still think changing the defaults might be good. But it might be disruptive.


There are also a number of other issues, mostly around child processes inheriting handles from the parent (#70719, #67616, #54760 and #38227. But I have not looked too deeply into these yet.

File paths

Most of the open issues here centre around the legacy MAX_PATH limits and verbatim paths (aka \\?\ prefixed paths) which allow bypassing the limit.

  • #77576: rust-lang/rust CI build failed when generating test file paths exceeding about 260 chars on Windows
  • #76586: fs::write fails on long file name in deep directory path on windows
  • #67403: fs::copy on Windows has a 260 character filename limit
  • #32689: Automatically insert \\?\ where needed on Windows
  • #31789: Windows paths should not be prefixed with \\?\ when displayed
  • #42869: std::fs::canonicalize returns UNC paths on Windows, and a lot of software doesn't support UNC paths
  • #56030: Support path prefix which refers to per-user DosDevices ("\??\C:\...")
  • #49342: Windows: success of canonicalizing r"\" depends on whether set_current_dir has been called

Ideally verbatim paths would be purely an implementation detail that's not normally visible unless necessary. The std would use whatever path type makes most sense in the context, unless explicitly asked to do otherwise. I'm not sure if Rust can reach that ideal or not (at least not unless there's a std 2.0). What I think it can do is handle verbatim paths a little bit better.

In Win32, resolving . and .. paths is done purely lexically. That is, the path is made absolute without ever touching the filesystem. The Win32 APIs will usually do this for you. However, verbatim paths are passed to the kernel (almost) verbatim. So . and .. are never resolved and are instead treated as if they were actual files or directories (which they're not, according to Windows filesystem drivers). This has consequences for Rust. What should happen when using Path::push on such paths? Should it automatically resolve the path as you push? Should it panic? Should it resolve them at the point of use (e.g. fs::File::open)? What if the user really really did want to pass a .. component to the kernel? (note: I can't think why anyone would but it is hypothetically possible that some device could use such components in some way.)

There is also a problem with /. The Win32 API will automatically convert it to \ before passing the path to the kernel... unless verbatim paths are used. In that case it's again passed to the kernel verbatim, where it will be treated like any other character in a file name instead of being a path separator.

I'm saying all this not to find answers here (internals or the main repo would be a better venue for discussing the particulars) but to give a flavour of the changes that may be required to handle Windows paths better. But I worry that such improvements may mean breaking the documented behaviour of Path, if only in small ways. That said, I am looking for ways to improve the implementation that have much less risk of being disruptive.

@m-ou-se m-ou-se added I-libs-api-nominated Indicates that an issue has been nominated for discussion during a team meeting. T-libs-api labels Jun 22, 2021
@m-ou-se
Copy link
Member

m-ou-se commented Jun 22, 2021

cc @rust-lang/libs-api

@m-ou-se
Copy link
Member

m-ou-se commented Jul 4, 2021

Thanks a lot for the detailed write up!

We discussed this in our meeting last week, and while we can't make a decision on every single one of these issues at once, we do generally feel positive about fixing these, even if they might cause some theoretical breakage.

A key point to keep in mind is that in Rust we care more about actual breakage than about theoretical breakage. So some change being observable isn't much of a problem when the change doesn't break (or maybe even improves) existing code.

Of course, we can't see or test all Rust code in existence, so such changes would need some reasoning why the change (most likely) won't cause actual breakage before we can merge it. We figured that in some of these cases, that reasoning might be based on input from Microsoft itself. For example, Microsoft already changed the CRT command line argument parsing, so there must be some data/experience available about how much was broken (or fixed) by that.

(There's a group of people who signed up specifically to provide input on Windows-specific issues. On the rust-lang/rust repo, they can be pinged with @rustbot ping windows. Most such issues are about implementation details, and not about public API, though.)

at least not unless there's a std 2.0

I don't see anything like that happen any time soon. So if there's any way to fix such issues by changing 'std 1' without breaking a lot, that's absolutely worth discussing.

But I worry that such improvements may mean breaking the documented behaviour of Path, if only in small ways. That said, I am looking for ways to improve the implementation that have much less risk of being disruptive.

We do sometimes change documented promises if we think the risk of breakage/disruption is very low. (Here's an example: rust-lang/rust#76932) Many changes to the standard library are technically breaking changes in subtle ways, but as long as we're careful, that usually works out okay.

@m-ou-se m-ou-se removed the I-libs-api-nominated Indicates that an issue has been nominated for discussion during a team meeting. label Jul 4, 2021
@ChrisDenton
Copy link
Member Author

Thanks for the thorough response!

My worry was partly that knowing the difference between theoretical and actual breakage can sometimes be challenging. But I feel you've addressed that sufficiently.

We do sometimes change documented promises if we think the risk of breakage/disruption is very low.

This is very useful to know. I do plan on being cautious about breaking documented behaviour but it's good to know that there's some leeway there. In some places it may be enough to add documentation on how Windows and *nix differ in important ways.

@m-ou-se m-ou-se added the mcp A Major Change Proposal (lightweight RFC) label Oct 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mcp A Major Change Proposal (lightweight RFC) T-libs-api
Projects
None yet
Development

No branches or pull requests

2 participants