Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properly support for Windows long file paths through Unicode Win32 API calls. #2410

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

digit-google
Copy link
Contributor

This is an attempt to properly support Windows long file paths by only using Win32 Unicode API calls, and getting rid of the ANSI ones (some of which are still plagued by MAX_PATH limitations, even when long paths support is enabled on the machine).

This is achieved in the following ways (see individual commits for details):

  • Add UTF8ToWin32Unicode() and Win32UnicodeToUTF8() functions to util.h

  • For simplicity, introduce SystemDiskInterface and NullDiskInterface, and make RealDiskInterface a derived class of SystemDiskInterface that adds a caching layer.
    (this opens the door to other caching implementations on Posix).

  • Augment the DiskInterface API to add RenameFile() and OpenFile() methods to ... rename a file (just like rename() / _rename()), and open an stdio FILE instance (just like fopen()).

  • Modify the DiskInterface implementation to only use Unicode Win32 APIs.

  • Modify sources that rely on direct calls to fopen(), unlink(), rename() to use a DiskInterface instance. This requires injecting such an instance in the constructors of BuildLog and DepsLog btw.

  • Modify other misc Win32-specific sources to use Win32 Unicode API calls directly (e.g. subprocess-win32.cc)

NOTE: VirtualFileSystem::OpenFile() only supports read access for now. Unlike its real SystemDiskInterface implementation, it doesn't perform \r\n to \n conversion on Windows (do we really want this?). That is why many tests (e.g. build_log_test.cc) do not use it, but rely on a real DiskInterface instance. This can be fixed by another PR after this one.

This should fix issue #1900 once and for all (crossing fingers here :-))

@JRStolle
Copy link

I tried this fork, but I still get the ninja error "Filename longer than 260 characters" on my jenkins when compiling my cmake project.
Is there any way, I could supply some more meaningful debug-infos for this?

@marius-alex-tache
Copy link

I've also tested this in an attempt to fix the issue when building a gn project, but it doesn't work. Same question as @JRStolle.

@markh-discord
Copy link

During early testing this fix seems to work for us (although other tooling outside of ninja may eventually fail too). However we did bump into an issue where many dependency targets were incorrectly re-run after the first invocation.

I added a fix on our side to resolve: discord@c5996dc

@digit-google
Copy link
Contributor Author

Thank you @markh-discord , I rebased and integrated a similar fix for the weird GetFullPathnameW behavior. Let me know if this works for you.

@jhasse, would you prefer to split this PR into a series of smaller ones for easier review?

@digit-google digit-google force-pushed the win32-paths branch 2 times, most recently from 947ed1e to 8d3ed14 Compare October 7, 2024 07:40
@jhasse
Copy link
Collaborator

jhasse commented Oct 8, 2024

As I've said in a previous PR: I'm actually very much against this change.

To convince me I would like to know exactly in which cases the current solution (UTF-8 and largeFileSupport in manifest) does NOT work?

@jochenmuell
Copy link

To convince me I would like to know exactly in which cases the current solution (UTF-8 and largeFileSupport in manifest) does NOT work?

For me, it is the Qt build using clang. The Qt build has very long paths in the clang-cl output. The output seems to be handled using the max path length restrictions for the maximum buffer size of a path.

@digit-google
Copy link
Contributor Author

I think we need exact reproduction steps that confirm that the problem is with Ninja and not some other tooling.
Can you provide these for your Qt build example, with exact details on how you setup your workspace, and what goes wrong?

UTF8ToWin32Unicode() converts an UTF-8 string into the
corresponding Windows UCS-2 one, and will be used
to call Unicode Win32 API directly, in order to
properly support long file paths everywhere.

Win32UnicodeToUTF8() does the opposite, and will
generally be used to display such native Win32
paths in error messages.
Add an intermediate DiskInterface derived class named
SystemDiskInterface that performs real disk i/o without
any intermediate caching.

RealDiskInterface is now a derived class of
SystemDiskInterface that adds a caching layer on
Win32 only. Keeping the same name reduces the number
of changes in this commit to the strict minimum.

NullDiskInterface is provided to simplify custom
DiskInterface implementations used in tests, in
particular because new DiskInterface methods are
going to be added in future commits, and adding
the same method overrides NullDiskInterface will
lower the changes to do in those implementations.
Ensure Win32 Unicode APIs are called when trying to
perform real disk i/o. This is necessary to properly
support long file paths on Windows.

For example, FindFirstFileExW() used in RealDiskInterface
has not MAC_PATH restriction, though FindFirstFileExA()
still does, even when long path support is enabled
on the host machine.

Note that generally speaking, it is unknown whether Windows
CRT functions, such as fopen(), rename(), _unlink(), always
properly support long file paths (it may very well depend
on the version of MSVCRT linked to the executable), so
it is better to err on the side of caution and always
try to use the wide-char versions of these functions,
when available, or to fall back to Unicode Win32 API
functions otherwise.

This applies here and the commits following this one
in the same pull request.
Use CreateFileW() to ensure proper long path support on Win32.
+ Allow it to compile with cross-toolchains like Mingw.
Make the BuildLog class take a DiskInterface
reference in its constructor, to ensure that all
i/o operations use the same interface.

+ Adjust call sites accordingly. For tests, always
  use a SystemDiskInterface instead of the VirtualFileSystem
  instance when the latter is available, as this is
  exactly what the previous code was doing.
@maxhmv
Copy link

maxhmv commented Feb 6, 2025

Are there any news on this PR?

We are running into the "path too long" error on a system where the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem LongPathsEnabled registry entry is set to 1. We also compiled ninja ourselves to confirm that the manifest file windows/ninja.manifest is used and sets the correct property (as also explained here) - which is the case.

The error is triggered when msvc style dependencies are scanned. In this case https://github.com/ninja-build/ninja/blob/master/src/clparser.cc#L103 IncludesNormalize::Normalize() is called, and there the length is checked against _MAX_PATH leading to said error.

From our understanding @jhasse _MAX_PATH is fixed in windows.h and always set to 260, independent of any settings for long paths (Registry and Manifest). We also checked the SDK for alternative definitions of _MAX_PATH via rg -uuu "define\s*_?MAX_PATH" "c:\Program Files (x86)\Windows Kits\10" and did not find other occurrences. This brings us to the conclusion that you can only safely navigate around the Maximum File Path Limitation if you don't explicitly use _MAX_PATH but solely rely on the functions without MAX_PATH restrictions. So if this is correct and as long as checks in includes_normalize-win32.cc are performed explicitly against _MAX_PATH this will always potentially cause said errors.

Hence we are really looking forward to this PR to end up upstream. If there is something I can do to help in this matter please let me know.

Thank you for your work!

@jhasse
Copy link
Collaborator

jhasse commented Feb 6, 2025

I wasn't aware of that restriction in includes_normalize-win32.cc. I would merge a PR that adds a check for the return value of GetFullPathNameA to see if the buffer should be enlarged and therefore removes the usage of _MAX_PATH (no more fixed sized stack arrays for the buffer).

@maxhmv
Copy link

maxhmv commented Feb 7, 2025

@jhasse Ok that sounds good. Just a question because I don't fully understand which part of this PR you are opposed to. From my understanding fixing the issue with _MAX_PATH in includes_normalize-win32.cc would more or less require all the changes in includes_normalize-win32.cc @digit-google is suggesting here. Is there something you are opposed to in these changes?

Apart of _MAX_PATH there is also _MAX_DIR which also seems to be limited to 256 and could potentially cause problems. Should we fix this as well?

Thanks you!

@jhasse
Copy link
Collaborator

jhasse commented Feb 7, 2025

Is there something you are opposed to in these changes?

I don't want to use UTF-16 versions of the WinAPI like GetFullPathNameW, but keep the UTF-8 versions like GetFullPathNameA.

Apart of _MAX_PATH there is also _MAX_DIR which also seems to be limited to 256 and could potentially cause problems. Should we fix this as well?

Yes, I think so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants