Use pigz when available for faster tar.gz #15038
@@ -94,7 +94,7 @@
 Overwrite="true"
 DestinationFile="$(_DestinationFileName)"
 Condition="'$(ArchiveFormat)' == 'zip'"/>
-<Exec Command="tar -C '$(_OutputPathRoot)' -czf $(_DestinationFileName) ."
+<Exec Command="command -v pigz >/dev/null 2>&1 && tar -C '$(_OutputPathRoot)' -cf - . | pigz > '$(_DestinationFileName)' || tar -C '$(_OutputPathRoot)' -czf '$(_DestinationFileName)' ."
Can we use the --use-compress-program=pigz tar option instead? That'd simplify the command.
we can detect presence of pigz like this and use it:
<Exec Command="command -v pigz" IgnoreExitCode="true" StandardOutputImportance="Low">
<Output TaskParameter="ExitCode" PropertyName="_PigzFoundExitCode" />
</Exec>
...
<PropertyGroup>
<UsePigzCommand Condition="'$(_PigzFoundExitCode)' == '0'">--use-compress-program=pigz</UsePigzCommand>
</PropertyGroup>
btw. I think this runs on Windows too right? we should skip detecting pigz there
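For comparison, the detect-then-choose logic can also be written as an explicit branch in shell. This is only a sketch: `create_archive` and its arguments are hypothetical names, not part of archives.targets, and it assumes a POSIX shell with tar and gzip on PATH.

```shell
#!/bin/sh
# Sketch only: create_archive is a hypothetical helper, not code from the PR.
# Usage: create_archive <source-dir> <destination.tar.gz>
create_archive() {
    src_dir=$1
    dest_file=$2
    if command -v pigz >/dev/null 2>&1; then
        # pigz found: tar writes an uncompressed stream, pigz compresses it in parallel.
        tar -C "$src_dir" -cf - . | pigz > "$dest_file"
    else
        # pigz missing: fall back to tar's built-in gzip support.
        tar -C "$src_dir" -czf "$dest_file" .
    fi
}

# Example run against a throwaway directory.
work_dir=$(mktemp -d)
mkdir "$work_dir/input"
echo "hello" > "$work_dir/input/file.txt"
create_archive "$work_dir/input" "$work_dir/out.tar.gz"
tar -tzf "$work_dir/out.tar.gz"
```

Unlike the one-liner in the diff, this form does not fall back to gzip when pigz is present but the pipeline fails. In both forms, a plain POSIX sh pipeline reports the exit status of its last command, so a tar failure on the left side of the pipe would go unnoticed.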
There was a perf difference, which is why i've used pipe. Copilot says:
Piping Data: In the first command, tar -cf - . | pigz > file.tar.gz, the data is piped directly from tar to pigz. This allows both commands to run concurrently, potentially utilizing multiple CPU cores more effectively.
Single Process: In the second command, tar -cf file.tar.gz --use-compress-program=pigz ., tar handles the compression internally, which might not be as efficient in utilizing multiple cores compared to the first method.
I applied these diffs to the files found by find ~/.nuget -name archives.targets and ran a subset of the packs to measure the perf.
I can split the check if necessary, but we are using the compact one-liner syntax in other places as well.
The shipping archives in windows are normally .zip, but if we are using it on windows, can add support?
> There was a perf difference, which is why i've used pipe. Copilot says:
What's the diff? Also please don't simply trust what Copilot says. The manpage for tar on my Mac says:
--use-compress-program program
Pipe the input (in x or t mode) or the output (in c mode) through program instead of using the builtin compression support.
So it sounds like it should pipe the data too?
> The shipping archives in windows are normally .zip, but if we are using it on windows, can add support?
Yes you can opt in to .tar.gz on Windows:
arcade/src/Microsoft.DotNet.Build.Tasks.Archives/README.md
Lines 13 to 17 in 5602a0a
# Creating tar.gz archives on Windows
There is an override that you can use to opt into generating tar.gz archives instead of zip archives on Windows to get an consistent experience as with linux and macos.
That opt-in is setting ``ArchiveFormat`` to ``tar.gz`` on a project that uses this package when building for Windows.
This can also be used on Linux and MacOS to force creating ``zip`` archives as well.
I just tried on my Mac and I see no difference between --use-compress-program and piping.
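A comparison along these lines could be reproduced with something like the sketch below. It is not the measurement either participant ran: the sample data, file counts, and variable names are made up for illustration, it assumes a tar that accepts --use-compress-program (GNU tar and bsdtar do), and `date +%s` only gives whole-second resolution, so the numbers are meaningful only for large inputs.

```shell
#!/bin/sh
# Sketch only: times the two invocations discussed above on generated sample data.
# Assumes a tar that accepts --use-compress-program (GNU tar and bsdtar do).
compressor=gzip
command -v pigz >/dev/null 2>&1 && compressor=pigz

bench_dir=$(mktemp -d)
mkdir "$bench_dir/input"
# Generate compressible sample files (sizes chosen arbitrarily).
i=0
while [ "$i" -lt 20 ]; do
    seq 1 50000 > "$bench_dir/input/file_$i.txt"
    i=$((i + 1))
done

# Variant 1: explicit pipe into the compressor.
start=$(date +%s)
tar -C "$bench_dir/input" -cf - . | "$compressor" > "$bench_dir/pipe.tar.gz"
end=$(date +%s)
echo "pipe into $compressor: $((end - start))s"

# Variant 2: let tar spawn the compressor itself.
start=$(date +%s)
tar -C "$bench_dir/input" --use-compress-program="$compressor" -cf "$bench_dir/ucp.tar.gz" .
end=$(date +%s)
echo "--use-compress-program=$compressor: $((end - start))s"

# Sanity check: both archives should list the same entries.
tar -tzf "$bench_dir/pipe.tar.gz" | sort > "$bench_dir/pipe.list"
tar -tzf "$bench_dir/ucp.tar.gz" | sort > "$bench_dir/ucp.list"
cmp -s "$bench_dir/pipe.list" "$bench_dir/ucp.list" && echo "archives list the same entries"
```

Running it against a real output directory rather than generated files would give numbers closer to the build scenario under discussion.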
I did the measurements myself and there was a difference, which is why I used the pipe. Can you share the stats? The explanation for why the difference occurred is what Copilot provided.
busybox tar (alpine linux etc.) doesn't support this option:
tar: unrecognized option: use-compress-program=gzip
BusyBox v1.36.1 (2023-07-27 17:12:24 UTC) multi-call binary.
Usage: tar c|x|t [-ZzJjahmvokO] [-f TARFILE] [-C DIR] [-T FILE] [-X FILE] [LONGOPT]... [FILE]...
Create, extract, or list files from a tar file
c Create
x Extract
t List
-f FILE Name of TARFILE ('-' for stdin/out)
-C DIR Change to DIR before operation
-v Verbose
-O Extract to stdout
-m Don't restore mtime
-o Don't restore user:group
-k Don't replace existing files
-Z (De)compress using compress
-z (De)compress using gzip
-J (De)compress using xz
-j (De)compress using bzip2
--lzma (De)compress using lzma
-a (De)compress based on extension
-h Follow symlinks
-T FILE File with names to include
-X FILE File with glob patterns to exclude
--exclude PATTERN Glob pattern to exclude
--overwrite Replace existing files
--strip-components NUM NUM of leading components to strip
--no-recursion Don't descend in directories
--numeric-owner Use numeric user:group
--no-same-permissions Don't restore access permissions
where piping is the only option.
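One way to accommodate both kinds of tar is to probe for the option and fall back to the pipe when it is rejected. The sketch below is an illustration only: `tar_supports_ucp` and `archive_with` are hypothetical helper names, and the probe (archiving an empty directory through gzip) is an assumption about what is cheap and portable, not something taken from the PR.

```shell
#!/bin/sh
# Sketch only: tar_supports_ucp and archive_with are hypothetical helper names.

# Probe whether this tar accepts --use-compress-program by archiving an empty directory.
tar_supports_ucp() {
    probe_dir=$(mktemp -d)
    mkdir "$probe_dir/empty"
    tar -C "$probe_dir/empty" --use-compress-program=gzip \
        -cf "$probe_dir/probe.tar.gz" . >/dev/null 2>&1
    status=$?
    rm -rf "$probe_dir"
    return "$status"
}

# Usage: archive_with <compressor> <source-dir> <destination.tar.gz>
archive_with() {
    if tar_supports_ucp; then
        tar -C "$2" --use-compress-program="$1" -cf "$3" .
    else
        # For example BusyBox tar: only the explicit pipe is available.
        tar -C "$2" -cf - . | "$1" > "$3"
    fi
}

# Example run; gzip stands in for pigz so the sketch works without pigz installed.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/input"
echo "hello" > "$demo_dir/input/file.txt"
archive_with gzip "$demo_dir/input" "$demo_dir/out.tar.gz"
tar -tzf "$demo_dir/out.tar.gz"
```

Always using the pipe avoids the probe entirely, which is the simpler choice if BusyBox environments have to be supported.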
the problem is that piping doesn't work on Windows. I think we should just not support using pigz on Windows to make the conditionals easier.
Piping does work on Windows cmd.exe; I just tested it with https://sourceforge.net/projects/pigz-for-windows/ (and I have C:\Program Files\Git\usr\bin in PATH, which contains tar.exe):
> dotnet new console -n hw1
> cd hw1
> dotnet publish -r win-x64
> tar cf - bin | pigz > foo.tar.gz
> mkdir g
> cd g
> tar xzf ..\foo.tar.gz
> bin\Release\net8.0\win-x64\hw1.exe
> hw1.exe
Hello World!
ah nice, TIL
In PowerShell (which MSBuild only uses when we explicitly specify Exec Command="powershell ..."; we aren't using it here), the pipe can behave strangely depending on the codepage. AFAIK, cmd.exe doesn't have this problem: https://stackoverflow.com/q/59110563.
src/Microsoft.DotNet.Build.Tasks.Archives/build/archives.targets
32d507a to cb34cdb
-<Message Text="$(_OutputPathRoot) -> $(_DestinationFileName)" Importance="high" />
+<Message Text="Successfully created archive -> '$(_DestinationFileName)' from '$(_OutputPathRoot)" Importance="high" />
any reason why you changed the message? the old format matched what Roslyn produced for compiling so it makes more sense to me
I matched it with what nuget pack was producing in nearby context.
Successfully created package '/__w/1/s/artifacts/packages/Release/Shipping/Microsoft.NETCore.App.Crossgen2.linux-x64.10.0.0-ci.nupkg'.
Successfully created package '/__w/1/s/artifacts/packages/Release/Shipping/Microsoft.NETCore.App.Crossgen2.linux-x64.10.0.0-ci.symbols.nupkg'.
/__w/1/s/artifacts/obj/dotnet-nethost/Release/net9.0/linux-x64/output/ -> /__w/1/s/artifacts/packages/Release/Shipping//dotnet-nethost-10.0.0-ci-linux-x64.tar.gz
/__w/1/s/artifacts/obj/dotnet-nethost/Release/net9.0/linux-x64/symbols/ -> /__w/1/s/artifacts/packages/Release/Shipping//dotnet-nethost-symbols-linux-x64-10.0.0-ci.tar.gz
Microsoft.NETCore.App.Runtime ->
Successfully created package '/__w/1/s/artifacts/packages/Release/Shipping/Microsoft.NETCore.App.Runtime.linux-x64.10.0.0-ci.nupkg'.
Successfully created package '/__w/1/s/artifacts/packages/Release/Shipping/Microsoft.NETCore.App.Runtime.linux-x64.10.0.0-ci.symbols.nupkg'.
/__w/1/s/artifacts/obj/Microsoft.NETCore.App.Bundle/Release/net9.0/linux-x64/output/ -> /__w/1/s/artifacts/packages/Release/Shipping//dotnet-runtime-10.0.0-ci-linux-x64.tar.gz
Microsoft.Interop.SourceGeneration -> /__w/1/s/artifacts/bin/Microsoft.Interop.SourceGeneration/Debug/netstandard2.0/Microsoft.Interop.SourceGeneration.dll
DownlevelLibraryImportGenerator -> /__w/1/s/artifacts/bin/DownlevelLibraryImportGenerator/Debug/netstandard2.0/Microsoft.Interop.LibraryImportGenerator.Downlevel.dll
Microsoft.NET.HostModel -> /__w/1/s/artifacts/bin/Microsoft.NET.HostModel/Release/netstandard2.0/Microsoft.NET.HostModel.dll
The package Microsoft.NET.HostModel.10.0.0-ci is missing a readme. Go to https://aka.ms/nuget/authoring-best-practices/readme to learn why package readmes are important.
Successfully created package '/__w/1/s/artifacts/packages/Release/NonShipping/Microsoft.NET.HostModel.10.0.0-ci.nupkg'.
Successfully created package '/__w/1/s/artifacts/packages/Release/NonShipping/Microsoft.NET.HostModel.10.0.0-ci.symbols.nupkg'.
Ok
Can you push some empty change so I can approve again :D
Head branch was pushed to by a user without write access
@akoeplinger, ready. 😎
pigz (Parallel Implementation of GZip) is much faster than the traditional gzip utility on Unix. Ideally, we should move to a System.Formats.Tar-based task for this, but for now this uses pigz instead of the gzip tool to parallelize the compression task. Consequently, dotnet-runtime-10.0.0-dev-osx-arm64.tar.gz creation goes from 3.7s to 0.57s after brew install pigz.

cc @akoeplinger, @ViktorHofer