Use pigz when available for faster tar.gz #15038

Merged
merged 4 commits into from
Sep 10, 2024

@am11 am11 commented Aug 29, 2024

pigz (Parallel Implementation of GZip) is much faster than the traditional gzip utility on Unix. Ideally, we should move to a System.Formats.Tar-based task for this, but for now this uses pigz instead of gzip to parallelize the compression step when pigz is available. With this change, creating dotnet-runtime-10.0.0-dev-osx-arm64.tar.gz goes from 3.7s to 0.57s after brew install pigz.
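
For readers who want to try the idea outside MSBuild, here is a minimal shell sketch of the same pigz-or-gzip choice. The function name make_targz and its two arguments are illustrative and not part of the PR. Unlike the one-liner in the diff, this version does not retry with gzip if the pigz pipeline itself fails.

```shell
# Hypothetical helper (not from the PR): create DEST (a .tar.gz) from the
# contents of SRC_DIR. Uses pigz when it is on PATH, otherwise falls back
# to tar's built-in gzip support (-z).
make_targz() {
    src_dir=$1
    dest=$2
    if command -v pigz >/dev/null 2>&1; then
        # tar writes an uncompressed stream to stdout; pigz compresses it
        # using multiple threads and writes the result to DEST.
        tar -C "$src_dir" -cf - . | pigz > "$dest"
    else
        # No pigz available: let tar invoke gzip itself.
        tar -C "$src_dir" -czf "$dest" .
    fi
}
```

A call such as make_targz ./output ./archive.tar.gz (hypothetical paths) produces a gzip-compatible archive either way, so consumers can still extract it with a plain tar -xzf.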

cc @akoeplinger, @ViktorHofer

@@ -94,7 +94,7 @@
  Overwrite="true"
  DestinationFile="$(_DestinationFileName)"
  Condition="'$(ArchiveFormat)' == 'zip'"/>
- <Exec Command="tar -C '$(_OutputPathRoot)' -czf $(_DestinationFileName) ."
+ <Exec Command="command -v pigz &gt;/dev/null 2&gt;&amp;1 &amp;&amp; tar -C '$(_OutputPathRoot)' -cf - . | pigz &gt; '$(_DestinationFileName)' || tar -C '$(_OutputPathRoot)' -czf '$(_DestinationFileName)' ."
Member

can we use the --use-compress-program=pigz tar option instead? that'd simplify the command

we can detect presence of pigz like this and use it:

    <Exec Command="command -v pigz" IgnoreExitCode="true" StandardOutputImportance="Low">
      <Output TaskParameter="ExitCode" PropertyName="_PigzFoundExitCode" />
    </Exec>
...
   <PropertyGroup>
     <UsePigzCommand Condition="'$(_PigzFoundExitCode)' == '0'">--use-compress-program=pigz</UsePigzCommand>
   </PropertyGroup>

btw. I think this runs on Windows too right? we should skip detecting pigz there

Member Author

There was a perf difference, which is why I used a pipe. Copilot says:

Piping Data: In the first command, tar -cf - . | pigz > file.tar.gz, the data is piped directly from tar to pigz. This allows both commands to run concurrently, potentially utilizing multiple CPU cores more effectively.
Single Process: In the second command, tar -cf file.tar.gz --use-compress-program=pigz ., tar handles the compression internally, which might not be as efficient in utilizing multiple cores compared to the first method.

I applied these diffs to the files found by find ~/.nuget -name archives.targets and ran the packs subset to measure the perf.

I can split the check out if necessary, but we use the compact one-liner syntax in other places as well.

The shipping archives on Windows are normally .zip. If this is used on Windows as well, I can add support.

Member

> There was a perf difference, which is why I used a pipe. Copilot says:

What's the diff? Also please don't simply trust what Copilot says. The manpage for tar on my Mac says

--use-compress-program program
   Pipe the input (in x or t mode) or the output (in c mode) through program instead of using the builtin compression support.

So it sounds like it should pipe the data too?

> The shipping archives on Windows are normally .zip. If this is used on Windows as well, I can add support.

Yes, you can opt in to .tar.gz on Windows:

# Creating tar.gz archives on Windows
There is an override that you can use to opt into generating tar.gz archives instead of zip archives on Windows, for an experience consistent with Linux and macOS.
To opt in, set ``ArchiveFormat`` to ``tar.gz`` on a project that uses this package when building for Windows.
This can also be used on Linux and macOS to force creating ``zip`` archives.

Member

I just tried on my Mac and I see no difference between --use-compress-program and piping.

Member Author

I did the measurements myself and there was a difference, which is why I used the pipe. Can you share your stats? The explanation for why the difference occurred is what Copilot provided.

Member Author

BusyBox tar (Alpine Linux etc.) doesn't support this option:

tar: unrecognized option: use-compress-program=gzip
BusyBox v1.36.1 (2023-07-27 17:12:24 UTC) multi-call binary.

Usage: tar c|x|t [-ZzJjahmvokO] [-f TARFILE] [-C DIR] [-T FILE] [-X FILE] [LONGOPT]... [FILE]...

Create, extract, or list files from a tar file

	c	Create
	x	Extract
	t	List
	-f FILE	Name of TARFILE ('-' for stdin/out)
	-C DIR	Change to DIR before operation
	-v	Verbose
	-O	Extract to stdout
	-m	Don't restore mtime
	-o	Don't restore user:group
	-k	Don't replace existing files
	-Z	(De)compress using compress
	-z	(De)compress using gzip
	-J	(De)compress using xz
	-j	(De)compress using bzip2
	--lzma	(De)compress using lzma
	-a	(De)compress based on extension
	-h	Follow symlinks
	-T FILE	File with names to include
	-X FILE	File with glob patterns to exclude
	--exclude PATTERN	Glob pattern to exclude
	--overwrite		Replace existing files
	--strip-components NUM	NUM of leading components to strip
	--no-recursion		Don't descend in directories
	--numeric-owner		Use numeric user:group
	--no-same-permissions	Don't restore access permissions

so piping is the only option there.
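
One way to reconcile the two approaches is to probe the local tar once and fall back to a pipe only when the option is missing. The sketch below assumes gzip is installed for the probe. The function names tar_supports_compress_program and compress_with are hypothetical and do not appear in the PR.

```shell
# Hypothetical probe: returns 0 if the local tar accepts
# --use-compress-program, non-zero otherwise (e.g. BusyBox tar).
# It archives an empty temporary directory using gzip as the compressor.
tar_supports_compress_program() {
    probe_dir=$(mktemp -d) || return 1
    tar -C "$probe_dir" --use-compress-program=gzip -cf - . >/dev/null 2>&1
    rc=$?
    rm -rf "$probe_dir"
    return "$rc"
}

# Hypothetical helper: compress SRC_DIR into DEST with COMPRESSOR
# (for example pigz or gzip). Prefers the tar option and falls back to an
# explicit pipe when the option is not supported.
compress_with() {
    compressor=$1
    src_dir=$2
    dest=$3
    if tar_supports_compress_program; then
        tar -C "$src_dir" --use-compress-program="$compressor" -cf "$dest" .
    else
        tar -C "$src_dir" -cf - . | "$compressor" > "$dest"
    fi
}
```

Calling compress_with pigz ./output ./archive.tar.gz (hypothetical paths) would then work on GNU tar, bsdtar, and BusyBox alike, at the cost of an extra probe per invocation.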

Member

the problem is that piping doesn't work on Windows. I think we should just not support using pigz on Windows to make the conditionals easier.

Member Author

@am11 am11 Sep 5, 2024

Piping does work in Windows cmd.exe; I just tested it with https://sourceforge.net/projects/pigz-for-windows/ (I have C:\Program Files\Git\usr\bin in PATH, which contains tar.exe):

> dotnet new console -n hw1
> cd hw1
> dotnet publish -r win-x64
> tar cf - bin | pigz > foo.tar.gz
> mkdir g
> cd g
> tar xzf ..\foo.tar.gz
> bin\Release\net8.0\win-x64\hw1.exe
> hw1.exe
Hello World!

Member

ah nice, TIL

Member Author

In PowerShell, the pipe can behave strangely depending on the code page, but MSBuild only uses PowerShell when we explicitly specify Exec Command="powershell ...", which we aren't doing here. AFAIK, cmd.exe doesn't have this problem: https://stackoverflow.com/q/59110563.

@am11 force-pushed the feature/arcade/faster-targz branch from 32d507a to cb34cdb on September 4, 2024 19:22

- <Message Text="$(_OutputPathRoot) -> $(_DestinationFileName)" Importance="high" />
+ <Message Text="Successfully created archive -> '$(_DestinationFileName)' from '$(_OutputPathRoot)" Importance="high" />
Member

any reason why you changed the message? the old format matched what Roslyn produced for compiling so it makes more sense to me

Member Author

I matched it with what nuget pack was producing in nearby context.

  Successfully created package '/__w/1/s/artifacts/packages/Release/Shipping/Microsoft.NETCore.App.Crossgen2.linux-x64.10.0.0-ci.nupkg'.
  Successfully created package '/__w/1/s/artifacts/packages/Release/Shipping/Microsoft.NETCore.App.Crossgen2.linux-x64.10.0.0-ci.symbols.nupkg'.
  /__w/1/s/artifacts/obj/dotnet-nethost/Release/net9.0/linux-x64/output/ -> /__w/1/s/artifacts/packages/Release/Shipping//dotnet-nethost-10.0.0-ci-linux-x64.tar.gz
  /__w/1/s/artifacts/obj/dotnet-nethost/Release/net9.0/linux-x64/symbols/ -> /__w/1/s/artifacts/packages/Release/Shipping//dotnet-nethost-symbols-linux-x64-10.0.0-ci.tar.gz
  Microsoft.NETCore.App.Runtime -> 
  Successfully created package '/__w/1/s/artifacts/packages/Release/Shipping/Microsoft.NETCore.App.Runtime.linux-x64.10.0.0-ci.nupkg'.
  Successfully created package '/__w/1/s/artifacts/packages/Release/Shipping/Microsoft.NETCore.App.Runtime.linux-x64.10.0.0-ci.symbols.nupkg'.
  /__w/1/s/artifacts/obj/Microsoft.NETCore.App.Bundle/Release/net9.0/linux-x64/output/ -> /__w/1/s/artifacts/packages/Release/Shipping//dotnet-runtime-10.0.0-ci-linux-x64.tar.gz
  Microsoft.Interop.SourceGeneration -> /__w/1/s/artifacts/bin/Microsoft.Interop.SourceGeneration/Debug/netstandard2.0/Microsoft.Interop.SourceGeneration.dll
  DownlevelLibraryImportGenerator -> /__w/1/s/artifacts/bin/DownlevelLibraryImportGenerator/Debug/netstandard2.0/Microsoft.Interop.LibraryImportGenerator.Downlevel.dll
  Microsoft.NET.HostModel -> /__w/1/s/artifacts/bin/Microsoft.NET.HostModel/Release/netstandard2.0/Microsoft.NET.HostModel.dll
  The package Microsoft.NET.HostModel.10.0.0-ci is missing a readme. Go to https://aka.ms/nuget/authoring-best-practices/readme to learn why package readmes are important.
  Successfully created package '/__w/1/s/artifacts/packages/Release/NonShipping/Microsoft.NET.HostModel.10.0.0-ci.nupkg'.
  Successfully created package '/__w/1/s/artifacts/packages/Release/NonShipping/Microsoft.NET.HostModel.10.0.0-ci.symbols.nupkg'.

Member

Ok

@akoeplinger akoeplinger enabled auto-merge (squash) September 9, 2024 13:31
akoeplinger previously approved these changes Sep 9, 2024
am11 commented Sep 9, 2024

Merging can be performed automatically with 1 approving review. guess github is confused today 😅

@akoeplinger

New changes require approval from someone other than akoeplinger because they were the last pusher.

Can you push some empty change so I can approve again :D

auto-merge was automatically disabled September 10, 2024 04:46

Head branch was pushed to by a user without write access

am11 commented Sep 10, 2024

@akoeplinger, ready. 😎

@akoeplinger akoeplinger merged commit 101a54b into dotnet:main Sep 10, 2024
11 checks passed
@am11 am11 deleted the feature/arcade/faster-targz branch September 10, 2024 06:53