Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add async ZipFile APIs #1541

Open
Tracked by #62658
ericstj opened this issue Jan 25, 2016 · 52 comments
Open
Tracked by #62658

Add async ZipFile APIs #1541

ericstj opened this issue Jan 25, 2016 · 52 comments
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.IO.Compression
Milestone

Comments

@ericstj
Copy link
Member

ericstj commented Jan 25, 2016

All ZipFile APIs are currently synchronous. This means manipulations to zip files will always block a thread. We should investigate using an async file and calling async IO APIs (ReadAsync/WriteAsync) to free up the thread while IO is happening. dotnet/corefx#5680

@ianhays ianhays removed their assignment Oct 10, 2016
@danmoseley
Copy link
Member

This needs someone to write up a formal API proposal then we can review and if approved, offer up for implementation. Anyone?

Docs on the API review process and see Jon's API request for a great example of a strong proposal. The more concrete the proposal is (e.g. has examples of usage, real-world scenarios, fleshed out API), the more discussion it will start and the better the chances of us being able to push the addition through the review process.

@abdulbeard
Copy link

abdulbeard commented Jun 19, 2018

Forked the repo, and working on an api proposal.
Seems like most of the work outlined in this item is already done. e.g
https://github.com/dotnet/corefx/blob/master/src/System.IO.FileSystem/src/System/IO/File.cs#L682
https://github.com/dotnet/corefx/blob/master/src/System.IO.FileSystem/src/System/IO/File.cs#L732

That probably means this item is done, right?
@danmosemsft

@ianhays
Copy link
Contributor

ianhays commented Jun 19, 2018

Not quite, @abdulbeard. This particular issue is around adding ZipFile async APIs.

For example, currently we have ZipFile.ExtractToDirectory(string, string). This issue is around adding async versions of those API e.g. ZipFile.ExtractToDirectoryAsync(string, string).

@abdulbeard
Copy link

Gotcha @ianhays . I did a terrible job reading the title 😛

@abdulbeard
Copy link

Summary

Curently, System.IO.Compression.Zipfile doesn't have async methods for CreateFromDirectory, ExtractToDirectory, Open and OpenRead. This means that these zipfile operations are thread-blocking, and not asynchronous, and hence, comparatively less performant.

Proposed API

namespace System.IO.Compression {
    public partial class Zipfile {
        public static Task CreateFromDirectoryAsync(string sourceDirectoryName, string destinationArchiveFileName, CancellationToken cancellationToken = default(CancellationToken));

        public static Task CreateFromDirectoryAsync(string sourceDirectoryName, string destinationArchiveFileName,	CompressionLevel compressionLevel, bool includeBaseDirectory, CancellationToken cancellationToken = default(CancellationToken));

        public static Task CreateFromDirectoryAsync(string sourceDirectoryName, string destinationArchiveFileName,	CompressionLevel compressionLevel, bool includeBaseDirectory, Encoding entryNameEncoding, CancellationToken cancellationToken = default(CancellationToken));

        public static Task ExtractToDirectoryAsync(string sourceArchiveFileName, string destinationDirectoryName, CancellationToken cancellationToken = default(CancellationToken));

        public static Task ExtractToDirectoryAsync(string sourceArchiveFileName, string destinationDirectoryName, Encoding entryNameEncoding, CancellationToken cancellationToken = default(CancellationToken));

        public static Task<ZipArchive> OpenAsync(string archiveFileName, ZipArchiveMode mode, CancellationToken cancellationToken = default(CancellationToken));

        public static Task<ZipArchive> OpenAsync(string archiveFileName, ZipArchiveMode mode, Encoding entryNameEncoding, CancellationToken cancellationToken = default(CancellationToken));

        public static Task<ZipArchive> OpenReadAsync(string archiveFileName, CancellationToken cancellationToken = default(CancellationToken));
    }
}

Expected Use and Benefits

These new functions are asynchronous versions of their synchronous counterparts. This allows the .net runtime to allow the thread to do other work while the asynchronous operation completes (as mandated by the TPL and async/await scheduling).

  • Zipfile operations will no longer be thread-blocking, improving performance.
  • Long-running Zipfile operations can now be cancelled via the CancellationToken, by the user, allowing them more control.
  • Asynchronous FileStream or other OS/runtime optimizations can be utilized to improve performance here.

@abdulbeard
Copy link

Here's a proposal ^. Looking forward to constructive feedback and insight.

@svick
Copy link
Contributor

svick commented Jun 19, 2018

Should async APIs be added to ZipArchive at the same time? For example, consider the following code:

using (var archive = await ZipFile.OpenAsync("archive.zip", ZipArchiveMode.Update))
{
    archive.CreateEntry("empty.txt");
}

using (var archive = await ZipFile.OpenReadAsync("archive.zip"))
{
    foreach (var entry in archive.Entries)
    {
        Console.WriteLine(entry.FullName);
    }
}

As far as I can tell from skimming the source code, even though the above code uses the proposed async APIs as much as it can, it still blocks when accessing Entries and, more insidiously, when disposing the first archive.

@abdulbeard
Copy link

abdulbeard commented Jun 27, 2018

Here's a proposal for ZipArchive class:

namespace System.IO.Compression {
    public class ZipArchive {
        public Task<ReadOnlyCollection<ZipArchiveEntry>> GetEntriesAsync();
        public Task<ZipArchiveEntry> GetEntryAsync(string entryName, CancellationToken cancellationToken = default(cancellationToken));
    }
}

@svick
Copy link
Contributor

svick commented Jun 27, 2018

@abdulbeard How does that help? As far as I can tell, CreateEntry does not block in the current implementation. On the other hand, accessing Entries and calling Dispose() can block, depending on the mode.

@abdulbeard
Copy link

@svick you're right. Async versions of CreateEntry don't do much, since their underlying functionality is not blocking.
On the matter of Dispose() being async, not sure there's a good way to do that.
In the matter of Entries, I can imagine replacing that with something like:

public Task<ReadOnlyCollection<ZipArchiveEntry>> GetEntriesAsync();
public Task<ZipArchiveEntry> GetEntryAsync(string entryName, CancellationToken cancellationToken = default(cancellationToken));

Amended the proposal above.

@danmoseley
Copy link
Member

@JeremyKuhne could you give feedback on this proposal? Perhaps we can move it forward

@svick
Copy link
Contributor

svick commented Jun 28, 2018

@abdulbeard

On the matter of Dispose() being async, not sure there's a good way to do that.

Yeah, there isn't a great approach until dotnet/roslyn#114 is added. But in the meantime, I think there are ways to make it work. The approaches I considered are:

  • Add FlushAsync(). This would write the current state of the ZipArchive to disk, which means that calling Dispose() right after FlushAsync() would not block.

    Example usage:

    using (var archive = await ZipFile.OpenAsync("archive.zip", ZipArchiveMode.Update))
    {
        archive.CreateEntry("empty.txt");
        await archive.FlushAsync();
    }
  • Add DisposeAsync(). Once IAsyncDisposable, using statements, and async/await roslyn#114 is added, you would be able to use it with using.

    Example usage:

    var archive = await ZipFile.OpenAsync("archive.zip", ZipArchiveMode.Update)
    try
    {
        archive.CreateEntry("empty.txt");
    }
    finally
    {
        await archive.DisposeAsync();
    }

Looking at the two examples above, I think adding FlushAsync() is the better option.

@abdulbeard
Copy link

@svick
To me it makes sense to wait on resolution of dotnet/roslyn/#114 before deciding on FlusyAsync vs DisposeAsync.
But I can also see value in finishing out this stage with FlushAsync, and then once we have an async dispose, let that replace FlushAsync.
But I'd love to hear from @JeremyKuhne etc. as well.

@abdulbeard
Copy link

@danmosemsft @JeremyKuhne any suggestions for changes to ^, or shall we move forward?

@abdulbeard
Copy link

Anything I can do to help move this along?

@abdulbeard
Copy link

crickets..........

@JeremyKuhne
Copy link
Member

@abdulbeard Sorry, the actual owner here moved on to another project and this got lost in the shuffle. 😦 Thanks for poking on it.

@stephentoub can you give some direction/guidance on the disposal questions? @ericstj do you think this proposal provides what you were looking for?

Presumably IAsyncEnumerable<ZipArchiveEntry> would be the right choice for GetEntriesAsync...

@JeremyKuhne
Copy link
Member

@ahsonkhan, @ViktorHofer can one of you take this and drive it forward?

@stephentoub
Copy link
Member

stephentoub commented Dec 7, 2018

can you give some direction/guidance on the disposal questions?

We have added IAsyncDisposable and C# is adding await using that works with it. If disposing a ZipArchive does work that could be asynchronous (e.g. flushing a stream), having ZipArchive implement it is quite reasonable. Note that Stream itself now implements IAsyncDisposable as well.

@stephentoub
Copy link
Member

Presumably IAsyncEnumerable would be the right choice for GetEntriesAsync

It depends. Is ZipArchive lazily reading from a stream, e.g. when you go to read the next entry, might that perform a read on the stream? If yes, then having it return an IAsyncEnumerable could make sense. If no, then it should probably return a synchronous enumerable.

@TianqiZhang
Copy link

Just curious is there any update on this issue? We have a .NET Core project that creates zip files in Azure blobs, and we are forced to use the sync version of ZipArchive apis because the async ones will cause unit tests to hang.

@kccsf
Copy link

kccsf commented Apr 9, 2019

I have some code that is generating and streaming a zip archive on the fly, which fails if AllowSynchronousIO is false, which is the default from core 3.0.0-preview3 onwards:
dotnet/aspnetcore#7644

System.InvalidOperationException: Synchronous operations are disallowed. Call WriteAsync or set AllowSynchronousIO to true instead.
at Microsoft.AspNetCore.Server.IIS.Core.HttpResponseStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at Microsoft.AspNetCore.Server.IIS.Core.WrappingStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at System.IO.Compression.PositionPreservingWriteOnlyStreamWrapper.Write(Byte[] buffer, Int32 offset, Int32 count)
at System.IO.BinaryWriter.Write(UInt32 value)
at System.IO.Compression.ZipArchiveEntry.WriteCentralDirectoryFileHeader()
at System.IO.Compression.ZipArchive.WriteFile()
at System.IO.Compression.ZipArchive.Dispose(Boolean disposing)
at System.IO.Compression.ZipArchive.Dispose()

@m-valenta
Copy link

Like @kccsf I have simillar issue with the Kestrel server configured to disallow synchronous IO. With the current implementation I can not use ZipArchive directly with the Kestrel's response stream - because of synchronous disposing.

@ahsonkhan ahsonkhan removed their assignment Sep 11, 2019
@ahsonkhan
Copy link
Member

cc @carlossanlop

@juunas11
Copy link

Just hit this today as well. :\ After writing the entries, Dispose() throws the exception. Worked around by allowing sync I/O at that point for now, but would be nice to have some async API that I could call to remove the need for sync I/O.

@jozkee jozkee added the untriaged New issue has not been triaged by the area owner label Apr 24, 2021
@tjmoore
Copy link

tjmoore commented Jun 18, 2021

Is there any workaround for the "Synchronous operations are disallowed" issue writing to a response stream as it looks like any proposal is still a long way off and can't wait for .NET 6+ (and for platform reasons I'm currently limited to .NET Core 3.1) ?

Only options with ZipArchive I can see is to write the zip to a temp file and read that back as a stream asynchronously, or use a memory stream but that's not ideal with potentially large zip files.

Tried PushStreamContent but it didn't call the content method, and advice is it's not needed in .NET Core as we have Response.Body stream anyway.

@stephentoub
Copy link
Member

Is there any workaround for the "Synchronous operations are disallowed" issue writing to a response stream

ASP.NET exposes a flag to control that, e.g.

    services.Configure<KestrelServerOptions>(options =>
    {
        options.AllowSynchronousIO = true;
    });

@askolniak
Copy link

@tjmoore - as a workaround you can allow synchronous operations at controller's method level:
var feature = HttpContext.Features.Get<IHttpBodyControlFeature>(); feature.AllowSynchronousIO = true;

@tjmoore
Copy link

tjmoore commented Jun 18, 2021

@stephentoub @askolniak Is that wise though as synchronous IO was disabled for good reason (dotnet/aspnetcore#7644)?

Though it's an option. Currently resorting to a temporary file and not sure if that's better or the risk of hanging the app.

@stephentoub
Copy link
Member

stephentoub commented Jun 18, 2021

You asked for a workaround. That's the workaround :-)

ASP.NET enforces that by default in the name of scalability. It gives you the option of not enforcing it once you decide you're ok with the tradeoffs. It would likely depend on the size of what you're compressing, scalability needs, etc. as to which workaround is best.

@danmoseley
Copy link
Member

I wonder whether we can advance this proposal to reviewable. Then there would be time for someone passionate to get it into .NET 6.

@adamsitnik adamsitnik modified the milestones: Future, 7.0.0 Jun 21, 2021
@adamsitnik
Copy link
Member

I wonder whether we can advance this proposal to reviewable.

I have moved it to 7.0 from Future, will provide a proposal once we are done with clearing the 6.0 bugs backlog.

@adamsitnik adamsitnik self-assigned this Jun 21, 2021
@adamsitnik adamsitnik added api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Jun 21, 2021
@jozkee jozkee removed the untriaged New issue has not been triaged by the area owner label Jun 23, 2021
@jeffhandley jeffhandley modified the milestones: 7.0.0, Future Jul 7, 2022
@gragra33
Copy link

I have found a work around for now ...

async Task DoZipFileWorkAsync()
{
     // ZipArchive is a Synchronous Block, so yield asynchronously
    await Task.Yield();

    // Open archive
    using ZipArchive zipArchive = new(File.OpenRead(path));

   // ... do work
}

All work is now done on a .Net TreadPool Worker Thread.

@Casper-Olsen
Copy link

I have found a work around for now ...

async Task DoZipFileWorkAsync()
{
     // ZipArchive is a Synchronous Block, so yield asynchronously
    await Task.Yield();

    // Open archive
    using ZipArchive zipArchive = new(File.OpenRead(path));

   // ... do work
}

All work is now done on a .Net TreadPool Worker Thread.

This is not really a work around, since all work with the ZipArchive is still done synchronously.
And it's not guaranteed that work will continue on the thread pool, after Task.Yield. Only if there is no SynchronizationContext.

@gragra33
Copy link

This is not really a work around, since all work with the ZipArchive is still done synchronously. And it's not guaranteed that work will continue on the thread pool, after Task.Yield. Only if there is no SynchronizationContext.

I do agree to some extent, however testing done works as expected in a worker thread, not the Main Thread. I would still require propper asynchronous support to replace this temporary work around.

@synek317
Copy link

synek317 commented Oct 3, 2022

I wanted to try adding async and ended up with copy-pasting dotnet source code of ZipFileExtensions.ExtractToDirectory and adding async here and there.

I even added a progress report.

The downside is, I had to use english exception message since the methods to get translated messages from resources are internal.

public static class ZipFileAsyncExtensions
{
  public static async Task ExtractToDirectoryAsync(
    this ZipArchive             source,
    string                      destinationDirectoryName,
    bool                        overwriteFiles,
    Action<IZipExtractProgress> progressCallback  = null,
    CancellationToken           cancellationToken = default
  )
  {
    var progress = progressCallback is null
      ? default
      : new ZipExtractProgress
      {
        EntriesCount = source.Entries.Count(),
        BytesTotal = source.Entries.Sum(e => e.Length)
      };

    foreach (ZipArchiveEntry entry in source.Entries)
    {
      await entry.ExtractRelativeToDirectoryAsync(destinationDirectoryName, overwriteFiles, progressCallback, progress, cancellationToken);
      progress.EntriesExtracted += 1;
    }

    progressCallback?.Invoke(progress);
  }

  private static async Task ExtractRelativeToDirectoryAsync(
    this ZipArchiveEntry       source,
    string                     destinationDirectoryName,
    bool                       overwrite,
    Action<ZipExtractProgress> progressCallback,
    ZipExtractProgress         progress,
    CancellationToken          cancellationToken = default
  )
  {
    string path1 = destinationDirectoryName != null
      ? Directory.CreateDirectory(destinationDirectoryName).FullName
      : throw new ArgumentNullException(nameof(destinationDirectoryName));
    if (!path1.EndsWith(Path.DirectorySeparatorChar))
      path1 += Path.DirectorySeparatorChar.ToString();
    string fullPath = Path.GetFullPath(Path.Combine(path1, source.FullName));
    if (!fullPath.StartsWith(path1))
      throw new IOException("Extracting Zip entry would have resulted in a file outside the specified destination directory.");
    if (Path.GetFileName(fullPath).Length == 0)
    {
      if (source.Length != 0L)
      {
        throw new IOException("Zip entry name ends in directory separator character but contains data.");
      }

      Directory.CreateDirectory(fullPath);
    }
    else
    {
      Directory.CreateDirectory(Path.GetDirectoryName(fullPath));
      await source.ExtractToFileAsync(fullPath, overwrite, progressCallback, progress, cancellationToken);
    }
  }

  private static async Task ExtractToFileAsync(
    this ZipArchiveEntry       source,
    string                     destinationFileName,
    bool                       overwrite,
    Action<ZipExtractProgress> progressCallback,
    ZipExtractProgress         progress,
    CancellationToken          cancellationToken = default
  )
  {
    var initialBytesExtracted = progress.BytesExtracted;
    const int bufferSize     = 81920;

    if (source == null)
      throw new ArgumentNullException(nameof(source));
    if (destinationFileName == null)
      throw new ArgumentNullException(nameof(destinationFileName));
    FileMode mode = overwrite ? FileMode.Create : FileMode.CreateNew;
    using (Stream destination = new FileStream(destinationFileName, mode, FileAccess.Write, FileShare.None, bufferSize, false))
    {
      using (Stream stream = source.Open())
      {
        if (progressCallback != null)
        {
          await stream.CopyToAsyncWithProgress(
            destination,
            bytesCopied =>
            {
              progress.BytesExtracted = initialBytesExtracted + bytesCopied;
              progressCallback(progress);
            },
            bufferSize,
            cancellationToken
          );
        }
        else
        {
          await stream.CopyToAsync(destination, bufferSize, cancellationToken);
        }
      }
    }

    File.SetLastWriteTime(destinationFileName, source.LastWriteTime.DateTime);
  }
}

public interface IZipExtractProgress
{
  int  EntriesCount     { get; }
  int  EntriesExtracted { get; }
  long BytesTotal       { get; }
  long BytesExtracted   { get; }
  bool IsFinished       => EntriesCount == EntriesExtracted && BytesExtracted == BytesTotal;
}

class ZipExtractProgress : IZipExtractProgress
{
  public int  EntriesCount     { get; set; }
  public int  EntriesExtracted { get; set; }
  public long BytesTotal       { get; set; }
  public long BytesExtracted   { get; set; }
}

@carlossanlop
Copy link
Member

All the ZipFile and ZipArchive async APIs should be implemented together. For that reason, I closed #1560 in favor of this issue. There was a good discussion there we should use for reference.

There are some problems with the way ZipArchive is implemented that would make it complex to implement the async methods:

  • BinaryReader/BinaryWriter needs to also implement async APIs, so we need the proposal for those first.
  • We perform many writes and reads using these binary classes. Ideally, we should first reduce the number of calls for each block, similar to how we implemented System.Formats.Tar: Improve performance of Tar library #74281
  • Additionally, we know where each metadata field is located in the zip blocks, so we should explore rewriting the code in a way that would let us handle unseekable streams. This is something I had to recently investigate for System.Formats.Tar: Tar: Improve unseekable stream handling #84279
  • And finally: the zip blocks we use were implemented using structs. These are not ideal for async code. We need to consider changing them to classes, but being careful to not affect perf. I had initially fallen into this trap when working on System.Formats.Tar, but we were able to address it before shipping: Convert tar header struct to class #72472

@lvde0
Copy link

lvde0 commented Apr 18, 2024

This issue is now 8 years old... Are there any plans to target this in the near future? 😕

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.IO.Compression
Projects
None yet
Development

No branches or pull requests