Tar: Improve unseekable stream handling #84279

Merged on Jun 1, 2023 (22 commits)
Changes from 17 commits
Commits
295613c
Add tests that verify we handle unseekable streams correctly.
carlossanlop Apr 3, 2023
d627497
Add expected data field locations for all supported formats.
carlossanlop Apr 3, 2023
018f709
Add exception message for when attempting to write an unseekable data…
carlossanlop Apr 3, 2023
9092cb4
Add seekability validation in public TarWriter entry writing methods.
carlossanlop Apr 3, 2023
6c460eb
Add TarFile stream roundtrip tests for unseekable streams.
carlossanlop Apr 4, 2023
6ea8e0f
Add missing async TarFile roundtrip tests.
carlossanlop Apr 4, 2023
f87b9e9
Support unseekable streams in TarHeader.Write.
carlossanlop Apr 4, 2023
e94fefb
Reuse and simplify the code.
carlossanlop Apr 4, 2023
15523ac
More reuse, remove unused and not needed.
carlossanlop Apr 4, 2023
5979be9
Remove TarFile.CreateFromDirectoryAsync.File.Roundtrip.cs. Submit it …
carlossanlop Apr 4, 2023
b702a44
Remove unnecessary resx comments.
carlossanlop May 23, 2023
625a619
Dedicated method for writing fields to buffer depending on the format.
carlossanlop May 23, 2023
4f702c4
Specify `Data` in name of method that expects unseekable data stream.…
carlossanlop May 23, 2023
0054f99
Delete unnecessary method.
carlossanlop May 23, 2023
2fc23de
Rename WritePadding to WriteEmptyPadding
carlossanlop May 23, 2023
d81b5a8
Rename test variables
carlossanlop May 24, 2023
796d542
Merge identical test arrays into one
carlossanlop May 24, 2023
94157b0
Invert if else to be more clear about conditions
carlossanlop May 24, 2023
1557885
remove size assign comment
carlossanlop May 24, 2023
943988b
Remove redundant debug assert
carlossanlop May 24, 2023
abff5c0
Async padding byte array creation simplification
carlossanlop May 24, 2023
66dc094
Apply suggestions from code review
adamsitnik May 25, 2023
64 changes: 4 additions & 60 deletions src/libraries/System.Formats.Tar/src/Resources/Strings.resx
@@ -1,64 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
<root>
- <!--
- Microsoft ResX Schema
-
- Version 2.0
-
- The primary goals of this format is to allow a simple XML format
- that is mostly human readable. The generation and parsing of the
- various data types are done through the TypeConverter classes
- associated with the data types.
-
- Example:
-
- ... ado.net/XML headers & schema ...
- <resheader name="resmimetype">text/microsoft-resx</resheader>
- <resheader name="version">2.0</resheader>
- <resheader name="reader">System.Resources.ResXResourceReader, System.Windows.Forms, ...</resheader>
- <resheader name="writer">System.Resources.ResXResourceWriter, System.Windows.Forms, ...</resheader>
- <data name="Name1"><value>this is my long string</value><comment>this is a comment</comment></data>
- <data name="Color1" type="System.Drawing.Color, System.Drawing">Blue</data>
- <data name="Bitmap1" mimetype="application/x-microsoft.net.object.binary.base64">
- <value>[base64 mime encoded serialized .NET Framework object]</value>
- </data>
- <data name="Icon1" type="System.Drawing.Icon, System.Drawing" mimetype="application/x-microsoft.net.object.bytearray.base64">
- <value>[base64 mime encoded string representing a byte array form of the .NET Framework object]</value>
- <comment>This is a comment</comment>
- </data>
-
- There are any number of "resheader" rows that contain simple
- name/value pairs.
-
- Each data row contains a name, and value. The row also contains a
- type or mimetype. Type corresponds to a .NET class that support
- text/value conversion through the TypeConverter architecture.
- Classes that don't support this are serialized and stored with the
- mimetype set.
-
- The mimetype is used for serialized objects, and tells the
- ResXResourceReader how to depersist the object. This is currently not
- extensible. For a given mimetype the value must be set accordingly:
-
- Note - application/x-microsoft.net.object.binary.base64 is the format
- that the ResXResourceWriter will generate, however the reader can
- read any of the formats listed below.
-
- mimetype: application/x-microsoft.net.object.binary.base64
- value : The object must be serialized with
- : System.Runtime.Serialization.Formatters.Binary.BinaryFormatter
- : and then encoded with base64 encoding.
-
- mimetype: application/x-microsoft.net.object.soap.base64
- value : The object must be serialized with
- : System.Runtime.Serialization.Formatters.Soap.SoapFormatter
- : and then encoded with base64 encoding.
-
- mimetype: application/x-microsoft.net.object.bytearray.base64
- value : The object must be serialized into a byte array
- : using a System.ComponentModel.TypeConverter
- : and then encoded with base64 encoding.
- -->
<xsd:schema id="root" xmlns="" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
<xsd:import namespace="http://www.w3.org/XML/1998/namespace" />
<xsd:element name="root" msdata:IsDataSet="true">
@@ -270,4 +211,7 @@
<data name="TarExtAttrDisallowedValueChar" xml:space="preserve">
<value>The value of the extended attribute key '{0}' contains a disallowed '{1}' character.</value>
</data>
- </root>
+ <data name="TarStreamSeekabilityUnsupportedCombination" xml:space="preserve">
+ <value>Cannot write the unseekable data stream of entry '{0}' into an unseekable archive stream.</value>
+ </data>
+ </root>
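
The resource string added above names the one stream combination the writer rejects: an unseekable data stream written into an unseekable archive stream. The sketch below restates that rule as a standalone predicate. `IsUnsupportedCombination` is a hypothetical helper, not a method from this PR, and `GZipStream` is used only as a convenient stand-in for a stream whose `CanSeek` is `false`.

```csharp
#nullable enable
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper (not part of the PR): true only when neither stream can seek,
// which is the situation the new error message describes.
static bool IsUnsupportedCombination(Stream archiveStream, Stream? dataStream) =>
    !archiveStream.CanSeek && dataStream is not null && !dataStream.CanSeek;

// A MemoryStream can seek; a GZipStream never can.
using MemoryStream seekable = new(new byte[] { 1, 2, 3 });
using GZipStream unseekable = new(new MemoryStream(), CompressionMode.Compress);

Console.WriteLine(IsUnsupportedCombination(unseekable, unseekable)); // True
Console.WriteLine(IsUnsupportedCombination(unseekable, seekable));   // False
Console.WriteLine(IsUnsupportedCombination(seekable, unseekable));   // False
Console.WriteLine(IsUnsupportedCombination(unseekable, null));       // False
```

If the entry data fits in memory, one possible workaround for the rejected case is to copy the unseekable data stream into a `MemoryStream` and assign that to the entry instead.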
Original file line number Diff line number Diff line change
@@ -49,5 +49,9 @@ internal static class FieldLocations
internal const ushort V7Padding = LinkName + FieldLengths.LinkName;
internal const ushort PosixPadding = Prefix + FieldLengths.Prefix;
internal const ushort GnuPadding = RealSize + FieldLengths.RealSize;

+ internal const ushort V7Data = V7Padding + FieldLengths.V7Padding;
+ internal const ushort PosixData = PosixPadding + FieldLengths.PosixPadding;
+ internal const ushort GnuData = GnuPadding + FieldLengths.GnuPadding;
}
}
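
The three constants added above each take the offset where a format's padding begins and add the padding length, which gives the offset where the entry data begins. The arithmetic can be sketched as below. It uses the field widths of the common tar header layouts rather than the PR's `FieldLengths` values, and it assumes the padding runs to the end of the 512-byte header record. All names in the snippet are illustrative.

```csharp
using System;
using System.Linq;

// Field widths in bytes, taken from the common tar header layouts.
// Illustrative values only; they are not copied from the PR's FieldLengths class.
int[] v7Fields = { 100, 8, 8, 8, 12, 12, 8, 1, 100 }; // name, mode, uid, gid, size, mtime, checksum, typeflag, linkname
int[] ustarFields = { 6, 2, 32, 32, 8, 8 };           // magic, version, uname, gname, devmajor, devminor
int[] posixFields = { 155 };                          // prefix
int[] gnuFields = { 12, 12, 12, 4, 1, 96, 1, 12 };    // atime, ctime, offset, longnames, unused, sparse, isextended, realsize

const int RecordSize = 512; // assumption: the header occupies exactly one 512-byte record

// Offset at which each format's trailing padding starts.
int v7Padding = v7Fields.Sum();
int posixPadding = v7Padding + ustarFields.Sum() + posixFields.Sum();
int gnuPadding = v7Padding + ustarFields.Sum() + gnuFields.Sum();

// Padding length needed so that the data starts at the record boundary.
Console.WriteLine($"V7:    padding at {v7Padding}, length {RecordSize - v7Padding}");       // V7:    padding at 257, length 255
Console.WriteLine($"Posix: padding at {posixPadding}, length {RecordSize - posixPadding}"); // Posix: padding at 500, length 12
Console.WriteLine($"Gnu:   padding at {gnuPadding}, length {RecordSize - gnuPadding}");     // Gnu:   padding at 495, length 17
```

Under that assumption, all three data offsets come out to the same value, 512, even though the padding starts at a different place in each format.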

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
@@ -222,6 +222,7 @@ public void WriteEntry(TarEntry entry)
ObjectDisposedException.ThrowIf(_isDisposed, this);
ArgumentNullException.ThrowIfNull(entry);
ValidateEntryLinkName(entry._header._typeFlag, entry._header._linkName);
+ ValidateStreamsSeekability(entry);
WriteEntryInternal(entry);
}

@@ -270,6 +271,7 @@ public Task WriteEntryAsync(TarEntry entry, CancellationToken cancellationToken
ObjectDisposedException.ThrowIf(_isDisposed, this);
ArgumentNullException.ThrowIfNull(entry);
ValidateEntryLinkName(entry._header._typeFlag, entry._header._linkName);
+ ValidateStreamsSeekability(entry);
return WriteEntryAsyncInternal(entry, cancellationToken);
}

@@ -281,12 +283,8 @@ private void WriteEntryInternal(TarEntry entry)

switch (entry.Format)
{
- case TarEntryFormat.V7:
- entry._header.WriteAsV7(_archiveStream, buffer);
- break;
-
- case TarEntryFormat.Ustar:
- entry._header.WriteAsUstar(_archiveStream, buffer);
+ case TarEntryFormat.V7 or TarEntryFormat.Ustar:
+ entry._header.WriteAs(entry.Format, _archiveStream, buffer);
break;

case TarEntryFormat.Pax:
@@ -323,8 +321,7 @@ private async Task WriteEntryAsyncInternal(TarEntry entry, CancellationToken can

Task task = entry.Format switch
{
- TarEntryFormat.V7 => entry._header.WriteAsV7Async(_archiveStream, buffer, cancellationToken),
- TarEntryFormat.Ustar => entry._header.WriteAsUstarAsync(_archiveStream, buffer, cancellationToken),
+ TarEntryFormat.V7 or TarEntryFormat.Ustar => entry._header.WriteAsAsync(entry.Format, _archiveStream, buffer, cancellationToken),
TarEntryFormat.Pax when entry._header._typeFlag is TarEntryType.GlobalExtendedAttributes => entry._header.WriteAsPaxGlobalExtendedAttributesAsync(_archiveStream, buffer, _nextGlobalExtendedAttributesEntryNumber++, cancellationToken),
TarEntryFormat.Pax => entry._header.WriteAsPaxAsync(_archiveStream, buffer, cancellationToken),
TarEntryFormat.Gnu => entry._header.WriteAsGnuAsync(_archiveStream, buffer, cancellationToken),
@@ -374,6 +371,14 @@ private async ValueTask WriteFinalRecordsAsync()
return (fullPath, actualEntryName);
}

+ private void ValidateStreamsSeekability(TarEntry entry)
+ {
+ if (!_archiveStream.CanSeek && entry._header._dataStream != null && !entry._header._dataStream.CanSeek)
+ {
+ throw new IOException(SR.Format(SR.TarStreamSeekabilityUnsupportedCombination, entry.Name));
+ }
+ }

private static void ValidateEntryLinkName(TarEntryType entryType, string? linkName)
{
if (entryType is TarEntryType.HardLink or TarEntryType.SymbolicLink)
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.IO.Enumeration;
using System.Linq;
using Xunit;

@@ -204,5 +204,65 @@ public void PaxNameCollision_DedupInExtendedAttributes()
Assert.True(File.Exists(path1));
Assert.True(Path.Exists(path2));
}

+ [Theory]
+ [MemberData(nameof(GetTestTarFormats))]
+ public void UnseekableStreams_RoundTrip(TestTarFormat testFormat)
+ {
+ using TempDirectory root = new();
+
+ using MemoryStream sourceStream = GetTarMemoryStream(CompressionMethod.Uncompressed, testFormat, "many_small_files");
+ using WrappedStream sourceUnseekableArchiveStream = new(sourceStream, canRead: true, canWrite: false, canSeek: false);
+
+ TarFile.ExtractToDirectory(sourceUnseekableArchiveStream, root.Path, overwriteFiles: false);
+
+ using MemoryStream destinationStream = new();
+ using WrappedStream destinationUnseekableArchiveStream = new(destinationStream, canRead: true, canWrite: true, canSeek: false);
+ TarFile.CreateFromDirectory(root.Path, destinationUnseekableArchiveStream, includeBaseDirectory: false);
+
+ FileSystemEnumerable<FileSystemInfo> fileSystemEntries = new FileSystemEnumerable<FileSystemInfo>(
+ directory: root.Path,
+ transform: (ref FileSystemEntry entry) => entry.ToFileSystemInfo(),
+ options: new EnumerationOptions() { RecurseSubdirectories = true });
+
+ destinationStream.Position = 0;
+ using TarReader reader = new TarReader(destinationStream, leaveOpen: false);
+
+ // The files in many_small_files.tar are expected to be tiny and all the same size
+ int bufferLength = 1024;
+ byte[] fileContent = new byte[bufferLength];
+ byte[] dataStreamContent = new byte[bufferLength];
+ TarEntry entry = reader.GetNextEntry();
+ do
+ {
+ Assert.NotNull(entry);
+ string entryPath = Path.TrimEndingDirectorySeparator(Path.GetFullPath(Path.Join(root.Path, entry.Name)));
+ FileSystemInfo fsi = fileSystemEntries.SingleOrDefault(file =>
+ file.FullName == entryPath);
+ Assert.NotNull(fsi);
+ if (entry.EntryType is TarEntryType.RegularFile or TarEntryType.V7RegularFile)
+ {
+ Assert.NotNull(entry.DataStream);
+
+ using Stream fileData = File.OpenRead(fsi.FullName);
+
+ // If the size of the files in many_small_files.tar ever gets larger than bufferLength,
+ // these asserts should fail and the test will need to be updated
+ AssertExtensions.LessThanOrEqualTo(entry.Length, bufferLength);
+ AssertExtensions.LessThanOrEqualTo(fileData.Length, bufferLength);
+
+ Assert.Equal(fileData.Length, entry.Length);
+
+ Array.Clear(fileContent);
+ Array.Clear(dataStreamContent);
+
+ fileData.ReadExactly(fileContent, 0, (int)entry.Length);
+ entry.DataStream.ReadExactly(dataStreamContent, 0, (int)entry.Length);
+
+ AssertExtensions.SequenceEqual(fileContent, dataStreamContent);
+ }
+ }
+ while ((entry = reader.GetNextEntry()) != null);
+ }
}
}
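
The round-trip test above relies on `WrappedStream`, a test helper, to simulate an archive stream that cannot seek. A common real-world case of the same situation is a `.tar.gz` archive, because `GZipStream` never supports seeking. The sketch below is a minimal illustration of that case, not code from this PR. The entry name and payload are made up, and the data stream is a seekable `MemoryStream` so the write stays within the supported combinations.

```csharp
#nullable enable
using System;
using System.Formats.Tar;
using System.IO;
using System.IO.Compression;
using System.Text;

byte[] payload = Encoding.UTF8.GetBytes("hello tar");
using MemoryStream compressed = new();

// Write: the archive stream is a GZipStream (CanSeek is false),
// while the entry's data stream is a seekable MemoryStream.
using (GZipStream gzipOut = new(compressed, CompressionMode.Compress, leaveOpen: true))
using (TarWriter writer = new(gzipOut, TarEntryFormat.Pax, leaveOpen: true))
{
    PaxTarEntry entry = new(TarEntryType.RegularFile, "file.txt")
    {
        DataStream = new MemoryStream(payload)
    };
    writer.WriteEntry(entry);
}

// Read: the archive stream is again unseekable, so copyData: true asks the reader
// to copy the entry data into memory before it advances past it.
compressed.Position = 0;
using GZipStream gzipIn = new(compressed, CompressionMode.Decompress);
using TarReader reader = new(gzipIn);

TarEntry? read = reader.GetNextEntry(copyData: true);
if (read?.DataStream is null)
{
    throw new InvalidDataException("Expected one regular file entry with data.");
}

using StreamReader text = new(read.DataStream);
string content = text.ReadToEnd();
Console.WriteLine($"{read.Name}: {content}");
```

Passing `copyData: true` trades memory for safety: the entry data stays readable after the reader moves on, which an unseekable archive stream cannot otherwise offer.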
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.IO.Enumeration;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
@@ -269,5 +269,65 @@ public async Task PaxNameCollision_DedupInExtendedAttributesAsync()
Assert.True(File.Exists(path1));
Assert.True(Path.Exists(path2));
}

+ [Theory]
+ [MemberData(nameof(GetTestTarFormats))]
+ public async Task UnseekableStreams_RoundTrip_Async(TestTarFormat testFormat)
+ {
+ using TempDirectory root = new();
+
+ await using MemoryStream sourceStream = GetTarMemoryStream(CompressionMethod.Uncompressed, testFormat, "many_small_files");
+ await using WrappedStream sourceUnseekableArchiveStream = new(sourceStream, canRead: true, canWrite: false, canSeek: false);
+
+ await TarFile.ExtractToDirectoryAsync(sourceUnseekableArchiveStream, root.Path, overwriteFiles: false);
+
+ await using MemoryStream destinationStream = new();
+ await using WrappedStream destinationUnseekableArchiveStream = new(destinationStream, canRead: true, canWrite: true, canSeek: false);
+ await TarFile.CreateFromDirectoryAsync(root.Path, destinationUnseekableArchiveStream, includeBaseDirectory: false);
+
+ FileSystemEnumerable<FileSystemInfo> fileSystemEntries = new FileSystemEnumerable<FileSystemInfo>(
+ directory: root.Path,
+ transform: (ref FileSystemEntry entry) => entry.ToFileSystemInfo(),
+ options: new EnumerationOptions() { RecurseSubdirectories = true });
+
+ destinationStream.Position = 0;
+ await using TarReader reader = new TarReader(destinationStream, leaveOpen: false);
+
+ // The files in many_small_files.tar are expected to be tiny and all the same size
+ int bufferLength = 1024;
+ byte[] fileContent = new byte[bufferLength];
+ byte[] dataStreamContent = new byte[bufferLength];
+ TarEntry entry = await reader.GetNextEntryAsync();
+ do
+ {
+ Assert.NotNull(entry);
+ string entryPath = Path.TrimEndingDirectorySeparator(Path.GetFullPath(Path.Join(root.Path, entry.Name)));
+ FileSystemInfo fsi = fileSystemEntries.SingleOrDefault(file =>
+ file.FullName == entryPath);
+ Assert.NotNull(fsi);
+ if (entry.EntryType is TarEntryType.RegularFile or TarEntryType.V7RegularFile)
+ {
+ Assert.NotNull(entry.DataStream);
+
+ await using Stream fileData = File.OpenRead(fsi.FullName);
+
+ // If the size of the files in many_small_files.tar ever gets larger than bufferLength,
+ // these asserts should fail and the test will need to be updated
+ AssertExtensions.LessThanOrEqualTo(entry.Length, bufferLength);
+ AssertExtensions.LessThanOrEqualTo(fileData.Length, bufferLength);
+
+ Assert.Equal(fileData.Length, entry.Length);
+
+ Array.Clear(fileContent);
+ Array.Clear(dataStreamContent);
+
+ await fileData.ReadExactlyAsync(fileContent, 0, (int)entry.Length);
+ await entry.DataStream.ReadExactlyAsync(dataStreamContent, 0, (int)entry.Length);
+
+ AssertExtensions.SequenceEqual(fileContent, dataStreamContent);
+ }
+ }
+ while ((entry = await reader.GetNextEntryAsync()) != null);
+ }
}
}
Original file line number Diff line number Diff line change
@@ -161,13 +161,18 @@ public void GetNextEntry_CopyDataTrue_UnseekableArchive()
Assert.Throws<ObjectDisposedException>(() => entry.DataStream.Read(new byte[1]));
}

- [Fact]
- public void GetNextEntry_CopyDataFalse_UnseekableArchive_Exceptions()
+ [Theory]
+ [InlineData(TarEntryFormat.V7)]
+ [InlineData(TarEntryFormat.Ustar)]
+ [InlineData(TarEntryFormat.Pax)]
+ [InlineData(TarEntryFormat.Gnu)]
+ public void GetNextEntry_CopyDataFalse_UnseekableArchive_Exceptions(TarEntryFormat format)
{
- MemoryStream archive = new MemoryStream();
- using (TarWriter writer = new TarWriter(archive, TarEntryFormat.Ustar, leaveOpen: true))
+ TarEntryType fileEntryType = GetTarEntryTypeForTarEntryFormat(TarEntryType.RegularFile, format);
+ using MemoryStream archive = new MemoryStream();
+ using (TarWriter writer = new TarWriter(archive, format, leaveOpen: true))
{
- UstarTarEntry entry1 = new UstarTarEntry(TarEntryType.RegularFile, "file.txt");
+ TarEntry entry1 = InvokeTarEntryCreationConstructor(format, fileEntryType, "file.txt");
entry1.DataStream = new MemoryStream();
using (StreamWriter streamWriter = new StreamWriter(entry1.DataStream, leaveOpen: true))
{
@@ -176,30 +181,34 @@ public void GetNextEntry_CopyDataFalse_UnseekableArchive_Exceptions()
entry1.DataStream.Seek(0, SeekOrigin.Begin); // Rewind to ensure it gets written from the beginning
writer.WriteEntry(entry1);

- UstarTarEntry entry2 = new UstarTarEntry(TarEntryType.Directory, "dir");
+ TarEntry entry2 = InvokeTarEntryCreationConstructor(format, TarEntryType.Directory, "dir");
writer.WriteEntry(entry2);
}

archive.Seek(0, SeekOrigin.Begin);
using WrappedStream wrapped = new WrappedStream(archive, canRead: true, canWrite: false, canSeek: false);
- UstarTarEntry entry;
+ TarEntry entry;
+ byte[] b = new byte[1];
using (TarReader reader = new TarReader(wrapped)) // Unseekable
{
- entry = reader.GetNextEntry(copyData: false) as UstarTarEntry;
+ entry = reader.GetNextEntry(copyData: false);
Assert.NotNull(entry);
- Assert.Equal(TarEntryType.RegularFile, entry.EntryType);
+ Assert.Equal(fileEntryType, entry.EntryType);
entry.DataStream.ReadByte(); // Reading is possible as long as we don't move to the next entry

// Attempting to read the next entry should automatically move the position pointer to the beginning of the next header
- Assert.NotNull(reader.GetNextEntry());
+ TarEntry entry2 = reader.GetNextEntry();
+ Assert.NotNull(entry2);
+ Assert.Equal(format, entry2.Format);
+ Assert.Equal(TarEntryType.Directory, entry2.EntryType);
Assert.Null(reader.GetNextEntry());

// This is not possible because the position of the main stream is already past the data
- Assert.Throws<EndOfStreamException>(() => entry.DataStream.Read(new byte[1]));
+ Assert.Throws<EndOfStreamException>(() => entry.DataStream.Read(b));
}

// The reader must stay alive because it's in charge of disposing all the entries it collected
- Assert.Throws<ObjectDisposedException>(() => entry.DataStream.Read(new byte[1]));
+ Assert.Throws<ObjectDisposedException>(() => entry.DataStream.Read(b));
}

[Theory]