-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow spaces in octal attributes #74358
Allow spaces in octal attributes #74358
Conversation
Tagging subscribers to this area: @dotnet/area-system-io |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution, @am11. Left a few notes/questions.
@@ -208,6 +208,8 @@ internal static TarEntryType GetCorrectTypeFlagForFormat(TarEntryFormat format, | |||
T value = T.Zero; | |||
foreach (byte b in buffer) | |||
{ | |||
if (b == (byte)' ') continue; // skip space |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few notes:
- You could avoid the cast by using 0x20 (or 32 in decimal). The comment can stay since it clarifies it's a space.
- We should add a note explaining why space can be allowed here (some exotic archives might contain entries where the octal value contains unexpected spaces).
- In the tar archives that failed with this error, did you notice that the space was in the middle of the octal number? If not, maybe spaces are only fine to be detected in the edges of the span. I'm conflicted because this could mean complicating the method. What do you think?
- Which octal field threw this exception? I think we should only limit allowing spaces to the fields that failed. I do not like the idea of allowing spaces in all octal fields. But will await for your confirmation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(byte)' '
cast incurs no cost. In fact, it is the preferred style used extensively in this repo for readability (see e.g. two lines below in the same file).
In main
branch, we are already ignoring trailing nulls and spaces in all octal fields. while other implementations ignore both the leading and trailing spaces. I have not found any implementation which is allowing spaces in the middle of the value, so I agree that we should do something about it.
Given:
null null <space> null null 44 null 61 null null
null null <space> a b c null null 44 null 61 null null
lib-archive
octal parser parses 44 (octal 36) for both, while busybox tar parses first one as 44 and second one as 0. Note that it applies to all header attributes in both implementations.
I am leaning to follow lib-archive's behavior rather than busybox's, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carlossanlop, I went ahead and made the change.
what do you think?
From other comments, I assume your reply is you don't like libarchive's behavior, neither busybox's nor GNU tar's. Could you point me to the spec which suggests that we should only trim the space and null from the end of octal values, but not the beginning? i.e. why is this acceptable in the main
branch
buffer = TrimEndingNullsAndSpaces(buffer); |
but not when these bytes appear at the beginning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think those tar implementations are way too flexible, they are basically ignoring what the spec describes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I consulted these sources:
https://www.freebsd.org/cgi/man.cgi?tar(5)
https://www.gnu.org/software/tar/manual/html_node/tar_toc.html#SEC_Contents
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html
https://www.ibm.com/docs/en/zos/2.3.0?topic=descriptions-pax-interchange-portable-archives
@@ -20,19 +20,6 @@ public void MalformedArchive_TooSmall() | |||
Assert.Throws<EndOfStreamException>(() => reader.GetNextEntry()); | |||
} | |||
|
|||
[Fact] | |||
public void MalformedArchive_HeaderSize() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why were these tests removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but I am concerned that we now allow a whole header to contain invalid data. I do not think we want this level of flexiblity.
The test that is being removed would've thrown at some point due to having 512 consecutive 0x1 bytes. Let's analyze each one of the fields in the header:
From V7:
- name: This can be assumed to be valid since we store it as string and don't validate the value until we extract.
- mode, uid, guid, size, mtime, checksum: they all expect octal numbers. With your change, they would not throw if there are non-octal chars.
- linkname: Same as name.
From Ustar and Gnu:
- typeflag: We should be ok with any character here, even if it's not among the ones defined in our enum (
this should throw right now, but it's a bug unrelated to this change, we should allow any ASCII characterVerified, we allow values outside of the enum). - magic: this should throw. We only have three possible values: either it's all nulls, or it's the Ustar magic, or it's the Gnu magic.
- version: Similar to magic.
- uname/gname: would not throw, we store this as a string.
- devmajor/devminor: Octal, this would pass with your change.
- prefix: Same as name and linkname.
Then since the size had some data, but we didn't find any characters, we would set size as 0 with your change. So no data.
if (ds != null && ds.Length > 0) | ||
{ | ||
using var memoryStream = new MemoryStream(); | ||
ds.CopyTo(memoryStream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing is being done with the memory stream. What is this for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't look into the implementation of the stream that Tar returns; I read it into a dummy stream to ensure it could successfully read to the end. It seems a reasonable small check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah alright, that's fine. Although we already have tests that verify both seekable and unseekable data streams can be read.
if (entry.EntryType == TarEntryType.Directory) | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This directory entry check is unnecessary if the entry.DataStream is already being checked for null a few lines below.
[Theory] | ||
[InlineData("node-tar")] | ||
[InlineData("tar-rs")] | ||
public void Read_TestArchives_Successfully(string subset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be improved so that we can investigate failures much more easily.
Instead of iterating through the files inside the test method, you can pass the file names as a MemberData. Plus they can be reused.
public static IEnumerable<object[]> GetNodeTarFiles()
{
string nodeTarPath = Path.Join(Directory.GetCurrentDirectory(), "tar", "note-tar");
foreach (string file in Directory.EnumerateFiles(nodeTarPath, "*.tar", SearchOption.AllDirectories))
{
yield return new object[] { file };
}
}
public static IEnumerable<object[]> GetTarRsFiles()
{
string nodeTarPath = Path.Join(Directory.GetCurrentDirectory(), "tar", "tar-rs");
foreach (string file in Directory.EnumerateFiles(nodeTarPath, "*.tar", SearchOption.AllDirectories))
{
yield return new object[] { file };
}
}
[Theory]
[MemberData(nameof("GetNodeTarFiles"))]
[MemberData(nameof("GetTarRsFiles"))]
public void Read_TestArchives_Successfully(string subset)
Then you can remove the forloop inside the test method.
if (digit >= 8) | ||
{ | ||
ThrowInvalidNumber(); | ||
} | ||
if (b < (byte)'0' || b > (byte)'7') break; | ||
|
||
value = checked((value * octalFactor) + T.CreateTruncating(digit)); | ||
value = checked((value * octalFactor) + T.CreateTruncating((uint)(b - '0'))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which assets have octal fields with non-octal characters on either end? I'm concerned that we are basically allowing garbage and not respecting the tar spec anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea let's not anywhere intentionally be more lenient than we discover existing tools are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danmoseley, it is exactly as lenient as libarchive's octal parsing, it allows any non-octal character at the beginning and end.
GNU tar and busybox implementations allow both leading and trailing whitespaces and nulls.
.NET's implementation is the strictest one, we are only ignoring null and spaces when they appear at the end, and we do not allow them at the beginning.
I haven't find formal specification calling out for and against these behaviors, so it is up to us to decide which existing implementation to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But ignoring nulls and spaces is not the same as ignoring any character that is either smaller than '0' or larger than '7'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libarchive ignores any non-octal character. :)
https://github.com/libarchive/libarchive/blob/9e5279615033fa073d0bbae83b366359eab3b62f/contrib/untar.c#L45
if this is too lenient, I can confine it to only allow nulls and spaces.
ex = Record.Exception(() => entry.Length); | ||
Assert.Null(ex); | ||
Assert.NotNull(entry.Name); | ||
Assert.True(Enum.IsDefined(entry.EntryType)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would fail if an entry of one of the node-tar and tar-rs asserts has an entry type that is not in our TarEntryType enum. I checked, and we pass the value directly without validating it is among the predefined enums:
runtime/src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Read.cs
Line 378 in d80d71f
typeFlag: (TarEntryType)buffer[FieldLocations.TypeFlag]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those files have known entry types, so this assert is valid for the current set of assets. If in the future, we ever updated assets in those directories from upstream into dotnet/runtime-assets, and those new files have some unknown entry types (very unlikely), we can adjust the code accordingly.
842dc3f
to
96a88cc
Compare
Of those assets which fail when their octal fields have unexpected characters, do we know what tool was used to create them? |
Closing this in favor of #74412 . |
Contributes to #74316