[release/7.0] Tar: Remove invalidation of whitespace in PAX extended attributes #78785
Manual backport of #78465, #78707 and #78744 to `release/7.0`.
Customer Impact
A customer reported #78456, in which they described that when a PAX entry in a tar archive contained extended attributes with whitespace characters, these characters would get treated as the end of the key or value, or would get trimmed.
The bug surfaced when extracting an archive that contained entries with filenames longer than what could fit in the standard `name` metadata field. In these cases, the expected behavior is to have the full path name added to the extended attributes dictionary as the value of the `path` key, and the standard `name` field is ignored.
But when the path contained spaces, we were treating the space as the ending character of the value. If two entries with very long paths got truncated at the same space, the extracted file paths would be identical, causing our `TarFile` extraction methods to think the file already existed on disk, and we would throw.
The fix consisted of removing the logic that ignored or trimmed the whitespace in the middle of the path. We already know how long the key or the value in the dictionary is supposed to be, so there's no need to truncate.
Also, the Tar specs do not explicitly disallow spaces in keys or values, so we are providing more usage flexibility to users.
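To illustrate why the length prefix makes whitespace handling unnecessary, here is a minimal Python sketch. It is not the .NET implementation. It assumes the POSIX pax record layout `<length> <key>=<value>\n`, where `<length>` is the decimal byte count of the whole record, including the length digits themselves. The helper names `parse_pax_records` and `make_pax_record` are hypothetical and exist only for this example.

```python
def make_pax_record(key: str, value: str) -> bytes:
    """Build one record b'<length> <key>=<value>\\n' (hypothetical helper)."""
    body = f" {key}={value}\n".encode("utf-8")
    # The length counts its own digits, so iterate until it is self-consistent.
    length = len(body) + 1
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body


def parse_pax_records(data: bytes) -> dict:
    """Parse consecutive pax records, slicing by the declared length only."""
    attributes = {}
    pos = 0
    while pos < len(data):
        # Only the first space after the length digits is a delimiter.
        space = data.index(b" ", pos)
        length = int(data[pos:space])
        # Everything up to pos + length belongs to this record, spaces included.
        record = data[space + 1 : pos + length]
        if not record.endswith(b"\n"):
            raise ValueError("pax record does not end with a newline")
        key, sep, value = record[:-1].partition(b"=")
        if not sep:
            raise ValueError("pax record is missing '='")
        attributes[key.decode("utf-8")] = value.decode("utf-8")
        pos += length
    return attributes


if __name__ == "__main__":
    # Two long paths that differ only after a space stay distinct.
    first = make_pax_record("path", "some long dir/file one.txt")
    second = make_pax_record("path", "some long dir/file two.txt")
    print(parse_pax_records(first)["path"])
    print(parse_pax_records(second)["path"])
```

Because each record is sliced by its declared length, a space inside the key or the value is ordinary data and never terminates the field.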
Testing
Added unit tests to verify:
Risk
Low. This is a bug in a new feature in 7.0, and we are removing logic that is not explicitly indicated in the Tar spec.