Truncated TAR archive error during decompressing tar file #20269

Closed
meteorcloudy opened this issue Nov 20, 2023 · 17 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: bug

Comments

@meteorcloudy
Member

Description of the bug:

Context: #20090 (comment)

Which category does this issue belong to?

External Dependency

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Can be reproduced on macOS with the same repo as #20090 (comment)

Which operating system are you running Bazel on?

macOS

What is the output of bazel info release?

No response

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@meteorcloudy meteorcloudy added P1 I'll work on this now. (Assignee required) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. and removed untriaged labels Nov 20, 2023
@meteorcloudy meteorcloudy changed the title from "Truncated TAR archive" to "Truncated TAR archive during decompressing tar file" Nov 20, 2023
@meteorcloudy meteorcloudy changed the title from "Truncated TAR archive during decompressing tar file" to "Truncated TAR archive error during decompressing tar file" Nov 20, 2023
@meteorcloudy
Member Author

@bazel-io fork 7.0.0

@meteorcloudy
Member Author

I can confirm this still happens even if upgrading commons-compress to the latest version (1.25.0)

@meteorcloudy
Member Author

/cc @tjgq @Wyverald

@meteorcloudy
Member Author

The error is raised in commons-compress (https://github.com/search?q=repo%3Aapache%2Fcommons-compress+%22Truncated+TAR+archive%22&type=code). Could there be an actual problem with the tar file?

@Wyverald
Member

I can confirm this is an issue, but having spent a fair chunk of time trying to understand the TAR format, I can only deduce that the issue stems from somewhere within the Apache Commons Compress library. In any case, this wouldn't be a 7.0.0 regression; I'm pretty sure that we never supported sparse TARs. So I'm inclined to treat this as a "soft blocker": if all non-soft blockers are resolved, we should release 7.0.0 and look to maybe resolve this in a patch release.

> could there be an actual problem with the tar file?

GNU tar extracts the file just fine, so I'd say this is some feature disparity in the Java library.

@meteorcloudy meteorcloudy added P2 We'll consider working on this in future. (Assignee optional) and removed P1 I'll work on this now. (Assignee required) labels Nov 21, 2023
@meteorcloudy
Member Author

@FrancoisPoinsot since the root cause lies in commons-compress, there is little we can do in Bazel without an upstream fix. I'll have to downgrade this to P2 and remove it as a release blocker for 7.0.

@alexeagle
Contributor

@meteorcloudy is there an issue filed on commons-compress for this? Do you need community help to file that issue with a minimal repro? I'd really like to see the upstream maintainers' response to this.

As this was bumped from Bazel 7 I'm now going to be forced to add repository rules to call BSD tar to replace Bazel's extract logic, which will be some sad, long-lived tech debt :(

@meteorcloudy
Member Author

> is there an issue filed on commons-compress for this?

I tried, but didn't find any relevant issue.

> Do you need community help to file that issue with a minimal repro? I'd really like to see the upstream maintainers' response to this.

Yes, that would be very helpful! I'm currently stressed by some CI issues, unfortunately.

@FrancoisPoinsot

> @FrancoisPoinsot since the root cause lies in commons-compress, there is little we can do in Bazel without an upstream fix. I'll have to downgrade this to P2 and remove it as a release blocker for 7.0.

As far as I know, the problem is not new to 7.0.0.
I can confirm it was also present in 6.x.

@FrancoisPoinsot

My current workaround is to extract the file using the tar command and reference the extracted file using an http_file rule.
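
The manual step of that workaround could be sketched roughly as below. This is an illustration only: the function name `extract_and_hash` and its arguments are hypothetical, and the thread does not say where the extracted file is hosted afterwards. It extracts one member with the system tar and prints its SHA-256, which is the value an http_file rule's sha256 attribute would need.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset

# extract_and_hash ARCHIVE MEMBER OUT_DIR   (hypothetical helper, not part of Bazel)
# Extracts MEMBER from ARCHIVE into OUT_DIR with the system tar and prints the
# SHA-256 of the extracted file.
extract_and_hash() {
  local archive="$1" member="$2" out_dir="$3"
  mkdir -p "$out_dir"
  # Current GNU tar and bsdtar detect gzip compression on their own when reading.
  tar -xf "$archive" -C "$out_dir" "$member"
  # Linux usually ships sha256sum; macOS ships shasum.
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$out_dir/$member" | cut -d ' ' -f 1
  else
    shasum -a 256 "$out_dir/$member" | cut -d ' ' -f 1
  fi
}
```

With the archive from the repro in this thread, a call would look like `extract_and_hash ruff-aarch64-apple-darwin.tar.gz ruff extracted`.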

@alexeagle
Contributor

alexeagle commented Nov 21, 2023

Repro is trivial:

#!/usr/bin/env bash

set -o errexit -o nounset

echo "Downloading commons-compress"
wget https://repo1.maven.org/maven2/org/apache/commons/commons-compress/1.25.0/commons-compress-1.25.0.jar
echo "Downloading sample sparse archive"
wget https://github.com/astral-sh/ruff/releases/download/v0.1.6/ruff-aarch64-apple-darwin.tar.gz
gunzip ruff-aarch64-apple-darwin.tar.gz

echo "Testing with system tar"
tar -tf ruff-aarch64-apple-darwin.tar
echo "Testing with commons-compress"
java -jar commons-compress-1.25.0.jar ruff-aarch64-apple-darwin.tar

->

Testing with system tar
ruff
Testing with commons-compress
Analysing ruff-aarch64-apple-darwin.tar
Created org.apache.commons.compress.archivers.tar.TarArchiveInputStream@17f052a3
ruff
Exception in thread "main" java.io.IOException: Truncated TAR archive
        at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.read(TarArchiveInputStream.java:694)
        at org.apache.commons.compress.utils.IOUtils.readFully(IOUtils.java:244)
        at org.apache.commons.compress.utils.IOUtils.skip(IOUtils.java:355)
        at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:451)
        at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextEntry(TarArchiveInputStream.java:426)
        at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextEntry(TarArchiveInputStream.java:50)
        at org.apache.commons.compress.archivers.Lister.listStream(Lister.java:79)
        at org.apache.commons.compress.archivers.Lister.main(Lister.java:133)

The hard part is getting into my Jira account on the Apache foundation to file it. @tjgq do you have an account there to file it at https://issues.apache.org/jira/projects/COMPRESS/issues/COMPRESS-598?filter=allopenissues ? You're probably the better reporter as you've been doing the coding.

@rbtcollins

@Wyverald
Member

> https://issues.apache.org/jira/browse/COMPRESS-124 seems relevant

This seems to be about the originally missing support for sparse tarballs altogether. Our issue is more about the newly added support potentially having bugs.


I tried to sign up for a Jira account, which apparently requires human review and could take a few days. In the meantime, I sent an email to the mailing list ([email protected]); let's see if anyone picks it up.

@Wyverald
Member

@keith
Member

keith commented Dec 14, 2023

As a workaround you can do:

http_file(
    name = "ruff_macos",
    sha256 = "263d8ec3fd317b47dfefeae84d96e1894f87526f788394df59a0c6b013dac5d7",
    url = "https://github.com/astral-sh/ruff/releases/download/v0.1.8/ruff-0.1.8-x86_64-apple-darwin.tar.gz",
)

and then:

genrule(
    name = "ruff_bin",
    srcs = ["@ruff_macos//file"],
    outs = ["ruff-bin"],
    cmd = "tar -xvf $< && mv ruff $@",
)

since macOS tar handles this fine

@alexeagle
Contributor

Thanks Keith, I should have commented here that I worked around it in rules_lint in that way: https://github.com/aspect-build/rules_lint/pull/66/files#diff-88872655967d360b7907682cbc2461f815c86c2940469330183be99e6f1b3ec2R129-R137

Wyverald pushed a commit that referenced this issue May 8, 2024
Fixes #20269.

Update commons-compress to 1.26.1 and swap use of GZIPInputStream to commons-compress' GzipCompressorInputStream, which [deals correctly with concatenated gz files](https://github.com/apache/commons-compress/blob/53c5e19208caaf63946a41d2763cda1f1b7eadc8/src/main/java/org/apache/commons/compress/compressors/gzip/GzipCompressorInputStream.java#L38-L70). Add a test to demonstrate this fixes the ruff extraction (thanks, fmeum) and update all related lockfiles.

Closes #22213.

PiperOrigin-RevId: 631509796
Change-Id: I4038244bfbdfbace747554e988587663ca580c16
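
For readers unfamiliar with the term, "concatenated gz files" means several gzip members stored back to back in one file. The sketch below only illustrates that property of the gzip format with the system gzip tool; it is not Bazel's or commons-compress' code, and the file name and contents are made up for the example.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset

# Work in a throwaway directory so nothing in the current directory is touched.
workdir="$(mktemp -d)"

# Write two independent gzip members back to back into one file.
printf 'first member\n' | gzip -c > "$workdir/concat.gz"
printf 'second member\n' | gzip -c >> "$workdir/concat.gz"

# A reader that follows the gzip format keeps decoding after the first member
# ends, so both lines come out.
decoded="$(gzip -dc "$workdir/concat.gz")"
printf '%s\n' "$decoded"

rm -rf "$workdir"
```

A decoder that stops after the first member would return only the first line, and a tar reader layered on top of such a stream would then see less data than the tar headers announce.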
Kila2 pushed a commit to Kila2/bazel that referenced this issue May 13, 2024
@iancha1992
Member

A fix for this issue has been included in Bazel 7.2.0 RC1. Please test out the release candidate and report any issues as soon as possible.
If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=7.2.0rc1. Thanks!
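
A minimal sketch of trying the release candidate from a shell, assuming bazelisk is installed and on PATH (the fallback message is only illustrative):

```shell
# Select the release candidate for this shell session only.
export USE_BAZEL_VERSION=7.2.0rc1

# Bazelisk forwards its arguments to the selected Bazel binary, so this should
# report the version that was picked up.
if command -v bazelisk >/dev/null 2>&1; then
  bazelisk version
else
  echo "bazelisk not found on PATH"
fi
```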

matsubara0507 added a commit to matsubara0507/rules_elm that referenced this issue Aug 25, 2024