InvalidArchiveError raised for valid tarballs #71

Closed
desilinguist opened this issue Oct 9, 2020 · 16 comments · Fixed by #77
Labels
locked [bot] locked due to inactivity

Comments

@desilinguist

desilinguist commented Oct 9, 2020

For some reason, some of my large conda packages are no longer installed properly and fail with InvalidArchiveError. I have narrowed down the problem to conda-package-handling. Here's what I see:

>>> from conda_package_handling.api import extract, libarchive_enabled
>>> libarchive_enabled
True
>>> extract('/tmp/gug-data-1.0-0.tar.bz2')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/python/conda_default/lib/python3.6/site-packages/conda_package_handling/api.py", line 58, in extract
    SUPPORTED_EXTENSIONS[ext].extract(fn, dest_dir, components=components)
  File "/opt/python/conda_default/lib/python3.6/site-packages/conda_package_handling/tarball.py", line 135, in extract
    _tar_xf(fn, dest_dir)
  File "/opt/python/conda_default/lib/python3.6/site-packages/conda_package_handling/tarball.py", line 88, in _tar_xf
    archive_utils.extract_file(tarball)
  File "/opt/python/conda_default/lib/python3.6/site-packages/conda_package_handling/archive_utils.py", line 15, in extract_file
    raise InvalidArchiveError(tarball, error_str.decode('utf-8'))
conda_package_handling.exceptions.InvalidArchiveError: Error with archive /tmp/gug-data-1.0-0.tar.bz2.  You probably need to delete and re-download or re-create this file.  Message from libarchive was:

Seek failed

This happens on a RHEL 7.9 machine with conda-package-handling v1.6.1. However, I am also able to reproduce the same issue on my MacBook Pro (macOS 10.14.6) with conda-package-handling v1.7.0. If I manually delete site-packages/conda_package_handling/archive_utils.py and site-packages/conda_package_handling/archive_utils.XXX.so from my conda environment to force it to fall back to tarfile, the above snippet works without any errors.

>>> from conda_package_handling.api import extract, libarchive_enabled
>>> libarchive_enabled
False
>>> extract('/tmp/gug-data-1.0-0.tar.bz2')
>>> 
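For readers who want to try the tarfile path without deleting files from site-packages, here is a minimal sketch that extracts a .tar.bz2 package with Python's standard library only. The function name `extract_with_tarfile` is hypothetical and not part of conda-package-handling; it is a rough stand-in for the fallback, not the library's actual implementation.

```python
import tarfile
from pathlib import Path


def extract_with_tarfile(tarball, dest_dir):
    """Extract a bzip2-compressed tarball using only the standard library.

    Hypothetical helper for illustration; conda-package-handling's own
    fallback may behave differently (for example around permissions).
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, mode="r:bz2") as tf:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters; "data"
            # rejects unsafe members such as paths escaping dest_dir.
            tf.extractall(path=dest, filter="data")
        else:
            # Older Python versions: only use this on trusted archives.
            tf.extractall(path=dest)
    return dest
```

If this succeeds on a tarball for which the libarchive-backed `extract` raises InvalidArchiveError, that points at the extraction backend rather than at the archive itself, which matches the observation above.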

Here are the contents of gug-data-1.0-0.tar.bz2:

        84 Feb  4  2015 info/files
       140 Dec 31  1969 info/index.json
       482 Feb  4  2015 info/recipe.json
       232 Feb  4  2015 info/recipe/build.sh
       314 Feb  4  2015 info/recipe/meta.yaml
        68 Feb  4  2015 info/recipe/meta.yaml.orig
2241453926 Feb  4  2015 share/giga3gram.lm.binary
2145097736 Feb  4  2015 share/ngrams_gigaword_afp_apw_cna_ltw_nyt_xin.marisa_trie
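A listing like the one above can also be produced programmatically, which makes it easier to check other packages for very large members. The sketch below reports regular files larger than a threshold. The default threshold of 2**31 - 1 bytes is an assumption chosen for illustration: the thread does not confirm that a 32-bit limit is involved, only that the failures show up with large archives. The name `members_over` is hypothetical.

```python
import tarfile

# Largest value of a signed 32-bit integer. Used here only as an
# illustrative threshold; the thread does not confirm it is the cause.
INT32_MAX = 2**31 - 1


def members_over(tarball, threshold=INT32_MAX):
    """Return (name, size) for regular files larger than `threshold` bytes."""
    with tarfile.open(tarball, mode="r:*") as tf:
        # Iterating a compressed tarball reads through the whole archive,
        # so this can be slow for multi-gigabyte packages.
        return [(m.name, m.size) for m in tf if m.isfile() and m.size > threshold]
```

Applied to the sizes listed above, share/giga3gram.lm.binary (2241453926 bytes) is over that threshold, while the marisa_trie file (2145097736 bytes) is just under it.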

Any ideas what could be happening? The same conda package was being installed just fine a few days ago.

@desilinguist
Author

Interesting, it looks like downgrading to v1.6.0 works as well. So a change introduced in 1.6.1 seems to have broken extraction.
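To see quickly whether an environment has one of the versions mentioned so far, a small check like the following may help. It is only a sketch: the version sets contain just what was reported in this thread up to this point (1.6.1 and 1.7.0 failing, 1.6.0 working), the helper names are hypothetical, and it assumes the installed package exposes metadata under the distribution name `conda-package-handling`.

```python
# Versions as reported in this thread only; not an authoritative list.
REPORTED_FAILING = {"1.6.1", "1.7.0"}
REPORTED_WORKING = {"1.6.0"}


def thread_status(version):
    """Classify a version string by what this thread reports about it."""
    if version in REPORTED_FAILING:
        return "reported failing"
    if version in REPORTED_WORKING:
        return "reported working"
    return "not reported in this thread"


def installed_version():
    """Return the installed version string, or None if it cannot be found."""
    try:
        # importlib.metadata is available from Python 3.8 onwards.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        return None
    try:
        # Assumes the distribution name matches the conda package name.
        return version("conda-package-handling")
    except PackageNotFoundError:
        return None
```

For example, `thread_status(installed_version() or "")` returns "reported failing" on the two setups described in the first comment.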

@desilinguist
Author

Bump. Any updates on this?

@nehaljwani
Contributor

@desilinguist Could you attach a zip of gug-data-1.0-0.tar.bz2 to this issue?

@desilinguist
Author

Hi @nehaljwani, that file is about 3G in size. Do you really want me to attach it here?

@nehaljwani
Contributor

If you could provide any tarball to reproduce the issue, it would help!

@desilinguist
Author

desilinguist commented Oct 19, 2020

Hi @nehaljwani, I uploaded the tarball to my Google Drive. Here's the link. Please let me know once you have downloaded it so I can remove it from my drive. Thanks!

@desilinguist
Author

Hi @nehaljwani, please confirm that you are able to access the file so I can remove it.

@desilinguist
Author

Any progress on this issue? @nehaljwani

@seemethere
Contributor

We're also currently experiencing this with nightlies from pytorch.

You can access them for testing with:

curl -fL -O https://9581561-65600975-gh.circle-artifacts.com/0/final_pkgs/pytorch-1.8.0.dev20201214-py3.6_cuda11.0.221_cudnn8.0.5_0.tar.bz2

seemethere added a commit to pytorch/pytorch that referenced this issue Dec 15, 2020
There was a bug that was introduced in conda-package-handling >= 1.6.1 that makes archives
above a certain size fail out when attempting to extract
see: conda/conda-package-handling#71

Signed-off-by: Eli Uriegas <[email protected]>

[ghstack-poisoned]
facebook-github-bot pushed a commit to pytorch/pytorch that referenced this issue Dec 16, 2020
Summary:
Pull Request resolved: #49434

There was a bug that was introduced in conda-package-handling >= 1.6.1 that makes archives
above a certain size fail out when attempting to extract
see: conda/conda-package-handling#71

coincides with pytorch/builder#611

Signed-off-by: Eli Uriegas <[email protected]>

Test Plan: Imported from OSS

Reviewed By: xuzhao9, janeyx99, samestep

Differential Revision: D25573390

Pulled By: seemethere

fbshipit-source-id: 82173804f1b30da6e4b401c4949e2ee52065e149
seemethere added a commit to seemethere/audio that referenced this issue Dec 18, 2020
There's an issue with the CUDA 11.0 package for conda with
conda-package-handling that relates to conda/conda-package-handling#71

Signed-off-by: Eli Uriegas <[email protected]>
seemethere added a commit to seemethere/text that referenced this issue Dec 18, 2020
Package installation for pytorch binaries was failing out due to
conda/conda-package-handling#71

Default to cpuonly since that won't fail out.

Signed-off-by: Eli Uriegas <[email protected]>
mthrok pushed a commit to pytorch/audio that referenced this issue Dec 21, 2020
There's an issue with the CUDA 11.0 package for conda with
conda-package-handling that relates to conda/conda-package-handling#71

This should fix issues with the conda smoke testing we encountered previously.

Signed-off-by: Eli Uriegas <[email protected]>
hwangdeyu pushed a commit to hwangdeyu/pytorch that referenced this issue Jan 6, 2021
Summary:
Pull Request resolved: pytorch#49434

There was a bug that was introduced in conda-package-handling >= 1.6.1 that makes archives
above a certain size fail out when attempting to extract
see: conda/conda-package-handling#71

coincides with pytorch/builder#611

Signed-off-by: Eli Uriegas <[email protected]>

Test Plan: Imported from OSS

Reviewed By: xuzhao9, janeyx99, samestep

Differential Revision: D25573390

Pulled By: seemethere

fbshipit-source-id: 82173804f1b30da6e4b401c4949e2ee52065e149
@mattip

mattip commented Jan 25, 2021

The stack trace seems to point to something in the C extension module for archive handling.

Digging around a bit, I see that the 1.7.0 tag mentions "Updating libarchive-static to 3.4.2", but I could not find mention of this in the changelog, which seems to skip 1.7.0 entirely. Maybe that was a typo, since the conda recipe seems to link to libarchive-static 3.3.0. Another change to the archive handling in the past 12 months was the fix to #61, which added checks to strdup.

@mattip

mattip commented Jan 25, 2021

When you downgrade conda-package-handling to 1.6.0, does it also downgrade any other packages?

@desilinguist
Author

@mattip I don’t think so, but I use the --force flag when downgrading to be on the safe side.

leej3 added a commit to leej3/conda-package-handling that referenced this issue Feb 23, 2021
@desilinguist
Author

Is this issue being investigated at all?

@rgommers

@desilinguist yes, see the PR from @leej3 linked right above this issue.

@desilinguist
Author

desilinguist commented Feb 23, 2021

Oh, that's great! Sorry, I totally missed that. Looking forward to the new release with the fixes. I am glad that the tarball I provided seems to have been useful in fixing the issue.

janeyx99 added a commit to pytorch/builder that referenced this issue Jul 22, 2021
* Parallelize nvcc for windows conda 11.3

* This also unpins conda-package-handling, as the 1.6 version was no longer compatible with python 3.9, and the updated versions resolve previous issues we had with 1.6.1. (see conda/conda-package-handling#71 for more context).
@github-actions

Hi there, thank you for your contribution!

This issue has been automatically locked because it has not had recent activity after being closed.

Please open a new issue if needed.

Thanks!

@github-actions github-actions bot added the locked [bot] locked due to inactivity label Mar 31, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 31, 2022