Build conda nightlies jobs are failing on main for aarch64 #659

Closed
raulcd opened this issue Apr 29, 2024 · 3 comments · Fixed by #689
Labels
bug Something isn't working

Comments

@raulcd
Member

raulcd commented Apr 29, 2024

Describe the bug
The aarch64 jobs for conda nightlies are failing with:

Conda detected a mismatch between the expected content and downloaded content

See:

To Reproduce

  • trigger CI

Expected behavior
Jobs succeed and build for aarch64

Additional context
Add any other context about the problem here.

@raulcd raulcd added the bug Something isn't working label Apr 29, 2024
@Michael-J-Ward
Contributor

Michael-J-Ward commented May 3, 2024

Taking a look at the logs, a few things look off.

  1. conda-forge/linux-64 seems like it's the wrong cache for the linux-aarch64 job.
  2. The message "warning libmamba Cache file ... was modified by another program" makes me think concurrent jobs are modifying the same cache.
  3. rust-std-x86_64 also seems like the wrong download, and that's the one that causes the mismatched hash error.
Attempting to finalize metadata for datafusion
conda-forge/linux-64                                        Using cache
conda-forge/noarch                                          Using cache
Reloading output folder: 
/home/runner/work/datafusion-python/datafusion-python/packages
warning  libmamba Cache file "/home/runner/conda_pkgs_dir/cache/09cdf8bf.json" was modified by another program

...

Conda detected a mismatch between the expected content and downloaded content
for url 'https://conda.anaconda.org/conda-forge/noarch/rust-std-x86_64-unknown-linux-gnu-1.77.2-h2c6d0dc_0.conda'.
  download saved to: /home/runner/conda_pkgs_dir/rust-std-x86_64-unknown-linux-gnu-1.77.2-h2c6d0dc_0.conda
  expected sha256: a482597672076f47c83d0dd3f204eb437007b99ada4d630d56fa64b4b193c5db
  actual sha256: 73f7537db6bc0471135a85a261798abe77e7e83794f945a0355c4068973f31f6

The things I'd like to try, in order:

  1. Set the concurrency to a hard lock of 1 conda job at a time.
  2. Upgrade the miniconda action (v3 has automatic aarch detection). EDIT: it looks like there's already a PR for that, and it failed: build(deps): bump conda-incubator/setup-miniconda from 2.2.0 to 3.0.4 #658
  3. Blow out the conda cache and start fresh.

Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this issue May 3, 2024
The builds are failing for `aarch64`, and one log message hints that concurrent builds are messing with the same cache.

Ref: apache#659
Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this issue May 6, 2024
The `actual` sha256 hashes match both what I calculate by downloading and running `sha256sum` and what is posted on conda-forge.

I suspect then that our build is using some bad cached value as the "expected".

conda-forge: https://conda.anaconda.org/conda-forge/noarch/

Ref: apache#659
Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this issue May 6, 2024
The builds are failing for `aarch64`, and one log message hints that concurrent builds are messing with the same cache.

Ref: apache#659
@Michael-J-Ward
Contributor

Michael-J-Ward commented May 6, 2024

Investigating further: the actual sha256 that the CI reports matches both what I calculate when downloading the files and what conda-forge lists.

file: rust-std-aarch64-unknown-linux-gnu-1.77.2-hbe8e118_0.conda
sha256sum: 9d583f04bfdbccc82ac2f0653de571f8371df04633727e714b71efd7e4a0140a

file:  rust-std-x86_64-unknown-linux-gnu-1.77.2-h2c6d0dc_0.conda
sha256: 73f7537db6bc0471135a85a261798abe77e7e83794f945a0355c4068973f31f6
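
For anyone who wants to repeat the check above without `sha256sum`, here is a minimal Python sketch. The helper names `sha256_of_file` and `file_matches` are hypothetical (they are not part of conda or of this repository), and the sketch assumes the package has already been downloaded to a local path.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large packages need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_matches(path, reported_sha256):
    """Compare a local file's digest with a digest copied from a log or index.

    Hypothetical helper: the reported value is normalized (whitespace and
    case) before comparison, since it is usually pasted by hand.
    """
    return sha256_of_file(path) == reported_sha256.strip().lower()
```

Calling `file_matches` on a downloaded package with the "actual sha256" from the CI log is the comparison described above. If it returns `True`, the download itself is intact and the "expected" value is the one worth questioning.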

So, I tried cleaning out the cache, but that only caused the builds to break in a new way...

Adding in variants from internal_defaults
Adding in variants from config.variant
Adding in variants from argument_variants
Error: bad character '-' in package/version: publish-docs.1a240507

Closing the PR for now.

@Michael-J-Ward
Contributor

Ah, publish-docs is the tag @andygrove used to trigger the docs generation & publication.

Apparently, conda doesn't like that for a package/version.
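
To illustrate the failure, here is a small Python sketch that flags the character the error message complains about and produces a string without it. Both function names are hypothetical, the set of rejected characters is taken only from the error above (just `-`), and replacing it with `_` is an assumption, not something the build is known to do.

```python
def find_bad_version_chars(version, bad_chars="-"):
    """Return the distinct characters of `version` found in `bad_chars`.

    Assumption: only '-' is treated as bad here, because that is the one
    character named in the error message.
    """
    seen = []
    for ch in version:
        if ch in bad_chars and ch not in seen:
            seen.append(ch)
    return seen


def sanitize_version(version, replacement="_"):
    """Replace every '-' in `version` with `replacement` (assumed to be '_')."""
    return version.replace("-", replacement)


# The string from the error message above.
raw = "publish-docs.1a240507"
print(find_bad_version_chars(raw))
print(sanitize_version(raw))
```

Whether a sanitized tag such as this makes a sensible package version is a separate question; avoiding non-release tags when the version is computed may be the simpler fix.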

Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this issue May 13, 2024
The `actual` sha256 hashes match both what I calculate by downloading and running `sha256sum` and what is posted on conda-forge.

I suspect then that our build is using some bad cached value as the "expected".

conda-forge: https://conda.anaconda.org/conda-forge/noarch/

Ref: apache#659