Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-35658: [Packaging] Sync conda recipes with feedstocks #35637

Merged
merged 10 commits into from
Jul 26, 2023

Conversation

h-vetinari
Copy link
Contributor

@h-vetinari h-vetinari commented May 17, 2023

Corresponds to conda-forge/arrow-cpp-feedstock#1053 for pyarrow (one failing test on our infra that needs debugging), as well as the state of the feedstock for r-arrow after conda-forge/r-arrow-feedstock#65

@github-actions
Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@h-vetinari
Copy link
Contributor Author

@github-actions crossbow submit -g conda

@github-actions
Copy link

Revision: 8ef347b

Submitted crossbow builds: ursacomputing/crossbow @ actions-ca516ae62e

Task Status
conda-clean Azure
conda-linux-aarch64-cpu-py3 Azure
conda-linux-aarch64-cpu-r41 Azure
conda-linux-aarch64-cpu-r42 Azure
conda-linux-aarch64-cuda-py3 Azure
conda-linux-ppc64le-cpu-py3 Azure
conda-linux-ppc64le-cuda-py3 Azure
conda-linux-x64-cpu-py3 Azure
conda-linux-x64-cpu-r41 Azure
conda-linux-x64-cpu-r42 Azure
conda-linux-x64-cuda-py3 Azure
conda-osx-arm64-cpu-py3 Azure
conda-osx-arm64-cpu-r41 Azure
conda-osx-arm64-cpu-r42 Azure
conda-osx-x64-cpu-py3 Azure
conda-osx-x64-cpu-r41 Azure
conda-osx-x64-cpu-r42 Azure
conda-win-x64-cpu-py3 Azure
conda-win-x64-cpu-r41 Azure
conda-win-x64-cuda-py3 Azure

@h-vetinari
Copy link
Contributor Author

Interestingly, the failure of test_total_bytes_allocated on linux from conda-forge/arrow-cpp-feedstock#1053 is not present here. But a weird error on windows:

E   pyarrow.lib.ArrowInvalid: Cannot locate timezone 'UTC': Timezone database not found at "C:\Users\VssAdministrator\Downloads\tzdata"

@kou
Copy link
Member

kou commented May 17, 2023

I don't know why we didn't need timezone database for Windows jobs before but we have a shell script to download timezone database: https://github.com/apache/arrow/blob/main/ci/scripts/download_tz_database.sh

We're using the script on Windows too.

@h-vetinari h-vetinari changed the title Sync conda recipes with feedstocks GH-35658: Sync conda recipes with feedstocks May 18, 2023
@github-actions
Copy link

@github-actions
Copy link

⚠️ GitHub issue #35658 has been automatically assigned in GitHub to PR creator.

@h-vetinari
Copy link
Contributor Author

I don't know why we didn't need timezone database for Windows jobs before but we have a shell script to download timezone database:

Sounds like we should just add a dependency on the tzdata conda-forge package?

@assignUser
Copy link
Member

Sounds like we should just add a dependency on the tzdata conda-forge package?

That sound's good. Thanks for taking care of the sync work, much appreciated!

@kou
Copy link
Member

kou commented May 18, 2023

It may not work because it seems that tzdata doesn't include WindowsZones.xml.

@h-vetinari
Copy link
Contributor Author

It may not work because it seems that tzdata doesn't include WindowsZones.xml.

Why does arrow depend on this in the first place? It's the first package I see that doesn't work with tzdata. Also, based on the error message, it seems to be looking in the wrong place? I.e. C:\Users\VssAdministrator\Downloads\tzdata instead of %LIBRARY_PREFIX% (aside from the fact that inndeed, our tzdata does not contain WindowsZones.xml).

The tzdata stuff is an arrow 13 problem though. More urgently, in conda-forge/arrow-cpp-feedstock#1053, I currently have a failing test test_total_bytes_allocated on arrow 12 (though not arrow 11 or before), but some commit between arrow 12 and current main seems to have fixed this - any ideas?

@raulcd
Copy link
Member

raulcd commented May 18, 2023

As a note, our nightly builds for wheels on Windows are also failing at the moment with the same error:

 E   pyarrow.lib.ArrowInvalid: Cannot locate timezone 'UTC': Timezone database not found at "C:\Users\ContainerAdministrator\Downloads\tzdata"

The first failure is from 10 days ago since this commit was introduced:
053b5ee

@kou
Copy link
Member

kou commented May 19, 2023

Ah, sorry. I misread this problem.

It seems that WindowsZones.xml isn't related to this problem.

But tzdata path was hard-coded on Windows: https://github.com/apache/arrow/blob/main/cpp/src/arrow/vendored/datetime/tz.cpp#L265

So our code will not find tzdata installed by conda.

Could you open a new issue for tzdata related problem? We can work on this as a separated issue.

@raulcd
Copy link
Member

raulcd commented May 19, 2023

The datetime vendored library seems to have had a change lately: https://github.com/apache/arrow/pull/35612/files
And is quite important for the R release to be packed for 12.0.1. I am wondering whether this issue will have to be fixed for the 12.0.1 release too.

It won't as this test was added on a different ticket that won't be added to 12.0.1 .

edit: strikethrough

@h-vetinari
Copy link
Contributor Author

h-vetinari commented May 24, 2023

The tzdata stuff is an arrow 13 problem though. More urgently, in conda-forge/arrow-cpp-feedstock#1053, I currently have a failing test test_total_bytes_allocated on arrow 12 (though not arrow 11 or before), but some commit between arrow 12 and current main seems to have fixed this - any ideas?

As I mentioned, the tzdata thing is much less urgent for us in conda-forge than the issue of that failing tests on the feedstock. I now opened a separate issue so this gets (hopefully...) more traction: #35728

@jorisvandenbossche
Copy link
Member

Why does arrow depend on this in the first place? It's the first package I see that doesn't work with tzdata.

Arrow unfortunately can't work with the tz database from tzdata, because the underlying date library we use doesn't support the binary format, see #31472 and linked issues.

In other places we skip tests that requite tz database access on Windows, so that should be done for the new scalar test as well, I assume.

@jorisvandenbossche
Copy link
Member

I opened #35735 to skip the test requiring a tz database on windows.

@h-vetinari
Copy link
Contributor Author

h-vetinari commented May 24, 2023

Arrow unfortunately can't work with the tz database from tzdata, because the underlying date library we use doesn't support the binary format, see #31472 and linked issues.

I don't mind shipping arrow's vendored WindowsZones.xml in a pyarrow-specific location (i.e. site-packages/pyarrow; we can patch pyarrow to point to the right location). I think it would be better to do that than just fail? timezone-operations are pretty common (unless I misunderstand something), so not supporting that for the conda-forge builds would suck IMO

@h-vetinari
Copy link
Contributor Author

One issue with vendoring is that this doesn't make it easy to update the version / ensure you always have a recent tz database, also when installing an somewhat older pyarrow version.

Absolutely, that's why it would be nice if this could be handled through a "proper" dependency like tzdata. I haven't dived deep into #31472, but I don't understand why the format that tzdata has couldn't be transformed or otherwise made to work for arrow?

@jorisvandenbossche
Copy link
Member

That's probably technically possible to implement such transformation, but someone still needs to do the work for that. A potential easier path forward is to update date.h to work with the compiled database on Windows (HowardHinnant/date#564)

@h-vetinari
Copy link
Contributor Author

@github-actions crossbow submit -g conda

@@ -313,7 +323,7 @@ outputs:
- {{ pin_subpackage('pyarrow', exact=True) }}
- clangdev {{ llvm_version }}
- llvmdev {{ llvm_version }}
- cython
- cython <3
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to wait for #36745 (and backport it 12.0.1), but that PR has not landed as quickly as I hoped.

@github-actions
Copy link

Revision: e9e6605

Submitted crossbow builds: ursacomputing/crossbow @ actions-b1a7f0eaed

Task Status
conda-clean Azure
conda-linux-aarch64-cpu-py3 Azure
conda-linux-aarch64-cpu-r41 Azure
conda-linux-aarch64-cpu-r42 Azure
conda-linux-aarch64-cuda-py3 Azure
conda-linux-ppc64le-cpu-py3 Azure
conda-linux-ppc64le-cuda-py3 Azure
conda-linux-x64-cpu-py3 Azure
conda-linux-x64-cpu-r41 Azure
conda-linux-x64-cpu-r42 Azure
conda-linux-x64-cuda-py3 Azure
conda-osx-arm64-cpu-py3 Azure
conda-osx-arm64-cpu-r41 Azure
conda-osx-arm64-cpu-r42 Azure
conda-osx-x64-cpu-py3 Azure
conda-osx-x64-cpu-r41 Azure
conda-osx-x64-cpu-r42 Azure
conda-win-x64-cpu-py3 Azure
conda-win-x64-cpu-r41 Azure
conda-win-x64-cuda-py3 Azure

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jul 24, 2023
@h-vetinari
Copy link
Contributor Author

@github-actions crossbow submit -g conda

@github-actions
Copy link

Revision: 17fc588

Submitted crossbow builds: ursacomputing/crossbow @ actions-dd4f8afbf0

Task Status
conda-clean Azure
conda-linux-aarch64-cpu-py3 Azure
conda-linux-aarch64-cpu-r41 Azure
conda-linux-aarch64-cpu-r42 Azure
conda-linux-aarch64-cuda-py3 Azure
conda-linux-ppc64le-cpu-py3 Azure
conda-linux-ppc64le-cuda-py3 Azure
conda-linux-x64-cpu-py3 Azure
conda-linux-x64-cpu-r41 Azure
conda-linux-x64-cpu-r42 Azure
conda-linux-x64-cuda-py3 Azure
conda-osx-arm64-cpu-py3 Azure
conda-osx-arm64-cpu-r41 Azure
conda-osx-arm64-cpu-r42 Azure
conda-osx-x64-cpu-py3 Azure
conda-osx-x64-cpu-r41 Azure
conda-osx-x64-cpu-r42 Azure
conda-win-x64-cpu-py3 Azure
conda-win-x64-cpu-r41 Azure
conda-win-x64-cuda-py3 Azure

@h-vetinari
Copy link
Contributor Author

h-vetinari commented Jul 25, 2023

@kou @pitrou @raulcd @assignUser @jorisvandenbossche
This is green now and matches the state of the feedstocks. There are some more minor improvements on the way (e.g. a few - though not all - of the test skips can be removed), but I can also put those in the next sync PR after arrow 13 is officially out.

@kou kou changed the title GH-35658: Sync conda recipes with feedstocks GH-35658: [Packaging] Sync conda recipes with feedstocks Jul 25, 2023
Copy link
Member

@assignUser assignUser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks @h-vetinari ! I will merge in a day or so if there are not objections

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

sed -ie "s;protoc-gen-grpc.*$;protoc-gen-grpc=${BUILD_PREFIX}/bin/grpc_cpp_plugin\";g" ../src/arrow/flight/CMakeLists.txt
sed -ie 's;"--with-jemalloc-prefix\=je_arrow_";"--with-jemalloc-prefix\=je_arrow_" "--with-lg-page\=14";g' ../cmake_modules/ThirdpartyToolchain.cmake
sed -ie 's;"--with-jemalloc-prefix\=je_arrow_";"--with-jemalloc-prefix\=je_arrow_" "--with-lg-page\=16";g' ../cmake_modules/ThirdpartyToolchain.cmake
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use -DARROW_JEMALLOC_LG_PAGE=16 instead of rewriting ThirdpartyToolchain.cmake.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds great, I'll change that on the feedstock and in the next sync

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting committer review Awaiting committer review labels Jul 26, 2023
@kou kou merged commit 8503c86 into apache:main Jul 26, 2023
@kou kou removed the awaiting merge Awaiting merge label Jul 26, 2023
@h-vetinari h-vetinari deleted the conda_update branch July 26, 2023 23:43
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 8503c86.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

R-JunmingChen pushed a commit to R-JunmingChen/arrow that referenced this pull request Aug 20, 2023
…e#35637)

Corresponds to conda-forge/arrow-cpp-feedstock#1053 for pyarrow (one failing test on our infra that needs debugging), as well as the state of the feedstock for r-arrow after conda-forge/r-arrow-feedstock#65
* Closes: apache#35658

Authored-by: H. Vetinari <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…e#35637)

Corresponds to conda-forge/arrow-cpp-feedstock#1053 for pyarrow (one failing test on our infra that needs debugging), as well as the state of the feedstock for r-arrow after conda-forge/r-arrow-feedstock#65
* Closes: apache#35658

Authored-by: H. Vetinari <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Packaging] Sync with conda-forge feedstocks
5 participants