BUILD: missing test data in 2.1.0 sdist/install #54907
Ah, sorry, I was wrong. I was looking at the git repo and not the sdist; the files are missing from the sdist, so I guess that's why they aren't in the wheel either.
Thanks for the report. This was done (silently) to shrink the wheel size (#54052). The "public" way to run the tests from the install is to call
cc @lithomas1
Yes, you'll need to pass
IIRC, the data files for IO were never shipped, but we recently changed things to error by default when the data files are missing.
I can understand shrinking wheel sizes, but what I don't really understand is why you're also stripping the test data from the GitHub archives, which are our last fallback when sdists are unsuitable for testing.
The GitHub archive is the sdist, or at least it should be. (see
Is there a reason you can't pass
(If you need all the files, you might want to consider building from the git tag of the release instead.)
No, it is not. See #54903. More specifically, before 2.1 the GitHub archive contained the test data, while the sdist published on PyPI did not.
I think we are talking about different things here. In the assets tab of the release, are you talking about the
I am talking about the first one.
The second one is not something that I control, IIUC.
Check the contents. The second one does not have
I guess this was triggered by the change to .gitattributes.
But it is the "GitHub archive" that we are talking about. It was our fallback for the sdist published on PyPI and as an asset on GitHub ("the first one").
Sorry, I deleted my previous comment; I wanted to expand on it. I checked the
How are you building pandas? Also, you mention that the GitHub archive is a fallback. Is there something wrong with the sdist on PyPI?
That's the problem: the direct download link, which is the same as the "Source code" link on the release page (https://github.com/pandas-dev/pandas/archive/refs/tags/v2.1.0.zip), does not contain the data. Only a proper git clone has it.
The directories in it are empty.
Only if there is a setup.py for versioneer to use. But that one is also missing.
Yes, it lacks the test data. We distribution packagers need to run the test suite as completely as possible in order to ensure package integrity.
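One way to confirm whether a given download actually carries the test data is to list its members. The Python sketch below assumes a zip archive like the tag download mentioned in this thread, and assumes the data lives under a prefix such as `pandas-2.1.0/pandas/tests/io/data/` (that exact path is an assumption). It counts only regular files under the prefix, so a directory that exists but is empty counts as zero.

```python
import io
import zipfile


def count_files_under(archive: zipfile.ZipFile, prefix: str) -> int:
    """Count regular files (not directory entries) whose path starts with prefix."""
    return sum(
        1
        for info in archive.infolist()
        if info.filename.startswith(prefix) and not info.is_dir()
    )


def build_demo_archive(with_data: bool) -> zipfile.ZipFile:
    """Build a tiny in-memory zip that mimics a source archive (illustration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # A directory entry with no files below it: the "empty directory" case.
        zf.writestr("pandas-2.1.0/pandas/tests/io/data/", "")
        if with_data:
            zf.writestr("pandas-2.1.0/pandas/tests/io/data/csv/sample.csv", "a,b\n1,2\n")
    buf.seek(0)
    return zipfile.ZipFile(buf)


# Hypothetical prefix; adjust it to the archive's real top-level directory.
PREFIX = "pandas-2.1.0/pandas/tests/io/data/"

if __name__ == "__main__":
    for flag in (False, True):
        with build_demo_archive(flag) as zf:
            print(f"with_data={flag}: {count_files_under(zf, PREFIX)} file(s)")
```

For a real download, open the file with `zipfile.ZipFile("v2.1.0.zip")` (file name assumed) and pass it to `count_files_under` in the same way.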
@mgorny: https://github.com/gentoo/gentoo/blob/627f40ace2ae5cbeb2d6d82ada0cd6502286a429/dev-python/pandas/pandas-2.1.0.ebuild
The version from https://github.com/gentoo/gentoo/blob/1761e8fcdfda09370046cdd0e382c3aa206d3f61/dev-python/pandas/pandas-2.1.0.ebuild is more up to date. I've done
I've learned that apparently
My only idea so far would be to move all the undesirable test data from subdirectories into a single git submodule. That should keep it out of the sdist and make it easy for us to fetch it independently and merge it with the rest.
It's not just the test data: the documentation, and possibly some smaller items, have also been removed. In Debian, I've switched to using the git repository itself, so this isn't blocking my packaging. I don't know whether the Gentoo or openSUSE build tools have an equivalent option.
If you do that, you might find my patch for loading test data from a different path useful. (Debian prefers to (also) run tests against the as-installed package, and we don't want test data taking up space in the user package either.)
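The patch's code is not part of this thread. As a rough illustration of the idea only, the Python sketch below looks for a test-data file first under a directory named by an environment variable and then under the package's own tests directory. The variable name `PANDAS_TEST_DATA_DIR` and both helper names are hypothetical; they are not taken from the Debian patch or from pandas.

```python
import os
from pathlib import Path
from typing import Iterable, List, Mapping, Optional

# Hypothetical environment variable; not a real pandas or Debian setting.
ENV_VAR = "PANDAS_TEST_DATA_DIR"


def candidate_roots(
    package_tests_dir: Path, env: Optional[Mapping[str, str]] = None
) -> List[Path]:
    """Return search roots in priority order: the override first, then the package dir."""
    env = os.environ if env is None else env
    roots: List[Path] = []
    override = env.get(ENV_VAR)
    if override:
        roots.append(Path(override))
    roots.append(package_tests_dir)
    return roots


def find_test_data(relative: str, roots: Iterable[Path]) -> Path:
    """Return the first existing root / relative; raise FileNotFoundError if none exists."""
    tried = []
    for root in roots:
        candidate = Path(root) / relative
        if candidate.exists():
            return candidate
        tried.append(str(candidate))
    raise FileNotFoundError(f"test data {relative!r} not found; tried: {tried}")
```

Keeping the search order in one place means the installed package can stay free of data files while the same tests still run against a separately unpacked data directory.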
Yes, that is intentional. I don't really think you can do anything with the raw .rst files for the docs. Is there any way SUSE and Gentoo can just switch to using the git tag? (I have a fix in mind for this issue, but I'm not sure it's going to work with the current state of meson/meson-python.)
We can't. Gentoo users build from source, and we require sources that are available via plain HTTPS download. Fetching via git poses too many problems: in particular, it doesn't support resuming and doesn't work over shoddy connections, not to mention the existing mirroring infrastructure.
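Resuming is something plain HTTPS can offer: a client that already holds part of a file asks the server for only the remaining bytes with a standard HTTP `Range` request header, provided the server supports range requests. The Python sketch below only builds such a request with the standard library and performs no download; the URL and file name are placeholders.

```python
import os
import urllib.request


def resume_request(url: str, partial_path: str) -> urllib.request.Request:
    """Build a GET request that asks only for the bytes not yet on disk."""
    have = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    req = urllib.request.Request(url)
    if have > 0:
        # "bytes=N-" asks for everything from offset N to the end of the resource.
        req.add_header("Range", f"bytes={have}-")
    return req
```

A server that honors the header answers with `206 Partial Content`, and the client appends the body to the partial file; a server that ignores it returns the whole file with `200`, so the client has to start over.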
What do you mean by that? In Debian, we do build the documentation from source.
This was broken with 2.1 when I posted that; I think it is fixed now, but I have not tested it yet.
Ah, I wasn't aware that Debian also built our docs.
We're doing stuff like that because there are actually people who need to work without reliable Internet access, or without Internet access at all (e.g. while traveling).
The Debian-built documentation is offered as an installable package (python-pandas-doc). And yes, it's relatively rarely used.
I'm using
IMO, the preferred way to get
I'd like to see

$ tar -cf - asv_bench > asv_bench.tar
$ gzip -k asv_bench.tar
$ ls -lh asv_bench.*
-rw-r--r-- 1 400K Oct 5 00:13 asv_bench.tar
-rw-r--r-- 1 68K Oct 5 00:13 asv_bench.tar.gz

Proposed solution:
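The shell session above reports 400K for the uncompressed tar and 68K for the gzipped one, so compression leaves roughly 17% of the original size (68 / 400 = 0.17). The same measurement can be sketched in Python with only the standard library; this version builds the tar in memory instead of writing `asv_bench.tar` to disk.

```python
import gzip
import io
import os
import tarfile


def archive_sizes(directory: str) -> tuple[int, int]:
    """Return (tar_bytes, gzip_bytes) for an uncompressed tar of directory and its gzip."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # Store entries under the directory's own name, like `tar -cf - asv_bench`.
        tar.add(directory, arcname=os.path.basename(os.path.normpath(directory)))
    raw = buf.getvalue()
    return len(raw), len(gzip.compress(raw))
```

Calling `archive_sizes("asv_bench")` from the repository root should give numbers of the same order as the `ls -lh` output, though not identical: Python's `tarfile` writes PAX headers by default and `gzip.compress` uses level 9, while the `gzip` CLI defaults to level 6.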
We don't ship these in the package, but do want to run the tests that use them.
tests_path() is removed completely because it is unclear whether it should point to the test code or to the directory above the test data.
Author: Rebecca N. Palmer <[email protected]>
Forwarded: pandas-dev/pandas#54907
Gbp-Pq: Name find_test_data.patch
@mgorny @rebecca-palmer @bnavigator Just a heads-up: I'm planning on removing tests from the pandas source distributions altogether. The plan is probably to make a separate package called
One thing to note, though, is that the tests will stay in the main pandas repo, so if you build from the tag, the end result will be a compiled version of pandas with the tests included (unless you build an sdist and then a wheel from that sdist, as we are planning to do). (Nothing is set in stone yet, but the plan is to do this for 3.0, so we still have at least several months.)
Although tests will not be in the regular source distribution, I will be uploading a separate
Will this severely affect packaging for the Linux distros in any way?
I think it'd be fine if it's just a matter of extracting a second archive and possibly moving stuff around.
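For a distribution, that extra step amounts to unpacking the separate tests archive and copying its contents into the source tree. A minimal Python sketch, assuming a tar archive whose files sit under a single top-level directory; the archive name, that directory name, and the destination path are hypothetical, since the separate package's name and layout are not given here.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def merge_tests_archive(archive: Path, top_level: str, destination: Path) -> None:
    """Extract archive and copy the contents of its top_level directory into destination.

    Files already in destination are kept; if the archive contains the same path,
    the archive's copy overwrites it.
    """
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(archive) as tar:
            if hasattr(tarfile, "data_filter"):
                # Python 3.12+ (and recent security releases): reject unsafe members.
                tar.extractall(tmp, filter="data")
            else:
                tar.extractall(tmp)
        shutil.copytree(Path(tmp) / top_level, destination, dirs_exist_ok=True)
```

Extracting to a temporary directory first keeps the "moving stuff around" part to a single `copytree` call, whatever the archive's top-level directory happens to be called.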
We don't ship these in the package, but do want to run the tests that use them.
Author: Rebecca N. Palmer <[email protected]>
Forwarded: pandas-dev/pandas#54907
Gbp-Pq: Name find_test_data.patch
Installation check
Platform: Linux-6.4.7-gentoo-dist-x86_64-AMD_Ryzen_5_3600_6-Core_Processor-with-glibc2.38
Installation Method: pip install
pandas Version: 2.1.0
Python Version: 3.11.5
Installation Logs
The file's in the source directory, so I guess it isn't installed by meson.
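Whether a file made it into the installed package, as opposed to existing only in the source directory, can be checked from Python with `importlib.resources`. A minimal sketch, assuming Python 3.9+; the helper name is hypothetical, and the example runs against the standard-library `json` package so it works without pandas installed.

```python
from importlib import resources


def is_installed_file(package: str, relative: str) -> bool:
    """Return True if relative (a "/"-separated path) is a file inside the installed package."""
    try:
        node = resources.files(package)
    except ModuleNotFoundError:
        return False
    # Join one component at a time; multi-argument joinpath needs Python 3.11+.
    for part in relative.split("/"):
        node = node.joinpath(part)
    return node.is_file()


if __name__ == "__main__":
    print(is_installed_file("json", "__init__.py"))
    print(is_installed_file("json", "no_such_dir/data.csv"))
```

Against an installed pandas, the same call with a path under `tests/` shows directly whether the build backend copied that file into site-packages.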