BLD: Clang error while installing snappy in requirements-dev.txt #32417
Comments
Looks like this was added in #26657 and has a comment on master that it is required for pyarrow - @datapythonista any chance you know what this is? |
That PR was just to get the missing dependencies from the other builds into environment.yml, so we had all the dependencies locally. This library is to connect to gbq afaik. But this error is expected, I think. Libraries use binary dependencies, and pip is not good at those, so you need to install them manually, including their C headers. That's why conda exists: to avoid having to install binary dependencies manually. So, @kendricng, please use conda, or install snappy manually, including its dependency files. Closing this issue; let me know if I missed anything and it needs to be reopened. |
Yea I figured something google-related but didn't see it in the pandas-gbq requirements; are we sure we even need it? |
Not sure if we need it. But it's a compression library, I guess it's an optional gbq dependency. |
Gotcha. Yea, so if we don't actually need it I would be OK to remove it from the requirements files. @kendricng, is this something you would be interested in submitting a PR for? |
is this fixed by |
doesn't that defeat the purpose of having "the missing dependencies from the other builds into environment.yml so we had all the dependencies locally." #32417 (comment) for testing/development also see #32327 for another snappy issue. |
snappy is also an optional dependency for fastparquet and pyarrow. Regardless, I don't think there's anything for pandas to do. If the library doesn't provide a wheel, users will need to install from source. And to do that they need the C library (built from source or through the system package manager). If we deem that having pip users take that extra step is too large, then we could find where in the docs snappy is used and avoid it. |
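For anyone hitting this with pip, a rough pre-flight check can show whether the snappy C library is visible on the system before attempting a source build. This is only a sketch: `find_snappy_library` is a hypothetical helper name, and finding the shared library does not guarantee that the C headers needed for compilation are installed as well.

```python
import ctypes.util


def find_snappy_library():
    """Return the name or path of the snappy shared library, or None if not found."""
    # find_library searches the platform's usual library locations.
    return ctypes.util.find_library("snappy")


location = find_snappy_library()
if location is None:
    print("snappy C library not found; a source build of python-snappy will likely fail")
else:
    print(f"snappy C library found: {location}")
```

If this reports nothing, installing snappy through the system package manager (or using conda) is the step that is missing.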
I would like to. I tried submitting one this morning but it got closed. I was wondering what the proper procedure for submitting one would be? |
Yes, this is the fix. Thank you for pointing this out, and I've been able to set up my dev environment! My bigger concern is that this type of error should not be happening when setting up a dev environment from scratch. |
It doesn’t if you use conda. But how that library is packaged is out of our control. What we can control is avoiding its use in the docs. Does anyone know where it’s used? |
@kendricng if interested can you try building the docs from a clean environment that doesn't have |
@TomAugspurger looks like what fails without snappy is user_guide/io in the code: df.to_parquet('example_fp.parquet', engine='fastparquet') The default argument for the |
Thanks for tracking that down! Yes, I think it'd be best to specify a different (or just no) compression with |
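One way to sketch the "different or no compression" idea: choose the compression at runtime depending on whether the snappy bindings are importable. `pick_parquet_compression` is a hypothetical helper, not part of pandas, and the `to_parquet` call is left as a comment because it assumes pandas and fastparquet are installed.

```python
import importlib.util


def pick_parquet_compression(module_name="snappy"):
    """Return "snappy" if the named module is importable, else None."""
    # find_spec returns None when the top-level module cannot be found.
    available = importlib.util.find_spec(module_name) is not None
    return "snappy" if available else None


compression = pick_parquet_compression()
print(f"compression={compression!r}")

# Hypothetical usage, assuming pandas and fastparquet are installed:
# df.to_parquet("example_fp.parquet", engine="fastparquet", compression=compression)
```

Passing `compression=None` writes the file uncompressed, so the docs example would still run in an environment without snappy.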
Just realized that this will also affect the result in the performance comparison: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#performance-considerations Other than that it's 4 calls to |
Personally, I would leave the docs as is. |
My preference is to close this issue as invalid and leave everything as it is. pip not being able to easily install non-pure-Python dependencies is a known problem and not pandas related. That's why conda exists, and why we recommend it. Users who still want to use pip will have to deal with these problems. Today it is snappy, but tomorrow it may be something else that we can't consider removing from the docs dependencies. If anything, what I would do is stop providing the |
@datapythonista so I think we disagree but probably because we are coming to this from different angles. I generally am not clear on why we put optional dependencies for optional dependencies in the requirements file, so if you have insights into that maybe would be helpful. IMO the downside of doing that is manifested in an error here, but even if conda doesn't throw an error it still adds complexity to the solvability of our environments that might not be necessary and could still cause issues down the road |
I think we have three options in this case:
None of them is perfect. I think 1 is the simplest, and the drawback is IMHO quite reasonable. 3 is also quite ok for me, except we're limited in what we show in the docs, and they will become trickier. We expect users to use snappy (that's why it's the default), but we would always be showing how to use no compression. I don't think it's great to have notes in the docs like "we remove compression because we don't have snappy installed", and another like "we show to_parquet without compression, but with compression it would be much faster". But it's a reasonable option. Having snappy only for conda is the best of both worlds, but at the cost of a hacky solution IMO. Things become trickier in the dependency synchronization, and if the same happens with another library, things will start to get out of control, as this doesn't scale well I think. In any case, I'm ok with whatever the rest decide. But my preference is not to complicate things more for what IMO is a small problem (I don't think we should support pip for pandas development). |
Note that you always get snappy if you install pyarrow with conda (at least for recent versions). (I suppose this was the reason it was added originally: not because it was needed for the conda env, but to make the converted requirements file equivalent) |
We've got |
Ah yes, I forgot that. |
So an option could also be to only put |
We just ran into this again during the sprint at EuroSciPy. It is quite confusing for new contributors and I would like to make setting up a new environment as frictionless as possible. With the suggestion in #32417 (comment) we can remove it from both the conda environment and requirements-dev, and still not have any errors building the documentation. |
One of the things that was mentioned above is that python-snappy is needed for building the IO user guide. From a quick test, after removing it from my dev env, the page still seems to build fine. Also, fastparquet no longer depends on python-snappy (according to https://fastparquet.readthedocs.io/en/latest/install.html#requirements it now uses So I think we can actually just remove this altogether. |
The only other mention of snappy in our codebase is in the context of HDF IO, but there So I am removing |
Problem description
I was going through the instructions in setting up the dev environment on Mac here:
Contributing to pandas
And while I was trying to execute the below code:
I got the below clang error (redacted):
I went through the rest of the virtual environment set up fine besides this issue.