BLD: Clang error while installing snappy in requirements-dev.txt #32417

Closed
kendricng opened this issue Mar 3, 2020 · 26 comments · Fixed by #54633
Labels
Build (Library building on various platforms) · Dependencies (Required and optional dependencies) · Needs Discussion (Requires discussion from core team before further action)

Comments

@kendricng

Problem description

I was going through the instructions for setting up the dev environment on a Mac here:

Contributing to pandas

While trying to execute the code below:

# Install the build dependencies
python -m pip install -r requirements-dev.txt

I got the following clang error (redacted):

Running setup.py install for python-snappy ... error
    ERROR: Command errored out with exit status 1:
     …
    clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Tk.framework/Versions/8.5/Headers -I/usr/local/include -I/usr/local/opt/[email protected]/include -I/usr/local/opt/sqlite/include -I/Users/xxx/virtualenvs/pandas-dev/include -I/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c snappy/snappymodule.cc -o build/temp.macosx-10.15-x86_64-3.7/snappy/snappymodule.o
    snappy/snappymodule.cc:31:10: fatal error: 'snappy-c.h' file not found
    #include <snappy-c.h>
             ^~~~~~~~~~~~
    1 error generated.
    error: command 'clang' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: …

Apart from this issue, the rest of the virtual environment setup went fine.

@WillAyd
Member

WillAyd commented Mar 3, 2020

Looks like this was added in #26657 and has a comment on master saying that it is required for pyarrow. @datapythonista, any chance you know what this is?

WillAyd added the Dependencies (Required and optional dependencies) label Mar 3, 2020
@datapythonista
Member

That PR was just to get the missing dependencies from the other builds into environment.yml so we had all the dependencies locally.

This library is used to connect to gbq, afaik.

But this error is expected, I think. Libraries like this have binary dependencies, pip is not good at handling those, and you need to install them manually, including their C headers. That's why conda exists: to avoid having to install binary dependencies manually.

So, @kendricng, please use conda, or install snappy, including its dependency files, manually.

Closing this issue; let me know if I missed anything and it needs to be reopened.

@WillAyd
Member

WillAyd commented Mar 4, 2020

Yeah, I figured it was something Google-related, but I didn't see it in the pandas-gbq requirements; are we sure we even need it?

@datapythonista
Member

Not sure if we need it. But it's a compression library; I guess it's an optional gbq dependency.

@WillAyd
Member

WillAyd commented Mar 4, 2020

Gotcha. Yeah, if we don't actually need it I would be OK with removing it from the requirements files.

@kendricng, is that something you would be interested in submitting a PR for?

WillAyd reopened this Mar 4, 2020
@jbrockmendel
Member

is this fixed by brew install snappy?

@simonjayhawkins
Member

Gotcha. Yeah, if we don't actually need it I would be OK with removing it from the requirements files.

Doesn't that defeat the purpose of having "the missing dependencies from the other builds into environment.yml so we had all the dependencies locally" (#32417 (comment)) for testing/development?

Also see #32327 for another snappy issue.

@TomAugspurger
Contributor

snappy is also an optional dependency for fastparquet and pyarrow.

Regardless, I don't think there's anything for pandas to do. If the library doesn't provide a wheel, users will need to install from source. And to do that they need the C library (built from source or through the system package manager).

If we deem that extra step too large a burden for pip users, we could find where in the docs snappy is used and avoid it.

@kendricng
Author

Gotcha. Yeah, if we don't actually need it I would be OK with removing it from the requirements files.

@kendricng, is that something you would be interested in submitting a PR for?

I would like to. I tried submitting one this morning, but it got closed. What would the proper procedure for submitting one be?

@kendricng
Author

is this fixed by brew install snappy?

Yes, this is the fix. Thank you for pointing this out; I've been able to set up my dev environment!

My bigger concern is that this type of error should not be happening when setting up a dev environment from scratch.

@TomAugspurger
Contributor

TomAugspurger commented Mar 4, 2020

It doesn’t if you use conda. But how that library is packaged is out of our control.

What we can control is avoiding its use in the docs. Does anyone know where it’s used?

@WillAyd
Member

WillAyd commented Mar 5, 2020

@kendricng if you're interested, can you try building the docs from a clean environment that doesn't have python-snappy? That may clue us in on the actual requirement.

@datapythonista
Member

datapythonista commented Mar 5, 2020

@TomAugspurger looks like what fails without snappy is user_guide/io, in this code:

df.to_parquet('example_fp.parquet', engine='fastparquet')

The default argument for the compression parameter is snappy; I guess if we specify gzip compression there, no extra dependency is required. Do you think we should do that?
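
For illustration, a rough sketch of what that change could look like (the DataFrame and file name here are just placeholders, not taken from the docs); passing an explicit codec means fastparquet does not need python-snappy:

# hypothetical doc snippet: avoid the default snappy codec
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df.to_parquet('example_fp.parquet', engine='fastparquet', compression='gzip')
# or skip compression entirely
df.to_parquet('example_fp.parquet', engine='fastparquet', compression=None)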

@TomAugspurger
Contributor

Thanks for tracking that down! Yes, I think it'd be best to specify a different compression (or just none, via compression=None).

@datapythonista
Member

Just realized that this will also affect the result in the performance comparison: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#performance-considerations

Other than that, there are 4 calls to to_parquet that would need to be updated.

@jorisvandenbossche
Member

Personally, I would leave the docs as is.
But to smooth development with pip, can we just remove snappy during the conversion from environment.yml to the requirements file? Contributors would only see some errors locally in the parquet examples when they do a full doc build. I don't think that's too bad.

@datapythonista
Member

My preference is to close this issue as invalid and leave everything as it is. pip not being able to easily install non-pure-Python dependencies is a known issue and not pandas related. That's why conda exists, and why we recommend it. Users who still want to use pip will have to deal with these problems. Today it's snappy, but tomorrow it may be something else that we can't consider removing from the docs dependencies.

If anything, what I would do is stop providing the requirements-dev.txt file.

@WillAyd
Member

WillAyd commented Mar 10, 2020

@datapythonista so I think we disagree, but probably because we are coming at this from different angles. I am generally not clear on why we put optional dependencies of optional dependencies in the requirements file, so if you have insights into that, it would be helpful.

IMO the downside of doing that manifests in an error here, but even if conda doesn't throw an error, it still adds complexity to the solvability of our environments that might not be necessary and could still cause issues down the road.

@datapythonista
Member

I think we have three options in this case:

  1. Leave snappy in both the conda and pip requirements; pip will fail on snappy if the user doesn't have the right system dependencies. This is what we have now, and my favourite option.
  2. Leave snappy for conda and remove it for pip, which adds some complexity to keeping the dependencies synchronized, and will cause pip users to get errors when building the docs. Joris's favourite option.
  3. Remove snappy from both conda and pip, which requires not using snappy in the docs (using compression=None in all to_parquet calls) and causes a bit of trouble in the benchmarks. We can't show to_parquet with snappy there, so we could add a note and show to_parquet without compression.

None of them is perfect. I think 1 is the simplest, and the drawback is IMHO quite reasonable.

3 is also quite OK for me, except we're limited in what we show in the docs, and they will become trickier. We expect users to use snappy (that's why it's the default), but we would always show how to use no compression. I don't think it's great to have notes in the docs saying "we remove compression because we don't have snappy installed" and "we show to_parquet without compression, but with compression it would be much faster". But it's a reasonable option.

Having snappy only for conda is the best of both worlds, but at the cost of a hacky solution IMO. Things become trickier in the dependency synchronization, and if the same happens with another library, things will start to get out of control, as this doesn't scale well I think.

In any case, I'm ok with whatever the rest decide. But my preference is not to complicate things more for what IMO is a small problem (I don't think we should support pip for pandas development).

@jorisvandenbossche
Member

Note that you always get snappy if you install pyarrow with conda (at least for recent versions).
So another option is to just leave it out: with conda we still get it automatically as a dependency of pyarrow, and with pip it is not installed (since it is not listed as an install requirement in setup.py). That gives no maintenance overhead from tracking it in the conversion script.

(I suppose this was the reason it was added originally: not because it was needed for the conda env, but to make the converted requirements file equivalent)

@datapythonista
Member

We've got pyarrow>=0.13.1 in environment.yml, and when I removed snappy in #32454, it wasn't installed and the doc build failed.

@jorisvandenbossche
Member

jorisvandenbossche commented Mar 10, 2020

Ah yes, I forgot that.
So we mixed up python-snappy and snappy: snappy is required for pyarrow (with conda it is installed by default, and I suppose the wheels include it as well), while python-snappy is required for fastparquet to support snappy compression.
(So the comment in the environment.yml file is wrong: it's not for pyarrow but for fastparquet.)

@jorisvandenbossche
Member

So an option could also be to only put compression=None in the fastparquet examples, and keep the normal compression (e.g. in the benchmark in io.rst) where pyarrow is used.
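
A minimal sketch of that option (file names are placeholders; this assumes pyarrow ships with its own snappy support, as discussed above):

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# fastparquet example: no compression, so python-snappy is not required
df.to_parquet('example_fp.parquet', engine='fastparquet', compression=None)

# pyarrow example (e.g. the io.rst benchmark): keep the default snappy codec
df.to_parquet('example_pa.parquet', engine='pyarrow')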

jbrockmendel added the Build (Library building on various platforms) label Sep 21, 2020
mroeschke added the Needs Discussion (Requires discussion from core team before further action) label Jul 29, 2021
@rhshadrach
Member

rhshadrach commented Aug 18, 2023

We just ran into this again during the sprint at EuroSciPy. It is quite confusing for new contributors, and I would like to make setting up a new environment as frictionless as possible. With the suggestion in #32417 (comment), we can remove it from both the conda environment and requirements-dev.txt, and not have any errors building the documentation.

cc @datapythonista

@jorisvandenbossche
Member

One of the things mentioned above is that python-snappy is needed for building the IO user guide. From a quick test after removing it from my dev env, the page still seems to build fine.

Also, fastparquet no longer depends on python-snappy (according to https://fastparquet.readthedocs.io/en/latest/install.html#requirements it now uses cramjam for all different compression codecs). So we can actually remove some skips from the parquet tests.
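
A quick way to sanity-check that (file name is a placeholder; this assumes pandas and a recent fastparquet are installed, with python-snappy absent from the environment):

# snappy compression should now go through cramjam rather than python-snappy
import pandas as pd

df = pd.DataFrame({"a": range(10)})
df.to_parquet('check_fp.parquet', engine='fastparquet', compression='snappy')
print(pd.read_parquet('check_fp.parquet', engine='fastparquet').head())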

So I think we can actually just remove this altogether.

@jorisvandenbossche
Member

The only other mention of snappy in our codebase is in the context of HDF IO, but there pytables includes it (I assume through blosc; in any case it doesn't list python-snappy as a dependency).

So I am removing python-snappy completely (not only from the pip dev requirements, but from all CI envs and the installation docs etc as well): #54633
