Large, unnecessary, proprietary mkl package included in numpy and pandas install, inflates binary by 600MB #84

Closed
answerquest opened this issue Apr 28, 2018 · 13 comments

Comments

@answerquest

answerquest commented Apr 28, 2018

Ref:

The mkl package is co-installed when we install either pandas or numpy using conda. It is a very large package, clocking in at ~200MB to download and ~600MB once installed in the pkgs folder of my Miniconda installation. The pip installer does not include this package when installing pandas. It is not in the conda feedstocks list and has no description on https://pypi.org/project/mkl/ . And:

License: Other/Proprietary License (Proprietary - Intel)
Author: Intel Corporation

I do not know much more about this subject, but when I searched for mkl I came across more results for mkl-fft and mkl-random, which are not the same as mkl and are under free licenses. mkl-fft's description on PyPI also seems more numpy-related: https://pypi.org/project/mkl-fft/

My hunch is that mkl-fft and mkl-random were the ones meant to be included in the numpy installs, and that mkl got included by accident.

Where this really causes a problem: when generating self-contained binaries for distribution, the mkl package gets roped in for programs that import either numpy or pandas if conda has installed it in the Python environment. For the Windows binary that PyInstaller creates, it balloons the dist by about 600MB.

Please investigate this, and if it is not essential to numpy, remove mkl from conda's numpy installation.

Info: conda version 4.5.1, on Windows 7 64-bit, as part of Miniconda Python 3 64-bit.

Sharing lines from the numpy JSON file I found in my Miniconda installation's conda-meta folder:

 "arch": "x86_64",
  "build": "py36h5c71026_1",
  "build_number": 1,
  "channel": "https://repo.anaconda.com/pkgs/main/win-64",
  "constrains": [],
  "depends": [
    "icc_rt >=16.0.4",
    "mkl >=2018.0.2",
    "mkl_fft",
    "mkl_random",
    "python >=3.6,<3.7.0a0",
    "vc 14.*"
  ],

Sharing lines from [Miniconda3]\pkgs\mkl-2018.0.2-1\info\LICENSE.txt :

Intel Simplified Software License (Version January 2018)

For: Intel(R) Math Kernel Library (Intel(R) MKL)
     Intel(R) Integrated Performance Primitives (Intel(R) IPP)
     Intel(R) Machine Learning Scaling Library (Intel(R) MLSL)
     Intel(R) Data Analytics Acceleration Library (Intel(R) DAAL)
     Intel(R) Threading Building Blocks (Intel(R) TBB)
     Intel(R) Distribution for Python*
     Intel(R) MPI Library
@jakirkham
Member

First, we don't currently build numpy against MKL; only defaults does that, though they have a nomkl package that can be installed to opt out. We build against OpenBLAS, which is BSD 3-Clause. That said, it's possible that in the future we ship both options, OpenBLAS and MKL, letting users choose one much like defaults does. MKL is actually Open License (not Open Source), which means we can link to it and share it freely should we wish to.

@rgommers
Contributor

> MKL is actually Open License (not Open Source), which means we can link to it and share it freely should we wish to.

That's not quite complete. There is a potential issue here, especially when using PyInstaller or a similar tool: it is possible that distributing an executable containing both MKL and a GPL component is a GPL violation. The NumPy team has talked to Intel about this (answer: Intel will not give definitive legal advice) and gotten good independent advice (answer: a GPL violation is potentially possible here, but the likelihood is case-specific).

To add to the answer for @answerquest: MKL or another BLAS package is definitely necessary for numpy. You're getting MKL because you have installed the Anaconda default numpy. If you use conda install -c conda-forge numpy you will get this feedstock's package, and OpenBLAS instead of MKL.
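
A minimal sketch of that route, in case it is useful (the environment name is arbitrary, and numpy.show_config() output varies between versions; the point is that the BLAS/LAPACK sections should mention openblas rather than mkl):

    conda create -n np-openblas -c conda-forge python=3.6 numpy
    conda activate np-openblas
    python -c "import numpy; numpy.show_config()"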

@answerquest
Author

answerquest commented Apr 29, 2018

Thanks for the clarification. Anyway, as you can see in the support links posted, the programs work perfectly fine without mkl installed, so for the time being I'm going ahead with not using conda to install the numpy and pandas packages, and that will be my recommendation in the support forums when questions about the overly large size pop up again. [Edit] It is better to use conda install -c conda-forge numpy to install numpy: it replaces mkl with OpenBLAS.

What would help here is a list of the numpy/pandas functions that actually need mkl; then people would have an objective way of determining whether their programs need it or not. The difference is a whopping 600MB in program size, which is significant for any program creator (my program's binary is just 30MB when I go the no-conda way, and none of its functions fail; it doesn't make sense for me to include mkl purely out of formality/loyalty), so it is well worth the disambiguation.

Also, if a conda install had a way to manually specify which dependency to exclude, that would be a good workaround too, as the other benefits of conda over pip remain and I would still like to use conda.
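
For what it's worth, conda has no per-dependency exclude flag (conda install --no-deps exists, but it skips all dependencies at once). On defaults, the opt-out is the nomkl package jakirkham mentions above; a rough sketch of that route, assuming 2018-era defaults packaging:

    conda install nomkl numpy pandas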

@rgommers
Contributor

@answerquest that's not the best recommendation unfortunately. It works in that case, but installing numpy with pip inside a conda env is not a good idea. numpy is special-cased by conda, so it's about the only thing that you really shouldn't install with pip. Two better alternatives:

  1. conda install -c conda-forge numpy (will give you the same OpenBLAS dependency as the official numpy wheel that pip grabs)
  2. Don't use conda, but create a clean virtualenv and install with pip into that (a minimal sketch follows below).
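
A rough sketch of option 2 on Windows, using the stdlib venv module (your_script.py is a placeholder, and PyInstaller is included only to mirror the use case above):

    python -m venv venv
    venv\Scripts\activate
    pip install numpy pandas pyinstaller
    pyinstaller --onefile your_script.py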

@answerquest
Author

@rgommers my bad, sorry, I had not read the OpenBLAS line correctly. If -c conda-forge helps to exclude mkl then that's a good solution indeed. I'm guessing OpenBLAS is not 600MB in size?

Definitely using a virtual environment to create the binary.

@rgommers
Contributor

Indeed, should be <10 MB.

ax3l added a commit to ax3l/openPMD-api that referenced this issue Jul 31, 2018
Use the conda-forge packages instead of the default packages.
Especially for numpy, this means using OpenBLAS instead of MKL.

conda-forge/numpy-feedstock#84
conda-forge/numpy-feedstock#97
ax3l added a commit to ax3l/openPMD-api that referenced this issue Jul 31, 2018
The latest update on the anaconda `default` channel has a broken,
32bit Windows MKL FFT lib that crashes the numpy import on it.

conda-forge/numpy-feedstock#84
conda-forge/numpy-feedstock#97
ocefpaf added a commit that referenced this issue Aug 31, 2018
@whekman

whekman commented Oct 19, 2018

For anyone trying to do as @rgommers suggests (option 1; it worked in the end!), the following might save you an hour of puzzling: stackoverflow thread.

I was having difficulty installing pyinstaller AND numpy with openblas just now, because my "conda install -c conda-forge pyinstaller" command resulted in numpy being "upgraded" to an MKL-linked one. The link explained a great deal, and PyInstaller now turns my "import numpy" .py into an exe (on Windows) of <14 MB :)

Still, scary to be so dependent on what version is available/downloaded via conda. Would be a shame not to be able to make small executables which make use of numpy. Should I be worried?

@jakirkham
Member

FWIW, what I typically do is conda install conda-forge::blas=*=openblas. This ensures you will get OpenBLAS-backed NumPy and friends.
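
A quick way to check which variant the solver actually picked (exact build strings differ between channels and years, so treat this only as a sketch):

    conda list blas    # the blas metapackage's build string indicates the selected variant
    conda list mkl     # if MKL was pulled in anyway, the mkl package shows up here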

@msarahan
Member

Yes, or add

blas=*=openblas

to your .condarc; see https://conda.io/docs/user-guide/configuration/use-condarc.html#always-add-packages-by-default-create-default-packages

@FSund

FSund commented Nov 14, 2018

@msarahan How do I add this? I tried adding it at the bottom of the .condarc in my environment, but then I get the following error

LoadError: Load Error: in C:\Users\filip\Anaconda3\envs\sci\.condarc on line 3, column 15. Invalid YAML
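
For reference, a bare blas=*=openblas line is not valid YAML on its own, which is presumably what triggers that error. A sketch of what the entry might look like under the create_default_packages key that msarahan's link points to (whether that key is the right home for a build-string pin is an assumption; check the linked docs for your conda version):

    create_default_packages:
      - blas=*=openblas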

@answerquest
Author

Hi, just FYI (not replying to any earlier post here), I've since had no problems using just pip to install the numpy and pandas modules for my application. If they leave anything out, my program isn't using it anyway, and I haven't experienced any problems from it. The PyInstaller-generated .exe (single-file) is only around 30MB, and that is without UPX compression.
(Note: don't use UPX compression when making a single-file exe with PyInstaller, as UPX corrupts one of the DLLs.)
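
A minimal sketch of that kind of build (your_script.py is a placeholder; --noupx only matters if UPX is installed and on PATH, since PyInstaller skips UPX when it is absent):

    pip install numpy pandas pyinstaller
    pyinstaller --onefile --noupx your_script.py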

Earlier, pip was having a problem with pandas, which was why I was using conda, but that got resolved just days after I posted here. This update isn't really relevant to this repo, but since there's activity here and I was the OP, I feel obliged to disclose how I finally solved the problem on my end: I went with pip and it worked out fine.
No hard feelings for the conda folks; hope you don't mind this update.

@FSund

FSund commented Nov 14, 2018

> Hi, just FYI (not replying to any earlier post here), I've since had no problems using just pip to install the numpy and pandas modules for my application. If they leave anything out, my program isn't using it anyway, and I haven't experienced any problems from it. The PyInstaller-generated .exe (single-file) is only around 30MB, and that is without UPX compression.
> (Note: don't use UPX compression when making a single-file exe with PyInstaller, as UPX corrupts one of the DLLs.)

This issue is fixed in newer versions, so installing numpy from conda-forge should get you an openblas version.

But there is no openblas/nomkl version of scipy on Windows yet, so I'm using pip to install scipy. I have the same experience as you: no issues, though something is probably not getting installed correctly. I prefer not to mix pip and conda, so I'd love a conda-forge version of scipy. Work on that is going on here: conda-forge/scipy-feedstock#78
