Cannot install deepspeed 0.12.6, fail to produce metadata. #4914

Closed
simonou99 opened this issue Jan 8, 2024 · 7 comments
Labels: bug (Something isn't working), build (Improvements to the build and testing systems)

Comments

@simonou99

Describe the bug
pip install fails for deepspeed 0.12.6 during metadata generation.

To Reproduce
pip install deepspeed

Expected behavior
Expected successful installation.

Error report output
(Truncated):

Collecting deepspeed!=0.11.0
  Using cached deepspeed-0.12.6.tar.gz (1.2 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [19 lines of output]
      [2024-01-08 14:52:17,604] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
      [2024-01-08 14:52:18,545] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
      test.c
      LINK : fatal error LNK1181: 无法打开输入文件“aio.lib”
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\ouzho\AppData\Local\Temp\pip-install-sl7ccsbf\deepspeed_244f8b986fe84fa2be10e453c7ca05c5\setup.py", line 192, in <module>
          ext_modules.append(builder.builder())
        File "C:\Users\ouzho\AppData\Local\Temp\pip-install-sl7ccsbf\deepspeed_244f8b986fe84fa2be10e453c7ca05c5\op_builder\builder.py", line 637, in builder
          extra_link_args=self.strip_empty_entries(self.extra_ldflags()))
        File "C:\Users\ouzho\AppData\Local\Temp\pip-install-sl7ccsbf\deepspeed_244f8b986fe84fa2be10e453c7ca05c5\op_builder\inference_cutlass_builder.py", line 71, in extra_ldflags
          import dskernels
      ModuleNotFoundError: No module named 'dskernels'
      DS_BUILD_OPS=1
       [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
       [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
       [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
       [WARNING]  Filtered compute capabilities ['6.0', '6.1', '7.0']
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

BTW, the Chinese text in the LNK1181 line ('无法打开输入文件') means 'cannot open input file'.

Screenshots
N/A

System info (please complete the following information):

  • OS: win11
  • GPU: 1x RTX3060 Mobile
  • DeepSpeed-0.12.6 (trying to install)
  • Python 3.10.6

Docker context
N/A

Additional context
The error actually occurred while installing the ludwig[full] package, which requires deepspeed.

@simonou99 simonou99 added the bug and inference labels Jan 8, 2024
@simonou99 simonou99 changed the title [BUG] Cannot install deepspeed 0.12.6, fail to produce metadata. Jan 8, 2024
@loadams
Collaborator

loadams commented Jan 8, 2024

@simonou99 - it looks like you are on Windows. We know Windows support isn't completely there, and part of why the build is erroring out is that it cannot find dskernels:

ModuleNotFoundError: No module named 'dskernels'

However, that package is not supported on Windows because not all of the kernels are available there. There are broadly two approaches here:

  1. DeepSpeed is fully supported on WSL (the Windows Subsystem for Linux), which you can use on your Windows machine. That is usually the easiest way forward.
  2. A few other issues track how users have worked around various problems; a few are here and here.
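
The two failure conditions mentioned in this comment (native Windows, and a missing `dskernels` module) can be checked before attempting the install. The following is a minimal sketch using only the Python standard library; `preflight` is a hypothetical helper name for illustration, not a DeepSpeed API, and the warning wording is made up here.

```python
import importlib.util
import platform


def preflight(module_name="dskernels", system=None):
    """Return a list of warnings to review before attempting the install.

    Hypothetical helper for illustration only; not part of DeepSpeed.
    """
    # Allow the OS name to be injected so the check can be exercised anywhere.
    system = system or platform.system()
    warnings = []
    if system == "Windows":
        warnings.append("native Windows detected; consider WSL")
    # find_spec returns None when a top-level module cannot be found.
    if importlib.util.find_spec(module_name) is None:
        warnings.append(f"module {module_name!r} is not importable")
    return warnings


for line in preflight():
    print("WARNING:", line)
```

An empty result only means these two checks passed; it does not guarantee that the build itself will succeed.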

@loadams loadams self-assigned this Jan 8, 2024
@loadams loadams added the build label and removed the inference label Jan 8, 2024
@loadams
Collaborator

loadams commented Jan 16, 2024

@simonou99 - does that help to answer your question? There's not much else to add for this issue, unfortunately.

@simonou99
Author

> @simonou99 - does that help to answer your question? There's not much else to add for this issue, unfortunately.

Hi, many thanks for the help. Sorry, I got pretty busy lately and couldn't make time to reply or check on my project. Unfortunately, it's rather inconvenient for me to move my current work onto WSL. Thanks again for the help; it does provoke some thoughts for my future projects, but I will have to close this thread for now.

@simonou99 simonou99 closed this as not planned Jan 17, 2024
@nctu6

nctu6 commented Jul 5, 2024

Hi,

Even on Linux, this error occurs: ModuleNotFoundError: No module named 'dskernels'.
Linux v08pytsurveyllm-6wlcc 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

deepspeed: installed from source
torch 2.1.1+cu121
torchaudio 2.1.1+cu121
torchvision 0.16.1+cu121

@loadams
Collaborator

loadams commented Jul 9, 2024

Hi @nctu6 - have you run pip install deepspeed-kernels? Or can you share the output of pip list?
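
As a narrower alternative to pasting the full `pip list` output, the relevant versions can be collected from Python. This is a minimal sketch using the standard-library `importlib.metadata`; `installed_versions` is a hypothetical helper name, and the distribution names are the ones mentioned in this thread.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


# Distribution names taken from this thread.
print(installed_versions(["deepspeed", "deepspeed-kernels", "torch"]))
```

A `None` for `deepspeed-kernels` would be consistent with the `No module named 'dskernels'` error reported above.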

@nctu6

nctu6 commented Jul 10, 2024

Hi,

I have resolved the installation issue.

  1. Run the following command to install the required package:

    conda install nvidia/label/cuda-12.1.0::cuda-nvcc
    
  2. Export the PATH variable:

    export PATH=[your conda path]/envs/llamafactory/bin:$PATH
    
  3. Execute the installation with the following command:

    DS_SKIP_CUDA_CHECK=1 DS_BUILD_OPS=1 DS_BUILD_EVOFORMER_ATTN=0 DS_BUILD_FP_QUANTIZER=0 DS_BUILD_SPARSE_ATTN=0 DS_BUILD_CUTLASS_OPS=0 DS_BUILD_RAGGED_DEVICE_OPS=0 pip install . --global-option="build_ext" --global-option="-j8"
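
For anyone scripting step 3, the same invocation can be assembled in Python. This is a sketch under assumptions: the flag names and values are copied from the command above, `build_install_invocation` is a hypothetical helper, and nothing is executed unless you call `subprocess.run` yourself from a DeepSpeed source checkout. Newer pip releases have deprecated `--global-option`, so the argument list may need adjusting for your pip version.

```python
import os
import sys

# Flags copied from the command above; environment values are always strings.
BUILD_FLAGS = {
    "DS_SKIP_CUDA_CHECK": "1",
    "DS_BUILD_OPS": "1",
    "DS_BUILD_EVOFORMER_ATTN": "0",
    "DS_BUILD_FP_QUANTIZER": "0",
    "DS_BUILD_SPARSE_ATTN": "0",
    "DS_BUILD_CUTLASS_OPS": "0",
    "DS_BUILD_RAGGED_DEVICE_OPS": "0",
}


def build_install_invocation(base_env=None, jobs=8):
    """Return (env, argv) for the source install; nothing is executed here."""
    # Copy the environment so the caller's mapping is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    env.update(BUILD_FLAGS)
    argv = [
        sys.executable, "-m", "pip", "install", ".",
        "--global-option=build_ext",
        f"--global-option=-j{jobs}",
    ]
    return env, argv


env, argv = build_install_invocation()
print(" ".join(argv))
# To actually run it from a DeepSpeed source checkout:
#   import subprocess; subprocess.run(argv, env=env, check=True)
```

Keeping the flags in one dictionary makes it easy to see which ops are being skipped and to toggle one at a time when a build fails.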
    

Best regards,

@ankan8145

I resolved the installation issue by executing the following command:
conda install deepspeed
