[REQUEST] Hey, Microsoft...Could you PLEASE Support Your Own OS? #2427

Closed
d8ahazard opened this issue Oct 15, 2022 · 41 comments · Fixed by #5609
Labels
enhancement New feature or request

Comments

@d8ahazard

While "I get it"...I really don't get why this still doesn't even have BASIC Windows support.

It is published by Microsoft, right?

Compiling from source on windoze doesn't actually seem to generate a .whl file so it could be re-distributed or something.

Pulling from PIP throws any number of errors, from ADAM not being supported because it requires 'lscpu', or just failing because libaio.so can't be found.

Meaning, that for the past several years, this M$-produced piece of software is mostly useless on the OS they create.

This is one of the most annoying things about Python in general. "It's soooo cross-platform". Until you need a specific library, and realize it was really only ever developed for Linux users until someone threw a slug in the readme about how it MIGHT work with windows, but only if you do a hundred backflips while wearing a blue robe and sacrifice a chicken to Cthulhu.

Python does still support releasing different packages for different operating systems, right?

If that's still true, then it would be fantastic if someone out there could release a proper .whl to PyPI for us second-class Windoze users. I really don't feel like spending the next several hours trying to upgrade my instance of WSL2 to the right version that won't lose its mind if I try to use a specific amount of RAM...

@d8ahazard added the enhancement label on Oct 15, 2022
@d8ahazard
Author

d8ahazard commented Oct 15, 2022

I mean, this only has open issues for the past two years or more...
#435, #1189, #1631, #1769, #2099, #2191, #2406

@n00mkrad

n00mkrad commented Oct 16, 2022

+1

DeepSpeed is nearly (if not entirely) impossible to install on Windows.

@tjruwase
Contributor

We hear you. Please try #2428

@RezaYazdaniAminabadi
Contributor

Hi @n00mkrad and @d8ahazard,

I wonder if you have any update on whether this PR solved the Windows installation issue?
Thanks,
Reza

@n00mkrad

Nope.

Trying to run it in VS Powershell:

UserWarning: It seems that the VC environment is activated but DISTUTILS_USE_SDK is not set.This may lead to multiple activations of the VC env.Please set `DISTUTILS_USE_SDK=1` and try again.

Trying to run in CMD:

D:\Temp\Setup\DeepSpeed-eltonz-fix-win-build\csrc\includes\StopWatch.h(3): fatal error C1083: Cannot open include file: 'windows.h': No such file or directory
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.29.30133\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2

@d8ahazard
Author

Solved this by installing the windows 10 SDK...but this is also precisely what I'm grumbling about.
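The missing 'windows.h' above is consistent with the Windows SDK headers not being on the compiler's search path: MSVC's cl.exe reads extra include directories from the INCLUDE environment variable. A minimal sketch of a pre-build check follows. The helper names `find_header` and `diagnose` are made up for illustration and are not part of DeepSpeed or setuptools.

```python
import os
from pathlib import Path


def find_header(header: str, include_value: str) -> list[Path]:
    """Return every directory in an INCLUDE-style value that contains `header`.

    The lookup is case-sensitive on case-sensitive filesystems.
    """
    hits = []
    for entry in include_value.split(os.pathsep):
        entry = entry.strip()
        if entry and (Path(entry) / header).is_file():
            hits.append(Path(entry))
    return hits


def diagnose(environ=os.environ) -> str:
    """Summarize whether the current shell looks ready to compile C++ extensions."""
    include_value = environ.get("INCLUDE", "")
    if not include_value:
        return "INCLUDE is not set: run from a Visual Studio developer prompt"
    if not find_header("windows.h", include_value):
        return "windows.h not found on INCLUDE: the Windows SDK may be missing"
    # The UserWarning quoted earlier in the thread asks for this variable.
    if environ.get("DISTUTILS_USE_SDK") != "1":
        return "windows.h found, but DISTUTILS_USE_SDK is not set to 1"
    return "ok"
```

Calling `print(diagnose())` in the same prompt that will run the build gives a one-line hint before spending time on a full compile.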

Even after getting it to compile, there's no /dist folder and no .whl file, despite the setup.py file clearly indicating this is what should happen.

The .bat file is calling python setup.py bdist_wheel...yet we get an .egg-info folder.

If I edit the bat to call pip install setup.py, it gets really mad at me...can't find the error it throws ATM.

Like, within the app I'm trying to use deepspeed, I can easily do a try: / import deepspeed command to determine if that dependency exists. Why can't the setup.py script do the same for opts that may be unavailable in Windoze?
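The idea in the paragraph above can be sketched roughly as follows: probe for what is available and skip what is not, instead of aborting. This is only an illustration. `OP_PLATFORM`, `has_module`, and `buildable_ops` are hypothetical names, and the table is not DeepSpeed's real op registry. The Linux-only entries are based on the libaio and sparse attention messages quoted in this thread.

```python
import importlib.util
import sys

# Hypothetical table: op name -> required sys.platform prefix (None = any).
# This is an illustration, NOT DeepSpeed's actual op registry.
OP_PLATFORM = {
    "async_io": "linux",     # depends on libaio, a Linux library
    "cpu_adam": None,
    "sparse_attn": "linux",  # the build warning lists Linux as the supported platform
}


def has_module(name: str) -> bool:
    """True if the top-level module `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def buildable_ops(platform: str = sys.platform) -> dict[str, bool]:
    """Map each op to whether it should be attempted on `platform`."""
    return {
        op: required is None or platform.startswith(required)
        for op, required in OP_PLATFORM.items()
    }
```

On the application side, `has_module("deepspeed")` plays the role of the try/import check. On the build side, a setup script could consult something like `buildable_ops()` and skip the unsupported ops with a warning.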

Last - when I do finally jump through all the hoops and get setup.py to create something in the /build folder, I have to manually spoof the whl-info directory in order for accelerate to recognize this, and even then, it refuses to load due to a missing module.

"Distributed package doesn't have MPI built in. MPI is only included if you build PyTorch from source on a host that has MPI installed."

@camenduru

@tjruwase @RezaYazdaniAminabadi Hi
Can DeepSpeed work without libaio? If the answer is no, there is no way to run DeepSpeed on Windows, right?

@tjruwase
Contributor

@d8ahazard, yes DeepSpeed can work without libaio. This library is only used by zero-infinity and zero-inference.

@camenduru

@tjruwase thanks ❤️ If we don't need libaio, why do I get this error: LINK : fatal error LNK1181: cannot open input file 'aio.lib'
set DS_BUILD_AIO=0
set DS_BUILD_SPARSE_ATTN=0

@ChenYFan

ChenYFan commented Oct 24, 2022

Did Microsoft really consider adapting to windows when developing it? When I start pytorch, it forces linking a GPU with nccl even though I train under cpu only

As we all know, nccl cannot be used on win fucking at all

@camenduru

working with WSL 🎉

- Windows 11 22H2
- Ubuntu 22.04
- Linux PC 5.15.68.1-microsoft-standard-WSL2

@tjruwase
Contributor

How did you resolve the libaio link error?

@n00mkrad

So it's still not working on Windows.

WSL is not always an option depending on the use case.

@camenduru

@tjruwase I couldn't get it to run on native Windows 😭. Ubuntu already comes with libaio, and this issue helped a lot:
huggingface/diffusers#807

@tjruwase
Contributor

@camenduru, can you share the log of the link error? Thanks!

@camenduru

@tjruwase https://gist.github.com/camenduru/c9a2d97f229b389fed0b1ad561a243d3
errors coming from:

pytorch/pytorch#81642 (this one looks serious) 🥵
https://github.com/pytorch/pytorch/blob/v1.12.1/c10/util/safe_numerics.h

const char *cusparseGetErrorString(cusparseStatus_t status);
https://github.com/pytorch/pytorch/blob/v1.12.1/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cpp

is this one necessary?
[WARNING] please install triton==1.0.0 if you want to use sparse attention (Supported Platforms: Linux)
https://github.com/openai/triton/

@camenduru

error C3861: '_addcarry_u64': identifier not found. This one is very interesting; it is in the list 🤷

@Thomas-MMJ
Contributor

@camenduru for wsl2, is it passing the pytest-3 tests/unit and other tests? I got it compiled on wsl2 but it is failing almost every test due to nccl issues.

If you could provide details as to your installation and whether you are passing the unit tests would be appreciated.

@camenduru

@Thomas-MMJ DeepSpeed was very slow with WSL2, and I have deleted everything, so sorry, I can't help 😞. We need DeepSpeed working on native Windows, maybe a year from now, I don't know. Also, why are we putting a Linux KVM between the GPU and the CPU? We will lose ~5%, right?

@tjruwase
Contributor

tjruwase commented Nov 1, 2022

> @tjruwase https://gist.github.com/camenduru/c9a2d97f229b389fed0b1ad561a243d3 errors coming from:

I think the problem is that it is trying to build all the ops because of the following environment variable setting
[image: screenshot of the environment variable setting]

Can you try setting that env var to zero?
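A rough sketch of that suggestion follows, assuming the variable in the screenshot is DS_BUILD_OPS (the later logs in this thread print DS_BUILD_OPS=1). The per-op names DS_BUILD_AIO and DS_BUILD_SPARSE_ATTN come from this thread. `build_env` and `pip_command` are hypothetical helpers, not DeepSpeed APIs.

```python
import os
import sys


def build_env(disabled_ops, base=None) -> dict[str, str]:
    """Copy an environment and switch off op pre-compilation.

    `disabled_ops` holds short op names such as "aio" or "sparse_attn";
    each becomes DS_BUILD_<NAME>=0. DS_BUILD_OPS=0 is an assumption about
    which variable the maintainer meant.
    """
    env = dict(os.environ if base is None else base)
    env["DS_BUILD_OPS"] = "0"
    for op in disabled_ops:
        env[f"DS_BUILD_{op.upper()}"] = "0"
    return env


def pip_command(package: str = "deepspeed") -> list[str]:
    """Build the pip invocation for the current interpreter (does not run it)."""
    return [sys.executable, "-m", "pip", "install", package]
```

To actually run the install, pass both to the standard library: `subprocess.run(pip_command(), env=build_env(["aio", "sparse_attn"]))`. Whether the build then succeeds on Windows is not something this sketch can promise.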

@PleezDeez

Have you tried using ChatGPT to solve it? One of the other requirements is Triton, and a Russian managed to build a working 2.0 version for Windows a couple of days ago, but ChatGPT could likely find the other holes keeping it from building properly.

@PleezDeez

well if anyone feels like tinkering around with this, here's a whl that installs deepspeed version 0.8.0 on windows
https://transfer.sh/eDLOMJ/deepspeed-0.8.0+cd271a4a-cp310-cp310-win_amd64.whl
It requires the cracked triton 2.0.0 whl first, with the files from its folder dropped into the triton folder in xformers, before it will install. But it installs... Here's the triton whl: https://transfer.sh/me0xpC/triton-2.0.0-cp310-cp310-win_amd64.whl

@PleezDeez

When turned on, it'll throw up c10d flags looking for NCCL, which is Linux-only. But this is an issue with either accelerate or my computer, because I get the same error when trying to turn on any sort of distributed training at all on Windows. I don't know if I possess the coding knowledge to fix it, so I leave it up to y'all.
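For the NCCL point, one common workaround in plain PyTorch is to fall back to the Gloo backend where NCCL is not available. The sketch below only picks a backend name. `pick_backend` is a hypothetical helper, and nothing in this thread confirms that accelerate or DeepSpeed will honor such a choice on Windows.

```python
import sys


def pick_backend(platform: str = sys.platform, use_cuda: bool = True) -> str:
    """Choose a torch.distributed backend name.

    NCCL is only selected on Linux with CUDA; everything else gets Gloo.
    """
    if platform.startswith("linux") and use_cuda:
        return "nccl"
    return "gloo"
```

The returned string is meant to be passed as the `backend` argument of `torch.distributed.init_process_group`.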

@PleezDeez

Oh, and it'll error out during accelerate config after you answer no to the question about using a DeepSpeed JSON file. I got around this by replacing the accelerate config file on Windows with a config file I made in WSL.

@78Alpha

78Alpha commented Feb 1, 2023

I must point out that those wheel links redirect to Not Found

@JeffMII

JeffMII commented Mar 14, 2023

Wait, so DeepSpeed is a Microsoft project, and it can't be used on Windows?

@d8ahazard
Author

Not without compiling it yourself, sacrificing three chickens to the dark lord Cthulhu, and playing "Hit me baby one more time" on reverse.

@camenduru

Oh no 😐 I was playing the wrong song.

@yadalik

yadalik commented Mar 31, 2023

So, on windows 10, when I do:

pip install deepspeed                                                                               
Collecting deepspeed
  Using cached deepspeed-0.8.3.tar.gz (765 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [16 lines of output]
      test.c
      LINK : fatal error LNK1181: cannot open input file "aio.lib"
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\i\AppData\Local\Temp\pip-install-97anxpmj\deepspeed_629338d4deb54654aba44efd0bf8dab4\setup.py", line 156, in <module>
          abort(f"Unable to pre-compile {op_name}")
        File "C:\Users\i\AppData\Local\Temp\pip-install-97anxpmj\deepspeed_629338d4deb54654aba44efd0bf8dab4\setup.py", line 48, in abort
          assert False, msg
      AssertionError: Unable to pre-compile async_io
      [WARNING] Torch did not find cuda available, if cross-compiling or running with cpu only you can ignore this message. Adding compute capability for Pascal, Volta, and Turing (compute capabilities 6.0, 6.1, 6.2)
      DS_BUILD_OPS=1
       [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
       [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
       [WARNING]  One can disable async_io with DS_BUILD_AIO=0
       [ERROR]  Unable to pre-compile async_io
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

When I set DS_BUILD_AIO=0, I get a bunch of "lscpu command is not available" warnings. I suppose it isn't going to get any better with DS_BUILD_SPARSE_ATTN=0?

pip install deepspeed
Collecting deepspeed
  Using cached deepspeed-0.8.3.tar.gz (765 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [31 lines of output]
      test.c
      LINK : fatal error LNK1181: cannot open input file "aio.lib"
      The system cannot find the file specified.
      The system cannot find the file specified.
      The system cannot find the file specified.
      The system cannot find the file specified.
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\i\AppData\Local\Temp\pip-install-a7n_s6ma\deepspeed_e6ef7efe0142466088802e0aca58350e\setup.py", line 156, in <module>
          abort(f"Unable to pre-compile {op_name}")
        File "C:\Users\i\AppData\Local\Temp\pip-install-a7n_s6ma\deepspeed_e6ef7efe0142466088802e0aca58350e\setup.py", line 48, in abort    
          assert False, msg
      AssertionError: Unable to pre-compile sparse_attn
      [WARNING] Torch did not find cuda available, if cross-compiling or running with cpu only you can ignore this message. Adding compute capability for Pascal, Volta, and Turing (compute capabilities 6.0, 6.1, 6.2)
      DS_BUILD_OPS=1
       [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
       [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
       [WARNING]  cpu_adagrad requires the 'lscpu' command, but it does not exist!
       [WARNING]  cpu_adagrad attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
       [WARNING]  cpu_adagrad requires the 'lscpu' command, but it does not exist!
       [WARNING]  cpu_adagrad attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
       [WARNING]  cpu_adam requires the 'lscpu' command, but it does not exist!
       [WARNING]  cpu_adam attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
       [WARNING]  cpu_adam requires the 'lscpu' command, but it does not exist!
       [WARNING]  cpu_adam attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
       [WARNING]  sparse_attn cuda is not available from torch
       [WARNING]  sparse_attn requires a torch version >= 1.5 but detected 2.0
       [WARNING]  please install triton==1.0.0 if you want to use sparse attention
       [WARNING]  One can disable sparse_attn with DS_BUILD_SPARSE_ATTN=0
       [ERROR]  Unable to pre-compile sparse_attn
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
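The logs above already name the switch for each failing op ("One can disable ... with DS_BUILD_...=0"). A small sketch that pulls those hints out of a captured log might look like this. `suggest_flags` is a hypothetical helper, and the patterns are written against the message wording shown in this thread only.

```python
import re

# Patterns match the wording of the build messages quoted in this thread.
DISABLE_HINT = re.compile(r"One can disable (\w+) with (DS_BUILD_\w+)=0")
FAILED_OP = re.compile(r"Unable to pre-compile (\w+)")


def suggest_flags(log: str) -> dict[str, str]:
    """Map each op that failed to pre-compile to the variable that disables it."""
    hints = dict(DISABLE_HINT.findall(log))
    failed = set(FAILED_OP.findall(log))
    return {op: hints[op] for op in sorted(failed) if op in hints}
```

Fed the second log above, this would point at DS_BUILD_SPARSE_ATTN, which matches the op named in the final [ERROR] line.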

@Trace2333

Same problem. There seems to be no way to solve it, but it works fine on Linux...

@EntropicBlackhole

What if we all custom-build a branch supporting Windows? I'm honestly tired of so, so many things not being supported on Windows, which keeps me from working with certain packages. Unless we all keep bugging Microsoft about it, they won't really support it on Windows; not sure why, though. I can only assume it's something about backwards compatibility and trying to make it work on Win 95.

@marcoseduardopm

marcoseduardopm commented Apr 11, 2023

(Note: these steps are for inference-only mode.)
After trying forever, I got it working. Here is what I did:

  • Install Visual Studio Build Tools 2019. If you already have it installed, repair it;
  • Install Miniconda (if you don't already have it);
  • Install CUDA 11.7 from https://developer.nvidia.com/cuda-11-7-0-download-archive ;
  • Open "Anaconda Prompt (MiniConda3)";
  • Create a Python 3.10 env using: "conda create -n dsenv python=3.10.6";
  • Activate the conda env using "conda activate dsenv";
  • Install PyTorch and CUDA using: "conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia";
  • Close the Anaconda prompt;
  • Open Start -> "x64 Native Tools Command Prompt for VS 2019";
  • Initialize conda in the command prompt using "conda init cmd.exe";
  • Reopen the "x64 Native Tools Command Prompt for VS 2019" AS AN ADMINISTRATOR;
  • Activate the conda env using "conda activate dsenv";
  • Go to your root folder (could be c:\ or any other) and clone the DeepSpeed project: "git clone https://github.com/microsoft/DeepSpeed";
  • Depending on the fixes in the DeepSpeed repository, this step may or may not be needed: download the file from https://drive.google.com/drive/folders/11EYHosWfDLrrVbniBLV1j82qeurpGlvX?usp=sharing and use it to replace DeepSpeed\csrc\transformer\inference\csrc\pt_binding.cpp (see comments below);
  • Go to the DeepSpeed folder using "cd DeepSpeed";
  • Make 10 prayers to your god and try to install using "build_win.bat";
  • A .whl will be created in the dist folder.
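Before running build_win.bat, the prerequisites from the steps above can be checked from the same prompt. This is only a sketch: `REQUIRED_TOOLS` and `missing_tools` are made-up names, and finding an executable on PATH says nothing about whether its version matches the ones listed.

```python
import shutil

# Executable name -> the step above that is expected to provide it.
REQUIRED_TOOLS = {
    "cl": "Visual Studio Build Tools 2019 (x64 Native Tools prompt)",
    "nvcc": "CUDA 11.7 toolkit",
    "git": "git, used to clone the DeepSpeed repository",
    "conda": "Miniconda, after 'conda init cmd.exe'",
}


def missing_tools(which=shutil.which) -> list[str]:
    """Return the required executables that cannot be found on PATH."""
    return [name for name in REQUIRED_TOOLS if which(name) is None]


def report(which=shutil.which) -> str:
    """Produce a short, human-readable summary of what is missing."""
    missing = missing_tools(which)
    if not missing:
        return "all tools found"
    return "\n".join(f"missing {name}: {REQUIRED_TOOLS[name]}" for name in missing)
```

Running `print(report())` inside the activated dsenv environment shows which of the earlier steps may need to be repeated.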

To install the generated .whl, just use:
For Python 3.10: pip install deepspeed-0.8.3+6eca037c-cp310-cp310-win_amd64.whl
For Python 3.9: pip install deepspeed-0.8.3+4d27225f-cp39-cp39-win_amd64.whl
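The two file names differ in their tags: a wheel name encodes the Python tag, ABI tag, and platform tag after the version, which is why there is one file per Python version. A simplified sketch of reading those tags follows. `parse_wheel_name` and `matches_interpreter` are illustrative helpers that assume a CPython interpreter and do not handle compressed tag sets such as "py2.py3".

```python
import sys
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python: str
    abi: str
    platform: str


def parse_wheel_name(filename: str) -> WheelTags:
    """Split 'name-version[-build]-python-abi-platform.whl' into its parts."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError(f"unexpected wheel file name: {filename}")
    python, abi, platform = parts[-3:]
    return WheelTags(parts[0], parts[1], python, abi, platform)


def matches_interpreter(filename: str, version_info=sys.version_info) -> bool:
    """True if the wheel's Python tag equals cp<major><minor> (CPython assumed)."""
    expected = f"cp{version_info[0]}{version_info[1]}"
    return parse_wheel_name(filename).python == expected
```

For example, `matches_interpreter("deepspeed-0.8.3+6eca037c-cp310-cp310-win_amd64.whl")` is only true when run under CPython 3.10.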

Extra notes:
PyTorch 1.13.1 with CUDA 11.7 also worked for me, but since it is an older version, I did not mention it in the steps above. If you need that version, install it using "conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia"

About the replacement of file pt_binding.cpp: all I did was change lines 531, 532, 539, and 540:
New Lines 531 and 532:
{static_cast(hidden_dim * Context::Instance().GetMaxTokenLenght()),
static_cast(k * Context::Instance().GetMaxTokenLenght()),

New lines 539 and 540:
{static_cast(hidden_dim * Context::Instance().GetMaxTokenLenght()),
static_cast(k * Context::Instance().GetMaxTokenLenght()),

For anyone who just wants the final .whl to install using Python, here it is (no prayers needed):
https://drive.google.com/drive/folders/117GSNHcJyzvMPTftl0aPBSwQVsU-z4bM?usp=sharing

@hamed-d

hamed-d commented Apr 11, 2023

The wheels worked for me in PyTorch 1.13.1 with CUDA 11.7 and python 3.10.9. Thank you. Although, when running a command like

deepspeed script.py --deepspeed

Windows tries to open deepspeed using an application and asks what app it should use to open it. But when importing and running code for deepspeed in python, it works.
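The "which app should open this" prompt suggests that the `deepspeed` launcher is installed as a script without a file extension, which Windows does not know how to execute. One possible workaround is to start the launcher through the Python interpreter. The sketch below assumes the launcher module is `deepspeed.launcher.runner` and that it can be run with `-m`. Treat both as assumptions to verify against your installed version, and note that nothing here confirms the launcher itself works on Windows.

```python
import sys


def launcher_command(script: str, *script_args: str, python: str = sys.executable) -> list[str]:
    """Build a command that starts the DeepSpeed launcher via `python -m`.

    The module path is an assumption, not something confirmed in this thread.
    """
    return [python, "-m", "deepspeed.launcher.runner", script, *script_args]
```

The list can be passed to `subprocess.run`, for example `subprocess.run(launcher_command("script.py", "--deepspeed"))`.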

@daxijiu

daxijiu commented May 24, 2023

Thank you for the method you provided, but it doesn't work for me with v0.9.2 version (win10+python3.10+vs2019). Could you please provide a solution or a whl file for v0.9.2 version?

@ai-robert

Does DeepSpeed training work with WSL2? I've been going around in circles and have heard 3 different things. I ran into my own errors while installing it on WSL2 but I don't know if I should expect success with a few hours more work or if it's a hopeless cause? I'm also fine using a docker container if that's what it takes, I just can't find a straightforward answer on if training with deepspeed is reasonably expected to work on WSL2 at all

@vTuanpham

Yeah, having the same problem. I thought that giving up and switching to WSL might solve it, but when running, it just fails with: "FAILED: custom_cuda_kernel.cuda.o".

GotAudio added a commit to GotAudio/data that referenced this issue Oct 29, 2023
Deepspeed v0.11.1: Patch release cloned from https://github.com/microsoft/DeepSpeed on 10-28-2023. Compiled for Windows, Torch 2.1.0 and CUDA 12.1. Packaged as .rar because the .whl was slightly too big for github.com. Includes 4 fixes described here microsoft/DeepSpeed#2427 (comment) and 4 fixes in other places shown below. I know nothing about C++. I just asked ChatGPT to fix the errors.
diff --git a/build_win.bat b/build_win.bat
index ec8c8a36..f21d79cc 100644
--- a/build_win.bat
+++ b/build_win.bat
@@ -1,5 +1,10 @@
 @echo off

+REM begin-KAS
+set DS_BUILD_EVOFORMER_ATTN=0
+set DISTUTILS_USE_SDK=1
+REM end-KAS
+
 set DS_BUILD_AIO=0
 set DS_BUILD_SPARSE_ATTN=0

diff --git a/csrc/quantization/pt_binding.cpp b/csrc/quantization/pt_binding.cpp
index a4210897..12777603 100644
--- a/csrc/quantization/pt_binding.cpp
+++ b/csrc/quantization/pt_binding.cpp
@@ -241,11 +241,12 @@ std::vector<at::Tensor> quantized_reduction(at::Tensor& input_vals,
                               .device(at::kCUDA)
                               .requires_grad(false);

-    std::vector<long int> sz(input_vals.sizes().begin(), input_vals.sizes().end());
-    sz[sz.size() - 1] = sz.back() / devices_per_node;  // num of GPU per nodes
-    const int elems_per_in_tensor = at::numel(input_vals) / devices_per_node;
+    std::vector<int64_t> sz_vector(input_vals.sizes().begin(), input_vals.sizes().end());
+    sz_vector[sz_vector.size() - 1] = sz_vector.back() / devices_per_node;  // num of GPU per nodes
+    at::IntArrayRef sz(sz_vector);
     auto output = torch::empty(sz, output_options);

+    const int elems_per_in_tensor = at::numel(input_vals) / devices_per_node;
     const int elems_per_in_group = elems_per_in_tensor / (in_groups / devices_per_node);
     const int elems_per_out_group = elems_per_in_tensor / out_groups;

diff --git a/csrc/transformer/inference/csrc/pt_binding.cpp b/csrc/transformer/inference/csrc/pt_binding.cpp
index b7277d1e..a26eaa40 100644
--- a/csrc/transformer/inference/csrc/pt_binding.cpp
+++ b/csrc/transformer/inference/csrc/pt_binding.cpp
@@ -538,8 +538,8 @@ std::vector<at::Tensor> ds_softmax_context(at::Tensor& query_key_value,
     if (layer_id == num_layers - 1) InferenceContext::Instance().advance_tokens();
     auto prev_key = torch::from_blob(workspace + offset,
                                      {bsz, heads, all_tokens, k},
-                                     {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
-                                      k * InferenceContext::Instance().GetMaxTokenLength(),
+                                     {static_cast<unsigned>(hidden_dim * InferenceContext::Instance().GetMaxTokenLength()),
+                                      static_cast<unsigned>(k * InferenceContext::Instance().GetMaxTokenLength()),
                                       k,
                                       1},
                                      options);
@@ -547,8 +547,8 @@ std::vector<at::Tensor> ds_softmax_context(at::Tensor& query_key_value,
     auto prev_value =
         torch::from_blob(workspace + offset + value_offset,
                          {bsz, heads, all_tokens, k},
-                         {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
-                          k * InferenceContext::Instance().GetMaxTokenLength(),
+                         {static_cast<unsigned>(hidden_dim * InferenceContext::Instance().GetMaxTokenLength()),
+                          static_cast<unsigned>(k * InferenceContext::Instance().GetMaxTokenLength()),
                           k,
                           1},
                          options);
@@ -1578,7 +1578,7 @@ std::vector<at::Tensor> ds_rms_mlp_gemm(at::Tensor& input,
     auto output = at::from_blob(output_ptr, input.sizes(), options);
     auto inp_norm = at::from_blob(inp_norm_ptr, input.sizes(), options);
     auto intermediate_gemm =
-        at::from_blob(intermediate_ptr, {input.size(0), input.size(1), mlp_1_out_neurons}, options);
+        at::from_blob(intermediate_ptr, {input.size(0), input.size(1), static_cast<int64_t>(mlp_1_out_neurons)}, options);

     auto act_func_type = static_cast<ActivationFuncType>(activation_type);
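
A likely reason the first hunk above swaps `std::vector<long int>` for `std::vector<int64_t>` is that `long` is 32 bits wide under MSVC on 64-bit Windows but 64 bits wide on 64-bit Linux, so a vector of `long` only lines up with PyTorch's 64-bit size type on Linux. This is an inference, not something the commit message states. The sketch below uses Python's standard `ctypes` module to print the widths on whatever platform it runs on; the helper name `c_integer_widths` is made up for illustration.

```python
import ctypes


def c_integer_widths():
    """Return the width in bits of a few C integer types on this platform."""
    return {
        "long": ctypes.sizeof(ctypes.c_long) * 8,
        "long long": ctypes.sizeof(ctypes.c_longlong) * 8,
        "int64_t": ctypes.sizeof(ctypes.c_int64) * 8,
    }


widths = c_integer_widths()
for name, bits in widths.items():
    print(f"{name}: {bits} bits")

# int64_t is 64 bits everywhere; long is the type that varies by platform.
if widths["long"] != widths["int64_t"]:
    print("long is narrower than int64_t here, so the two are not interchangeable")
```

Running this under a Windows Python and a Linux Python side by side should show the difference in the `long` row only.
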
@GotAudio

I compiled DeepSpeed v0.11.1 for Windows with CUDA 12.1 [Python 3.10 + Torch 2.1.0+cu121].
I know nothing about C++. I just verified and fixed marcoseduardopm's 4 errors and fixed 4 others as ChatGPT suggested.
See the change details, screenshot, and commit comment. Download and unrar (the .whl was too big for GitHub): GotAudio/data@5c5657f

pip install deepspeed-0.11.2+244040c1-cp310-cp310-win_amd64.whl
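
A prebuilt wheel like the one above only installs on an interpreter that matches the tags in its filename (here `cp310` for CPython 3.10 and `win_amd64` for 64-bit Windows). As a rough sketch, the filename can be split into its parts following the standard wheel naming convention; the names `WheelTags`, `parse_wheel_filename`, and `running_cpython_tag` are hypothetical helpers, not part of pip or DeepSpeed.

```python
import sys
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split name-version[-build]-python-abi-platform.whl into its fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelTags(dist, version, build, py, abi, plat)


def running_cpython_tag() -> str:
    """Tag a CPython interpreter of the running version would carry, e.g. cp310."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


tags = parse_wheel_filename("deepspeed-0.11.2+244040c1-cp310-cp310-win_amd64.whl")
print(tags.python_tag, tags.platform_tag)  # cp310 win_amd64
print("interpreter tag:", running_cpython_tag())
```

If the interpreter tag printed on the last line is not `cp310`, pip is expected to reject this particular wheel as unsupported on that installation.
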

I had to use these settings:
set DS_BUILD_EVOFORMER_ATTN=0
set DISTUTILS_USE_SDK=1
set DS_BUILD_AIO=0
set DS_BUILD_SPARSE_ATTN=0
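
For anyone scripting the build instead of editing build_win.bat, the same four settings can be passed as environment variables to a child process. This is only a sketch: `build_env` is a made-up helper, and the commented-out `setup.py bdist_wheel` invocation is an assumption about how the build is launched, not something confirmed in this thread.

```python
import os

# The four settings quoted above. Environment variable values are always strings.
WINDOWS_BUILD_FLAGS = {
    "DS_BUILD_EVOFORMER_ATTN": "0",
    "DISTUTILS_USE_SDK": "1",
    "DS_BUILD_AIO": "0",
    "DS_BUILD_SPARSE_ATTN": "0",
}


def build_env(base=None):
    """Return a copy of the base environment with the build flags applied."""
    env = dict(os.environ if base is None else base)
    env.update(WINDOWS_BUILD_FLAGS)
    return env


env = build_env()
for name in WINDOWS_BUILD_FLAGS:
    print(f"{name}={env[name]}")

# Assumed invocation, run from the DeepSpeed source directory:
# import subprocess, sys
# subprocess.run([sys.executable, "setup.py", "bdist_wheel"], env=env, check=True)
```

Copying the environment rather than mutating `os.environ` keeps the flags scoped to the build process.
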

@cdfpaz

cdfpaz commented Oct 31, 2023

this shit still open!!!!??? Bye Bye Microsoft

@XeonKHJ

XeonKHJ commented Jan 18, 2024

Why the heck is this still open?

@ai-robert

NOTE: Training will not work on Windows AT ALL, not even with WSL/WSL2, and not by running Linux in a Virtual Machine.

Since my last post I have learned some relevant things.
The reason training fails in all of these setups is that DeepSpeed is programmed to use the Linux CUDA drivers specifically.

  1. You may think that since WSL/WSL2 can run CUDA this is not a problem, but that is not the case, because WSL/WSL2 actually uses the Windows CUDA drivers. You cannot install the Linux CUDA drivers on WSL/WSL2. If you try, you will see a message from Nvidia telling you to download the Windows CUDA drivers instead and informing you that WSL/WSL2 makes use of them.
  2. You may try to get around this by running Linux in a virtual machine such as VirtualBox; however, this also will not work, because no virtualization software that runs on Windows* will pass through access to the GPU resources.
    *I read somewhere that some kind of Windows Server version in some datacenters may actually allow this, but I have not checked. I'm referring to any Windows you might be able to run on consumer hardware.

costin-eseanu added a commit that referenced this issue Jun 24, 2024
Fix #2427

---------

Co-authored-by: Costin Eseanu <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
@d8ahazard
Author

Nice!
