
Added MPS support #550

Merged
merged 6 commits on Aug 11, 2023
Conversation

Jerry-Master
Contributor

I managed to make the code work for do_tts.py and is_this_from_tortoise.py on Apple Silicon. It takes the following times to generate a single phrase on the various presets:

  • ultra-fast: ~30 secs
  • fast: ~1 min
  • standard: ~4 mins

I ran the tests on my Mac (M1 Max) with macOS 13.5. You need macOS 13.3+ to have int64 support on MPS, and you also need the nightly version of PyTorch. Not all operations run on the GPU; I modified the code to fall back to the CPU in the cases that gave errors. I also recommend running the code with PYTORCH_ENABLE_MPS_FALLBACK=1. In the future, many more operations like aten::_fft_r2c will gain MPS support in PyTorch, so we can hope inference times will improve, but for now, it is what it is.
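
For reference, the device-selection pattern behind this kind of change looks roughly like the following. This is a minimal sketch with illustrative names, not the exact code in the PR:

import torch

# Prefer CUDA, then MPS, then CPU. At the time of this PR, MPS needs
# macOS 13.3+ (for int64 support) and a PyTorch nightly build.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

model = torch.nn.Linear(8, 8).to(device)
x = torch.randn(1, 8, device=device)
print(model(x).shape)

Setting PYTORCH_ENABLE_MPS_FALLBACK=1 in the environment additionally makes PyTorch run any op that lacks an MPS kernel on the CPU instead of raising an error.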

Sidenote

The classifier that detects whether a clip is generated or not can be easily fooled by re-recording the audio.

@mattorp

mattorp commented Aug 11, 2023

Thanks @Jerry-Master !

Works on the base MacBook Pro (M1 Pro) as well, on macOS Sonoma 14 beta 4.

@manmay-nakhashi
Collaborator

@Jerry-Master have you checked that this doesn't break Linux and Windows?

@manmay-nakhashi
Collaborator

@Jerry-Master if you confirm this I'll merge it.

@Jerry-Master
Contributor Author

I can only confirm that I have tested it on Rocky Linux 8.8 with an NVIDIA 3090 GPU. I wrote everything to be backward compatible, so I have no reason to assume it could break on Windows or Debian Linux, but I have no device to check that on at the moment.

@mattorp

mattorp commented Aug 11, 2023

Note that it needs PyTorch nightly (2.1) to work:

pytorch/pytorch#96610 (comment)

@Jerry-Master
Contributor Author

Yes, I said that in the description of the pull request, but the nightly is only necessary for the MPS backend; for CUDA it works as before. That nightly requirement will probably take months to go away, since MPS support is developing slowly.

@mattorp

mattorp commented Aug 11, 2023

@Jerry-Master are you looking into read.py currently? I reused another implementation for text splitting when switching to tortoise and need to improve it now, so I'd like to help if relevant.

@Jerry-Master
Contributor Author

I haven't tried read.py; I just assumed it would work out of the box because I changed all the CUDA occurrences in the repo to MPS. Migrating from CUDA to MPS is fairly straightforward: you check torch.backends.mps.is_available() and, based on that, use mps as the device instead of cuda. For operations that are not yet supported you delegate to the CPU, and for operations that use an unsupported type you cast to a supported type before the operation. If you need read.py, try it yourself and describe any bug you find in the MPS version.
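
As a rough illustration of the two workarounds described above, CPU delegation for missing kernels and casting away unsupported dtypes, the following is a sketch, not the PR's actual diff:

import torch

def stft_magnitude(x: torch.Tensor) -> torch.Tensor:
    # torch.stft depends on aten::_fft_r2c, which has no MPS kernel yet,
    # so run it on the CPU and move only the real-valued magnitude back.
    window = torch.hann_window(1024)
    if x.device.type == "mps":
        spec = torch.stft(x.cpu(), n_fft=1024, window=window, return_complex=True)
        return spec.abs().to(x.device)
    return torch.stft(x, n_fft=1024, window=window.to(x.device), return_complex=True).abs()

def to_mps_safe(x: torch.Tensor, device: torch.device) -> torch.Tensor:
    # float64 has no MPS support, so cast to float32 before moving the
    # tensor onto the device.
    if device.type == "mps" and x.dtype == torch.float64:
        x = x.to(torch.float32)
    return x.to(device)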

@manmay-nakhashi
Collaborator

Can you update the README section about installing the nightly version for MPS in this same PR?
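
For reference, the PyTorch nightly install for macOS at the time was roughly the following (check pytorch.org for the exact current command):

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu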

@mattorp

mattorp commented Aug 11, 2023

Thanks! I receive this error:

File "/opt/homebrew/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2251, in _join_cuda_home
raise OSError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

Haven't looked into it yet, but can begin in ~15 min

@Jerry-Master
Contributor Author

@mattorp can you provide a full traceback so that I can see the line that is causing the error? That line points to a torch file, but the error must be caused elsewhere in the repository.

@mattorp

mattorp commented Aug 11, 2023

/opt/homebrew/lib/python3.11/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
[2023-08-11 14:49:18,678] [INFO] [logging.py:93:log_dist] [Rank -1] DeepSpeed info: version=0.8.3, git-hash=unknown, git-branch=unknown
[2023-08-11 14:49:18,679] [WARNING] [config_utils.py:75:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-08-11 14:49:18,679] [INFO] [logging.py:93:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
WARNING! Setting BLOOMLayerPolicy._orig_layer_class to None due to Exception: module 'transformers.models' has no attribute 'bloom'
Using /Users/USER/Library/Caches/torch_extensions/py311_cpu as PyTorch extensions root...
Detected CUDA files, patching ldflags
Traceback (most recent call last):
  File "/Users/USER/github/tts/tortoise-tts/tortoise/read.py", line 33, in <module>
    tts = TextToSpeech(models_dir=args.model_dir, use_deepspeed=args.use_deepspeed, kv_cache=args.kv_cache, half=args.half)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/USER/github/tts/tortoise-tts/tortoise/api.py", line 243, in __init__
    self.autoregressive.post_init_gpt2_config(use_deepspeed=use_deepspeed, kv_cache=kv_cache, half=self.half)
  File "/opt/homebrew/lib/python3.11/site-packages/TorToiSe-2.6.0-py3.11.egg/tortoise/models/autoregressive.py", line 380, in post_init_gpt2_config
    self.ds_engine = deepspeed.init_inference(model=self.inference_model,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/__init__.py", line 311, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/inference/engine.py", line 136, in __init__
    self._apply_injection_policy(config)
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/inference/engine.py", line 363, in _apply_injection_policy
    replace_transformer_layer(client_module,
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 534, in replace_transformer_layer
    replaced_module = replace_module(model=model,
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 799, in replace_module
    replaced_module, _ = _replace_module(model, policy)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 826, in _replace_module
    _, layer_id = _replace_module(child, policies, layer_id=layer_id)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 826, in _replace_module
    _, layer_id = _replace_module(child, policies, layer_id=layer_id)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 816, in _replace_module
    replaced_module = policies[child.__class__][0](child,
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 524, in replace_fn
    new_module = replace_with_policy(child,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 385, in replace_with_policy
    _container.create_module()
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/module_inject/containers/gpt2.py", line 16, in create_module
    self.module = DeepSpeedGPTInference(_config, mp_group=self.mp_group)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/model_implementations/transformers/ds_gpt.py", line 18, in __init__
    super().__init__(config,
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 53, in __init__
    inference_cuda_module = builder.load()
                            ^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 485, in load
    return self.jit_load(verbose)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 520, in jit_load
    op_module = load(
                ^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1302, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1525, in_jit_compile
    _write_ninja_file_and_build_library(
File "/opt/homebrew/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1617, in_write_ninja_file_and_build_library
    extra_ldflags = _prepare_ldflags(
                    ^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1716, in_prepare_ldflags
    if (not os.path.exists(_join_cuda_home(extra_lib_dir)) and
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2251, in_join_cuda_home
    raise OSError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

@Jerry-Master
Contributor Author

Seems like an issue with deepspeed: for some reason that flag defaults to False in do_tts.py and to True in read.py. Try changing that. I won't have my Mac today so I can't test it, sorry. Another day I may try adding support for deepspeed too.

@mattorp

mattorp commented Aug 11, 2023

I tried that with no luck 🤔 I'll keep poking around and get back to you

PYTORCH_ENABLE_MPS_FALLBACK=1 python tortoise/read.py --use_deepspeed False --textfile '/Users/USER/combined.md' --voice random --preset ultra_fast --half True
/opt/homebrew/lib/python3.11/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
[2023-08-11 15:00:29,327] [INFO] [logging.py:93:log_dist] [Rank -1] DeepSpeed info: version=0.8.3, git-hash=unknown, git-branch=unknown
[2023-08-11 15:00:29,327] [WARNING] [config_utils.py:75:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-08-11 15:00:29,327] [INFO] [logging.py:93:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
WARNING! Setting BLOOMLayerPolicy._orig_layer_class to None due to Exception: module 'transformers.models' has no attribute 'bloom'
Using /Users/USER/Library/Caches/torch_extensions/py311_cpu as PyTorch extensions root...
Detected CUDA files, patching ldflags
Traceback (most recent call last):
  File "/Users/USER/github/tts/tortoise-tts/tortoise/read.py", line 41, in <module>
    tts = TextToSpeech(models_dir=args.model_dir, use_deepspeed=args.use_deepspeed,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/USER/github/tts/tortoise-tts/tortoise/api.py", line 243, in __init__
    self.autoregressive.post_init_gpt2_config(use_deepspeed=use_deepspeed, kv_cache=kv_cache, half=self.half)
  File "/opt/homebrew/lib/python3.11/site-packages/TorToiSe-2.6.0-py3.11.egg/tortoise/models/autoregressive.py", line 380, in post_init_gpt2_config
    self.ds_engine = deepspeed.init_inference(model=self.inference_model,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/__init__.py", line 311, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/inference/engine.py", line 136, in __init__
    self._apply_injection_policy(config)
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/inference/engine.py", line 363, in _apply_injection_policy
    replace_transformer_layer(client_module,
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 534, in replace_transformer_layer
    replaced_module = replace_module(model=model,
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 799, in replace_module
    replaced_module, _ = _replace_module(model, policy)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 826, in _replace_module
    _, layer_id = _replace_module(child, policies, layer_id=layer_id)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 826, in _replace_module
    _, layer_id = _replace_module(child, policies, layer_id=layer_id)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 816, in _replace_module
    replaced_module = policies[child.__class__][0](child,
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 524, in replace_fn
    new_module = replace_with_policy(child,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/module_inject/replace_module.py", line 385, in replace_with_policy
    _container.create_module()
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/module_inject/containers/gpt2.py", line 16, in create_module
    self.module = DeepSpeedGPTInference(_config, mp_group=self.mp_group)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/model_implementations/transformers/ds_gpt.py", line 18, in __init__
    super().__init__(config,
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 53, in __init__
    inference_cuda_module = builder.load()
                            ^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 485, in load
    return self.jit_load(verbose)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 520, in jit_load
    op_module = load(
                ^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1302, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1525, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/opt/homebrew/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1617, in _write_ninja_file_and_build_library
    extra_ldflags = _prepare_ldflags(
                    ^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1716, in _prepare_ldflags
    if (not os.path.exists(_join_cuda_home(extra_lib_dir)) and
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2251, in _join_cuda_home
    raise OSError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

@mattorp

mattorp commented Aug 11, 2023

Oh, changing the default rather than the flag helped. Verifying whether it completes now.

@manmay-nakhashi
Collaborator

manmay-nakhashi commented Aug 11, 2023

You can do export CUDA_HOME="path_to_your_cuda" and then run it with deepspeed. I don't know if there is MPS support in deepspeed; if not, you can skip it when MPS is active and check.

@mattorp

mattorp commented Aug 11, 2023

I don't believe CUDA is available on the M chips, right?

It keeps hanging at 0%, even for a text as short as "Hello":

PYTORCH_ENABLE_MPS_FALLBACK=1 python read.py --textfile '/Users/USER/github/tts/tortoise-tts/debug/hello.md' --voice random --preset ultra_fast --half True
/opt/homebrew/lib/python3.11/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Generating autoregressive samples..
  0%|                                                     | 0/1 [00:00<?, ?it/s]

While this works:

PYTORCH_ENABLE_MPS_FALLBACK=1 python do_tts.py --text 'Hello' --voice random --preset ultra_fast --half True

I have to reset my environment for do_tts.py to work after running and cancelling read.py (for now, just closing and reopening the terminal does the trick). Maybe that's an indicator of something?

@manmay-nakhashi
Collaborator

I think deepspeed doesn't support Mac, so it is always looking for CUDA_HOME, which isn't there for you.

@Jerry-Master
Contributor Author

I can't know how to fix it without trying. Also, I didn't use --half True, since float16 has little to no support on MPS; for instance, autocast does not work. That may not affect the result, since I already modified all the autocast calls, but it could be causing errors elsewhere.
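
To illustrate the autocast point, a sketch under the assumption that mixed precision is only wanted on CUDA (illustrative, not the PR's actual change):

import contextlib
import torch

def maybe_autocast(device: torch.device, half: bool):
    # torch.autocast with float16 works on CUDA, but MPS support is missing
    # or very limited, so fall back to a no-op context everywhere else.
    if half and device.type == "cuda":
        return torch.autocast(device_type="cuda", dtype=torch.float16)
    return contextlib.nullcontext()

# usage: with maybe_autocast(device, half=True):
#            out = model(x)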

@mattorp

mattorp commented Aug 11, 2023

Removing PYTORCH_ENABLE_MPS_FALLBACK=1, with the use_deepspeed default set to False in read.py, works for me now!

python read.py \
      --half True \
      --voice random \
      --preset ultra_fast \
      --output_path "debug" \
      --textfile 'debug/test.md'

Setting the use_deepspeed default to False in read.py works, while passing --use_deepspeed False on the command line doesn't: it still tries to use CUDA.

PYTORCH_ENABLE_MPS_FALLBACK=1 works for do_tts.py, however. Hope that helps with the investigation.
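
A likely explanation for the flag behaving differently from the default, assuming read.py declares the argument with type=bool (an assumption, not confirmed from the diff): argparse just calls bool() on the string, and any non-empty string is truthy, so --use_deepspeed False still evaluates to True.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--use_deepspeed", type=bool, default=True)

args = parser.parse_args(["--use_deepspeed", "False"])
print(args.use_deepspeed)  # True: bool("False") is True for any non-empty string

Changing the default is then the only thing that actually turns it off, which matches what was observed above.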

@Jerry-Master
Contributor Author

I'll give it a look when I have time, but it doesn't make much sense: PYTORCH_ENABLE_MPS_FALLBACK=1 only tells the program to use the CPU for operations that are not implemented on the GPU. The error definitely seems related to deepspeed, though. I will change the default in the pull request and update the README.

@mattorp

mattorp commented Aug 11, 2023

Thanks for the speedy responses to this!

@manmay-nakhashi
Collaborator

@Jerry-Master let's skip deepspeed while doing inference and handle that in autoregressive.py. Once these changes are done, I'll merge this.

@Jerry-Master
Contributor Author

I have removed the deepspeed option after argument parsing; under the MPS version that flag is ignored. It would be awesome if @mattorp could test it.
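
Conceptually, the post-parse override amounts to something like this (an illustrative sketch, not the exact diff):

import argparse
import torch

parser = argparse.ArgumentParser()
parser.add_argument("--use_deepspeed", action="store_true")
args = parser.parse_args()

# deepspeed's inference kernels are JIT-compiled against CUDA and need
# CUDA_HOME, so ignore the flag entirely when running on the MPS backend.
if torch.backends.mps.is_available() and not torch.cuda.is_available():
    args.use_deepspeed = False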

@manmay-nakhashi
Collaborator

Should I merge this now?

@mattorp

mattorp commented Aug 11, 2023

Works for me!

Another note: on lower-specced machines (16 GB RAM), the user might need to split the text into smaller chunks to get it working.
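
A naive way to pre-chunk long inputs for low-memory machines, purely as an illustration (read.py already does its own sentence splitting internally, if I recall correctly):

import re

def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    # Split on sentence boundaries and greedily pack sentences into chunks
    # no longer than max_chars, so each chunk can be synthesized separately.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks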

@chigkim

chigkim commented May 16, 2024

Is mps still supported on tortoise v2?

https://huggingface.co/jbetker/tortoise-tts-v2
