-
Can multiple GPUs be used to load the model? For example, I have 2 GPUs (2x 8 GB); with one of them I can load the medium model, but the large model crashes due to lack of memory. The model uses only 1 GPU, so is it possible to set up whisper (one model per recording) to use multiple GPUs?
-
It's possible to load the encoder on one GPU and the decoder on the other, with a bit of a hack. First, please update the package so it has the latest commit (I made a minor modification for this):

pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

And then something like this is possible:

import whisper

# Load the weights on the CPU first, then move the encoder and decoder
# to separate GPUs.
model = whisper.load_model("large", device="cpu")
model.encoder.to("cuda:0")
model.decoder.to("cuda:1")

# Move the decoder's inputs to cuda:1 before its forward pass, and move
# its outputs back to cuda:0 afterwards.
model.decoder.register_forward_pre_hook(lambda _, inputs: tuple([inputs[0].to("cuda:1"), inputs[1].to("cuda:1")] + list(inputs[2:])))
model.decoder.register_forward_hook(lambda _, inputs, outputs: outputs.to("cuda:0"))

model.transcribe("jfk.flac")

The code above uses forward hooks to move tensors between the two GPUs around the decoder's forward pass. On my 2-GPU machine, the VRAM usage after executing the snippet above is:

[screenshot of VRAM usage on both GPUs]
-
Did you see a large difference in transcription speed after this?
-
Nope, have not tried that.
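For anyone who does want to measure the speed difference, here is a minimal timing sketch (assuming the model object from the snippet above and a local jfk.flac; time is from the standard library):

import time

# Time one transcription run; comparing against the same call with a
# single-GPU model load gives a rough speed difference.
start = time.perf_counter()
result = model.transcribe("jfk.flac")
print(f"transcription took {time.perf_counter() - start:.1f} s")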