No outputs #11

Closed
avc1657 opened this issue Apr 10, 2023 · 46 comments
Comments

@avc1657

avc1657 commented Apr 10, 2023

I spent half an hour running the large-v2 model on a 25-minute video. At the end of the process, there were no outputs.

The command I used: whisper-ctranslate2 [the video file] --model large-v2 --output_format srt --output_dir .\ --word_timestamps True --no_speech_threshold 0.2 --logprob_threshold None

GPU -> GTX 1060 (6GB VRAM model)
Average VRAM used by whisper-ctranslate2 during the process -> varies from 2.5 to 4.5GB
Windows 10

Edit: tried with the tiny model. Doesn't work either. No outputs.

@jordimas
Collaborator

Thanks for reporting this

Could you please do the following:

  1. Please make sure that you use version 0.16. If you are not, please update to this version.

  2. While the tool is running, can you see anything on the terminal? You should see the transcription as it is produced.

  3. Can you try just "whisper-ctranslate2 [the video file] --model large-v2". Does it work?

Thanks

@avc1657
Author

avc1657 commented Apr 10, 2023

  1. Please make sure that you use version 0.16. If you are not, please update to this version.

I'm on 0.17, checked by running "whisper-ctranslate2 --version".

  2. While the tool is running, can you see anything on the terminal? You should see the transcription as it is produced.

Yes, the transcription appears on the terminal.

  3. Can you try just "whisper-ctranslate2 [the video file] --model large-v2". Does it work?

OK, I've just tried this and noticed something. By the way, I decided to run on a 2-minute FLAC audio file to speed things up. I ran the program using "whisper-ctranslate2 [the audio file] --model tiny": it didn't work. Then I ran it with large-v2 and, to my surprise, it worked. I tried large-v2 again and it worked again. Then I went back to tiny and it stopped working. Then I tried base: it doesn't work. Finally I tried large-v2 once more and it worked. But previously even large-v2 was not working.

@jordimas
Collaborator

Hello. I'm unable to reproduce this problem in my Windows machine.

My only question is whether you have tried running inference on CPU versus GPU, and whether that makes any difference.

Thanks

@tariq0101

Hi, I have the same problem: the transcription appears on the screen until the end of the file, but no files are produced.
The model is only using 50% of VRAM, so it's definitely not crashing.
I'm also on Windows, Python 3.9.
This only happens with some files; smaller or "clearer" files work fine. I think it's looping at the end or something like that.
"--vad_filter True" doesn't seem to do anything.

@jordimas
Collaborator

Do you have any file that you can share so that I can try to reproduce it? Thanks

@tariq0101

tariq0101 commented Apr 15, 2023

Hi, I found that this only happens on GPU; it produces output when I add "--device CPU".
I'm sorry I can't provide anything, because it only happens with my personal videos.
Videos with clear, professional audio don't loop and do produce output, so it's probably an issue with Whisper itself and not your software.
Is there a way to produce log files?

@rsmith02ct

rsmith02ct commented Apr 16, 2023

Hi Jordimas, I am having similar issues.

The first is that nothing gets output unless output type and location are set (though perhaps that is by design?)

The second is that unless I add "--device CPU" no data is returned; I just go back to the command prompt. This is true for a short clear wav, a longer mp4, English and Japanese.

I have an RTX 2080 Super with the current studio driver (531.61). I am able to use basic Whisper installations with CUDA as well as Const-me, etc. Is there something I need to set up here or in NVIDIA Control Panel?

For test video we can use the same one I shared before.

whisper-ctranslate2.exe --language ja --model "large-v2" --device CPU --output_dir "C:\Users\rsmit\Dropbox\Videos" --output_format "srt" "C:\Users\rsmit\Dropbox\Videos\10 MPantry final new titles 2.mp4"
This works, and actually very well in terms of quality! No issues at all.

Change to CUDA and it fails.

whisper-ctranslate2.exe --language ja --model "large-v2" --device CUDA --output_dir "C:\Users\rsmit\Dropbox\Videos" --output_format "srt" "C:\Users\rsmit\Dropbox\Videos\10 MPantry final new titles 2.mp4"

Screenshot 2023-04-16 13 30 15

Base model, etc. also fail.

NVIDIA Control Panel reports I have NVIDIA CUDA 12.1.107 driver. It has a compute capability of 7.5.
I also installed the standalone cuda_12.1.0_531.14_windows.exe

@jordimas
Collaborator

jordimas commented Apr 16, 2023

The first is that nothing gets output unless output type and location are set (though perhaps that is by design?)

No, this is not by design. By design it outputs all formats and writes them to your current directory.

@rsmith02ct Would it be possible to create a separate ticket for this issue? It's different from the other one. Thanks

@tariq0101

I think it's exactly the same problem.
The software finishes transcribing on GPU but no output files are created; you can still copy the results from the terminal.
The software finishes transcribing on CPU and creates output files.
This is probably a bug in faster-whisper if you can't find any problems in your code.

@Zacharie-Jacob

Zacharie-Jacob commented Apr 16, 2023

I had this same problem. I was unable to pinpoint it to specifically whisper-ctranslate2, but the problem is exactly the same as yours. It displays the translation. There are no errors. No output files are written.

It does write out if I choose a very small file (like a minute or two long), but longer files just mysteriously do not have any outputs.

I do not know enough about the code itself to know if it makes sense that longer files would not produce outputs but shorter files will.

@dgoryeo

dgoryeo commented Apr 16, 2023

I can confirm that I have the same problem. No output file is created. My command line is (in powershell):

whisper-ctranslate2 $_ --model medium --language 'Japanese' --vad_filter True --device cuda --compute_type int8 --output_format srt --output_dir $directory --task translate --word_timestamps True --verbose True > $directory\$filename.md

@Zacharie-Jacob

I can confirm that I have the same problem. No output file is created. My command line is (in powershell):

whisper-ctranslate2 $_ --model medium --language 'Japanese' --vad_filter True --device cuda --compute_type int8 --output_format srt --output_dir $directory --task translate --word_timestamps True --verbose True > $directory\$filename.md

Could you try running this on a clip that is only one or two minutes, and see if it works? That seems like it works for me, which may help narrow down a cause if that is a reproducible pattern.

@dgoryeo

dgoryeo commented Apr 17, 2023

Hi @Zacharie-Jacob, I ran an additional test on a 4-minute clip with the same command line:

whisper-ctranslate2 $_ --model medium --language 'Japanese' --vad_filter True --device cuda --compute_type int8 --output_format srt --output_dir $directory --task translate --word_timestamps True --verbose True > $directory\$filename.md

Here are the results:

  • 4min video mp4: No output.
  • 4min audio wav: No output.
  • 4min audio wav 16khz mono: Works! The srt was generated.

I repeated the same test with --device cpu , and it worked well on all 3 above tests.
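Since only the 16 kHz mono WAV produced output on GPU in the test above, it can help to confirm what format a given WAV file actually has before comparing runs. The following is a minimal sketch using only Python's standard `wave` module; the helper names `wav_format` and `is_16khz_mono` are made up for illustration and are not part of whisper-ctranslate2. It only handles PCM WAV files, not mp4 or other containers.

```python
import wave


def wav_format(path):
    """Return (sample_rate_hz, channels, sample_width_bytes) of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()


def is_16khz_mono(path):
    """True if the file is already in the 16 kHz mono layout mentioned above."""
    rate, channels, _ = wav_format(path)
    return rate == 16000 and channels == 1
```

Calling `is_16khz_mono("clip.wav")` on each test file makes it easier to tell whether a working run and a failing run really differ only in the audio format.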

@tariq0101

I can confirm that the 16 kHz mono conversion works, but a lot of information is lost and the output is very different from the CPU output on the original file.

@rsmith02ct

Hmm, here I don't see any text in the cmd terminal window when --device CUDA is set (and there's no text output either). When set to CPU it works fine on every file I've given it in English and Japanese. I'm using an NVIDIA RTX 2080 Super with the current studio driver and the CUDA SDK also installed (Windows 11).

@runw99

runw99 commented Apr 20, 2023

In my environment, I can trigger the bug almost reliably. Everything is printed on the command line, but nothing is written to the current directory, and Windows shows a "Python has stopped working" error. The problem is probably related to dictionary referencing and memory reclamation. My temporary solution is to move the writer call from whisper_ctranslate2.py into transcribe.py. Although this damages the code structure, what matters to me right now is that it works.

# \Anaconda\envs\Lib\site-packages\src\whisper_ctranslate2\whisper_ctranslate2.py
def main():
    ...
    for audio_path in audio:
        result = Transcribe().inference(
            ...
            output_format, 
            output_dir,
            audio_path,
        )
        # writer = get_writer(output_format, output_dir)
        # writer(result, audio_path)
# \Anaconda\envs\Lib\site-packages\src\whisper_ctranslate2\transcribe.py
class Transcribe:
    ...
    def inference(
        ...
        output_format, 
        output_dir,
        audio_path,
    ):
        ...
        
        result = dict(
            text=all_text,
            segments=list_segments,
            language=language_name,
        )

        from .writers import get_writer
        writer = get_writer(output_format, output_dir)
        writer(result, audio_path)

        # return result

The detailed process of my debugging

environment

OS: Windows 10
python: 3.9.16150.1013
GPU: GTX1660ti (mobile)
IDE: VS code

package:
numpy==1.23.3
faster-whisper==0.4.1
ctranslate2==3.11.0
tqdm==4.65.0
sounddevice==0.4.6

trigger the bug

  1. Audio file: 5m.mp3. about 100 segments.

  2. Model: guillaumekln/faster-whisper-tiny or guillaumekln/faster-whisper-large-v2

  3. cmd or powershell whisper-ctranslate2 ".\5m.mp3" --language Japanese --model_directory "..\model\faster-whisper-tiny"

  4. It prints the results on the screen correctly. After that, Windows reports that Python has stopped working, and no output files are written.

Set the breakpoint

# whisper_ctranslate2\whisper_ctranslate2.py
for audio_path in audio:
    result = Transcribe().inference(...) 
    print(result) # some operation. Setting breakpoint here and moving the mouse on result will trigger `python has stopped working`

error analysis (unconfirmed)

  1. Small audio files work well, but large files fail.

  2. openai/whisper works well for me. The difference from openai/whisper is that in whisper_ctranslate2, def transcribe(...) has been changed to:

class Transcribe:
    ...
    def inference(...):
        list_segments = []
        last_pos = 0
        accumated_inc = 0
        all_text = ""
        ...
        return dict(
            text=all_text,
            segments=list_segments,
            language=language_name,
        )

My guess is that, because list_segments is a local variable of Transcribe.inference, once the call result = Transcribe().inference(...) returns, the memory management mechanism reclaims the memory that result["segments"] points to.

list_segments = [
    { },
    ...
]

Some failed attempts

ucrtbase.dll

In Windows Event Viewer, the crash appears to be related to ucrtbase.dll. However, searching online turned up nothing relevant, and updating it did not help either.

Writers

  1. Replacing the main content of whisper_ctranslate2/writer.py with that of openai/whisper's utils.py, with modifications: no effect.

  2. Placing the content of writer.py directly in whisper_ctranslate2/whisper_ctranslate2.py: also no effect.

@jordimas
Collaborator

jordimas commented Apr 20, 2023

Thanks for investing time on this @runw99

Regarding memory, Python uses reference counting, so it should delete the variable when it goes out of scope.

Here you have an article that explains how memory works in Python:

https://rushter.com/blog/python-garbage-collector/

Actually, you can check the number of references it has by doing:

import sys
print(sys.getrefcount(foo))

I have no idea why this happens, but I do not believe it is due to the variable going out of scope (and being recycled).
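The point about reference counting can be checked with a small standalone script. This is only a sketch: the `inference` function below is a stand-in that mimics the shape of the dictionary returned by `Transcribe.inference`, not the real code. It shows that a list created as a local variable stays alive after the function returns, because the returned dictionary still references it.

```python
import gc
import sys


def inference():
    # Local list, analogous to list_segments in Transcribe.inference.
    list_segments = [{"start": 0.0, "end": 1.5, "text": "hello"}]
    return dict(text="hello", segments=list_segments, language="en")


result = inference()

# Forcing a collection does not free the list: it is still reachable
# through the dictionary that the function returned.
gc.collect()

print(result["segments"][0]["text"])
# The exact number is an implementation detail of CPython, but it is
# never zero for an object you can still reach.
print(sys.getrefcount(result["segments"]))
```

If pure-Python objects behaved as the hypothesis suggests, this script would fail, which supports the view that the crash originates in native code rather than in Python's memory management.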

@runw99

runw99 commented Apr 20, 2023

Thanks for investing time on this @runw99

Regarding memory, Python uses reference counting, so it should delete the variable when it goes out of scope.

Here you have an article that explains how memory works in Python:

https://rushter.com/blog/python-garbage-collector/

Actually, you can check the number of references it has by doing:

import sys
print(sys.getrefcount(foo))

I have no idea why this happens, but I do not believe it is due to the variable going out of scope (and being recycled).

Thanks for your reply. The article you mentioned helped me review garbage collection in Python and learn something new.
I went back and tried some copy.deepcopy(list_segments) operations, but still couldn't solve this bug. So perhaps garbage collection really is not the cause.

I have never encountered such a bug before, and I am curious about its causes and solutions. Looking forward to the follow-up

Thank you again for the patient answer and this project really saves me a lot of effort to run a big model.

@nikes

nikes commented Apr 20, 2023

I ran 355 files, ranging in length from 10 to 120 minutes.
In the output I got 150(*5) files with text.
So I confirm that there is definitely a problem.
The original whisper project works correctly, so it's strange...
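For batch runs like this one, a short script can list which inputs ended up without an output file. This is a minimal sketch under a stated assumption: each output is named after the input file's base name plus the output extension (for example `talk.mp4` producing `talk.srt`). The function name `missing_outputs` and the directory arguments are hypothetical.

```python
from pathlib import Path


def missing_outputs(input_dir, output_dir, output_ext=".srt"):
    """Return the names of input files that have no matching output file."""
    # Base names (without extension) of every output that was written.
    written = {p.stem for p in Path(output_dir).glob(f"*{output_ext}")}
    return sorted(
        p.name
        for p in Path(input_dir).iterdir()
        # Skip output files themselves in case both directories are the same.
        if p.is_file() and p.suffix != output_ext and p.stem not in written
    )
```

Running `missing_outputs("videos", "subtitles")` after a batch gives the list of files to retry, for example with `--device cpu`.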

@Purfview

@rsmith02ct reported that my standalone compile doesn't have this bug. [it doesn't use cli from this repo]

I can confirm that the 16 kHz mono conversion works, but a lot of information is lost and the output is very different from the CPU output on the original file.

Faster-whisper converts to the same audio format using the PyAV library, whereas OpenAI's Whisper uses ffmpeg.
Strangely, transcription quality and timestamp accuracy suffer significantly on audio converted by ffmpeg.exe. I have no idea why this happens, and I'm too lazy to investigate it...

@dgoryeo

dgoryeo commented Apr 20, 2023

I second @rsmith02ct. I too have noticed that when I convert audio with Audacity, the results are better than with ffmpeg.

@zx3777

zx3777 commented Apr 23, 2023

Same problem. 1.0 can produce outputs, but it frequently misses large chunks of dialogue.

@Qel0droma

I have the same problem. I can see all the text in PowerShell as it transcribes and translates, but when it's done: nothing. No srt files are generated.

whisper-ctranslate2 "file name here.mp4" --device cuda --device_index 0 --vad_filter true --vad_min_speech_duration_ms 50 --vad_min_silence_duration_ms 2000 --vad_max_speech_duration_s 10 --condition_on_previous_text False --language Japanese --task translate --output_format srt --model large-v2

@jordimas
Collaborator

@Qel0droma Are you using a GPU?

@Qel0droma

@Qel0droma Are you using a GPU?

yes

@emcodem

emcodem commented Apr 27, 2023

No luck getting any kind of output. I'm using a 16 kHz WAV that I use for testing Const-me whisper and whisper cpp; the expected result is a 10-minute translation.

C:\Users\emcod>whisper-ctranslate2 c:\temp\test.wav --model medium
There are old cache files at `C:\Users\emcod\.cache\whisper-ctranslate2` which are no longer used. Consider deleting them
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Downloading (…)56e98277/config.json: 100%|█████████████████████████████████████████| 2.26k/2.26k [00:00<00:00, 752kB/s]
C:\python3100\lib\site-packages\huggingface_hub\file_download.py:133: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\emcod\.cache\huggingface\hub. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
Downloading (…)98277/vocabulary.txt: 100%|██████████████████████████████████████████| 460k/460k [00:00<00:00, 2.17MB/s]
Downloading (…)98277/tokenizer.json: 100%|████████████████████████████████████████| 2.20M/2.20M [00:01<00:00, 2.18MB/s]
Downloading model.bin: 100%|██████████████████████████████████████████████████████| 1.53G/1.53G [03:02<00:00, 8.39MB/s]

C:\Users\emcod>

Or some try with default params

C:\Users\emcod>whisper-ctranslate2 c:\temp\test.wav --language de
There are old cache files at `C:\Users\emcod\.cache\whisper-ctranslate2` which are no longer used. Consider deleting them
Detected language 'German' with probability 1.000000

Then I followed the instructions and deleted the "old cache files" at C:\Users\emcod\.cache\whisper-ctranslate2 (I deleted the whole .cache folder):


C:\Users\emcod>whisper-ctranslate2 c:\temp\test.wav --language de
Downloading (…)e94b4c8a/config.json: 100%|████████████████████████████████████████| 2.37k/2.37k [00:00<00:00, 1.19MB/s]
C:\python3100\lib\site-packages\huggingface_hub\file_download.py:133: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\emcod\.cache\huggingface\hub. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
Downloading (…)b4c8a/vocabulary.txt: 100%|██████████████████████████████████████████| 460k/460k [00:00<00:00, 1.44MB/s]
Downloading (…)b4c8a/tokenizer.json: 100%|█████████████████████████████████████████| 2.20M/2.20M [00:07<00:00, 308kB/s]
Downloading model.bin: 100%|████████████████████████████████████████████████████████| 484M/484M [00:57<00:00, 8.35MB/s]
Detected language 'German' with probability 1.000000

C:\Users\emcod>

Try to enable debug logging:

C:\Users\emcod>whisper-ctranslate2 --verbose true c:\temp\test.wav
whisper-ctranslate2: error: argument --verbose: invalid str2bool value: 'true'

same with

C:\Users\emcod>whisper-ctranslate2 --verbose 1 c:\temp\test.wav
whisper-ctranslate2: error: argument --verbose: invalid str2bool value: '1'

Next, I read some Python docs and saw that "true" is often written as "True":

C:\Users\emcod>whisper-ctranslate2 --verbose True c:\temp\test.wav
Detecting language using up to the first 30 seconds. Use `--language` to specify the language

C:\Users\emcod>
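The "invalid str2bool value" errors above are what argparse prints when a custom type function named str2bool raises ValueError. The sketch below shows a strict converter of that kind, modeled on the helper in openai/whisper's utilities; that whisper-ctranslate2 uses the same implementation is an assumption, and the `demo` parser is purely illustrative.

```python
import argparse


def str2bool(string):
    # Strict mapping: only the exact strings "True" and "False" are accepted,
    # so "true", "1" or "yes" are rejected.
    str2val = {"True": True, "False": False}
    if string in str2val:
        return str2val[string]
    raise ValueError(f"Expected one of {set(str2val)}, got {string}")


parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--verbose", type=str2bool, default=False)

args = parser.parse_args(["--verbose", "True"])
print(args.verbose)
```

With this converter, `--verbose true` makes argparse exit with "argument --verbose: invalid str2bool value: 'true'", which matches the messages in the session above.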

ok, try some other stuff:

C:\Users\emcod>whisper-ctranslate2 --verbose True c:\temp\test.wav --compute_type int8
Detecting language using up to the first 30 seconds. Use `--language` to specify the language

C:\Users\emcod>

@guillaumekln

guillaumekln commented Apr 27, 2023

Hi,

I think it's the same issue as SYSTRAN/faster-whisper#71 which I can now reproduce on Windows.

When the output files are missing, you can verify that the process crashed with a non-zero exit code:

PS > $LASTEXITCODE
-1073740791

The process crashes when the model is unloaded but only when the transcription triggered the temperature fallback. If you disable the temperature fallback it should work without issue. Try adding this option on the command line:

--temperature_increment_on_fallback None

The crash seems to happen only on Windows.

@jordimas In the meantime, you could slightly change the code to ensure the WhisperModel instance is still alive when writing the results on disk.

@emcodem

emcodem commented Apr 27, 2023

Win 11:

C:\Users\emcod>whisper-ctranslate2 --verbose True --temperature_increment_on_fallback None c:\temp\1234.wav
Detecting language using up to the first 30 seconds. Use `--language` to specify the language

C:\Users\emcod>echo %errorlevel%
-1073740791

C:\Users\emcod>whisper-ctranslate2 --verbose True c:\temp\test.wav --compute_type int8
Detecting language using up to the first 30 seconds. Use `--language` to specify the language

C:\Users\emcod>whisper-ctranslate2 --verbose True --temperature_increment_on_fallback None c:\temp\1234.wav
Detecting language using up to the first 30 seconds. Use `--language` to specify the language

C:\Users\emcod>echo %errorlevel%
-1073740791

C:\Users\emcod>whisper-ctranslate2 --temperature_increment_on_fallback None c:\temp\1234.wav
Detecting language using up to the first 30 seconds. Use `--language` to specify the language

C:\Users\emcod>whisper-ctranslate2 --temperature_increment_on_fallback None  --language de c:\temp\1234.wav
Detected language 'German' with probability 1.000000

C:\Users\emcod>whisper-ctranslate2 --temperature_increment_on_fallback None  --language de c:\temp\test.wav
Detected language 'German' with probability 1.000000

C:\Users\emcod>echo %errorlevel%
-1073740791

Going to try on other OS tomorrow.
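The exit code -1073740791 reported above is easier to look up once it is written as an unsigned 32-bit hexadecimal value, which is how Windows status codes are usually documented. A minimal sketch of the conversion (the function name `to_ntstatus` is made up for illustration):

```python
def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit process exit code as unsigned hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_ntstatus(-1073740791))  # 0xC0000409
```

0xC0000409 is listed as STATUS_STACK_BUFFER_OVERRUN in Microsoft's NTSTATUS table. It typically indicates that native code aborted the process, which is consistent with no Python traceback being shown.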

@Zacharie-Jacob

Hi,

I think it's the same issue as guillaumekln/faster-whisper#71 which I can now reproduce on Windows.

When the output files are missing, you can verify that the process crashed with a non-zero exit code:

PS > $LASTEXITCODE
-1073740791

The process crashes when the model is unloaded but only when the transcription triggered the temperature fallback. If you disable the temperature fallback it should work without issue. Try adding this option on the command line:

--temperature_increment_on_fallback None

The crash seems to happen only on Windows.

@jordimas In the meantime, you could slightly change the code to ensure the WhisperModel instance is still alive when writing the results on disk.

Thank you, this fixes my problem. Yes, I am on Windows.

Unfortunately, that setting was particularly useful, as it prevents the translation from falling into ruts. I will have to make do with a combination of other settings for now.

@coder543

Even using --temperature_increment_on_fallback None, I am getting zero output (even on the console) if I use the GPU on Windows. I am using a 3090, and I did install the various dependencies as far as I can tell. It would be nice if we got an error message of some kind.

@Zacharie-Jacob

Zacharie-Jacob commented Apr 28, 2023

Even using --temperature_increment_on_fallback None, I am getting zero output (even on the console) if I use the GPU on Windows. I am using a 3090, and I did install the various dependencies as far as I can tell. It would be nice if we got an error message of some kind.

It looks like it is linked to general use of Temperature, perhaps? I was under the impression that you can have no temperature increment while still using temperature and best_of, but it looks like I get intermittent missing outputs if I am using any temperature settings at all other than just setting the fallback to None.
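For context on how the temperature options interact: in openai/whisper's command line, the temperature used for decoding is either a single value or a schedule that starts at --temperature and rises towards 1.0 in steps of --temperature_increment_on_fallback. The sketch below mirrors that logic without numpy; that whisper-ctranslate2 builds its schedule in the same way is an assumption, and `temperature_schedule` is a hypothetical name.

```python
def temperature_schedule(temperature=0.0, increment=0.2):
    """Return the temperatures that would be tried, in order."""
    if increment is None:
        # Fallback disabled: only the starting temperature is ever used.
        return (temperature,)
    steps = []
    i = 0
    # Multiply instead of accumulating to limit floating-point drift.
    while temperature + i * increment <= 1.0 + 1e-6:
        steps.append(round(temperature + i * increment, 6))
        i += 1
    return tuple(steps)


print(temperature_schedule())           # default fallback schedule
print(temperature_schedule(0.0, None))  # fallback disabled
```

Under this logic, setting the increment to None leaves a single temperature, so no fallback decoding passes happen, while a non-zero starting temperature with a None increment still means sampling at that one value.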

@jordimas
Collaborator

@jordimas In the meantime, you could slightly change the code to ensure the WhisperModel instance is still alive when writing the results on disk.

Thanks a lot for looking into this issue. I was trying to gather more evidence before reporting it as a CTranslate2 issue, but it's great that you are looking at this.

Based on the feedback in this thread and the fact that I do not even have a Windows box with CUDA to test on, I do not know if it's worth doing a fix in whisper-ctranslate2 or just waiting for the issue to be fixed in ctranslate2.

@jpenney

jpenney commented May 10, 2023

Just to see, I made a local change to ensure the model is unloaded only after the outputs are written. This sort of works, in that if it was going to crash, the files are written out before it crashes. But if you pass in multiple files to be processed, it still crashes when the model is unloaded, so:

PS X:\to-process> whisper-ctranslate2 --model large-v2 --task translate --vad_filter True --language ja --output_format all --patience 2.0 -o translate-out file1.wav file2.wav file3.wav

Assuming the crash currently occurs with file2.wav, before the change it only output the files for file1.wav, now it outputs file2.wav then crashes, so file3.wav still isn't processed.

diff --git a/src/whisper_ctranslate2/transcribe.py b/src/whisper_ctranslate2/transcribe.py
index ca53fac..c422037 100644
--- a/src/whisper_ctranslate2/transcribe.py
+++ b/src/whisper_ctranslate2/transcribe.py
@@ -187,7 +187,7 @@ class Transcribe:
                 last_pos = segment.end
                 pbar.update(increment)

-        return dict(
+        return model, dict(
             text=all_text,
             segments=list_segments,
             language=language_name,
diff --git a/src/whisper_ctranslate2/whisper_ctranslate2.py b/src/whisper_ctranslate2/whisper_ctranslate2.py
index 1ff8335..58862a8 100644
--- a/src/whisper_ctranslate2/whisper_ctranslate2.py
+++ b/src/whisper_ctranslate2/whisper_ctranslate2.py
@@ -514,7 +514,7 @@ def main():
         return

     for audio_path in audio:
-        result = Transcribe().inference(
+        model, result = Transcribe().inference(
             audio_path,
             model_dir,
             cache_directory,
@@ -531,6 +531,7 @@ def main():
         )
         writer = get_writer(output_format, output_dir)
         writer(result, audio_path, writer_args)
+        model = None

     if verbose:
         print(f"Transcription results written to '{output_dir}' directory")

So it's not that helpful to try and work around it from whisper-ctranslate2. Hopefully it can be resolved upstream.

@guillaumekln

guillaumekln commented May 10, 2023

You could load the model once and then use the same model instance to transcribe each file. This should work around the issue and also be more efficient than reloading the model each time.
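The suggestion above can be sketched as a loop that receives the model loader as a parameter, so that the model is created exactly once and stays referenced until every file has been written. This is an illustration rather than the project's actual code: `transcribe_all`, `load_model`, `transcribe` and `write_output` are hypothetical names. With faster-whisper, `load_model` could be something like `lambda: WhisperModel("large-v2", device="cuda")`.

```python
def transcribe_all(audio_paths, load_model, transcribe, write_output):
    """Load the model once and reuse it for every file."""
    model = load_model()
    for path in audio_paths:
        result = transcribe(model, path)
        # The model is still alive here, so results reach the disk
        # before any model teardown can happen.
        write_output(result, path)
    # Hand the model back so the caller controls when it is released.
    return model


# Stand-ins so the sketch runs without a GPU or a model download.
load_count = []
outputs = {}


def fake_load():
    load_count.append(1)
    return object()


model = transcribe_all(
    ["file1.wav", "file2.wav", "file3.wav"],
    load_model=fake_load,
    transcribe=lambda m, p: {"text": f"transcript of {p}"},
    write_output=lambda r, p: outputs.__setitem__(p, r["text"]),
)
print(len(load_count), sorted(outputs))
```

Compared with calling a fresh `Transcribe().inference(...)` per file, this avoids unloading the model between files, which is where the crash was observed.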

@Zacharie-Jacob

Is there a good workaround for this? Not having access to Temperature at all results in substantially worse model results.

@umiyuki

umiyuki commented Jun 5, 2023

I followed guillaumekln's tip and modified the code:
I moved the WhisperModel creation out of the inference function and into the main function of whisper_ctranslate2.py. The model then has to be passed to the inference function. You also need to add
from faster_whisper import WhisperModel
to whisper_ctranslate2.py.
image

@jordimas
Collaborator

jordimas commented Jun 5, 2023

Hello @guillaumekln. Do you have a timeline to release OpenNMT/CTranslate2#1201 ? If it's going to take more than a week, I can release a version changing the structure of the code (while my preference is to get this fixed upstream).

Thanks,

Jordi

@guillaumekln

guillaumekln commented Jun 5, 2023

Hi, this change does not fix the issue according to user reports in SYSTRAN/faster-whisper#71. I have a hard time debugging this issue as I don't typically develop on Windows.

For now I suggest that you update the code to keep the model alive until all transcriptions are complete.

@jordimas
Collaborator

jordimas commented Jun 5, 2023

I will then merge https://github.com/Softcatala/whisper-ctranslate2/pull/44/files in the next few hours. This should fix the issue. Feedback is welcome if somebody wants to provide it, since I do not have a Windows box handy either. Thanks

jordimas added a commit that referenced this issue Jun 6, 2023
@jordimas
Collaborator

jordimas commented Jun 6, 2023

Version 0.2.6 should fix this.

@jordimas jordimas closed this as completed Jun 6, 2023
@worldjoe

Loaded 0.2.7 and sure enough this fixed the problem for me. I had been forced to use --device cpu for a while now, which is significantly slower than cuda with my 3080. Thank you.

@iGerman00

Version 0.2.6 should fix this.

I'm currently having the same issue on 0.2.7.

C:\Users\igerm\Desktop\whisper〉whisper-ctranslate2 --model large-v2 --language English -f all --verbose True audio.wav
Detected language 'English' with probability 1.000000

And then it exits. CPU works.

@emcodem

emcodem commented Jul 26, 2023

Detected language 'English' with probability 1.000000
IMHO that should be fixed. I mean, it did not actually "detect" anything, because the user disabled automatic detection by specifying the language.

@iGerman00, see if this works for you: https://github.com/Purfview/whisper-standalone-win

@eric-gitta-moore

I also have a similar problem, but in my case there is no useful output at all, and the return code is not 0.

(whisper) PS D:\BaiduNetdiskDownload> pip list 
Package             Version
------------------- ----------
av                  10.0.0
certifi             2023.11.17
cffi                1.16.0
charset-normalizer  3.3.2
colorama            0.4.6
coloredlogs         15.0.1
ctranslate2         3.23.0
faster-whisper      0.10.0
filelock            3.13.1
flatbuffers         23.5.26
fsspec              2023.12.2
huggingface-hub     0.19.4
humanfriendly       10.0
idna                3.6
mpmath              1.3.0
numpy               1.26.2
onnxruntime         1.16.3
packaging           23.2
pip                 23.3.1
protobuf            4.25.1
pycparser           2.21
pyreadline3         3.4.1
PyYAML              6.0.1
requests            2.31.0
setuptools          68.2.2
sounddevice         0.4.6
sympy               1.12
tokenizers          0.15.0
tqdm                4.66.1
typing_extensions   4.9.0
urllib3             2.1.0
wheel               0.41.2
whisper-ctranslate2 0.3.4
(whisper) PS D:\BaiduNetdiskDownload> whisper-ctranslate2.exe aaa.mp4 --model small --language zh --verbose True                                                                                                                    
stream 0, timescale not set
Detected language 'Chinese' with probability 1.000000
(whisper) PS D:\BaiduNetdiskDownload> 

@ysshin

ysshin commented May 2, 2024

Does this problem still exist? I am seeing it, so I think it does...

@zx3777

zx3777 commented May 2, 2024 via email
