Can't get speech_asynch_rest.py to work with Google Cloud Storage to transcribe long audio files #441

mjgallow · 2016-08-05T19:53:10Z

I'm probably being dense or missing something, but speech_async_rest.py doesn't seem to work with Google Cloud Storage, and I believe you have to use Google Cloud Storage to transcribe long audio files.
I tried to find an answer to this on my own, but so far no luck. Hopefully I'm posting my issue in the right place, and following your guidelines for reporting issues (I did look over the guidelines that I could find). If not, let me know.

Here is an example Google Storage audio file to transcribe:
https://storage.googleapis.com/cloud-samples-tests/speech/brooklyn.flac

Below is what I'm entering and what errors I'm seeing when I try this in Windows 7 and Mac OS X El Capitan.

Note that the examples you list on https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/api/README.md run fine for me.
Also, I've been able to successfully obtain transcripts for long audio files in my Google Cloud Storage using another method that involves curl.

I changed the audio part of the JSON object in speech_async_rest.py (currently line 68 or so, I believe) to the following, only when trying to access the Google Cloud Storage audio file:

'uri': speech_content.decode('UTF-8')

If you need more information, let me know.

_Windows 7 PC Attempt, with error feedback included_
Open Window cmd.exe (Click Start. Type in "cmd" (without quotes). Press Enter/Return.)
The "export" command doesn't work in the Windows command prompt, so I used "set" instead.
Here's what I typed in at the prompt (username and specific project name and id replaced).
C:\Python\python-docs-samples-master\speech\api> cd C:/Users/USERNAME/env/Scripts
C:\Python\python-docs-samples-master\speech\api> call activate.bat
C:\Python\python-docs-samples-master\speech\api> cd C:/Python/python-docs-samples-master/speech/api
(env) C:\Python\python-docs-samples-master\speech\api>set GOOGLE_APPLICATION_CREDENTIALS=C:\Python\My_Project-SOME_NUMBER.json
(env) C:\Python\python-docs-samples-master\speech\api>python speech_rest.py resources/audio.raw
{"results": [{"alternatives": [{"confidence": 0.98267895, "transcript": "how old is the Brooklyn Bridge"}]}]}
(env) C:\Python\python-docs-samples-master\speech\api>python speech_async_rest.py resources/audio.raw
{"name": "LONG_NUMBER_HERE"}
Waiting for server processing...
Waiting for server processing...
[{"alternatives": [{"transcript": "how old is the Brooklyn Bridge", "confidence": 0.98267895}]}]
(env) C:\Python\python-docs-samples-master\speech\api>python speech_async_rest.py gs://cloud-samples-tests/speech/brooklyn.flac
Traceback (most recent call last):
File "speech_async_rest.py", line 101, in <module>
main(args.speech_file)
File "speech_async_rest.py", line 52, in main
with open(speech_file, 'rb') as speech:
OSError: [Errno 22] Invalid argument: 'gs://cloud-samples-tests/speech/brooklyn.flac'

_Mac OS X El Capitan Attempt, with error feedback included_
Click Applications > Utilities > Terminal.
Here's what I typed in at the prompt (username, transcription number, and specific project name and id replaced).
$ cd Desktop/python-docs-samples-master/speech/api
$ source env/bin/activate
$ export GOOGLE_APPLICATION_CREDENTIALS=/Users/USERNAME/Desktop/google_stuff/My_Project-SOME_NUMBER.json
$ python speech_rest.py resources/audio.raw
{"results": [{"alternatives": [{"confidence": 0.98267895, "transcript": "how old is the Brooklyn Bridge"}]}]}
$ python speech_async_rest.py resources/audio.raw
{"name": "LONG_NUMBER_HERE"}
Waiting for server processing...
Waiting for server processing...
[{"alternatives": [{"confidence": 0.98267895, "transcript": "how old is the Brooklyn Bridge"}]}]
$ python speech_async_rest.py gs://cloud-samples-tests/speech/brooklyn.flac
Traceback (most recent call last):
File "speech_async_rest.py", line 101, in <module>
main(args.speech_file)
File "speech_async_rest.py", line 52, in main
with open(speech_file, 'rb') as speech:
IOError: [Errno 2] No such file or directory: 'gs://cloud-samples-tests/speech/brooklyn.flac'
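
Both tracebacks stop at the same line: the script hands its argument straight to open(), so a gs:// URI fails locally before any request is sent. Below is a minimal sketch of how the audio part of the request body could be chosen instead. It assumes the REST request accepts either an inline base64 `content` field or a `uri` field; the helper name `build_audio_field` is hypothetical and not part of the sample.

```python
import base64


def build_audio_field(speech_file):
    """Return the 'audio' part of the request body (hypothetical helper).

    Assumption: the API accepts {'uri': ...} for Cloud Storage objects
    and {'content': ...} for inline base64-encoded audio.
    """
    if speech_file.startswith('gs://'):
        # Cloud Storage object: pass the URI through, never open() it locally.
        return {'uri': speech_file}
    # Local file: read the bytes and send them inline, base64-encoded.
    with open(speech_file, 'rb') as speech:
        return {'content': base64.b64encode(speech.read()).decode('UTF-8')}
```

With a helper like this, `resources/audio.raw` would still be read from disk, while `gs://cloud-samples-tests/speech/brooklyn.flac` would be forwarded as a URI.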


puneith · 2016-08-12T17:04:35Z

@mjgallow Can you please try the grpc version https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/api/speech_async_grpc.py

mjgallow · 2016-08-13T23:24:56Z

Sorry, I missed that I was supposed to use speech_async_grpc.py for long audio file transcription. Thanks! At least for me, on Mac OS X El Capitan, speech_async_grpc.py works for transcribing short FLAC audio files and, with a little tweaking, long, properly prepared raw audio files in Google Cloud Storage (you need to pass the file's gs:// URI). I didn't try it on Windows 7.

Again, thanks! I'll close this ticket/issue. Below are some notes that might help others.

I tried with my own publicly available Google Cloud Storage raw audio file.

I properly converted the mp4 file to an mp3 file and then to the properly encoded raw file format using VLC (not shown below as I didn't use the application through the command line) and sox.

Note that with sox, when you specify the input file to be converted, you need to specify the channels, bits, and rate.

The soxi and file commands used before the sox command below provide the channels, bits, and rate information for the input file to be converted. The file command is probably unnecessary, actually.
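
For anyone scripting the conversion, here is a rough sketch of the same sox call built from Python. It assumes sox is installed and on the PATH; the function names `build_sox_command` and `convert` are hypothetical, and the output defaults (mono, 16 bits, 16000 Hz) are simply the values used in the command further down.

```python
import subprocess


def build_sox_command(src, dst, in_channels, in_bits, in_rate,
                      out_channels=1, out_bits=16, out_rate=16000):
    """Build the sox argument list (hypothetical helper).

    sox applies format options to the file that follows them, so the
    input options come before src and the output options before dst.
    """
    return [
        'sox',
        '--channels={}'.format(in_channels),
        '--bits={}'.format(in_bits),
        '--rate={}'.format(in_rate),
        src,
        '--channels={}'.format(out_channels),
        '--bits={}'.format(out_bits),
        '--rate={}'.format(out_rate),
        dst,
    ]


def convert(src, dst, **kwargs):
    """Run sox; raises CalledProcessError if sox exits with an error."""
    subprocess.check_call(build_sox_command(src, dst, **kwargs))
```

A call such as `convert('in.mp3', 'out.raw', in_channels=2, in_bits=16, in_rate=44100)` would use the values that soxi reported for the input file.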

Also, I had to change lines 64 and 65 in speech_async_grpc.py to the following (maintaining indentation) to transcribe the long raw audio file:
encoding='LINEAR16', # one of LINEAR16, FLAC, MULAW, AMR, AMR_WB
sample_rate=16000, # the rate in hertz
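
These settings have to match the file that was actually uploaded. One quick sanity check is sketched below, assuming the raw file is headerless 16-bit linear PCM (the function name `raw_duration_seconds` is hypothetical): at 16000 Hz, mono, 16 bits, one second of audio is 16000 * 1 * 2 = 32000 bytes, so the file size should agree with the known length of the recording.

```python
import os


def raw_duration_seconds(path, sample_rate=16000, channels=1, bits=16):
    """Estimate the duration of a headerless PCM file from its size.

    Assumes the file holds nothing but samples (no header), which is
    what a .raw file written by sox is expected to contain.
    """
    bytes_per_second = sample_rate * channels * (bits // 8)
    return os.path.getsize(path) / float(bytes_per_second)
```

If the estimate is far from the real length of the recording, the file was probably written with a different rate, channel count, or bit depth than the request declares.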

I wanted output to go to text file instead of being printed out in the console, so I made the following code changes to my downloaded copy of speech_async_grpc.py.
I commented out the following code at line 99.
print(results)
I added the following code after that commented out line so that the transcription with the Google Cloud transcription name/number would be saved to a text file on my desktop, instead of outputting to the Terminal console. Note, you'll need to replace "USER_NAME" if not the entire file path in the first line below.
transcriptfilepath = '/Users/USER_NAME/Desktop/transcript.txt'
# Write the operation name and the results; the with block closes the file.
with open(transcriptfilepath, 'w') as transcriptfile:
    transcriptfile.write("name: " + str(name) + "\n")
    transcriptfile.write(str(results))
print("Success! Output is in " + transcriptfilepath + "\n")

Mac OS X El Capitan Successful Attempt procedure and Terminal command line commands, with output in Terminal:

Click Applications > Utilities > Terminal.
Here's what I typed in at the prompt (username, transcription name/number, specific project name and id, and some other stuff replaced).
$ cd Desktop/python-docs-samples-master/speech/api
$ source env/bin/activate
$ export GOOGLE_APPLICATION_CREDENTIALS=/Users/USERNAME/Desktop/google_stuff/My\ Project-SOME_NUMBER.json
$ gcloud auth activate-service-account --key-file=/Users/USERNAME/Desktop/google_stuff/My\ Project-SOME_NUMBER.json
$ python speech_async_rest.py resources/audio.raw
{"name": "LONG_NUMBER"}
Waiting for server processing...
Waiting for server processing...
[{"alternatives": [{"confidence": 0.98267895, "transcript": "how old is the Brooklyn Bridge"}]}]
$ python speech_async_grpc.py gs://cloud-samples-tests/speech/brooklyn.flac
BUNCH_OF_CHARACTERS_AND_NUMBERS] Using polling engine: poll
name: "LONG_NUMBER_HERE"
Waiting for server processing...
Waiting for server processing...
results {
alternatives {
transcript: "how old is the Brooklyn Bridge"
confidence: 0.982678949833
}
}
$ soxi /users/USERNAME/Downloads/SOME_MP3_FILE.mp3
$ file /users/USERNAME/Downloads/SOME_MP3_FILE.mp3
$ sox --channels=2 --bits=16 --rate=44100 /users/USERNAME/Downloads/SOME_MP3_FILE.mp3 \
    --channels=1 --bits=16 --rate=16000 /users/USERNAME/Downloads/NEW_FILE.raw
$ python speech_async_grpc.py gs://GOOGLE_CLOUD_STORAGE_BUCKET/NEW_FILE.raw
BUNCH_OF_CHARACTERS_AND_NUMBERS] Using polling engine: poll
name: "LONG_NUMBER_HERE"
Waiting for server processing...
...LINE_REPEATS_OVER_AND_OVER_AGAIN_UNTIL_OUTPUT_APPEARS...
Waiting for server processing...
Success! Output is in /Users/USERNAME/Desktop/transcript.txt
$
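
The repeated "Waiting for server processing..." lines in the transcripts above appear to come from polling the long-running operation until it finishes. The sketch below shows that pattern in general form and is not the sample's actual code: `fetch_operation` is a hypothetical callable that returns the current operation as a dict, and the `done` key is an assumption about its shape.

```python
import time


def wait_for_operation(fetch_operation, interval_seconds=1.0, max_attempts=60):
    """Poll until the operation reports done, or give up after max_attempts.

    fetch_operation is a hypothetical callable returning a dict; the
    'done' key is assumed to be truthy once processing has finished.
    """
    for _ in range(max_attempts):
        operation = fetch_operation()
        if operation.get('done'):
            return operation
        print('Waiting for server processing...')
        time.sleep(interval_seconds)
    raise RuntimeError('operation did not finish after %d attempts' % max_attempts)
```

Capping the number of attempts keeps a stalled job from looping forever, which matters for long files where the message repeats many times.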

theacodes added the ML label Aug 5, 2016

theacodes assigned jerjou and puneith Aug 5, 2016

mjgallow closed this as completed Aug 13, 2016

ipuris mentioned this issue Dec 26, 2018

Speech: long audio files produce error googleapis/google-cloud-python#7024

Closed

yoshi-automation added the 🚨 This issue needs some love. label Apr 7, 2020

yoshi-automation unassigned jerjou Apr 7, 2020

yoshi-automation added the triage me I really want to be triaged. label Apr 7, 2020

