
Can't get speech_async_rest.py to work with Google Cloud Storage to transcribe long audio files #441

Closed
mjgallow opened this issue Aug 5, 2016 · 2 comments
Labels
ML 🚨 This issue needs some love. triage me I really want to be triaged.

Comments

@mjgallow

mjgallow commented Aug 5, 2016

I'm probably being dense or missing something, but speech_async_rest.py doesn't seem to work with Google Cloud Storage, and I believe you have to use Google Cloud Storage to transcribe long audio files.
I tried to find an answer to this on my own, but so far no luck. Hopefully I'm posting my issue in the right place, and following your guidelines for reporting issues (I did look over the guidelines that I could find). If not, let me know.

Here is an example Google Storage audio file to transcribe:
https://storage.googleapis.com/cloud-samples-tests/speech/brooklyn.flac

Below is what I'm entering and the errors I'm seeing when I try this on Windows 7 and Mac OS X El Capitan.

Also, note that I can run the examples listed at https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/api/README.md without problems.
I've also been able to successfully obtain transcripts for long audio files in my Google Cloud Storage using another method that involves curl.

I changed the audio part of the JSON object in speech_async_rest.py (currently line 68 or so, I believe) to the following, but only when trying to access the Google Cloud Storage audio file:

'uri': speech_content.decode('UTF-8')

If you need more information, let me know.

_Windows 7 PC Attempt, with error feedback included_
Open Windows cmd.exe (click Start, type "cmd" without quotes, and press Enter/Return).
The "export" command doesn't work in the Windows command prompt, so "set" is used instead.
Here's what I typed in at the prompt (username and specific project name and id replaced).
C:\Python\python-docs-samples-master\speech\api> cd C:/Users/USERNAME/env/Scripts
C:\Python\python-docs-samples-master\speech\api> call activate.bat
C:\Python\python-docs-samples-master\speech\api> cd C:/Python/python-docs-samples-master/speech/api
(env) C:\Python\python-docs-samples-master\speech\api>set GOOGLE_APPLICATION_CREDENTIALS=C:\Python\My_Project-SOME_NUMBER.json
(env) C:\Python\python-docs-samples-master\speech\api>python speech_rest.py resources/audio.raw
{"results": [{"alternatives": [{"confidence": 0.98267895, "transcript": "how old is the Brooklyn Bridge"}]}]}
(env) C:\Python\python-docs-samples-master\speech\api>python speech_async_rest.py resources/audio.raw
{"name": "LONG_NUMBER_HERE"}
Waiting for server processing...
Waiting for server processing...
[{"alternatives": [{"transcript": "how old is the Brooklyn Bridge", "confidence": 0.98267895}]}]
(env) C:\Python\python-docs-samples-master\speech\api>python speech_async_rest.py gs://cloud-samples-tests/speech/brooklyn.flac
Traceback (most recent call last):
File "speech_async_rest.py", line 101, in <module>
main(args.speech_file)
File "speech_async_rest.py", line 52, in main
with open(speech_file, 'rb') as speech:
OSError: [Errno 22] Invalid argument: 'gs://cloud-samples-tests/speech/brooklyn.flac'

_Mac OS X El Capitan Attempt, with error feedback included_
Click Applications > Utilities > Terminal.
Here's what I typed in at the prompt (username, transcription number, and specific project name and id replaced).
$ cd Desktop/python-docs-samples-master/speech/api
$ source env/bin/activate
$ export GOOGLE_APPLICATION_CREDENTIALS=/Users/USERNAME/Desktop/google_stuff/My_Project-SOME_NUMBER.json
$ python speech_rest.py resources/audio.raw
{"results": [{"alternatives": [{"confidence": 0.98267895, "transcript": "how old is the Brooklyn Bridge"}]}]}
$ python speech_async_rest.py resources/audio.raw
{"name": "LONG_NUMBER_HERE"}
Waiting for server processing...
Waiting for server processing...
[{"alternatives": [{"confidence": 0.98267895, "transcript": "how old is the Brooklyn Bridge"}]}]
$ python speech_async_rest.py gs://cloud-samples-tests/speech/brooklyn.flac
Traceback (most recent call last):
File "speech_async_rest.py", line 101, in <module>
main(args.speech_file)
File "speech_async_rest.py", line 52, in main
with open(speech_file, 'rb') as speech:
IOError: [Errno 2] No such file or directory: 'gs://cloud-samples-tests/speech/brooklyn.flac'
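Both tracebacks come from line 52 of speech_async_rest.py, `open(speech_file, 'rb')`: the script reads its argument as a local file before it builds the request body, so a gs:// path fails there and the edited 'uri' line is never reached. A minimal sketch of the missing branch, assuming the request's 'audio' object accepts either inline base64 'content' or a Cloud Storage 'uri' (the two forms used in this thread); `build_audio_field` is a hypothetical helper name, not part of the sample:

```python
import base64


def build_audio_field(speech_file):
    """Build the 'audio' object of a recognition request (hypothetical helper).

    A gs:// path is passed through as a URI for the service to fetch;
    anything else is treated as a local file and sent inline as base64.
    """
    if speech_file.startswith('gs://'):
        # Never call open() on a Cloud Storage URI: it is not a local path.
        return {'uri': speech_file}
    with open(speech_file, 'rb') as speech:
        speech_content = base64.b64encode(speech.read())
    return {'content': speech_content.decode('UTF-8')}


if __name__ == '__main__':
    print(build_audio_field('gs://cloud-samples-tests/speech/brooklyn.flac'))
```

The returned dict would take the place of the hard-coded 'audio' value in the request body, so the same script could accept both local files and gs:// paths.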

@puneith
Contributor

puneith commented Aug 12, 2016

@mjgallow
Author

Sorry, I missed that I was supposed to use speech_async_grpc.py for long audio file transcription. Thanks! At least for me, on Mac OS X El Capitan (I didn't try Windows 7), speech_async_grpc.py works for transcribing short FLAC audio files and, with a little tweaking, long, properly prepared raw audio files in Google Cloud Storage (you need to specify the gs:// URI of the file in Google Cloud Storage).

Again, thanks! I'll close this ticket/issue. Below are some notes that might help others.

I tried with my own publicly available Google Cloud Storage raw audio file.

I converted my mp4 file to an mp3 file with VLC (not shown below, since I used the application's GUI rather than the command line), and then converted the mp3 to a properly encoded raw file with sox.

Note that with sox, when you specify the input file to be converted, you need to specify its channels, bits, and rate.

The soxi and file commands run before the sox command below report the channels, bits, and rate of the input file. The file command is probably unnecessary, actually.
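The conversion described above can also be scripted. A minimal sketch, assuming sox is on the PATH: it builds the same argument list as the sox command used in this thread (input format options before the input file, output format options before the output file), and additionally pins the output to signed, little-endian 16-bit samples, which is what LINEAR16 expects. The `--encoding=signed-integer` and `--endian=little` flags are an addition, not part of the original command, and `sox_command`/`convert` are hypothetical names:

```python
import subprocess


def sox_command(src, dst, in_channels=2, in_bits=16, in_rate=44100, out_rate=16000):
    """Return a sox argument list converting src to mono 16-bit raw audio at out_rate Hz."""
    return [
        'sox',
        # Input format options, as in the command used in this thread.
        '--channels=%d' % in_channels, '--bits=%d' % in_bits, '--rate=%d' % in_rate,
        src,
        # Output format options: mono, 16-bit, resampled to out_rate.
        '--channels=1', '--bits=16', '--rate=%d' % out_rate,
        # Assumed additions: signed little-endian samples, as LINEAR16 expects.
        '--encoding=signed-integer', '--endian=little',
        dst,
    ]


def convert(src, dst, **kwargs):
    """Run sox; raises subprocess.CalledProcessError if sox exits with an error."""
    subprocess.check_call(sox_command(src, dst, **kwargs))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.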

Also, I had to change lines 64 and 65 in speech_async_grpc.py to the following (keeping the indentation) to transcribe the long raw audio file:
encoding='LINEAR16', # one of LINEAR16, FLAC, MULAW, AMR, AMR_WB
sample_rate=16000, # the rate in hertz
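With encoding='LINEAR16' and sample_rate=16000, a headerless mono .raw file holds 2 bytes per sample and 16,000 samples per second, i.e. 32,000 bytes per second of audio. A small sketch (the helper name is hypothetical) that uses this to estimate how long a converted file is:

```python
import os

BYTES_PER_SAMPLE = 2  # LINEAR16: 16-bit samples


def raw_duration_seconds(path, sample_rate=16000, channels=1):
    """Estimate the length of a headerless LINEAR16 file from its size in bytes."""
    bytes_per_second = sample_rate * channels * BYTES_PER_SAMPLE
    return os.path.getsize(path) / float(bytes_per_second)
```

If the estimate differs noticeably from the length of the original recording, the raw file's rate or channel count probably doesn't match what the request declares.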

I wanted the output to go to a text file instead of being printed in the console, so I made the following code changes to my downloaded copy of speech_async_grpc.py.
I commented out the following code at line 99.
print(results)
I added the following code after that commented-out line so that the transcription, along with the Google Cloud transcription name/number, is saved to a text file on my desktop instead of being printed to the Terminal console. Note that you'll need to replace "USER_NAME" (or the entire file path) in the first line below.
transcriptfilepath = '/Users/USER_NAME/Desktop/transcript.txt'
with open(transcriptfilepath, 'w') as transcriptfile:
    transcriptfile.write("name: " + str(name) + "\n")
    transcriptfile.write(str(results))
print("Success! Output is in " + transcriptfilepath + "\n")
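The same change as a small function that avoids hard-coding USER_NAME: `os.path.expanduser` resolves `~` to the current user's home directory, so the default path lands on that user's Desktop on Mac OS X. The function name and default path are hypothetical; `name` and `results` are whatever the sample has in scope at that point:

```python
import os


def save_transcript(name, results, path=None):
    """Write the operation name and results to a text file and return its path."""
    if path is None:
        # Default to ~/Desktop/transcript.txt for whoever runs the script.
        path = os.path.expanduser(os.path.join('~', 'Desktop', 'transcript.txt'))
    with open(path, 'w') as transcript_file:
        transcript_file.write('name: ' + str(name) + '\n')
        transcript_file.write(str(results))
    return path
```

Usage in place of the lines above: `print("Success! Output is in " + save_transcript(name, results))`.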

Mac OS X El Capitan successful attempt: procedure, Terminal commands, and Terminal output:

Click Applications > Utilities > Terminal.
Here's what I typed in at the prompt (username, transcription name/number, specific project name and id, and some other stuff replaced).
$ cd Desktop/python-docs-samples-master/speech/api
$ source env/bin/activate
$ export GOOGLE_APPLICATION_CREDENTIALS=/Users/USERNAME/Desktop/google_stuff/My\ Project-SOME_NUMBER.json
$ gcloud auth activate-service-account --key-file=/Users/USERNAME/Desktop/google_stuff/My\ Project-SOME_NUMBER.json
$ python speech_async_rest.py resources/audio.raw
{"name": "LONG_NUMBER"}
Waiting for server processing...
Waiting for server processing...
[{"alternatives": [{"confidence": 0.98267895, "transcript": "how old is the Brooklyn Bridge"}]}]
$ python speech_async_grpc.py gs://cloud-samples-tests/speech/brooklyn.flac
BUNCH_OF_CHARACTERS_AND_NUMBERS] Using polling engine: poll
name: "LONG_NUMBER_HERE"
Waiting for server processing...
Waiting for server processing...
results {
alternatives {
transcript: "how old is the Brooklyn Bridge"
confidence: 0.982678949833
}
}
$ soxi /users/USERNAME/Downloads/SOME_MP3_FILE.mp3
$ file /users/USERNAME/Downloads/SOME_MP3_FILE.mp3
$ sox --channels=2 --bits=16 --rate=44100 /users/USERNAME/Downloads/SOME_MP3_FILE.mp3 --channels=1 --bits=16 --rate=16000 /users/USERNAME/Downloads/NEW_FILE.raw
$ python speech_async_grpc.py gs://GOOGLE_CLOUD_STORAGE_BUCKET/NEW_FILE.raw
BUNCH_OF_CHARACTERS_AND_NUMBERS] Using polling engine: poll
name: "LONG_NUMBER_HERE"
Waiting for server processing...
...LINE_REPEATS_OVER_AND_OVER_AGAIN_UNTIL_OUTPUT_APPEARS...
Waiting for server processing...
Success! Output is in /Users/USERNAME/Desktop/transcript.txt
$
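The repeated "Waiting for server processing..." lines in these runs come from the samples polling the long-running operation, identified by the returned name, until it reports that it is done. A generic sketch of such a loop, assuming a hypothetical `get_operation(name)` callable that returns the operation as a dict with a boolean 'done' field (it stands in for the real API call, which differs between the REST and gRPC samples):

```python
import time


def wait_for_operation(get_operation, name, interval=1.0, max_polls=None, sleep=time.sleep):
    """Poll get_operation(name) every `interval` seconds until 'done' is true.

    Returns the finished operation dict. Raises RuntimeError if max_polls is
    set and the operation is still running after that many polls.
    """
    polls = 0
    while True:
        sleep(interval)
        operation = get_operation(name)
        if operation.get('done'):
            return operation
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise RuntimeError('operation %s still running after %d polls' % (name, polls))
        print('Waiting for server processing...')
```

For long files the loop can run for a long time, which is why the line repeats; `max_polls` puts an upper bound on how long a script waits.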

5 participants