[asr][websocket]add streaming asr demo #1737

Merged
merged 3 commits into from
Apr 21, 2022
355 changes: 355 additions & 0 deletions demos/streaming_asr_server/README.md

Large diffs are not rendered by default.

356 changes: 356 additions & 0 deletions demos/streaming_asr_server/README_cn.md

Large diffs are not rendered by default.

47 changes: 47 additions & 0 deletions demos/streaming_asr_server/conf/ws_application.yaml
@@ -0,0 +1,47 @@
# This is the parameter configuration file for PaddleSpeech Serving.

#################################################################################
# SERVER SETTING #
#################################################################################
host: 0.0.0.0
port: 8090

# The task format in the engine_list is: <speech task>_<engine type>
# task choices = ['asr_online', 'tts_online']
# protocol = ['websocket', 'http'] (only one can be selected).
# websocket only supports the online engine type.
protocol: 'websocket'
engine_list: ['asr_online']


#################################################################################
# ENGINE CONFIG #
#################################################################################

################################### ASR #########################################
################### speech task: asr; engine_type: online #######################
asr_online:
    model_type: 'deepspeech2online_aishell'
    am_model: # the pdmodel file of am static model [optional]
    am_params: # the pdiparams file of am static model [optional]
    lang: 'zh'
    sample_rate: 16000
    cfg_path:
    decode_method:
    force_yes: True

    am_predictor_conf:
        device: # set 'gpu:id' or 'cpu'
        switch_ir_optim: True
        glog_info: False # True -> print glog
        summary: True # False -> do not show predictor config

    chunk_buffer_conf:
        frame_duration_ms: 80
        shift_ms: 40
        sample_rate: 16000
        sample_width: 2
        window_n: 7 # frame
        shift_n: 4 # frame
        window_ms: 20 # ms
        shift_ms: 10 # ms
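
The comments in this config state two constraints: only one protocol can be selected, and the websocket protocol only serves online engines. A minimal sketch of checking those constraints on an already parsed config is shown here. `validate_server_config` and `ALLOWED_PROTOCOLS` are hypothetical names written for illustration and are not part of PaddleSpeech; the config is assumed to arrive as a plain dict.

```python
# Hypothetical helper, not part of PaddleSpeech: checks the constraints
# described in the comments of ws_application.yaml on a parsed config dict.
ALLOWED_PROTOCOLS = ("websocket", "http")


def validate_server_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    protocol = config.get("protocol")
    if protocol not in ALLOWED_PROTOCOLS:
        problems.append(f"unknown protocol: {protocol!r}")

    for engine in config.get("engine_list", []):
        # Engine names follow <speech task>_<engine type>, e.g. 'asr_online'.
        task, _, engine_type = engine.rpartition("_")
        if not task:
            problems.append(f"malformed engine name: {engine!r}")
        elif protocol == "websocket" and engine_type != "online":
            problems.append(f"websocket cannot serve {engine!r}")
    return problems


if __name__ == "__main__":
    config = {
        "host": "0.0.0.0",
        "port": 8090,
        "protocol": "websocket",
        "engine_list": ["asr_online"],
    }
    print(validate_server_config(config))
```

Returning a list of problems instead of raising on the first one lets a caller report every misconfiguration in a single pass.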
45 changes: 45 additions & 0 deletions demos/streaming_asr_server/conf/ws_conformer_application.yaml
@@ -0,0 +1,45 @@
# This is the parameter configuration file for PaddleSpeech Serving.

#################################################################################
# SERVER SETTING #
#################################################################################
host: 0.0.0.0
port: 8090

# The task format in the engine_list is: <speech task>_<engine type>
# task choices = ['asr_online', 'tts_online']
# protocol = ['websocket', 'http'] (only one can be selected).
# websocket only supports the online engine type.
protocol: 'websocket'
engine_list: ['asr_online']


#################################################################################
# ENGINE CONFIG #
#################################################################################

################################### ASR #########################################
################### speech task: asr; engine_type: online #######################
asr_online:
    model_type: 'conformer_online_multicn'
    am_model: # the pdmodel file of am static model [optional]
    am_params: # the pdiparams file of am static model [optional]
    lang: 'zh'
    sample_rate: 16000
    cfg_path:
    decode_method:
    force_yes: True
    device: # cpu or gpu:id
    am_predictor_conf:
        device: # set 'gpu:id' or 'cpu'
        switch_ir_optim: True
        glog_info: False # True -> print glog
        summary: True # False -> do not show predictor config

    chunk_buffer_conf:
        window_n: 7 # frame
        shift_n: 4 # frame
        window_ms: 25 # ms
        shift_ms: 10 # ms
        sample_rate: 16000
        sample_width: 2
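
The `chunk_buffer_conf` values determine how much audio goes into each chunk. The sketch below works through the arithmetic under an assumed framing convention: a chunk of `window_n` overlapping frames spans `(window_n - 1) * shift_ms + window_ms` milliseconds, and consecutive chunks advance by `shift_n * shift_ms` milliseconds. This diff does not confirm that the server interprets the fields this way, and `chunk_sizes` is a hypothetical helper.

```python
def chunk_sizes(window_n, shift_n, window_ms, shift_ms, sample_rate,
                sample_width):
    """Compute chunk and hop sizes under the assumed framing convention."""
    # Span of window_n overlapping frames, in milliseconds.
    chunk_ms = (window_n - 1) * shift_ms + window_ms
    # Advance between consecutive chunks, in milliseconds.
    hop_ms = shift_n * shift_ms
    chunk_samples = sample_rate * chunk_ms // 1000
    hop_samples = sample_rate * hop_ms // 1000
    return {
        "chunk_ms": chunk_ms,
        "hop_ms": hop_ms,
        "chunk_samples": chunk_samples,
        "hop_samples": hop_samples,
        "chunk_bytes": chunk_samples * sample_width,
        "hop_bytes": hop_samples * sample_width,
    }


if __name__ == "__main__":
    # Values taken from ws_conformer_application.yaml.
    print(chunk_sizes(window_n=7, shift_n=4, window_ms=25, shift_ms=10,
                      sample_rate=16000, sample_width=2))
```

With the conformer values this gives an 85 ms chunk (1360 samples, 2720 bytes) that advances by 40 ms (640 samples, 1280 bytes).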
2 changes: 2 additions & 0 deletions demos/streaming_asr_server/run.sh
@@ -0,0 +1,2 @@
# start the streaming asr service
paddlespeech_server start --config_file ./conf/ws_conformer_application.yaml
5 changes: 5 additions & 0 deletions demos/streaming_asr_server/test.sh
@@ -0,0 +1,5 @@
# download the test wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav

# read the wav and pass it to service
python3 websocket_client.py --wavfile ./zh.wav
62 changes: 62 additions & 0 deletions demos/streaming_asr_server/websocket_client.py
@@ -0,0 +1,62 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.

[Review comment, Collaborator] Keep the file paddlespeech/server/tests/asr/online/microphone_client.py.

#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import argparse
import asyncio
import codecs
import logging
import os

from paddlespeech.cli.log import logger
from paddlespeech.server.utils.audio_handler import ASRAudioHandler


def main(args):
    logger.info("asr websocket client start")
    handler = ASRAudioHandler("127.0.0.1", 8090)
    loop = asyncio.get_event_loop()

    # support to process single audio file
    if args.wavfile and os.path.exists(args.wavfile):
        logger.info(f"start to process the wavfile: {args.wavfile}")
        result = loop.run_until_complete(handler.run(args.wavfile))
        result = result["asr_results"]
        logger.info(f"asr websocket client finished : {result}")

    # support to process batch audios from wav.scp
    if args.wavscp and os.path.exists(args.wavscp):
        logger.info(f"start to process the wavscp: {args.wavscp}")
        with codecs.open(args.wavscp, 'r', encoding='utf-8') as f,\
             codecs.open("result.txt", 'w', encoding='utf-8') as w:
            for line in f:
                utt_name, utt_path = line.strip().split()
                result = loop.run_until_complete(handler.run(utt_path))
                result = result["asr_results"]
                w.write(f"{utt_name} {result}\n")


if __name__ == "__main__":
    logger.info("Start to do streaming asr client")
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--wavfile",
        action="store",
        help="wav file path ",
        default="./16_audio.wav")
    parser.add_argument(
        "--wavscp", type=str, default=None, help="The batch audios dict text")
    args = parser.parse_args()

    main(args)
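
The batch branch of this client expects each line of `wav.scp` to hold an utterance name followed by a path, and it unpacks `line.strip().split()` into exactly two values. That unpacking fails on blank lines and on paths containing spaces. A slightly more tolerant reader could look like the sketch below; `read_wav_scp` is a hypothetical helper, not something this PR adds.

```python
import io


def read_wav_scp(lines):
    """Yield (utt_name, utt_path) pairs from wav.scp-style lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of failing on unpacking
        # Split only on the first run of whitespace so paths may contain spaces.
        # A line with a name but no path still raises ValueError here.
        utt_name, utt_path = line.split(maxsplit=1)
        yield utt_name, utt_path


if __name__ == "__main__":
    scp = io.StringIO("utt1 ./zh.wav\n\nutt2 ./my audio/en.wav\n")
    for name, path in read_wav_scp(scp):
        print(name, path)
```

Because the function takes any iterable of lines, it works equally on an open file handle and on an in-memory list.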
14 changes: 9 additions & 5 deletions paddlespeech/server/bin/paddlespeech_client.py
@@ -30,11 +30,14 @@
 from ..util import cli_client_register
 from ..util import stats_wrapper
 from paddlespeech.cli.log import logger
-from paddlespeech.server.tests.asr.online.websocket_client import ASRAudioHandler
+from paddlespeech.server.utils.audio_handler import ASRAudioHandler
 from paddlespeech.server.utils.audio_process import wav2pcm
 from paddlespeech.server.utils.util import wav2base64

-__all__ = ['TTSClientExecutor', 'ASRClientExecutor', 'CLSClientExecutor']
+__all__ = [
+    'TTSClientExecutor', 'ASRClientExecutor', 'ASROnlineClientExecutor',
+    'CLSClientExecutor'
+]


 @cli_client_register(
@@ -236,11 +239,11 @@ def __call__(self,
 @cli_client_register(
     name='paddlespeech_client.asr_online',
     description='visit asr online service')
-class ASRClientExecutor(BaseExecutor):
+class ASROnlineClientExecutor(BaseExecutor):
     def __init__(self):
-        super(ASRClientExecutor, self).__init__()
+        super(ASROnlineClientExecutor, self).__init__()
         self.parser = argparse.ArgumentParser(
-            prog='paddlespeech_client.asr', add_help=True)
+            prog='paddlespeech_client.asr_online', add_help=True)
         self.parser.add_argument(
             '--server_ip', type=str, default='127.0.0.1', help='server ip')
         self.parser.add_argument(
@@ -305,6 +308,7 @@ def __call__(self,

         return res['asr_results']

+
 @cli_client_register(
     name='paddlespeech_client.cls', description='visit cls service')
 class CLSClientExecutor(BaseExecutor):
2 changes: 1 addition & 1 deletion paddlespeech/server/conf/ws_conformer_application.yaml
@@ -29,7 +29,7 @@ asr_online:
     cfg_path:
     decode_method:
     force_yes: True
-
+    device: # cpu or gpu:id
     am_predictor_conf:
         device: # set 'gpu:id' or 'cpu'
         switch_ir_optim: True
11 changes: 11 additions & 0 deletions paddlespeech/server/engine/asr/online/asr_engine.py
@@ -1028,6 +1028,17 @@ def init(self, config: dict) -> bool:
         self.output = ""
         self.executor = ASRServerExecutor()
         self.config = config
+        try:
+            if self.config.get("device", None):
+                self.device = self.config.device
+            else:
+                self.device = paddle.get_device()
+            logger.info(f"paddlespeech_server set the device: {self.device}")
+            paddle.set_device(self.device)
+        except BaseException:
+            logger.error(
+                "Set device failed, please check if device is already used and the parameter 'device' in the yaml file"
+            )

         self.executor._init_from_path(
             model_type=self.config.model_type,
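
The block added to `init` prefers the `device` value from the YAML config and falls back to Paddle's current device when that value is unset. The selection logic can be sketched in isolation from Paddle so that it runs standalone. `resolve_device`, `default_device`, and `_DEVICE_PATTERN` are hypothetical stand-ins; the merged code calls `paddle.get_device()` and `paddle.set_device()` directly and does not validate the string.

```python
import re

# Format taken from the YAML comment "cpu or gpu:id". This is an assumption
# of the sketch; Paddle itself may accept more device strings than this.
_DEVICE_PATTERN = re.compile(r"cpu|gpu:\d+")


def resolve_device(config: dict, default_device: str) -> str:
    """Pick the configured device, falling back to a default when unset."""
    device = config.get("device")
    # An empty YAML value parses to None, so treat None and "" as "not set".
    if not device:
        return default_device
    if not _DEVICE_PATTERN.fullmatch(device):
        raise ValueError(f"unsupported device string: {device!r}")
    return device


if __name__ == "__main__":
    print(resolve_device({"device": None}, default_device="cpu"))
    print(resolve_device({"device": "gpu:0"}, default_device="cpu"))
```

Raising a `ValueError` for a malformed string makes a typo in the YAML file visible at startup, whereas the merged code logs an error and continues.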
35 changes: 35 additions & 0 deletions paddlespeech/server/tests/asr/online/README.md
@@ -0,0 +1,35 @@
([简体中文](./README_cn.md)|English)

# Speech Service

## Introduction

This document introduces one client for the streaming ASR service: the microphone client.


## Usage
### 1. Install
Refer to [Install](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).

It is recommended to use **paddlepaddle 2.2.1** or above.
You can choose either the medium or the hard way to install paddlespeech.


### 2. Prepare config File


The input of the ASR client demo should be a WAV file (`.wav`), and its sample rate must match the model's.

Here is a sample file for this ASR client demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
```
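
Since the sample rate of the input must match the model, it can help to check a file before sending it. The sketch below uses Python's standard `wave` module and assumes the 16000 Hz rate and 2-byte sample width from the demo configs as defaults. `wav_matches_model` and `write_silence` are hypothetical helpers; the second exists only so the demo has a file to inspect.

```python
import os
import tempfile
import wave


def wav_matches_model(path, expected_rate=16000, expected_width=2):
    """Return True if the WAV file has the expected sample rate and width."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == expected_rate and
                wav.getsampwidth() == expected_width)


def write_silence(path, sample_rate, seconds=0.1, sample_width=2):
    """Write a short mono WAV of silence, used here only for the demo."""
    n_frames = int(sample_rate * seconds)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * (n_frames * sample_width))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.wav")
        # An 8 kHz file checked against the 16 kHz default.
        write_silence(path, sample_rate=8000)
        print(wav_matches_model(path))
```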

### 3. Streaming ASR Client Usage

- microphone
```
python microphone_client.py

```
13 changes: 3 additions & 10 deletions paddlespeech/server/tests/asr/online/README_cn.md
@@ -1,9 +1,9 @@
-([简体中文](./README_cn.md)|English)
+([English](./README.md)|中文)

 # Speech Service

 ## Introduction
-This document describes how to use three different clients for streaming ASR: web page, microphone, and a Python client that simulates the streaming service
+This document describes how to use one client for streaming ASR: the microphone.


 ## Usage
@@ -20,7 +20,7 @@

 Sample audio for this ASR client can be downloaded:
 ```bash
-wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 ```

 ### 2. Streaming ASR Client Usage
@@ -40,10 +40,3 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 python microphone_client.py

 ```
-
-
-- Web page
-```
-# Enter the web directory and refer to its readme.md
-
-```
13 changes: 0 additions & 13 deletions paddlespeech/server/tests/asr/online/__init__.py

This file was deleted.
