Highlights
New Features
- Add support for the AudioTools toolkit, used in DAC (Descript-Audio-Codec) training and inference.
- Reproduce the losses required by the DAC model: MultiScaleSTFTLoss, GANLoss, and SISDRLoss.
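For reference, SI-SDR (scale-invariant signal-to-distortion ratio) projects the estimate onto the reference and compares the energy of that projection with the energy of the residual. The sketch below is a minimal NumPy illustration of the standard definition, assuming 1-D float arrays; it is not the PaddleSpeech SISDRLoss, which operates on Paddle tensors and may expose extra options. The names si_sdr and si_sdr_loss are hypothetical.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB (higher is better). Illustrative sketch only."""
    # Remove the DC offset so a constant bias does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Optimal scale that projects the estimate onto the reference.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference  # part of the estimate explained by the reference
    noise = estimate - target  # everything else counts as distortion
    ratio = (np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)
    return float(10 * np.log10(ratio))


def si_sdr_loss(estimate: np.ndarray, reference: np.ndarray) -> float:
    """Negated SI-SDR, so minimizing the loss maximizes SI-SDR."""
    return -si_sdr(estimate, reference)
```

Because the reference is rescaled before the comparison, multiplying the estimate by a constant leaves the score essentially unchanged, so a loss built on it does not penalize overall level differences.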
Version Adaptation
Upgrade PaddleSpeech from Paddle 2.5.1 to Paddle 3.0.0-beta. This release resolves incompatibilities introduced by the new Paddle version, adapts and regression-tests the models in PaddleSpeech, and ensures the suite works normally with no loss of model functionality or accuracy.
- Adapt 80+ existing models in the demo and example directories.
- Adapt 10+ core models in the example directory and align their accuracy.
- Support re-exporting 20+ dynamic-to-static models with the PIR + predictor approach, and verify that inference succeeds.
More Details
New Features
- Add AudioTools toolkit #3900 (@DrRyanHuang)
- Add FFT convolution layer implementation #3947 (@DrRyanHuang)
- Implement loss functions required for DAC training #3988 (@cchenhaifeng)
- Add support for quantifiers and unit symbols #3837 (@undefined-ux)
- Add multiple PIR models #3956, #3982 (@zxcd)
- Add chunk configuration for tal_cs #3936 (@zxcd)
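FFT convolution computes a linear convolution by multiplying spectra instead of sliding the kernel over the signal. Below is a minimal NumPy sketch of the technique, not the API of the layer added in #3947; the function name fft_convolve is hypothetical. Zero-padding both inputs to at least len(x) + len(h) - 1 samples is what turns the FFT's circular convolution into the ordinary linear one.

```python
import numpy as np


def fft_convolve(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Full linear convolution of two 1-D signals via the FFT (sketch)."""
    n = len(x) + len(h) - 1  # length of the full linear convolution
    n_fft = 1 << (n - 1).bit_length()  # next power of two >= n
    # rfft zero-pads each input to n_fft samples, avoiding circular wrap-around.
    X = np.fft.rfft(x, n_fft)
    H = np.fft.rfft(h, n_fft)
    # Pointwise product in frequency equals convolution in time.
    y = np.fft.irfft(X * H, n_fft)
    return y[:n]  # drop the zero-padded tail
```

For short kernels, direct convolution is usually just as fast; the FFT route typically pays off as the kernel grows, since its cost scales roughly as O(N log N) instead of O(N * M).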
Version Adaptation
- Enhance NumPy compatibility #3907 (@GreatV)
- Fix Whisper model support under Paddle 3.0 #3880 (@yinfan98)
- Remove dependency on paddlepaddle-gpu #3898 (@Liyulingyue)
- Support new inference interface #3927 (@zxcd)
- Modify inference to be compatible with Paddle 3.0 #3963 (@megemini)
- Fix inference error in the cls static model #3856 (@zxcd)
- Remove parser.add_argument #3878 (@Liyulingyue)
- Add strtobool implementation #3877 (@Liyulingyue)
- Fix view to shape for wav2vec2 #3904 (@Liyulingyue)
- Fix 0D tensor to 1D issue #3913 (@megemini)
- Fix type promotion issues #3817, #3883, #3944, #3943 (@megemini, @GreatV)
- Fix shape error in layer normalization #3884 (@Liyulingyue)
- Resolve scipy import error #3874 (@GreatV)
- Fix VITS type promotion and 0D tensor issues #3920 (@Liyulingyue)
- Fix FastSpeech2 0D tensor issue #3951 (@megemini)
- Fix embedding initialization #3962 (@megemini)
- Replace view with reshape #3887, #3939 (@GreatV, @megemini)
- Fix max between int and value #3903 (@megemini)
- Fix duplicated argument #3934 (@megemini)
- Fix asr5 test.sh script path error #3941 (@megemini)
- Fix vctk spk_emb dimension issue #3916 (@megemini)
- Fix type promotion for aishell3/vctk vc0/ernie #3928 (@Liyulingyue)
- Use numpy for transpose #3933 (@megemini)
- Fix shape issues in opencpop svs1 #3912 (@enkilee)
- Fix deepspeech2online export issue #3935 (@Liyulingyue)
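distutils was deprecated by PEP 632 and removed in Python 3.12, taking distutils.util.strtobool with it, which is a common reason projects add their own copy. The sketch below reproduces the documented behaviour of the old distutils function; it is an assumption that the helper added in #3877 behaves the same way.

```python
def strtobool(val: str) -> int:
    """Convert a string representation of truth to 1 or 0.

    Sketch mirroring the removed distutils.util.strtobool; raises
    ValueError for anything it does not recognize.
    """
    val = val.lower()
    if val in ("y", "yes", "t", "true", "on", "1"):
        return 1
    if val in ("n", "no", "f", "false", "off", "0"):
        return 0
    raise ValueError(f"invalid truth value {val!r}")
```

Because it takes a string and raises ValueError on bad input, a helper like this can serve as an argparse type= converter, for example for a hypothetical --use_gpu flag.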
Installation Adaptation
- Optimize Python version compatibility #3965, #3967, #3969, #3970, #3972 (@Liyulingyue)
- Add hints for installing with the -e option #3979 (@Liyulingyue)
- Move audiotools requirements to setup.py #3999 (@zxcd)
- Update install_openblas.sh #3876 (@GreatV)
- Update setup.py #3964, #3995 (@Liyulingyue)
- Adapt for librosa #3989 (@Liyulingyue)
- Lower installation requirements #3985 (@Liyulingyue)
- Remove paddleaudio from PaddleSpeech #3986 (@zxcd)
- Define PythonDetermine in setup.py #3975 (@Liyulingyue)
Hardware Support
- Add GCU Backend support #3875 (@wanx7130)
- SpeedySpeech code adaptation for NPU #3804 (@warrentdrew)
- SpeedySpeech code adaptation for MLU #3828 (@warrentdrew)
Docs
- Add Squeezeformer information to README #3860 (@zxcd)
- Add README documentation for TIMIT/ASR1 #3930 (@enkilee)
- Fix multiple examples and demos #3830, #3872 (@zxcd, @Liyulingyue)
- Fix tess readme #3882 (@megemini)
- Update README.md #3890 (@Liyulingyue)
- Fix Example/tiny documentation errors #3892 (@Liyulingyue)
- Update tal_cs readme #3911 (@megemini)
- Fix librispeech asr readme #3917 (@megemini)
- Fix CSMSC voc1 readme.md #3915 (@enkilee)
- Fix s2t example errors #3950 (@megemini)
- Fix ted_en_zh st1 example #3955 (@GreatV)
- Text frontend intended links #3958 (@guspan-tanadi)
- Update Tiny README.md #3896 (@Liyulingyue)
- Fix acs demo #3826 (@zxcd)
- Fix g2p run.sh #3886 (@megemini)
- Fix asr4 test_wav redundant arguments #3940 (@megemini)
- Add synthesize_e2e.sh for csmsc/voc1, fix run.sh #3945 (@enkilee)
- Add synthesize_e2e.sh for csmsc/voc5, fix run.sh #3959 (@enkilee)
- Fix CSMSC Voc5/Jets/TTS2 #3906 (@Liyulingyue)
- Update utility script paths #3942 (@GreatV)
- Remove non-existent folders and add existing folders #3881 (@Liyulingyue)
- Fix file name #3895 (@zxcd)
- Fix typos #3980, #3981, #3984 (@co63oc)
- Fix missing ' #3869 (@Liyulingyue)
- Fix csmsc/voc3 script #3960 (@enkilee)
Bug Fix
- Fix streaming TTS server issues #3865 (@SuiYunsy)
- Fix matplotlib version incompatibility #3841 (@zxcd)
- Fix pydantic dependency issues #3715 (@Netrvin)
- Fix audiotools file path #3968 (@zxcd)
- Add missing keywords for aishell3/vits-vc #3932 (@yinfan98)
- Fix data traversal error caused by empty folders without *.npy files #3948 (@megemini)
- Fix package dependency issues in opencpop svs1 #3889 (@enkilee)
- Separate paddle.logsumexp #3897 (@zxcd)
- Fix audiotools model save and load #3994 (@zxcd)
- Fix TimeDomainSpecAugment import error #3919 (@megemini)
- Fix print_arguments import error #3918 (@megemini)
- Fix PANNs predict.py for the PIR JSON model path #3914 (@megemini)
- Complete missing parameters in synthesis series scripts #3998 (@enkilee)
- Fix tests/unit/tts/test_pwg.py #3974 (@co63oc)
CI
- Add server CI #3857 (@tianshuo78520a)
- Add unit tests #3835, #3836 (@zxcd, @tianshuo78520a)
- Close test_expand.py #3971 (@co63oc)
- Close test_snapshot.py #3976 (@co63oc)
Acknowledgements
Special thanks to contributors including @wanx7130, @warrentdrew, @DrRyanHuang, @cchenhaifeng, @undefined-ux, @zxcd, @GreatV, @yinfan98, @Liyulingyue, @megemini, @SuiYunsy, @Netrvin, @enkilee, @tianshuo78520a, @guspan-tanadi, @co63oc and others for their support.
New Contributors
- @wanx7130 made their first contribution in #3875
- @cchenhaifeng made their first contribution in #3988
- @undefined-ux made their first contribution in #3837
- @DrRyanHuang made their first contribution in #3900
- @SuiYunsy made their first contribution in #3865
- @Netrvin made their first contribution in #3715
- @guspan-tanadi made their first contribution in #3958
- @enkilee made their first contribution in #3889
- @co63oc made their first contribution in #3971