From bb8784b3f91b8fe091b32cb6490c7ca935a9773e Mon Sep 17 00:00:00 2001 From: luayo-cv <1367355728@qq.com> Date: Fri, 8 Sep 2023 03:11:28 +0000 Subject: [PATCH 1/2] Unified README and Added Links --- applications/Audio2Caption/README.md | 16 +++++++---- applications/Audio2Img/README.md | 14 ++++----- applications/AudioChat/README.md | 10 +++---- applications/MusicGeneration/README.md | 39 +++++++++++++++++--------- applications/README.md | 6 ++++ 5 files changed, 55 insertions(+), 30 deletions(-) diff --git a/applications/Audio2Caption/README.md b/applications/Audio2Caption/README.md index 959ce8d20ddb49..ac67fba978987a 100644 --- a/applications/Audio2Caption/README.md +++ b/applications/Audio2Caption/README.md @@ -1,16 +1,18 @@ -# Audio2Caption +### 音频描述(Audio-to-Caption Generation) -## 1. 应用简介 + + +#### 1. Application introduction Enter audio and prompt words for question and answer. ***** - No training is need. -- Integration with the moedel of 🤗 [whisper](), [chatglm](). +- Integration with the models of [whisper](), [chatglm](). ---- -## 2. Demo +#### 2. Demo ***** example: @@ -37,7 +39,11 @@ print(result) ``` -| 输入音频 | 输入prompt | 输出识别 | 输出结果 | +
+ +| Input Audio | Input Prompt | Output ASR | Output Text | | --- | --- | --- | --- | |[zh.wav](https://github.com/luyao-cv/file_download/blob/main/assets/zh.wav) | "描述这段话." |"我认为跑步最重要的就是给我带来了身体健康" |这段话表达了作者认为跑步最重要的好处之一是身体健康。作者认为,通过跑步,身体得到了良好的锻炼,身体健康得到了改善。作者还强调了跑步对身体健康的重要性,并认为这是最值得投资的运动之一。 | +
+ diff --git a/applications/Audio2Img/README.md b/applications/Audio2Img/README.md index 359df2c3243790..0e8550917f9484 100644 --- a/applications/Audio2Img/README.md +++ b/applications/Audio2Img/README.md @@ -1,6 +1,6 @@ -# Audio To Image +### 音频生成图像(Audio-to-Image Generation) -## 1. Application introduction +#### 1. Application introduction ***** @@ -35,7 +35,7 @@ Generate image from audio(w/ prompt or image) with [ImageBind](https://facebookr - [v0.0]: Support fusing audio, text(prompt) and imnage in ImageBind latent space. -## 2. Run +#### 2. Run ***** example: Use audio generate image across modalities (e.g. Image, Text and Audio) with the model of ImageBind and StableUnCLIPImg2ImgPipeline. @@ -50,10 +50,10 @@ python audio2img_imagebind.py \ ``` ---- -## 3. Visualization +#### 3. Visualization ---- -### Audio to Image +#### Audio to Image #### 3.1.1 Instruction ```python @@ -70,7 +70,7 @@ python audio2img_imagebind.py \ |[bird_audio.wav](https://github.com/luyao-cv/file_download/blob/main/assets/bird_audio.wav)| ![audio2img_output_bird](https://github.com/luyao-cv/file_download/blob/main/vis_audio2img/audio2img_output_bird.jpg) | -### Audio+Text to Image +#### Audio+Text to Image #### 3.2.1 Instruction ```python cd applications/Audio2Img @@ -87,7 +87,7 @@ python audio2img_imagebind.py \ |[bird_audio.wav](https://github.com/luyao-cv/file_download/blob/main/assets/bird_audio.wav) | 'A photo.' | ![audio_text_to_img_output_bird_a_photo](https://github.com/luyao-cv/file_download/blob/main/vis_audio2img/audio_text_to_img_output_bird_a_photo.jpg) -### Audio+Image to Image +#### Audio+Image to Image #### 3.3.1 Instruction ```python cd applications/Audio2Img diff --git a/applications/AudioChat/README.md b/applications/AudioChat/README.md index 966c02081979f3..d22e89e9d40ec3 100644 --- a/applications/AudioChat/README.md +++ b/applications/AudioChat/README.md @@ -1,16 +1,16 @@ -# Audio Chat +### 音频对话(Audio-to-Chat Generation) -## 1. 应用简介 +#### 1. 
Application introduction Enter audio and prompt words for question and answer. ***** - No training is need. -- Integration with the moedel of 🤗 [whisper](), [chatglm](). [fastspeech2](). +- Integration with the models of [whisper](), [chatglm](), [fastspeech2](). ---- -## 2. Demo +#### 2. Demo ***** example: @@ -31,6 +31,6 @@ result = task(audio=audio_file, prompt=prompt, output=output_path) ``` -| 输入音频 | 输入prompt | 输出文本 | 输出结果 | +| Input Audio | Input Prompt | Output Text | Output Audio | | --- | --- | --- | --- | |[zh.wav](https://github.com/luyao-cv/file_download/blob/main/assets/zh.wav) | "描述这段话." |"这段话表达了作者认为跑步最重要的好处之一是身体健康。作者认为,通过跑步,身体得到了良好的锻炼,身体健康得到了改善。作者还强调了跑步对身体健康的重要性,并认为这是最值得投资的运动之一。" |[audiochat-result.wav](https://github.com/luyao-cv/file_download/blob/main/assets/zh.wav)| diff --git a/applications/MusicGeneration/README.md b/applications/MusicGeneration/README.md index 8efb740aa0453d..5cfdeaf42b8d1a 100644 --- a/applications/MusicGeneration/README.md +++ b/applications/MusicGeneration/README.md @@ -1,16 +1,16 @@ -# Music Generation +### 音乐生成(Music Generation) -## 1. 应用简介 +#### 1. Application introduction Enter audio and prompt words for question and answer. ***** - No training is need. -- Integration with the moedel of 🤗 [minigpt4](), [minigpt4](), [chatglm](). +- Integration with the models of [minigpt4](), [chatglm](), [audioldm](). ---- -## 2. Demo +#### 2. Demo ***** example: @@ -25,28 +25,41 @@ paddle.seed(1024) # Text to music task = Appflow(app="music_generation", models=["cvssp/audioldm"]) prompt = "A classic cocktail lounge vibe with smooth jazz piano and a cool, relaxed atmosphere." 
-negative_prompt = "low quality, average quality" +negative_prompt = 'low quality, average quality, muffled quality, noise interference, poor and low-grade quality, inaudible quality, low-fidelity quality' +audio_length_in_s = 5 num_inference_steps = 20 -audio_length_in_s = 10 output_path = "tmp.wav" result = task(prompt=prompt, negative_prompt=negative_prompt, num_inference_steps=num_inference_steps, audio_length_in_s=audio_length_in_s, generator = paddle.Generator().manual_seed(120))['result'] scipy.io.wavfile.write(output_path, rate=16000, data=result) # image to music task1 = Appflow(app="music_generation", models=["miniGPT4/MiniGPT4-7B"]) -negative_prompt = "low quality, average quality" +negative_prompt = 'low quality, average quality, muffled quality, noise interference, poor and low-grade quality, inaudible quality, low-fidelity quality' +audio_length_in_s = 5 num_inference_steps = 20 -audio_length_in_s = 10 output_path = "tmp.wav" minigpt4_text = 'describe the image, ' -image_pil = Image.open("tmp.jpg").convert("RGB") -result = task1(image=image_pil, minigpt4_text=minigpt4_text, )['result'].split('#')[0] +image_pil = Image.open("dance.png").convert("RGB") +result = task1(image=image_pil, minigpt4_text=minigpt4_text )['result'].split('#')[0] paddle.device.cuda.empty_cache() -# miniGPT4 output: The image shows a pineapple cocktail sitting on a table in front of a person. The pineapple is cut in half and the drink is poured into the top half. The person is holding a straw in their hand and appears to be sipping the drink. There are also some other items on the table, such as a plate with food and a glass of water. The background is a marble table with a pattern on it. -prompt = "Given the scene description in the following paragraph, please create a musical style sentence that fits the scene.Description:{}.".format(result) +# miniGPT4 output: The image shows a crowded nightclub with people dancing on the dance floor. 
The lights on the dance floor are green and red, and there are several people on the dance floor. The stage is at the back of the room, and there are several people on stage. The walls of the nightclub are decorated with neon lights and there are several people sitting at tables in the background. The atmosphere is lively and energetic. + +prompt = "Given the scene description in the following paragraph, please create a musical style sentence that fits the scene. Description:{}.".format(result) task2 = Appflow(app="music_generation", models=["THUDM/chatglm-6b", "cvssp/audioldm"]) result = task2(prompt=prompt, negative_prompt=negative_prompt, num_inference_steps=num_inference_steps, audio_length_in_s=audio_length_in_s, generator = paddle.Generator().manual_seed(120))['result'] scipy.io.wavfile.write(output_path, rate=16000, data=result) -# chatglm ouptput: The music swells as the image shows the pineapple cocktail on the table, with the drink cut in half and the person sipping it with a straw. The background is a marble table with a pattern, and the other items on the table are a plate with food and a glass of water. The music fades until it disappears, leaving the scene in the person's hand the pineapple drink, with the music once again swelling in the background. +# chatglm output: The music is playing, and the crowd is dancing like never before. The lights are bright and the atmosphere is electric, with people swaying to the rhythm of the music and the energy of the night. The dance floor is a sea of movement, with people moving to the music and feeling the rhythm of their feet. The stage is a place of magic, with people on it, performing their best. The neon lights of the nightclub are a testament to the energy and excitement of the night, with people's faces lit up as they perform. And as the music continues to play, the crowd continues to dance, never letting up, until the night is over. 
``` + +#### Text to music +| Input Prompt | Output Music | +| --- | --- | +|'A classic cocktail lounge vibe with smooth jazz piano and a cool, relaxed atmosphere.'| [jazz_output.wav](https://github.com/luyao-cv/file_download/blob/main/assets/jazz_output.wav) + +--- + +#### image to music +| Input Image | Output Caption | Output Text | Output Music | +| --- | --- | --- | --- | +|![dance.png](https://github.com/luyao-cv/file_download/blob/main/vis_music_generation/dance.png) | 'The image shows a crowded nightclub with people dancing on the dance floor. The lights on the dance floor are green and red, and there are several people on the dance floor. The stage is at the back of the room, and there are several people on stage. The walls of the nightclub are decorated with neon lights and there are several people sitting at tables in the background. The atmosphere is lively and energetic.' | 'The music is playing, and the crowd is dancing like never before. The lights are bright and the atmosphere is electric, with people swaying to the rhythm of the music and the energy of the night. The dance floor is a sea of movement, with people moving to the music and feeling the rhythm of their feet. The stage is a place of magic, with people on it, performing their best. The neon lights of the nightclub are a testament to the energy and excitement of the night, with people's faces lit up as they perform. And as the music continues to play, the crowd continues to dance, never letting up, until the night is over.' 
| [dance_output.wav](https://github.com/luyao-cv/file_download/blob/main/assets/dance_output.wav) \ No newline at end of file diff --git a/applications/README.md b/applications/README.md index a083e877fd3883..2ecf66e9a6c538 100644 --- a/applications/README.md +++ b/applications/README.md @@ -54,6 +54,12 @@ result = task(prompt=prompt)['result'] | [文本引导的图像变换(Image-to-Image Text-Guided Generation)](./image2image/README.md/#文本引导的图像变换image-to-image-text-guided-generation) | `stable-diffusion-v1-5` | [fastdeploy](../ppdiffusers/deploy/README.md/#文本引导的图像变换image-to-image-text-guided-generation) | | [文本图像双引导图像生成(Dual Text and Image Guided Generation)](./image2image/README.md/#文本图像双引导图像生成dual-text-and-image-guided-generation) | `versatile-diffusion` | ❌ | | [文本条件的视频生成(Text-to-Video Generation)](./text2video/README.md/#文本条件的视频生成text-to-video-generation) | `text-to-video-ms-1.7b` | ❌ | +| [音频生成图像(Audio-to-Image Generation)](./Audio2Img/README.md/#audio-to-image) | `imagebind stable-diffusion-2-1-unclip` | | +| [音频描述(Audio-to-Caption Generation)](./Audio2Caption/README.md/#音频描述audio-to-caption-generation) | `chatglm-6b whisper` | | +| [音频对话(Audio-to-Chat Generation)](./AudioChat/README.md/#音频对话audio-to-chat-generation) | `chatglm-6b whisper fastspeech2` | | +| [音乐生成(Music Generation)](./MusicGeneration/README.md/#音乐生成music-generation) | `chatglm-6b minigpt4 audioldm` | | + + 更多应用持续开发中...... 
From e43f52e2b60f1369dca8e9122c5ca29a6e4ec7e7 Mon Sep 17 00:00:00 2001 From: luayo-cv <1367355728@qq.com> Date: Fri, 8 Sep 2023 03:11:28 +0000 Subject: [PATCH 2/2] Unified README and Added Links --- applications/Audio2Caption/README.md | 16 +++++++---- applications/Audio2Img/README.md | 14 ++++----- applications/AudioChat/README.md | 12 ++++---- applications/MusicGeneration/README.md | 39 +++++++++++++++++--------- applications/README.md | 6 ++++ 5 files changed, 56 insertions(+), 31 deletions(-) diff --git a/applications/Audio2Caption/README.md b/applications/Audio2Caption/README.md index 959ce8d20ddb49..ac67fba978987a 100644 --- a/applications/Audio2Caption/README.md +++ b/applications/Audio2Caption/README.md @@ -1,16 +1,18 @@ -# Audio2Caption +### 音频描述(Audio-to-Caption Generation) -## 1. 应用简介 + + +#### 1. Application introduction Enter audio and prompt words for question and answer. ***** - No training is need. -- Integration with the moedel of 🤗 [whisper](), [chatglm](). +- Integration with the models of [whisper](), [chatglm](). ---- -## 2. Demo +#### 2. Demo ***** example: @@ -37,7 +39,11 @@ print(result) ``` -| 输入音频 | 输入prompt | 输出识别 | 输出结果 | +
+ +| Input Audio | Input Prompt | Output ASR | Output Text | | --- | --- | --- | --- | |[zh.wav](https://github.com/luyao-cv/file_download/blob/main/assets/zh.wav) | "描述这段话." |"我认为跑步最重要的就是给我带来了身体健康" |这段话表达了作者认为跑步最重要的好处之一是身体健康。作者认为,通过跑步,身体得到了良好的锻炼,身体健康得到了改善。作者还强调了跑步对身体健康的重要性,并认为这是最值得投资的运动之一。 | +
+ diff --git a/applications/Audio2Img/README.md b/applications/Audio2Img/README.md index 359df2c3243790..0e8550917f9484 100644 --- a/applications/Audio2Img/README.md +++ b/applications/Audio2Img/README.md @@ -1,6 +1,6 @@ -# Audio To Image +### 音频生成图像(Audio-to-Image Generation) -## 1. Application introduction +#### 1. Application introduction ***** @@ -35,7 +35,7 @@ Generate image from audio(w/ prompt or image) with [ImageBind](https://facebookr - [v0.0]: Support fusing audio, text(prompt) and imnage in ImageBind latent space. -## 2. Run +#### 2. Run ***** example: Use audio generate image across modalities (e.g. Image, Text and Audio) with the model of ImageBind and StableUnCLIPImg2ImgPipeline. @@ -50,10 +50,10 @@ python audio2img_imagebind.py \ ``` ---- -## 3. Visualization +#### 3. Visualization ---- -### Audio to Image +#### Audio to Image #### 3.1.1 Instruction ```python @@ -70,7 +70,7 @@ python audio2img_imagebind.py \ |[bird_audio.wav](https://github.com/luyao-cv/file_download/blob/main/assets/bird_audio.wav)| ![audio2img_output_bird](https://github.com/luyao-cv/file_download/blob/main/vis_audio2img/audio2img_output_bird.jpg) | -### Audio+Text to Image +#### Audio+Text to Image #### 3.2.1 Instruction ```python cd applications/Audio2Img @@ -87,7 +87,7 @@ python audio2img_imagebind.py \ |[bird_audio.wav](https://github.com/luyao-cv/file_download/blob/main/assets/bird_audio.wav) | 'A photo.' | ![audio_text_to_img_output_bird_a_photo](https://github.com/luyao-cv/file_download/blob/main/vis_audio2img/audio_text_to_img_output_bird_a_photo.jpg) -### Audio+Image to Image +#### Audio+Image to Image #### 3.3.1 Instruction ```python cd applications/Audio2Img diff --git a/applications/AudioChat/README.md b/applications/AudioChat/README.md index 966c02081979f3..50d143953211b6 100644 --- a/applications/AudioChat/README.md +++ b/applications/AudioChat/README.md @@ -1,16 +1,16 @@ -# Audio Chat +### 音频对话(Audio-to-Chat Generation) -## 1. 应用简介 +#### 1. 
Application introduction Enter audio and prompt words for question and answer. ***** - No training is need. -- Integration with the moedel of 🤗 [whisper](), [chatglm](). [fastspeech2](). +- Integration with the models of [whisper](), [chatglm](), [fastspeech2](). ---- -## 2. Demo +#### 2. Demo ***** example: @@ -31,6 +31,6 @@ result = task(audio=audio_file, prompt=prompt, output=output_path) ``` -| 输入音频 | 输入prompt | 输出文本 | 输出结果 | +| Input Audio | Input Prompt | Output Text | Output Audio | | --- | --- | --- | --- | -|[zh.wav](https://github.com/luyao-cv/file_download/blob/main/assets/zh.wav) | "描述这段话." |"这段话表达了作者认为跑步最重要的好处之一是身体健康。作者认为,通过跑步,身体得到了良好的锻炼,身体健康得到了改善。作者还强调了跑步对身体健康的重要性,并认为这是最值得投资的运动之一。" |[audiochat-result.wav](https://github.com/luyao-cv/file_download/blob/main/assets/zh.wav)| +|[zh.wav](https://github.com/luyao-cv/file_download/blob/main/assets/zh.wav) | "描述这段话." |"这段话表达了作者认为跑步最重要的好处之一是身体健康。作者认为,通过跑步,身体得到了良好的锻炼,身体健康得到了改善。作者还强调了跑步对身体健康的重要性,并认为这是最值得投资的运动之一。" |[audiochat-result.wav](https://github.com/luyao-cv/file_download/blob/main/assets/audiochat-result.wav)| diff --git a/applications/MusicGeneration/README.md b/applications/MusicGeneration/README.md index 8efb740aa0453d..5cfdeaf42b8d1a 100644 --- a/applications/MusicGeneration/README.md +++ b/applications/MusicGeneration/README.md @@ -1,16 +1,16 @@ -# Music Generation +### 音乐生成(Music Generation) -## 1. 应用简介 +#### 1. Application introduction Enter audio and prompt words for question and answer. ***** - No training is need. -- Integration with the moedel of 🤗 [minigpt4](), [minigpt4](), [chatglm](). +- Integration with the models of [minigpt4](), [chatglm](), [audioldm](). ---- -## 2. Demo +#### 2. Demo ***** example: @@ -25,28 +25,41 @@ paddle.seed(1024) # Text to music task = Appflow(app="music_generation", models=["cvssp/audioldm"]) prompt = "A classic cocktail lounge vibe with smooth jazz piano and a cool, relaxed atmosphere." 
-negative_prompt = "low quality, average quality" +negative_prompt = 'low quality, average quality, muffled quality, noise interference, poor and low-grade quality, inaudible quality, low-fidelity quality' +audio_length_in_s = 5 num_inference_steps = 20 -audio_length_in_s = 10 output_path = "tmp.wav" result = task(prompt=prompt, negative_prompt=negative_prompt, num_inference_steps=num_inference_steps, audio_length_in_s=audio_length_in_s, generator = paddle.Generator().manual_seed(120))['result'] scipy.io.wavfile.write(output_path, rate=16000, data=result) # image to music task1 = Appflow(app="music_generation", models=["miniGPT4/MiniGPT4-7B"]) -negative_prompt = "low quality, average quality" +negative_prompt = 'low quality, average quality, muffled quality, noise interference, poor and low-grade quality, inaudible quality, low-fidelity quality' +audio_length_in_s = 5 num_inference_steps = 20 -audio_length_in_s = 10 output_path = "tmp.wav" minigpt4_text = 'describe the image, ' -image_pil = Image.open("tmp.jpg").convert("RGB") -result = task1(image=image_pil, minigpt4_text=minigpt4_text, )['result'].split('#')[0] +image_pil = Image.open("dance.png").convert("RGB") +result = task1(image=image_pil, minigpt4_text=minigpt4_text )['result'].split('#')[0] paddle.device.cuda.empty_cache() -# miniGPT4 output: The image shows a pineapple cocktail sitting on a table in front of a person. The pineapple is cut in half and the drink is poured into the top half. The person is holding a straw in their hand and appears to be sipping the drink. There are also some other items on the table, such as a plate with food and a glass of water. The background is a marble table with a pattern on it. -prompt = "Given the scene description in the following paragraph, please create a musical style sentence that fits the scene.Description:{}.".format(result) +# miniGPT4 output: The image shows a crowded nightclub with people dancing on the dance floor. 
The lights on the dance floor are green and red, and there are several people on the dance floor. The stage is at the back of the room, and there are several people on stage. The walls of the nightclub are decorated with neon lights and there are several people sitting at tables in the background. The atmosphere is lively and energetic. + +prompt = "Given the scene description in the following paragraph, please create a musical style sentence that fits the scene. Description:{}.".format(result) task2 = Appflow(app="music_generation", models=["THUDM/chatglm-6b", "cvssp/audioldm"]) result = task2(prompt=prompt, negative_prompt=negative_prompt, num_inference_steps=num_inference_steps, audio_length_in_s=audio_length_in_s, generator = paddle.Generator().manual_seed(120))['result'] scipy.io.wavfile.write(output_path, rate=16000, data=result) -# chatglm ouptput: The music swells as the image shows the pineapple cocktail on the table, with the drink cut in half and the person sipping it with a straw. The background is a marble table with a pattern, and the other items on the table are a plate with food and a glass of water. The music fades until it disappears, leaving the scene in the person's hand the pineapple drink, with the music once again swelling in the background. +# chatglm output: The music is playing, and the crowd is dancing like never before. The lights are bright and the atmosphere is electric, with people swaying to the rhythm of the music and the energy of the night. The dance floor is a sea of movement, with people moving to the music and feeling the rhythm of their feet. The stage is a place of magic, with people on it, performing their best. The neon lights of the nightclub are a testament to the energy and excitement of the night, with people's faces lit up as they perform. And as the music continues to play, the crowd continues to dance, never letting up, until the night is over. 
``` + +#### Text to music +| Input Prompt | Output Music | +| --- | --- | +|'A classic cocktail lounge vibe with smooth jazz piano and a cool, relaxed atmosphere.'| [jazz_output.wav](https://github.com/luyao-cv/file_download/blob/main/assets/jazz_output.wav) + +--- + +#### image to music +| Input Image | Output Caption | Output Text | Output Music | +| --- | --- | --- | --- | +|![dance.png](https://github.com/luyao-cv/file_download/blob/main/vis_music_generation/dance.png) | 'The image shows a crowded nightclub with people dancing on the dance floor. The lights on the dance floor are green and red, and there are several people on the dance floor. The stage is at the back of the room, and there are several people on stage. The walls of the nightclub are decorated with neon lights and there are several people sitting at tables in the background. The atmosphere is lively and energetic.' | 'The music is playing, and the crowd is dancing like never before. The lights are bright and the atmosphere is electric, with people swaying to the rhythm of the music and the energy of the night. The dance floor is a sea of movement, with people moving to the music and feeling the rhythm of their feet. The stage is a place of magic, with people on it, performing their best. The neon lights of the nightclub are a testament to the energy and excitement of the night, with people's faces lit up as they perform. And as the music continues to play, the crowd continues to dance, never letting up, until the night is over.' 
| [dance_output.wav](https://github.com/luyao-cv/file_download/blob/main/assets/dance_output.wav) \ No newline at end of file diff --git a/applications/README.md b/applications/README.md index a083e877fd3883..2ecf66e9a6c538 100644 --- a/applications/README.md +++ b/applications/README.md @@ -54,6 +54,12 @@ result = task(prompt=prompt)['result'] | [文本引导的图像变换(Image-to-Image Text-Guided Generation)](./image2image/README.md/#文本引导的图像变换image-to-image-text-guided-generation) | `stable-diffusion-v1-5` | [fastdeploy](../ppdiffusers/deploy/README.md/#文本引导的图像变换image-to-image-text-guided-generation) | | [文本图像双引导图像生成(Dual Text and Image Guided Generation)](./image2image/README.md/#文本图像双引导图像生成dual-text-and-image-guided-generation) | `versatile-diffusion` | ❌ | | [文本条件的视频生成(Text-to-Video Generation)](./text2video/README.md/#文本条件的视频生成text-to-video-generation) | `text-to-video-ms-1.7b` | ❌ | +| [音频生成图像(Audio-to-Image Generation)](./Audio2Img/README.md/#audio-to-image) | `imagebind stable-diffusion-2-1-unclip` | | +| [音频描述(Audio-to-Caption Generation)](./Audio2Caption/README.md/#音频描述audio-to-caption-generation) | `chatglm-6b whisper` | | +| [音频对话(Audio-to-Chat Generation)](./AudioChat/README.md/#音频对话audio-to-chat-generation) | `chatglm-6b whisper fastspeech2` | | +| [音乐生成(Music Generation)](./MusicGeneration/README.md/#音乐生成music-generation) | `chatglm-6b minigpt4 audioldm` | | + + 更多应用持续开发中...... 
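
Review note on the anchors added to `applications/README.md`: the new rows link to fragments such as `#音频描述audio-to-caption-generation`, which only resolve if they match the slug the Markdown renderer derives from the renamed headings. The sketch below approximates that derivation so the links can be checked locally. `heading_anchor` is a hypothetical helper, and the rule it encodes (lowercase, drop punctuation, turn spaces into hyphens) is an assumption about GitHub-style rendering rather than a documented guarantee.

```python
import re


def heading_anchor(heading: str) -> str:
    """Approximate the fragment a GitHub-style renderer derives from a heading.

    Hypothetical helper: the rule below is an assumption, not the renderer's
    published algorithm.
    """
    text = heading.strip().lower()
    # Keep word characters (this includes CJK), hyphens and spaces; drop the rest.
    text = re.sub(r"[^\w\- ]", "", text)
    # Every remaining space becomes a hyphen.
    return text.replace(" ", "-")


if __name__ == "__main__":
    # Headings taken from the READMEs touched by this patch series.
    for heading in (
        "音频描述(Audio-to-Caption Generation)",
        "音频对话(Audio-to-Chat Generation)",
        "音乐生成(Music Generation)",
        "Audio to Image",
    ):
        print(f"{heading} -> #{heading_anchor(heading)}")
```

Under this assumption, the `#audio-to-image` fragment used for the Audio2Img row corresponds to the `Audio to Image` subsection heading rather than to the top-level title of that README.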