
[WIP] Add an Avatar Chatbot (Audio) example #523

Closed · wants to merge 39 commits

Conversation

@ctao456 (Collaborator) commented Aug 4, 2024

Description

Initiate Avatar Chatbot (Audio) example

Issues

opea-project/docs#59

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

Wav2Lip-GFPGAN

Appends an "animation" microservice to the AudioQnA example to create a new avatar chatbot example:
opea-project/GenAIComps#400

Tests

curl http://${host_ip}:3009/v1/avatarchatbot \
  -X POST \
  -d @sample_whoareyou.json \
  -H 'Content-Type: application/json'

If the megaservice is running properly, you should see the following output:

"/outputs/result.mp4"

ctao456 and others added 30 commits June 24, 2024 18:47
Signed-off-by: Chun Tao <[email protected]>
@ctao456 ctao456 requested a review from Spycsh as a code owner August 5, 2024 20:44
ctao456 and others added 2 commits August 5, 2024 20:00
@Spycsh (Member) commented Aug 6, 2024

Hi @ctao456, thanks for the contribution. The talking avatar with Wav2Lip-GFPGAN looks good. Before reviewing, I have a few questions. I notice that you use HPU to run Wav2Lip and report the latency as "10-50 seconds for AvatarAnimation on Gaudi". How long is the driven audio? Is that the latency of the first run or after a warm-up? Have you tried to optimize the related models (the Wav2Lip model, GFPGAN) on HPU? With optimization they could be faster by fully utilizing the static shape feature on Gaudi.

@ctao456 (Collaborator, Author) commented Aug 6, 2024


Hi @Spycsh, thank you for your comments.

  1. In the demo video, the driven audio was 22 seconds long, and the inference time was around 50 seconds using both the Wav2Lip-GAN and GFPGAN models (--inference_mode set to wav2lip+gfpgan). Switching the --inference_mode flag to wav2lip_only gives a significant speedup, with some tradeoff in face restoration quality.
  2. "10-50 seconds for AvatarAnimation on Gaudi" is the latency of the first run, without warm-up. But we can try including a warm-up to speed it up (a sketch of the idea follows this list).
  3. Thank you for your suggestion. The current efforts focus on building the micro- and megaservice architecture. We will gradually add more features for
    a. HPU optimization (eager mode with torch.compile vs. lazy mode, torch.jit, HPU graphs, BF16 & INT8 precision, etc.) to accelerate graph inference
    b. Distributed inference on multiple Gaudi cards, using DeepSpeed
    c. Support for more SoTA face animation models (SadTalker, LivePortrait, etc.)
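
A rough illustration of the warm-up idea from point 2 above. This is not the PR's actual code; animate() and the file names are hypothetical stand-ins for the animation microservice's inference call:

import time

def animate(audio_path):
    # Hypothetical placeholder for the Wav2Lip (+ GFPGAN) inference call.
    ...

# Warm-up run on a short dummy clip: the first call on Gaudi pays one-time
# graph compilation / caching costs that should not count toward latency.
animate("dummy_1s.wav")

# Timed run: this measures the steady-state AvatarAnimation latency.
start = time.time()
animate("driven_audio_22s.wav")
print(f"AvatarAnimation latency: {time.time() - start:.1f} s")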

@louie-tsai (Collaborator) left a comment:

looks good. minor feedback

Resolved review threads: AvatarChatbot/docker/Dockerfile (2), AvatarChatbot/docker/gaudi/README.md (4)
@ctao456 ctao456 marked this pull request as draft August 21, 2024 18:21
@louie-tsai louie-tsai closed this Aug 21, 2024
@louie-tsai (Collaborator) left a comment:

looks good. leave some comments

Resolved review thread: AvatarChatbot/docker/avatarchatbot.py
outputs=[video_output, video_time_text],
)

demo.queue().launch(server_name="0.0.0.0", server_port=65535) # demo port is 65535
Collaborator comment:

Good to make the server name and port variables at the beginning; it doesn't hurt to make them env variables.
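
A minimal sketch of what that could look like in the Gradio UI script; the env-var names UI_SERVER_NAME and UI_SERVER_PORT are illustrative, not part of the PR:

import os
import gradio as gr

# Defaults mirror the hard-coded values in the excerpt above.
server_name = os.getenv("UI_SERVER_NAME", "0.0.0.0")
server_port = int(os.getenv("UI_SERVER_PORT", "65535"))

with gr.Blocks() as demo:
    gr.Markdown("Avatar Chatbot demo placeholder")

demo.queue().launch(server_name=server_name, server_port=server_port)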

# Prepare 3 image paths
# HOME = os.getenv("HOME")
# HOME="/mnt/localdisk4"
HOME = "/home/demo/"
Collaborator comment:

We might need to make "HOME" configurable via an env variable or parameter.
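
One possible way to do that, as a sketch; AVATAR_HOME is an illustrative variable name, not something the PR defines:

import os

# Read an override from the environment, fall back to the user's $HOME,
# then to the current hard-coded default.
HOME = os.getenv("AVATAR_HOME", os.getenv("HOME", "/home/demo/"))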

asyncio.set_event_loop(loop)
audio_file = loop.run_until_complete(aiavatar_demo(audio))
count += 1
end_time = time.time()
Collaborator comment:

For the latency, we might need to support a statistics RESTful API in the end.
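
A minimal sketch of what such a statistics endpoint might look like, using FastAPI purely as an illustration; the route path and response fields are assumptions, not the project's actual API:

from fastapi import FastAPI

app = FastAPI()

# Per-request latencies in seconds; in the real service this would be filled
# in around the animation call (e.g. from the end_time - start_time above).
latencies: list[float] = []

@app.get("/v1/statistics")
def get_statistics():
    if not latencies:
        return {"count": 0, "avg_s": None, "max_s": None}
    return {
        "count": len(latencies),
        "avg_s": sum(latencies) / len(latencies),
        "max_s": max(latencies),
    }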

outputs=[video_output, video_time_text],
)

demo.queue().launch(server_name="0.0.0.0", server_port=7861)
Collaborator comment:

Good to make the server name and port env variables.

# 2. Run inference.sh bash script to perform Wav2Lip+GFPGAN inference
# Output video is saved at the path 'OUTFILE'
command_wav2lip_gfpgan = "bash inference_vars.sh"
subprocess.run(command_wav2lip_gfpgan, shell=True)
Collaborator comment:

This might potentially raise an issue from the Intel IP scan.
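
If part of the concern is the shell=True invocation (something code scanners often flag), one possible alternative is to pass the command as an argument list; a minimal sketch, not the PR's code:

import subprocess

# Run the wrapper script directly, without going through a shell.
subprocess.run(["bash", "inference_vars.sh"], check=True)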

wangkl2 pushed a commit to wangkl2/GenAIExamples that referenced this pull request Dec 11, 2024