Enable bf16 quantization of models, and fix resetting args.audio in the microservice between runs (opea-project#832)

* Update audioqna gateway to print text, in gateway.py
  Signed-off-by: Chun Tao <[email protected]>
* updates needed for demo
  Signed-off-by: Chun Tao <[email protected]>
* original pr content
  Signed-off-by: Chun Tao <[email protected]>
* Revert "updates needed for demo"
  This reverts commit f0c7a026305ace410610c9dba771699e13dde8ea.
  Signed-off-by: Chun Tao <[email protected]>
* remove improper images
  Signed-off-by: Chun Tao <[email protected]>
* Addressed some comments on previous pr
  Signed-off-by: Chun Tao <[email protected]>
* Add Dockerfile for cpu support
  Signed-off-by: Chun Tao <[email protected]>
* CODEOWNER: Update comp CODEOWNER (opea-project#757)
  Signed-off-by: Yeoh, Hoong Tee <[email protected]>
  Signed-off-by: Chun Tao <[email protected]>
* Add stable diffusion microservice (opea-project#729)
  * add stable diffusion microservice.
    Signed-off-by: Ye, Xinyu <[email protected]>
  * added test.
    Signed-off-by: Ye, Xinyu <[email protected]>
  * changed output to images bytes data
    Signed-off-by: Ye, Xinyu <[email protected]>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * unified inference and wrapper into one microservice.
    Signed-off-by: Ye, Xinyu <[email protected]>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * fix test.
    Signed-off-by: Ye, Xinyu <[email protected]>
  ---------
  Signed-off-by: Ye, Xinyu <[email protected]>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Signed-off-by: Chun Tao <[email protected]>
* Compatible with different platforms. (opea-project#766)
  * Compatible with different platforms.
    Signed-off-by: ZePan110 <[email protected]>
  * Fix issue.
    Signed-off-by: ZePan110 <[email protected]>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * Fix issue
    Signed-off-by: ZePan110 <[email protected]>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  ---------
  Signed-off-by: ZePan110 <[email protected]>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Signed-off-by: Chun Tao <[email protected]>
* Optimize path and link validity check. (opea-project#745)
  Signed-off-by: ZePan110 <[email protected]>
  Signed-off-by: Chun Tao <[email protected]>
* Add timeout for ut test (opea-project#773)
  Signed-off-by: chensuyue <[email protected]>
  Signed-off-by: Chun Tao <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  Signed-off-by: Chun Tao <[email protected]>
* test hyperlink
  Signed-off-by: Chun Tao <[email protected]>
* test hyperlink
  Signed-off-by: Chun Tao <[email protected]>
* test hyperlink issue
  Signed-off-by: Chun Tao <[email protected]>
* test hyperlink issue
  Signed-off-by: Chun Tao <[email protected]>
* put back hyperlinks in readme
  Signed-off-by: Chun Tao <[email protected]>
* remove possible error hyperlink
  Signed-off-by: Chun Tao <[email protected]>
* put hyperlink back
  Signed-off-by: Chun Tao <[email protected]>
* major update to use FastAPI for wav2lip, and structure component format
  Signed-off-by: Chun Tao <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Add dockerfiles in animation-compose-cd.yaml
  Signed-off-by: Chun Tao <[email protected]>
* Fix end of file issue in animation-compose-cd.yaml
  Signed-off-by: Chun Tao <[email protected]>
* Fix Docker deployment on Xeon
  Signed-off-by: Chun Tao <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add versioning for all pip packages
  Signed-off-by: Chun Tao <[email protected]>
* e2e test script for animation
  Signed-off-by: Chun Tao <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* update e2e test script
  Signed-off-by: Chun Tao <[email protected]>
* update e2e test script
  Signed-off-by: Chun Tao <[email protected]>
* update readme
  Signed-off-by: Chun Tao <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* update
  Signed-off-by: Chun Tao <[email protected]>
* update
  Signed-off-by: Chun Tao <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* update gateway
  Signed-off-by: Chun Tao <[email protected]>
* udpate gateway
  Signed-off-by: Chun Tao <[email protected]>
* Fix AVATAR_CHATBOT
  Signed-off-by: Chun Tao <[email protected]>
* update gateway
  Signed-off-by: Chun Tao <[email protected]>
* update
  Signed-off-by: Chun Tao <[email protected]>
* test
  Signed-off-by: Chun Tao <[email protected]>
* update
  Signed-off-by: Chun Tao <[email protected]>
* update
  Signed-off-by: Chun Tao <[email protected]>
* update gateway
  Signed-off-by: Chun Tao <[email protected]>
* fix max_tokens in AvatarChatbot gateway
  Signed-off-by: Chun Tao <[email protected]>
* test
  Signed-off-by: Chun Tao <[email protected]>
* update
  Signed-off-by: Chun Tao <[email protected]>
* update
  Signed-off-by: Chun Tao <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Model download script moved from Dockerfiles to Docker entrypoint
  Signed-off-by: Chun Tao <[email protected]>
* update paths
  Signed-off-by: Chun Tao <[email protected]>
* Correct paths in readme
  Signed-off-by: Chun Tao <[email protected]>
* revert changes to audioqna gateway
  Signed-off-by: Chun Tao <[email protected]>
* longer wait time after docker run
  Signed-off-by: Chun Tao <[email protected]>
* add mount volume in test scripts
  Signed-off-by: Chun Tao <[email protected]>
* add volume mount in test scripts
  Signed-off-by: Chun Tao <[email protected]>
* udpate test script
  Signed-off-by: Chun Tao <[email protected]>
* udpate optimizations
  Signed-off-by: Chun Tao <[email protected]>
* update
  Signed-off-by: Chun Tao <[email protected]>
* update
  Signed-off-by: Chun Tao <[email protected]>
* update
  Signed-off-by: Chun Tao <[email protected]>
* need outputs folder
  Signed-off-by: Chun Tao <[email protected]>
* test
  Signed-off-by: Chun Tao <[email protected]>

---------

Signed-off-by: Chun Tao <[email protected]>
Signed-off-by: Yeoh, Hoong Tee <[email protected]>
Signed-off-by: Ye, Xinyu <[email protected]>
Signed-off-by: ZePan110 <[email protected]>
Signed-off-by: chensuyue <[email protected]>
Co-authored-by: Hoong Tee, Yeoh <[email protected]>
Co-authored-by: XinyuYe-Intel <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ZePan110 <[email protected]>
Co-authored-by: chen, suyue <[email protected]>
wangkl2 · Oct 30, 2024 · 00abba2
1 parent 9fec226
Showing 7 changed files with 49 additions and 40 deletions.
diff --git a/comps/animation/wav2lip/README.md b/comps/animation/wav2lip/README.md
@@ -84,7 +84,7 @@ docker run --privileged -d --name "wav2lip-service" -p 7860:7860 --ipc=host -w /
 - Gaudi2 HPU
 
 ```bash
-docker run --privileged -d --name "wav2lip-gaudi-service" -p 7860:7860 --runtime=habana --cap-add=sys_nice --net=host --ipc=host -w /home/user/comps/animation/wav2lip -v $(pwd)/comps/animation/wav2lip/assets:/home/user/comps/animation/wav2lip/assets -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e PYTHON=/usr/bin/python3.10 -e DEVICE=$DEVICE -e INFERENCE_MODE=$INFERENCE_MODE -e CHECKPOINT_PATH=$CHECKPOINT_PATH -e FACE=$FACE -e AUDIO=$AUDIO -e FACESIZE=$FACESIZE -e OUTFILE=$OUTFILE -e GFPGAN_MODEL_VERSION=$GFPGAN_MODEL_VERSION -e UPSCALE_FACTOR=$UPSCALE_FACTOR -e FPS=$FPS -e WAV2LIP_PORT=$WAV2LIP_PORT opea/wav2lip-gaudi:latest
+docker run --privileged -d --name "wav2lip-gaudi-service" -p 7860:7860 --runtime=habana --cap-add=sys_nice --ipc=host -w /home/user/comps/animation/wav2lip -v $(pwd)/comps/animation/wav2lip/assets:/home/user/comps/animation/wav2lip/assets -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e PYTHON=/usr/bin/python3.10 -e DEVICE=$DEVICE -e INFERENCE_MODE=$INFERENCE_MODE -e CHECKPOINT_PATH=$CHECKPOINT_PATH -e FACE=$FACE -e AUDIO=$AUDIO -e FACESIZE=$FACESIZE -e OUTFILE=$OUTFILE -e GFPGAN_MODEL_VERSION=$GFPGAN_MODEL_VERSION -e UPSCALE_FACTOR=$UPSCALE_FACTOR -e FPS=$FPS -e WAV2LIP_PORT=$WAV2LIP_PORT opea/wav2lip-gaudi:latest
 ```
 
 ## 2.2 Run Animation Microservice
@@ -119,7 +119,7 @@ cd GenAIComps
 python3 comps/animation/wav2lip/dependency/check_animation_server.py
 ```
 
-The expected output is a message similar to the following:
+The expected output will be a message similar to the following:
 
 ```bash
 {'wav2lip_result': '....../GenAIComps/comps/animation/wav2lip/assets/outputs/result.mp4'}

diff --git a/comps/animation/wav2lip/assets/audio/sample_whoareyou.json b/comps/animation/wav2lip/assets/audio/sample_whoareyou.json
diff --git a/comps/animation/wav2lip/assets/outputs/result.mp4 b/comps/animation/wav2lip/assets/outputs/result.mp4
diff --git a/comps/animation/wav2lip/assets/outputs/results.mp4 b/comps/animation/wav2lip/assets/outputs/results.mp4
diff --git a/comps/animation/wav2lip/dependency/check_wav2lip_server.py b/comps/animation/wav2lip/dependency/check_wav2lip_server.py
@@ -10,9 +10,9 @@
 outfile = os.environ.get("OUTFILE")
 
 # Read the JSON file
-with open("comps/animation/wav2lip/assets/audio/sample_question.json", "r") as file:
+with open("comps/animation/wav2lip/assets/audio/sample_whoareyou.json", "r") as file:
     data = json.load(file)
 
-inputs = {"audio": data["byte_str"]}
+inputs = {"audio": data["byte_str"], "max_tokens": 64}
 response = requests.post(url=endpoint, data=json.dumps(inputs), proxies={"http": None})
 print(response.json())
diff --git a/comps/animation/wav2lip/dependency/utils.py b/comps/animation/wav2lip/dependency/utils.py
@@ -177,8 +177,10 @@ def face_detect(args, images):
     while 1:
         predictions = []
         try:
-            for i in tqdm(range(0, len(images), batch_size)):
-                predictions.extend(detector.get_detections_for_batch(np.array(images[i : i + batch_size])))
+            with torch.no_grad():
+                for i in tqdm(range(0, len(images), batch_size)):
+                    with torch.autocast(device_type=args.device, dtype=torch.bfloat16):
+                        predictions.extend(detector.get_detections_for_batch(np.array(images[i : i + batch_size])))
         except RuntimeError:
             if batch_size == 1:
                 raise RuntimeError(
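
Review note: the `face_detect` hunk above sits inside a `while 1:` / `except RuntimeError:` loop that gives up once `batch_size == 1`. The following is a minimal sketch of that retry pattern, assuming the part of the handler not shown in this hunk halves the batch size before retrying (the diff does not confirm this). `run_in_batches` and `process_batch` are hypothetical names, and plain Python stands in for the detector so the example runs without PyTorch.

```python
def run_in_batches(items, process_batch, batch_size):
    """Process items in batches; on RuntimeError, halve batch_size and start over.

    Returns (results, batch_size_that_worked). Re-raises once batch_size
    has already reached 1, mirroring the guard visible in the hunk.
    """
    while True:
        results = []
        try:
            for i in range(0, len(items), batch_size):
                results.extend(process_batch(items[i : i + batch_size]))
        except RuntimeError:
            if batch_size == 1:
                raise RuntimeError("Processing failed even with batch_size=1")
            batch_size //= 2  # assumption: the real code shrinks the batch here
            continue  # discard partial results and retry from the first item
        return results, batch_size


def fake_detector(batch):
    """Stand-in for a detector that runs out of memory above 2 items."""
    if len(batch) > 2:
        raise RuntimeError("out of memory")
    return [x * 2 for x in batch]


results, used_batch_size = run_in_batches(list(range(10)), fake_detector, 8)
print(results, used_batch_size)
```

Because the loop restarts from the first item after each failure, the work done before the failing batch is repeated. That is acceptable when failures are rare and the batch size settles quickly.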

diff --git a/comps/animation/wav2lip/dependency/wav2lip_server.py b/comps/animation/wav2lip/dependency/wav2lip_server.py
@@ -84,6 +84,7 @@ async def animate(request: Request):
             ffmpeg.input(args.audio).output("temp/temp.wav", strict="-2").run(overwrite_output=True)
             args.audio = "temp/temp.wav"
     else:
+        print(f"Signature for your audio: {audio_b64_str[:100]}")
         sr, y = base64_to_int16_to_wav(audio_b64_str, "temp/temp.wav")
         args.audio = "temp/temp.wav"
 
@@ -113,40 +114,41 @@ async def animate(request: Request):
     gen = datagen(args, full_frames.copy(), mel_chunks)
 
     # iterate over the generator
-    for i, (img_batch, mel_batch, frames, coords) in enumerate(
-        tqdm(gen, total=int(np.ceil(float(len(mel_chunks)) / batch_size)))
-    ):
-        if i == 0:
-            frame_h, frame_w = full_frames[0].shape[:-1]
-            if args.inference_mode == "wav2lip_only":
-                out = cv2.VideoWriter("temp/result.avi", cv2.VideoWriter_fourcc(*"DIVX"), fps, (frame_w, frame_h))
-            else:
-                out = cv2.VideoWriter(
-                    "temp/result.avi",
-                    cv2.VideoWriter_fourcc(*"DIVX"),
-                    fps,
-                    (frame_w * args.upscale, frame_h * args.upscale),
-                )
-
-        img_batch = torch.FloatTensor(np.transpose(img_batch, (0, 3, 1, 2))).to(device)
-        mel_batch = torch.FloatTensor(np.transpose(mel_batch, (0, 3, 1, 2))).to(device)
-
-        with torch.no_grad():
-            pred = model(mel_batch, img_batch)
-
-        pred = pred.cpu().numpy().transpose(0, 2, 3, 1) * 255.0
-
-        for p, f, c in tqdm(zip(pred, frames, coords), total=pred.shape[0]):
-            y1, y2, x1, x2 = c
-            p = cv2.resize(p.astype(np.uint8), (x2 - x1, y2 - y1))
-            f[y1:y2, x1:x2] = p  # patching
-
-            # restore faces and background if necessary
-            if args.inference_mode == "wav2lip+gfpgan":
-                cropped_faces, restored_faces, f = model_restorer.enhance(
-                    f, has_aligned=args.aligned, only_center_face=args.only_center_face, paste_back=True
-                )
-            out.write(f)
+    with torch.no_grad():
+        for i, (img_batch, mel_batch, frames, coords) in enumerate(
+            tqdm(gen, total=int(np.ceil(float(len(mel_chunks)) / batch_size)))
+        ):
+            if i == 0:
+                frame_h, frame_w = full_frames[0].shape[:-1]
+                if args.inference_mode == "wav2lip_only":
+                    out = cv2.VideoWriter("temp/result.avi", cv2.VideoWriter_fourcc(*"DIVX"), fps, (frame_w, frame_h))
+                else:
+                    out = cv2.VideoWriter(
+                        "temp/result.avi",
+                        cv2.VideoWriter_fourcc(*"DIVX"),
+                        fps,
+                        (frame_w * args.upscale, frame_h * args.upscale),
+                    )
+
+            img_batch = torch.FloatTensor(np.transpose(img_batch, (0, 3, 1, 2))).to(device)
+            mel_batch = torch.FloatTensor(np.transpose(mel_batch, (0, 3, 1, 2))).to(device)
+
+            with torch.autocast(device_type=args.device, dtype=torch.bfloat16):
+                pred = model(mel_batch, img_batch)
+
+            pred = pred.cpu().to(torch.float32).numpy().transpose(0, 2, 3, 1) * 255.0
+
+            for p, f, c in tqdm(zip(pred, frames, coords), total=pred.shape[0]):
+                y1, y2, x1, x2 = c
+                p = cv2.resize(p.astype(np.uint8), (x2 - x1, y2 - y1))
+                f[y1:y2, x1:x2] = p  # patching
+
+                # restore faces and background if necessary
+                if args.inference_mode == "wav2lip+gfpgan":
+                    cropped_faces, restored_faces, f = model_restorer.enhance(
+                        f, has_aligned=args.aligned, only_center_face=args.only_center_face, paste_back=True
+                    )
+                out.write(f)
     out.release()
 
     ffmpeg.output(
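
Review note: the hunk above runs the model under `torch.autocast(..., dtype=torch.bfloat16)` and then casts `pred` to `torch.float32` before calling `.numpy()`, presumably because NumPy has no native bfloat16 dtype. As a rough illustration of how much precision bfloat16 keeps (1 sign bit, 8 exponent bits, 7 mantissa bits), the sketch below emulates bf16 rounding in pure Python. `to_bf16` is a hypothetical helper written for this note, not part of the commit, and it handles finite inputs only.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even).

    bfloat16 is the upper 16 bits of an IEEE-754 float32, so the value is
    first narrowed to float32 and then the lower 16 bits are rounded away.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add a bias so that truncating the low 16 bits rounds to nearest even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Just above 1.0 the spacing between bf16 values is 2**-7, so 1.001 rounds to 1.0.
print(to_bf16(1.001))
# A value that is already a multiple of the spacing is preserved exactly.
print(to_bf16(1.0 + 2**-7))
```

For pixel data scaled to 0..255 and cast to `uint8`, a relative error of at most 2**-8 is usually small, but it is not guaranteed to be invisible, so comparing outputs with and without autocast on a sample clip is a reasonable check.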
@@ -160,6 +162,8 @@ async def animate(request: Request):
         acodec="aac",
     ).run(overwrite_output=True)
 
+    args.audio = "None"  # IMPORTANT: Reset audio to None for the next audio request
+
     return {"wav2lip_result": args.outfile}
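
Review note: the added `args.audio = "None"` line matters because `args` appears to be shared across requests, so a path written during one request is still there when the next one arrives. A minimal sketch of that failure mode follows, assuming the branch condition not shown in the diff is `args.audio != "None"`. `handle_request` is a hypothetical stand-in for `animate`, and `SimpleNamespace` stands in for the parsed arguments.

```python
from types import SimpleNamespace


def handle_request(args, audio_b64_str, reset_after=True):
    """Return which audio source a request would use, mutating shared args."""
    if args.audio != "None":
        # A path is already set, so the audio in the request body is ignored.
        source = ("file", args.audio)
    else:
        # No path set: decode the request audio and remember the temp file.
        source = ("request", audio_b64_str)
        args.audio = "temp/temp.wav"
    if reset_after:
        args.audio = "None"  # the fix: clear the shared state for the next request
    return source


# Without the reset, the second request silently reuses the first request's file.
buggy_args = SimpleNamespace(audio="None")
buggy_first = handle_request(buggy_args, "AAAA", reset_after=False)
buggy_second = handle_request(buggy_args, "BBBB", reset_after=False)

# With the reset, each request uses its own audio.
fixed_args = SimpleNamespace(audio="None")
fixed_first = handle_request(fixed_args, "AAAA")
fixed_second = handle_request(fixed_args, "BBBB")

print(buggy_second, fixed_second)
```

An alternative that avoids the reset entirely would be to keep the per-request audio path in a local variable instead of writing it back to `args`. That would also be safer if requests ever overlap, since the handler is `async`.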