
Update faster_live_portrait_pipeline.py #87

Open
wants to merge 7 commits into
base: master
from

Conversation

YacratesWyh

Rethinking the tracking: a real-time camera feed needs to re-initialize its landmark points.
The error-handling code is still missing. You get an error if the cropped face is lost, so run.py needs another catch that restarts tracking.
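
A minimal sketch of the kind of guard the calling loop could use, assuming the pipeline signals a lost face by returning None values (the code in this PR returns None, None, None in that case). The names process_stream and run_frame are hypothetical and this is not the actual run.py.

```python
def process_stream(frames, run_frame):
    """Yield pipeline outputs, skipping frames where no face was found.

    `run_frame` is a hypothetical callable standing in for the pipeline's
    per-frame entry point; it is assumed to return a tuple whose first
    element is None when the face is lost.
    """
    for frame in frames:
        result = run_frame(frame)
        if result is None or result[0] is None:
            # Face lost on this frame: skip it and let the next frame
            # trigger detection again instead of raising an error.
            continue
        yield result
```

Skipping the frame keeps the loop alive, so the next frame can go through full face detection again.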

@YacratesWyh
Author

This is just an initial proposal, to show the need for re-tracking.

@YacratesWyh
Author

also, src_lmk_pre should be driving_lmk_pre I guess.

@warmshao
Owner

also, src_lmk_pre should be driving_lmk_pre I guess.

Yes, driving_lmk_pre indeed fits the context better.

@warmshao
Owner

also, src_lmk_pre should be driving_lmk_pre I guess.

  1. Issue in the original code: at line 293 of the PR, face detection runs on every frame, i.e. dri_face = self.model_dict["face_analysis"].predict(img_bgr). This is probably unnecessary and slows the pipeline down.

  2. Why this is a problem: if face detection runs on every frame, dri_lmk_pre (the landmarks carried over from the previous frame) loses its purpose, because the landmarks are re-detected from scratch each time.

  3. Suggested improvement: check whether the confidence of the keypoints predicted by lmk = self.model_dict["landmark"].predict(img_rgb, self.src_lmk_pre) exceeds a threshold to decide whether lmk is valid. If it is valid, set self.dri_lmk_pre = lmk.copy(); otherwise set self.dri_lmk_pre = None.
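
The suggestion in point 3 can be sketched roughly as follows. This is only an illustration: the confidence value is a hypothetical input (the landmark model is not confirmed to expose one), and next_previous_landmarks and the 0.5 threshold are made-up names and numbers.

```python
import numpy as np


def next_previous_landmarks(lmk, confidence, threshold=0.5):
    """Decide what to carry over as the previous-frame landmarks.

    Returns a copy of `lmk` when the (hypothetical) confidence score
    passes the threshold, or None so that the next frame falls back to
    full face detection.
    """
    if lmk is None or confidence < threshold:
        return None
    return np.asarray(lmk).copy()
```

Returning None here maps onto the existing initialization branch, which runs face detection whenever the previous landmarks are missing.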

@YacratesWyh
Author

YacratesWyh commented Sep 12, 2024

I'm not familiar with TensorRT and ONNX. How do I get the confidence from the second-to-last output, or is it already wrapped? Or should I follow some transformation code? I also don't understand the landmark model's input: it accepts either a bounding box given as 2 points or the 203 landmark points. I guess it may only use the leftmost and rightmost points, or some higher-dimensional combination? Would it be faster with the 203-point input than with 2 points? I don't even know what the model is.
I could use a naive confidence measure, such as thresholding the ratio of the facial landmark bounding box to the image size, but I need to test whether it's effective.
Setting self.dri_lmk_pre = None is not acceptable: it results in a "no face detected" error. As I wrote at line 297, after initialization the code needs a valid return value to keep running. How to handle that is beyond my knowledge, so you may want to deal with this exception.
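
The naive ratio idea could look roughly like this. It is a sketch under assumptions: lmk is taken to be an (N, 2) array of (x, y) pixel coordinates, image_shape follows the (height, width, ...) convention, and bbox_area_ratio is a hypothetical helper name. No threshold is proposed, since that would need testing.

```python
import numpy as np


def bbox_area_ratio(lmk, image_shape):
    """Ratio of the landmark bounding-box area to the image area.

    Assumes `lmk` is an (N, 2) array of (x, y) pixel coordinates and
    `image_shape` starts with (height, width).
    """
    pts = np.asarray(lmk, dtype=np.float32)
    width = pts[:, 0].max() - pts[:, 0].min()
    height = pts[:, 1].max() - pts[:, 1].min()
    img_h, img_w = image_shape[:2]
    return float(width * height) / float(img_h * img_w)
```

For example, landmarks spanning 100 by 50 pixels in a 200 by 100 image give a ratio of 0.25.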

@YacratesWyh
Author

YacratesWyh commented Sep 12, 2024

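I think it works well now. I found that the predicted landmarks tend to shrink within the crop when tracking is lost, so I use the extent of the landmarks along one coordinate (max minus min of lmk[:, 0]) as a confidence proxy, requiring at least 32 pixels. It works well when the track is lost.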
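
In isolation, the extent check described here can be sketched as below. The 32-pixel default comes from this comment; landmark_extent and is_track_confident are hypothetical helper names, and the column index is assumed to select one coordinate of an (N, 2) landmark array.

```python
import numpy as np


def landmark_extent(lmk, column=0):
    """Spread (max minus min) of the landmarks along one coordinate column."""
    coords = np.asarray(lmk, dtype=np.float32)[:, column]
    return float(coords.max() - coords.min())


def is_track_confident(lmk, min_extent=32.0):
    """Treat the track as lost when the landmarks collapse below min_extent."""
    return landmark_extent(lmk) >= min_extent
```

The check is a heuristic proxy, not a real confidence score: a genuinely small or distant face also produces a small extent.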

@YacratesWyh
Author

YacratesWyh commented Sep 12, 2024

            if self.dri_lmk_pre is None:
                # Initialization: run full face detection on this frame
                dri_face = self.model_dict["face_analysis"].predict(img_bgr)
                if len(dri_face) == 0:
                    self.dri_lmk_pre = None
                    return None, None, None
                lmk = self.model_dict["landmark"].predict(img_rgb, dri_face[0])
                self.dri_lmk_pre = lmk.copy()
                self.dir_initial = lmk.copy()
            elif self.dri_reanalysis:
                # Track was flagged as lost: try to detect the face again
                dri_face = self.model_dict["face_analysis"].predict(img_bgr)
                if len(dri_face) == 0:
                    # No face found: temporarily fall back to the landmarks
                    # saved at the last (re)initialization
                    lmk = self.dir_initial
                else:
                    # Re-initialization from the newly detected face
                    self.dri_reanalysis = False
                    lmk = self.model_dict["landmark"].predict(img_rgb, dri_face[0])
                    self.dri_lmk_pre = lmk.copy()
                    self.dir_initial = lmk.copy()
            else:
                # Normal tracking: predict from the previous frame's landmarks
                lmk = self.model_dict["landmark"].predict(img_rgb, self.dri_lmk_pre)
                coords = lmk[:, 0]  # renamed from `slice`, which shadows the builtin
                diff = coords.max() - coords.min()
                if diff < 32:  # not confident: landmark extent under 32 pixels
                    self.dri_reanalysis = True
                self.dri_lmk_pre = lmk.copy()

I also wrote an alternative for choosing lmk when tracking is lost, but it takes time for the confidence check to trigger, so both options feel a bit laggy (though the behavior is correct).
The odd feeling comes from the moment tracking is lost: a blank image and the _pre landmarks are fed to the model, which produces an unexpected lmk output. It takes seconds for the landmarks to collapse toward a single point, so the drop is not instant. Meanwhile, the output image looks wrong because tracking is lost. It would twitch (and, in the code above, snap back to the initial landmarks seconds later), which doesn't give a better experience.
166.75574 96.4196 140.22104 90.890335 73.36743 59.77655 72.35101 67.771194 64.598816 56.977997 51.257187 48.209747 45.525314 44.39215 42.601593 40.521576 39.322495 38.322617 38.57997 39.62802 39.698807 40.31526 39.106003 38.948654 38.670624 37.28125 35.295547 34.74524 34.44597 32.799652 32.226685 30.967072
The numbers above are from a test of the moment tracking is lost. The value shivers briefly and then decays slowly, so it is hard to tell whether this is the user's own movement.
135.66116 128.98488 127.559784 121.487 119.745544 111.359344 104.241425 102.787155 97.18617 93.05463 91.474915 88.92752 88.31581 86.15164 85.344666 82.760635 80.00801 79.80339 78.69656 78.57855 77.37436 76.392426 75.97241 75.513596 75.67801 75.231 74.72847 74.665054 74.30463 72.5072 63.508667 63.7666 65.57309 70.1102 71.318726 73.05115 72.690125 72.65547 72.63841 72.21785 70.537125 68.45395 68.00752 65.88606 65.59711 63.513107 61.761673 61.5121 59.6194 59.43741 57.69641 56.94345 56.82074 56.818237 56.998993 57.794098 58.78827 58.98288 58.370728 58.35907 57.1734 57.152924 55.958862 55.04059 54.692657 53.52115 53.25531 51.885864 50.641907 50.421326 49.347656 49.20401 48.35974 47.82431 47.87918 48.12793 48.223236 48.2771 48.29712 48.457336 48.33194 47.912903 47.712463 47.26831 46.958282 46.356476 45.64737 45.364746 44.61136 44.428528 43.599243 43.3468 42.735046 42.533783 42.53006 42.537384 42.705017 42.802917 43.138306 43.13504 42.944153 42.69455 42.64084 42.16098 41.963623 41.52771 40.976013 40.625366 40.093018 39.74762 39.33731 38.850677 38.57599 38.13678 38.08902 38.121338 38.30612 38.251373 38.42508 38.489166 38.528442 38.41971 38.230377 38.013885 37.977844 37.819824 37.223816 37.18634 36.47342 36.399994 35.550934 35.252136 34.852325 34.507904 34.631348 34.24179 33.70529 33.57834 32.93216 32.86258 33.291687 33.52899 33.24179 33.803284 34.10315 34.020905 33.95999 33.960815 34.177338 33.32779 32.34781 26.4859 21.179749 20.723907 31.62561 33.687744 33.49356 33.54706 33.798218 33.621826 33.593903 33.64514 33.48105 32.871094 32.960144 32.346313 32.201263 31.774689 31.532043 31.081451 31.067383 30.966705 31.223572 31.319458 31.232666 30.902039 29.87967 29.353119 29.192505 28.663788 28.495178 28.54242 24.042816 19.81784 18.176117 16.411987 17.69168 26.501648 28.868439 28.178345 27.669617 27.564697 27.260498 27.154694 26.775787 26.860504 26.264893 26.15393 26.21939 25.375916 25.623138 25.191742 25.557251 25.298584 24.047668 23.563904 24.628998 25.271454 25.370544 
25.224823 25.522034 25.382599 25.676086 25.783325 25.861786 25.912415 25.82843 25.859344 25.99533 26.279785 26.828918 26.170532 26.548096 26.633545 24.813995 23.682404 23.338196 27.33606 27.357208 27.354523 25.017944 26.79834 26.8779 26.737732 27.085754 27.220581 26.399048 26.304077 27.88797 28.514832 29.21222 29.073486 29.802673 31.660645 32.85858 32.830322 33.243927 33.674713 33.996582 34.75699 34.987152 35.249237 35.9039 36.08899 34.690765 35.783203 35.030823 35.104492 35.812134 36.73282 36.725525 37.846222 38.672546 40.3154 40.866608 41.435547 42.002594 41.83734 41.463135 41.991608 42.091644 43.49292 44.890533 45.325867 47.218872 46.77173 46.834167 46.238007 46.180145 46.93094 47.10608 49.862488 52.522125 52.85852 54.07486 54.79184 56.85202 56.958984 58.304382 60.88846 61.763306 63.23871 64.99899 65.052155 67.15805 67.470184 70.20914 74.16815 74.957825 77.88562 77.5235 79.368744 80.74927 81.88089 86.11719 87.94092 93.74661 98.96896 100.21506 103.365295 103.58493 105.671326 107.51114
The numbers above are from another test, a true-tracking case in which the user is far from the camera and stays tracked the whole time.
Note that even at small sizes, around 32 pixels or even under 20, the output still looks normal.
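
To make the trigger lag concrete, here is a minimal sketch that finds the first frame at which a logged value falls below the threshold. It assumes the numbers in the first series above are the per-frame diff values; first_index_below is a hypothetical helper, not part of the PR.

```python
def first_index_below(values, threshold):
    """Return the index of the first value under threshold, or None."""
    for index, value in enumerate(values):
        if value < threshold:
            return index
    return None


# First series logged above (the moment tracking is lost).
lost_track_series = [
    166.75574, 96.4196, 140.22104, 90.890335, 73.36743, 59.77655,
    72.35101, 67.771194, 64.598816, 56.977997, 51.257187, 48.209747,
    45.525314, 44.39215, 42.601593, 40.521576, 39.322495, 38.322617,
    38.57997, 39.62802, 39.698807, 40.31526, 39.106003, 38.948654,
    38.670624, 37.28125, 35.295547, 34.74524, 34.44597, 32.799652,
    32.226685, 30.967072,
]

trigger_frame = first_index_below(lost_track_series, 32.0)
```

On this series the 32-pixel threshold is crossed only at the final logged sample, after 31 earlier frames, which matches the delay described above.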

inconfident when shrink 20 pixels, or less than 32 pixels face
@warmshao
Owner


I feel that it's too tricky and not elegant enough. I checked the face_analysis model's prediction for landmarks (lmk) and indeed it doesn't have confidence scores.

@YacratesWyh
Author

So what's the plan? At least for now, this can re-track the face.

2 participants