diff --git a/README.md b/README.md
index 4da2758..a304920 100644
--- a/README.md
+++ b/README.md
@@ -240,7 +240,7 @@ Paper by Folder : [📁/survey](https://github.com/OpenHuman-ai/awesome-gesture_
 - DiffuGesture: Generating Human Gesture From Two-person Dialogue With Diffusion Models [[paper]](https://openreview.net/pdf?id=swc28UDR8Wk)
 - The FineMotion entry to the GENEA Challenge 2023: DeepPhase for conversational gestures generation [[paper]](https://openreview.net/pdf?id=pVBKLqpAUtP)
 - Am I listening - Evaluating theQuality of Generated Data-driven Listening Motion [[paper]](https://pieterwolfert.com/files/icmi_2023.pdf)
-- Unified speech and gesture synthesis using flow matching [[paper]](https://arxiv.org/pdf/2310.05181.pdf) ; [[homepage]](https://shivammehta25.github.io/Match-TTSG/) ;
+- Unified speech and gesture synthesis using flow matching [[paper]](https://arxiv.org/pdf/2310.05181.pdf) ; [[homepage]](https://shivammehta25.github.io/Match-TTSG/) ;
@@ -380,11 +380,11 @@ Paper by Folder : [📁/survey](https://github.com/OpenHuman-ai/awesome-gesture_
 
-## 3. Selected Approachs
+## 3. Approaches
 
-### 3.1 Selected rule Base approach
+### 3.1 Rule-Based approach
 
 - [1994] Rule-based generation of facial expression, gesture & spoken intonation for multiple conversational agents [[paper]]()
@@ -458,10 +458,6 @@ This section is -- **not accurate** --> continue edditing
 - Freeform Body Motion Generation from Speech [[paper]](https://arxiv.org/pdf/2203.02291) ; [[TheTempAccount/Co-Speech-Motion-Generation]](https://github.com/TheTempAccount/Co-Speech-Motion-Generation) ; [[youtube]](https://www.youtube.com/watch?v=Wb5VYqKX_x0)
 - 【CVMP 2021】 **Flow-VAE** Speech-Driven Conversational Agents using Conditional Flow-VAEs [[paper]]()
-  - **VQ-VAE**
-    -
-    -
-
 - **Learnable noise codes**
   - 【ICCV 2021】 Speech Drives Templates: Co-Speech Gesture Synthesis With Learned Templates ; [[paper]](https://arxiv.org/pdf/2108.08020.pdf) ; [[ShenhanQian/SpeechDrivesTemplates]](https://github.com/ShenhanQian/SpeechDrivesTemplates) ;
@@ -494,7 +490,7 @@ This section is -- **not accurate** --> continue edditing
 - 【CVPR 2022】 Audio-Driven Neural Gesture Reenactment With Video Motion Graphs [[paper]]()
 - 【AAMAS 2022】 Multimodal analysis of the predictability of hand-gesture properties [[paper]]()
 - 【ICMI 2022】 **GestureMaster** GestureMaster: Graph-based Speech-driven Gesture Generation [[paper]]()
-  - 【ICCV 2021】 Speech Drives Templates: Co-Speech Gesture Synthesis With Learned Templates [[paper]]() ; [shenhanqian/speechdrivestemplates](https://github.com/shenhanqian/speechdrivestemplates) ; [[youtube]]() ; [poster](https://shenhanqian.com/assets/2021-07-25-sdt/poster.pdf)
+  - 【ICCV 2021】 **Audio2Gestures** Audio2Gestures: Generating Diverse Gestures From Speech Audio With Conditional Variational Autoencoders [[paper]]()
 - 【IVA 2021】 Speech2Properties2Gestures: Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech [[paper]]() ; [[homepage]]()
 - 【ECCV 2020】 **Mix-StAGE** Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach [[paper]]()
@@ -520,6 +516,10 @@ This section is -- **not accurate** --> continue edditing
 ## 5. Learning Objective
 
+- [**Fréchet Inception Distance (FID)**](https://arxiv.org/abs/1706.08500) - Fréchet distance between Inception-feature distributions of real and generated samples
+- [**Fréchet Gesture Distance (FGD)**](https://arxiv.org/abs/2009.02119) - FID adapted to gesture generation, computed on features from a gesture autoencoder
+- [**Fréchet Template Distance (FTD)**](https://arxiv.org/abs/2108.08020) - Fréchet distance computed on the learned template features of Speech Drives Templates
+
 | Full name | Description |
 | ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 | Adversarial Loss (**Adv**) | Used in Generative Adversarial Networks (GANs), this loss function pits a generator network against a discriminator network, with the goal of the generator producing samples that can fool the discriminator into thinking they are real. |
diff --git a/eval/FGD - Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity.pdf b/eval/FGD - Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity.pdf
new file mode 100644
index 0000000..d008663
Binary files /dev/null and b/eval/FGD - Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity.pdf differ
diff --git a/eval/FID - GANs Trained by a Two Time-Scale Update Rule.pdf b/eval/FID - GANs Trained by a Two Time-Scale Update Rule.pdf
new file mode 100644
index 0000000..7d197cd
Binary files /dev/null and b/eval/FID - GANs Trained by a Two Time-Scale Update Rule.pdf differ
diff --git a/eval/FTD - Speech Drives Templates - Co-Speech Gesture Synthesis with Learned Templates.pdf b/eval/FTD - Speech Drives Templates - Co-Speech Gesture Synthesis with Learned Templates.pdf
new file mode 100644
index 0000000..26fbbd9
Binary files /dev/null and b/eval/FTD - Speech Drives Templates - Co-Speech Gesture Synthesis with Learned Templates.pdf differ
diff --git a/papers/2020/Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity.pdf b/papers/2020/Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity.pdf
new file mode 100644
index 0000000..d008663
Binary files /dev/null and b/papers/2020/Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity.pdf differ
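The three metrics added to the Learning Objective section (FID, FGD, FTD) all compute the same quantity and differ only in the feature embedding: Inception features for FID, gesture-autoencoder features for FGD, learned template features for FTD. A minimal sketch of the shared Fréchet distance between two Gaussians fitted to those features, assuming NumPy/SciPy (the function name and stand-in feature arrays are illustrative, not from the papers):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet (2-Wasserstein) distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrtm(sigma1 @ sigma2))."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):  # sqrtm can return negligible imaginary parts
        covmean = covmean.real
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

# Stand-in for embedded real/generated samples, shape (n_samples, feature_dim);
# in practice these come from the metric's embedding network.
real = np.random.default_rng(0).normal(size=(500, 8))
fake = np.random.default_rng(1).normal(loc=0.5, size=(500, 8))
score = frechet_distance(real.mean(axis=0), np.cov(real, rowvar=False),
                         fake.mean(axis=0), np.cov(fake, rowvar=False))
```

Lower is better: identical feature distributions give a score of zero, and the score grows as the generated distribution drifts from the real one.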