
[ECCV 2024 - Oral] AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild

arXiv Project Page YouTube

AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild

Junho Park*, Kyeongbo Kong* and Suk-Ju Kang†

(* Equal contribution, † Corresponding author)

TL;DR

We propose AttentionHand, a novel method for text-driven controllable hand image generation. Our method requires four easy-to-use modalities (i.e., an RGB image, a hand mesh image rendered from a 3D label, a bounding box, and a text prompt). These modalities are embedded into the latent space in the encoding phase. Then, in the text attention stage, hand-related tokens from the given text prompt are attended to highlight hand-related regions of the latent embedding. The highlighted embedding is then fed to the visual attention stage, where hand-related regions are attended by conditioning on global and local hand mesh images within the diffusion-based pipeline. In the decoding phase, the final feature is decoded into new hand images that are well aligned with the given hand mesh image and text prompt.
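
For intuition, here is a minimal, self-contained sketch of this data flow using placeholder torch modules; the module names, shapes, and dummy tensors are illustrative only and are not the repository's actual architecture or API.

import torch
import torch.nn as nn

# Placeholder stand-ins for the four phases described above (NOT the real modules).
encoder = nn.Linear(256, 128)            # encoding phase: modalities -> latent embedding
text_attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
visual_attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
decoder = nn.Linear(128, 256)            # decoding phase: feature -> image space

latent = encoder(torch.randn(1, 64, 256))     # embedded modalities (dummy data)
text_tokens = torch.randn(1, 16, 128)         # hand-related text tokens (dummy data)
mesh_tokens = torch.randn(1, 32, 128)         # global/local hand mesh conditions (dummy data)

# Text attention stage: attend hand-related tokens to highlight hand regions.
highlighted, _ = text_attn(latent, text_tokens, text_tokens)
# Visual attention stage: condition the highlighted embedding on the mesh images.
conditioned, _ = visual_attn(highlighted, mesh_tokens, mesh_tokens)
# Decoding phase: decode the final feature back to image space.
hand_image_feature = decoder(conditioned)
print(hand_image_feature.shape)  # torch.Size([1, 64, 256])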


What's New

[2024/11/22] ⭐ We have released the training & inference code! Enjoy! 😄

[2024/08/12] 🚀 Our paper will be presented as an oral at ECCV 2024!

[2024/07/03] 🔥 Our paper has been accepted to ECCV 2024!

Install

pip install -r requirements.txt
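
After installing, an optional sanity check (not part of the official scripts) is to confirm that PyTorch and CUDA are visible:

import torch
print(torch.__version__)          # PyTorch version pulled in by requirements.txt
print(torch.cuda.is_available())  # True if a CUDA-capable GPU is usable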

Inference

  1. Download our pre-trained model attentionhand.ckpt from here. We will update the checkpoint as soon as possible. Alternatively, you can train from scratch on your own as described in the Train from scratch section below.
  2. Place your own modalities in samples. (We already provide some samples for a quick start.)
  3. Arrange the samples and the downloaded weight as follows (a snippet to verify this layout is given after the steps).
${ROOT}
|-- samples
|   |-- mesh
|   |   |-- ...
|   |-- text
|   |   |-- ...
|   |-- modalities.json
|-- weights
|   |-- attentionhand.ckpt
  4. Run inference.py.
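
Before running inference, you can optionally check that everything is in place with a short script (a minimal sketch based on the layout above; the files under mesh and text are your own samples):

from pathlib import Path

# Paths expected by the layout above.
required = [
    Path("samples/mesh"),
    Path("samples/text"),
    Path("samples/modalities.json"),
    Path("weights/attentionhand.ckpt"),
]
missing = [str(p) for p in required if not p.exists()]
print("missing:", missing or "none")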

Train from scratch

  1. Download the initial model sd15_ini.ckpt from here.
  2. Download the pre-processed dataset dataset.tar.gz from here.
  3. Arrange the downloaded weight and dataset as follows (a quick way to inspect the dataset is sketched after the steps).
${ROOT}
|-- data
|   |-- mesh
|   |   |-- ...
|   |-- rgb
|   |   |-- ...
|   |-- text
|   |   |-- ...
|   |-- modalities.json
|-- weights
|   |-- sd15_ini.ckpt
  4. Run train.py.
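
For a quick look at the downloaded dataset before training, the sketch below counts the files of each modality and the entries registered in modalities.json (the exact schema of the JSON file depends on the release, so only its length is printed):

import json
from pathlib import Path

data = Path("data")
for sub in ("mesh", "rgb", "text"):
    print(sub, "files:", sum(1 for _ in (data / sub).iterdir()))

# modalities.json pairs the per-sample modalities; schema details may vary.
with open(data / "modalities.json") as f:
    modalities = json.load(f)
print("modalities.json entries:", len(modalities))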

Fine-tuning

  1. Download our pre-trained model attentionhand.ckpt from here. We will update the checkpoint as soon as possible. Alternatively, you can train from scratch on your own as described in the Train from scratch section above.
  2. Place your own modalities in data, following the same structure as dataset.tar.gz from here.
  3. Arrange the downloaded weight and dataset as follows.
${ROOT}
|-- data
|   |-- mesh
|   |   |-- ...
|   |-- rgb
|   |   |-- ...
|   |-- text
|   |   |-- ...
|   |-- modalities.json
|-- weights
|   |-- attentionhand.ckpt
  4. Change resume_path in train.py to weights/attentionhand.ckpt (see the snippet below).
  5. Run train.py.
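
Concretely, step 4 amounts to pointing the checkpoint path in train.py at the downloaded weight, roughly like this (the exact surrounding code may differ):

# In train.py: resume from the pre-trained AttentionHand checkpoint
# instead of the Stable Diffusion initialization used for training from scratch.
resume_path = 'weights/attentionhand.ckpt'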

Related Repositories

Special thanks to the great projects ControlNet and Attend-and-Excite!

License and Citation

All assets and code are released under the license in this repository unless specified otherwise.

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{park2024attentionhand,
  author  = {Park, Junho and Kong, Kyeongbo and Kang, Suk-Ju},
  title   = {AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild},
  journal = {European Conference on Computer Vision},
  year    = {2024},
}
