
[ECCV 2024 - Oral] AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild

arXiv Project Page YouTube

AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild

Junho Park*, Kyeongbo Kong* and Suk-Ju Kang†

(* Equal contribution, † Corresponding author)

TL;DR

We propose AttentionHand, a novel method for text-driven controllable hand image generation. Our method requires four easy-to-use modalities (i.e., an RGB image, a hand mesh image rendered from a 3D label, a bounding box, and a text prompt). These modalities are embedded into the latent space in the encoding phase. Then, in the text attention stage, hand-related tokens from the given text prompt are attended to highlight hand-related regions of the latent embedding. The highlighted embedding is then fed to the visual attention stage, where hand-related regions are attended by conditioning on global and local hand mesh images within the diffusion-based pipeline. In the decoding phase, the final feature is decoded into new hand images that are well aligned with the given hand mesh image and text prompt.
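
For intuition, here is a minimal, self-contained sketch of this data flow using placeholder torch modules; the module names, shapes, and dummy tensors are illustrative only and are not the repository's actual architecture or API.

import torch
import torch.nn as nn

# Placeholder stand-ins for the four phases described above (NOT the real modules).
encoder = nn.Linear(256, 128)            # encoding phase: modalities -> latent embedding
text_attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
visual_attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
decoder = nn.Linear(128, 256)            # decoding phase: feature -> image space

latent = encoder(torch.randn(1, 64, 256))     # embedded modalities (dummy data)
text_tokens = torch.randn(1, 16, 128)         # hand-related text tokens (dummy data)
mesh_tokens = torch.randn(1, 32, 128)         # global/local hand mesh conditions (dummy data)

# Text attention stage: attend hand-related tokens to highlight hand regions.
highlighted, _ = text_attn(latent, text_tokens, text_tokens)
# Visual attention stage: condition the highlighted embedding on the mesh images.
conditioned, _ = visual_attn(highlighted, mesh_tokens, mesh_tokens)
# Decoding phase: decode the final feature back to image space.
hand_image_feature = decoder(conditioned)
print(hand_image_feature.shape)  # torch.Size([1, 64, 256])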


What's New

[2024/11/22] ⭐ We have released the training & inference code! Enjoy! 😄

[2024/08/12] 🚀 Our paper will be presented as an oral at ECCV 2024!

[2024/07/03] 🔥 Our paper has been accepted to ECCV 2024!

Install

pip install -r requirements.txt
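
After installing, an optional sanity check (not part of the official scripts) is to confirm that PyTorch and CUDA are visible:

import torch
print(torch.__version__)          # PyTorch version pulled in by requirements.txt
print(torch.cuda.is_available())  # True if a CUDA-capable GPU is usable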

Inference

  1. Download our pre-trained model attentionhand.ckpt from here. We will update the checkpoint as soon as possible. Alternatively, you can train from scratch on your own as described in the Train from scratch section below.
  2. Place your own modalities in samples. (We already provide some samples for a quick start.)
  3. Arrange the samples and the downloaded weight as follows (a snippet to verify this layout is given after the steps).
${ROOT}
|-- samples
|   |-- mesh
|   |   |-- ...
|   |-- text
|   |   |-- ...
|   |-- modalities.json
|-- weights
|   |-- attentionhand.ckpt
  4. Run inference.py.
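
Before running inference, you can optionally check that everything is in place with a short script (a minimal sketch based on the layout above; the files under mesh and text are your own samples):

from pathlib import Path

# Paths expected by the layout above.
required = [
    Path("samples/mesh"),
    Path("samples/text"),
    Path("samples/modalities.json"),
    Path("weights/attentionhand.ckpt"),
]
missing = [str(p) for p in required if not p.exists()]
print("missing:", missing or "none")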

Train from scratch

  1. Download the initial model sd15_ini.ckpt from here.
  2. Download the pre-processed dataset dataset.tar.gz from here.
  3. Arrange the downloaded weight and dataset as follows (a quick way to inspect the dataset is sketched after the steps).
${ROOT}
|-- data
|   |-- mesh
|   |   |-- ...
|   |-- rgb
|   |   |-- ...
|   |-- text
|   |   |-- ...
|   |-- modalities.json
|-- weights
|   |-- sd15_ini.ckpt
  4. Run train.py.
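
For a quick look at the downloaded dataset before training, the sketch below counts the files of each modality and the entries registered in modalities.json (the exact schema of the JSON file depends on the release, so only its length is printed):

import json
from pathlib import Path

data = Path("data")
for sub in ("mesh", "rgb", "text"):
    print(sub, "files:", sum(1 for _ in (data / sub).iterdir()))

# modalities.json pairs the per-sample modalities; schema details may vary.
with open(data / "modalities.json") as f:
    modalities = json.load(f)
print("modalities.json entries:", len(modalities))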

Fine-tuning

  1. Download our pre-trained model attentionhand.ckpt from here. We will update the checkpoint as soon as possible. Alternatively, you can train from scratch on your own as described in the Train from scratch section above.
  2. Place your own modalities in data, following the same structure as dataset.tar.gz from here.
  3. Arrange the downloaded weight and dataset as follows.
${ROOT}
|-- data
|   |-- mesh
|   |   |-- ...
|   |-- rgb
|   |   |-- ...
|   |-- text
|   |   |-- ...
|   |-- modalities.json
|-- weights
|   |-- attentionhand.ckpt
  4. Change resume_path in train.py to weights/attentionhand.ckpt (see the snippet below).
  5. Run train.py.
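
Concretely, step 4 amounts to pointing the checkpoint path in train.py at the downloaded weight, roughly like this (the exact surrounding code may differ):

# In train.py: resume from the pre-trained AttentionHand checkpoint
# instead of the Stable Diffusion initialization used for training from scratch.
resume_path = 'weights/attentionhand.ckpt'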

Related Repositories

Special thanks to the great projects ControlNet and Attend-and-Excite!

License and Citation

All assets and code are released under the license in this repository unless specified otherwise.

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{park2024attentionhand,
  author  = {Park, Junho and Kong, Kyeongbo and Kang, Suk-Ju},
  title   = {AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild},
  journal = {European Conference on Computer Vision},
  year    = {2024},
}
