Welcome to the Gen-AI-Powered-AR-App repository! This project explores various architectures for generating 3D images from 2D images and implements a text-to-image generation model. It also includes a Kotlin application that leverages the Meshy API to visualize 3D models in augmented reality using ARCore.
After extensive experimentation with different models and techniques, we arrived at a final architecture that is close to working as expected, with promising results obtained on the last day of development. These results are discussed in detail in the final section of this repository.
This repository contains several Jupyter notebooks showcasing the following:
- 3D Image Generation: Implementation of various architectures for converting 2D images into 3D images using the Pix3D dataset.
- Text-to-Image Generation: Techniques for generating images from text descriptions using the CUB200-2011 dataset.
- AR Visualization: A Kotlin application that utilizes the Meshy API to render 3D models in augmented reality based on user prompts.
- Pix3D Dataset: This dataset is used for training models to generate 3D images from 2D images (a minimal loading sketch follows this list).
- CUB200-2011 Dataset: A dataset for text-to-image generation, containing images of birds with corresponding textual descriptions.
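As a reference point for the Pix3D bullet above, the sketch below shows one way the 2D image / 3D voxel pairs could be loaded for training. It assumes the standard Pix3D release with a `pix3d.json` annotation file containing `img`, `voxel`, and `category` entries; the local dataset path, the key names, and the voxel `.mat` layout are assumptions that should be checked against your copy of the dataset.

```python
# Minimal sketch: pairing Pix3D 2D images with their ground-truth voxel grids.
# Assumes the standard Pix3D layout (pix3d.json with "img", "voxel", "category");
# adjust paths and key names to the actual release.
import json
from pathlib import Path

import numpy as np
from PIL import Image
from scipy.io import loadmat

PIX3D_ROOT = Path("data/pix3d")  # assumed local path to the extracted dataset

def load_pairs(category="chair"):
    """Yield (RGB image array, binary voxel grid) pairs for one category."""
    with open(PIX3D_ROOT / "pix3d.json") as f:
        annotations = json.load(f)
    for ann in annotations:
        if ann["category"] != category:
            continue
        img = np.asarray(Image.open(PIX3D_ROOT / ann["img"]).convert("RGB"))
        # Each entry points to a .mat file holding the voxelized ground-truth shape.
        voxels = loadmat(str(PIX3D_ROOT / ann["voxel"]))["voxel"].astype(np.float32)
        yield img, voxels

# Example: inspect the first image/voxel pair of one category.
img, vox = next(load_pairs("chair"))
print(img.shape, vox.shape)
```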
The following notebooks are included in this repository:
- `3d-Pix3pix.ipynb`: Implementation and training of 3D-Pix2Pix with a U-Net Generator and a Patch Discriminator.
- `Pix3Pix.ipynb`: Implementation and training of the final, working version of 3D-Pix2Pix, along with its results.
- `image2vox-model.ipynb`: Implementation and training of Pix2Vox.
- `Pix2Vox-Pretrained-A.ipynb`: Inference with the pretrained Pix2Vox-A version.
- `pretrained-pix2vox-F.ipynb`: Inference with the pretrained Pix2Vox-F version.
- `dcgan-cls.ipynb`: Implementation of a Text-to-Image cGAN, leveraging text descriptions as conditioning inputs to generate corresponding images (see the conditioning sketch after this list).
- `dcgan-cls_one_Cat.ipynb`: Implementation of a Text-to-Image cGAN on a single category of images, due to limited resources.
- `Notebooks/PIFuHD/`: Exploring PIFuHD from Meta Research for High-Resolution 3D Human Digitization.
- `mesh-reconstruction-pytorch3d.ipynb`: Exploring PyTorch3D from Meta Research.
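To illustrate the conditioning idea behind the `dcgan-cls` notebooks, here is a minimal PyTorch sketch in the spirit of GAN-CLS (Reed et al., ICML 2016): the caption embedding is projected and concatenated with the noise vector in the generator, and spatially replicated and fused with the image features in the discriminator. The layer sizes, embedding dimensions, and 64x64 resolution are illustrative assumptions, not the exact configuration used in the notebooks.

```python
# Sketch of text-conditioned DCGAN (GAN-CLS style); dimensions are assumptions.
import torch
import torch.nn as nn

Z_DIM, TEXT_DIM, PROJ_DIM = 100, 1024, 128  # assumed noise / caption / projection sizes

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        # Compress the sentence embedding, then concatenate it with the noise vector.
        self.project_text = nn.Sequential(nn.Linear(TEXT_DIM, PROJ_DIM), nn.LeakyReLU(0.2))
        self.net = nn.Sequential(
            nn.ConvTranspose2d(Z_DIM + PROJ_DIM, 512, 4, 1, 0), nn.BatchNorm2d(512), nn.ReLU(True),
            nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),  # 64x64 RGB output
        )

    def forward(self, z, text_embedding):
        cond = self.project_text(text_embedding)
        x = torch.cat([z, cond], dim=1).unsqueeze(-1).unsqueeze(-1)  # (B, Z+PROJ, 1, 1)
        return self.net(x)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.project_text = nn.Sequential(nn.Linear(TEXT_DIM, PROJ_DIM), nn.LeakyReLU(0.2))
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
            nn.Conv2d(256, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.LeakyReLU(0.2),
        )  # 64x64 input -> (B, 512, 4, 4) feature map
        self.classify = nn.Conv2d(512 + PROJ_DIM, 1, 4, 1, 0)

    def forward(self, image, text_embedding):
        feats = self.features(image)
        # Replicate the projected text embedding spatially and fuse it with image features.
        cond = self.project_text(text_embedding)[:, :, None, None]
        cond = cond.expand(-1, -1, feats.size(2), feats.size(3))
        return self.classify(torch.cat([feats, cond], dim=1)).view(-1)

# Smoke test with random tensors standing in for CUB images and caption embeddings.
z = torch.randn(2, Z_DIM)
txt = torch.randn(2, TEXT_DIM)
fake = Generator()(z, txt)          # (2, 3, 64, 64)
score = Discriminator()(fake, txt)  # (2,)
print(fake.shape, score.shape)
```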
The Kotlin app provides an interface for users to input prompts, which are processed to visualize a 3D model using the Meshy API and ARCore. This application enhances user interaction by allowing them to see generated 3D models in an augmented reality environment.
- User-friendly interface for inputting prompts.
- Real-time visualization of 3D models in AR.
- Open the Kotlin project in your preferred IDE.
- Ensure the Meshy API is correctly set up and configured (a minimal request sketch in Python follows these steps).
- Run the application and follow the instructions to visualize 3D models in AR.
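The prompt-to-model flow the app drives through the Meshy API can also be exercised outside the Kotlin app with a short Python script: create a text-to-3D task from a prompt, poll its status, then read the URL of the generated asset. The endpoint path, request fields, and response keys below are assumptions; verify them against the current Meshy API documentation and supply your own API key.

```python
# Hypothetical sketch of the Meshy text-to-3D flow: submit a task, poll, fetch the model URL.
# Endpoint, payload fields, and response shape are assumptions -- check the Meshy API docs.
import time
import requests

MESHY_API_KEY = "YOUR_MESHY_API_KEY"             # placeholder
BASE_URL = "https://api.meshy.ai/v2/text-to-3d"  # assumed endpoint
HEADERS = {"Authorization": f"Bearer {MESHY_API_KEY}"}

def generate_model(prompt: str) -> str:
    """Submit a prompt and return a downloadable 3D model URL once the task completes."""
    task = requests.post(BASE_URL, headers=HEADERS, json={"prompt": prompt}).json()
    task_id = task["result"]                      # assumed response field
    while True:
        status = requests.get(f"{BASE_URL}/{task_id}", headers=HEADERS).json()
        if status["status"] == "SUCCEEDED":       # assumed status values
            return status["model_urls"]["glb"]    # assumed field holding the GLB asset
        if status["status"] == "FAILED":
            raise RuntimeError("Meshy task failed")
        time.sleep(10)

print(generate_model("a wooden chair"))
```

The Kotlin app performs the same kind of request flow with its own HTTP client before handing the downloaded model to ARCore for rendering.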
To get started with the project, follow these steps:
- Clone the repository:
  `git clone https://github.com/Seif-Yasser-Ahmed/Gen-AI-Powered-AR-App.git`
- Navigate to the project directory:
  `cd Gen-AI-Powered-AR-App`
- Install the required Python packages:
  `pip install -r requirements.txt`
- Install Android Studio.
We would like to express our gratitude to the following repositories and their contributors for their valuable resources:
- PIFuHD by Meta Research for providing the framework and models used in this project.
- ICML 2016 Text-to-Image Generation for inspiring methodologies in text-to-image generation.
- Pix2Vox for its implementation and pretrained models, which contributed to our 3D image generation efforts.
This project is licensed under the MIT License. See the LICENSE file for more details.