An AI-Powered Augmented Reality App project that aims to develop a mobile application that uses generative AI models that allow users to generate unique 3D models simply by describing them in text.

Seif-Yasser-Ahmed/Gen-AI-Powered-AR-App

Gen-AI-Powered-AR-App

Welcome to the Gen-AI-Powered-AR-App repository! This project explores various architectures for generating 3D images from 2D images and implements a text-to-image generation model. It also includes a Kotlin application that leverages the Meshy API to visualize 3D models in augmented reality using ARCore.

After extensive experimentation with different models and techniques, we arrived at a final architecture that comes close to working as expected and produced promising results on the last day of development. These results are discussed in detail in the final section of this repository.

Team Members

Project Overview

This repository contains several Jupyter notebooks showcasing the following:

  1. 3D Image Generation: Implementation of various architectures for converting 2D images into 3D images using the Pix3D dataset.
  2. Text-to-Image Generation: Techniques for generating images from text descriptions using the CUB200-2011 dataset.
  3. AR Visualization: A Kotlin application that utilizes the Meshy API to render 3D models in augmented reality based on user prompts.

Datasets

  • Pix3D Dataset: This dataset is used for training models to generate 3D images from 2D images.
  • CUB200-2011 Dataset: A dataset for text-to-image generation, containing images of birds with corresponding textual descriptions.
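3D reconstruction models such as Pix2Vox output voxel occupancy grids, and predictions are commonly scored against ground-truth voxels with intersection over union (IoU). The following is a minimal sketch of that metric in plain Python, not the code used in the notebooks. The helper names `voxelize_sphere` and `voxel_iou`, the 0.5 threshold, and the toy sphere shapes are assumptions made for illustration.

```python
from itertools import product


def voxelize_sphere(size, radius):
    """Return a size^3 occupancy grid (nested lists of 0/1) holding a centered sphere."""
    center = (size - 1) / 2
    grid = [[[0] * size for _ in range(size)] for _ in range(size)]
    for x, y, z in product(range(size), repeat=3):
        if (x - center) ** 2 + (y - center) ** 2 + (z - center) ** 2 <= radius ** 2:
            grid[x][y][z] = 1
    return grid


def voxel_iou(pred, target, threshold=0.5):
    """IoU between predicted occupancy values and a binary ground-truth grid.

    A predicted voxel counts as occupied when its value is >= threshold
    (the threshold value is an assumption, not taken from the notebooks).
    """
    intersection = 0
    union = 0
    for plane_p, plane_t in zip(pred, target):
        for row_p, row_t in zip(plane_p, plane_t):
            for p, t in zip(row_p, row_t):
                p_on = p >= threshold
                t_on = t >= 0.5
                intersection += int(p_on and t_on)
                union += int(p_on or t_on)
    # Two empty grids are treated as a perfect match.
    return intersection / union if union else 1.0


if __name__ == "__main__":
    target = voxelize_sphere(size=8, radius=3)
    pred = voxelize_sphere(size=8, radius=2)
    print(f"IoU of a smaller sphere against a larger one: {voxel_iou(pred, target):.3f}")
```

A real evaluation would run over tensors rather than nested lists, but the counting logic is the same.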

Notebooks

The following notebooks are included in this repository:

  1. 3d-Pix3pix.ipynb: Implementation and training of 3D-Pix2Pix with a U-Net Generator and a Patch Discriminator.
  2. Pix3Pix.ipynb: Implementation and training of the final, working version of 3D-Pix2Pix, along with its results.
  3. image2vox-model.ipynb: Implementation and training of Pix2Vox.
  4. Pix2Vox-Pretrained-A.ipynb: Inference of the pretrained Pix2Vox-A version.
  5. pretrained-pix2vox-F.ipynb: Inference of the pretrained Pix2Vox-F version.
  6. dcgan-cls.ipynb: Implementation of a Text-To-Image cGAN, leveraging text descriptions as conditioning inputs to generate corresponding images.
  7. dcgan-cls_one_Cat.ipynb: Implementation of a Text-To-Image cGAN trained on a single image category because of limited compute resources.
  8. Notebooks/PIFuHD/: Exploring PIFuHD from Meta Research for High-Resolution 3D Human Digitization.
  9. mesh-reconstruction-pytorch3d.ipynb: Exploring PyTorch3D from Meta Research.
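A Patch Discriminator classifies overlapping patches of its input rather than the whole input at once, so its behavior is largely determined by the receptive field of its convolution stack. The sketch below computes that receptive field and the size of the resulting patch grid. The layer list is the standard 2D pix2pix 70×70 configuration and is an assumption here; the notebooks may use different kernels, strides, or 3D convolutions, in which case the same arithmetic applies per axis.

```python
def receptive_field(layers):
    """Receptive field (per axis) of a stack of convolutions.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) input-space steps
        jump *= stride             # distance in input pixels between adjacent outputs
    return rf


def patch_grid_size(input_size, layers, padding=1):
    """Per-axis size of the discriminator output (one score per patch)."""
    size = input_size
    for kernel, stride in layers:
        size = (size + 2 * padding - kernel) // stride + 1
    return size


# Assumed configuration: the 70x70 PatchGAN from the original pix2pix work.
PATCHGAN_70 = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

if __name__ == "__main__":
    print("receptive field:", receptive_field(PATCHGAN_70))
    print("patch grid for a 256-pixel input:", patch_grid_size(256, PATCHGAN_70))
```

With this configuration, each output score depends on a 70-pixel window, and a 256-pixel input yields a 30-by-30 grid of patch scores.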

Kotlin Application

The Kotlin app provides an interface where users enter a text prompt. The prompt is sent to the Meshy API to generate a 3D model, which is then rendered with ARCore so that users can view it in an augmented reality environment.
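The app itself is written in Kotlin, but the shape of a text-to-3D request can be sketched in Python for quick experimentation outside Android. This is a hedged illustration only: the endpoint URL, the `mode` and `prompt` field names, and the bearer-token header are assumptions that should be checked against the current Meshy API documentation, and `build_text_to_3d_request` is a hypothetical helper that does not appear in this repository. The function only builds the request; it does not send it.

```python
import json
import urllib.request

# Assumed endpoint; verify against the Meshy API documentation before use.
MESHY_TEXT_TO_3D_URL = "https://api.meshy.ai/openapi/v2/text-to-3d"


def build_text_to_3d_request(prompt, api_key):
    """Build (but do not send) a POST request asking for a 3D model from a text prompt."""
    payload = {
        "mode": "preview",  # assumed field and value
        "prompt": prompt,   # assumed field name
    }
    return urllib.request.Request(
        MESHY_TEXT_TO_3D_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_text_to_3d_request("a small wooden chair", api_key="YOUR_API_KEY")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would submit the job. Keeping construction separate from sending makes the payload easy to inspect before spending API credits.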

Key Features:

  • User-friendly interface for inputting prompts.
  • Real-time visualization of 3D models in AR.

Running the Application

  • Open the Kotlin project in your preferred IDE.
  • Ensure the Meshy API is correctly set up and configured.
  • Run the application and follow the instructions to visualize 3D models in AR.

Installation

To get started with the project, follow these steps:

  1. Clone the repository:

    git clone https://github.com/Seif-Yasser-Ahmed/Gen-AI-Powered-AR-App.git
  2. Navigate to the project directory:

    cd Gen-AI-Powered-AR-App
  3. Install the required Python packages:

    pip install -r requirements.txt
  4. Install Android Studio to build and run the Kotlin application.
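After installing the Python packages, it can help to confirm that everything listed in requirements.txt is actually present in the active environment. The following is a minimal sketch using only the standard library. The helper names are hypothetical, the parsing handles only simple `name` and `name==version` style lines, and the sample lines in the docstring are illustrative rather than the real contents of this repository's requirements.txt.

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract package names from simple requirement lines.

    Example input (hypothetical): ["torch>=2.0", "numpy==1.26.4  # pinned"]
    Comments, blank lines, and option lines such as "-r other.txt" are skipped.
    """
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as handle:
        wanted = requirement_names(handle)
    print("Missing packages:", missing_packages(wanted) or "none")
```

Run the script from the project directory so that the relative path to requirements.txt resolves.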

Acknowledgments

We would like to express our gratitude to the following repositories and their contributors for their valuable resources:

  • PIFuHD by Meta Research, for providing the framework and models used in this project.
  • ICML 2016 Text-to-Image Generation, for inspiring our text-to-image generation methodology.
  • Pix2Vox, for its implementation and pretrained models, which contributed to our 3D image generation efforts.

License

This project is licensed under the MIT License. See the LICENSE file for more details.
