Training with custom dataset #4

abhigoku10 · 2023-11-25T17:04:03Z

@ArcherFMY thanks a lot for sharing the code base. I just had a couple of queries:

Is the training pipeline shared? If not, can you please share it?
Can we train with a custom dataset for a different application? If so, how much data is required to get decent output?
Can the generated images be more photorealistic? If so, what would have to be done?
Thanks in advance

ArcherFMY · 2023-11-27T02:36:19Z

Hi,

We just use the DreamBooth training script in diffusers (https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py) to train the model. You can simply follow the instructions in that repo to train on your custom dataset.

The 'instance_prompt' we used is '<360panorama>', as can be seen here (https://github.com/ArcherFMY/SD-T2I-360PanoImage/blob/main/txt2panoimg/text_to_360panorama_image_pipeline.py#L150). The training image resolution can be set to w=1024 and h=512 (just resize the images). We used one A100 with 'train_batch_size=8' and 'learning_rate=1e-6'. 20,000 to 30,000 steps should be enough.
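For readers who want to reproduce these settings, here is a minimal sketch that assembles a launch command for the diffusers `train_dreambooth.py` script with the values quoted above. The helper name `dreambooth_args`, the base model, and the directory paths are hypothetical placeholders and are not taken from this thread. The script's `--resolution` flag appears to take a single integer, so the 1024x512 resize is not expressed here and would have to be handled separately.

```python
import shlex


def dreambooth_args(base_model, data_dir, output_dir, max_train_steps=20000):
    """Build an argument list for diffusers' train_dreambooth.py.

    base_model, data_dir and output_dir are placeholders chosen by the
    caller; the thread does not say which base checkpoint was used.
    """
    return [
        "accelerate", "launch", "train_dreambooth.py",
        f"--pretrained_model_name_or_path={base_model}",
        f"--instance_data_dir={data_dir}",
        f"--output_dir={output_dir}",
        "--instance_prompt=<360panorama>",   # prompt quoted in the reply
        "--train_batch_size=8",              # value quoted in the reply
        "--learning_rate=1e-6",              # value quoted in the reply
        f"--max_train_steps={max_train_steps}",  # reply suggests 20,000 to 30,000
    ]


# Print a shell-ready command line (paths below are illustrative only).
print(shlex.join(dreambooth_args("path/to/base-model", "./panos", "./out")))
```

The list form can be passed directly to `subprocess.run` if you prefer to launch training from Python instead of copying the printed command.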

In our experience, for high-quality image generation the quality of the training images matters more than the quantity. For text-to-360panoimage, our training dataset contains about 2,000 images (we use data augmentation, such as gradually stitching the right part of an image onto its left side, and end up with 20,000 images). All images are 4K resolution and were carefully selected by removing complex scenes with complex textures.
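One plausible reading of "stitching the right part to the left part" is a horizontal wraparound shift of the equirectangular image, which keeps the panorama seamless. The sketch below assumes that reading; the function name `wraparound_shifts` and the choice of 10 evenly spaced shifts (which would turn about 2,000 images into about 20,000) are assumptions, not the author's confirmed procedure.

```python
import numpy as np


def wraparound_shifts(pano, n_views=10):
    """Yield n_views horizontally shifted copies of a panorama.

    pano is an array shaped (H, W) or (H, W, C). Each copy moves the
    rightmost columns to the left edge, so no pixels are lost.
    """
    width = pano.shape[1]
    for k in range(n_views):
        shift = (k * width) // n_views  # evenly spaced offsets; k=0 is the original
        # np.roll with a positive shift wraps the last `shift` columns to the front
        yield np.roll(pano, shift, axis=1)


# Example on a dummy 512x1024 RGB image.
dummy = np.zeros((512, 1024, 3), dtype=np.uint8)
views = list(wraparound_shifts(dummy))
print(len(views), views[0].shape)
```

Because a 360-degree panorama is continuous across its left and right edges, each shifted copy is still a valid panorama seen from a different heading.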

The results generated by our base model are close to photorealistic. The final high-resolution images look a little artificial, mainly because of the GAN-based super-resolution model (RealESRGAN). You can try other super-resolution models, or run image-to-image-with-controlnet again using base models with other styles.

abhigoku10 · 2024-01-24T13:17:56Z

@ArcherFMY thanks for the response. I was able to train the repo for the text2panorama application. What is the training process for image2panorama? Can you please share the steps?

ArcherFMY mentioned this issue May 17, 2024

can we use other models like sdxl or pixart? ArcherFMY/Diffusion360_ComfyUI#1
