This is a DreamBooth model derived from runwayml/stable-diffusion-v1-5
with additional fine-tuning of the text encoder. The weights were trained on images from a popular animation studio using DreamBooth. The fine-tuned model weights can be found here. Use the tokens disney style in your prompts to get the effect.
You can find some example images below for text-to-image generation using the model:
And image-to-image translation using the model:
python -m venv .venv
source .venv/bin/activate
python -m pip install poetry
poetry install
- ./tools/train_dreambooth.py: Fine-tune the model on your own dataset.
- ./tools/experiment.py: Generate images using the model and log metadata such as seed, classifier-free guidance parameter, prompt, negative prompt, etc.
- ./tools/rename_images.py: Rename collected images to unique integer ids for ease of use (a minimal sketch of this step is shown after this list).
- ./src/cartoonify/utils.py: Utility functions for prediction via the Hugging Face pipeline.
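For reference, the renaming step amounts to giving each collected image a sequential integer filename. Below is a minimal sketch; the directory name is a placeholder and the actual ./tools/rename_images.py script may differ:

from pathlib import Path

# Minimal sketch of the renaming step (the actual ./tools/rename_images.py may differ).
# Assumes the existing filenames do not already collide with the new integer names.
image_dir = Path("data/images")  # placeholder directory of collected images
for idx, path in enumerate(sorted(image_dir.glob("*.png"))):
    path.rename(image_dir / f"{idx}{path.suffix}")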
You can try image-to-image translation using the model in an interactive UI by running the following command:
python ./demo/app.py
import torch
from diffusers import StableDiffusionPipeline
# basic usage
repo_id = "lavaman131/cartoonify"
# select the best available device (CUDA GPU, Apple Silicon MPS, or CPU)
device = torch.device(
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
# half precision on GPU/MPS, full precision on CPU
torch_dtype = torch.float16 if device.type in ["mps", "cuda"] else torch.float32
pipeline = StableDiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch_dtype).to(device)
# include the tokens "disney style" in your prompt to get the effect
image = pipeline("PROMPT GOES HERE").images[0]
image.save("output.png")
As with any diffusion model, you will need to experiment with the prompt and the classifier-free guidance parameter until you get the results you want. Zoomed-out subjects tend to lose clarity in the face. For additional safety in image generation, we use the Stable Diffusion safety checker.
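For example, continuing from the basic usage snippet above, the guidance scale, negative prompt, and seed can be passed directly to the pipeline call; the values below are arbitrary starting points, not tuned settings:

# reuses pipeline and device from the basic text-to-image snippet above
generator = torch.Generator(device=device).manual_seed(42)  # fix the seed for reproducibility
image = pipeline(
    "PROMPT GOES HERE",
    negative_prompt="blurry, low quality",  # things to steer away from
    guidance_scale=7.5,                     # classifier-free guidance parameter
    num_inference_steps=50,
    generator=generator,
).images[0]
image.save("output.png")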
The data used to train the model was collected from the internet (mainly screenshots taken from YouTube). The images were resized to 512x512 using subject-preserving cropping from here and saved, then renamed to unique integer ids using the ./tools/rename_images.py script. The full dataset can be found here.
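As a rough illustration of the resizing step: the actual preprocessing used the subject-preserving cropping tool linked above, whereas the sketch below simply center-crops and resizes to 512x512.

from pathlib import Path
from PIL import Image

# Simplified placeholder for the preprocessing step: the real pipeline used
# subject-preserving cropping; this sketch only center-crops and resizes.
def center_crop_resize(path: Path, size: int = 512) -> Image.Image:
    img = Image.open(path).convert("RGB")
    w, h = img.size
    side = min(w, h)
    left, top = (w - side) // 2, (h - side) // 2
    return img.crop((left, top, left + side, top + side)).resize((size, size))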
The model was fine-tuned for 3500 steps on around 200 images of modern Disney characters, backgrounds, and animals (roughly 70%, 20%, and 10% of the dataset, respectively) on a single RTX A5000 GPU (24GB VRAM).
A report on the training process can be found here.
The training code used can be found here. The regularization images used for training can be found here.