
Does nunif/waifu2x support pair training (x to y mapping) like the previous version does? #250

Open · SAOMDVN opened this issue Nov 6, 2024 · 5 comments

Comments

SAOMDVN commented Nov 6, 2024

Hi. I am quite impressed with the performance of the models and methods used in waifu2x, and I want to train my own model based on them. It would be a 1x mapping from an image x (RGBA) to an image y (RGB or L), similar to a noise model. However, the changes can't be generated on the fly like noise can. It seems the old code supported this functionality: nagadomi/waifu2x#193. Is it possible in this new repo too? If so, how do I arrange my dataset files accordingly?

Thank you for reading

nagadomi (Owner) commented Nov 7, 2024

Currently not supported.
In the previous repo, I used that feature for text block segmentation for manga speech bubbles.
If there is some kind of image-to-image conversion model that would be useful and popular, I might be able to support it.

SAOMDVN (Author) commented Nov 16, 2024

I didn't have much time to understand your codebase, but from what I can see, it seems the Waifu2xDataset class generates noise from the clean images in the train set, feeds the noisy images to the model as input, and uses the clean images as the expected output. Based on that, if one modifies/overrides the noise-generation function to transform the images in a certain way, they can achieve custom x to y mapping functionality, can't they? Just wanted to make sure my understanding is correct.
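To make sure I'm describing the same pattern, here is a rough sketch of the idea (this is not the actual Waifu2xDataset code; the class name, transform argument, and loading logic are all made up), where the input x is derived from the clean target y by a custom transform instead of by generated noise:

```python
import os
from PIL import Image
from torch.utils.data import Dataset
from torchvision.transforms import functional as TF


class CustomPairDataset(Dataset):
    """Hypothetical sketch: derive the model input x from the clean target y
    with a user-supplied transform, in place of on-the-fly noise generation."""

    def __init__(self, image_dir, x_transform):
        self.files = sorted(
            os.path.join(image_dir, name) for name in os.listdir(image_dir)
        )
        self.x_transform = x_transform  # plays the role of the noise generator

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        y = Image.open(self.files[index]).convert("RGB")  # clean target
        x = self.x_transform(y)                           # custom x derived from y
        return TF.to_tensor(x), TF.to_tensor(y)
```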

Regarding my use case, I want to train a model that can remove the background from anime illustrations. My targets are very specific illustrations with a solid white background, in a specific artist's art style. Of course, it is not meant to be a plug-and-play solution; it is more of an intermediate processing step before I come in and manually clean up the background to fit my personal standard.

This picture is the result of a model I trained based on TensorFlow's pix2pix tutorial. Left is the input and right is the output, which is meant to be an alpha mask. The main focus is the edges, as those empty black areas inside the character can be easily filled in by a human. I want the model to perform better and be more accurate. Also, the model is quite heavy, with a 600 MB checkpoint file and 15 GB of VRAM usage, which is why I am looking to use your model instead, as it was able to achieve brilliant quality for both upscaling and denoising with a small model.

Miku

nagadomi (Owner) commented

For character (person) segmentation, characters often cover a large area of the image, so it is important to use the global context of the entire image. Existing waifu2x models only use a small area, such as 64x64 blocks, so they may not be suitable for that application.

Also, background removal is a popular task; you can see many pre-trained models at
https://github.com/danielgatis/rembg?tab=readme-ov-file#models
anime-segmentation: https://github.com/SkyTNT/anime-segmentation/tree/main


For custom x, y input images: currently, x and y are generated from a single image (`im`) at

`x, y = self.transforms(im, im)`

To use two different images for x and y, it is possible to generate them from those two images like

`x, y = self.transforms(im_x, im_y)`
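As a rough illustration only (this is not the actual nunif dataset code; the class name, directory layout, and augmentation details below are hypothetical), a paired dataset that loads im_x and im_y from two folders matched by filename and applies the same random crop/flip to both could look like:

```python
import os
import random
from PIL import Image
from torch.utils.data import Dataset
from torchvision.transforms import functional as TF


class PairedImageDataset(Dataset):
    """Hypothetical sketch: x and y come from two different images that share a
    filename, and the same spatial augmentation is applied to both."""

    def __init__(self, x_dir, y_dir, size=64):
        self.x_dir, self.y_dir, self.size = x_dir, y_dir, size
        self.names = sorted(os.listdir(x_dir))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, index):
        name = self.names[index]
        im_x = Image.open(os.path.join(self.x_dir, name)).convert("RGBA")
        im_y = Image.open(os.path.join(self.y_dir, name)).convert("L")

        # identical random crop for input and target
        top = random.randint(0, im_x.height - self.size)
        left = random.randint(0, im_x.width - self.size)
        im_x = TF.crop(im_x, top, left, self.size, self.size)
        im_y = TF.crop(im_y, top, left, self.size, self.size)

        # identical random horizontal flip
        if random.random() < 0.5:
            im_x, im_y = TF.hflip(im_x), TF.hflip(im_y)

        return TF.to_tensor(im_x), TF.to_tensor(im_y)
```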

SAOMDVN (Author) commented Dec 15, 2024

Hi. Thanks for the help! I decided to continue with waifu2x before trying out the other solutions you suggested. Here is my model's current accuracy:
Miku

I trained it with the following command:

python train.py waifu2x --method noise --noise-level -1 --num-workers 4 --max-epoch 30 --arch waifu2x.swin_unet_1x --loss lbp --size 64 --disable-amp --data-dir ../data/ --model-dir ../model/

As you can see, the edges are a little jagged. I am unsure if this is because of the low number of epochs trained or a dataset that is not big enough (my dataset has 10 images in eval and 18 images in train, all in 4K or above, which when split with create_training_data.py becomes 3366 images in eval and 5128 images in train).

I am currently training 30 more epochs (with --resume) to see if it performs better; in the meantime, I am seeking any wisdom you have that can help improve my model. Thank you a lot!

For reference, here is my fork with my modifications: nunif fork


Edit: After 30 more epochs (60 epochs total) the model hasn't improved at all. (No `* best model updated` log message, and the model .pth MD5 hash is the same except for the checkpoint file.) 😭

nagadomi (Owner) commented Dec 15, 2024

It looks like overfitting to the white color, so you may want to use a random background color or composite with a random background image.
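For example (just a sketch; the function name and the 50/50 choice between a solid color and a background image are arbitrary), the compositing step could look like this, with the alpha channel reused as the segmentation target:

```python
import random
from PIL import Image


def composite_random_background(fg_rgba, bg_images=None):
    """Hypothetical sketch: paste an RGBA character cutout over a random solid
    color or a random background image, so the model does not overfit to a
    plain white background. The alpha channel is kept as the target mask."""
    if bg_images and random.random() < 0.5:
        bg = random.choice(bg_images).resize(fg_rgba.size).convert("RGBA")
    else:
        color = tuple(random.randint(0, 255) for _ in range(3)) + (255,)
        bg = Image.new("RGBA", fg_rgba.size, color)

    x = Image.alpha_composite(bg, fg_rgba).convert("RGB")  # model input
    y = fg_rgba.getchannel("A")                            # alpha mask target
    return x, y
```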

As for the training commands:
By default, a cyclic learning rate is used, and the number of cycles can be specified with --learning-rate-cycles (default 5).
When --max-epoch is low, it may not work well because the learning rate changes in very short cycles.
It may be worth trying --max-epoch 200 --num-samples 5000 (--num-samples 5000 will reduce the time per epoch to about 1/10 of the default).
When changing options, adding the --resume --reset-state options allows you to start from a previously trained model.

The fundamental problem, as I wrote above, is that it is difficult to segment characters from the background within small 64x64 areas.
