Model size reduced to 43Mb! and new Webapp #295

Open
Kikedao opened this issue Mar 19, 2022 · 32 comments

@Kikedao

Kikedao commented Mar 19, 2022

Hi @xuebinqin !

I made a web app that uses U2Net at its core. The site is https://silueta.me, and it would be great if you could showcase it in the README :)

I struggled a lot to make it work on a free Heroku instance, which is limited to 512 MB of RAM. One of the things I achieved, and that I have seen requested a lot in the issues, is reducing the model size from the original 170 MB to 43 MB.

So I want to share the reduced-size model with you all. Feel free to test it and add it to the pretrained models in the source code if you wish:
https://drive.google.com/file/d/14Uy2F2i59MZONzf4eyRtpnZWiJB60drL/view?usp=sharing

The smaller size helps a lot in staying below Heroku's RAM limits: a soft limit of 512 MB (above it, swap memory is used) and a hard limit of 1024 MB (when you reach it, the instance is destroyed and restarted).

I noticed some degradation in the results after the size reduction, which I show in the images below, but I think many people will find the model useful.

Thanks a lot for this awesome model, and PLEASE showcase the web app https://silueta.me. I built it with a lot of love and respect (you will see on the site that I mention U2Net and your research) to showcase it in my portfolio (I'm a frontend/creative technologist). If I manage to get some consistent traffic to the site, I will hopefully monetize it a little and be able to purchase dedicated hardware to bring it to the next level ;)

I will of course answer any questions from the community in this thread if you find any of my know-how useful, so don't be shy.

These are the comparison images from the original model and the reduced size model:

Almost imperceptible difference with human portrait images:
ORIGINAL U2Net Model

original1

Reduced Size Model
reduced1

Here you can detect a little difference in the left shoulder:
ORIGINAL U2Net Model

original1a

Reduced Size Model

reduced1a

The main difference shows up with non-human images. Here is one of my cats. In this case there is little difference: you can notice slightly less 'shadow' in the overall matting, but the result is very good in both:

ORIGINAL U2Net Model

original2

Reduced Size Model

reduced2

And last, one of the examples with the biggest loss I found, my other cat. You can see that the bottom part of the body is much more transparent:

ORIGINAL U2Net Model

original3

Reduced Size Model

reduced3

@xuebinqin
Owner

xuebinqin commented Mar 19, 2022 via email

@anguoyang

Hi @Kikedao, could you please share your training code? Model compression seems interesting. Thank you.

@Kikedao
Author

Kikedao commented Mar 30, 2022

Hi @anguoyang,

For this version I didn't retrain anything. I took the pretrained model and quantized it with the quantization utilities from ONNX Runtime, following the official docs:
https://onnxruntime.ai/docs/performance/quantization.html

As you can see in the screenshots there is a slight loss in quality, but it was good enough for the end result I was after, and the reduction in RAM consumption during inference is massive.
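
For readers who want to try the same step, here is a minimal sketch of weight quantization with ONNX Runtime. It is an assumption, not the author's actual script: the thread does not say whether dynamic or static quantization was used, so this uses `quantize_dynamic` with 8-bit weights. The helper names `quantized_path` and `quantize_model` are made up for this example; the file name `u2net.quant.onnx` is the one that appears later in this thread.

```python
from pathlib import Path


def quantized_path(model_path: str) -> str:
    """Return the output name, e.g. 'u2net.onnx' -> 'u2net.quant.onnx'."""
    p = Path(model_path)
    return str(p.with_name(p.stem + ".quant" + p.suffix))


def quantize_model(model_path: str) -> str:
    """Write an 8-bit weight-quantized copy of an ONNX model and return its path.

    Assumption: dynamic quantization; the thread does not state the mode used.
    """
    # Imported lazily so quantized_path() works without onnxruntime installed.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    out_path = quantized_path(model_path)
    quantize_dynamic(model_path, out_path, weight_type=QuantType.QUInt8)
    return out_path


# Usage (needs onnxruntime and a local u2net.onnx):
#     quantize_model("u2net.onnx")
```

Comparing the two file sizes on disk afterwards is a quick way to check that the weights were actually converted.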

@anguoyang

@Kikedao thank you so much for the information. I want to run person segmentation on mobile/edge devices, so I retrained the person segmentation model (lite version) from dennis (https://github.com/dennisbappert/u-2-net-portrait) with cleaned Supervisely images (2,667 pairs). Unfortunately, the result is bad (after about 400 epochs). If I only apply quantization, it is difficult to get the model down to a very small size (say <5 MB). I also trained the u2net-lite from this original repo, and the result was even worse. Strangely, when we train the lite model on SOD datasets, the result is not so bad. @xuebinqin, could you please give us some advice? Thank you.

@anguoyang

BTW, we trained the u2net-lite of this repo for about 300 epochs. The train loss is around 0.2 and the tar loss is around 0.02. Although the loss is not small enough yet and is still decreasing, the result is much worse than I expected, given that I had previously trained u2net-lite on SOD datasets.

@Kikedao
Author

Kikedao commented Mar 31, 2022

Hi @anguoyang,

I see you are aiming for an extremely small model (<5 MB).
You can try to quantize the lite u2net model, but I don't think you will get good results. If you start with such a light model, the inferences will be very poor.
Such a light model can work wonders when combined with other techniques if your input is video, because you can exploit the information carried from frame to frame.
But that approach doesn't work with still images!

The first foreground/background segmentation model I managed to train, some years ago with my own dataset, was based on almost 20k custom images (very good quality cutouts from human-operated KNN trimap segmentation), extended to 80k+ images through augmentation: rotation, changes of brightness and contrast, etc.
It worked OK, but worse than what u2net achieves today. Just loading the model onto the GPU (CPU inference was out of the question at the time) took at least 650 MB of VRAM.
And that was only for loading the model; a lot more VRAM/RAM was needed while inference was running.

I've worked with models of around 5 MB for cute little things like pitch and tone detection running in the browser, but I have a strong intuition that 5 MB is WAY too small for any good CNN computer vision model.

Maybe I'm very wrong; if that's the case, please correct me:
Imagine a single 512x512 image. That's a total of 262,144 pixels. Now multiply that by 24 bits per pixel. It gives you 6,291,456 bits, which is 786,432 bytes, or about 0.75 MB.
So a single image takes about 0.75 MB of RAM (GPU VRAM or system RAM) as uncompressed 8-bit data, and 3 MB once it is converted to a float32 tensor.
And that's only a single little 512x512 image.
So when we talk about a CNN model that can segment foreground/background without any additional human input, even for animals and objects... we have to expect it to be much larger, both compressed on disk and loaded into RAM or VRAM.
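
The image-size arithmetic can be checked with a few lines of Python. This is only a worked calculation; the float32 line assumes the usual 4 bytes per channel value after the image is converted to a tensor.

```python
WIDTH, HEIGHT = 512, 512
CHANNELS = 3            # RGB
BITS_PER_CHANNEL = 8    # 24 bits per pixel in total

pixels = WIDTH * HEIGHT
bits = pixels * CHANNELS * BITS_PER_CHANNEL
raw_bytes = bits // 8
float32_bytes = pixels * CHANNELS * 4   # same image as a float32 tensor

MIB = 1024 * 1024
print(pixels)                 # 262144
print(bits)                   # 6291456
print(raw_bytes / MIB)        # 0.75
print(float32_bytes / MIB)    # 3.0
```

Note that 6,291,456 bits is 6 megabits, which is easy to misread as 6 megabytes.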

The model I quantized takes only 43 MB as a file, a good disk/RAM optimization if you ask me, but inference can push RAM usage to 500 MB+ at any time. Sadly, there is no way to use it in a mainstream mobile app if you don't want to crash the average user's device.

@Kikedao
Author

Kikedao commented Mar 31, 2022

@anguoyang, as I finished writing the last comment I had an idea.

If I'm not wrong, u2net was trained on a public dataset of 320x320 px images.
I'm pretty sure the inference code resamples the input image to 320x320 px in order to generate the output alpha mask.
And in my experience a good 320x320 alpha mask works NICELY even when applied to HD or even 4K images (check the screenshots in my first post of this thread for proof).

So, I would try this:

  • Take the original u2net training code and the original 320x320 training dataset, and train for many epochs until the loss stops improving.
  • Compare the inference results of your model with those of the pretrained model.
  • If your custom-trained model works as well as the pretrained one, preprocess the dataset to scale it down by half, to 160x160 px, and tweak the training code a little to fit the new size.
  • Run the training code on the new low-resolution dataset.

When you halve the side of a square, its area shrinks not by half but by a factor of 4, so 'maybe' the weight of the model would be reduced in the same proportion, 4 times.
I don't think it works 100% like a linear function for a CNN model, but it's worth a try!
Then run the quantization as I did with the pretrained model.

The quantized model I proposed is 43 MB, down from the original 170 MB, so maybe you can get below 10 MB thanks to the half-resolution input I'm proposing.
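
One caveat worth checking before investing training time: in a convolutional layer the number of weights depends on the kernel size and channel counts, not on the input resolution. The sketch below uses made-up layer sizes (3 input channels, 64 output channels, 3x3 kernel), not U2Net's real configuration. It suggests that halving the input side would cut activation memory by about 4x while leaving the file size roughly unchanged.

```python
def conv_params(in_ch, out_ch, kernel=3):
    """Weights plus biases of a 2D convolution; input size does not appear."""
    return in_ch * out_ch * kernel * kernel + out_ch


def activation_values(out_ch, height, width):
    """Values in the layer's output feature map (stride 1, 'same' padding)."""
    return out_ch * height * width


# Hypothetical layer sizes, for illustration only.
params = conv_params(3, 64)
act_320 = activation_values(64, 320, 320)
act_160 = activation_values(64, 160, 160)

print(params)              # 1792, identical for 320x320 and 160x160 inputs
print(act_320 // act_160)  # 4
```

If that holds for the full network, a smaller input would mainly help RAM usage during inference, and a smaller file would have to come from fewer channels or layers, or from quantization.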

I'm just a frontend developer, I'm not a data scientist nor an expert in machine learning, maybe this suggestion is silly, but I would give it a try.

If it works...share your model and results with all of us please ;)

@mutsuyuki

Hi @Kikedao,
Thanks for the great work.

If you don't mind, could you tell me which pretrained model you used?
There are many trained models in the repository, so I would like to know which one to refer to.

@its-jd

its-jd commented Apr 7, 2022

Hi, this looks very impressive!!!
Could you please give me some guidance, or point me to an article that shows how I can use this model? Is this model compatible with the original u2net scripts from this repo?

@Kikedao
Author

Kikedao commented Apr 10, 2022

Hi @mutsuyuki and @its-jd, glad you liked the app!

I used the official 'standard' u2net.onnx model from this repo and quantized it as explained in my previous comments.

You can download my quantized model and use it with the code of this repo. It is as easy as replacing the original model file with mine; no code modification is required.

Also, in case you are curious: in my web app I implemented some optimizations in the pre- and post-processing of the input image and output mask, partly to get slightly better results but mainly to reduce memory consumption, since it runs on a free Heroku instance with very tight RAM limits.

I have a batch of optimizations that I haven't been able to deploy in production because of the computing limitations of the hosting, but that I would like to use if I get a better server in the future. The idea is to use u2net to get a mask but not use it directly as the output; instead, use it to generate a trimap and then perform KNN or closed-form alpha matting for even better results (inspired by the awesome PyMatting library: https://pymatting.github.io/).
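
To make the trimap idea concrete, here is a minimal sketch of turning a soft mask into a trimap by thresholding. The thread does not describe how the app builds its trimap, so the function name `mask_to_trimap` and the thresholds 0.1 and 0.9 are assumptions for illustration; real pipelines often add erosion and dilation to widen the unknown band.

```python
def mask_to_trimap(mask, lo=0.1, hi=0.9):
    """Map a soft mask (rows of floats in [0, 1]) to a trimap.

    0 = sure background, 255 = sure foreground, 128 = unknown band
    left for the matting step to resolve. Thresholds are illustrative.
    """
    def label(p):
        if p >= hi:
            return 255
        if p <= lo:
            return 0
        return 128

    return [[label(p) for p in row] for row in mask]


soft_mask = [
    [0.02, 0.05, 0.40],
    [0.03, 0.60, 0.95],
    [0.30, 0.97, 0.99],
]
trimap = mask_to_trimap(soft_mask)
print(trimap)  # [[0, 0, 128], [0, 128, 255], [128, 255, 255]]
```

The trimap would then be passed, together with the original image, to a matting routine such as the ones PyMatting provides, which estimates alpha only inside the unknown band.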

I have a little Ethereum mining rig at home with 6 GPUs, each with 8 GB of VRAM, so 48 GB of sweet DDR5 memory (compare that to the 512 MB of 'normal' RAM allowed by Heroku ;) ). I plan to use it in the future to deploy the model and bring it to the next level with the optimizations I mentioned above, and also to retrain it with a larger dataset I already have. But at the moment I can't afford to invest the time in creating that service. I just built this to show that I can use and deploy ML models at a (humble) production scale with a clean, professional UI, and to add it to the rest of my portfolio as a frontend/creative developer.

Feel free to ask me anything, and sorry if my responses are too long. I try not to answer just the direct questions but also to give additional insights to anyone who may read this thread. It's my way of giving a little back to the community that has helped me so much through the amazing work of so many contributors.

@StefanHavbro

@Kikedao awesome work!

I tried to send an email to the Gmail address mentioned in your web app, but it was returned to me with the error "email does not exist" :)
Is there another address I can contact you at?

Kind regards, Stefan Nielsen

@ioskevinshah

@Kikedao , are you able to make it work for AWS?

@hungtooc

hungtooc commented Apr 14, 2022

Hi, thanks for the ONNX file.
But when I run inference with the ONNX model on CUDA, it is slower than with the original weights!!
I tested the images in test_data/test_human_images; the inference times (in seconds) are:

  • with original weights (.pth):

inferencing: Athlete-Intake.jpg 0.0629429817199707
inferencing: language_1280p.jpg 0.035804033279418945
inferencing: coach-yelling-at-athlete-716268.jpg 0.01948714256286621
inferencing: julia+trotti_17.jpg 0.03343081474304199
inferencing: 2019-LADIES-NIGHT-2ND-GOMES.jpg 0.019161701202392578

  • with your onnx weights:

inferencing: Athlete-Intake.jpg 0.6743252277374268
inferencing: language_1280p.jpg 0.5101361274719238
inferencing: coach-yelling-at-athlete-716268.jpg 0.5202233791351318
inferencing: julia+trotti_17.jpg 0.5090630054473877
inferencing: 2019-LADIES-NIGHT-2ND-GOMES.jpg 0.48451828956604004
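
To put a number on the gap, the timings listed above can be averaged. This is just arithmetic on the reported figures; the first `.pth` timing may include one-off warm-up cost, so the real steady-state ratio could be somewhat larger.

```python
from statistics import mean

# Per-image inference times in seconds, copied from the lists above.
pth_times = [
    0.0629429817199707,
    0.035804033279418945,
    0.01948714256286621,
    0.03343081474304199,
    0.019161701202392578,
]
onnx_times = [
    0.6743252277374268,
    0.5101361274719238,
    0.5202233791351318,
    0.5090630054473877,
    0.48451828956604004,
]

slowdown = mean(onnx_times) / mean(pth_times)
print(f"pth: {mean(pth_times):.3f}s  onnx: {mean(onnx_times):.3f}s  "
      f"slowdown: {slowdown:.1f}x")
```

On these five images the quantized ONNX model averages roughly 16 times slower than the original weights.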

Here is my ONNX inference code:

import glob
import os
import time

import numpy as np
import onnxruntime
import torch
from PIL import Image
from skimage import io
from torch.utils.data import DataLoader
from torchvision import transforms

from data_loader import RescaleT
from data_loader import ToTensorLab
from data_loader import SalObjDataset

# normalize the predicted SOD probability map
def normPRED(d):
    ma = torch.max(d)
    mi = torch.min(d)

    dn = (d-mi)/(ma-mi)

    return dn

def save_output(image_name,pred,d_dir):

    predict = pred
    predict = predict.squeeze()
    predict_np = predict.cpu().data.numpy()

    im = Image.fromarray(predict_np*255).convert('RGB')
    img_name = image_name.split(os.sep)[-1]
    image = io.imread(image_name)
    imo = im.resize((image.shape[1],image.shape[0]),resample=Image.BILINEAR)

    pb_np = np.array(imo)

    aaa = img_name.split(".")
    bbb = aaa[0:-1]
    imidx = bbb[0]
    for i in range(1,len(bbb)):
        imidx = imidx + "." + bbb[i]

    imo.save(d_dir+imidx+'.png')

def main():

    # --------- 1. get image path and name ---------
    model_name='u2net'


    image_dir = os.path.join(os.getcwd(), 'test_data', 'test_human_images')
    prediction_dir = os.path.join(os.getcwd(), 'test_data', 'test_human_images' + '_results' + os.sep)
    model_dir = os.path.join(os.getcwd(), 'saved_models', model_name+'_human_seg', model_name + '_human_seg.pth')

    img_name_list = glob.glob(image_dir + os.sep + '*')
    print(img_name_list)

    # --------- 2. dataloader ---------
    #1. dataloader
    test_salobj_dataset = SalObjDataset(img_name_list = img_name_list,
                                        lbl_name_list = [],
                                        transform=transforms.Compose([RescaleT(320),
                                                                      ToTensorLab(flag=0)])
                                        )
    test_salobj_dataloader = DataLoader(test_salobj_dataset,
                                        batch_size=1,
                                        shuffle=False,
                                        num_workers=1)

    # --------- 3. model define ---------
    # if(model_name=='u2net'):
    #     print("...load U2NET---173.6 MB")
    #     net = U2NET(3,1)

    # if torch.cuda.is_available():
    #     net.load_state_dict(torch.load(model_dir))
    #     net.cuda()
    # else:
    #     net.load_state_dict(torch.load(model_dir, map_location='cpu'))
    # net.eval()
    onnx_path = "u2net.quant.onnx"
    session = onnxruntime.InferenceSession(onnx_path, providers=['CUDAExecutionProvider'])
    binding = session.io_binding()
    input_names = [item.name for item in session.get_inputs()]
    output_names = [item.name for item in session.get_outputs()]
    Y_shape = (1, 1, 320, 320) # You need to specify the output PyTorch tensor shape
    Y_list = [(output_names[i], torch.zeros(Y_shape, dtype=torch.float32, device='cuda:0').contiguous()) for i in range(len(output_names))]
    for Y in Y_list:
        binding.bind_output(
            name=Y[0],
            device_type='cuda',
            device_id=0,
            element_type=np.float32,
            shape=tuple(Y[1].shape),
            buffer_ptr=Y[1].data_ptr(),
        )
    # --------- 4. inference for each image ---------
    for i_test, data_test in enumerate(test_salobj_dataloader):
        t1 = time.time()
        inputs_test = data_test['image'].contiguous().cuda()
        inputs_test = inputs_test.float() # convert to float32
        binding.bind_input(
            name=input_names[0],
            device_type='cuda',
            device_id=0,
            element_type=np.float32,
            shape=tuple(inputs_test.shape),
            buffer_ptr=inputs_test.data_ptr(),
            )
        
        session.run_with_iobinding(binding)

        # normalization
        pred = Y_list[0][1][:,0,:,:]
        pred = normPRED(pred)
        t2 = time.time()
        print("inferencing:",img_name_list[i_test].split(os.sep)[-1], t2 - t1)
        # save results to test_results folder
        if not os.path.exists(prediction_dir):
            os.makedirs(prediction_dir, exist_ok=True)
        save_output(img_name_list[i_test],pred,prediction_dir)

        # del d1,d2,d3,d4,d5,d6,d7

if __name__ == "__main__":
    main()

Update: I converted the model to ONNX my own way, and the result is faster than the original weights:
https://drive.google.com/file/d/1DskrMqylHThqGgEDGDH3Pc4Raj5Ww_TN/view?usp=sharing

@Kikedao
Author

Kikedao commented Apr 19, 2022

Hi @StefanHavbro, sorry, the email shown on the site was wrong. I corrected it, and it should now appear correctly in the footer of the web app.
I fear spam (you can't imagine the amount of interesting pictures I get in the Google Drive I created just to post the quantized model), but here it is for anyone interested in contacting me: [email protected]


Hi @ioskevinshah, it should be 'easy' to make it run on AWS. It depends on the EC2 instance you have, but I don't think it would be harder than on Heroku, where I have it now. In my experience the problem is that it would be VERY expensive.

I used Flask and Gunicorn on Heroku. In the past I've used AWS SageMaker, and AWS EC2 with Lambda and Gateway, to deploy similar models... but the cost is just too high, with little control over expenses. I'm talking about EC2 instances with machine learning GPUs at that time (5-6 years ago). One of the amazing things about the u2net model is that it runs fast enough on a CPU and not only on CUDA/GPU, but for me AWS is a no-go for a free personal project.


Hi @hungtooc, wow, thanks a lot for your tests. It's striking how much the computation cost rises when the quantized model runs on the GPU with CUDA; it's very counter-intuitive.

I don't have benchmarks, but I can assure you that on CPU the speed was barely affected. The app runs on a fairly low-end CPU on Heroku (I think the free instances are a 'virtual' 2-core CPU at 2.5 GHz), and it takes at most 10 to 15 seconds depending on the input image (almost the same inference time as before quantization).

On my personal computer, CPU inference is much faster, 1 second more or less (Ryzen 3600 CPU, 6 cores / 12 threads at 4.2 GHz max, with fast RAM and overall decent hardware).

But your tests are on GPU, and we are not talking about 10 seconds but about 0.5 seconds at most, even using my model. It's another world! With the original model you could almost do inference on real-time video. Amazing numbers.

I can't install CUDA on my machine right now to test it, so I can't help you; I'm very sorry. It's strange that you get such a big performance loss with the quantized model, more than 10x slower on GPU, but I have no clue why.

If I understood correctly, quantization converts the weights from 32-bit (or 16-bit) floating-point numbers to 8-bit integers, for example, which reduces the model size a lot. Maybe this hurts inference performance when using numpy and parallel tools like CUDA... I have no idea. In the end I'm just an amateur at ML :)
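
As an illustration of why the file shrinks, here is a small, self-contained sketch of 8-bit affine quantization of a few float32 weights. It is a simplified model of the idea, not ONNX Runtime's actual implementation, and the sample weights are made up. The 4x storage saving it shows is consistent with the 170 MB to 43 MB reduction reported in this thread.

```python
from array import array


def quantize_uint8(weights):
    """Affine-quantize floats to uint8; returns (values, scale, zero_point)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0   # fall back to 1.0 for constant input
    zero_point = round(-lo / scale)
    q = array("B", (min(255, max(0, round(w / scale) + zero_point))
                    for w in weights))
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from the 8-bit codes."""
    return [(v - zero_point) * scale for v in q]


# Made-up sample weights stored as 32-bit floats.
weights = array("f", [-0.51, -0.12, 0.0, 0.07, 0.33, 0.76])
q, scale, zp = quantize_uint8(weights)
restored = dequantize(q, scale, zp)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(weights.itemsize * len(weights), "bytes as float32")  # 24 bytes
print(q.itemsize * len(q), "bytes as uint8")                # 6 bytes
print(max_error <= scale)                                   # True
```

Each weight is stored in one byte instead of four, at the cost of a rounding error bounded by the quantization step, which matches the slight quality loss seen in the comparison images.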

@Kikedao
Author

Kikedao commented Apr 19, 2022

I would love to share the code of the whole app, but one of the main things I worked on was security, and that is why I can't share it. Let me explain a little.

The inference code I used is almost the same as the inference code from this repo. My own work, and the main differences, were these:

  • Tweaks and optimizations to the original code to reduce the RAM footprint, e.g. removing cv2.
  • The quantization, which reduced RAM usage A LOT, from almost 2.2 GB to 400-600 MB. That was mandatory to make it run on a free Heroku dyno. There are still some memory leaks, and it crashes sometimes.
  • Converting it to a web service, using Flask and Gunicorn.
  • A LOT of work on the frontend: lots of canvas manipulation and user input scenarios, EXIF data, responsive desktop/mobile UI, high-DPI canvas compatibility, etc.
  • AND security.

I've worked over the years on websites with millions of users a month, sometimes even millions in 3-4 days.
When you get that many users, you always get attacked, and hard.
So I have the habit of investing quite a lot of time in protecting, at least a little, the web services I build.
Security through obscurity is never the way, but that doesn't mean you should give away hints about what you have done to mitigate attacks. That's why I can't share the code.

Developing an awesome machine learning model and deploying it in production, even at a humble scale like mine, are two very different things that take different skill sets.

I'm much better at the second than at the first, so feel free to ask me anything. If I can help, I will.

@hungtooc

Hi @Kikedao, here is my ONNX inference code. I ran it on a free Colab GPU: https://colab.research.google.com/drive/1yfgGdoFDEOrGQCB2Cevl-GAAUChPQlzI?usp=sharing

@Kikedao
Author

Kikedao commented Apr 19, 2022

@hungtooc I just ran your Colab notebook. At first glance it's very nice, and the speed metrics are awesome.

But come on, I'm a frontend programmer, why don't you show me/us the input and output images? :)

More seriously, it would be nice if you posted input and output image comparisons, side by side, so that people reading this thread in the future can get good information.

@nissansz

Could you send the code to convert the original model to ONNX? [email protected]

@nissansz

Could you send the code to convert the original model to ONNX? [email protected]

@deshwalmahesh

Hi @Kikedao, this is an awesome app you have built. Much respect. I have just one question: are you using any other pre- or post-processing methods? I compared the official implementation's results with your web app's results, and even though you have quantized the weights, your results are better on a couple of images I tried with the original model. I'm curious how that could be unless you are using some extra pre- or post-processing, given that you say you have not retrained the model.

@juergengunz

@Kikedao could you share what pre- and post-processing you are using in your app? The results are way better than the original.

@aycaecemgul

@Kikedao could you share the pre- and post-processing you are using? I have tried some examples, and the difference is astronomical; it is as if you were using another model. I've tried some morphological filters and tried to tune alpha matting, but they did not help at all.

@animus22

animus22 commented Jun 16, 2022

Hi @Kikedao, thank you for sharing your great work!! It works really well, and for some images it even seems better than SOTA matting algorithms.
I was working on extracting NON-human foreground objects, and for some objects your work seems to extract more precise edges.
I was wondering whether there is any fine-tuning or post-processing for non-human images.

Here are some results on non-human samples.
The 'combined' results are from MODNet.
silhouette_me_alpha (3)
silhouette_me_alpha (2)
silhouette_me_alpha (1)
combined_shoe1
combined_test_animal_01
combined_test_object_01

@sampatsharma143

@Kikedao @anguoyang @juergengunz @xuebinqin @mutsuyuki I am new to the world of ML and AI, and I don't know how to convert this .pth file into an ONNX file that I can use in the rembg code.
Please help me, with code, if anyone knows how to do it.

@deshwalmahesh

A simple Google search would give you hundreds of blogs on how to do it. Start from this official one

@sampatsharma143

@deshwalmahesh thanks for the reply. The problem was that I was not able to create the dummy inputs for the ONNX conversion, but I found code that converts u2net.pth to u2net.onnx.

@deshwalmahesh

Keep going through that blog post and you'll find code that shows the inference too. Start from the basic concepts instead of jumping straight to the advanced ones.

@NingNanXin

I converted the .pth to .onnx but get wrong inference results with the ONNX model. You can see the details in this issue: WRONG. I spent about 3 days trying to solve this problem, with no results. So sad...

Could you take a look at what is wrong? Or should I modify the model structure? Thanks for your help.

@Edward-Zhou

@Kikedao Thanks for sharing the smaller model. I also have a size limitation when deploying my app. When I use rembg, it automatically downloads the models. Could you share how to make rembg work with your model if I include your model file in my project?

@SuyueLiu

Thanks for your great work.
I saw on your website that closed-form matting is used as a post-processing method. Could you explain in more detail how it is done?

@farazBhatti

PyTorch to ONNX model conversion script:

import glob
import os

import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

from data_loader import RescaleT
from data_loader import ToTensor
from data_loader import ToTensorLab
from data_loader import SalObjDataset

from model import U2NET # full size version 173.6 MB
from model import U2NETP # small version u2net 4.7 MB

def main():
    model_name='u2net'
    export_directory = "exported models"

    if not os.path.exists(export_directory):
        os.makedirs(export_directory)

    model_dir = os.path.join(os.getcwd(), 'saved_models', model_name, model_name + '.pth')
    image_dir = os.path.join(os.getcwd(), 'test_data', 'test_images')
    img_name_list = glob.glob(image_dir + os.sep + '*')

    test_salobj_dataset = SalObjDataset(img_name_list = img_name_list,
                                        lbl_name_list = [],
                                        transform=transforms.Compose([RescaleT(320),
                                                                        ToTensorLab(flag=0)])
                                        )
    test_salobj_dataloader = DataLoader(test_salobj_dataset,
                                        batch_size=1,
                                        shuffle=False,
                                        num_workers=1)

    if(model_name=='u2net'):
        print("...load U2NET---173.6 MB")
        net = U2NET(3,1)
    elif(model_name=='u2netp'):
        print("...load U2NETP---4.7 MB")
        net = U2NETP(3,1)

    # Load on the CPU so the model and the CPU dummy input used for export are on the same device.
    net.load_state_dict(torch.load(model_dir, map_location='cpu'))
    net.eval()

    for data_test in test_salobj_dataloader:
        inputs_test = data_test['image']

        inputs_test = inputs_test.type(torch.FloatTensor)
        dummy_input = inputs_test # torch.randn(1, 3, 320, 320).type(torch.FloatTensor)
        torch.onnx.export(net, dummy_input, export_directory+"/"+model_name+".onnx",opset_version=10)

        break



if __name__ == "__main__":
    main()

@theranajayant

This app has a CORS error and gets stuck at startup. The browser console shows: "Access to XMLHttpRequest at 'https://superportraitsegmentation.herokuapp.com/status' from origin 'https://silueta.me' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource."
