Failure getting EditGAN to run in Colab #8

Open
dubtor opened this issue Apr 8, 2022 · 12 comments

dubtor commented Apr 8, 2022

Thanks for sharing your code! I tried getting this to run in Google Colab. After some hassle and experimenting, I at least got it to the point where it loads up.

If I click something, like checking a box or uploading a file, I see the following errors:

[screenshots of browser error messages]
The UI seems unresponsive. It is not clear to me whether that is due to an actual problem, or because I don't know how to use the software. Any hints?

Thank you 🙏

dubtor commented Apr 8, 2022

Well, I just noticed myself that it seems to have failed to load demo_origin.js -- which is probably why the JS functions are not found. Will update this issue.

arieling commented Apr 8, 2022

The demo JS is at static/demo_origin.js.

Could you please also let me know how it goes on Google Colab? I would like to help, and to update the released code with a Colab option if possible.

dubtor commented Apr 10, 2022

Thank you @arieling - I have not yet gotten it to work properly, but I have at least solved the initial problem (which is why I am also adjusting the title of this ticket).

I managed to create a Colab environment using the mentioned package versions and added ngrok to run_app.py to tunnel localhost:8888 to a public URL. I had some trouble loading the local files from 'index.html' because of what looked like CORS issues, so 'demo_origin.js' and 'demo.css' would not load. For the moment, I worked around this by inlining both the script and the CSS into index.html itself.
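
For reference, the tunnel addition was roughly the following. This is just a sketch assuming the pyngrok package (the "NgrokTunnel" line in the log below matches its output format); the exact placement inside run_app.py may differ:

```python
# Sketch: expose the local Flask port through an ngrok tunnel (assumes pyngrok).
from pyngrok import ngrok

ngrok.kill()                      # stop any stale tunnel processes
public_url = ngrok.connect(8888)  # forward a public URL to localhost:8888
print("Open URL in browser:", public_url)
```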

The app now loads, and I can draw on the left and click the middle button (which, judging from the videos, is the 'process' button). Once I click it, the app appears to run out of CUDA memory.

This is where I currently stand. I don't know whether the memory is really full, or whether something else isn't working properly. I am running on Google Colab Pro+ with extended RAM.

The reported hardware is:
```
Sun Apr 10 12:35:19 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    27W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

The log output of run_app.py is as follows:
```
ngrok: no process found
Starting server...
Server ready...
Open URL in browser: NgrokTunnel: "http://e171-34-90-74-42.ngrok.io/" -> "http://localhost:8888/"

 * Serving Flask app 'run_app' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on all addresses.
   WARNING: This is a development server. Do not use it in a production deployment.
 * Running on http://172.28.0.2:8888/ (Press CTRL+C to quit)
    Current working directory: /content
    Experiment folder created at: ./static/samples
    Experiment folder created at: ./static/results
    Experiment folder created at: ./static/upload_latents
    Load stylegan from, ./checkpoint/stylegan_pretrain/stylegan2_networks_stylegan2-car-config-f.pt at res, 512
    make_mean_latent
    Load Classifier path, ./checkpoint/datasetgan_pretrain/classifier
    Setting up Perceptual loss...
    Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.cache/torch/checkpoints/vgg16-397923af.pth
    100% 528M/528M [00:02<00:00, 247MB/s]
    Loading model from: /content/EditGAN-Robert/lpips/weights/v0.1/vgg.pth
    ...[net-lin [vgg]] initialized
    ...Done
    0% 0/10 [00:00<?, ?it/s]/usr/local/lib/python3.8/site-packages/torch/nn/functional.py:2503: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
    warnings.warn("Default upsampling behavior when mode={} is changed "
    100% 10/10 [00:11<00:00, 1.15s/it]
    TOOL init!!
    127.0.0.1 - - [10/Apr/2022 12:45:23] "GET / HTTP/1.1" 200 -
    127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /static/loading.gif HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /static/images/car_real/0.jpg HTTP/1.1" 200 -
    127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /brush_circle.png HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /brush_square.png HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /brush_diamond.png HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /paint-brush.png HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /paint-can.png HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /eyedropper.png HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /undo.png HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /save.png HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /run.png HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /random.png HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/0.jpg HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/1.jpg HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/2.jpg HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/3.jpg HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/4.jpg HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/5.jpg HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/6.jpg HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/7.jpg HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/8.jpg HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/9.jpg HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/10.jpg HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /info.png HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:26] "GET /static/images/car_real/colorize_mask/0.png HTTP/1.1" 200 -
    127.0.0.1 - - [10/Apr/2022 12:45:26] "GET /favicon.ico HTTP/1.1" 404 -
    127.0.0.1 - - [10/Apr/2022 12:45:57] "GET /undo.png HTTP/1.1" 404 -
    Current image id: 0
    0% 0/29 [00:00<?, ?it/s]/opt/conda/conda-bld/pytorch_1579022027550/work/aten/src/ATen/native/IndexingUtils.h:20: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead.
    Warning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (expandTensors at /opt/conda/conda-bld/pytorch_1579022027550/work/aten/src/ATen/native/IndexingUtils.h:20)
    0% 0/29 [00:00<?, ?it/s]
    [2022-04-10 12:46:10,107] ERROR in app: Exception on /api/edit_from_mask [POST]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2073, in wsgi_app
        response = self.full_dispatch_request()
      File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1518, in full_dispatch_request
        rv = self.handle_user_exception(e)
      File "/usr/local/lib/python3.8/site-packages/flask_cors/extension.py", line 165, in wrapped_function
        return cors_after_request(app.make_response(f(*args, **kwargs)))
      File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1516, in full_dispatch_request
        rv = self.dispatch_request()
      File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1502, in dispatch_request
        return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
      File "/usr/local/lib/python3.8/site-packages/flask_cors/decorator.py", line 128, in wrapped_function
        resp = make_response(f(*args, **kwargs))
      File "/content/EditGAN-Robert/run_app.py", line 138, in edit_from_mask
        img_out, img_seg_final, optimized_latent = tool.run_optimization_editGAN(seg_mask, curr_latent, roi)
      File "/content/EditGAN-Robert/models/EditGAN/EditGAN_tool.py", line 378, in run_optimization_editGAN
        loss.backward()
      File "/usr/local/lib/python3.8/site-packages/torch/tensor.py", line 195, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/usr/local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 97, in backward
        Variable._execution_engine.run_backward(
    RuntimeError: CUDA out of memory. Tried to allocate 5.88 GiB (GPU 0; 15.90 GiB total capacity; 14.24 GiB already allocated; 231.75 MiB free; 15.03 GiB reserved in total by PyTorch)
    127.0.0.1 - - [10/Apr/2022 12:46:10] "POST /api/edit_from_mask HTTP/1.1" 500 -
```

dubtor changed the title from "Failure running in Colab / non-responsive UI" to "Failure getting EditGAN to run in Colab" on Apr 10, 2022

dubtor commented Apr 10, 2022

@arieling I can invite you to the Colab if you like, even though some of the settings are specific to my system. Feel free to reach out via Telegram @dubtor

udibr commented Apr 30, 2022

@dubtor I managed to run your fork on Colab using this notebook:
https://colab.research.google.com/drive/14nY3p9GG-yfzMziySVqs2zZZk5ArXFiY?usp=sharing

dubtor commented Apr 30, 2022

Thanks for sharing, @udibr! Does this Colab run the full demo for you? I tried running yours, and in my case it still runs out of CUDA memory, just like my earlier tests with my own Colab notebook. My version ran until I clicked the "process" button in the web UI; yours ran OOM already while the web app was booting up. I was using the GPU runtime on Colab Pro+. Did you do anything differently? Thank you!

udibr commented Apr 30, 2022

I do occasionally get OOM, but most of the time I don't. It could be because I'm using Colab Pro, which gives you better priority for the GPU card you are assigned.

I did manage to modify the tire and headlights of the car image, which was fun, but I have no idea how to use the rest of this app's features.

udibr commented Apr 30, 2022

I just tried again and indeed got OOM. I then did "Runtime -> Disconnect and delete runtime", re-ran the notebook, and it works.

udibr commented Apr 30, 2022

It looks like adding the following code at the very top of run_app.py helps:

```python
import os
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = "max_split_size_mb:1000"
import torch
torch.cuda.empty_cache()
```
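
Note: as far as I know, PYTORCH_CUDA_ALLOC_CONF is only read when the CUDA caching allocator initializes, so the environment variable needs to be set before the first CUDA allocation; that is why it goes above the torch import here.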

arieling commented May 2, 2022

Maybe you want to make the editing region smaller for a first test. Once you can deploy the model, memory usage depends on the area of your editing region.
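
To verify how much memory an edit of a given region size actually consumes, one illustrative check (assuming a reasonably recent PyTorch) is to print the allocator counters from inside the app process after an edit:

```python
import torch

# Rough check of GPU memory use from inside the app process.
allocated = torch.cuda.memory_allocated() / 2**30   # tensors currently held
reserved = torch.cuda.memory_reserved() / 2**30     # cached by the allocator
peak = torch.cuda.max_memory_allocated() / 2**30    # high-water mark so far
print(f"allocated {allocated:.2f} GiB, reserved {reserved:.2f} GiB, peak {peak:.2f} GiB")
```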

wandrzej commented May 4, 2022

> It looks like adding the following code at the very top of run_app.py helps:
>
> ```python
> import os
> os.environ['PYTORCH_CUDA_ALLOC_CONF'] = "max_split_size_mb:1000"
> import torch
> torch.cuda.empty_cache()
> ```

I've tried that fix and I am still getting the out-of-memory error on a P100. Restarting the runtime also didn't help.

One difference from the previous descriptions of the problem is that I can't get the URL to open anything.

Ley-lele commented

> It looks like adding the following code at the very top of run_app.py helps:
>
> ```python
> import os
> os.environ['PYTORCH_CUDA_ALLOC_CONF'] = "max_split_size_mb:1000"
> import torch
> torch.cuda.empty_cache()
> ```
>
> I've tried that fix and I am still getting the out-of-memory error on a P100. Restarting the runtime also didn't help.
>
> One difference from the previous descriptions of the problem is that I can't get the URL to open anything.

I ran into the same problem. You may want to check your project against the link below:
https://blog.csdn.net/qq_38677322/article/details/109696077
