Dreambooth doesn't train on 8GB #807
Comments
It seems that it's failing at pinning the allocated CPU memory. I'm not sure what the issue is, but it doesn't look like something that could be fixed in diffusers. Do you have multiple GPUs? That could limit the maximum pinned memory available: https://forums.developer.nvidia.com/t/max-amount-of-host-pinned-memory-available-for-allocation/56053/7

This PyTorch snippet should do the same thing: it allocates 16 GB of CPU memory and tries to pin it:

import torch

cpu = torch.device('cpu')
alloc_size = 16e9  # bytes

print('Allocating')
x = torch.zeros(int(alloc_size / 4),  # 4 bytes per float32 element
                dtype=torch.float32,
                device=cpu)
print('Pinning')
x = x.pin_memory()
print('Accessing')
m = torch.mean(x)
assert m == 0
print('Done') |
Check your WSL config file; it might have your max memory set to a much lower value. The relevant files are .wslconfig and wsl.conf (a minimal example is sketched below); see the docs here: https://learn.microsoft.com/en-us/windows/wsl/wsl-config In PowerShell you can do
You can change the memory limit to whatever you want, but I think WSL2 might cap it at some percentage of system RAM. Edit: Hmm, your output is
So 7.68 GB / 16.3% = 47 GB of CPU VM total, thus 39 GB should be available. The MA, Max_MA, CA, Max_CA are
some memory profiling suggestions here |
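For reference, a minimal .wslconfig sketch (placed in %UserProfile% on the Windows side; the sizes are placeholders, not recommendations) that raises the WSL2 memory cap. Run wsl --shutdown afterwards so it takes effect:

[wsl2]
# upper bound on RAM the WSL2 VM may use
memory=48GB
# optional swap file size
swap=8GB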
Sorry, forgot to mention: I already edited the WSL config to allow 48 GB of RAM. Keep in mind this also didn't work on native Linux. |
Yeah was editing my reply after I realized your logs suggested you had 48 GB available. |
Yep, this also fails. I do have an integrated Intel GPU, but it's turned off in the BIOS. |
I'd try pinning various sizes, say in 1 GB increments, or (slightly faster but more complicated) a binary search: keep halving until it succeeds, then step by half the gap between the last success and the last failure, repeating up/down until you get close enough (see the sketch below). That said, this seems like it could be a PyTorch-related bug, so you might post it in their tracker. Hmm, it might also relate to WSL, |
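For anyone who wants to try that, here's a rough, untested sketch of the bisection probe (plain PyTorch, sizes in bytes; the 64 GB upper bound is an assumption you should lower to fit your RAM):

import torch

def max_pinnable_bytes(lo=0.0, hi=64e9, tol=256e6):
    """Bisect for the largest single allocation that pin_memory() accepts."""
    cpu = torch.device('cpu')
    while hi - lo > tol:
        mid = (lo + hi) / 2
        try:
            # 4 bytes per float32 element
            x = torch.zeros(int(mid / 4), dtype=torch.float32, device=cpu)
            x = x.pin_memory()
            del x
            lo = mid   # pinning succeeded, try a larger size
        except RuntimeError:
            hi = mid   # pinning failed, try a smaller size
    return lo

print(f'Largest pinnable allocation: ~{max_pinnable_bytes() / 1e9:.2f} GB')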
Looks like it fails after 2.147481e9 (roughly 2 GiB). |
Ah here we go, it is (probably) an NVIDIA driver related limitation under WSL.
https://docs.nvidia.com/cuda/wsl-user-guide/index.html#known-limitations-for-linux-cuda-apps |
Yes, but unfortunately it happens under native Linux (Ubuntu) as well, although I did see reports of users having it working sometimes under WSL. |
Could you try running it under native Linux and give the debug output? Perhaps there is a different error/cause for it failing there? |
I'll have to reinstall it and report back |
You could also try wsl --update; if your WSL is older, it might have different pinning behavior than the current one. I'll check my WSL and see where pinning fails. Edit: Tried it here and confirmed it fails after more than 2.147481e9 in WSL. Also, it's the total pinned memory that matters: I tried allocating various smaller pins, and once more than the limit was reached it failed (minimal reconstruction below). |
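The exact script isn't shown above, but a minimal reconstruction of that cumulative check (pin small chunks until the first failure, with an assumed 8 GB stopping point so it doesn't eat all your RAM on a system where pinning works) looks roughly like this:

import torch

cpu = torch.device('cpu')
chunk_bytes = 256e6            # pin in 256 MB chunks
max_total = 8e9                # assumed cap: stop even if nothing fails
pinned, total = [], 0.0

while total < max_total:
    try:
        x = torch.zeros(int(chunk_bytes / 4), dtype=torch.float32, device=cpu)
        pinned.append(x.pin_memory())   # keep a reference so the pin stays live
        total += chunk_bytes
    except RuntimeError as e:
        print(f'Pinning failed after ~{total / 1e9:.2f} GB total: {e}')
        break
else:
    print(f'Pinned {total / 1e9:.2f} GB without hitting a limit')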
Having the exact same problem on a 2070 SUPER 8GB, 64GB RAM, 3950X, Windows 10 WSL2. The OP here apparently has it working under WSL on a 2070 non-Super: https://www.reddit.com/r/StableDiffusion/comments/xzbc2h/guide_for_dreambooth_with_8gb_vram_under_windows/?sort=new |
Note that you can pass a DeepSpeed config file and set pin_memory to false; this will slow things down but might allow it to work. See this config that uses NVMe offload; you can just replace NVMe with CPU (a minimal sketch follows). |
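For reference, a minimal ZeRO stage 2 config along those lines (CPU offload with pinning disabled; this is not the linked file, and the fp16 and batch-size values are assumptions to adjust for your run):

{
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": false
    }
  },
  "fp16": {
    "enabled": true
  },
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 1
}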
For reference, I can report that with my RTX3080 10GB and 32GB of RAM it works correctly on Linux. Note:
|
I'll try it again later tonight; from my previous tests it had the same issues, but we'll see. Seeing as some folks have it working in WSL, it's bizarre. |
The config I linked earlier wasn't parsable by Python's json module; here is one that parses. I don't know if it is set up correctly (most of you will want to change "nvme" to "cpu", and probably delete most of the other parameters). Note: I had to make it .txt since .json isn't allowed for attachments. |
This issue is very interesting. |
Pretty sure the memory pinning is limited to 2 GB. Please do
and use the following config file (change the .txt to .json) with the file in the local directory, then run things in the standard way, and I think you guys will be sorted out. |
Thanks! I will test it when I get home. |
|
use code:
Did I make a mistake?
|
Also make sure you generate your 200 class images before you try to run accelerate. You probably didn't make a mistake - my hope was just optimistic :) |
My WSL fails before it even gets to making the class images. |
I had exactly the same issue with my RTX 3070 and 32GB RAM, Ubuntu 22.04 running on WSL2 in Windows 11. However, I just solved it by updating to Windows 22H2 (I had 21H2 previously) and afterwards updating WSL with wsl --update. Maybe it helps you too. |
OK! Let's try this. |
I upgraded my Windows 10 21H2 WSL2 to Windows 11 22H2 WSL2. Now, after this, it goes OOM (without ds_config); if I use ds_config I get the same error. Code
|
Apparently I had already generated class images on earlier tries. When I try to generate new ones, I get the same problem as you. If I generate the class images with some other Stable Diffusion instance and put them into the class folder, everything works (as there are no more class images to be generated). Maybe this could be a workaround (rough sketch below); I currently haven't checked it any further. Edit: |
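For reference, a rough sketch of that workaround: pre-generate the class images with a plain StableDiffusionPipeline in fp16 and drop them into the class folder. The model id, prompt, count and paths below are placeholders; match them to your --class_prompt, --num_class_images and --class_data_dir.

import torch
from pathlib import Path
from diffusers import StableDiffusionPipeline

class_dir = Path("class_images")           # the folder you pass as --class_data_dir
class_dir.mkdir(parents=True, exist_ok=True)

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",       # placeholder model id
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a photo of a dog"                # should match your --class_prompt
for i in range(200):                       # should match --num_class_images
    image = pipe(prompt).images[0]
    image.save(class_dir / f"class_{i:04d}.png")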
To reduce VRAM usage while generating class images, try to use |
Perfect! Generating class images works now, using 6GB VRAM and 11GB RAM on Win11 WSL2. Training now uses 7.7GB VRAM and 4.6GB shared memory and runs at 5~6 s/it, but
I don't know if the [deepspeed] OVERFLOW! message is okay. Code Out
|
What did you change exactly? I already have class images made and am using the ds_config, and it still won't work. |
When you do wsl --update, what kernel version does it say you have? |
Yes, it's okay. For me, it repeats about 10 times at the start and then it might rarely occur again later. I think that this message means that DeepSpeed is adjusting settings to fit the model into currently available VRAM. |
I hope it helps you |
OK, I can confirm that updating to the preview Windows release (22H2) and doing wsl --update got me past the pinning issue; only the overflow msg keeps popping up. |
Were you using Windows 10 21H2 WSL2 before? P.S. |
Updated to 22H2 and can now pin large amounts of memory as well. |
cc'ing @patil-suraj here for dreambooth |
Did any of you ever get it working on Windows 10? I can't pin large amounts of RAM, and I have all Windows updates applied and run
I'm reluctant to update to Windows 11 unless necessary, so I'm just curious whether this is the current solution. |
It won't work on Windows 10. It didn't work on Windows 11 until a week ago with the latest update, and Windows 10 has stopped getting such updates. Making it work apparently required OS support to enable larger memory pinning in WSL. |
Like you, I didn't want to upgrade my Windows 10, but I upgraded to Win11 22H2 to make this work. |
@patil-suraj re-pinging you here - could you take a look? :-) |
Thanks for the issue! Looks like some of you already got it working. I'm not an expert on DeepSpeed, but I can run it fine on Linux machines, so I'm not sure what the issue is. Also note that, when using prior preservation, you can also generate prior images beforehand or grab them from the internet and just specify the path to that dir (example command below). This way, we won't have to mess with |
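For reference, a sketch of what that can look like on the command line (flag names follow the diffusers DreamBooth example script; paths, prompts and hyperparameters are placeholders to adapt), with --class_data_dir pointing at the pre-generated images:

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --instance_data_dir="instance_images" \
  --class_data_dir="class_images" \
  --output_dir="dreambooth_output" \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of a dog" \
  --num_class_images=200 \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --max_train_steps=800 \
  --mixed_precision=fp16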
The Windows 10 22H2 update has been released (yesterday, I think). However, it sadly does not solve the issue of memory pinning past 2GB in WSL2. I guess Windows 11 is the only way to go for this to work. |
Thanks everyone 🎉 |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
Sorry, just to be sure: is this issue closed, or is there still an open problem/bug? |
I still cannot run it on Windows 10 22H2, so I presume the only way would be Windows 11, although Reddit users on Windows 11 also have issues... |
We really need to get better support for Windows! |
Did you test to see whether the 22H2 update on Windows 10 increased the amount of memory you can pin? If the update didn't do so, then it still won't work on Windows 10. Try the test mentioned above:
|
Describe the bug
Per the example featured in the repo, it goes OOM when DeepSpeed is loading the optimizer, tested on a 3080 10GB + 64GB RAM in WSL2 and native Linux.
Reproduction
Follow the pastebin for setup purposes (on WSL2), or just try it yourself https://pastebin.com/0NHA5YTP
Logs
System Info
3080 10GB + 64GB RAM, WSL2 and Linux