S4D Memory Requirements #51
The implementation of S4D originally uploaded to this repo did not have a custom kernel for the Vandermonde multiplication. Materializing the matrix uses substantially more memory. S4D was originally meant to be pedagogical, whereas S4 usually requires less tuning out of the box, so I didn't implement the more efficient S4D kernel at first. I now have a version using a pykeops kernel.
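To make the memory point concrete, here is a rough sketch of the naive S4D kernel computation; the shapes, names, and simplified formula below are illustrative assumptions rather than this repo's actual code, and a fused kernel (e.g. via pykeops) computes the same contraction without ever building the large intermediate tensor.

```python
# Illustrative sketch (not the repo's code): building the S4D convolution kernel
# naively materializes an (H, N, L) Vandermonde tensor before reducing over N.
import torch

H, N, L = 256, 64, 1024                            # channels, state size, sequence length (made-up sizes)
dtA = -torch.rand(H, N) + 1j * torch.randn(H, N)   # placeholder for dt * A (diagonal, complex)
BC = torch.randn(H, N, dtype=torch.cfloat)         # placeholder for the elementwise B * C terms

positions = torch.arange(L)
V = torch.exp(dtA.unsqueeze(-1) * positions)       # (H, N, L) intermediate: this is what dominates memory
K = torch.einsum('hn,hnl->hl', BC, V).real         # final convolution kernel is only (H, L)

print(f"intermediate: {V.numel() * V.element_size() / 2**20:.0f} MiB for L={L}")  # grows linearly in L
```

For audio-scale sequence lengths the intermediate reaches gigabytes per layer, which matches the kind of blow-up reported in this issue.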
Ah, that makes sense, thanks! I'd love to test it, see what kind of effect it has for me, and report back :)
The new standalone file is here. This one includes all options for all models; for example, you would pass in mode='diag' to get S4D. The measures for S4D are a little different: they are the diagonal variants such as 'diag-lin'. I'm still thinking about what else to release - while this file has the full model with every option, I was thinking about releasing an intermediate file similar to the current standalone. In the next two days I will be working on double-checking and releasing the original SaShiMi + DiffWave repo, so stay tuned!
Awesome, I'll give it a try. So just to make sure that I understood everything correctly: to do a drop-in replacement of S4 -> S4D, I'd just take the S4 module and set mode='diag' plus one of the diagonal measures? Regarding your other question: right now, I'm trying to apply a variant of SaShiMi trained in a diffusion context for my research. As I'm training on longer sequences than with Speech Commands, I run into memory limitations quickly (> 40 GB of memory usage for a forward pass on one GPU with a batch size of 1/GPU), so I'm particularly interested in any additional efficiency I can get on top of the more common efficiency optimizations.
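For concreteness, here is a minimal sketch of the swap being described, using the same call signature that appears in the reproduction snippet later in this thread; the import path is an assumption and may differ from the actual standalone file.

```python
# Hypothetical sketch of the S4 -> S4D drop-in swap; the import path is assumed,
# and the exact keyword values should follow the maintainer's reply above.
from s4 import S4  # e.g. the standalone file discussed in this thread

d_model = 256

# original S4 layer (NPLR parameterization)
s4_layer = S4(d_model, bidirectional=True, mode='nplr')

# S4D drop-in: diagonal parameterization with one of the diagonal measures
s4d_layer = S4(d_model, bidirectional=True, mode='diag', measure='diag-lin')
```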
Trying it out in practice currently causes the pykeops implementation to raise an exception for me. The same code was working with the previous S4D module, and I'm simply instantiating the S4 modules with the arguments discussed above.
Hmm, I went to the v3 branch and then ran the standalone test, which works for me. Can you try this? What version of torch and pykeops are you on?
Yes, I can confirm that this works for me, too. I'm on
Those are my versions too. Is it possible that the code that errors is instantiating the wrong version of the module?
You mean that I'm using a version other than that from the v3 branch of the repo for the model that errors out? I don't see how this could happen in my case.
Ok, I'm not sure what the difference is between the instantiation that errors and the one that doesn't. They are being instantiated with the same call, right?
Yes, the call is the same, except for the value for d_model.
Can you specify what value you're using then?
It's variable, different values from [64, 128, 256], as in the default small version of SaShiMi.
Also, you can try the latest version of pykeops. You can also try to follow their instructions to clear the cache here: https://www.kernel-operations.io/keops/python/installation.html#part-checkpython
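In case it helps, a small sketch of the sanity checks and cache clearing described on that page; the helper names are taken from the pykeops docs as I understand them and may differ slightly between pykeops versions.

```python
# Sketch of the pykeops sanity checks / cache clearing from the linked docs
# (helper names assumed from the installation page; they may vary by version).
import pykeops

pykeops.clean_pykeops()        # wipe previously compiled kernels from the cache
pykeops.test_numpy_bindings()  # recompiles a toy kernel and prints a success message
pykeops.test_torch_bindings()  # same check for the torch bindings
```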
Okay, this is really weird - looking more deeply into this, it seems like I randomly get this kind of error (and it sometimes changes). I don't know what causes it, but it appears to depend on the system I train on, so it's most probably not on your side of the code - please excuse the commotion. I can confirm now, though, that I got your pykeops implementation to work on one machine in both a single- and dual-GPU setup. I should be able to report some results on the performance impact of the new version tomorrow, provided that I get it to work at full scale.
Ok, feel free to file a separate issue as you narrow this down - this seems like quite a strange bug. Also, just to confirm: does this happen in the pykeops version of S4, i.e. if you pass in keops=True?
So, first of all, the promised results regarding the performance impact. They come from an analysis of my full model that I was doing anyway, so they don't isolate S4D, but they should at least give a good lower bound on the improvements, in case it's interesting to you.
Did you actually test training the current version of the S4 module in practice, btw? Going back from S4D to S4 (and using the default settings instead of the diagonal ones) fails torch's gradient check for me. I condensed it down to the code below to reliably trigger the issue on my end (adding it to the end of the standalone file):

```python
if __name__ == '__main__':
    torch.manual_seed(42)

    device = 'cuda'  # 'cpu'
    device = torch.device(device)

    # model = S4(256, bidirectional=True, mode='diag', measure='diag-lin').to(device)  # works
    # model = S4(256, bidirectional=True, mode='nplr', keops=True).to(device)  # works
    model = S4(256, bidirectional=True, mode='nplr').to(device)  # torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0

    def f(x):
        return model(x)[0]

    print(model)
    torch.autograd.gradcheck(f, (torch.ones((1, 256, 2**6), requires_grad=True, device=device),))
```

If this is something you want to investigate, I can also create a separate issue; it just came up during the investigation of this problem, so I thought I'd post it here first.
This is really great to know! That's even a bit better than I expected.
This works fine for me. I ran my standard testing command.
If the code works for you with keops=True but not with the default kernel, the compiled Cauchy extension on that machine may be the problem - you would need to reinstall it for each machine. One more thing: you can try upgrading to the latest version.
Okay, I was not aware that I was expected to reinstall the Cauchy extension for every single machine. I've been using a virtual environment shared across multiple systems (same GPU setup), which has worked fine before. Creating a new virtual environment and reinstalling the Cauchy extension seems to have fixed the immediate problem, thanks! Maybe adding a quick hint about it in one of the READMEs would be helpful :)
Curious how things turned out - did you run into any other issues? Did you ever figure out what was going on with the multi-GPU issue?
So, in general, reinstalling the Cauchy extension seems to have resolved most of the issues I was facing. Regarding my multi-GPU problems, PyKeOps seems to have some issues when multiprocessing is used and the filesystem holding its cache is not particularly fast, which I could hotfix by removing its ability to cache builds. I assume that this problem might also be responsible for some other issues I encountered, as it seems to both randomly crash some processes and cause unexpected behaviour in others. I haven't investigated it enough to open an issue there yet, though. Apart from that, I have recently encountered some NaN gradients again when using the S4 layer in nplr mode with the default measure on the most recent v3 build and keops=True specified; they don't occur with any other S4 parameters I use. I haven't had the time to look further into this yet, and I haven't checked whether I can reliably reproduce it either. If I can nail it down somewhat, I'll open another issue.
Ah yes, I remember issues with pykeops 1.5 on multi-GPU and had to add some hacks with the cache folder. pykeops 2.1 is supposed to resolve some of these problems by avoiding the cache entirely; it has a much faster compilation time, so it just doesn't cache the kernels. So your NaN gradient issue only occurs with keops=True?
Yes, I experienced the same with 1.5. But 2.1 still had similar issues for me with its cache folder (it still creates one). My issues with the gradients only occur with keops=True, as described above.
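For reference, one way to sidestep a shared cache folder is to give each process its own build directory; the sketch below assumes pykeops 2.x, which exposes set_build_folder, and is only an illustration of the idea, not the workaround actually used here.

```python
# Illustrative sketch of a per-process build folder (assumes pykeops >= 2.x and its
# set_build_folder helper; 1.x used a different mechanism). Each worker compiles into
# its own scratch directory on fast local storage instead of a shared cache.
import os
import tempfile

import pykeops

build_dir = tempfile.mkdtemp(prefix=f"keops_build_{os.getpid()}_")
pykeops.set_build_folder(build_dir)
```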
Good to know, thanks for the report! File an issue if you dig into it more.
Hey, I wanted to give S4D a quick try in my research as a drop-in replacement for S4 (which, as far as I gathered, should be a good way to start), but I'm running into some hard memory limitations. I'm trying to train the DiffWave version of SaShiMi as a first experiment, but the memory requirements seem to increase significantly when replacing S4 with an equivalent S4D layer (with default settings), actually causing the model to go OOM in my case (so I don't have precise measurements, but it's at least a 20% increase in overall memory consumption). I use the parameters as discussed in #46. Is this something you'd expect?