Memory Corruption Error in Kernel _setup_linear #56

Open
ethanbar11 opened this issue Jul 20, 2022 · 2 comments

@ethanbar11

ethanbar11 commented Jul 20, 2022

Hey,
I'm trying to use the forward_state function. From time to time, I get
this error:

RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1

The error is raised from:

File "/media/data2/ethan_baron/state-spaces-improv/src/models/sequence/ss/kernel.py", line 434, in _setup_linear
    R = torch.linalg.solve(R.to(Q_D), Q_D)  # (H r N)
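As the message warns, CUDA errors are reported asynchronously, so line 434 may not be where the bad memory access actually happens. Below is a minimal sketch of following the message's hint from inside Python. It assumes the variable is set before PyTorch initializes CUDA, which is why it comes before the torch import. Exporting it in the shell before launching the script works just as well.

```python
import os

# Make every CUDA kernel launch synchronous, so the traceback points at the
# kernel that actually faulted. This only takes effect if it is set before
# CUDA is initialized, so do it before importing torch and building the model.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

Synchronous launches slow training down noticeably, so this setting is only worth enabling while reproducing the crash.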

The failing call is in these lines (433-436) of the NPLR kernel:

        try:
            R = torch.linalg.solve(R.to(Q_D), Q_D)  # (H r N)
        except torch._C._LinAlgError:
            R = torch.tensor(np.linalg.solve(R.to(Q_D).cpu(), Q_D.cpu())).to(Q_D)
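The try/except in the kernel is a GPU-then-CPU fallback for a batched linear solve. Below is a self-contained sketch of the same pattern as a standalone helper. The name `solve_with_cpu_fallback` is hypothetical and does not come from the repository. The sketch assumes PyTorch 1.13 or later, where `torch.linalg.LinAlgError` is, to our knowledge, the public name for `torch._C._LinAlgError`.

This kind of fallback only catches numerical failures, such as a singular matrix. An illegal memory access is a different `RuntimeError`. Once it is raised, it generally leaves the CUDA context unusable, so no fallback running in the same process can recover from it.

```python
import numpy as np
import torch


def solve_with_cpu_fallback(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Solve A @ X = B (batched), retrying with NumPy on the CPU if torch fails.

    Hypothetical helper mirroring the try/except in _setup_linear.
    """
    A = A.to(B)  # match B's dtype and device, like R.to(Q_D)
    try:
        return torch.linalg.solve(A, B)
    except torch.linalg.LinAlgError:
        # Only numerical failures (e.g. a singular A) land here. NumPy needs
        # detached CPU arrays, and the right-hand side must be B, not A.
        X = np.linalg.solve(A.detach().cpu().numpy(), B.detach().cpu().numpy())
        return torch.from_numpy(X).to(B)  # back to B's dtype and device
```

With batched shapes like the `(H, r, r)` and `(H, r, N)` tensors in `_setup_linear`, both `torch.linalg.solve` and `np.linalg.solve` treat the leading dimension as a batch dimension.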

For debugging, I slightly changed lines 433-436 to:

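try:
    R = torch.linalg.solve(R.to(Q_D), Q_D)  # (H r N)
except Exception:
    # Debug fallback: retry the solve on the CPU with NumPy
    x1 = R.to(Q_D).cpu()  # left-hand side: R
    x2 = Q_D.cpu()  # right-hand side: Q_D
    R = torch.tensor(np.linalg.solve(x1, x2)).to(Q_D)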
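When you debug a fallback like this, it helps to verify the CPU result directly. A wrong argument still "succeeds": for example, passing R as the right-hand side silently returns the identity matrix when R is invertible. Below is a minimal NumPy-only sketch. `checked_solve` and its absolute tolerance are illustrative assumptions and are not part of the S4 code.

```python
import numpy as np


def checked_solve(A, B, atol=1e-6):
    """Solve A @ X = B with NumPy and verify the residual before returning X.

    Hypothetical debugging helper; the absolute tolerance is arbitrary.
    """
    X = np.linalg.solve(A, B)
    residual = np.max(np.abs(A @ X - B))  # works for batched and complex inputs
    if residual > atol:
        raise ValueError(f"solve residual too large: {residual:.3e}")
    return X
```

A passing residual check only shows that X solves the system you asked for. It cannot tell you whether you passed the right right-hand side, so checking the result's shape and values against expectations is still worthwhile.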

EDIT: Removed stacktrace (was quite unhelpful and long) and edited the code to be in code snippets.

@albertfgu
Contributor

I looked into this recently and also found the same issue, which wasn't present before. I wasn't able to figure out why. It's weird that it happens randomly.

Regardless, the "state forwarding" implementation (see the README) is currently unoptimized for S4, so using it is not recommended. If you need this functionality, it should work with S4D. Feel free to file another issue if something comes up.

Finally, could you please edit the original issue to be shorter, and in particular remove at least the last part of the stack trace? It might also help to put the whole thing in a code block. The last few lines are parsed as references to other issues, which is confusing.

@ethanbar11
Author

Yeah, I looked into it for a couple of days and couldn't figure out what happened.
I'm now using the S4D forward_state version, and so far it works quite well.
I've edited the issue; hopefully it's more readable now.
Thanks!
