Memory Corruption Error in Kernel _setup_linear #56

ethanbar11 · 2022-07-20T15:26:56Z

Hey,
I'm trying to use the forward_state function. From time to time, I get
this error:

RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1

It is raised from:

File "/media/data2/ethan_baron/state-spaces-improv/src/models/sequence/ss/kernel.py", line 434, in _setup_linear
    R = torch.linalg.solve(R.to(Q_D), Q_D)  # (H r N)

That is, it comes from these lines (433-436) in the NPLR kernel:

        try:
            R = torch.linalg.solve(R.to(Q_D), Q_D)  # (H r N)
        except torch._C._LinAlgError:
            R = torch.tensor(np.linalg.solve(R.to(Q_D).cpu(), Q_D.cpu())).to(Q_D)

For debugging, I changed these lines slightly, to:

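try:
    R = torch.linalg.solve(R.to(Q_D), Q_D)  # (H r N)
except:  # bare except, for debugging only: catches every error, not just _LinAlgError
    x1 = R.to(Q_D).cpu()  # left-hand side, moved to the CPU
    x2 = Q_D.cpu()  # right-hand side, as in the original fallback
    R = torch.tensor(np.linalg.solve(x1, x2)).to(Q_D)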
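A note on the fallback pattern above: a bare except also catches errors that have nothing to do with the linear solve, including the RuntimeError from the illegal memory access. Below is a minimal, library-agnostic sketch of a narrower fallback. The helper name solve_with_fallback and its parameters are hypothetical and not part of the repository; with PyTorch, the recoverable type would presumably be the torch._C._LinAlgError already used in the original lines.

```python
def solve_with_fallback(primary, fallback, recoverable, *args):
    """Call primary(*args); use fallback(*args) only for `recoverable` errors.

    Hypothetical helper for illustration. `recoverable` is an exception
    class (or tuple of classes) that signals the primary solver failed
    numerically. Any other exception propagates unchanged, so unrelated
    failures are not silently masked by the fallback.
    """
    try:
        return primary(*args)
    except recoverable:
        # Only the expected failure mode reaches the slower fallback path.
        return fallback(*args)
```

With this shape, an unexpected error in the primary solver still surfaces at the call site instead of being hidden behind the CPU path.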

EDIT: Removed stacktrace (was quite unhelpful and long) and edited the code to be in code snippets.
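For anyone trying to reproduce this: the error message suggests CUDA_LAUNCH_BLOCKING=1, which makes kernel launches synchronous so that the stack trace points at the call that actually failed. A minimal sketch of setting it from Python follows, assuming it runs before CUDA is initialized (setting it before importing torch is the safe choice). The helper name is made up for illustration.

```python
import os


def enable_blocking_cuda_launches():
    """Set CUDA_LAUNCH_BLOCKING=1 for this process and return the value.

    Hypothetical helper. It should run before CUDA is initialized,
    so call it before importing torch.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return os.environ["CUDA_LAUNCH_BLOCKING"]


enable_blocking_cuda_launches()
# import torch  # import only after the variable is set
```

The same effect can be had by prefixing the launch command with CUDA_LAUNCH_BLOCKING=1 in the shell. Expect the run to be slower while it is enabled.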


albertfgu · 2022-08-09T18:15:25Z

I looked into this recently and also found the same issue, which wasn't present before. I wasn't able to figure out why. It's weird that it happens randomly.

Regardless, the implementation of "state forwarding" (README) is currently unoptimized for S4 so it is not recommended to use this. If you want this functionality, it should work with S4D. Feel free to file another issue if something comes up.

Finally, could you please edit the original issue to be shorter, and in particular remove at least the last part of the stack trace? It might also help to put the whole thing in a code block. The last few lines are parsed as references to other issues, which is confusing.

ethanbar11 · 2022-08-10T05:41:52Z

Yeah, I looked into it for a couple of days and couldn't work out what was happening.
I'm now using the S4D forward_state version, and so far it works quite well.
I've edited the issue; hopefully it's more readable now.
Thanks!
