Run more tests in parallel #35
Hey @raphlinus, thanks for reaching out! Exciting to see other people interested in the minutiae of WebGPU's memory model :). I took a look at your example; the parallelism aspect is very interesting and definitely something worth exploring. I'd be curious to see if we could integrate parts of the stressing we do in our tests with the parallelism from your tests to get higher overall throughput and maybe reveal some more interesting behaviors. So yes, open to these types of changes!

There's one other difference I found in our tests: our non-atomic variant of message passing (from the dropdown menu at https://users.soe.ucsc.edu/~reeselevine/webgpu/tests/message-passing/) includes an if condition before the final non-atomic read. This is because if the two barriers don't end up synchronizing the threads, there would otherwise be a data race between the read of the "data" variable and the subsequent write. However, this conditional might be impacting our results. I went ahead and wrote a non-atomic variant that drops the conditional (#37), and you should see an option for it in the dropdown menu.
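To make the test shape concrete, here is a minimal sketch of a message-passing litmus with the guarded non-atomic read described above. It is not the project's actual shader: the bindings, names, and workgroup size are invented, and the WGSL is shown as a string constant the way it might sit in a Rust/wgpu harness. In this minimal arrangement both testing threads share one workgroup, so the two barriers line up; the framework's shuffling can place the testing threads in different workgroups, which is exactly when the guard matters.

```rust
// Hypothetical sketch of a message-passing litmus test with a guarded
// non-atomic read. Not the project's actual shader; names, bindings, and
// workgroup size are illustrative. Assumes the host dispatches a single
// workgroup and sizes `results` for two u32 values.
const MESSAGE_PASSING_WGSL: &str = r#"
@group(0) @binding(0) var<storage, read_write> data: array<u32>;          // non-atomic "data"
@group(0) @binding(1) var<storage, read_write> flag: array<atomic<u32>>;  // atomic "flag"
@group(0) @binding(2) var<storage, read_write> results: array<u32>;       // observed r0, r1

@compute @workgroup_size(2)
fn main(@builtin(local_invocation_id) lid: vec3<u32>) {
    var r0 = 0u;
    // First instruction of each testing thread.
    if (lid.x == 0u) {
        data[0] = 1u;               // producer: non-atomic write of data
    } else {
        r0 = atomicLoad(&flag[0]);  // consumer: atomic read of flag
    }
    storageBarrier();               // barrier kept in uniform control flow
    // Second instruction of each testing thread.
    if (lid.x == 0u) {
        atomicStore(&flag[0], 1u);  // producer: atomic write of flag
    } else {
        var r1 = 0u;
        if (r0 == 1u) {             // the guard discussed above; dropping it
            r1 = data[0];           // gives the unconditional-read variant
        }
        results[0] = r0;            // record what the consumer observed
        results[1] = r1;
    }
}
"#;

fn main() {
    // In a wgpu-based harness this source would be compiled with
    // wgpu::ShaderSource::Wgsl(MESSAGE_PASSING_WGSL.into()).
    println!("{}", MESSAGE_PASSING_WGSL);
}
```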
@tyler-utah also has an AMD GPU (a 5500), so we're going to see if he can get any violations as well.
Well, that's very interesting indeed.

If my understanding is correct, in this non-atomic version the unconditional read is a race, which I think makes my test invalid (it would be UB in C), but I'm not 100% sure of the implications. I've also tested an atomic version where both data and flag are atomic.

Unless I'm missing something (very likely, this stuff is confusing), I believe that configuration of parallel tests, atomic data and flag, barriers, and an unconditional load is a real live violation of the memory model, and as such likely worth tracking down.

There are still some mysteries. If you look at my draft prefix sum implementation, you'll see that there the data read (which happens to be atomic, though I don't think that's very important) is protected by an if.

I have spent some time experimenting with your version of the test, including fiddling with the stress parameters a bit, but have not (yet) found a configuration that I read as a violation of the memory model.
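To make that configuration concrete, here is a hedged sketch of a parallel variant with atomic data and flag, barriers, and an unconditional final load. It is not the actual shader from either repository; the names, pairing scheme, and workgroup size are invented. Every invocation publishes its own slot and then observes a partner slot in the same workgroup, so one dispatch runs one message-passing instance per pair of invocations.

```rust
// Hypothetical sketch of the "parallel tests, atomic data and flag, barriers,
// unconditional load" configuration described above. Not the actual shader
// from either repository. Assumes the host sizes `data` and `flag` with one
// element per invocation and `results` with two.
const PARALLEL_MP_WGSL: &str = r#"
@group(0) @binding(0) var<storage, read_write> data: array<atomic<u32>>;
@group(0) @binding(1) var<storage, read_write> flag: array<atomic<u32>>;
@group(0) @binding(2) var<storage, read_write> results: array<u32>;

@compute @workgroup_size(256)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    let me = gid.x;
    let partner = me ^ 1u;         // pairs invocations 2k and 2k+1 (same workgroup)

    // Producer half: publish this invocation's own slot.
    atomicStore(&data[me], 1u);
    storageBarrier();
    atomicStore(&flag[me], 1u);

    // Consumer half: observe the partner's slot, with no guard on the
    // final load (the unconditional-load configuration).
    let r0 = atomicLoad(&flag[partner]);
    storageBarrier();
    let r1 = atomicLoad(&data[partner]);

    results[2u * me] = r0;         // the outcome being hunted: r0 == 1 && r1 == 0
    results[2u * me + 1u] = r1;
}
"#;

fn main() {
    println!("{}", PARALLEL_MP_WGSL);
}
```

Because every access here is atomic, there is no data race to make the outcome undefined, which is why a reproducible r0 == 1 && r1 == 0 result in this configuration would point at the memory model (or its implementation) rather than at the test.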
I think the change to your page is wrong, btw. It doesn't affect my results, because I mostly ran the tests in Rust, but I also patched the web version locally.
Ah, yes, that is a bug, thanks for catching it. I agree, if you can still see violations with all the accesses made atomic, that would definitely be a memory model bug. I think the non-atomic variant would end up being undefined, but since the atomic variant is also giving you bugs, I don't think that ends up being too important. I'd like to spend some time seeing if we can reproduce your memory model bug in the browser framework; I might try to write a parallel shader like you have and see if that works. It would be great to incorporate the parallelism into the stressing framework.
After lots of conversations and work, we now have parallel tests! #40 has the changes, but it's probably easier to just check out how they work on the website: https://gpuharbor.ucsc.edu/.
I've been doing some of my own GPU testing, and have been able to trigger failures on a test basically identical to "message passing" on my AMD 5700 XT; my code is at: googlefonts/compute-shader-101#11
Alan Baker pointed me at your repo, and I am very happy to see systematic litmus testing. I was curious whether your litmus tests would also be able to trigger similar failures. They are not (with barriers in place; I can see failures when running purely relaxed). I've been trying to bisect exactly what causes the difference, and think it's parallelism; I was trying to run a huge number of tests in one dispatch, and you're basically running a single litmus.
I think there's a lot of value in running more tests in parallel, both to catch things which might otherwise slip through, and also to provide higher testing throughput. It's a nontrivial change to the code, though. I'm filing this issue partly to ask whether you're interested in pursuing it, or open to such changes.
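One concrete attraction of the parallel arrangement, sketched here with hypothetical names and the [r0, r1]-per-instance results layout used in the sketches above: a single dispatch yields thousands of independent instances, and the host only has to scan the readback for the interesting outcome.

```rust
/// Count message-passing weak outcomes (flag seen, data not) in a results
/// buffer read back from the GPU, assuming the [r0, r1]-per-instance layout
/// used in the sketches above. Name and layout are illustrative.
fn count_weak_outcomes(results: &[u32]) -> usize {
    results
        .chunks_exact(2)
        .filter(|pair| pair[0] == 1 && pair[1] == 0)
        .count()
}

fn main() {
    // With thousands of instances per dispatch, even rare reorderings show up
    // after a modest number of dispatches.
    let readback = vec![1, 1, 1, 0, 0, 0]; // three instances, one weak outcome
    println!("weak outcomes: {}", count_weak_outcomes(&readback));
}
```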