Use EGL surfaceless platform when windowing system is not found #2339

Merged
merged 10 commits into from
Feb 19, 2022

Conversation

dsseng
Contributor

@dsseng dsseng commented Dec 31, 2021

Connections
Addresses #1551

Description
Falling back to egl::DEFAULT_DISPLAY usually results in the X11 EGL platform being picked and then rejected, because it is unavailable on a headless or GPU-less system. EGL_PLATFORM_SURFACELESS_MESA works with both radeonsi and llvmpipe/swrast when the Xorg/Wayland sockets are hidden from the application. This still needs to be tested in a truly GPU-less environment, such as the CI it is required to run in.
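
A minimal sketch of the fallback order described above, not the code from this PR. The `choose_display` helper, the `DisplayChoice` type, and the two `*_reachable` flags are hypothetical names introduced for illustration; only the platform enum values and extension names come from the EGL registry. A real implementation would hand the chosen platform to `eglGetPlatformDisplay` (or its EXT variant).

```rust
// Platform enum values as published in the Khronos EGL registry.
const EGL_PLATFORM_X11_KHR: u32 = 0x31D5;
const EGL_PLATFORM_WAYLAND_KHR: u32 = 0x31D8;
const EGL_PLATFORM_SURFACELESS_MESA: u32 = 0x31DD;

/// Hypothetical result type: either an explicit platform or the legacy
/// default display (egl::DEFAULT_DISPLAY in the PR description).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DisplayChoice {
    Platform(u32),
    Default,
}

/// Hypothetical helper. `client_extensions` is the space-separated string
/// returned by eglQueryString(EGL_NO_DISPLAY, EGL_EXTENSIONS). The two flags
/// are assumptions standing in for "a Wayland/X11 connection could be opened".
fn choose_display(
    client_extensions: &str,
    wayland_reachable: bool,
    x11_reachable: bool,
) -> DisplayChoice {
    let has = |name: &str| client_extensions.split_whitespace().any(|e| e == name);

    let wayland_ext = has("EGL_KHR_platform_wayland") || has("EGL_EXT_platform_wayland");
    let x11_ext = has("EGL_KHR_platform_x11") || has("EGL_EXT_platform_x11");

    if wayland_reachable && wayland_ext {
        DisplayChoice::Platform(EGL_PLATFORM_WAYLAND_KHR)
    } else if x11_reachable && x11_ext {
        DisplayChoice::Platform(EGL_PLATFORM_X11_KHR)
    } else if has("EGL_MESA_platform_surfaceless") {
        // No windowing system found: use the surfaceless platform instead of
        // letting the default display pick X11 and fail in eglInitialize.
        DisplayChoice::Platform(EGL_PLATFORM_SURFACELESS_MESA)
    } else {
        DisplayChoice::Default
    }
}
```

With both sockets hidden, as in the test commands below, such a helper would select the surfaceless platform whenever the driver advertises `EGL_MESA_platform_surfaceless`.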

Testing
MESA_LOADER_DRIVER_OVERRIDE=llvmpipe RUST_BACKTRACE=1 RUST_LOG=info WAYLAND_DISPLAY= DISPLAY= WGPU_BACKEND=gl cargo run --example capture
MESA_LOADER_DRIVER_OVERRIDE=llvmpipe RUST_BACKTRACE=1 RUST_LOG=info WAYLAND_DISPLAY= DISPLAY= WGPU_BACKEND=vulkan cargo run --example capture

Tested on a desktop, but without X/Wayland accessible to the test program (the same conditions previously led to the eglInitialize error). red.png is created and has the proper content.

@ maintainers please test in CI

Useful for testing surfaceless

Signed-off-by: Dmitry Sharshakov <[email protected]>
dsseng added 2 commits January 1, 2022 09:04
Signed-off-by: Dmitry Sharshakov <[email protected]>
@dsseng dsseng requested a review from kvark January 1, 2022 06:17
@dsseng
Contributor Author

dsseng commented Jan 1, 2022

If the CI tests pass even without this, something definitely needs to be enabled there. Could you please enable those tests (or tell me how to, if it's in code) to see whether they work in CI with surfaceless?

dsseng added 2 commits January 1, 2022 09:29
Signed-off-by: Dmitry Sharshakov <[email protected]>
Signed-off-by: Dmitry Sharshakov <[email protected]>
@cwfitzgerald
Member

cwfitzgerald commented Jan 3, 2022

https://github.com/gfx-rs/wgpu/blob/master/.github/workflows/ci.yml#L67 If you add gl back to that, it should run all the basic tests on GL as well as Vulkan.

@dsseng
Contributor Author

dsseng commented Jan 3, 2022

Thanks, that worked. The test failure is in the skybox_etc2 unit test. I'm going to diagnose it locally by forcing llvmpipe. There is only one failure, so it might be some llvmpipe limit?

@cwfitzgerald
Member

cwfitzgerald commented Jan 3, 2022

You just need to bump the outlier count in that test: bump https://github.com/gfx-rs/wgpu/blob/master/wgpu/examples/skybox/main.rs#L502 to 110 or something, it's reporting 102 outliers.

@dsseng
Contributor Author

dsseng commented Jan 3, 2022

Okay, thank you!

@dsseng
Contributor Author

dsseng commented Jan 3, 2022

Couldn't find the exact failure 🤷

@cwfitzgerald
Member

I can't either, I don't know why it is returning -1...

@dsseng
Contributor Author

dsseng commented Jan 3, 2022

There are some panics and backtraces, but the test logs seem messy overall, both on CI and on my system (with both radeonsi/RADV and llvmpipe).

@kvark kvark enabled auto-merge (squash) January 3, 2022 19:28
@dsseng
Contributor Author

dsseng commented Jan 3, 2022

Failure is related to skybox and texture compression again for some reason.

@dsseng
Contributor Author

dsseng commented Jan 3, 2022

test skybox_astc ... 102 outliers over max difference 90
ok
error: XDG_RUNTIME_DIR not set in the environment.
[2022-01-03T19:29:21Z ERROR wgpu_hal::gles::egl] EGL 'eglInitialize' code 0x3001: DRI2: failed to load driver
test skybox_bc1 ... TEST SKIPPED: MISSING FEATURES TEXTURE_COMPRESSION_BC
ok
error: XDG_RUNTIME_DIR not set in the environment.
[2022-01-03T19:29:21Z ERROR wgpu_hal::gles::egl] EGL 'eglInitialize' code 0x3001: DRI2: failed to load driver
test skybox_etc2 ... 102 outliers over max difference 90
ok

test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 7.49s

I have previously increased max outliers to 105. Weird. Why does it say 4 passed while tests themselves fail and suite fails?

@kvark
Member

kvark commented Jan 3, 2022

I have previously increased max outliers to 105. Weird. Why does it say 4 passed while tests themselves fail and suite fails?

The observed outlier counts are lower than that limit, which is why it reports "ok".
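
One plausible reading of the log lines above, sketched under assumptions: the comparison counts bytes whose difference from the reference exceeds a tolerance, and fails only when that count is over a limit. `compare_bytes`, `Comparison`, and `within_limit` are hypothetical names and do not reproduce wgpu's `tests/common/image.rs`.

```rust
/// Hypothetical summary of a byte-wise image comparison.
#[derive(Debug, PartialEq, Eq)]
struct Comparison {
    /// Number of bytes whose difference exceeds the tolerance.
    outliers: usize,
    /// Largest per-byte difference seen anywhere in the image.
    max_difference: u8,
}

/// Compare two equally sized byte buffers (assumed layout: raw pixel data).
fn compare_bytes(expected: &[u8], actual: &[u8], tolerance: u8) -> Comparison {
    assert_eq!(expected.len(), actual.len(), "image sizes differ");

    let mut outliers = 0usize;
    let mut max_difference = 0u8;
    for (&e, &a) in expected.iter().zip(actual.iter()) {
        // Widen to i16 so the subtraction cannot overflow.
        let diff = (i16::from(e) - i16::from(a)).unsigned_abs() as u8;
        if diff > tolerance {
            outliers += 1;
        }
        max_difference = max_difference.max(diff);
    }
    Comparison { outliers, max_difference }
}

/// The test passes while the outlier count stays at or below the limit.
fn within_limit(result: &Comparison, max_outliers: usize) -> bool {
    result.outliers <= max_outliers
}
```

Under this reading, 102 outliers against a limit of 105 only prints a message and passes, while 1242592 outliers against a limit of 3 panics.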

@dsseng
Contributor Author

dsseng commented Jan 3, 2022

How do I fix that test? What's the exact cause of pipeline failure?

@kvark
Member

kvark commented Jan 3, 2022

From what I see, the failure is:

error: XDG_RUNTIME_DIR not set in the environment.
[2022-01-03T19:29:15Z ERROR wgpu_hal::gles::egl] EGL 'eglInitialize' code 0x3001: DRI2: failed to load driver
thread 'main' panicked at 'Image data mismatch! Outlier count 1242592 over limit 3. Max difference 91', wgpu/examples/skybox/../../tests/common/image.rs:134:13

So we need to find out why it fails to load the driver. Maybe some packages are missing? Or XDG_RUNTIME_DIR environment needs to be set?

@dsseng
Contributor Author

dsseng commented Jan 3, 2022

This driver-related message has been seen earlier (on my machine with llvmpipe), but everything worked.

@dsseng
Contributor Author

dsseng commented Jan 4, 2022

Is there a way to extract the failing test snapshot as a build artifact to inspect it? Probably the test can be skipped since it's not directly related to EGL surface creation?

@kvark
Member

kvark commented Jan 4, 2022

You can probably modify the CI in your fork of the repo and upload test results as artifacts to GitHub.
I agree it may not be strictly necessary to fix it, given that GL was disabled on CI before this PR. So we can:

  1. either blocklist the test on GL somehow
  2. not enable GL on CI at all in this PR, leave for follow-ups

@dsseng
Contributor Author

dsseng commented Jan 4, 2022

The first option seems preferable. Specifically, we should disable the test (return passed) if the renderer is llvmpipe.
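
A minimal sketch of the proposed blocklist rule, assuming the test harness can see the adapter name and backend as strings. `should_skip` is a hypothetical helper, not part of wgpu's test framework, and matching on the substring "llvmpipe" is an assumption about how the software renderer reports itself.

```rust
/// Hypothetical blocklist rule: skip (treat as passed) when the GL backend
/// is running on the llvmpipe software renderer.
fn should_skip(adapter_name: &str, backend: &str) -> bool {
    // Case-insensitive match, since driver strings vary in capitalisation.
    let on_llvmpipe = adapter_name.to_ascii_lowercase().contains("llvmpipe");
    let on_gl = backend.eq_ignore_ascii_case("gl");
    on_llvmpipe && on_gl
}
```

A test would call this first and return early when it yields `true`, so the run is reported as passed rather than failed.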

@cwfitzgerald
Member

The weird thing is the test passes:

test skybox ... GOT EXPECTED TEST FAILURE: BACKEND
** <snip> **
ok

I'm not sure what is returning -1 from here.

@dsseng
Contributor Author

dsseng commented Jan 26, 2022

Any ideas on merging this? Some rules to blocklist those tests on llvmpipe/CI/headless?

@kvark
Member

kvark commented Jan 27, 2022

Let's merge it without enabling GL testing on CI, so that the important code isn't left hanging around.

@dsseng
Contributor Author

dsseng commented Jan 28, 2022

Maybe skip that test only?

@cwfitzgerald
Member

The issue is I can't see any test in the set that is failing, it passes all the tests, then dies.

@cwfitzgerald
Member

Hey, sorry for letting this languish. Let's nix the CI change and merge this in.

@cwfitzgerald cwfitzgerald enabled auto-merge (squash) February 19, 2022 05:09
@cwfitzgerald cwfitzgerald merged commit 70db03d into gfx-rs:master Feb 19, 2022
@cwfitzgerald cwfitzgerald mentioned this pull request Feb 19, 2022
@cwfitzgerald
Member

Upgrading to a new test framework in #2495 pinpointed what the issue was right away. Thank you for the contribution and sorry again for the delay!

@dsseng
Contributor Author

dsseng commented Feb 19, 2022

No need to say sorry! Thank you for reviews and collaboration! It's great to work with your project!
