Which browser / AT combinations to test #116
Feedback from Google's web accessibility team (#111) on this
cc @aleventhal FWIW, I agree with this. It seems more useful to test two separate browser engines than to test two Chromium-based browsers. |
I'd also like to get confirmation from Apple that we should be testing VoiceOver+Chrome. According to the WebAIM survey, that combination is at 3% usage, and it is fairly widely known that this combination does not currently work well. |
@cookiecrook, can you comment on this? |
Which version of each screen reader will we be using? Also, I just want to confirm that we will be testing with the Chromium rather than the EdgeHTML build of the Edge browser. I think, generally, that we should also specify the version of each browser to be used. I apologize if this is listed elsewhere, but I'm just starting to engage with the group and am trying to get up to speed. |
I think we want to test whatever is the latest "stable" version at the time of testing. We should make a decision to that effect and document it, though. We also need to resolve this issue soon. |
Apple doesn't have an official stance on which combinations get tested for the ARIA-AT project. But personally, I would agree that testing WebKit/Safari+VoiceOver is higher priority. Whether you should test more depends on your time availability and testing capacity. Sorry if that's not a very satisfying answer. |
We discussed this in today's meeting. We focused on browser version first, asking what happens if the tester is using a managed browser that auto-updates to the latest stable version. There was a fair amount of discussion of VMs and other ways of ensuring that the precise browser build being used does not change.

Thinking a bit more, and going back to first principles and goals, I wonder whether controlling the browser version so precisely is essential. It certainly adds substantial complexity, especially for testers. An alternative to a precise browser version requirement could be requiring a minimum browser version and build type (e.g., stable, beta, nightly) within a given test cycle. For example, a test cycle could require Chrome Stable, version 80+. That way, if Chrome 81 becomes the stable version during the test cycle, it would be considered acceptable. I think we should compare the ramifications of requiring a precise browser version to requiring a minimum browser version within a given test cycle, and I think this is easiest to do by talking about some specific scenarios.

Scenario Conditions

Scenario 1 - browser is updated mid-cycle without affecting test results

Are there any problems with this scenario? As we answer this, we need to keep in mind that the purpose of the project is to improve assistive technology support for ARIA. Identifying and resolving problems with browser dependencies is serendipitous. When there are meaningful differences in support among browsers, the resolution falls to the ARIA Working Group, not the ARIA-AT project. Whereas resolving deleterious differences among assistive technologies falls within the scope of ARIA-AT.

The end result of scenario 1 is that readers of the APG will see that the latest testing of checkbox with Chrome is with version 80 and the latest results of testing grid with Chrome are with version 81. I don't think this is a problem. Think down the road just a bit, when there are 75 test plans run with 10 assistive technologies across 6 browsers. Until it is possible to automate all regression testing of assistive technology support, it is likely such testing will be spread across several months, and it may be split into several different test cycles. It would be impractical to promise that every ARIA pattern will be tested with the same combination of technologies.

Also consider that support for ARIA in leading browsers is relatively mature and robust. Obviously, the internals are complex and regressions do occur. Generally, though, browser issues should be uncommon. The question is whether browser bugs or regressions have to interfere with ARIA-AT. To answer that, let's consider scenario 2.

Scenario 2 - Browser is updated mid-cycle and test results conflict

Results are:

Are there any problems here? If so, how could they be handled? For combobox and menubar, the situation is just like scenario 1: each passed with T1 and T2 using the same AT/browser combinations. Combobox will be reported with a browser version of 80 and menubar will be reported with a browser version of 81 -- no problem. For checkbox, the results are identical regardless of browser version. It would make sense to report the final data with the later browser version. The likelihood that T1's results would be different if re-run with Chrome 81 is too close to 0 for concern. If we wanted to be ultra conservative, T1 could be asked to re-run the test with Chrome 81, but I think that is entirely unnecessary.

The question is what to do about grid. The system will show that there are conflicting raw results, and the ARIA-AT system will require the conflict to be resolved before the raw results can be marked draft. There are several possible causes of the differences for grid between the T1 and T2 results:

Causes 1 and 2 are already addressed by our process, so the change in browser version is a non-issue. Once corrected, the result is the same as for checkbox; we could report the results using the later browser version. The resolution to cause 3 can also be addressed by our process. When T1 ran the grid test and was notified of differing results, that would have triggered the process for resolving result differences. Once the testers are satisfied that they both interpret results the same way, they would land on the differing browser output as the cause. This would trigger T2 to re-run grid with Chrome 81, the difference in results would be resolved, and the results would be reported with 81 as the browser version. This could also result in one of the testers raising a Chromium bug. Since there is a failed assertion, and it is known that the browser is the cause, I think we need a way of tracking that in the report so that assistive technologies are not seen as failing to support the expectation declared by the failed assertion. We currently do not have that in our model. However, that issue is separate from the browser version issue.

We can analyze more scenarios, but my initial thought is that a minimum browser version requirement is adequate and could reduce complexity. There are a couple of AT/browser combinations that are tied to the operating system version -- VoiceOver with Safari on macOS and iOS. With these combinations, it is probably best that T1 and T2 sync up so that they run the same plans within the same time frame, to reduce the probability of a forced upgrade getting in the way of having matching results. Outside of that, it is easy for testers to control the version of the AT, and in fact, many ATs support having multiple versions installed at the same time. So, I think we should consider having test cycles specify minimum versions, and we should work to ensure consistency across a cycle. However, if a forced upgrade interferes with consistency, we can manage that at the level of individual test runs. That might lead to some test runs within a cycle being reported with a slightly newer version of the browser, and in rare cases the AT, but I don't think that will have a negative impact on the value of any of the results. |
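To make the grid scenario concrete, here is a minimal TypeScript sketch of how conflicting raw results could be flagged per test plan while preserving the browser version of each run. The data shapes and names (`RawResult`, `findConflicts`) are hypothetical illustrations, not the actual ARIA-AT app model.

```typescript
// Hypothetical data shapes -- not the actual ARIA-AT app model.
interface RawResult {
  testPlan: string;                            // e.g. "checkbox", "grid"
  tester: string;                              // e.g. "T1", "T2"
  browser: string;                             // e.g. "Chrome"
  browserVersion: number;                      // e.g. 80 or 81
  assertions: Record<string, "pass" | "fail">; // assertion id -> outcome
}

// Group raw results by test plan and flag any plan whose assertion outcomes
// differ between runs, keeping the browser versions involved so a reviewer
// can judge whether a mid-cycle browser update is a plausible cause.
function findConflicts(results: RawResult[]): Map<string, RawResult[]> {
  const byPlan = new Map<string, RawResult[]>();
  for (const r of results) {
    const runs = byPlan.get(r.testPlan) ?? [];
    runs.push(r);
    byPlan.set(r.testPlan, runs);
  }

  const conflicts = new Map<string, RawResult[]>();
  for (const [plan, runs] of byPlan) {
    // Assumes assertion keys are recorded in a stable order; fine for a sketch.
    const outcomes = new Set(runs.map((r) => JSON.stringify(r.assertions)));
    if (outcomes.size > 1) {
      conflicts.set(plan, runs); // e.g. "grid" run with Chrome 80 vs. 81
    }
  }
  return conflicts;
}
```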
On the practical side, if we run tests manually on testers' own systems, I agree that it's problematic to require a specific version of software that normally auto-upgrades, but we can require a minimum version and a specific release channel ("stable" vs. "beta", etc.). However, this is another possible cause of differences between results, which could take more time to resolve. The final reports would also be less clear about what they apply to if different tests were run with different browser versions. I think that's OK, but not ideal in the longer term. Recording the actual browser version while running a test is technically possible and seems helpful when resolving differences in test results. I don't think we have planned for this in the runner, though. cc @s3ththompson @spectranaut @evmiguel |
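As a rough illustration of recording the browser version from inside the test page, assuming the runner executes in the browser under test: `navigator.userAgentData` is only exposed by Chromium-based browsers, so the sketch falls back to the classic user agent string elsewhere. This is not the actual runner code, and the field names are made up.

```typescript
// Minimal sketch of capturing the actual browser build while a test runs,
// so it could be stored alongside each raw result.
function captureBrowserVersion(): { ua: string; brands?: string } {
  // userAgentData is Chromium-only, so access it defensively.
  const uaData = (navigator as unknown as {
    userAgentData?: { brands: { brand: string; version: string }[] };
  }).userAgentData;
  const brands = uaData?.brands
    .map((b) => `${b.brand} ${b.version}`)
    .join(", ");
  // Always keep the classic user agent string as a fallback record.
  return { ua: navigator.userAgent, brands };
}
```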
I'll echo that. I like the idea of specifying a minimum version, but per my previous testing, it is fairly common to find browser bugs that affect AT support. I think it is okay to move forward with the minimum version strategy. We can revisit later if it turns out to be too much overhead or if it adds too much complexity to the project. |
I wonder if AssistivLabs would be willing to donate access to their virtual environments for screen reader testing. I had an interesting conversation with Weston Thayer from that company a while back about setting up accessible RDP sessions. I could reach out to him if folks think that might be a useful strategy. |
Thanks, @robfentress. We have an email thread with @WestonThayer about this. |
Is this really true? I thought Microsoft Edge and Chrome used different accessibility APIs, even though they both run on Chromium. Wouldn't this be expected to cause different results in some circumstances? |
cc @smhigley for the above question |
Update: NVDA's latest version is now 2020.1. We should use this for the pilot test. The versions listed for browsers are minimum versions -- auto-updating to something later is OK. The versions for ATs should be the exact versions listed above. |
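A small sketch of what that policy could look like if the test cycle requirements were ever encoded in the app: browsers are checked against a minimum version on a named channel, while ATs must match an exact pinned version. The `CycleRequirement` and `runIsEligible` names are hypothetical, not part of the ARIA-AT codebase.

```typescript
// Hypothetical shape of a test-cycle requirement: a minimum browser version
// on a named release channel, plus an exact AT version (e.g. NVDA 2020.1).
interface CycleRequirement {
  browser: { name: string; channel: "stable" | "beta" | "nightly"; minVersion: number };
  at: { name: string; exactVersion: string };
}

interface RunEnvironment {
  browserVersion: number;
  channel: string;
  atVersion: string;
}

// A run stays eligible if the browser auto-updated past the minimum (80 -> 81),
// but the AT must match the pinned version exactly.
function runIsEligible(req: CycleRequirement, run: RunEnvironment): boolean {
  return (
    run.channel === req.browser.channel &&
    run.browserVersion >= req.browser.minVersion &&
    run.atVersion === req.at.exactVersion
  );
}

// Example: Chrome Stable 80+ with NVDA 2020.1; an auto-update to 81 is acceptable.
const requirement: CycleRequirement = {
  browser: { name: "Chrome", channel: "stable", minVersion: 80 },
  at: { name: "NVDA", exactVersion: "2020.1" },
};
console.log(
  runIsEligible(requirement, { browserVersion: 81, channel: "stable", atVersion: "2020.1" })
); // true
```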
Currently, the plan is to run tests in:
From #111, there's a suggestion from Vispero to test JAWS / Chromium-based Edge instead of JAWS / Firefox.
@JAWS-test wrote
Originally posted by @JAWS-test in #111 (comment)