API based on simulating keypresses vs. invoking discrete commands #12

zcorpan · 2022-04-05T11:49:13Z

From the ARIA-AT automation meeting on March 14, 2022: #17 (minutes)

A screen reader automation API could allow simulating keypresses, invoking discrete screen reader commands, or both. Commands may be more robust or useful when you're writing a test directly to perform a task. Keypresses may be useful as they allow for recording actual user input.
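To make the distinction concrete, here is a rough sketch of how one test step might be expressed in each style. The method names (`pressKeys`, `invokeCommand`), the key names, and the command name `moveToNextItem` are all hypothetical and not taken from any published spec.

```python
import json

# Hypothetical message shapes; none of these names come from a published spec.
def keypress_message(keys):
    """Ask the screen reader to act as if these keys were pressed together."""
    return {"method": "pressKeys", "params": {"keys": list(keys)}}

def command_message(command):
    """Ask the screen reader to run one named command, whatever its key binding."""
    return {"method": "invokeCommand", "params": {"command": command}}

# The same test step, expressed both ways.
by_keys = keypress_message(["Control", "Alt", "ArrowRight"])
by_command = command_message("moveToNextItem")

print(json.dumps(by_keys))
print(json.dumps(by_command))
```

The keypress form mirrors what a user physically did, which suits recording. The command form keeps working if a key binding changes.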

Is perfect accuracy possible or even desirable?
How much of the keypress handling is even within the domain of the AT to control/simulate?
How do we support alternative gestures or input devices?

Related: #11

cc @cookiecrook @mcking65 @s3ththompson @aleventhal

aleventhal · 2022-04-07T10:50:33Z

Good Q's. I don't have all the answers, but wanted to make sure we record the use case here.

Use case: development of an input & output recording tool that can be used by expert screen reader users to develop tests, without requiring programming skills. This could vastly increase the number of tests we can create.

zcorpan · 2022-09-29T18:12:39Z

@cookiecrook said in our previous meeting that simulating keypresses in a way that VO acts on them is not possible on macOS for security reasons.

But followed up in email saying that for non-sandboxed, unsigned executables, it is possible with AppleScript.

So, what does this mean for the AT Driver spec? Should we not include a keypress API?

cookiecrook · 2022-10-02T18:14:32Z

To clarify, I mentioned Apple had no near-term plans to ship a keypress API for VO through a supported, secure means such as XCTest.

However, Michael Fairchild mentioned @ckundo’s https://github.com/AccessLint/screenreaders project which leverages System Events as the keypress driver. This requires the automation system owner to put the system into a less secure state, which may suffice for the context of ARIA-AT. Though it’s an unsupported method for automating VO, I think it could be a reasonable implementation for your proposed keypress API. HTH. Thanks.
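For illustration, a keypress driver built on System Events could look something like the minimal sketch below. It is not taken from the AccessLint project. The function names are hypothetical, and it assumes macOS with the calling process granted permission to control the computer.

```python
import subprocess

# AppleScript modifier names accepted by System Events' "key code ... using {...}".
MODIFIERS = {
    "control": "control down",
    "option": "option down",
    "shift": "shift down",
    "command": "command down",
}

def build_key_script(key_code, modifiers=()):
    """Return AppleScript source that presses one key with optional modifiers."""
    script = f'tell application "System Events" to key code {int(key_code)}'
    if modifiers:
        using = ", ".join(MODIFIERS[m] for m in modifiers)
        script += f" using {{{using}}}"
    return script

def press_key(key_code, modifiers=()):
    """Run the script with osascript (macOS only; needs Accessibility permission)."""
    subprocess.run(["osascript", "-e", build_key_script(key_code, modifiers)], check=True)

# Key code 124 is the right arrow; Control+Option is VoiceOver's default modifier.
print(build_key_script(124, ["control", "option"]))
```

Only `build_key_script` runs here. Calling `press_key` sends real input to whatever application has focus.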

jugglinmike · 2024-07-18T16:13:15Z

Here and elsewhere, folks have raised security concerns about any automation API that simulates HID-device input. The discussion above considers an API built around "commands" (and elsewhere, "user intents") as a safer alternative. We currently feel such an API could be feasible if it includes a means to fill text into form fields. Here, I'll explain why the capability is necessary and propose a definition which may avoid the risks of HID-level simulation.

The problem with commands/"user intents"

An API that is limited to high-level user gestures would be unable to simulate interactions like filling in form fields.

Rejected solution: delegate to WebDriver

It might be possible to circumvent this deficiency using WebDriver's "element send keys" command (since key presses could be simulated in the browser directly), but only if the browsers were aware of the location of the ATs' virtual cursors at all times.

Unfortunately, this is not the case.

If AT Driver's use-case for "form filling" is to be facilitated by WebDriver, then AT Driver would need a mechanism for conveying the target element to WebDriver. We feel that a coupling like that would dramatically increase the complexity of AT Driver and decrease its likelihood of implementation.

Proposed solution: "send text" command

Instead, we propose a command which allows clients to specify a sequence of characters to be entered into the currently-focused form field.

While this solution has similarities to the original HID-simulation approach, its differences preclude malicious applications without impacting the desirable use-case:

Control characters (e.g. Alt, Command, Shift, or Meta) could not be sent
The implementation could reject presses at its discretion (e.g. if the target form field did not belong to an accredited process such as a web browser)
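The two restrictions above could be sketched roughly as follows. Everything here is an assumption made for illustration: the allow-list, the exception name, and the choice to reject Unicode control characters (category `Cc`) and private-use code points (category `Co`, the range WebDriver uses to encode keys such as Shift).

```python
import unicodedata

# Hypothetical allow-list; a real implementation would decide this itself.
ACCREDITED_PROCESSES = {"firefox", "chrome", "safari"}

# "Cc" = control characters (Escape, Tab, ...); "Co" = private use, which
# WebDriver uses for keys such as Shift (U+E008). Rejecting both is an assumption.
REJECTED_CATEGORIES = {"Cc", "Co"}

class SendTextRejected(Exception):
    """Raised when the implementation declines a send-text request."""

def validate_send_text(text, target_process):
    """Return text unchanged if it may be typed into the focused field."""
    if target_process not in ACCREDITED_PROCESSES:
        raise SendTextRejected(f"target process not accredited: {target_process}")
    for ch in text:
        if unicodedata.category(ch) in REJECTED_CATEGORIES:
            raise SendTextRejected(f"character not allowed: {ch!r}")
    return text

print(validate_send_text("hello world", "firefox"))
```

Because the input is a string of characters and not a sequence of key events, modifier keys have no way to be expressed at all. The category check only guards against encodings that try to smuggle them in.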

@cookiecrook, you’ve represented Apple's security concerns on this issue over the past few years. Could you weigh in on whether the API I've sketched out above would pass muster?

jugglinmike · 2024-07-22T22:06:26Z

The AT Driver subgroup discussed this proposal at today's meeting. Here is a summary of the resolution from that meeting.

jugglinmike · 2024-09-05T22:01:12Z

I've submitted a patch which implements the first part of our latest design in gh-76.

zcorpan mentioned this issue Apr 7, 2022

AT Automation API Roadmap #15

Open

7 tasks

cookiecrook mentioned this issue Oct 23, 2023

Consider using the VoiceOver AppleScript bridge instead of disabling System Integrity Protection (SIP) #74

Open

lolaodelola added the Q1 label (internal classification of the proposed quarter to do the work) Jan 29, 2024

jugglinmike mentioned this issue Sep 20, 2024

"Send text" user intent #79

Closed
