API based on simulating keypresses vs. invoking discrete commands #12

Open
zcorpan opened this issue Apr 5, 2022 · 6 comments
Labels
Q1 Internal classification of the proposed quarter to do the work

Comments

@zcorpan
Member

zcorpan commented Apr 5, 2022

From the ARIA-AT automation meeting on March 14, 2022: #17 (minutes)

A screen reader automation API could simulate keypresses, invoke discrete screen reader commands, or both. Commands may be more robust or useful when you're writing a test directly to perform a task. Keypresses may be useful because they allow recording actual user input.

  • Is perfect accuracy possible or even desirable?
  • How much of the keypress handling is even within the domain of the AT to control/simulate?
  • How do we support alternative gestures or input devices?
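To make the distinction concrete, here is a minimal sketch of how the two request styles might differ. Every name in it (`KeyPressRequest`, `CommandRequest`, `resolve`, the binding table, and the command names) is hypothetical and chosen for illustration; none of it comes from a spec or from any screen reader's actual API.

```typescript
// Hypothetical request shapes for the two API styles under discussion.
type KeyPressRequest = { kind: "keypress"; keys: string[] };
type CommandRequest = { kind: "command"; name: string };
type AutomationRequest = KeyPressRequest | CommandRequest;

// Hypothetical key bindings for one imaginary screen reader configuration.
const bindings: Record<string, string> = {
  "Insert+F7": "openElementsList",
  "H": "nextHeading",
};

// Resolve either request style to the command the screen reader would run.
// A command request names the command directly; a keypress request depends
// on whatever bindings happen to be configured, and may resolve to nothing.
function resolve(req: AutomationRequest): string | undefined {
  if (req.kind === "command") {
    return req.name;
  }
  return bindings[req.keys.join("+")];
}
```

In this sketch a command request survives a change to the user's key bindings, while a keypress request mirrors what a recording tool would capture from a real user.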

Related: #11

cc @cookiecrook @mcking65 @s3ththompson @aleventhal

@aleventhal

Good Q's. I don't have all the answers, but wanted to make sure we record the use case here.

Use case: development of an input & output recording tool that can be used by expert screen reader users to develop tests, without requiring programming skills. This could vastly increase the number of tests we can create.

@zcorpan zcorpan mentioned this issue Apr 7, 2022
7 tasks
@zcorpan
Member Author

zcorpan commented Sep 29, 2022

@cookiecrook said in our previous meeting that simulating keypresses in a way that VO acts on them is not possible on macOS for security reasons.

However, he followed up by email to say that it is possible with AppleScript for non-sandboxed, unsigned executables.

So, what does this mean for the AT Driver spec? Should we not include a keypress API?

@cookiecrook

To clarify, I mentioned Apple had no near-term plans to ship a keypress API for VO through a supported, secure means such as XCTest.

However, Michael Fairchild mentioned @ckundo’s https://github.com/AccessLint/screenreaders project which leverages System Events as the keypress driver. This requires the automation system owner to put the system into a less secure state, which may suffice for the context of ARIA-AT. Though it’s an unsupported method for automating VO, I think it could be a reasonable implementation for your proposed keypress API. HTH. Thanks.
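For readers unfamiliar with the System Events approach, the following is a rough sketch of how a keypress driver could build an AppleScript `keystroke` statement. It is not taken from the linked project; `buildKeystrokeScript` and the `Modifier` type are hypothetical names, and the sketch only constructs the script text without running it.

```typescript
// Modifier names as AppleScript's System Events spells them ("<name> down").
type Modifier = "command" | "control" | "option" | "shift";

// Build an AppleScript one-liner asking System Events to type a key,
// optionally with modifiers held. Hypothetical helper for illustration.
function buildKeystrokeScript(key: string, modifiers: Modifier[] = []): string {
  // Escape backslashes and double quotes so the key stays inside its string literal.
  const escaped = key.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  const using =
    modifiers.length > 0
      ? ` using {${modifiers.map((m) => `${m} down`).join(", ")}}`
      : "";
  return `tell application "System Events" to keystroke "${escaped}"${using}`;
}
```

The resulting string could be handed to `osascript -e` on macOS, assuming the calling process has been granted the relevant accessibility permissions, which is the less secure state described above.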

@jugglinmike
Contributor

jugglinmike commented Jul 18, 2024

Here and elsewhere, folks have raised security concerns about any automation API that simulates HID-device input. The discussion above considers an API built around "commands" (and elsewhere, "user intents") as a safer alternative. We currently feel such an API could be feasible if it includes a means to fill text into form fields. Here, I'll explain why the capability is necessary and propose a definition which may avoid the risks of HID-level simulation.

The problem with commands/"user intents"

An API that is limited to high-level user gestures would be unable to simulate interactions like filling in form fields.

Rejected solution: delegate to WebDriver

It might be possible to circumvent this deficiency using WebDriver's "element send keys" command (since key presses could be simulated in the browser directly), but only if the browsers were aware of the location of the ATs' virtual cursors at all times.

Unfortunately, this is not the case.

If AT Driver's use-case for "form filling" is to be facilitated by WebDriver, then AT Driver would need a mechanism for conveying the target element to WebDriver. We feel that a coupling like that would dramatically increase the complexity of AT Driver and decrease its likelihood of implementation.

Proposed solution: "send text" command

Instead, we propose a command which allows clients to specify a sequence of characters to be entered into the currently-focused form field.

While this solution has similarities to the original HID-simulation approach, its differences preclude malicious applications without impacting the desirable use-case:

  • Modifier keys (e.g. Alt, Command, Shift, or Meta) could not be sent
  • The implementation could reject presses at its discretion (e.g. if the target form field did not belong to an accredited process such as a web browser)
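As a rough illustration of the two restrictions above, here is a minimal sketch of how an implementation might vet a "send text" request. The function name `sendText`, the `accreditedProcesses` list, and the result shape are all assumptions made for this example; the proposal does not define them.

```typescript
// Hypothetical result type: either the text is accepted or a reason is given.
type SendTextResult =
  | { ok: true; text: string }
  | { ok: false; reason: string };

// Hypothetical allow-list of processes the implementation trusts.
const accreditedProcesses = new Set(["firefox", "chrome", "safari"]);

// Vet a "send text" request. The request carries only a string, so modifier
// keys such as Alt or Command cannot be expressed in it at all.
function sendText(text: string, targetProcess: string): SendTextResult {
  if (!accreditedProcesses.has(targetProcess)) {
    return { ok: false, reason: "target process is not accredited" };
  }
  // Additionally reject C0/C1 control codes (an assumption of this sketch).
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    if (cp < 0x20 || (cp >= 0x7f && cp <= 0x9f)) {
      return { ok: false, reason: "control characters are not allowed" };
    }
  }
  return { ok: true, text };
}
```

Under these assumptions, `sendText("hello", "firefox")` is accepted, while the same text aimed at a process outside the allow-list is rejected at the implementation's discretion.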

@cookiecrook, you’ve represented Apple's security concerns on this issue over the past few years. Could you weigh in on whether the API I've sketched out above would pass muster?

@jugglinmike
Contributor

The AT Driver subgroup meeting discussed this proposal today. Here is a summary of the resolution from that meeting.

@jugglinmike
Contributor

I've submitted a patch which implements the first part of our latest design in gh-76.


5 participants