Archival/versioning of existing test plans #418
@jscholes, thanks for raising this issue. I wrote up the context for how we currently handle test plan versions in ARIA-AT App. This is a long read, but I wanted to document the complexity since it's germane to this issue, and since it has come up in a number of related conversations recently.

But first, a quick follow-up question: could you expand on why bringing in test examples from sources other than the APG might pose a problem? Are you envisioning a system that supports referencing tests or test examples without first importing them into the w3c/aria-at repo? If so, how would we ensure that the examples or test plans don't change out from underneath us?

Okay, now the longer context: ARIA-AT App has a strict versioning system at its core. There are two primary requirements for the versioning system:
Requirement 1 is unique to ARIA-AT App (since it involves manual testing), but requirement 2 is shared among many similar web platform testing systems, including Web Platform Tests (wpt.fyi) and Test262 (test262.report). WPT and Test262 use Git SHAs to mark test versions, likely for the following reasons:
Most of these rationales apply to ARIA-AT App, with one critical exception (more on this in a moment):
For simplicity’s sake and consistency with WPT and Test262, ARIA-AT App uses a Git SHA versioning system (with SHAs from the w3c/aria-at repo). Git SHAs are used extensively throughout the App, starting with the way we load tests (by proxying files directly from the w3c/aria-at repo).

This system has a few important shortcomings. A naive implementation of versioned test reports would have sparse data, since tests change frequently while test results are recorded infrequently (given the manual nature of runs). (This is not an issue for WPT and Test262, since those projects can afford to re-run the entire test suite after a change to any individual test.)

Our solution to this problem has been to implement an additional optimization, termed “smart test updates” or “evergreen test versions.” This approach relies on the fact that many test repo changes are localized to a particular test directory. If a test writer fixes a typo in the combobox test plan, a new Git SHA is produced for the entire test repo, but no changes have been made to any of the other test plans. ARIA-AT App can “carry forward” the results from the last Git SHA for all test plans except combobox, the only test plan that changed. In practice, this means that an individual test plan (or even, in principle, an individual test) only needs to be re-run when it itself changes.

This optimization has an important benefit: it doesn’t preclude re-running tests for some test plans more frequently than others. In other words, it anticipates a near future where automated tests and manual tests will co-exist in the same repository, with different requirements for testing frequency. Automated tests could be updated and re-run daily (for example) while manual tests in the same test repo could be updated and manually re-run only once a month.
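The directory-localized change detection behind “smart test updates” can be sketched roughly as follows. This is a hypothetical illustration, not the App's actual code: the input mimics the output of `git diff --name-only OLD_SHA NEW_SHA`, and the `tests/<plan>/...` directory layout is an assumption made for the example.

```python
# Hypothetical sketch: map the file paths changed between two commits
# (as produced by `git diff --name-only OLD_SHA NEW_SHA`) to the set of
# test plans whose results are invalidated. The tests/<plan>/... layout
# is assumed for illustration only.

def invalidated_plans(changed_paths):
    """Return the names of test plans whose directory contains a change."""
    plans = set()
    for path in changed_paths:
        parts = path.split("/")
        if len(parts) >= 2 and parts[0] == "tests":
            plans.add(parts[1])
    return plans

changed = [
    "tests/combobox/reference/index.html",  # typo fix in one plan
    "README.md",                            # unrelated repo change
]
print(sorted(invalidated_plans(changed)))  # ['combobox']
```

Here a typo fix inside the combobox directory invalidates only that plan; changes outside any test-plan directory invalidate nothing.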
As long as the more frequent changes were isolated to the automated test directories, the App could “carry forward” the more infrequently run results for the infrequently changing manual test plans and produce complete test reports (containing both manual and automated results) for every Git SHA.

The downside to this approach is that test writers must operate under the assumption that a commit to the main branch of a manual test repository will potentially invalidate expensive-to-run manual test results. In other words, the preferred approach to writing and changing manual test plans is to work in long-lived feature branches that are infrequently merged into the main branch of the repository. While the current system doesn’t invalidate all test plans when a typo is fixed in a single test plan, it does invalidate results for that one test plan. There is no way to mark a change to a test plan as inconsequential or superficial.

The CG recently discussed this issue anew with a conversation about testing examples beyond aria-practices. Currently ARIA-AT App has a strongly held assumption that all tests come from a single Git repo (w3c/aria-at). The following concerns were raised:
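The carry-forward step itself amounts to a simple rule: when a new commit lands, keep the last recorded result for every plan the commit did not touch, and invalidate results for plans that changed until they are re-run. A minimal sketch, again under assumed (not actual) data shapes:

```python
# Minimal sketch (not the App's actual implementation) of carrying results
# forward to a new Git SHA: results for untouched plans roll forward into
# the new commit's report; results for changed plans are dropped until the
# plans are manually re-run.

def carry_forward(previous_report, changed_plans):
    """Build the new commit's report from the previous commit's report."""
    return {
        plan: result
        for plan, result in previous_report.items()
        if plan not in changed_plans
    }

report = {"combobox": "pass", "menubar": "pass", "slider": "pass"}
new_report = carry_forward(report, {"combobox"})
print(new_report)  # {'menubar': 'pass', 'slider': 'pass'}
```

Only combobox must be re-run after this commit; the menubar and slider results survive unchanged, which is what keeps the report dense even though manual runs are infrequent.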
More broadly, a few alternatives to the Git SHA were proposed:
Going forward, we will need to either extend, replace, or further optimize our versioning system. It would be great to continue to collect requirements and rationales in this thread. Looking forward to discussing this issue in more depth.
@s3ththompson Thanks for writing this up! I haven't read through in detail yet, but I wanted to point you to issue #394 first. It includes more context from a discussion the CG had about treating the APG as a dependency, and raises the same questions you have. Definitely something we need to discuss further.
@s3ththompson This is great context. One thing that stands out after a first read-through:
I understand the rationale behind this, but I don't think it fits current CG expectations or consensus. Our new way of thinking is that PR branches should be as short-lived as possible to avoid merge conflicts down the line. This is driven by the fact that at present, merge conflicts are occurring with high frequency due to some aspects of the system's design. It follows, of course, that if those could be addressed while still facilitating the review process outlined in #300, #420, #410 and others, there could be other solutions that maintained longer-lived feature branches. Currently, the approach is untenable and will become even more so as the number of test plans increases.
@jscholes I totally agree that this is a problem! "Preferred approach" is absolutely the wrong phrase here... I meant something more like "unfortunately, the current system prefers/requires an approach as follows..."
While discussing deprecation/removal of #53 during the April 1, 2021 CG meeting, concerns were raised about how and when test plans should be deleted, and the potential impact on data and results:
1. Would deleting a test plan ever be necessary? The group seemed to agree that deleting a test plan is only appropriate when meaningful data hasn't yet been gathered from testers. If data is in the system, some sort of archival process would be more appropriate.