Archival/versioning of existing test plans #418
@jscholes, thanks for raising this issue. I wrote up the context for how we currently handle test plan versions in ARIA-AT App. This is a long read, but I wanted to document the complexity since it's germane to this issue, and since it has come up in a number of related conversations recently.

But first, a quick follow-up question: could you expand on why bringing in test examples from sources other than the APG might pose a problem? Are you envisioning a system that supports referencing tests or test examples without first importing them into the w3c/aria-at repo? If so, how would we ensure that the examples or test plans don't change out from underneath us?

Okay, now the longer context: ARIA-AT App has a strict versioning system at its core. There are two primary requirements for the versioning system:
Requirement 1 is unique to ARIA-AT App (since it involves manual testing), but requirement 2 is shared among many similar web platform testing systems, including Web Platform Tests (wpt.fyi) and Test262 (test262.report). WPT and Test262 use Git SHAs to mark test versions, likely for the following reasons:
Most of these rationales apply to ARIA-AT App, with one critical exception (more on this in a moment):
For simplicity’s sake and consistency with WPT and Test262, ARIA-AT App uses a Git SHA versioning system (with SHAs from the w3c/aria-at repo). Git SHAs are used extensively throughout the App, starting with the way we load tests (by proxying files directly from the w3c/aria-at repo).

This system has a few important shortcomings. A naive implementation of versioned test reports would have sparse data, since tests change frequently while test results are recorded infrequently (given the manual nature of runs). (This is not an issue for WPT and Test262, since those projects can afford to re-run the entire test suite after a change to any individual test.)

Our solution to this problem has been to implement an additional optimization, termed “smart test updates” or “evergreen test versions.” This approach relies on the fact that many test repo changes are localized to a particular test directory. If a test writer fixes a typo in the combobox test plan, a new Git SHA is produced for the entire test repo, but no changes have been made to any of the other test plans. ARIA-AT App can “carry forward” the results from the last Git SHA for all test plans except combobox, the only test plan that changed. In practice, this means that an individual test plan (or even, in principle, an individual test) only needs to be re-run when it itself changes.

This optimization has an important benefit: it doesn’t preclude re-running tests for some test plans more frequently than others. In other words, it anticipates a near future where automated tests and manual tests will co-exist in the same repository, with different requirements for testing frequency. Automated tests could be updated and re-run daily (for example) while manual tests in the same test repo could be updated and manually re-run only once a month.
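The directory-localized change detection behind “smart test updates” can be sketched roughly as follows. This is a hypothetical illustration, not the App's actual code: the input mimics the output of `git diff --name-only OLD_SHA NEW_SHA`, and the `tests/<plan>/...` directory layout is an assumption made for the example.

```python
# Hypothetical sketch: map the file paths changed between two commits
# (as produced by `git diff --name-only OLD_SHA NEW_SHA`) to the set of
# test plans whose results are invalidated. The tests/<plan>/... layout
# is assumed for illustration only.

def invalidated_plans(changed_paths):
    """Return the names of test plans whose directory contains a change."""
    plans = set()
    for path in changed_paths:
        parts = path.split("/")
        if len(parts) >= 2 and parts[0] == "tests":
            plans.add(parts[1])
    return plans

changed = [
    "tests/combobox/reference/index.html",  # typo fix in one plan
    "README.md",                            # unrelated repo change
]
print(sorted(invalidated_plans(changed)))  # ['combobox']
```

Here a typo fix inside the combobox directory invalidates only that plan; changes outside any test-plan directory invalidate nothing.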
As long as the more frequent changes were isolated to the automated test directories, the App could “carry forward” the more infrequently run results for the infrequently changing manual test plans and produce complete test reports (containing both manual and automated results) for every Git SHA.

The downside to this approach is that test writers must operate under the assumption that a commit to the main branch of a manual test repository will potentially invalidate expensive-to-run manual test results. In other words, the preferred approach to writing and changing manual test plans is to work in long-lived feature branches that are infrequently merged into the main branch of the repository. While the current system doesn’t invalidate all test plans when a typo is fixed in a single test plan, it does invalidate results for that one test plan. There is no way to mark a change to a test plan as inconsequential or superficial.

The CG recently discussed this issue anew with a conversation about testing examples beyond aria-practices. Currently ARIA-AT App has a strongly held assumption that all tests come from a single Git repo (w3c/aria-at). The following concerns were raised:
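The carry-forward step itself amounts to a simple rule: when a new commit lands, keep the last recorded result for every plan the commit did not touch, and invalidate results for plans that changed until they are re-run. A minimal sketch, again under assumed (not actual) data shapes:

```python
# Minimal sketch (not the App's actual implementation) of carrying results
# forward to a new Git SHA: results for untouched plans roll forward into
# the new commit's report; results for changed plans are dropped until the
# plans are manually re-run.

def carry_forward(previous_report, changed_plans):
    """Build the new commit's report from the previous commit's report."""
    return {
        plan: result
        for plan, result in previous_report.items()
        if plan not in changed_plans
    }

report = {"combobox": "pass", "menubar": "pass", "slider": "pass"}
new_report = carry_forward(report, {"combobox"})
print(new_report)  # {'menubar': 'pass', 'slider': 'pass'}
```

Only combobox must be re-run after this commit; the menubar and slider results survive unchanged, which is what keeps the report dense even though manual runs are infrequent.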
More broadly, a few alternatives to the Git SHA were proposed:
Going forward, we will need to either extend, replace, or further optimize our versioning system. It would be great to continue to collect requirements and rationales in this thread. Looking forward to discussing this issue in more depth.
@s3ththompson Thanks for writing this up! I haven't read through in detail yet, but I wanted to point you to issue #394 first. It includes more context from a discussion the CG had about treating the APG as a dependency, and raises the same questions you have. Definitely something we need to discuss further.
@s3ththompson This is great context. One thing that stands out after a first read-through:
I understand the rationale behind this, but I don't think it fits current CG expectations or consensus. Our new way of thinking is that PR branches should be as short-lived as possible to avoid merge conflicts down the line. This is driven by the fact that at present, merge conflicts are occurring with high frequency due to some aspects of the system's design. It follows, of course, that if those could be addressed while still facilitating the review process outlined in #300, #420, #410 and others, there could be other solutions that maintained longer-lived feature branches. Currently, the approach is untenable and will become even more so as the number of test plans increases.
@jscholes I totally agree that this is a problem! "Preferred approach" is absolutely the wrong phrase here... I meant something more like "unfortunately, the current system prefers/requires an approach as follows..."
While discussing deprecation/removal of #53 during the April 1, 2021 CG meeting, concerns were raised about how and when test plans should be deleted, and the potential impact on data and results:
1. Would deleting a test plan ever be necessary? The group seemed to agree that deleting a test plan is only appropriate when meaningful data hasn't yet been gathered from testers. If data is in the system, some sort of archival process would be more appropriate.