Improve test reproducibility + Decaffeinate #45
Merged
While working on the tests, our goal was to not break any more tests than were already failing, previously stated as 21 failed tests for Package Testing and 51 failed tests for Editor Testing.
But reproducing those numbers proved unreliable. After seeing that none of our currently open PRs matched these numbers in any meaningful way, I started some testing.
All testing below was done on the `master` branch, on Node.js 16.x (16.8.0 on Linux, 16.16.0 on Windows). Every run was performed immediately after the previous one, with no changes to the code in between.
Below are the results (editor tests only):
Linux:
Windows:
As you can see above, there is significant variability in the results of these tests. While trying to find out why, one simple change to the test runner itself produced much more stable results, at least on Windows: decaffeinating the test runner. With just this change, consecutive Windows runs varied only between 33 and 34 failed tests, across more than 15 runs.
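For reference, the conversion itself is done with the `decaffeinate` CLI, which rewrites a `.coffee` file to an equivalent `.js` file next to it. A minimal sketch (the file name here is illustrative, not the actual path of our test runner):

```shell
# Create a tiny CoffeeScript file standing in for the test runner
# (illustrative only; the real runner lives in the repo).
printf 'add = (a, b) -> a + b\nconsole.log add 1, 2\n' > runner.coffee

# decaffeinate writes runner.js alongside the input file.
npx decaffeinate runner.coffee

# The converted file runs under plain Node.js.
node runner.js
```

Once the converted output is verified, the original `.coffee` file can be deleted and any references to it updated.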
With this single change we can improve the reliability of the tests. Ideally, we can then set exact numbers for how many tests are allowed to fail, or alternatively block PRs until all tests pass, making the results simple to rely on.
(Although for the latter option we would also need to sort out what stops many GitHub Actions runs from installing successfully.)