

Improve test reproducibility + Decaffeinate #45

Merged: 3 commits merged into master from no-brittle-tests, Sep 5, 2022

Conversation

@confused-Techie (Member) commented Sep 4, 2022

While working on the tests, our goal was not to break any more tests than were already failing, previously stated as 21 failed tests for Package Testing and 51 failed tests for Editor Testing.

But reproducing those numbers proved unreliable. After seeing that none of our current open PRs matched them in any meaningful way, I started some testing.

All testing below was done on the master branch, on Node.js 16.x (16.8.0 on Linux and 16.16.0 on Windows), with runs executed back to back and no code changes in between.

Below are the results (editor tests only):

Linux:

  • 2165 tests, 13816 assertions, 52 failures, 10 skipped
  • 2165 tests, 12382 assertions, 50 failures, 10 skipped
  • 2165 tests, 12112 assertions, 50 failures, 10 skipped
  • 2165 tests, 12346 assertions, 50 failures, 10 skipped
  • 2165 tests, 11896 assertions, 50 failures, 10 skipped

Windows:

  • 2167 tests, 12124 assertions, 32 failures, 8 skipped
  • 2167 tests, 12731 assertions, 33 failures, 8 skipped
  • 2167 tests, 12556 assertions, 34 failures, 8 skipped
  • 2167 tests, 12556 assertions, 156 failures, 8 skipped
  • 2167 tests, 12587 assertions, 31 failures, 8 skipped
  • 2167 tests, 12044 assertions, 33 failures, 8 skipped
  • 2167 tests, 12299 assertions, 33 failures, 8 skipped

As you can see above, there is significant variability in the results of these tests. While investigating why, one simple change to the test runner itself produced much more stable results, at least on Windows: decaffeinating the test runner. With just that change, consecutive Windows runs varied only between 33 and 34 failed tests across more than 15 runs.

With this single change we can improve the reliability of the tests, and ideally afterwards we can set exact numbers for how many tests are allowed to fail, or alternatively block PRs until all tests pass, to make the test suite simple to rely on.

(Although for the latter option we would also need to sort out what stops many GitHub Actions runs from installing successfully.)

@confused-Techie confused-Techie mentioned this pull request Sep 5, 2022
@mauricioszabo (Contributor) left a comment


Assuming that we didn't break any new tests, go ahead :)

@confused-Techie (Member, Author)

> Assuming that we didn't break any new tests, go ahead :)

Thanks! Looking at the tests, they are within the expected failure rate:

  • Ubuntu Editor: 53
  • Ubuntu Package: 22
  • Mac Editor: Truncated
  • Mac Package: 22
  • Windows Editor: Failed to Install
  • Windows Package: Failed to Install

So I'll go ahead and merge in a moment!

@confused-Techie confused-Techie merged commit d36c898 into master Sep 5, 2022
@confused-Techie confused-Techie deleted the no-brittle-tests branch September 5, 2022 21:59
@DeeDeeG (Member) commented Sep 11, 2022

Super late comment:

I think everywhere in the Atom repos, the CoffeeScript (tooling) version in use has generally been 1.x, and a "breaking changes" 2.x edition of CoffeeScript was released at some point after.

So it may be advantageous to convert CoffeeScript to JS ("decaffeinate") with CoffeeScript 1.x tooling, rather than CoffeeScript 2.x.

If any issues crop up after decaffeinating with CoffeeScript 2.x, it's worth trying CoffeeScript 1.x instead.

e.g. upstream Atom seems to be using the [email protected] package? I assume that's what powers Atom's ability to interpret CoffeeScript files on the fly, and I further assume that includes e.g. the test CoffeeScript files.

https://github.com/atom/atom/blob/17a31e3a3729070768f31bbce7ce9bcc09f5a2b8/package.json#L46

Edit to add: The JS code generated by CoffeeScript 2.x is probably nicer and more modern, but if it breaks things, that isn't worth the prettier, more modern code. Working code is probably priority #1; modernity, readability, and elegance come second. This is all vaguely from memory, and I don't really use CoffeeScript or its tooling (the coffeescript command) very often.
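As a small illustration of why 1.x- and 2.x-generated JS can behave differently, here is a sketch of one of CoffeeScript 2's documented breaking changes: default parameter values. The function names below are invented for the example; the two bodies mirror the shape of the respective compilers' output for `f = (a = 1) -> a`.

```javascript
// CoffeeScript 1.x emits an explicit runtime check, so the default
// applies when the argument is null OR undefined:
function coffee1Default(a) {
  if (a == null) { // `== null` matches both null and undefined
    a = 1;
  }
  return a;
}

// CoffeeScript 2.x emits an ES2015 default parameter instead,
// which only applies when the argument is undefined:
function coffee2Default(a = 1) {
  return a;
}

console.log(coffee1Default(null)); // 1    — the 1.x check applies the default
console.log(coffee2Default(null)); // null — ES defaults ignore null
```

Code that relied on the old null-triggers-the-default behavior (common in callback-heavy Atom specs) can therefore break when re-compiled or decaffeinated with 2.x tooling.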

@confused-Techie (Member, Author)

@DeeDeeG this is a very helpful insight, and it actually points to an exact issue we had earlier, where things broke due to decaffeination. Thanks for the insight; I'll make sure to redo this PR with CoffeeScript 1.x tooling. Thank you!
