
RFC [do not merge]: Visual regression testing #1087

Closed · 12 commits

Conversation

@pepopowitz (Collaborator) commented Jul 26, 2022

This PR is a Request For Comments. I do not intend to merge it as-is; I'm providing the changes as something tangible for us to talk about while I ask some questions about their value.

What?

This PR introduces a few visual regression tests using Playwright. Each test visits a specific URL and captures a screenshot of the result. On each subsequent run, the test visits the URL again, takes a new screenshot, and compares it to the original. If the screenshots differ, the test fails; if they match exactly, it passes.
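For a sense of shape, a minimal test of this kind looks roughly like the sketch below -- the route and snapshot name are illustrative placeholders, not the actual tests in this PR:

```ts
// Minimal sketch of a Playwright visual regression test.
// The URL and snapshot name are placeholders, not the pages this PR covers.
import { test, expect } from '@playwright/test';

test('docs homepage has no visual regressions', async ({ page }) => {
  await page.goto('http://localhost:3000/');
  // The first run saves a baseline screenshot; later runs pixel-compare
  // against it and fail on any difference.
  await expect(page).toHaveScreenshot('homepage.png');
});
```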

Why?

I originally wrote these tests in support of #1024. As I update Docusaurus, I'm finding small UI differences across versions.

Many times these are unimportant, like when the styling of the breadcrumb bar changed a little:

[screenshot: the breadcrumb bar styling difference]

But sometimes they're more important and ugly, like when an update to the underlying design system resulted in bullet points showing in our footer links:

[screenshot: bullet points showing in the footer links]

I've spent a decent amount of time attempting to find these changes manually, by clicking around the site...but am I even a developer if I don't want to automate it?

So far these tests have helped me discover some small, unimportant details. My hope is that they'll help me discover anything more breaking, like the bullets in the footer, in upcoming Docusaurus updates. At the very least, I intend to keep a local branch with these changes so that I can run the tests locally during updates.

Seeking Feedback: Is this PR valuable?

Some questions on which I'm soliciting opinions:

  • Do you think this PR represents something more valuable to our docs than me keeping a local branch?
  • Is this something we might want to become part of our CI checks?
    • Are these tests only useful for 1% of PRs and therefore not CI-worthy?
    • Are they useful for CI but in a non-blocking way?
    • Are they useful but belong in a separate project/repository?
  • If these tests were implemented in a CI-blocking manner, where builds fail when snapshots differ from production, how would this impact those writing the docs? Would it introduce too much friction and become a massive thorn for people?
  • There are services like Chromatic and Applitools that let us integrate this kind of test more deeply with CI checks. Chromatic, for example, provides a dashboard where you can accept/reject the visual differences, which affects the red/green status of the build. Is that workflow more intriguing?
  • Did I capture every interesting feature of our docs with the 4 tests I wrote? Should I also test usage of MDX, videos, or anything else?
  • Do you think these tests would be stable enough to become part of our workflow? Are there things in our docs that would cause them to be flaky? In my local experimentation, the cookie popup failed a couple of tests, because it sometimes showed in test results and sometimes didn't (one way to stabilize that is sketched just after this list) -- anything else like that?
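On the cookie-popup flakiness above: one option would be to hide the banner before capturing the screenshot. A sketch, where the `#cookie-banner` selector is an assumption rather than the docs' actual markup:

```ts
import { test, expect } from '@playwright/test';

test('docs page renders consistently', async ({ page }) => {
  await page.goto('http://localhost:3000/docs/');
  // Hide the asynchronously-appearing cookie popup so it can't flake the
  // snapshot. '#cookie-banner' is a hypothetical selector.
  await page.addStyleTag({ content: '#cookie-banner { display: none !important; }' });
  await expect(page).toHaveScreenshot('docs-index.png');
});
```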

I welcome any answers, small or large. And if there's anything else regarding this PR that I didn't capture above, please speak up!

How does this RFC get resolved?

I'll keep it open for a couple of weeks to gather feedback. I'll then close it with a decision on how we move forward with these tests, if we do.

What the tests look like locally

Here's a test run where no changes were detected:

[screenshot: Playwright output for a passing run]

And here's a test run where changes were detected:

[screenshot: Playwright output for a failing run]

Artifacts

For this run, I'd added "nn" to the end of a word on one of the tested pages. When the tests failed, they created a diff image in my local directory showing the changes:

[diff image: Including-footer-1-diff]

Updating snapshots

In the event that changes are detected but they're acceptable, I can specify the `--update-snapshots` flag:

[screenshot: output of the run with --update-snapshots]

As the output indicates, this overwrites the old snapshots with new ones. If I'd introduced intentional changes, I'd need to commit the updated snapshots to the repo as part of my work.
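For reference, the flag is passed straight to the Playwright CLI (assuming the usual npx entry point):

```sh
npx playwright test --update-snapshots
```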

@pepopowitz (Collaborator, Author)

oooo super nice unintentional play on words by me, I swear it was just muscle memory:

[screenshot: the accidental play on words]

@akeller (Member) commented Jul 26, 2022

I like that when you signed off today, you hinted at digging into something, and it didn't take me too long to figure out that that was this 😆

I see some applications to this that tie in well to something I've chatted with @christinaausley regarding landing pages. I want a distinct set of curated pages that we monitor for both analytics and content. Think of a well-crafted, artisanal experience. I would want to know when those pages change and make sure there is a "why" behind it.

Thinking more out loud here - since our changelog can get really into the weeds on things, it may be good to sum up what's changed visually. Beyond new features, this may be the most disruptive thing docs users may pick up on.

@pepopowitz (Collaborator, Author)

> I see some applications to this that tie in well to something I've chatted with @christinaausley regarding landing pages. I want a distinct set of curated pages that we monitor for both analytics and content. Think of a well-crafted, artisanal experience. I would want to know when those pages change and make sure there is a "why" behind it.

I definitely think these tests could form a basis for that kind of monitoring. I also think those tests would be based on the actual text content rather than a visual snapshot -- which we can also do with Playwright tests.
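A sketch of what that text-based variant could look like -- the page URL and snapshot filename here are hypothetical:

```ts
import { test, expect } from '@playwright/test';

test('landing page copy is unchanged', async ({ page }) => {
  await page.goto('http://localhost:3000/docs/');
  // Snapshot the rendered text instead of pixels, so pure styling changes
  // pass but content edits fail.
  expect(await page.locator('article').innerText()).toMatchSnapshot('landing-copy.txt');
});
```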

@pepopowitz (Collaborator, Author)

> since our changelog can get really into the weeds on things, it may be good to sum up what's changed visually.

This is a good callout -- when I do a release, I don't do anything beyond the auto-generated notes. Those call out that Docusaurus was updated, but they don't call out how.

Base automatically changed from pepopowitz/1024-beta-18-to-20 to main July 28, 2022 15:22
@korthout (Member)

@pepopowitz The only thing I kinda miss is how easy (or not) it is to replace the current state with the new state. This is especially important for those writing documentation content. Could you add something about that?

@pepopowitz added labels component:docs (Documentation improvements, including new or updated content) and dx (Documentation infrastructure typically handled by the Camunda DX team) on Jul 28, 2022
@pepopowitz (Collaborator, Author)

@korthout I've added a section at the end named "Updating snapshots". Let me know if that covers it!

@pepopowitz (Collaborator, Author)

This isn't "visual regression" testing, but #1101 illustrates that there's probably also a case for interactive regression tests (visiting certain pages, clicking through the nav). We already check links by visiting them directly, but we don't do any checking that Docusaurus is interpreting our configuration correctly when it renders pages/links.

I do think that issue was a pretty rare situation, and regression tests of the navigation links might be more trouble than they're worth if they're not reliable...but if this RFC results in visual regression tests, it might be good to think about regression tests more comprehensively at that point.
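A sketch of what such an interactive check could look like, assuming Docusaurus's default sidebar markup (the selectors are unverified assumptions):

```ts
import { test, expect } from '@playwright/test';

test('sidebar links navigate to rendered pages', async ({ page }) => {
  await page.goto('http://localhost:3000/docs/');
  // Click the first sidebar entry and confirm a real page rendered.
  // 'a.menu__link' is Docusaurus's default sidebar link class (assumed here).
  await page.locator('nav a.menu__link').first().click();
  await expect(page.locator('article h1')).toBeVisible();
});
```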

@korthout (Member) commented Aug 1, 2022

> @korthout I've added a section at the end named "Updating snapshots". Let me know if that covers it!

It should be clear to everyone what they have to do to resolve the failing test.

@pepopowitz would it be possible to add a printed line to the test failure that describes how to do this snapshot update?

@pepopowitz (Collaborator, Author)

> would it be possible to add a printed line to the test failure that describes how to do this snapshot update?

@korthout worst case I think we could conditionally echo a message when the npm test script fails. Something like:

"test": "playwright test || echo 'Snapshots have changed. If the changes are acceptable, update them by running ....'"

@pepopowitz (Collaborator, Author)

Closing this. I've added #1171 in response.

@pepopowitz closed this Aug 17, 2022