
RFC [do not merge]: Visual regression testing #1087

Closed · 12 commits

Conversation

@pepopowitz (Collaborator) commented Jul 26, 2022

This PR is a Request For Comments. I do not intend to merge it as-is; I'm providing the changes as something tangible for us to talk about while I ask some questions about their value.

What?

This PR introduces a few visual regression tests using Playwright. Each test visits a specific URL and captures a screenshot of the result. On each subsequent run, the test visits the URL again, takes a new screenshot, and compares it to the original. If the screenshots differ, the test fails; if they match exactly, it passes.
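For a sense of shape, a minimal test of this kind looks roughly like the sketch below -- the route and snapshot name are illustrative placeholders, not the actual tests in this PR:

```ts
// Minimal sketch of a Playwright visual regression test.
// The URL and snapshot name are placeholders, not the pages this PR covers.
import { test, expect } from '@playwright/test';

test('docs homepage has no visual regressions', async ({ page }) => {
  await page.goto('http://localhost:3000/');
  // The first run saves a baseline screenshot; later runs pixel-compare
  // against it and fail on any difference.
  await expect(page).toHaveScreenshot('homepage.png');
});
```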

Why?

I originally wrote these tests in support of #1024. As I update Docusaurus, I'm finding small UI differences across versions.

Many times these are unimportant, like when the styling of the breadcrumb bar changed a little:

[screenshot: the breadcrumb bar styling difference]

But sometimes they're more important and ugly, like when an update to the underlying design system resulted in bullet points showing in our footer links:

[screenshot: bullet points showing in the footer links]

I've spent a decent amount of time attempting to find these changes manually, by clicking around the site...but am I even a developer if I don't want to automate it?

So far these tests have helped me discover some small, unimportant details. My hope is that they'll help me discover anything more breaking, like the bullets in the footer, in upcoming Docusaurus updates. At the very least, I intend to keep a local branch with these changes so that I can run the tests locally during updates.

Seeking Feedback: Is this PR valuable?

Some questions on which I'm soliciting opinions:

  • Do you think this PR represents something more valuable to our docs than me keeping a local branch?
  • Is this something we might want to become part of our CI checks?
    • Are these tests only useful for 1% of PRs and therefore not CI-worthy?
    • Are they useful for CI but in a non-blocking way?
    • Are they useful but belong in a separate project/repository?
  • If these tests were implemented in a CI-blocking manner, where builds fail when snapshots differ from production, how would this impact those writing the docs? Would it introduce too much friction and become a massive thorn for people?
  • There are services like Chromatic and Applitools that let us integrate this kind of test more deeply with CI checks. Chromatic, for example, provides a dashboard where you can accept/reject the visual differences, which affects the red/green status of the build. Is that workflow more intriguing?
  • Did I capture every interesting feature of our docs with the 4 tests I wrote? Should I also test usage of MDX, videos, or anything else?
  • Do you think these tests would be stable enough to become part of our workflow? Are there things in our docs that would cause them to be flaky? In my local experimentation, the cookie popup failed a couple of tests, because it sometimes showed in test results and sometimes didn't (one way to stabilize that is sketched just after this list) -- anything else like that?
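On the cookie-popup flakiness above: one option would be to hide the banner before capturing the screenshot. A sketch, where the `#cookie-banner` selector is an assumption rather than the docs' actual markup:

```ts
import { test, expect } from '@playwright/test';

test('docs page renders consistently', async ({ page }) => {
  await page.goto('http://localhost:3000/docs/');
  // Hide the asynchronously-appearing cookie popup so it can't flake the
  // snapshot. '#cookie-banner' is a hypothetical selector.
  await page.addStyleTag({ content: '#cookie-banner { display: none !important; }' });
  await expect(page).toHaveScreenshot('docs-index.png');
});
```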

I welcome any answers, small or large. And if there's anything else regarding this PR that I didn't capture above, please speak up!

How does this RFC get resolved?

I'll keep it open for a couple of weeks to gather feedback. I'll then close it with a decision on how we move forward with these tests, if we do.

What the tests look like locally

Here's a test run where no changes were detected:

[screenshot: Playwright output for a passing run]

And here's a test run where changes were detected:

[screenshot: Playwright output for a failing run]

Artifacts

For this run, I'd added "nn" to the end of a word on one of the tested pages. When the tests failed, they created a diff image in my local directory showing the changes:

[diff image: Including-footer-1-diff]

Updating snapshots

In the event that changes are detected but they're acceptable, I can specify the `--update-snapshots` flag:

[screenshot: output of the run with --update-snapshots]

As the output indicates, this overwrites the old snapshots with new ones. If I'd introduced intentional changes, I'd need to commit the updated snapshots to the repo as part of my work.
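For reference, the flag is passed straight to the Playwright CLI (assuming the usual npx entry point):

```sh
npx playwright test --update-snapshots
```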

@pepopowitz (Collaborator, Author)

oooo super nice unintentional play on words by me, I swear it was just muscle memory:

[screenshot: the accidental play on words]

@akeller (Member) commented Jul 26, 2022

I like that when you signed off today, you hinted at digging into something, and it didn't take me too long to figure out that that was this 😆

I see some applications to this that tie in well to something I've chatted with @christinaausley regarding landing pages. I want a distinct set of curated pages that we monitor for both analytics and content. Think of a well-crafted, artisanal experience. I would want to know when those pages change and make sure there is a "why" behind it.

Thinking more out loud here - since our changelog can get really into the weeds on things, it may be good to sum up what's changed visually. Beyond new features, this may be the most disruptive thing docs users may pick up on.

@pepopowitz (Collaborator, Author)

> I see some applications to this that tie in well to something I've chatted with @christinaausley regarding landing pages. I want a distinct set of curated pages that we monitor for both analytics and content. Think of a well-crafted, artisanal experience. I would want to know when those pages change and make sure there is a "why" behind it.

I definitely think these tests could form a basis for that kind of monitoring. I also think those tests would be based on the actual text content rather than a visual snapshot -- which we can also do with Playwright tests.
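A sketch of what that text-based variant could look like -- the page URL and snapshot filename here are hypothetical:

```ts
import { test, expect } from '@playwright/test';

test('landing page copy is unchanged', async ({ page }) => {
  await page.goto('http://localhost:3000/docs/');
  // Snapshot the rendered text instead of pixels, so pure styling changes
  // pass but content edits fail.
  expect(await page.locator('article').innerText()).toMatchSnapshot('landing-copy.txt');
});
```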

@pepopowitz (Collaborator, Author)

> since our changelog can get really into the weeds on things, it may be good to sum up what's changed visually.

This is a good callout -- when I do a release, I don't do anything beyond the auto-generated notes. Those call out that Docusaurus was updated, but they don't call out how.

Base automatically changed from pepopowitz/1024-beta-18-to-20 to main July 28, 2022 15:22
@korthout (Member)

@pepopowitz The only thing I kinda miss is how easy (or not) it is to replace the current state with the new state. This is especially important for those writing documentation content. Could you add something about that?

@pepopowitz added labels component:docs (Documentation improvements, including new or updated content) and dx (Documentation infrastructure typically handled by the Camunda DX team) on Jul 28, 2022
@pepopowitz (Collaborator, Author)

@korthout I've added a section at the end named "Updating snapshots". Let me know if that covers it!

@pepopowitz (Collaborator, Author)

This isn't "visual regression" testing, but #1101 illustrates that there's probably also a case for interactive regression tests (visiting certain pages, clicking through the nav). We already check links by visiting them directly, but we don't do any checking that Docusaurus is interpreting our configuration correctly when it renders pages/links.

I do think that issue was a pretty rare situation, and regression tests of the navigation links might be more trouble than they're worth if they're not reliable...but if this RFC results in visual regression tests, it might be good to think about regression tests more comprehensively at that point.
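A sketch of what such an interactive check could look like, assuming Docusaurus's default sidebar markup (the selectors are unverified assumptions):

```ts
import { test, expect } from '@playwright/test';

test('sidebar links navigate to rendered pages', async ({ page }) => {
  await page.goto('http://localhost:3000/docs/');
  // Click the first sidebar entry and confirm a real page rendered.
  // 'a.menu__link' is Docusaurus's default sidebar link class (assumed here).
  await page.locator('nav a.menu__link').first().click();
  await expect(page.locator('article h1')).toBeVisible();
});
```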

@korthout (Member) commented Aug 1, 2022

> @korthout I've added a section at the end named "Updating snapshots". Let me know if that covers it!

It should be clear to everyone what they have to do to resolve the failing test.

@pepopowitz would it be possible to add a printed line to the test failure that describes how to do this snapshot update?

@pepopowitz (Collaborator, Author)

> would it be possible to add a printed line to the test failure that describes how to do this snapshot update?

@korthout worst case I think we could conditionally echo a message when the npm test script fails. Something like:

"test": "playwright test || echo 'Snapshots have changed. If the changes are acceptable, update them by running ....'"

@pepopowitz (Collaborator, Author)

Closing this. I've added #1171 in response.

@pepopowitz closed this Aug 17, 2022