Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky E2E tests #8229

Closed
zutigrm opened this issue Feb 6, 2024 · 6 comments
Closed

Fix flaky E2E tests #8229

zutigrm opened this issue Feb 6, 2024 · 6 comments
Labels
P0 High priority QA: Eng Requires specialized QA by an engineer Team M Issues for Squad 2 Type: Bug Something isn't working Type: Infrastructure Engineering infrastructure & tooling

Comments

@zutigrm
Copy link
Collaborator

zutigrm commented Feb 6, 2024

Feature Description

The following E2E tests are failing with regularity in our CI test runs:

FAIL specs/api-cache.test.js (19.452 s)
● API cache › isolates client storage between sessions

FAIL specs/modules/analytics/write-scope-requests.test.js (25.275 s)
● Analytics write scope requests › prompts for additional permissions during a new Analytics web data stream creation if the user has not granted the Analytics edit scope

FAIL specs/modules/analytics/setup-gcp-no-account-no-tag.test.js (11.032 s)
● setting up the Analytics module using GCP auth with no existing account and no existing tag › displays account creation form when user has no Analytics account

FAIL specs/modules/analytics/setup-with-account-with-tag.test.js (15.663 s)
● setting up the Analytics module with an existing account and existing tag › informs about an existing tag that matches the current selected property
● setting up the Analytics module with an existing account and existing tag › does allow Analytics to be set up with an existing tag if it is a GA4 tag

Here's an example failing test run for each of the suites listed above:

https://github.com/google/site-kit-wp/actions/runs/11914120844/job/33201319873#step:11:280
https://github.com/google/site-kit-wp/actions/runs/11912971962/job/33197691525#step:11:244
https://github.com/google/site-kit-wp/actions/runs/11875312956/job/33092351004#step:11:248
https://github.com/google/site-kit-wp/actions/runs/11871934579/job/33085039750#step:11:258

These were easy to find looking through a current list of E2E test failures on develop: https://github.com/google/site-kit-wp/actions/workflows/e2e-tests.yml?query=branch%3Adevelop


Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

  • The tests listed in the Feature Description should pass every time in CI.

Implementation Brief

  • Iterate on E2E tests to resolve instabilities

Test Coverage

  • No changes expected although the issue is primarily focused on tests

QA Brief

  • No real QA needed as changes are only in E2E and the supporting infra. E2E could be re-run as a test.

Changelog entry

  • N/A
@zutigrm zutigrm added the Type: Infrastructure Engineering infrastructure & tooling label Feb 6, 2024
@techanvil techanvil changed the title E2E Test For API Cache Became Flaky On WordPress Nightly Fix flaky E2E tests Nov 20, 2024
@techanvil techanvil added P0 High priority Type: Bug Something isn't working labels Nov 20, 2024
@zutigrm zutigrm self-assigned this Dec 2, 2024
@binnieshah binnieshah added the Next Up Issues to prioritize for definition label Dec 11, 2024
@zutigrm
Copy link
Collaborator Author

zutigrm commented Dec 11, 2024

I resolved APi cache and initial write scope requests issues, but new one pops up, which is common for remaining tests - response is not a valid JSON thing. Couldn't replicate this locally in numerous tests, seems to be present only in CI.

Weirdly even when the survey endpoint is mocked and response delivered, it still fails for the same reason. I included the debug log for REST, but it never reaches the point of output what the response is, so it either does not log response that it fails on, or perhaps there is timing issue where reset happens earlier and prevent the request from executing which might cause this weird error. But couldn't get to the bottom of it.

This is the output of one of the tests, which failed, it can be seen that request to the survey timeout endpoint is sent, but not debugged/outputted:

Details

console.debug
>>> GET http://localhost:9002/wp-json/google-site-kit/v1/modules/analytics-4/data/property?propertyID=1000&_locale=user

  at observeRestRequest (config/bootstrap.js:350:11)
      at runMicrotasks (<anonymous>)

console.debug
>>> POST http://localhost:9002/wp-json/google-site-kit/v1/core/site/data/reset?_locale=user undefined

  at observeRestRequest (config/bootstrap.js:350:11)
      at runMicrotasks (<anonymous>)

console.debug
200 POST http://localhost:9002/wp-json/google-site-kit/v1/core/site/data/reset?_locale=user true

  at observeRestResponse (config/bootstrap.js:368:12)
      at runMicrotasks (<anonymous>)

console.debug
>>> POST http://localhost:9002/wp-json/google-site-kit/v1/core/site/data/reset?_locale=user undefined

  at observeRestRequest (config/bootstrap.js:350:11)
      at runMicrotasks (<anonymous>)

console.debug
200 POST http://localhost:9002/wp-json/google-site-kit/v1/core/site/data/reset?_locale=user true

  at observeRestResponse (config/bootstrap.js:368:12)
      at runMicrotasks (<anonymous>)

My attempts and fixes so far are in this PoC

Unassigning myself as I am out of ideas

@zutigrm zutigrm removed their assignment Dec 11, 2024
@binnieshah binnieshah added the Team S Issues for Squad 1 label Dec 19, 2024
@10upsimon 10upsimon self-assigned this Jan 3, 2025
@10upsimon
Copy link
Collaborator

I was also unable to replicate this locally, and am not going to be of much use attempting to identify a fix given my lack of E2E test knowledge within the codebase. Will un-assign myself.

@10upsimon 10upsimon assigned techanvil and unassigned 10upsimon and techanvil Jan 6, 2025
@binnieshah binnieshah removed the Team S Issues for Squad 1 label Jan 6, 2025
@aaemnnosttv aaemnnosttv self-assigned this Jan 6, 2025
@aaemnnosttv aaemnnosttv mentioned this issue Jan 16, 2025
19 tasks
@aaemnnosttv aaemnnosttv removed their assignment Jan 17, 2025
@aaemnnosttv aaemnnosttv removed the Next Up Issues to prioritize for definition label Jan 17, 2025
@techanvil techanvil self-assigned this Jan 17, 2025
@binnieshah binnieshah added the Team S Issues for Squad 1 label Jan 17, 2025
@techanvil techanvil assigned aaemnnosttv and unassigned techanvil Jan 17, 2025
@aaemnnosttv aaemnnosttv assigned techanvil and unassigned aaemnnosttv Jan 17, 2025
@techanvil techanvil added the QA: Eng Requires specialized QA by an engineer label Jan 18, 2025
@techanvil techanvil removed their assignment Jan 18, 2025
@binnieshah binnieshah added Team M Issues for Squad 2 and removed Team S Issues for Squad 1 labels Jan 20, 2025
@techanvil techanvil self-assigned this Jan 21, 2025
@techanvil
Copy link
Collaborator

techanvil commented Jan 21, 2025

QA ❌

Hi @aaemnnosttv, it pains me to say it as I did think/hope they were all fixed, but we are still seeing E2E failures with regularity since the PR for this issue was merged.

I've seen four of the five identified tests failing, here are some example CI runs:

I'm also seeing a couple of additional failures that weren't captured in the spec for this issue. These are failing due to Puppeteer timeouts; maybe they are generic failures or they could be specific to these test cases, it's hard to know at this point. We should probably raise a separate issue to tackle these, WDYT?

FAIL specs/modules/analytics/setup-proxy-no-account-no-tag.test.js (18.381 s)
  ● setting up the Analytics module with no existing account and no existing tag via proxy › preserves user-filled values provided and auto-submits after approving permissions

    TimeoutError: Timeout exceeded while waiting for event

      at Timeout.<anonymous> (../../node_modules/puppeteer/src/common/helper.ts:149:9)
FAIL specs/modules/tagmanager/setup.test.js (47.115 s)
  ● Tag Manager module setup › Setup with AMP active › with Secondary AMP › when validating › validates homepage AMP for non-logged-in users

    TimeoutError: Navigation timeout of 5000 ms exceeded

      at ../../node_modules/puppeteer/src/common/LifecycleWatcher.ts:204:18

@aaemnnosttv
Copy link
Collaborator

@techanvil I've opened a PR for an update to the api cache test we modified in this issue, but we'll need to iterate on this more in one or more follow up issues, which I'm happy to do.

@tofumatt tofumatt assigned tofumatt and unassigned tofumatt Jan 22, 2025
@tofumatt
Copy link
Collaborator

Moving directly to Approval as no explicit QA is needed.

@techanvil
Copy link
Collaborator

@techanvil I've opened a PR for an update to the api cache test we modified in this issue, but we'll need to iterate on this more in one or more follow up issues, which I'm happy to do.

Thanks @aaemnnosttv! Hopefully that's got the API cache test sorted out then, we'll no doubt find out soon enough if not but 🤞.

I've created four issues to follow up on the remaining tests, one per failing spec file:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P0 High priority QA: Eng Requires specialized QA by an engineer Team M Issues for Squad 2 Type: Bug Something isn't working Type: Infrastructure Engineering infrastructure & tooling
Projects
None yet
Development

No branches or pull requests

6 participants