Brittle builds - intermittent failures on CircleCI and no way to resolve #2994

Closed
andrewryan1906 opened this issue Dec 26, 2018 · 6 comments

@andrewryan1906

Current behavior:

I have an Angular application with about 100 tests. The tests run fine locally. On CircleCI, there are intermittent, unpredictable test failures. When I review the recordings on the Cypress dashboard, they register as missing-element timeouts. And... I'm stuck.

Desired behavior:

When a build fails, I'd like to be able to review the Electron console output to see if Angular outputted an error, to help get a clue of why the build is failing in CI, but not locally. Without this, I have no tools to resolve brittle builds and CI failures, and this tool becomes challenging to use.

Steps to reproduce: (app code and test code)

Can't repro reliably; depending on when I run, different tests fail.

Versions

Windows 10, Chrome (locally), CircleCI/Debian (in the cloud CI/CD process)

@andrewryan1906
Author

andrewryan1906 commented Dec 26, 2018

Also, if anyone has any suggestions/experience with failures like these - it's driving me up the wall. With every Cypress CircleCI build, 2-4 different tests fail each time. These tests work fine locally, and when I rerun the build in CircleCI, those tests pass and 2-4 other ones fail. And there's no way to see why they fail... it looks like a timeout on an XHR PUT request.

No idea what's going on, no idea how to stabilize the build.

p.s. I have tried running locally with Electron and Chrome. CANNOT reproduce any errors.

@jennifer-shehane
Member

Most often with flaky tests, we see that there are not enough assertions around the actions or XHR requests/responses that must complete before moving on to the next assertion. If the speed of the XHR responses varies between interactive mode and run mode, failures will start to show up in one but not the other, and the tests become flaky.

So, for example, if you wanted to check that clicking a 'Save' button refreshes the user's page with the correct content, you might write a test like so:

cy.get('.name-input').type('Jennifer')
cy.get('form').contains('Save').click()
cy.get('#user-name').should('contain', 'Jennifer')

By default, each command will wait 4 seconds to meet its requirements, but if the POST for the form sometimes takes longer to respond than the time allowed for cy.get('#user-name').should('contain', 'Jennifer'), then the test will sometimes fail. Because of this, we recommend asserting on as many required steps as possible. This also helps later to isolate exactly where a failure occurs when debugging an actual bug.

So, the example above would be less flaky written like this:

cy.server()
cy.route('POST', 'users/*').as('updateUser')
cy.route('GET', 'users').as('getUsers')

cy.get('.name-input').type('Jennifer')
cy.get('form').contains('Save').click()
// Wait for the POST and check that it sent the expected payload
cy.wait('@updateUser').then((xhr) => {
  expect(xhr.requestBody).to.have.property('name', 'Jennifer')
})
// Confirm the redirect happened, then wait for the page's data to load
cy.url().should('include', 'users/123')
cy.wait('@getUsers')
cy.get('#user-name').should('contain', 'Jennifer')

Now when the last line runs, we don't have to worry about whether the POST completed, whether it sent the right information, whether the URL changed, or whether the GET (or any other required request) has responded. We know the page is in the right state to find the content.
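
To make the race concrete, here is a minimal simulation in plain JavaScript with no Cypress involved. The names `retryUntil`, `startFakeSave`, `nameIsShown`, `assertOnly`, and `waitThenAssert` are all hypothetical, and the polling loop is only a rough stand-in for a retrying assertion, not how Cypress is implemented internally. The delays are in milliseconds and chosen only to keep the sketch fast.

```javascript
// Rough stand-in for a retrying assertion: poll `check` until it returns true
// or `timeoutMs` has elapsed. An assumption for illustration, not Cypress internals.
function retryUntil(check, timeoutMs, intervalMs = 5) {
  return new Promise((resolve, reject) => {
    const start = Date.now()
    const tick = () => {
      if (check()) return resolve()
      if (Date.now() - start >= timeoutMs) {
        return reject(new Error(`timed out after ${timeoutMs}ms`))
      }
      setTimeout(tick, intervalMs)
    }
    tick()
  })
}

// Simulated page: the saved name shows up only once the fake POST responds.
function startFakeSave(responseDelayMs) {
  const page = { userName: '' }
  const responded = new Promise((resolve) => {
    setTimeout(() => {
      page.userName = 'Jennifer'
      resolve()
    }, responseDelayMs)
  })
  return { page, responded }
}

// Retry the "name is shown" check and report 'pass' or 'fail'.
const nameIsShown = (page, timeoutMs) =>
  retryUntil(() => page.userName === 'Jennifer', timeoutMs)
    .then(() => 'pass', () => 'fail')

// Flaky shape: the assertion's time budget races the response.
function assertOnly(responseDelayMs, timeoutMs) {
  const { page } = startFakeSave(responseDelayMs)
  return nameIsShown(page, timeoutMs)
}

// Stable shape: wait for the response first, then assert.
async function waitThenAssert(responseDelayMs, timeoutMs) {
  const { page, responded } = startFakeSave(responseDelayMs)
  await responded
  return nameIsShown(page, timeoutMs)
}
```

With a 50 ms assertion budget and a 200 ms response, `assertOnly(200, 50)` resolves to 'fail' while `waitThenAssert(200, 50)` resolves to 'pass'. In the simulation the explicit wait has no time limit of its own; in Cypress, waiting on an aliased route has its own request and response timeouts, separate from the default command timeout.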

Unfortunately we'll have to close this issue since no reproducible example was provided, but hopefully this points you in the right direction.

jennifer-shehane added the "stage: needs information" label (Not enough info to reproduce the issue) Dec 27, 2018
@andrewryan1906
Author

Jennifer,

I'm having a hard time presenting a reproducible example since it's so spotty. But I get your point... this was one of the first things I observed. I got around this by increasing the wait timeout to 15 seconds for everything, but to your point, I'd be better off using the wait to deal with cold Lambda starts that sometimes cause XHR operations to take >5 seconds.

But given that I've increased the wait timeout, this shouldn't be a problem. There's literally no way a REST operation can ever take more than 10 seconds in my system. So again - I need to be able to get at that browser console info to see what's happening, because what's more likely is that Angular is throwing an exception.

Is that not possible?
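
One possible direction, offered as a sketch under assumptions rather than a confirmed answer: wrap `console.error` on the application window before each page load and record what it receives. `Cypress.on('window:before:load', ...)` is an existing Cypress event, but `captureConsoleErrors` and `capturedErrors` are hypothetical names, and getting the recorded messages into the CI output (for example through a task defined in the plugins file) is not shown here.

```javascript
// Wrap console.error on a window-like object so every call is also recorded.
// `captureConsoleErrors` and `sink` are hypothetical names for this sketch.
function captureConsoleErrors(win, sink) {
  const original = win.console.error
  win.console.error = (...args) => {
    // Keep a flat string copy of the message for later inspection
    sink.push(args.map(String).join(' '))
    // Still forward to the real console so normal logging is unchanged
    return original.apply(win.console, args)
  }
}

const capturedErrors = []

// Inside Cypress (for example in the support file), register the wrapper
// before each page load. The guard lets this file also load outside Cypress.
if (typeof Cypress !== 'undefined') {
  Cypress.on('window:before:load', (win) => {
    captureConsoleErrors(win, capturedErrors)
  })
}
```

The wrapper itself is plain JavaScript, so it can be exercised against a fake window object without a browser. Whether Angular reports its exception through `console.error` in this application is an assumption.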

@andrewryan1906
Author

Also - can I send you the video privately to demonstrate? That might help illustrate. The timeout is 20 seconds, so I know the page isn't timing out... it's an Angular error I need to be able to observe.

@jennifer-shehane
Member

Hey @andrewryan1906, I'd suggest asking questions about using Cypress in our community chat. We also have dedicated email support as part of some of our plans.

Sorry, we wish we could spend all day answering questions! I hope you understand.

@andrewryan1906
Author

andrewryan1906 commented Dec 28, 2018

Jennifer,

I'm a paid customer. How do I get to support?
