[EPIC] Improve reliability of Windows CI #7617

m-allanson · 2018-08-24T18:22:32Z (edited by Manoz)

Who will own this?

What Area of Responsibility does this fall into? Who will own the work, and who needs to be aware of the work?

Area of Responsibility:

Select the Area of Responsibility most impacted by this Epic

OSS

Summary

Gatsby uses a free AppVeyor account to run Windows CI tests. The tests are very slow, and sometimes don't get reported at all. This means that PRs are often not tested on Windows before being merged in.

How will this impact Gatsby?

Domains

List the impacted domains here

Components

List the impacted Components here

Goals

What are the top 3 goals you want to accomplish with this epic? All goals should be specific, measurable, actionable, realistic, and timebound.

How will we know this epic is a success?

What changes must we see, or what must be created for us to know the project was a success. How will we know when the project is done? How will we measure success?

User Can Statement

User can...

Metrics to Measure Success

We will see an increase/decrease in...

Additional Description

In a few sentences, describe the current status of the epic, what we know, and what's already been done.

What are the risks to the epic?

In a few sentences, describe what high-level questions we still need to answer about the project. How could this go wrong? What are the trade-offs? Do we need to close a door to go through this one?

What questions do we still need to answer, or what resources do we need?

Is there research to be done? Are there things we don’t know? Are there documents we need access to? Is there contact info we need? Add those questions as bullet points here.

How will we complete the epic?

What are the steps involved in taking this from idea through to reality?

How else could we accomplish the same goal?

Are there other ways to accomplish the goals you listed above? How else could we do the same thing?



vtenfys · 2018-08-26T12:55:57Z

cc @m-allanson @KyleAMathews I've created #7652, which describes the two problems currently causing all Windows tests to fail. However, fixing these issues won't affect the speed of Windows testing, so there might still be problems.

m-allanson · 2018-08-28T12:42:31Z

Great stuff, thanks @davidbailey00 👍

Here are some WIP notes on improving AppVeyor build times, which should also help with reliability.

Rolling builds

There's a "rolling builds" configuration setting for AppVeyor that will tell it to only test the newest commit from any given PR: https://www.appveyor.com/docs/build-configuration/#rolling-builds. This has to be enabled through the AppVeyor UI.

From the AppVeyor docs:

"rolling builds" are great for very active OSS projects with lengthy queue. Whenever you do a new commit to the same branch OR pull request all current queued/running builds for that branch or PR are cancelled and the new one is queued. Other words, rolling builds make sure that only the most recent commit is built.

I can't see this option in the AppVeyor UI. I assume @KyleAMathews needs to give @pieh and me additional permissions on the AppVeyor account?
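To make the rolling-builds rule quoted above concrete, here is a minimal TypeScript sketch of the queueing behaviour: when a new commit arrives for a branch or PR, every queued or running build for that same branch or PR is cancelled and only the new one stays queued. The `Build` type, its fields, the `"pr/1"`-style keys and the `RollingQueue` class are hypothetical illustrations, not AppVeyor's API; AppVeyor applies this rule server-side once the setting is enabled in the UI.

```typescript
// Hypothetical model of a CI build; not AppVeyor's real data structure.
type BuildStatus = "queued" | "running" | "cancelled" | "done";

interface Build {
  id: number;
  key: string; // branch name or PR identifier (illustrative format, e.g. "pr/1")
  commit: string;
  status: BuildStatus;
}

// Sketch of "rolling builds": only the most recent commit per branch/PR is built.
class RollingQueue {
  private builds: Build[] = [];
  private nextId = 1;

  // Queue a build for a new commit, cancelling older active builds for the same key.
  push(key: string, commit: string): Build {
    for (const b of this.builds) {
      if (b.key === key && (b.status === "queued" || b.status === "running")) {
        b.status = "cancelled";
      }
    }
    const build: Build = { id: this.nextId++, key, commit, status: "queued" };
    this.builds.push(build);
    return build;
  }

  // Builds that are still waiting or in progress, in the order they were queued.
  active(): Build[] {
    return this.builds.filter((b) => b.status === "queued" || b.status === "running");
  }
}

// Usage: three commits pushed to one PR and one to another leave
// only the newest commit of each PR active.
const queue = new RollingQueue();
queue.push("pr/1", "aaa");
queue.push("pr/1", "bbb");
queue.push("pr/2", "ccc");
queue.push("pr/1", "ddd");
console.log(queue.active().map((b) => b.commit).join(",")); // prints "ccc,ddd"
```

The point of the design is that a burst of pushes to one PR costs one build instead of several, which matters most when the account only has a single concurrent job.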

Fail strategy

AppVeyor's default behaviour is to run all build jobs even if one of them fails. There is a fast_finish option, which cancels all other jobs as soon as one job fails.

https://www.appveyor.com/docs/build-configuration/#failing-strategy
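In appveyor.yml this option is set under the `matrix` key as `fast_finish: true`. The TypeScript sketch below only models its effect, assuming jobs run one at a time (as with a single concurrent job): without fast_finish every job runs to completion, and with it the first failure cancels every job that has not started yet. The `Job` shape and the job names are hypothetical.

```typescript
type JobResult = "passed" | "failed" | "cancelled";

// Hypothetical job model: `run` stands in for a real test run and returns true on success.
interface Job {
  name: string;
  run: () => boolean;
}

// Run jobs sequentially. With fastFinish, the first failure cancels all remaining jobs.
function runMatrix(jobs: Job[], fastFinish: boolean): Map<string, JobResult> {
  const results = new Map<string, JobResult>();
  let failed = false;
  for (const job of jobs) {
    if (failed && fastFinish) {
      results.set(job.name, "cancelled");
      continue;
    }
    const ok = job.run();
    results.set(job.name, ok ? "passed" : "failed");
    if (!ok) failed = true;
  }
  return results;
}

// Usage with three illustrative jobs, the second of which fails.
const jobs: Job[] = [
  { name: "node8-unit", run: () => true },
  { name: "node8-e2e", run: () => false },
  { name: "node10-unit", run: () => true },
];
console.log(Array.from(runMatrix(jobs, false).values()).join(",")); // prints "passed,failed,passed"
console.log(Array.from(runMatrix(jobs, true).values()).join(",")); // prints "passed,failed,cancelled"
```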

Concurrent jobs

AppVeyor offers one concurrent job for OSS builds. Additional concurrency can be added by paying for a basic account, and then paying $25/month per additional concurrent job: https://www.appveyor.com/pricing/
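A rough way to weigh that price is sketched below in TypeScript. It only uses the $25/month per additional job figure quoted above and deliberately leaves out the basic account price, which isn't stated here. The time estimate assumes every job in the matrix takes the same number of minutes and that nothing else is queued on the account; the job counts and durations in the usage lines are made-up inputs, not measurements.

```typescript
// Price per additional concurrent job in USD/month, as quoted above.
const PRICE_PER_EXTRA_JOB = 25;

// Monthly cost of the extra concurrency only (base plan price not included).
function extraConcurrencyCost(extraJobs: number): number {
  if (extraJobs < 0) throw new Error("extraJobs must be >= 0");
  return extraJobs * PRICE_PER_EXTRA_JOB;
}

// Wall-clock estimate for one build: jobs run in "waves" of `concurrency` at a time.
// Assumes equal job durations and no other builds competing for the slots.
function estimateBuildMinutes(jobCount: number, concurrency: number, minutesPerJob: number): number {
  if (concurrency < 1) throw new Error("concurrency must be >= 1");
  return Math.ceil(jobCount / concurrency) * minutesPerJob;
}

// Usage with illustrative numbers: 6 jobs of 10 minutes each.
console.log(extraConcurrencyCost(2)); // prints 50
console.log(estimateBuildMinutes(6, 1, 10)); // prints 60
console.log(estimateBuildMinutes(6, 3, 10)); // prints 20
```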

Investigate caching

Cache node_modules between builds? Cache anything else?

https://www.appveyor.com/docs/build-cache/
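The build-cache docs linked above describe tying a cache entry to a file so that it is reset when that file changes (written like `node_modules -> package.json` in the `cache` section). The TypeScript sketch below shows the same idea with an in-memory map: the cache key is derived from the lockfile contents, so node_modules is reused only while the lockfile is unchanged. The FNV-1a hash, the `cache` map and `restoreOrInstall` are hypothetical stand-ins chosen for illustration; they are not how AppVeyor stores or keys its cache.

```typescript
// 32-bit FNV-1a over UTF-16 code units; an illustrative hash, not AppVeyor's.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical in-memory stand-in for the CI cache: key -> cached node_modules.
const cache = new Map<string, string>();

function cacheKey(lockfileContents: string): string {
  return `node_modules-${fnv1a(lockfileContents)}`;
}

// Restore node_modules if the lockfile is unchanged; otherwise install and save it.
function restoreOrInstall(
  lockfileContents: string,
  install: () => string,
): { hit: boolean; modules: string } {
  const key = cacheKey(lockfileContents);
  const cached = cache.get(key);
  if (cached !== undefined) return { hit: true, modules: cached };
  const modules = install();
  cache.set(key, modules);
  return { hit: false, modules };
}

// Usage: the first build installs, the second build with the same lockfile hits the cache.
const lock = '{"lockfileVersion":1}';
console.log(restoreOrInstall(lock, () => "fresh install").hit); // prints false
console.log(restoreOrInstall(lock, () => "fresh install").hit); // prints true
```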

Job matrix configuration

There is an install script that cancels most jobs in the matrix, running them only for releases or forced builds. However, this script does not run until after the repo has been cloned for each job, so each of those jobs can take a couple of minutes to be cancelled. See example.

Can this functionality be replicated via AppVeyor's config options? See config reference.

An alternative would be to temporarily drop these extra jobs, and look at adding them back in once everything else here has been investigated.

I assume these jobs don't run on every PR because they take a while, but it seems counterproductive to have tests that only run under certain conditions. Maybe we should reduce the number of jobs in the matrix and always run them, instead of having many jobs that only run under certain conditions.
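Based only on the description above ("running them only for releases or forced builds"), the TypeScript sketch below shows that gating rule evaluated once per build, before any job is scheduled, rather than inside each job after the repo has been cloned. The `MatrixJob` and `BuildTrigger` shapes, the `releaseOnly` flag and the job names are hypothetical; the real install script's checks may differ. Dropping the `releaseOnly` entries from the matrix corresponds to the "fewer jobs, always run" alternative.

```typescript
// Hypothetical description of a matrix entry.
interface MatrixJob {
  name: string;
  releaseOnly: boolean; // true = currently cancelled unless it is a release or forced build
}

// Hypothetical description of what triggered the build.
interface BuildTrigger {
  isRelease: boolean;
  isForced: boolean;
}

// Decide up front which jobs to schedule, so skipped jobs never clone the repo.
function jobsToSchedule(matrix: MatrixJob[], trigger: BuildTrigger): MatrixJob[] {
  const runEverything = trigger.isRelease || trigger.isForced;
  return matrix.filter((job) => runEverything || !job.releaseOnly);
}

// Usage with an illustrative matrix.
const matrix: MatrixJob[] = [
  { name: "unit", releaseOnly: false },
  { name: "e2e", releaseOnly: true },
  { name: "www-build", releaseOnly: true },
];
console.log(jobsToSchedule(matrix, { isRelease: false, isForced: false }).map((j) => j.name).join(",")); // prints "unit"
console.log(jobsToSchedule(matrix, { isRelease: true, isForced: false }).length); // prints 3
```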

Other things to investigate

Enable shallow_clone: true? https://www.appveyor.com/docs/appveyor-yml/

KyleAMathews · 2018-08-29T15:14:20Z

Nice investigation @m-allanson! Stopping builds + paying for more concurrency seems like easy wins.

m-allanson · 2018-08-29T22:16:07Z

@KyleAMathews has enabled the rolling builds feature

m-allanson · 2018-09-10T23:09:20Z

This could be worth looking at: https://azure.microsoft.com/en-us/blog/announcing-azure-pipelines-with-unlimited-ci-cd-minutes-for-open-source/

jeremyepling · 2018-09-14T16:35:50Z

I'm a product manager on Azure Pipelines. Let me know if you have any questions or suggestions.

pieh · 2018-10-08T13:22:19Z

@jeremyepling Sorry for getting back to you late. We only recently started experimenting with Azure Pipelines, and right now we are facing a git checkout CRLF/LF problem:
In #8836 there is an attempt to fix our unit tests for Windows (they currently pass on AppVeyor CI but fail in our rudimentary Azure Pipelines setup). The snapshots don't match, most likely because the saved snapshots are checked out with CRLF line endings, as opposed to the LF line endings that the functions we test produce. Is there an option to make the Azure Pipelines checkout use LF line endings?

I saw there is a checkout configuration, but it doesn't seem to cover that: https://docs.microsoft.com/en-us/azure/devops/pipelines/yaml-schema?view=vsts#checkout

pieh · 2018-10-08T22:04:31Z

Seems like we can use .gitattributes to handle CRLF/LF issues - #8922 :)

jeremyepling · 2018-10-09T00:32:13Z

I got a successful build after I created a .gitattributes file that sets all line endings to LF for your repository. This file will override Git user settings for CRLF.

I'd like to have a way to specify gitconfig values from the Azure Pipelines YAML, but we don't have that yet.
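The TypeScript sketch below reproduces the snapshot mismatch described above: output generated by the code under test uses LF, but the committed snapshot comes back with CRLF when checked out on a runner that converts line endings (roughly what Git does on checkout with `core.autocrlf=true`). A .gitattributes rule forcing LF (for example `* text=auto eol=lf`; the exact rule used in #8922 isn't quoted here) keeps the checked-out file LF, which the sketch models by normalizing the checked-out text. Both helper functions are illustrative, not part of Gatsby or Git.

```typescript
// Convert CRLF (and any lone CR) line endings to LF.
function normalizeEol(text: string): string {
  return text.replace(/\r\n?/g, "\n");
}

// Simulate a checkout that converts LF to CRLF (an assumption modelling core.autocrlf=true).
function checkoutWithCrlf(text: string): string {
  return normalizeEol(text).replace(/\n/g, "\r\n");
}

// Output produced by the code under test always uses LF.
const produced = "line one\nline two\n";
// The same snapshot as committed (LF), after a CRLF-converting checkout.
const checkedOut = checkoutWithCrlf(produced);

console.log(produced === checkedOut); // prints false
console.log(produced === normalizeEol(checkedOut)); // prints true
```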

jeremyepling · 2018-10-09T00:33:08Z

I should have refreshed the page earlier. I'm just now seeing your comment. :)

gatsbot · 2019-01-20T17:16:48Z

Old issues will be closed after 30 days of inactivity. This issue has been quiet for 20 days and is being marked as stale. Reply here or add the label "not stale" to keep this issue open!

gatsbot · 2019-02-01T17:27:36Z

Hey again!

It’s been 30 days since anything happened on this issue, so our friendly neighborhood robot (that’s me!) is going to close it.

Please keep in mind that I’m only a robot, so if I’ve closed this issue in error, I’m HUMAN_EMOTION_SORRY. Please feel free to reopen this issue or create a new one if you need anything else.

Thanks again for being part of the Gatsby community!

m-allanson assigned KyleAMathews Aug 24, 2018

m-allanson added the Epic label Aug 24, 2018

pieh added the 🎯 Open Source label Aug 24, 2018

gatsbot bot added the stale? Issue that may be closed soon due to the original author not responding any more. label Jan 20, 2019

gatsbot bot closed this as completed Feb 1, 2019
