Improve Dev WF by making PR results actionable #5963
Comments
@rokonec - please work with @ChadNedzlek to refine the Phase 1 business priorities into more detailed requirements so we can come up with a design proposal.
Changed title to give "build" its own epic. Tests and builds are very different beasts when it comes to tackling them, because the quantity and quality of data are vastly different.
After our v-team discussion yesterday, where it proved difficult to make forward progress, I'll give a hand to see if the discussion can be "jump started". What I'll present should be thought of as "starter fluid", not the actual "fuel". Meaning... the right approach is probably still "out there", at least to a degree. Perhaps this proposed approach can be applied to the data we have to see how it does (or doesn't) bring value. With that said, here are my "starter" thoughts:
Action:
Action:
Action:
There are some interesting investigations we can undertake that can help distill this starter fluid. :-) For example: what does it look like to run a test on the same machine... a different machine... and how can we report that in a useful way to Azure Pipelines/something else so that we have the data we need to "do the right thing"? I have a feeling that for non-PR builds, we run enough of them that just always looking at the last N builds might be statistically good enough without having to retry a test within a single build (sort of treating build N+1 as, more or less, an implicit retry of build N). There were 6000 executions of every test in the last two weeks for runtime, for example... that's a ton of data that a 6001st execution probably won't add much to. :-) That might save resources/complication, and help with reporting, since Azure Pipelines has some pretty good reporting around the analytics for a branch. When it gets down to the wire of shipping, maybe that means we need to investigate 2-3 random failures for the build we want to ship... but we'd probably want to do that anyway, even if we deemed them "flaky and irrelevant".
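To make the "last N builds" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the function name `classify_from_history`, the default window of 20 builds, and the three labels are hypothetical, and the per-build pass/fail history is assumed to have been fetched already from wherever the results are stored.

```python
from typing import Sequence


def classify_from_history(results: Sequence[bool], window: int = 20) -> str:
    """Classify one test from its results in the last `window` builds of a branch.

    `results` is ordered oldest to newest; True means the test passed in that build.
    The labels and the window size are illustrative, not an agreed-upon policy.
    """
    recent = list(results)[-window:]
    if not recent:
        return "unknown"  # no history yet, nothing to conclude
    failures = recent.count(False)
    if failures == 0:
        return "passing"
    if failures == len(recent):
        return "failing"  # failed in every recent build: likely a real, persistent problem
    return "flaky"  # mixed results across builds: each later build acted as an implicit retry


# Hypothetical history for one test on one branch (oldest first).
history = [True, True, False, True, True, False, True]
print(classify_from_history(history))
```

A real version would probably want a tolerance (for example, treating a single failure in thousands of runs differently from a 50% failure rate), but the shape of the lookup stays the same.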
I like the thinking, Chad. :) Let's chat again and see what the right next steps are. Cheers
Makes sense for non-PR builds, to compare against previous builds in the same branch. For PR builds, the mechanism to identify whether a test is chronically failing or flaky would require some sort of retry, and bubbling up that info for reporting. We need to ensure the reporting structure can accommodate both, or find a way to demarcate PR vs. non-PR if that is more convenient.
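A rough sketch of what retry plus "bubbling up" could look like for a PR build follows. The names here (`run_with_retries`, the `build_kind` field, the outcome labels, the default of three attempts) are hypothetical and are not an existing Arcade or Helix API; the test itself is modeled as a plain callable that returns True on pass.

```python
from typing import Callable, Dict, List, Union


def run_with_retries(
    test: Callable[[], bool],
    max_attempts: int = 3,
    build_kind: str = "pr",
) -> Dict[str, Union[str, List[bool]]]:
    """Run a test up to `max_attempts` times and report every attempt, not just the last."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    attempts: List[bool] = []
    for _ in range(max_attempts):
        passed = test()
        attempts.append(passed)
        if passed:
            break  # stop retrying as soon as the test passes

    if attempts[0]:
        outcome = "passed"  # passed on the first try
    elif attempts[-1]:
        outcome = "flaky"   # failed at least once, then passed on a retry
    else:
        outcome = "failed"  # failed every attempt: the dev should look at it

    # `build_kind` lets the reporting side keep PR and non-PR data apart.
    return {"outcome": outcome, "attempts": attempts, "build_kind": build_kind}


# Simulated test that fails once and then passes.
simulated = iter([False, True])
print(run_with_retries(lambda: next(simulated)))
```

Keeping the full `attempts` list in the result is the design point: a test that only passes on retry still shows up in reporting instead of silently turning green.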
Tests that are consistently failing need to be looked at, and we should either turn the test off or mark it as flaky so we can skip it. Just blindly skipping a chronically failing test would mean sweeping it under the rug :)
I'm not sure where the epic about getting everyone on a shared testing infrastructure is... but any "retry" logic we write won't help anyone until that epic is complete. Right now, everyone has implemented their own test execution framework, so any work would have to be hand-written into every single repository, which isn't a good use of our time (since it would all get deleted when that other epic started up anyway). We can make it work in arcade, assuming that will be the template for other teams' test runs, since it's supposed to be the shared infrastructure place. Or maybe we need to bump that other epic up a bit so that we've got a shared execution place that every repo uses that we can put the retry stuff in.
Found it. It was epic #5132. But that lost the "centralize the testing infrastructure" part of it in the title, and I'm not sure if it's focused on that or if we need another epic? Or to do that here?
WE'RE DONE!! Closing!!
Motivation and Business Impact
The north star of this epic is to improve the developer workflow by focusing on making the PR results accurate and actionable. 'Red' should mean that the dev can (and should) fix something, and 'Green' should mean that the change is good.
Today, it can be tough to figure out what the actual root problem is, a difficulty compounded by periodic infrastructure outages and/or transient issues that are outside of the devs' control. When "bad actor" tests are added to the mix, it becomes clear why we as devs are frustrated with seemingly "never" getting a green PR.
Jared Parsons wrote a great doc that much of this thinking came from: Resiliency in our infrastructure.docx
To follow the conversations, check out our V-Team's Teams Channel which is where you should be able to find most of the context.
Business Objectives
Improve MSPoll values around engineering, specifically engr107 and engr108 (survey values no longer available)
Functional Deliverables
Stretch goal: When a persistent outage is resolved, PRs are updated
Deliverables for Engineering
Metrics for Success
Where applicable, metrics should be sliceable by repo.
User Feedback
Usage
One-Pagers:
Notes
Networking notes on keeping CI failures low
Milestones
Setup: Migrate Helix scripts out of Arcade and into Helix. Deliverable: Customer functionality shouldn't change. Work was not necessary
V4??: "Notifications"?? Comments on PR? Teams notifications?
no truncation band-aid)
Reporting for Repo Owners: Replaced by Known Issues epic
Achievements
Milestone 1
Recently Triaged Issues
All issues in this section should be triaged by the v-team into one of their business objectives or features.
https://github.com/dotnet/core-eng/issues/11576
https://github.com/dotnet/core-eng/issues/11578
Define quarantine process for a test assembly #6662
Define quarantine process for a build leg or pipeline #6663
Get quarantining sorted #6964
https://github.com/dotnet/core-eng/issues/12221
https://github.com/dotnet/core-eng/issues/12660
https://github.com/dotnet/core-eng/issues/12683
https://github.com/dotnet/core-eng/issues/12684
https://github.com/dotnet/core-eng/issues/12692
https://github.com/dotnet/core-eng/issues/12696
https://github.com/dotnet/core-eng/issues/12702
https://github.com/dotnet/core-eng/issues/12833
https://github.com/dotnet/core-eng/issues/12906
https://github.com/dotnet/core-eng/issues/12940
https://github.com/dotnet/core-eng/issues/12972
https://github.com/dotnet/core-eng/issues/13116
https://github.com/dotnet/core-eng/issues/13142
https://github.com/dotnet/core-eng/issues/13145
Test Configuration name for docker based queues in helix is hard to read #7424
https://github.com/dotnet/core-eng/issues/13189
https://github.com/dotnet/core-eng/issues/13246
https://github.com/dotnet/core-eng/issues/13262
https://github.com/dotnet/core-eng/issues/13266
Build Analysis UX tweaks #7475
https://github.com/dotnet/core-eng/issues/13314
https://github.com/dotnet/core-eng/issues/13330
https://github.com/dotnet/core-eng/issues/13333
https://github.com/dotnet/core-eng/issues/13363
https://github.com/dotnet/core-eng/issues/13413
https://github.com/dotnet/core-eng/issues/13415
https://github.com/dotnet/core-eng/issues/13449
https://github.com/dotnet/core-eng/issues/13492
https://github.com/dotnet/core-eng/issues/13503
https://github.com/dotnet/core-eng/issues/13536
https://github.com/dotnet/core-eng/issues/13781
https://github.com/dotnet/core-eng/issues/13800
https://github.com/dotnet/core-eng/issues/13807
https://github.com/dotnet/core-eng/issues/13862
https://github.com/dotnet/core-eng/issues/13897
https://github.com/dotnet/core-eng/issues/13983
https://github.com/dotnet/core-eng/issues/14002
https://github.com/dotnet/core-eng/issues/14055
https://github.com/dotnet/core-eng/issues/14182
https://github.com/dotnet/core-eng/issues/14258
https://github.com/dotnet/core-eng/issues/14273
https://github.com/dotnet/core-eng/issues/14357
https://github.com/dotnet/core-eng/issues/14402
https://github.com/dotnet/core-eng/issues/14527
https://github.com/dotnet/core-eng/issues/14679
https://github.com/dotnet/core-eng/issues/14680
https://github.com/dotnet/core-eng/issues/14708
https://github.com/dotnet/core-eng/issues/14710
https://github.com/dotnet/core-eng/issues/14721
https://github.com/dotnet/core-eng/issues/14757
https://github.com/dotnet/core-eng/issues/14794
https://github.com/dotnet/core-eng/issues/14913
https://github.com/dotnet/core-eng/issues/14915
https://github.com/dotnet/core-eng/issues/15058
https://github.com/dotnet/core-eng/issues/15059
https://github.com/dotnet/core-eng/issues/15102
https://github.com/dotnet/core-eng/issues/15118
https://github.com/dotnet/core-eng/issues/15129
https://github.com/dotnet/core-eng/issues/15157
https://github.com/dotnet/core-eng/issues/15235
https://github.com/dotnet/core-eng/issues/15269
https://github.com/dotnet/core-eng/issues/15271
https://github.com/dotnet/core-eng/issues/15294
https://github.com/dotnet/core-eng/issues/15318
https://github.com/dotnet/core-eng/issues/15321
https://github.com/dotnet/core-eng/issues/15322
https://github.com/dotnet/core-eng/issues/15324
https://github.com/dotnet/core-eng/issues/15366
https://github.com/dotnet/core-eng/issues/15387
https://github.com/dotnet/core-eng/issues/15406
https://github.com/dotnet/core-eng/issues/15453
https://github.com/dotnet/core-eng/issues/15478
https://github.com/dotnet/core-eng/issues/15484
https://github.com/dotnet/core-eng/issues/15491
https://github.com/dotnet/core-eng/issues/15494
https://github.com/dotnet/core-eng/issues/15503
https://github.com/dotnet/core-eng/issues/15512
https://github.com/dotnet/core-eng/issues/15517
https://github.com/dotnet/core-eng/issues/15542
https://github.com/dotnet/core-eng/issues/15571
https://github.com/dotnet/core-eng/issues/15572
https://github.com/dotnet/core-eng/issues/15591
https://github.com/dotnet/core-eng/issues/15559
https://github.com/dotnet/core-eng/issues/15573
https://github.com/dotnet/core-eng/issues/15631
https://github.com/dotnet/core-eng/issues/15629