[umbrella] k/k-wide triage workflow improvements #3456

Closed
nikopen opened this issue Mar 18, 2019 · 39 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience. sig/release Categorizes an issue or PR as relevant to SIG Release. sig/testing Categorizes an issue or PR as relevant to SIG Testing.

@nikopen
Contributor

nikopen commented Mar 18, 2019

This is an overview of ideas I've been thinking about over the last 6 months as triage lead on the release team. Related initial discussion for point 1 can be found here: https://groups.google.com/forum/#!topic/kubernetes-sig-contribex/BvGmOQ0v5f0 ; the rest should be discussed further in a meeting - the 1.14 release retro is a good candidate.

Series of items and features that would be beneficial if implemented:

  1. All issues hitting K/K are auto-labeled as 'needs-sig-triage' or something similar.
    addressed here: Add needs-triage automation on k/k issues test-infra#11818
                 |
            open bug/PR
                 |
                 V
        WAITING-ROOM: needs-sig, needs-sig-triage
                 |         ^
            (assign SIG)   |
                 |         |
                 V         |
      --> TRIAGE: needs-sig-triage<----
     |       /           \          |
     |  (close with   (verify)      |
     |   reason)          |         |
     |      |             V         |
      -- CLOSED      BACKLOG: kind/*, priority/*
                          |
                      (assign or claim)
                          |
                          V
                       IN-PROGRESS: assignee
  2. SIGs are tasked by definition to regularly search all issues and appropriately label them / categorize them. This is made much easier by implementing point 1.

  3. Each SIG has a dedicated project/Kanban board, where current, upcoming, and milestoned work is visible at a quick glance - columns like Backlog, In Progress, Release-Blocking, etc. cc @parispittman @idvoretskyi on boards but for broader project usage

    Case in point: https://github.com/orgs/kubernetes/projects/8 , the SIG-Windows board has worked great, both for them as a SIG and sig-release / release issue triage.

  4. After a SIG reviews the new ticket (issue), it gets an appropriate category - either via direct labels or via Project Board automated labels. thockin suggested the use of triage labels, which are a bit legacy and should be reworked in tandem with project boards to get the desired workflow.
    An example on a project board: issues moved from 'backlog' to 'in progress' automatically get a 'triage/inprogress' label (or something similar). Label automation + project boards + search queries should all integrate seamlessly and complement each other in the final iteration of the new workflow.

  5. Release team specific: Based on all of the above, incoming 'milestoned' work belongs to SIGs, and it should be a SIG's responsibility to control and estimate what can be done in each release cycle, with the release team stepping in only when needed (as the release approaches). Standard calendar checkpoints for release readiness will help further - this is what the 'Enhancements Deadline' stands for, but it doesn't cover work outside of new features, and the fate of that work is usually left for the release team to ponder.

  6. Therefore, a prototype flowchart is: New Ticket -> SIG -> Labeling or Deletion <-> Project Boards <-> Re-labeling based on current status <-> Release Team is able to view status at any time via project boards

  7. For all above, mass rework of labels is needed.
    'priority' labels are a subject of discussion in every release cycle as it's a fuzzy concept in itself, should be reworked with ideas such as 'impact' and 'importance' in mind,
    'triage' labels are a bit old and currently mostly unused but can be very helpful if properly reworked and integrated into a standard system,
    'kind' labels can be further reworked as there are many issues that do not belong in any current 'kind' (cc @BenTheElder)
    deletion of unwanted labels or re-work into other ones,
    addition of new labels like 'needs-sig-triage', 'release-blocking', 'wontfix' etc.
    related initial issue for 'triage' labels: Nuke 'triage' labels / replace them with 'lifecycle' or other  #3455

  8. and with that all, rework of the old document located in https://github.com/kubernetes/community/blob/master/contributors/guide/issue-triage.md and possibly updating many others

Other generic improvements include:

/sig release pm contributor-experience

tl;dr make ticket management easier for everyone

@k8s-ci-robot k8s-ci-robot added sig/release Categorizes an issue or PR as relevant to SIG Release. sig/pm sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience. labels Mar 18, 2019
@nikopen
Contributor Author

nikopen commented Mar 18, 2019

@kubernetes/sig-release @kubernetes/sig-contributor-experience-feature-requests
@thockin @guineveresaenger @nikhita @idvoretskyi @justaugustus @BenTheElder @neolit123
@kubernetes/sig-testing @fejta @cjwagner @BenTheElder

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 18, 2019
@nikopen nikopen changed the title Umbrella ticket: Proposals for an updated project-wide triage workflow Umbrella ticket: Proposals for an updated k/k-wide triage workflow Mar 18, 2019
@nikopen
Contributor Author

nikopen commented Mar 18, 2019

Summarized the points for discussion in the 1.14 retro doc
https://docs.google.com/document/d/1he2axf3adOIk3gA3vxFAewejtE2tm3Wl1NA1p-ooXpo/edit#

@nikopen
Contributor Author

nikopen commented Mar 18, 2019

/assign

@justaugustus
Member

/assign

@nikhita
Member

nikhita commented Mar 20, 2019

/milestone May

@k8s-ci-robot k8s-ci-robot added this to the May milestone Mar 20, 2019
@nikopen nikopen changed the title Umbrella ticket: Proposals for an updated k/k-wide triage workflow [umbrella] k/k-wide triage workflow improvements Mar 30, 2019
@nikhita
Member

nikhita commented May 8, 2019

This is an umbrella issue, so moving out of the current milestone.

/milestone Next

@k8s-ci-robot k8s-ci-robot modified the milestones: May, Next May 8, 2019
@stevekuznetsov
Contributor

@fejta @spiffxp there seems to be enough work here on the Prow side to have this fulfill an Epic for us.

@justaugustus
Member

Here's the state machine as a chart, based on what I have in my head:

State Machine

| State | Description | Entry Criteria | Bot Actions | Human Actions | Exit Criteria |
| --- | --- | --- | --- | --- | --- |
| Open | Default state when an issue is opened | N/A | needs/sig and needs/triage are applied | One or more sig/* labels are applied | Has sig/* label |
| Triage | SIG triages issue to determine if it needs more info, should be closed, or moved to the backlog | Has sig/* and needs/triage label | N/A | Needs info: send /needs info, Closed: send /close <reason>, Backlog: apply kind/* and priority/* | Has closed/* OR kind/* and priority/* |
| Closed/Complete | SIG has determined that issue was completed or cannot be completed | Has closed/* label | needs/triage, needs/info are removed | Can send /reopen to reopen the issue | N/A, complete state |
| Backlog | SIG has determined that issue is relevant and should be picked up by a SIG member | Has kind/* and priority/* label | needs/triage, needs/info are removed | Assign the issue - self: /lifecycle active, /assign (applies lifecycle/active), org member: /assign <org-member> | Has lifecycle/active label |
| In Progress | SIG member has begun work on the issue | Has lifecycle/active label | N/A | Work the issue, send /close [<reason>] | Has closed/* OR stale labels |
| Stale | Issue has been open for some interval without an update | Issue has been open 30 days without an update | lifecycle/{needs-attention,stale,rotten} is applied, lifecycle/active is removed | Active: send /lifecycle active, Close: send /close [<reason>], Freeze: send /lifecycle frozen | Has lifecycle/active, closed/*, or lifecycle/frozen |
| Frozen | Issue is a long-term priority for the SIG and should not be subject to stale labels | Has lifecycle/frozen label | lifecycle/{needs-attention,stale,rotten} is removed | Close: send /close [<reason>], Unfreeze: send /remove-lifecycle frozen | Has closed/* label OR lifecycle/frozen is removed |
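
To make the entry criteria above concrete, here is a minimal sketch of how a state could be inferred from an issue's labels. It is only an illustration, not Prow code: the function name `infer_state`, the `State` enum, and the precedence order of the checks are assumptions, and the time-based Stale criterion is approximated by the presence of the stale labels the bot would apply.

```python
from enum import Enum


class State(Enum):
    OPEN = "Open"
    TRIAGE = "Triage"
    CLOSED = "Closed/Complete"
    BACKLOG = "Backlog"
    IN_PROGRESS = "In Progress"
    STALE = "Stale"
    FROZEN = "Frozen"


# Labels the chart says the bot applies when an issue goes stale.
STALE_LABELS = {"lifecycle/needs-attention", "lifecycle/stale", "lifecycle/rotten"}


def has_prefix(labels, prefix):
    """True if any label starts with the given prefix, e.g. 'sig/'."""
    return any(label.startswith(prefix) for label in labels)


def infer_state(labels):
    """Map a set of labels to a state, following the Entry Criteria column.

    The order of the checks is an assumption: later lifecycle stages win
    over earlier ones when several criteria match at once.
    """
    labels = set(labels)
    if has_prefix(labels, "closed/"):
        return State.CLOSED
    if "lifecycle/frozen" in labels:
        return State.FROZEN
    if labels & STALE_LABELS:
        return State.STALE
    if "lifecycle/active" in labels:
        return State.IN_PROGRESS
    if has_prefix(labels, "kind/") and has_prefix(labels, "priority/"):
        return State.BACKLOG
    if has_prefix(labels, "sig/") and "needs/triage" in labels:
        return State.TRIAGE
    return State.OPEN
```

For example, `infer_state({"sig/testing", "needs/triage"})` yields `State.TRIAGE`, and adding `kind/bug` plus `priority/important-soon` moves the same issue to `State.BACKLOG`.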

Labels

Needs

  • needs/sig
  • needs/triage
  • needs/more-info

Closed

  • closed/complete
  • closed/support
  • closed/duplicate|dupe
  • closed/not-reproducible|no-repro
  • closed/unresolved

Lifecycle

  • lifecycle/active
  • lifecycle/needs-attention
  • lifecycle/stale
  • lifecycle/rotten
  • lifecycle/frozen

Priority

  • priority/critical-urgent
  • priority/important-soon
  • priority/important-longterm

Actions

  • Rename needs-* labels to needs/ and allow for /needs commands
  • Rename triage/needs-information to needs/[more-]info and the remaining triage/* to closed/*
  • Deprecate unused priority/* labels
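
As a rough illustration of the first two rename actions, the mapping could look like the sketch below. The function name `rename_label` is hypothetical, `needs/more-info` is picked from the label list above (the action item also allows `needs/info`), and the example `triage/*` names in the usage note are placeholders rather than an inventory of existing labels.

```python
def rename_label(label):
    """Translate an old-style label into the proposed naming scheme.

    Rules, taken from the Actions list:
      * triage/needs-information -> needs/more-info
      * any other triage/*       -> closed/*
      * needs-*                  -> needs/*
    Everything else is returned unchanged.
    """
    if label == "triage/needs-information":
        return "needs/more-info"
    if label.startswith("triage/"):
        return "closed/" + label[len("triage/"):]
    if label.startswith("needs-"):
        return "needs/" + label[len("needs-"):]
    return label
```

With these rules, `rename_label("needs-sig")` gives `needs/sig`, a placeholder such as `triage/duplicate` becomes `closed/duplicate`, and `kind/bug` is left alone.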

/sig testing

@k8s-ci-robot k8s-ci-robot added the sig/testing Categorizes an issue or PR as relevant to SIG Testing. label May 26, 2019
@justaugustus
Member

Thanks so much for putting this together, @nikopen!
Allow me to comment on a few of these items...

  1. All issues hitting K/K are auto-labeled as 'needs-sig-triage' or something similar.
    addressed here: kubernetes/test-infra#11818

Agreed. This is a great first step with the immediate impact of being able to search by a single label, instead of an aggregate of them.

I'm in favor of needs-triage or needs/triage.

  1. SIGs are tasked by definition to regularly search all issues and appropriately label them / categorize them. This is made much easier by implementing point 1.

Are SIGs indeed tasked with this by definition or is it an undocumented expectation?

  1. Each SIG has a dedicated project/Kanban board, where current, upcoming, and milestoned work is visible at a quick glance - columns like Backlog, In Progress, Release-Blocking, etc. cc @parispittman @idvoretskyi on boards but for broader project usage

    Case in point: https://github.com/orgs/kubernetes/projects/8 , the SIG-Windows board has worked great, both for them as a SIG and sig-release / release issue triage.

Agreed that this would be beneficial at the SIG level, but the Release Team would still have to run through multiple boards to get an idea of what's happening. Perhaps a dashboard would be more useful?

  1. After a SIG reviews the new ticket (issue), it gets an appropriate category - either via direct labels or via Project Board automated labels. thockin suggested the use of triage labels, which are a bit legacy and should be reworked in tandem with project boards to get the desired workflow.
    An example on a project board: issues moved from 'backlog' to 'in progress' automatically get a 'triage/inprogress' label (or something similar). Label automation + project boards + search queries should all integrate seamlessly and complement each other in the final iteration of the new workflow.

A few things here...

  • We shouldn't encourage direct application of labels, as not everyone has access to direct apply.
  • AFAIK, project boards don't support automated labels.
  • The triage labels I've seen seem to be more accurately described as post-triage labels. They really only seem useful in the case that an issue is closed and we want to grep the reason for that after the fact, hence the suggestion above to rename them to closed/*. Issues assigned and in progress could instead be searched via lifecycle/active. Any other states seem to be covered by the state chart above.

  1. Release team specific: Based on all of the above, incoming 'milestoned' work belongs to SIGs, and it should be a SIG's responsibility to control and estimate what can be done in each release cycle, with the release team stepping in only when needed (as the release approaches). Standard calendar checkpoints for release readiness will help further - this is what the 'Enhancements Deadline' stands for, but it doesn't cover work outside of new features, and the fate of that work is usually left for the release team to ponder.

What do you think we can do to improve this, without too much friction?

  1. Therefore, a prototype flowchart is: New Ticket -> SIG -> Labeling or Deletion <-> Project Boards <-> Re-labeling based on current status <-> Release Team is able to view status at any time via project boards

What are we trying to glean here? Completeness of the task? Last updated time?
Again, I think a dashboard would ultimately be more useful to the Release Team here.

  1. For all above, mass rework of labels is needed.
    'priority' labels are a subject of discussion in every release cycle as it's a fuzzy concept in itself, should be reworked with ideas such as 'impact' and 'importance' in mind,
    'triage' labels are a bit old and currently mostly unused but can be very helpful if properly reworked and integrated into a standard system,
    'kind' labels can be further reworked as there are many issues that do not belong in any current 'kind' (cc @BenTheElder)
    deletion of unwanted labels or re-work into other ones,
    addition of new labels like 'needs-sig-triage', 'release-blocking', 'wontfix' etc.
    related initial issue for 'triage' labels: Nuke 'triage' labels / replace them with 'lifecycle' or other  #3455

Agreed on some of the rework (see above), but I think we should punt on doing anything with the kind/*, priority/* labels in the near term. I only say that because these labels lead to some bikeshedding and I don't think refactoring them is strictly necessary to move this forward.

  1. and with that all, rework of the old document located in https://github.com/kubernetes/community/blob/master/contributors/guide/issue-triage.md and possibly updating many others

+1.

Other generic improvements include:

  • Mechanism that auto-applies milestone in PRs that are merged out of code freeze, so the full list of PRs included in 1.14 is easily grepped
    (issue is here kubernetes/test-infra#11611)

+1.

Let's land the standard and then reassess adding other label types.

  • Labels that indicate whether a ticket is release-blocking or good-to-have, e.g. (kind/release-blocking | kind/good-to-have)

release-blocking would probably be a priority; good-to-have I'm not sure about. Same opinion around punting this until we land the workflow.

  • Label + mechanism that automatically shifts a Ticket to the next milestone
    a few days after Freeze hits - this automates punting of 'good-to-have' stuff to the next milestone

+1.
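
The punting rule in the quoted item could be sketched roughly as follows. Everything here is an assumption made for illustration: the `kind/good-to-have` label is only a proposal in this thread, "a few days" is modelled as a `grace_days` parameter defaulting to 3, and milestones are assumed to look like `v1.14`.

```python
import re


def next_milestone(milestone):
    """Return the following minor-release milestone, e.g. v1.14 -> v1.15."""
    match = re.fullmatch(r"v(\d+)\.(\d+)", milestone)
    if match is None:
        raise ValueError(f"unexpected milestone format: {milestone!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return f"v{major}.{minor + 1}"


def milestone_after_freeze(milestone, labels, days_since_freeze, grace_days=3):
    """Punt good-to-have work once the grace period after Freeze has passed.

    Issues without the (hypothetical) kind/good-to-have label keep their
    milestone, so release-blocking work is never moved automatically.
    """
    if "kind/good-to-have" in labels and days_since_freeze >= grace_days:
        return next_milestone(milestone)
    return milestone
```

Under these assumptions, a `kind/good-to-have` issue in `v1.14` moves to `v1.15` five days after Freeze, while the same issue one day after Freeze, or an issue without the label, stays where it is.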

  • Any ticket in the release-blocking column of a board automatically gets a kind/release-blocking label - this way, anyone can search github issues and PRs via label:kind/release-blocking+milestone:v1.14 query

I need to ponder how the board interaction would work, as this functionality doesn't exist natively.

I still owe a response for the enhancements tracking stuff. I'll add notes to that issue.
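
For reference, the `label:kind/release-blocking+milestone:v1.14` search from the last quoted item can be assembled programmatically. This is a small sketch: the helper names are made up, and `kind/release-blocking` is a label proposed in this thread rather than one known to exist.

```python
from urllib.parse import urlencode


def build_query(labels=(), milestone=None, state="open"):
    """Build a GitHub issue search string from labels and a milestone."""
    terms = [f"is:{state}"]
    terms += [f"label:{label}" for label in labels]
    if milestone:
        terms.append(f"milestone:{milestone}")
    return " ".join(terms)


def search_url(repo, query):
    """Return the issues-page URL for a repo with the query URL-encoded."""
    return f"https://github.com/{repo}/issues?" + urlencode({"q": query})


query = build_query(labels=["kind/release-blocking"], milestone="v1.14")
url = search_url("kubernetes/kubernetes", query)
```

Because the query is built from plain labels, the same helper would also cover a single-label triage search such as `build_query(labels=["needs-triage"])`.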

@justaugustus
Member

justaugustus commented Feb 15, 2020

Enhancement issue opened: kubernetes/enhancements#1553
Provisional Issue Triage KEP opened: kubernetes/enhancements#1554

@justaugustus
Member

/unassign @nikopen

@justaugustus
Member

/remove-sig pm
/area enhancements

@k8s-ci-robot k8s-ci-robot added area/enhancements Issues or PRs related to the Enhancements subproject and removed sig/pm labels Apr 16, 2020
@justaugustus
Member

Mislabeled:
/remove-area enhancements

@k8s-ci-robot k8s-ci-robot removed the area/enhancements Issues or PRs related to the Enhancements subproject label Apr 16, 2020
@markjacksonfishing
Contributor

/remove-lifecycle frozen

@k8s-ci-robot k8s-ci-robot removed the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label May 20, 2020
@thockin
Member

thockin commented May 20, 2020 via email

@justaugustus
Member

@thockin -- I had some Releng work to do with anago, but will be picking this up later in the week and next week.

The PR is already mostly complete here: kubernetes/test-infra#16298

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
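
The schedule described in this bot message (stale after 90d, rotten after a further 30d, closed after another 30d) can be summarised in a short sketch. It is a simplification and not the bot's actual implementation: the function name is hypothetical, inactivity is counted cumulatively from the last human activity, and issues carrying `lifecycle/frozen` are assumed to be skipped entirely.

```python
def lifecycle_action(days_inactive, labels=(), stale_after=90, rot_after=30, close_after=30):
    """Return the command due after the given inactivity, or None.

    The thresholds default to the intervals quoted in the bot message.
    """
    if "lifecycle/frozen" in labels:
        return None  # frozen issues are exempt from staleness handling
    if days_inactive >= stale_after + rot_after + close_after:
        return "/close"
    if days_inactive >= stale_after + rot_after:
        return "/lifecycle rotten"
    if days_inactive >= stale_after:
        return "/lifecycle stale"
    return None
```

With the defaults, an issue untouched for 90 days gets `/lifecycle stale`, at 120 days `/lifecycle rotten`, and at 150 days `/close`.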

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 18, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 17, 2020
@mrbobbytables mrbobbytables modified the milestones: v1.19, v1.20 Sep 30, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
