Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement CODEOWNERS file changes validation (gating), with baselining #4859

Closed
konrad-jamrozik opened this issue Dec 2, 2022 · 6 comments
Closed
Assignees
Labels
Central-EngSys This issue is owned by the Engineering System team. CODEOWNERS Linter Anything related to the CODEOWNERS linter CODEOWNERS Parser Anything related to the CODEOWNERS parser EngSys This issue is impacting the engineering system.

Comments

@konrad-jamrozik
Copy link
Contributor

konrad-jamrozik commented Dec 2, 2022

Background

Our Azure SDK repositories leverage CODEOWNERS files (e.g. azure-sdk-for-python, azure-sdk-for-net) in two ways.

First, to determine pull request reviewers, via GitHub interpreter of code owners file, as documented on GitHub docs "About code owners".

Second, to apply various additional augmentations, like automatically label PRs, notify about new issues being filed, or notify about relevant build failures. This is done by our own custom logic, as documented in azure-sdk/docs/pipelines/opensource.md.

Problem statement

The problem on the functionality relying on code owners contents, is that we do not have code owners file contents validation, thus resulting in ill-defined files, resulting in problems like notifications silently failing to be sent.

The GitHub-based logic, responsible for adding reviewers to PRs, has a GitHub-provided validator that results in a warnings banner if the file validation fails, e.g.:

image

image

The main issue here is these are just warnings, while we would like to prevent merging any changes to code owners file unless they pass validation (i.e. to gate the changes). For example, we would like to prevent folks from adding groups to the file, if they don't have necessary write permissions, as pointed out by this comment. We might also want to restrict the teams to a pre-vetted list, as seen here (Microsoft-internal).

More importantly, our own logic does not have any validation in place, hence if the code owners file is invalid, our augmentations will just silently be ignored, with no easy way to debug the issue.

Proposed solution

To address the code owners file contents issues described above, we propose to introduce a new pipeline and PR check that will validate our azure-sdk-* repos CODEOWNERS files and report any issues, both for the features provided by GitHub-based interpreter, and our logic providing all the custom augmentations.

The pipeline output would result in a set of well-defined problems, that can be manually uploaded beside the CODEOWNERS file as a text file used for baselining. Once baseline is available, the PR check will prevent any further changes to the code owners file unless they pass all validations.

Once such validations are in place, we could start addressing the various issues and limitations we have, as already reported by GitHub validator (see links above), or reported by our users, e.g. #4487.

An example validation rule we could implement: flag all aliases mentioned in CODEOWNERS file for folks who are no longer Microsoft employees. Once we discover them, we probably could delete them right now as part of a cleanup effort, even before we establish the baselines.

Implementation considerations

We could handle a team assigned to given path in a code owners file by expanding it to its members in referentially transparent way. This is nontrivial because we have bunch of custom tooling ingesting as input the output of our own customer CODEOWNERS parser, like FabricBot (see Reference section below for details). Bottom line, any additions or changes to our code owners parser require careful analysis, to ensure all our downstream tooling can properly interpret the changes/additions to supported format.

One possible way to figure out how to expand GitHub teams into Microsoft aliases is through usage of CloudMine Kusto clusters. Patrick Hallisey is possible point of contact to help with this.

Related work

Linter by @heaths:
https://github.com/heaths/gh-codeowners

If you have access, see also email thread RE: Proposed API Stewardship Board process changes for PRs from 2023/04.

Reference

This doc (Microsoft-internal) (see also #4837) explains how build failure notifications via Azure DevOps notification system are implemented. The main executable source is configuration-creator, run via notficiations.yml.

Our other augmentations are provided by our other solutions, like FabricBot providing the PR labeling augmentation. This is explained in detail in #2770 (comment).

@konrad-jamrozik konrad-jamrozik added EngSys This issue is impacting the engineering system. Central-EngSys This issue is owned by the Engineering System team. labels Dec 2, 2022
@konrad-jamrozik konrad-jamrozik self-assigned this Dec 2, 2022
@kurtzeborn kurtzeborn moved this from 🤔Triage to 📋Backlog in Azure SDK EngSys 🚢🎉 Dec 5, 2022
@weshaggard
Copy link
Member

Brain dump of potential rules to validate

  • Validate format is correct and can be parsed
  • Validate owners are valid and are MS employees and in Azure org and can be mapped to AAD identities.
  • Validate owners have correct write access to repo
  • Validate teams are part of Azure org and have write access to repo. Probably want to scope to our azure-sdk-write-* teams (or children teams of those), instead of allowing any random team and tracking those permissions.
  • Validate that the PRLabels and ServiceLabels are valid and defined in the repo
  • Validate that the path actually matches some files in the repo
  • Might want to add some validation to make sure there is a placeholder path for every sdk/ path in the repo.
  • Validate that multiple path rules don't accidently override each other.

@konrad-jamrozik
Copy link
Contributor Author

konrad-jamrozik commented Jan 4, 2023

Note: As of 5/22/2023 this spec has been implemented few months ago. In case of inconsistencies the implementation takes precedence. Please see the (massive) battery of unit tests included.

Note: this spec has been updated as of 1/23/2023 3:00 PM PST.

Per my chat with @weshaggard today, we want to add following restrictions on CODEOWNERS file which we will validate against:

CODEOWNERS paths must be relative to repository root

  • Every path in CODEOWNERS must start with /.

CODEOWNERS paths must avoid ambiguous use of wildcards

This section was added on 1/23/2023.

We forbid usage of ** of any other way than within slashes, i.e. as /**/:

  • Usage as prefix **/ is not possible because all paths must start with /.
  • Usage as suffix /** is invalid because it is equivalent to /.
  • Usage of ** in file or dir names, like /foo**bar/ or /**foo**/ is forbidden, because it is equivalent to single star, like /foo*bar/ or /*foo*/ respectively.

In addition, usage of suffix /**/ is invalid because it is equivalent to /**/** which is equivalent to /.

CODEOWNERS paths must not have some characters that have regex interpretation

The characters [, ], !, and ? are forbidden in CODEOWNERS paths. They are valid parts of file paths, but we forbid them to simplify implementation logic.

CODEOWNERS paths must point to files and directories that exist

  • If a CODEOWNERS path ends with /, there must be at least one existing directory it matches.
  • If a CODEOWNERS path does not end with /, there must be at least one existing file it matches.
    • Note that we assume that GitHub CODEOWNERS interpreter matches the CODEOWNERS contents only against input paths to files, not directories. This is because CODEOWNERS is used to determine reviewers of changed files, and in Git only files can be changed, not directories.
    • Note that it is not possible to have both file and directory with the same path in Windows or Ubuntu.
  • As a result of the above rules, it is not possible to have CODEOWNERS entries both for /foo and /foo/.

All files and directories must be covered by CODEOWNERS paths

  • Every file and directory in the repository must match at least one path in the CODEOWNERS file.

Drift management

  • If a PR removes a file or directory, its path must be removed from CODEOWNERS. Otherwise, PR validation must fail.
    • Note that no path may need to be removed if the removed file or directory was matching against a CODEOWNERS path that has wildcards, and that CODEOWNERS path still matches against at least one other file or directory.
  • If a PR adds a file or directory, there must exist a CODEOWNERS path that matches it. Otherwise, PR validation must fail.

Casing

  • CODEOWNERS paths are to be case sensitive, per the spec.
    • Note that our current prefix-based CODEOWNERS parser is case-insensitive.

Consequences for the parser logic implementation

As of 1/23/2023 this section has been updated. Previously, we assumed that if CODEOWNERS paths are matched against target path /foo, it wasn't known if it is a path to a file or directory. Now we assume that target path is always a path to a file.

As a consequence of the rules above, we can simplify our CODEOWNERS parser logic by making additional assumption:

If CODEOWNERS file is searched for a match for a target path, we can always match it against the first rule that matches it, using the same logic as GitHub parser.

This assumption works because:

  • If the target path is foo and our CODEOWNERS file has path /foo, then foo must be a path to an existing file, due to our validation. It is thus a match;
  • If the target path is foo and our CODEOWNERS file has path /foo/, then foo must be a path to file that cannot exist, because our validation ensures that /foo/ is an existing directory. Thus no CODEOWNERS path will match foo.

In addition, we can assume in the parser that all CODEOWNERS paths are absolute to the repository root.

Required review of existing CODEOWNERS files

To ensure our existing language repositories CODEOWNER files match the rules outlined above, we must do the following:

  • Iterate over all files in given repo and ensure all of them match against at least one path in CODEOWNERS;
  • Iterate over all CODEOWNERS paths and ensure each of them matches to at least one existing entry in the file system, which should be directory if the path ends with /, and file otherwise.
    • Note the match has to be case-sensitive.
  • Convert all paths of form foo (without any slashes in the beginning or middle) to /foo (i.e. add slash at the beginning). This is because GitHub CODEOWNERS parser interprets foo as **/foo per .gitignore spec, per this example:
# In this example, @octocat owns any file in an apps directory
# anywhere in your repository.
apps/ @octocat

# In this example, @doctocat owns any file in the `/docs`
# directory in the root of your repository and any of its
# subdirectories.
/docs/ @doctocat
  • Ensure notification-creator properly parses the new CODEOWNERS files and generates relevant rules.

@weshaggard
Copy link
Member

Every file and directory in the repository must match at least one path in the CODEOWNERS file.

I think that is a good goal but I don't know how feasible it will be.

Otherwise I believe what you wrote sounds correct based on our conversations.

@konrad-jamrozik
Copy link
Contributor Author

konrad-jamrozik commented Feb 2, 2023

Idea from @jsquire:

Have a weekly report with paths without owners or invalid owners. It would help understand where are the gaps while avoiding noise by avoiding "catch-all" ownership assignment.

konrad-jamrozik pushed a commit to Azure/azure-sdk-for-go that referenced this issue Feb 10, 2023
As part of ongoing work of enabling wildcard support for `CODEOWNERS`:
- Azure/azure-sdk-tools#2770
- Azure/azure-sdk-tools#5088

and enabling stricter validation:
- Azure/azure-sdk-tools#4859

this PR:
- fixes invalid paths, to match rules explained [here](https://github.com/Azure/azure-sdk/blob/main/docs/policies/opensource.md#codeowners);
- removes `/**/tests.yml` and `/**/ci.yml`, to avoid all build failure notifications being routed to it once we enable the new regex-based, wildcard-supporting `CODEOWNERS` matcher, per: Azure/azure-sdk-tools#5088 (comment)

Once this PR is merged, I will enable the new `CODEOWNERS` matcher, similar to how it was done for `net` repo by these two PRs:
- Azure/azure-sdk-tools#5241
- Azure/azure-sdk-tools#5240

Related PRs:
- Similar PR fixing invalid paths, but for `net` repo: Azure/azure-sdk-for-net#33584
- Similar PR deprioritizing the Azure SDK EngSys team ownership, but for `python` repo: Azure/azure-sdk-for-python#28534
konrad-jamrozik added a commit to Azure/azure-sdk-for-c that referenced this issue Feb 10, 2023
As part of ongoing work of enabling wildcard support for `CODEOWNERS`:
- Azure/azure-sdk-tools#2770
- Azure/azure-sdk-tools#5088

and enabling stricter validation:
- Azure/azure-sdk-tools#4859

this PR:
- fixes invalid paths, to match rules explained [here](https://github.com/Azure/azure-sdk/blob/main/docs/policies/opensource.md#codeowners);
- removes `/**/tests.yml` and `/**/ci.yml`, to avoid all build failure notifications being routed to it once we enable the new regex-based, wildcard-supporting `CODEOWNERS` matcher, per: Azure/azure-sdk-tools#5088 (comment)

Once this PR is merged, I will enable the new `CODEOWNERS` matcher, similar to how it was done for `net` repo by these two PRs:
- Azure/azure-sdk-tools#5241
- Azure/azure-sdk-tools#5240
konrad-jamrozik pushed a commit to Azure/azure-sdk-for-cpp that referenced this issue Feb 10, 2023
As part of ongoing work of enabling wildcard support for `CODEOWNERS`:
- Azure/azure-sdk-tools#2770
- Azure/azure-sdk-tools#5088

and enabling stricter validation:
- Azure/azure-sdk-tools#4859

this PR:
- fixes invalid paths, to match rules explained [here](https://github.com/Azure/azure-sdk/blob/main/docs/policies/opensource.md#codeowners);
- removes `/**/tests.yml` and `/**/ci.yml`, to avoid all build failure notifications being routed to it once we enable the new regex-based, wildcard-supporting `CODEOWNERS` matcher, per: Azure/azure-sdk-tools#5088 (comment)

Once this PR is merged, I will enable the new `CODEOWNERS` matcher, similar to how it was done for `net` repo by these two PRs:
- Azure/azure-sdk-tools#5241
- Azure/azure-sdk-tools#5240
konrad-jamrozik pushed a commit to Azure/azure-sdk-for-ios that referenced this issue Feb 10, 2023
- Azure/azure-sdk-tools#5088

and enabling stricter validation:
- Azure/azure-sdk-tools#4859

this PR:
- fixes invalid paths, to match rules explained [here](https://github.com/Azure/azure-sdk/blob/main/docs/policies/opensource.md#codeowners);
- removes `/**/tests.yml` and `/**/ci.yml`, to avoid all build failure notifications being routed to it once we enable the new regex-based, wildcard-supporting `CODEOWNERS` matcher, per: Azure/azure-sdk-tools#5088 (comment)

Once this PR is merged, I will enable the new `CODEOWNERS` matcher, similar to how it was done for `net` repo by these two PRs:
- Azure/azure-sdk-tools#5241
- Azure/azure-sdk-tools#5240
konrad-jamrozik pushed a commit to Azure/azure-sdk-for-android that referenced this issue Feb 10, 2023
As part of ongoing work of enabling wildcard support for `CODEOWNERS`:
- Azure/azure-sdk-tools#2770
- Azure/azure-sdk-tools#5088

and enabling stricter validation:
- Azure/azure-sdk-tools#4859

this PR:
- fixes invalid paths, to match rules explained [here](https://github.com/Azure/azure-sdk/blob/main/docs/policies/opensource.md#codeowners);
- removes `/**/tests.yml` and `/**/ci.yml`, to avoid all build failure notifications being routed to it once we enable the new regex-based, wildcard-supporting `CODEOWNERS` matcher, per: Azure/azure-sdk-tools#5088 (comment)

Once this PR is merged, I will enable the new `CODEOWNERS` matcher, similar to how it was done for `net` repo by these two PRs:
- Azure/azure-sdk-tools#5241
- Azure/azure-sdk-tools#5240
konrad-jamrozik pushed a commit to Azure/azure-dev that referenced this issue Feb 10, 2023
As part of ongoing work of enabling wildcard support for `CODEOWNERS`:
- Azure/azure-sdk-tools#2770
- Azure/azure-sdk-tools#5088

and enabling stricter validation:
- Azure/azure-sdk-tools#4859

this PR:
- fixes invalid paths, to match rules explained [here](https://github.com/Azure/azure-sdk/blob/main/docs/policies/opensource.md#codeowners);
- removes `/**/tests.yml` and `/**/ci.yml`, to avoid all build failure notifications being routed to it once we enable the new regex-based, wildcard-supporting `CODEOWNERS` matcher, per: Azure/azure-sdk-tools#5088 (comment)

Once this PR is merged, I will enable the new `CODEOWNERS` matcher, similar to how it was done for `net` repo by these two PRs:
- Azure/azure-sdk-tools#5241
- Azure/azure-sdk-tools#5240
ellismg pushed a commit to Azure/azure-dev that referenced this issue Feb 10, 2023
As part of ongoing work of enabling wildcard support for `CODEOWNERS`:
- Azure/azure-sdk-tools#2770
- Azure/azure-sdk-tools#5088

and enabling stricter validation:
- Azure/azure-sdk-tools#4859

this PR:
- fixes invalid paths, to match rules explained [here](https://github.com/Azure/azure-sdk/blob/main/docs/policies/opensource.md#codeowners);
- removes `/**/tests.yml` and `/**/ci.yml`, to avoid all build failure notifications being routed to it once we enable the new regex-based, wildcard-supporting `CODEOWNERS` matcher, per: Azure/azure-sdk-tools#5088 (comment)

Once this PR is merged, I will enable the new `CODEOWNERS` matcher, similar to how it was done for `net` repo by these two PRs:
- Azure/azure-sdk-tools#5241
- Azure/azure-sdk-tools#5240
konrad-jamrozik added a commit to Azure/azure-sdk-for-c that referenced this issue Feb 13, 2023
As part of ongoing work of enabling wildcard support for `CODEOWNERS`:
- Azure/azure-sdk-tools#2770
- Azure/azure-sdk-tools#5088

and enabling stricter validation:
- Azure/azure-sdk-tools#4859

this PR:
- fixes invalid paths, to match rules explained [here](https://github.com/Azure/azure-sdk/blob/main/docs/policies/opensource.md#codeowners);
- removes `/**/tests.yml` and `/**/ci.yml`, to avoid all build failure notifications being routed to it once we enable the new regex-based, wildcard-supporting `CODEOWNERS` matcher, per: Azure/azure-sdk-tools#5088 (comment)

Once this PR is merged, I will enable the new `CODEOWNERS` matcher, similar to how it was done for `net` repo by these two PRs:
- Azure/azure-sdk-tools#5241
- Azure/azure-sdk-tools#5240
konrad-jamrozik pushed a commit to Azure/azure-sdk-for-ios that referenced this issue Feb 13, 2023
- Azure/azure-sdk-tools#5088

and enabling stricter validation:
- Azure/azure-sdk-tools#4859

this PR:
- fixes invalid paths, to match rules explained [here](https://github.com/Azure/azure-sdk/blob/main/docs/policies/opensource.md#codeowners);
- removes `/**/tests.yml` and `/**/ci.yml`, to avoid all build failure notifications being routed to it once we enable the new regex-based, wildcard-supporting `CODEOWNERS` matcher, per: Azure/azure-sdk-tools#5088 (comment)

Once this PR is merged, I will enable the new `CODEOWNERS` matcher, similar to how it was done for `net` repo by these two PRs:
- Azure/azure-sdk-tools#5241
- Azure/azure-sdk-tools#5240
konrad-jamrozik pushed a commit to Azure/azure-sdk-for-android that referenced this issue Feb 13, 2023
As part of ongoing work of enabling wildcard support for `CODEOWNERS`:
- Azure/azure-sdk-tools#2770
- Azure/azure-sdk-tools#5088

and enabling stricter validation:
- Azure/azure-sdk-tools#4859

this PR:
- fixes invalid paths, to match rules explained [here](https://github.com/Azure/azure-sdk/blob/main/docs/policies/opensource.md#codeowners);
- removes `/**/tests.yml` and `/**/ci.yml`, to avoid all build failure notifications being routed to it once we enable the new regex-based, wildcard-supporting `CODEOWNERS` matcher, per: Azure/azure-sdk-tools#5088 (comment)

Once this PR is merged, I will enable the new `CODEOWNERS` matcher, similar to how it was done for `net` repo by these two PRs:
- Azure/azure-sdk-tools#5241
- Azure/azure-sdk-tools#5240
konrad-jamrozik pushed a commit to Azure/azure-sdk-for-go that referenced this issue Feb 13, 2023
* Update `CODEOWNERS` paths: fix invalid paths

As part of ongoing work of enabling wildcard support for `CODEOWNERS`:
- Azure/azure-sdk-tools#2770
- Azure/azure-sdk-tools#5088

and enabling stricter validation:
- Azure/azure-sdk-tools#4859

this PR:
- fixes invalid paths, to match rules explained [here](https://github.com/Azure/azure-sdk/blob/main/docs/policies/opensource.md#codeowners);
- removes `/**/tests.yml` and `/**/ci.yml`, to avoid all build failure notifications being routed to it once we enable the new regex-based, wildcard-supporting `CODEOWNERS` matcher, per: Azure/azure-sdk-tools#5088 (comment)

Once this PR is merged, I will enable the new `CODEOWNERS` matcher, similar to how it was done for `net` repo by these two PRs:
- Azure/azure-sdk-tools#5241
- Azure/azure-sdk-tools#5240

Related PRs:
- Similar PR fixing invalid paths, but for `net` repo: Azure/azure-sdk-for-net#33584
- Similar PR deprioritizing the Azure SDK EngSys team ownership, but for `python` repo: Azure/azure-sdk-for-python#28534

* Update .github/CODEOWNERS

Co-authored-by: Ben Broderick Phillips <[email protected]>

---------

Co-authored-by: Ben Broderick Phillips <[email protected]>
@JimSuplizio JimSuplizio moved this from 📋 Backlog to 🐝 Dev in Azure SDK EngSys 🚢🎉 Jul 17, 2023
@JimSuplizio
Copy link
Member

The parsing and linting have been completely rewritten as part of the work being done for the new syntax. The updates include baselining. The PR is currently in review. Once that goes in the YML for the linter pipeline needs to be written.

@JimSuplizio JimSuplizio added CODEOWNERS Linter Anything related to the CODEOWNERS linter CODEOWNERS Parser Anything related to the CODEOWNERS parser labels Oct 19, 2023
@JimSuplizio
Copy link
Member

This is finished. All of the new metadata changes for the parser as well as the linter and what it validates are in the codeowners-utils README.md file. The metadata is fully explained. As of this issue being fixed, all of the repositories have their linter pipelines enabled.

@github-project-automation github-project-automation bot moved this from 🐝 Dev to 🎊 Closed in Azure SDK EngSys 🚢🎉 Dec 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Central-EngSys This issue is owned by the Engineering System team. CODEOWNERS Linter Anything related to the CODEOWNERS linter CODEOWNERS Parser Anything related to the CODEOWNERS parser EngSys This issue is impacting the engineering system.
Projects
Archived in project
Development

No branches or pull requests

3 participants