Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add common engsys breaking changes doc #5938

Merged
merged 2 commits into from
Jun 14, 2023
Merged

Conversation

benbp
Copy link
Member

@benbp benbp commented Apr 7, 2023

No description provided.

@benbp benbp requested a review from a team as a code owner April 7, 2023 18:34
@benbp benbp added the Central-EngSys This issue is owned by the Engineering System team. label Apr 7, 2023
@benbp benbp self-assigned this Apr 7, 2023
@benbp benbp requested a review from weshaggard April 7, 2023 18:34

```
# azure-sdk-for-java
git cherry-pick 2f9cdd9319f1867707b5872f12d97d5cdbbb077a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect you are being explicit to make this a simply copy and past but perhaps we just link to the tools PR and tell folks to find the sync merge. That would be more generic and isn't difficult to do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern there is if you have to stack breaking changes on top of each other for an older branch, it could get pretty annoying to open up multiple sync PRs. It doesn't take me more than a minute to get all the cherry pick commits when I make a breaking change, since I already have the sync PRs open. But I can also see if we don't want the doc to get huge that it makes more sense to use the generic instruction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very worried about it at this point but something we might consider if we end up with more of these.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't think it's user friendly enough and it's going to create more work for us helping people. For example, I'm currently adding another breaking change referencing #5608. This PR has so many comments that the actual sync PR links are auto-hidden by the github UI. Then the user has to go and make sure to find the merge commit from the PR, not the actual PR commits. I think it's pretty likely we're going to have to help people troubleshoot either finding the right PR or the right commit, so it's going to be less effort to put the right commits into the markdown from the beginning. That way people can just click a link and copy a single command and there's less room for error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@konrad-jamrozik konrad-jamrozik Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, I'm currently adding another breaking change referencing #5608.

@benbp What did I break? 😖

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@konrad-jamrozik actually the opposite, you fixed something just in time (updating the codeowners tool version to the one built on net 6) because net 5 seems to have been removed from our build agents.

Copy link
Member Author

@benbp benbp Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess the doc title is somewhat misleading. It's not actually listing "Breaking Changes" but rather "Changes that are required for things to continue working." Not sure how I should title that (maybe "Known Backwards Compatibility Issues and Fixes" ???)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I updated the description to hopefully clarify what is meant by "breaking changes" and I added collapsible sections (nice find @weshaggard!).

Copy link
Member

@weshaggard weshaggard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion but looks good otherwise.

@@ -0,0 +1,104 @@
This document contains a list of known breaking changes in the engineering system tools (mainly `eng/common` directories) and available workarounds. Typically this takes the form of infrastructure changes that are not backwards compatible with older versions of the engineering system configs/scripts/tools, mainly due to dependency changes in our test agent images.
Copy link
Contributor

@konrad-jamrozik konrad-jamrozik Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I immediately have a question: how these breaking changes are surfaced to me if I make them? It would help me here if you could perhaps link to some examples; if you link to builds, then of course it would be good to retain them indefinitely (I do it all the time).

If the answer is "it depends on the tool that broke" then perhaps each breaking change should have examples, or at least some elaboration how I can verify the breaking change occurred. Like, if you share an error message snippet, I still need to know where I have to look for it. In fact, you did this in at least one place below:

This error may occur in pipelines that call git sparse-checkout without the --no-cone flag set.

Still, would be nice to link to one known occurrence of it.

Also, I understand the idea is that ideally we would put this doc under some aka.ms url and link to it from relevant errors? In general, what is the process of sharing this doc with people?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to include the offending error messages so if people search for it they will find this doc. I don't intend that people would start from this doc and then inspect their branch, but rather that it can be a troubleshooting tool when one sees these errors, which ideally are very rare. If people have an old branch that they need to release from, then they should attempt to fully update our common directory via git checkout main eng/common. If for some reason that's not possible, then they can attempt to cherry pick any commits they are missing from this page.

Typing this out, I think that I should update the top of this doc to instruct people to first try git checkout main eng/common, but it also makes the diff for their hotfix branch a lot larger, which may not be desired.

I don't want to have to worry about build retention so I've been trying to include enough of the log message and a description to make it clear what the context is.

Also, I understand the idea is that ideally we would put this doc under some aka.ms url and link to it from relevant errors? In general, what is the process of sharing this doc with people?

Usually these errors come out of tooling/dependencies we don't control and we don't know which errors we'll see in advance. The original intent of this doc was to make it easier for us to help people who come to the channel with build breaks, since we can just link them to a section on the markdown page. In an ideal world, this doc would be as small as possible and used as little as possible. We want to keep our pipelines backwards compatible, but we've been a bit unlucky recently.

Copy link
Contributor

@konrad-jamrozik konrad-jamrozik Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read what you wrote, but I still have a little bit of hard time conceptualizing the workflow here. The doc header says:

This document contains a list of known breaking changes in the engineering system tools (mainly eng/common directories) and available workarounds. Typically this takes the form of infrastructure changes that are not backwards compatible with older versions of the engineering system configs/scripts/tools, mainly due to dependency changes in our test agent images.

Plus here you wrote:

So I guess the doc title is somewhat misleading. It's not actually listing "Breaking Changes" but rather "Changes that are required for things to continue working." Not sure how I should title that (maybe "Known Backwards Compatibility Issues and Fixes" ???)

Basically I have problems answers the following questions:

  1. when, exactly, this doc is applicable? (Scope)
  2. if it is applicable, how will I (as a member of EngSys team) remember that I need to consult this doc? (Discoverability)

I am hoping we have a better answer to 1. than "potentially to any tooling or process owned by EngSys" and better answer to 2. than "You just need to memorize it is there and recall it while troubleshooting anything we own".

Regarding 1., you also wrote:

The original intent of this doc was to make it easier for us to help people who come to the channel with build breaks, since we can just link them to a section on the markdown page. In an ideal world, this doc would be as small as possible and used as little as possible. We want to keep our pipelines backwards compatible, but we've been a bit unlucky recently.

Sooo, is it about some pipelines? Can we scope-down/specify-down this file to something like "Troubleshooting guide for EngSys CI/CD build failures" and say something like "Consult this file if one of the pipelines from the following set fails [description of the pipeline set]" ?

Regarding 2.: maybe we could somehow hook to all relevant pipelines with some kind of workflow/decorator like "If this pipeline fails, then as part of error message, output link to aka.ms/[thisFile]" ? Not sure how technically feasible this is. If we have docs explaining our CI/CD setup, then perhaps we could add links there too, like "Troubleshooting CI/CD issues".

Re this:

I've been trying to include the offending error messages so if people search for it they will find this doc

So the expectation here is they will do Azure-org-wide search using GitHub for the error message snippet? I think a lot of folks won't do that.

In any case, not a blocker of course, just some thoughts how to make sure this guidance gets discovered and applied when appropriate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to think of this as just-in-time documentation. As in we add things to it when we discover folks are hitting an issue that we know we have fixed in main and then we point folks to those instructions for the next person that might hit this. Trying to discover what may or may not break is not an exercise I think we need to be in which is why I like simply adding as needed.

As a member of EngSys we will simply have to know from experience or to search when we are seeing a particular issue.

Copy link
Contributor

@konrad-jamrozik konrad-jamrozik Apr 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: use our ADO wiki for such just-in-time docs. Less friction to incrementally improve.

Copy link
Contributor

@konrad-jamrozik konrad-jamrozik Apr 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to delete the corresponding breaking changes doc in the wiki to avoid duplication?
We could instead link from the breaking changes doc here to the wiki.

From what I understood docs in repo should be tightly coupled to the stuff they are documenting (like READMEs), and wiki is better suited for documenting cross-cutting concerns, so stuff like in this document.

My intuition about searching stuff is the opposite - folks generally understand that wiki is the place for docs to be, and first place to go to. Repository search is for next level, when one wants to dig into the code. Plus I am concerned about people knowing they need to search exactly this repository - how will they know? (in other words: which set is bigger, the set of SDK devs that don't know about the wiki, or set of SDK devs that don't know about the fact there are documents in this repo they need to look for). Or maybe there is some widespread culture of doing global search on Azure GitHub org that I am not aware of?

I also understood, from what Wes said, we don't really care about discoverability here, we just need to know about this doc. Then this removes main hypothetical downside of it being in a wiki.

But again - this is just me thinking aloud about the problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are fair points in favor of using the wiki for this doc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weshaggard thoughts on Konrad's response? You originally wanted this taken out of the wiki.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree discoverability is going to be an issue in either place and likely we will aways have to point people to it. However, I much prefer things live closer to where they have impact and if they can be in the public they should be there to potentially help the community. In this case these apply to eng/common changes so I'd prefer they live here, and it also potentially helps the community if they hit similar issues they can see the fixes we did. So I would still keep them in the MD file in the repo here and just remove the wiki entry or leave a link to the MD file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻


## Credscan - You must install or update .NET to run this application.

This error comes from the "CredScan running" step in the Compliance stage:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the Compliance stage

This is a bit vague to me because I am missing context. Compliance stage of what? Some build? Of which pipeline? An example would be usfeful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, in this one it's pretty consistent which azure pipeline step this comes from. I will link to the config and add a screenshot.

@benbp benbp merged commit 21bdea0 into Azure:main Jun 14, 2023
@benbp benbp deleted the benbp/breaking-changes branch June 14, 2023 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Central-EngSys This issue is owned by the Engineering System team.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants