-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add common engsys breaking changes doc #5938
Conversation
|
||
``` | ||
# azure-sdk-for-java | ||
git cherry-pick 2f9cdd9319f1867707b5872f12d97d5cdbbb077a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect you are being explicit to make this a simply copy and past but perhaps we just link to the tools PR and tell folks to find the sync merge. That would be more generic and isn't difficult to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern there is if you have to stack breaking changes on top of each other for an older branch, it could get pretty annoying to open up multiple sync PRs. It doesn't take me more than a minute to get all the cherry pick commits when I make a breaking change, since I already have the sync PRs open. But I can also see if we don't want the doc to get huge that it makes more sense to use the generic instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very worried about it at this point but something we might consider if we end up with more of these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't think it's user friendly enough and it's going to create more work for us helping people. For example, I'm currently adding another breaking change referencing #5608. This PR has so many comments that the actual sync PR links are auto-hidden by the github UI. Then the user has to go and make sure to find the merge commit from the PR, not the actual PR commits. I think it's pretty likely we're going to have to help people troubleshoot either finding the right PR or the right commit, so it's going to be less effort to put the right commits into the markdown from the beginning. That way people can just click a link and copy a single command and there's less room for error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair. Perhaps we consider some collapsible sections to help reduce the clutter as this grows. https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/organizing-information-with-collapsed-sections#creating-a-collapsed-section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@konrad-jamrozik actually the opposite, you fixed something just in time (updating the codeowners tool version to the one built on net 6) because net 5 seems to have been removed from our build agents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I guess the doc title is somewhat misleading. It's not actually listing "Breaking Changes" but rather "Changes that are required for things to continue working." Not sure how I should title that (maybe "Known Backwards Compatibility Issues and Fixes" ???)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I updated the description to hopefully clarify what is meant by "breaking changes" and I added collapsible sections (nice find @weshaggard!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One suggestion but looks good otherwise.
@@ -0,0 +1,104 @@ | |||
This document contains a list of known breaking changes in the engineering system tools (mainly `eng/common` directories) and available workarounds. Typically this takes the form of infrastructure changes that are not backwards compatible with older versions of the engineering system configs/scripts/tools, mainly due to dependency changes in our test agent images. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I immediately have a question: how these breaking changes are surfaced to me if I make them? It would help me here if you could perhaps link to some examples; if you link to builds, then of course it would be good to retain them indefinitely (I do it all the time).
If the answer is "it depends on the tool that broke" then perhaps each breaking change should have examples, or at least some elaboration how I can verify the breaking change occurred. Like, if you share an error message snippet, I still need to know where I have to look for it. In fact, you did this in at least one place below:
This error may occur in pipelines that call
git sparse-checkout
without the--no-cone
flag set.
Still, would be nice to link to one known occurrence of it.
Also, I understand the idea is that ideally we would put this doc under some aka.ms url and link to it from relevant errors? In general, what is the process of sharing this doc with people?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been trying to include the offending error messages so if people search for it they will find this doc. I don't intend that people would start from this doc and then inspect their branch, but rather that it can be a troubleshooting tool when one sees these errors, which ideally are very rare. If people have an old branch that they need to release from, then they should attempt to fully update our common directory via git checkout main eng/common
. If for some reason that's not possible, then they can attempt to cherry pick any commits they are missing from this page.
Typing this out, I think that I should update the top of this doc to instruct people to first try git checkout main eng/common
, but it also makes the diff for their hotfix branch a lot larger, which may not be desired.
I don't want to have to worry about build retention so I've been trying to include enough of the log message and a description to make it clear what the context is.
Also, I understand the idea is that ideally we would put this doc under some aka.ms url and link to it from relevant errors? In general, what is the process of sharing this doc with people?
Usually these errors come out of tooling/dependencies we don't control and we don't know which errors we'll see in advance. The original intent of this doc was to make it easier for us to help people who come to the channel with build breaks, since we can just link them to a section on the markdown page. In an ideal world, this doc would be as small as possible and used as little as possible. We want to keep our pipelines backwards compatible, but we've been a bit unlucky recently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read what you wrote, but I still have a little bit of hard time conceptualizing the workflow here. The doc header says:
This document contains a list of known breaking changes in the engineering system tools (mainly
eng/common
directories) and available workarounds. Typically this takes the form of infrastructure changes that are not backwards compatible with older versions of the engineering system configs/scripts/tools, mainly due to dependency changes in our test agent images.
Plus here you wrote:
So I guess the doc title is somewhat misleading. It's not actually listing "Breaking Changes" but rather "Changes that are required for things to continue working." Not sure how I should title that (maybe "Known Backwards Compatibility Issues and Fixes" ???)
Basically I have problems answers the following questions:
- when, exactly, this doc is applicable? (Scope)
- if it is applicable, how will I (as a member of EngSys team) remember that I need to consult this doc? (Discoverability)
I am hoping we have a better answer to 1. than "potentially to any tooling or process owned by EngSys" and better answer to 2. than "You just need to memorize it is there and recall it while troubleshooting anything we own".
Regarding 1., you also wrote:
The original intent of this doc was to make it easier for us to help people who come to the channel with build breaks, since we can just link them to a section on the markdown page. In an ideal world, this doc would be as small as possible and used as little as possible. We want to keep our pipelines backwards compatible, but we've been a bit unlucky recently.
Sooo, is it about some pipelines? Can we scope-down/specify-down this file to something like "Troubleshooting guide for EngSys CI/CD build failures" and say something like "Consult this file if one of the pipelines from the following set fails [description of the pipeline set]" ?
Regarding 2.: maybe we could somehow hook to all relevant pipelines with some kind of workflow/decorator like "If this pipeline fails, then as part of error message, output link to aka.ms/[thisFile]" ? Not sure how technically feasible this is. If we have docs explaining our CI/CD setup, then perhaps we could add links there too, like "Troubleshooting CI/CD issues".
Re this:
I've been trying to include the offending error messages so if people search for it they will find this doc
So the expectation here is they will do Azure-org-wide search using GitHub for the error message snippet? I think a lot of folks won't do that.
In any case, not a blocker of course, just some thoughts how to make sure this guidance gets discovered and applied when appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like to think of this as just-in-time documentation. As in we add things to it when we discover folks are hitting an issue that we know we have fixed in main and then we point folks to those instructions for the next person that might hit this. Trying to discover what may or may not break is not an exercise I think we need to be in which is why I like simply adding as needed.
As a member of EngSys we will simply have to know from experience or to search when we are seeing a particular issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: use our ADO wiki for such just-in-time docs. Less friction to incrementally improve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we going to delete the corresponding breaking changes doc in the wiki to avoid duplication?
We could instead link from the breaking changes doc here to the wiki.
From what I understood docs in repo should be tightly coupled to the stuff they are documenting (like READMEs), and wiki is better suited for documenting cross-cutting concerns, so stuff like in this document.
My intuition about searching stuff is the opposite - folks generally understand that wiki is the place for docs to be, and first place to go to. Repository search is for next level, when one wants to dig into the code. Plus I am concerned about people knowing they need to search exactly this repository - how will they know? (in other words: which set is bigger, the set of SDK devs that don't know about the wiki, or set of SDK devs that don't know about the fact there are documents in this repo they need to look for). Or maybe there is some widespread culture of doing global search on Azure
GitHub org that I am not aware of?
I also understood, from what Wes said, we don't really care about discoverability here, we just need to know about this doc. Then this removes main hypothetical downside of it being in a wiki.
But again - this is just me thinking aloud about the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these are fair points in favor of using the wiki for this doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weshaggard thoughts on Konrad's response? You originally wanted this taken out of the wiki.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree discoverability is going to be an issue in either place and likely we will aways have to point people to it. However, I much prefer things live closer to where they have impact and if they can be in the public they should be there to potentially help the community. In this case these apply to eng/common changes so I'd prefer they live here, and it also potentially helps the community if they hit similar issues they can see the fixes we did. So I would still keep them in the MD file in the repo here and just remove the wiki entry or leave a link to the MD file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻
|
||
## Credscan - You must install or update .NET to run this application. | ||
|
||
This error comes from the "CredScan running" step in the Compliance stage: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the Compliance stage
This is a bit vague to me because I am missing context. Compliance stage of what? Some build? Of which pipeline? An example would be usfeful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, in this one it's pretty consistent which azure pipeline step this comes from. I will link to the config and add a screenshot.
No description provided.