When logging work item console log URI for failed work items, log machine name #8112

Closed
2 tasks
MattGal opened this issue Oct 27, 2021 · 6 comments

Comments

@MattGal
Member

MattGal commented Oct 27, 2021

  • This issue is blocking
  • This issue is causing unreasonable pain

The API returns the Machine Name already; we should just log it so users can identify "every failure was on one machine" type scenarios:

https://helix.dot.net/api/jobs/cb310a24-aa78-4571-bc50-c25d05d9bde5/workitems/System.Text.Encoding.Tests?api-version=2019-06-17
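To illustrate the "every failure was on one machine" check, here is a minimal Python sketch. It assumes the machine name for each failed work item has already been fetched (for example, from a work item details call like the one above). The `FailedWorkItem` record, its field names, and the sample work item and machine names are hypothetical and are not the Helix API's actual response schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FailedWorkItem:
    # Hypothetical record: the failed work item's name plus the machine it ran on.
    # Field names are assumptions for illustration, not the Helix API schema.
    name: str
    machine_name: str


def failures_by_machine(items: List[FailedWorkItem]) -> Counter:
    """Count failed work items per machine name."""
    return Counter(item.machine_name for item in items)


def single_machine_culprit(items: List[FailedWorkItem]) -> Optional[str]:
    """Return the machine name if every failure ran on that one machine, else None."""
    counts = failures_by_machine(items)
    if len(counts) == 1:
        return next(iter(counts))
    return None


if __name__ == "__main__":
    # Illustrative data only; these names are made up.
    failed = [
        FailedWorkItem("work-item-1", "machine-a"),
        FailedWorkItem("work-item-2", "machine-a"),
        FailedWorkItem("work-item-3", "machine-a"),
    ]
    print(failures_by_machine(failed))     # Counter({'machine-a': 3})
    print(single_machine_culprit(failed))  # machine-a
```

Even when failures are spread across machines, the per-machine counts can point at a single suspicious machine.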

@MattGal
Member Author

MattGal commented Oct 27, 2021

@ilyas1974 this would help customers diagnose bad machines and should be relatively quick to do: first get the info and add it in https://github.com/dotnet/arcade/blob/main/src/Microsoft.DotNet.Helix/Sdk/GetHelixWorkItems.cs#L95 , then plumb that metadata through. If you'd rather send it through normal triage instead of FR, just drop it out.

It IS possible this could end up getting throttled by the Helix API, so it may not be feasible to do at scale.

@ChadNedzlek
Member

Where did you want to log it to? It should already be in the console log (they all start with a little header line that has the job name, workitem name, and the machine name). The Kusto table should have it in the MachineName column for any workitem execution as well.

@MattGal
Member Author

MattGal commented Oct 27, 2021

Indeed. @danmoseley is asking us to log it to the AzDO log. I looked at where this happens in the Arcade SDK: it's a bit funky, and there's no existing work item details call to piggyback on, so this would mean adding one. That's easy to do, but the question is whether it would push us over some usage limit during heavy periods and cause 429s from the Helix API.

@ChadNedzlek
Member

I'd rather not fill the AzDO log with a bunch of text. Is there a reason they can't use the test results area? Error messages full of GUIDs, URLs, and other noise make the build logs hard for normies to understand, since it's all useless information to them. I think it's important that build errors be concise and actionable, and I'd be sad to move away from that if it could be avoided.

@MattGal
Member Author

MattGal commented Oct 27, 2021

After discussing with @ChadNedzlek offline, I am closing this as not worth doing:

  • The data is easily available through other means
  • This will significantly increase API call count
  • The extra call would be yet another place where something might start randomly failing

MattGal closed this as completed Oct 27, 2021
@danmoseley
Member

fair enough


3 participants