🔥🔥🔥 "Libraries Test Run checked coreclr Linux" timing out on all PRs #45061
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
Tagging subscribers to this area: @ViktorHofer |
@safern can you please take a look at that one? |
Still happening in PRs, i.e. #45108. |
cc @aik-jahoda |
I was searching through Kusto and I don't even see a console link, so this may be infrastructure. All the cases I looked at are containers. |
@jkotas let me know about this and I'm catching up. |
Just catching up on this. I can help look at the data and follow up with FR if there is no thread already. |
It was here: |
Thanks @stephentoub. I'm taking over to drive closure on this one. |
I just looked at the data for some of the jobs linked here, and it looks like that queue was either clogged or had a hiccup. The workitems are running fine, taking less than 1 minute once they get a machine, but the average waiting time on the queue was 11 hours 😮 |
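For anyone who wants to repeat this kind of check, here is a minimal sketch of how queue wait can be separated from run time. It assumes each workitem record carries `queued`, `started`, and `finished` timestamps; those field names and the sample values are made up for illustration and are not the real Kusto or Helix schema.

```python
from datetime import datetime, timedelta
from statistics import mean


def summarize(work_items):
    """Return (average queue wait, average run time) as timedeltas.

    Each item is a dict with hypothetical keys 'queued', 'started'
    and 'finished', all datetime objects.
    """
    waits = [(w["started"] - w["queued"]).total_seconds() for w in work_items]
    runs = [(w["finished"] - w["started"]).total_seconds() for w in work_items]
    return timedelta(seconds=mean(waits)), timedelta(seconds=mean(runs))


# Made-up sample data: long waits in the queue, short runs once on a machine.
sample = [
    {
        "queued": datetime(2020, 1, 1, 0, 0, 0),
        "started": datetime(2020, 1, 1, 10, 0, 0),
        "finished": datetime(2020, 1, 1, 10, 0, 40),
    },
    {
        "queued": datetime(2020, 1, 1, 1, 0, 0),
        "started": datetime(2020, 1, 1, 13, 0, 0),
        "finished": datetime(2020, 1, 1, 13, 0, 50),
    },
]

avg_wait, avg_run = summarize(sample)
print(f"average wait: {avg_wait}, average run: {avg_run}")
```

A large gap between the two averages points at the queue (capacity or machines going offline) rather than at the tests themselves.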
The core-eng issues are: https://github.com/dotnet/core-eng/issues/11485 and https://github.com/dotnet/core-eng/issues/11468. @dotnet/dnceng says it is fixed. I'm going to leave this open to see if we get more instances of this before EOD, if not I will close it. |
Still happening: #45137 |
Ok, I was just about to close this because the data suggested it was no longer happening, but I found out that the jobs where it happened are not showing up in Kusto. So I looked into Swagger, and the example just posted shows all workitems as "waiting": https://helix.dot.net/api/jobs/48940d46-78a5-4bab-be97-a0f38db8c27a/details?api-version=2019-06-17 I pinged the FR thread and the issue at: https://github.com/dotnet/core-eng/issues/11468#issuecomment-732715076 Thanks for reporting the new instance! |
Update: the queue should be back at capacity. We're killing all jobs that started more than 2 hours ago to ease the queue. Current issue to investigate why machines are suddenly going offline is here: https://github.com/dotnet/core-eng/issues/11503 |
Still happening in #45079. I reran the timed-out test a couple of times without any luck. |
I haven't seen this anymore. I looked at the queue health and it is pretty healthy with average wait times of 15 mins since yesterday. Please re-open if you do see this happen again. |
Examples: #44688, #44945, ...