x/build/env/windows/arm64: windows/arm64 builders are down #58604

Closed
qmuntal opened this issue Feb 20, 2023 · 16 comments
Labels
arch-arm64 Builders x/build issues (builders, bots, dashboards) NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Milestone
Unreleased

Comments

@qmuntal
Contributor

qmuntal commented Feb 20, 2023

windows/arm64 builders have been unresponsive since February 14.

I don't have more insights, but I'm filing this issue to track the status of the failure.

@cagedmantis @heschi @dmitshur

@qmuntal qmuntal added OS-Windows Builders x/build issues (builders, bots, dashboards) arch-arm64 labels Feb 20, 2023
@gopherbot gopherbot added this to the Unreleased milestone Feb 20, 2023
@thanm thanm added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 21, 2023
@cagedmantis
Contributor

I just logged on to one of the machines. I can confirm the issue is not related to insufficient drive space. I'm restarting both machines.

@cagedmantis cagedmantis self-assigned this Feb 21, 2023
@dmitshur dmitshur moved this to In Progress in Go Release Feb 21, 2023
@cagedmantis
Contributor

Both instances appear to be processing:

host-windows11-arm64-azure: 2/2

prod-arm-11-1 (13.92.137.79:1024) version 26, host-windows11-arm64-azure: connected 1m27.5s, working for 1m27.5s
prod-arm-11-2 (172.173.138.16:1025) version 26, host-windows11-arm64-azure: connected 7m17.8s, working for 7m17.8s

Going to close this out. Thank you for reporting it.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Go Release Feb 21, 2023
@qmuntal qmuntal reopened this Feb 14, 2024
@qmuntal
Contributor Author

qmuntal commented Feb 14, 2024

windows/arm64 builders are down again. Reopening.

@dmitshur
Contributor

dmitshur commented Feb 14, 2024

CC @thanm. This seems to primarily affect the old dashboard. The windows/arm64 builder in LUCI is up, though only one of its two machines is running; the other is quarantined and may also need to be restarted.

@thanm
Contributor

thanm commented Feb 14, 2024

I restarted the old dashboard builders, and farmer reports that they are up.

For the misbehaving LUCI builder, the machine looks OK (it is not out of disk space) but is reporting connection errors:

requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
)
10420 2024-02-08 09:50:16.710 E: Unable to open given url, https://chromium-swarm.appspot.com/swarming/api/v1/bot/poll, after 1 attempts or 240 timeout.
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
10420 2024-02-08 09:50:16.710 E: Swarming poll error: Failed to contact server
10420 2024-02-08 15:29:49.161 E: Unable to open given url, https://chromium-swarm.appspot.com/swarming/api/v1/bot/poll, after 1 attempts or 240 timeout.
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
10420 2024-02-08 15:29:49.161 E: Swarming poll error: Failed to contact server

I am going to restart it.

@thanm
Contributor

thanm commented Feb 14, 2024

Looks like both the new and old builders are up again. Closing this out; please reopen if things go south again...

@thanm thanm closed this as completed Feb 14, 2024
@qmuntal
Contributor Author

qmuntal commented Feb 15, 2024

Thanks for looking into this @thanm. LUCI builders are now failing with an infra error, e.g. https://ci.chromium.org/ui/p/golang/builders/try/gotip-windows-arm64/b8756084754023713041/overview, so I'm going to reopen.

@qmuntal qmuntal reopened this Feb 15, 2024
@thanm
Contributor

thanm commented Feb 15, 2024

I will take a look. From the swarming bot summary page for one of the builders it looks like the issue now is that things are timing out: https://chromium-swarm.appspot.com/bot?id=windows-arm64-azure--03 ... I'll poke at it and see what I can find out.

@thanm
Contributor

thanm commented Feb 15, 2024

Both VMs seem to be up and happy at the moment, and I verified that all of the antivirus protection is off (that's usually the source of timeouts). I'll take another look tonight to see if there are more problems.

@qmuntal
Contributor Author

qmuntal commented Feb 27, 2024

windows/arm64 LUCI builders have been up and running for a week. Seems like the issue is solved. Thanks @thanm.

@qmuntal qmuntal closed this as completed Feb 27, 2024
@qmuntal qmuntal reopened this Mar 7, 2024
@qmuntal
Contributor Author

qmuntal commented Mar 7, 2024

LUCI and legacy builders are down.

@thanm
Contributor

thanm commented Mar 7, 2024

I will take a look this morning, thanks.

@thanm
Contributor

thanm commented Mar 7, 2024

OK, the LUCI builders should be back up. I restarted the VMs, though it's not entirely clear why they were wedged.

Still looking at the coordinator-based VMs.

@thanm
Contributor

thanm commented Mar 7, 2024

Coordinator-based VMs are back now as well.

@qmuntal
Contributor Author

qmuntal commented Mar 7, 2024

Thanks! Is there anything that I can do to fix this issue next time it happens so I don't have to be bothering you every now and then?

@qmuntal qmuntal closed this as completed Mar 7, 2024
@thanm
Contributor

thanm commented Mar 7, 2024

> Thanks! Is there anything that I can do to fix this issue next time it happens so I don't have to be bothering you every now and then?

Please do continue to ping when you see problems. I think the main issues with builder stability at the moment are due to the fact that we (release team) are still figuring out how to get our more "boutique" builders working properly with LUCI (and learning about LUCI for that matter). Hopefully as I become more of a LUCI expert we'll be able to move towards more stable/reliable builders...

Projects
Archived in project
5 participants