[Tracking] Windows GHA failing with network issues often #3266

Closed
peternied opened this issue Aug 29, 2023 · 6 comments
Labels: flaky-test, triaged, windows

Comments

@peternied
Member

I've filed a ticket with GitHub about the failure rate associated with Windows runners and network issues. Using this to provide visibility to the investigation / resolution.

Additional context

@github-actions github-actions bot added the untriaged Require the attention of the repository maintainers and may need to be prioritized label Aug 29, 2023
@peternied peternied added windows flaky-test Flaky Test issue and removed untriaged Require the attention of the repository maintainers and may need to be prioritized labels Aug 29, 2023
@stephen-crawford
Contributor

[Triage] @peternied to add the associated link from the GitHub team investigating the issue. Marking as triaged since this is a tracking issue.

@stephen-crawford stephen-crawford added the triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable. label Sep 18, 2023
@peternied
Member Author

peternied commented Sep 18, 2023

Thanks @scrawfor99. I've got an internal ticket open with GitHub Support which I am tracking separately. Here is a brief update on how that is going:

(2 weeks ago) While we investigate this issue, we would like to understand whether you have faced this issue before or whether it started recently. If it started recently, can you please let us know the date from which you started facing it? The intention is to understand whether any recent changes in the Windows runner are causing this issue.

(2 weeks ago) I believe we'd see differences between Linux and Windows network stability historically, but we have not thought to capture and quantify them.

On August 15, a change to disable Gradle caching of builds was merged to our repository, and we started paying much closer attention to random failures.

(last week) We will definitely check internally on this. However, we would request you to also enable runner diagnostic logging which will give debug output of the run.

(Today) I've triggered a workflow [1] with debug logging enabled per [2]. I'll reply after it has failed so it can be investigated.

@peternied
Member Author

No failures on the first run [1], rerunning.

@peternied
Member Author

Another failure
https://github.com/opensearch-project/security/actions/runs/6312153767/job/17137632339?pr=3388

Error: Exception in thread "main" javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
	at java.base/sun.security.ssl.SSLSocketImpl.handleEOF(SSLSocketImpl.java:1701)
	at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1519)
	at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1421)
	at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:456)
	at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:427)
	at java.base/sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:572)
	at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:201)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1592)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520)
	at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:250)
	at org.gradle.wrapper.Download.downloadInternal(Download.java:129)
	at org.gradle.wrapper.Download.download(Download.java:109)
	at org.gradle.wrapper.Install.forceFetch(Install.java:171)
	at org.gradle.wrapper.Install.fetchDistribution(Install.java:104)
	at org.gradle.wrapper.Install.access$400(Install.java:46)
	at org.gradle.wrapper.Install$1.call(Install.java:81)
	at org.gradle.wrapper.Install$1.call(Install.java:68)
	at org.gradle.wrapper.ExclusiveFileAccessManager.access(ExclusiveFileAccessManager.java:69)
	at org.gradle.wrapper.Install.createDist(Install.java:68)
	at org.gradle.wrapper.WrapperExecutor.execute(WrapperExecutor.java:102)
	at org.gradle.wrapper.GradleWrapperMain.main(GradleWrapperMain.java:66)
	Suppressed: java.net.SocketException: Software caused connection abort: socket write error
		at java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
		at java.base/java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:110)
		at java.base/java.net.SocketOutputStream.write(SocketOutputStream.java:150)
		at java.base/sun.security.ssl.SSLSocketOutputRecord.encodeAlert(SSLSocketOutputRecord.java:81)
		at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:396)
		at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:303)
		at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:468)
		... 17 more
Caused by: java.io.EOFException: SSL peer shut down incorrectly
	at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:489)
	at java.base/sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:478)
	at java.base/sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:160)
	at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:111)
	at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1511)
	... 19 more
Error: Gradle build failed: see console output for details
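
The trace above shows the Gradle wrapper's distribution download failing with an SSLHandshakeException, which is a subclass of IOException. As a rough illustration of one way a caller could tolerate this kind of transient failure, here is a minimal sketch of retrying an operation with exponential backoff. It is not something proposed in this thread and not part of Gradle or GitHub Actions: the class `RetryingDownload`, the method `withRetries`, and its parameters are hypothetical names, and the `main` method only simulates the failure rather than contacting any server.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import javax.net.ssl.SSLHandshakeException;

// Hypothetical helper for illustration only; not part of Gradle or GitHub Actions.
final class RetryingDownload {

    private RetryingDownload() {}

    /**
     * Runs the operation, retrying when it throws an IOException
     * (which includes SSLHandshakeException and SocketException).
     * The wait between attempts doubles each time, starting at initialDelayMs.
     * Any other exception is propagated immediately without a retry.
     */
    static <T> T withRetries(Callable<T> operation, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMs = initialDelayMs;
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (IOException e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs); // back off before the next attempt
                    delayMs *= 2;
                }
            }
        }
        // All attempts failed with an IOException; surface the last one.
        throw lastFailure;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated flaky operation: fails twice, then succeeds.
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new SSLHandshakeException("Remote host terminated the handshake");
            }
            return "downloaded";
        }, 5, 100L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Retrying only on IOException keeps genuine build errors from being masked, but it would also hide a persistent network problem behind longer run times, so a small attempt limit seems prudent.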

@peternied
Member Author

Looks like there will be a modification to how GitHub runners operate - I'll update when this has completed.

We would like to update that engineering team is working towards fixing it. We will inform you as soon as the fix has been rolled out.

@peternied
Member Author

I've looked at our recent run history and I do not see any errors from the Windows runner associated with downloading dependencies via Gradle or wget due to network issues.

Note: there were a couple of errors associated with flaky tests, but we will have to continue to iterate on them.
