Request timeout on dual-stack(IPv4 unreachable) cluster after upgrade to 6.8.1 #5521

qingboooo · 2023-10-12T13:25:41Z

Operation: Upgrade from version 5.12.0 to 6.8.1
Environment: Dual-stack OpenShift cluster where only IPv6 is reachable
Error:

io.fabric8.kubernetes.client.KubernetesClientException: Operation: [list]  for kind: [Service]  with name: [null]  in namespace: [xxx]  failed.

Root cause:

The cluster is dual-stack, but its IPv4 address is unreachable
The DNS resolver in okhttp resolves the hostname to its real IP addresses (v4 and v6), with v4 listed ahead of v6
okhttp spends 10 seconds on the connectivity check for each IP family, in order, IPv4 first
Fabric8 sets the default request timeout to 10 seconds
As a result, requests fail because the connectivity check for the IPv4 address uses up the entire 10 seconds
The request timeout should be set to at least 20 seconds to cover the connectivity checks for both IP families
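
The timing argument above can be sketched as a toy model. This is only an illustration under stated assumptions, not okhttp or Fabric8 code: the class `SequentialConnectModel`, its helper methods, and the 50 ms cost of a successful connect are hypothetical; the 10-second per-attempt and request timeouts come from the description above.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of sequential connection attempts under one overall request deadline. */
final class SequentialConnectModel {

    /** One resolved address: a label and whether a connect to it would succeed. */
    static final class Address {
        final String label;
        final boolean reachable;

        Address(String label, boolean reachable) {
            this.label = label;
            this.reachable = reachable;
        }
    }

    /** The situation from this issue: IPv4 is listed first but is unreachable. */
    static List<Address> dualStack() {
        return Arrays.asList(new Address("IPv4", false), new Address("IPv6", true));
    }

    /**
     * Returns the label of the address that ends up serving the request, or null
     * if the overall request deadline expires first. All durations are in milliseconds.
     */
    static String connect(List<Address> addresses, long connectTimeoutMs,
                          long successfulConnectMs, long requestTimeoutMs) {
        long elapsed = 0;
        for (Address address : addresses) {
            // An unreachable address burns the whole per-attempt connect timeout.
            elapsed += address.reachable ? successfulConnectMs : connectTimeoutMs;
            if (elapsed >= requestTimeoutMs) {
                return null; // the request-level timeout fires before this attempt finishes
            }
            if (address.reachable) {
                return address.label;
            }
        }
        return null; // every address failed
    }

    public static void main(String[] args) {
        // 10 s request timeout: the IPv4 attempt consumes the whole budget.
        System.out.println(connect(dualStack(), 10_000, 50, 10_000)); // prints "null"
        // 20 s request timeout: there is time left to fall back to IPv6.
        System.out.println(connect(dualStack(), 10_000, 50, 20_000)); // prints "IPv6"
    }
}
```

In this model any request timeout comfortably above the 10-second connect timeout lets the IPv6 attempt run; 20 seconds is the value that was verified on the real cluster.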

Workaround: Increase the request timeout to 20 seconds, or override the okhttp version to 5.0 alpha

Refer to:

Proposal: Would it be possible to increase the default request timeout to 20 seconds?

Fabric8 Kubernetes Client version

6.8.1

Steps to reproduce

Find or set up a dual-stack cluster whose IPv4 address is unreachable
Run the following test code snippet

String content = FileUtils.readFileToString(Paths.get("/home/qingboooo/.kube/config").toFile(), "UTF-8");
Config configFromFile = Config.fromKubeconfig(content);
// configFromFile.setRequestTimeout(20000);
try (KubernetesClient client = new KubernetesClientBuilder().withConfig(configFromFile).build()) {
    // Send any request to the cluster
    List<Service> services = client.services().inNamespace("xxx").list().getItems();
    Log.debug("Total: " + services.size());
}

A timeout exception is thrown
After increasing the request timeout from 10s (default) to 20s (see the commented-out line 3 of the snippet), the exception is gone

Expected behavior

Resources are listed successfully from the cluster

Runtime

OpenShift

Kubernetes API Server version

1.23

Environment

Linux

Fabric8 Kubernetes Client Logs

io.fabric8.kubernetes.client.KubernetesClientException: Operation: [list]  for kind: [Service]  with name: [null]  in namespace: [xxx]  failed.
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:159)
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.list(BaseOperation.java:422)
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.list(BaseOperation.java:388)
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.list(BaseOperation.java:92)
    at com.qingboooo.demo.Demo.testKubectlApi(Demo.java:52)
Caused by: java.io.IOException: timeout
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:504)
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.list(BaseOperation.java:420)
    ... 30 more
Caused by: java.io.InterruptedIOException: timeout
    at okhttp3.internal.connection.RealCall.timeoutExit(RealCall.kt:398)
    at okhttp3.internal.connection.RealCall.callDone(RealCall.kt:360)
    at okhttp3.internal.connection.RealCall.noMoreExchanges$okhttp(RealCall.kt:325)
    at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:209)
    at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:517)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.IOException: Canceled
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:72)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
    ... 4 more

Additional context

No response


manusa · 2023-10-17T04:25:48Z

From your message I understand that incrementing your config's timeout is working.

What I'm not sure is what you're actually requesting.

If you're requesting to change the default value to 20s, what would be the main reason for this besides your specific needs?
i.e. why can't you just override the default value, either the way you're doing it right now or by leveraging the kubernetes.request.timeout property or the KUBERNETES_REQUEST_TIMEOUT environment variable?
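
For illustration, the override could be supplied without code changes, for example by starting the JVM with -Dkubernetes.request.timeout=20000 or by exporting KUBERNETES_REQUEST_TIMEOUT=20000. The sketch below models how such a value might be resolved. It is not the Fabric8 implementation: the class `RequestTimeoutLookup` is hypothetical, and the precedence (system property before environment variable) and the millisecond unit are assumptions that should be checked against the client documentation.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical model of resolving the request timeout from configuration sources. */
final class RequestTimeoutLookup {
    static final String PROPERTY = "kubernetes.request.timeout";
    static final String ENV_VAR = "KUBERNETES_REQUEST_TIMEOUT";
    static final int DEFAULT_MS = 10_000; // default of 10 seconds, as reported in this issue

    /** Assumed order: system property first, then environment variable, then the default. */
    static int resolve(Map<String, String> systemProperties, Map<String, String> environment) {
        String raw = systemProperties.get(PROPERTY);
        if (raw == null || raw.trim().isEmpty()) {
            raw = environment.get(ENV_VAR);
        }
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_MS;
        }
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Map<String, String> none = Collections.emptyMap();
        Map<String, String> env = Collections.singletonMap(ENV_VAR, "20000");
        Map<String, String> prop = Collections.singletonMap(PROPERTY, "30000");
        System.out.println(resolve(none, none)); // prints 10000
        System.out.println(resolve(none, env));  // prints 20000
        System.out.println(resolve(prop, env));  // prints 30000
    }
}
```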

Other than that, it sounds like you might be hitting #2632 (Using KubernetesClient with IPv6 based Kubernetes Clusters).

In this case, maybe you want to switch to a different HttpClient implementation: Kubernetes Client for Java: How to set up the underlying HTTP client

qingboooo · 2023-10-17T08:49:17Z

> From your message I understand that incrementing your config's timeout is working.

Right, I can solve this issue by increasing kubernetes.request.timeout to 20 seconds.

> What I'm not sure is what you're actually requesting.

Whatever resource I request (e.g. Pod, Service, Ingress, and so on), the HttpClient (okhttp) has to establish a connection to the cluster, where the IPv4 address exists but is unreachable and only IPv6 is available. okhttp spends 10 seconds (its default timeout) checking the connectivity of the IPv4 address first. Because the Fabric8 request timeout is also 10 seconds, the request is aborted by Fabric8 rather than by okhttp, and okhttp never gets the time to try IPv6. So we need at least 20 seconds to cover the total time okhttp needs.

In short: the request timeout in Fabric8 is 10 seconds, while okhttp needs 10 seconds to check each IP family (20 seconds in total).

> Other than that, it sounds like you might be hitting #2632 (Using KubernetesClient with IPv6 based Kubernetes Clusters).

I did try out #2632 (Using KubernetesClient with IPv6 based Kubernetes Clusters), and it works when I upgrade okhttp to 5.0-alpha, thanks to its new Fast Fallback feature.
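
To show why a fast-fallback strategy helps here, the following toy model compares trying addresses strictly one after another with starting the next attempt after a short stagger. It is a simplified sketch, not okhttp's implementation: the class `FastFallbackModel`, the 250 ms stagger (the delay recommended by Happy Eyeballs, RFC 8305), and the 50 ms cost of a successful connect are assumptions made for illustration.

```java
/** Toy comparison of sequential connects versus staggered (fast-fallback style) connects. */
final class FastFallbackModel {

    /**
     * Time in milliseconds until the first successful connect when each address is
     * tried only after the previous attempt has timed out. Returns -1 if none succeeds.
     */
    static long sequentialMs(boolean[] reachable, long connectTimeoutMs, long successfulConnectMs) {
        long elapsed = 0;
        for (boolean ok : reachable) {
            if (ok) {
                return elapsed + successfulConnectMs;
            }
            elapsed += connectTimeoutMs; // wait for the full timeout before moving on
        }
        return -1;
    }

    /**
     * Time in milliseconds until the first successful connect when attempt i starts
     * i * staggerMs after the first one, without waiting for earlier attempts to fail.
     * Returns -1 if none succeeds.
     */
    static long staggeredMs(boolean[] reachable, long staggerMs, long successfulConnectMs) {
        for (int i = 0; i < reachable.length; i++) {
            if (reachable[i]) {
                return i * staggerMs + successfulConnectMs;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        boolean[] dualStack = {false, true}; // IPv4 unreachable, IPv6 reachable
        System.out.println(sequentialMs(dualStack, 10_000, 50)); // prints 10050
        System.out.println(staggeredMs(dualStack, 250, 50));     // prints 300
    }
}
```

In the model, the sequential strategy exceeds a 10-second request timeout, while the staggered one connects over IPv6 well within it.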

> In this case, maybe you want to switch to a different HttpClient implementation: Kubernetes Client for Java: How to set up the underlying HTTP client

Switching the HttpClient implementation would be another solution, but it still requires additional configuration on the user side.

stale · 2024-01-16T00:51:46Z

This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!

qingboooo changed the title ~~Request timeout to dual-stack(IPv4 unreachable) cluster after upgrade to 6.8.1~~ Request timeout on dual-stack(IPv4 unreachable) cluster after upgrade to 6.8.1 Oct 13, 2023

manusa added the Waiting on feedback Issues that require feedback from User/Other community members label Oct 17, 2023

stale bot added the status/stale label Jan 16, 2024

stale bot closed this as completed Jan 26, 2024

manusa removed the Waiting on feedback Issues that require feedback from User/Other community members label Mar 4, 2024
