Request timeout on dual-stack(IPv4 unreachable) cluster after upgrade to 6.8.1 #5521

Closed
qingboooo opened this issue Oct 12, 2023 · 3 comments

@qingboooo
Contributor

qingboooo commented Oct 12, 2023

Operation: Upgrade from version 5.12.0 to 6.8.1
Environment: Dual-stack OpenShift cluster where only IPv6 is reachable
Error:

io.fabric8.kubernetes.client.KubernetesClientException: Operation: [list]  for kind: [Service]  with name: [null]  in namespace: [xxx]  failed.

Root cause:

  • The cluster is dual-stack, but its IPv4 address is unreachable
  • The DNS resolver in okhttp resolves the hostname to both IPv4 and IPv6 addresses, with IPv4 listed ahead of IPv6
  • okhttp tries the addresses in order, IPv4 first, and spends up to 10 seconds on the connectivity check for each IP family
  • Fabric8 sets the default request timeout to 10 seconds
  • As a result, requests fail because the connectivity check for the IPv4 address alone consumes the full 10 seconds
  • The request timeout should be at least 20 seconds to cover the connectivity checks for both IP families
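The arithmetic behind this root cause can be sketched as a toy model. This is not okhttp or Fabric8 code: the class `TimeoutBudget`, its nested `Candidate` type, and the method `connectTimeMillis` are hypothetical names, and the model assumes that an unreachable address consumes the whole connect timeout while a reachable one connects instantly.

```java
import java.util.List;

/** Toy model of sequential connection attempts under an overall request deadline. */
final class TimeoutBudget {

    /** One resolved address: a family label and whether a connection can succeed. */
    static final class Candidate {
        final String family;
        final boolean reachable;

        Candidate(String family, boolean reachable) {
            this.family = family;
            this.reachable = reachable;
        }
    }

    /**
     * Returns the elapsed milliseconds at which a connection is established,
     * or -1 if the request timeout fires first or no address is reachable.
     * Assumption: unreachable addresses hang for the full connect timeout,
     * reachable addresses connect instantly.
     */
    static long connectTimeMillis(List<Candidate> candidates, long connectTimeoutMs, long requestTimeoutMs) {
        long elapsed = 0;
        for (Candidate candidate : candidates) {
            if (elapsed >= requestTimeoutMs) {
                return -1; // the request deadline was reached before this attempt could start
            }
            if (candidate.reachable) {
                return elapsed;
            }
            elapsed += connectTimeoutMs; // the attempt hangs until the connect timeout
        }
        return -1;
    }

    public static void main(String[] args) {
        // IPv4 is resolved first but is unreachable; IPv6 is reachable.
        List<Candidate> resolved = List.of(new Candidate("IPv4", false), new Candidate("IPv6", true));
        System.out.println("10s request timeout: " + connectTimeMillis(resolved, 10_000, 10_000));
        System.out.println("20s request timeout: " + connectTimeMillis(resolved, 10_000, 20_000));
    }
}
```

Under these assumptions the 10-second request timeout yields -1 (the IPv6 attempt never starts), while the 20-second timeout yields 10000, that is, a connection over IPv6 after the IPv4 attempt has timed out.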

Workaround: Increase the request timeout to 20 seconds, or override the okhttp version to 5.0 alpha

Refer to:

Proposal: Would it be possible to increase the default request timeout to 20 seconds?

Fabric8 Kubernetes Client version

6.8.1

Steps to reproduce

  1. Find or set up a dual-stack cluster whose IPv4 address is unreachable
  2. Run the following test code snippet
String content = FileUtils.readFileToString(Paths.get("/home/qingboooo/.kube/config").toFile(), "UTF-8");
Config configFromFile = Config.fromKubeconfig(content);
// configFromFile.setRequestTimeout(20000);
try (KubernetesClient client = new KubernetesClientBuilder().withConfig(configFromFile).build()) {
    // Send any request to the cluster
    List<Service> services = client.services().inNamespace("xxx").list().getItems();
    Log.debug("Total: " + services.size());
}
  3. A timeout exception is thrown
  4. Increase the request timeout from 10s (default) to 20s by uncommenting the setRequestTimeout call (line 3 of the snippet); the exception goes away

Expected behavior

Resources are listed from the cluster successfully

Runtime

OpenShift

Kubernetes API Server version

1.23

Environment

Linux

Fabric8 Kubernetes Client Logs

io.fabric8.kubernetes.client.KubernetesClientException: Operation: [list]  for kind: [Service]  with name: [null]  in namespace: [xxx]  failed.
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:159)
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.list(BaseOperation.java:422)
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.list(BaseOperation.java:388)
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.list(BaseOperation.java:92)
    at com.qingboooo.demo.Demo.testKubectlApi(Demo.java:52)
Caused by: java.io.IOException: timeout
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:504)
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.list(BaseOperation.java:420)
    ... 30 more
Caused by: java.io.InterruptedIOException: timeout
    at okhttp3.internal.connection.RealCall.timeoutExit(RealCall.kt:398)
    at okhttp3.internal.connection.RealCall.callDone(RealCall.kt:360)
    at okhttp3.internal.connection.RealCall.noMoreExchanges$okhttp(RealCall.kt:325)
    at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:209)
    at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:517)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.IOException: Canceled
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:72)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
    ... 4 more

Additional context

No response

@qingboooo qingboooo changed the title from "Request timeout to dual-stack(IPv4 unreachable) cluster after upgrade to 6.8.1" to "Request timeout on dual-stack(IPv4 unreachable) cluster after upgrade to 6.8.1" Oct 13, 2023
@manusa
Member

manusa commented Oct 17, 2023

From your message I understand that incrementing your config's timeout is working.

What I'm not sure about is what you're actually requesting.

If you're requesting to change the default value to 20s, what would be the main reason for this besides your specific needs?
i.e. why can't you just override the default value, either the way you're doing it right now or by leveraging the kubernetes.request.timeout property or the KUBERNETES_REQUEST_TIMEOUT environment variable?
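
A minimal sketch of the system-property route, using only the JDK. The property name is the one quoted above; the class `RequestTimeoutOverride` and its `apply` method are hypothetical names. The sketch assumes the value is given in milliseconds (as with the `setRequestTimeout(20000)` call in the reproduction snippet) and that the property is set before the client configuration is created. Whether it is picked up may depend on how the `Config` is built, so treat this as something to verify rather than a guaranteed fix.

```java
import java.time.Duration;

/** Sets the request-timeout system property before any client is built. */
final class RequestTimeoutOverride {

    // Property name as quoted in this thread.
    static final String PROPERTY = "kubernetes.request.timeout";

    /**
     * Stores the timeout in milliseconds (assumed unit) and returns the
     * previous property value, or null if none was set.
     */
    static String apply(Duration timeout) {
        return System.setProperty(PROPERTY, Long.toString(timeout.toMillis()));
    }

    public static void main(String[] args) {
        apply(Duration.ofSeconds(20));
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY));
        // Create the Kubernetes client after this point so the property can take effect.
    }
}
```

The same value could presumably be supplied on the command line with `-Dkubernetes.request.timeout=20000`, or through the KUBERNETES_REQUEST_TIMEOUT environment variable, without touching application code.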

Other than that, it sounds like you might be hitting #2632 (Using KubernetesClient with IPv6 based Kubernetes Clusters).

In this case, maybe you want to switch to a different HttpClient implementation: Kubernetes Client for Java: How to set up the underlying HTTP client

@manusa manusa added the "Waiting on feedback" label (Issues that require feedback from User/Other community members) Oct 17, 2023
@qingboooo
Contributor Author

From your message I understand that incrementing your config's timeout is working.

Right, I can solve this issue by increasing kubernetes.request.timeout to 20 seconds.

What I'm not sure about is what you're actually requesting.

Whatever I request (Pod, Service, Ingress, and so on), the HttpClient (okhttp) has to establish a connection to the cluster, where an IPv4 address is present but unreachable and IPv6 is available. okhttp spends 10 seconds (its default timeout) checking the connectivity of the IPv4 address, so the request is aborted by Fabric8's request timeout rather than by okhttp, and okhttp never gets the chance to try IPv6. We therefore need at least 20 seconds to cover the total time okhttp needs.

The request timeout in Fabric8 is 10 seconds, whereas okhttp needs 10 seconds to check each IP family (20 seconds in total).

Other than that, it sounds like you might be hitting #2632 (Using KubernetesClient with IPv6 based Kubernetes Clusters).

I did try the approach from #2632 (Using KubernetesClient with IPv6 based Kubernetes Clusters), and it works when I upgrade okhttp to 5.0-alpha, thanks to its new Fast Fallback feature.
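
To illustrate why a fast-fallback strategy helps here, the following toy model compares sequential attempts with staggered ones. It is not okhttp code: the class `FastFallbackModel` and its methods are hypothetical, the 250 ms stagger is an assumed delay between attempts, and unreachable addresses are assumed to hang while reachable ones connect instantly.

```java
import java.util.List;

/** Toy comparison of sequential and staggered (fast-fallback style) connection attempts. */
final class FastFallbackModel {

    /** Sequential: the next attempt starts only after the previous one has timed out. */
    static long sequentialMs(List<Boolean> reachable, long connectTimeoutMs) {
        long elapsed = 0;
        for (boolean ok : reachable) {
            if (ok) {
                return elapsed;
            }
            elapsed += connectTimeoutMs;
        }
        return -1; // no address is reachable
    }

    /** Staggered: attempt i starts at i * staggerMs without waiting for earlier attempts to fail. */
    static long staggeredMs(List<Boolean> reachable, long staggerMs) {
        for (int i = 0; i < reachable.size(); i++) {
            if (reachable.get(i)) {
                return i * staggerMs;
            }
        }
        return -1; // no address is reachable
    }

    public static void main(String[] args) {
        // IPv4 first (unreachable), then IPv6 (reachable), as described in this issue.
        List<Boolean> addresses = List.of(false, true);
        System.out.println("sequential: " + sequentialMs(addresses, 10_000) + " ms");
        System.out.println("staggered:  " + staggeredMs(addresses, 250) + " ms");
    }
}
```

In this model the sequential strategy reaches IPv6 only after 10000 ms, which is already at the 10-second request timeout, whereas the staggered strategy reaches it after 250 ms and stays well within the default.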

In this case, maybe you want to switch to a different HttpClient implementation: Kubernetes Client for Java: How to set up the underlying HTTP client

Switching HttpClient type would be another solution but still needs additional configuration from the user side.

stale bot commented Jan 16, 2024

This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!

@stale stale bot added the status/stale label Jan 16, 2024
@stale stale bot closed this as completed Jan 26, 2024
@manusa manusa removed the "Waiting on feedback" label (Issues that require feedback from User/Other community members) Mar 4, 2024