Naming local cache may be ignored in a rare scenario #12644

Open
nkorange opened this issue Sep 13, 2024 · 7 comments
Labels
kind/discussion Category issues related to discussion

Comments

@nkorange
Collaborator

Describe the bug

In a rare scenario, the naming local cache is ignored: even when the local cache is not empty, the user's invocation can still get an exception.

Expected behavior

If the local cache is not empty, the Nacos client should never throw an exception.

Actual behavior

In a rare case, the user's invocation gets an exception even though the local cache is not empty.

How to Reproduce

It is hard to reproduce, but it did happen in our production environment. Here is the related logic.

In the method getServiceInfoBySubscribe:

  1. First, it tries to get the service info from the local cache.
  2. Then it checks whether the local cache is null or the client is not subscribed. [image]
  3. If so, it tries to subscribe to the remote Nacos server.

Usually this works, because whenever clientProxy.isSubscribed(...) returns false, it means the Nacos client has just reconnected to the Nacos server and closed the old connection. [image]

Since the new connection is ready, the subscribe request succeeds.

But if the new connection goes down again immediately and the subscribe request fails, getServiceInfoBySubscribe throws an exception.
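The failure path described above can be reproduced in isolation with a minimal sketch. The `Proxy` interface, the `lookup` method, and the `Map`-based cache are hypothetical stand-ins, not the Nacos client classes; only the control flow follows the description.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CacheIgnoredSketch {

    /** Hypothetical stand-in for the remote proxy; not the real Nacos interface. */
    interface Proxy {
        boolean isSubscribed(String service);

        List<String> subscribe(String service) throws Exception;
    }

    /** Reads the cache first, but lets a failed re-subscribe propagate even if the cache had a value. */
    static List<String> lookup(Map<String, List<String>> cache, Proxy proxy, String service)
            throws Exception {
        List<String> info = cache.get(service);
        if (info == null || !proxy.isSubscribed(service)) {
            // Throws if the new connection has already dropped again.
            info = proxy.subscribe(service);
        }
        return info;
    }

    public static void main(String[] args) {
        Map<String, List<String>> cache = new HashMap<>();
        cache.put("test.1", List.of("10.0.0.1:8080"));

        // Simulates a reconnect that reset the subscription state, followed by another drop.
        Proxy flaky = new Proxy() {
            public boolean isSubscribed(String service) {
                return false;
            }

            public List<String> subscribe(String service) throws Exception {
                throw new Exception("connection lost");
            }
        };

        try {
            System.out.println("returned: " + lookup(cache, flaky, "test.1"));
        } catch (Exception e) {
            System.out.println("exception despite non-empty cache: " + e.getMessage());
        }
    }
}
```

In this sketch the cached list is available the whole time, yet the caller only sees the exception from the failed subscribe.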

Desktop (please complete the following information):

  • OS: [e.g. Centos]
  • Version: nacos-server 2.1.2, nacos-client 2.1.2
  • Module: naming
  • SDK: spring-cloud-alibaba-nacos 2021.1

Additional context

I suggest adding protection logic to the method getServiceInfoBySubscribe, so that whenever the local cache is not empty, a remote request error or any other exception is ignored.

@KomachiSion
Collaborator

KomachiSion commented Sep 18, 2024

When the client calls the subscribe API, it should throw an exception if the connection is not ready and the subscription fails. This notifies users that there is a connection problem and that they should retry or take other action.

When getAllInstances is called with subscribe=true, it is open to discussion whether it should throw an exception when the cache exists.

In my opinion, when subscribe=true and the connection is not ready, it should throw an exception to notify users, because users expect getAllInstances to get instances from the server, not only from the cache.

@KomachiSion KomachiSion added the kind/discussion Category issues related to discussion label Sep 18, 2024
@nkorange
Collaborator Author

nkorange commented Sep 18, 2024

In the current implementation, subscribe=true with the Nacos server disconnected does not throw an exception. The client throws only when subscribe=true, the reconnect succeeds, and the re-subscribe then fails.

@KomachiSion
Collaborator

> In the current implementation, subscribe=true with the Nacos server disconnected does not throw an exception. The client throws only when subscribe=true, the reconnect succeeds, and the re-subscribe then fails.

Why would subscribe=true with the Nacos server disconnected not throw an exception? I think it also throws one.

@nkorange
Collaborator Author

nkorange commented Sep 27, 2024

I have tested it.

  1. Start a local Nacos server.
  2. Register service test.1 to the local Nacos server.
  3. Run a consumer with the following code:

    public static void main(String[] args) throws Exception {
        NamingService namingService = NamingFactory.createNamingService("127.0.0.1:8848");
        while (true) {
            System.out.println(namingService.getAllInstances("test.1", true));
            Thread.sleep(5000L);
        }
    }

  4. Shut down the Nacos server.

The consumer can still get the instance list of test.1.

@nkorange
Collaborator Author

I suggest changing the method getServiceInfoBySubscribe to the following logic:

    private ServiceInfo getServiceInfoBySubscribe(String serviceName, String groupName, String clusterString,
            boolean subscribe) throws NacosException {
        ServiceInfo serviceInfo;
        if (subscribe || !EnvUtil.isDirectQueryEnabled()) {
            serviceInfo = serviceInfoHolder.getServiceInfo(serviceName, groupName, clusterString);
            if (null == serviceInfo || !clientProxy.isSubscribed(serviceName, groupName, clusterString)) {
                try {
                    serviceInfo = clientProxy.subscribe(serviceName, groupName, clusterString);
                } catch (NacosException ne) {
                    NAMING_LOGGER.warn("getServiceInfoBySubscribe subscribe failed, {}", serviceName);
                    // Throw only when the subscription failed and the local cache is also empty:
                    if (serviceInfo == null) {
                        throw ne;
                    }
                }
            }
        } else {
            serviceInfo = clientProxy.queryInstancesOfService(serviceName, groupName, clusterString, 0, false);
        }
        return serviceInfo;
    }

@KomachiSion
Collaborator

I see. It might be that after the first call to clientProxy.subscribe, clientProxy.isSubscribed returns true, so later calls in the loop do not enter the condition.

On the first call at startup, it must return false, so the condition is entered, clientProxy.subscribe is called, and an exception is thrown because of the connection loss.

I think your change is OK. Just reduce the method complexity, for example by extracting the logic into a new method.
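One possible shape for the suggested extraction, as a rough sketch only. The `Subscriber` interface and the `subscribeOrUseCache` helper are hypothetical names used for illustration; they are not part of the Nacos client, and a plain `List<String>` stands in for ServiceInfo.

```java
import java.util.List;

class SubscribeFallbackSketch {

    /** Hypothetical stand-in for the subscribe call; not the Nacos client API. */
    interface Subscriber {
        List<String> subscribe(String service) throws Exception;
    }

    /**
     * Extracted helper: try to subscribe, fall back to the cached value on failure,
     * and rethrow only when there is no cached value to fall back to.
     */
    static List<String> subscribeOrUseCache(List<String> cached, Subscriber subscriber, String service)
            throws Exception {
        try {
            return subscriber.subscribe(service);
        } catch (Exception e) {
            if (cached == null) {
                throw e;
            }
            return cached;
        }
    }

    public static void main(String[] args) throws Exception {
        Subscriber failing = service -> {
            throw new Exception("connection lost");
        };
        List<String> cached = List.of("10.0.0.1:8080");

        // The cache is non-empty, so the failure is swallowed and the cached list is returned.
        System.out.println(subscribeOrUseCache(cached, failing, "test.1"));
    }
}
```

With a helper like this, the calling method keeps a single branch for the "cache missing or not subscribed" case, and the fallback decision stays in one place.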

