Eureka Server and Client incompatible between Angel (1.0.x) and Brixton (1.1.x) #978

Closed
shanman190 opened this issue Apr 19, 2016 · 22 comments

Comments

@shanman190

shanman190 commented Apr 19, 2016

After a fair bit of investigation, I have found that the Eureka server and client components behave oddly when the two versions are mixed. This issue is mostly for others who run into the same problem on the upgrade path. If a solution can be found, that would be great, but if this is how things will behave going forward, there is some documentation missing.

So there are 4 scenarios in total: (Angel.SR6 and Brixton.RC2 are the versions being used below)

  1. Server (Brixton) and Client (Brixton)
  2. Server (Brixton) and Client (Angel)
  3. Server (Angel) and Client (Brixton)
  4. Server (Angel) and Client (Angel)

Numbers 1 and 4 work as expected. However, numbers 2 and 3 have some differing results.

With 2, the client registers, but continues to register time after time. I found that this is because the instanceId doesn't match what is stored in eureka server. With this setup, I don't have access to the hostname via the configuration properties, so I can't get the instanceId populated with the same value that eureka server contains without doing some code magic.

With 3, the client is able to register on the first lookup, but if the instance later can't be found in the Eureka server, it is never able to re-register and get back to the UP state. One thing to note here is that I had to make the eureka.instance.metadata.instanceId property match exactly the second half of the instanceId created dynamically by the org.springframework.cloud.commons.util.IdUtils class. In my test case the resulting value was ${spring.application.name}:${server.port}, and the instanceId field was ${spring.cloud.client.hostname}:${spring.application.name}:${server.port}. Digging further, I found that the com.netflix.discovery.shared.transport.jersey.AbstractJerseyEurekaHttpClient#sendHeartBeat method was updated to check whether the response has a body. Since the response is a 404 whose body doesn't match what the client expects to get back, it throws a mapping exception. This path is what ends up causing the failure to re-register.

@shanman190 shanman190 changed the title from "Eureka Server and Client incompatible between Angel (1.0.x) and Brixton (1.1.0)" to "Eureka Server and Client incompatible between Angel (1.0.x) and Brixton (1.1.x)" Apr 19, 2016
@dsyer dsyer added this to the 1.1.0 milestone Apr 26, 2016
@shanman190
Author

So did some more poking around after the release of Brixton and I've discovered a couple more things. Also now providing the configuration that I'm using for testing this error.

https://github.com/shanman190/eureka-error

There are two branches in the above project that demonstrate the middle two issues.

So the first new thing that I've found is that at least in my setup I have the following facts:

Ubuntu 16.04 LTS
Oracle Java 8u91
Network: DHCP

When running in configuration 2 (server Brixton, client Angel), the instanceId is configured the way Spencer recommends in pr-608 from the customers service example. What I notice here is that when the client calls the server, it queries for instanceId = localhost:eureka-client:8080, which doesn't exist in the server. Instead, the server has it stored under instanceId = eureka-client:8080. From what I've searched through in the Angel codebase, there isn't a way either to get the hostname onto the instanceId at query time on the Eureka server side so that the values match, or to remove the localhost prefix from the instanceId property on the client side. The result is that the client re-registers its lease at every renewal.
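
The mismatch described above can be modeled with a toy registry. This is a minimal sketch, not Eureka code: `ToyRegistry`, `register`, `renew`, and `countReRegistrations` are hypothetical names made up for illustration. It assumes only what is reported above, namely that the server stores the lease under eureka-client:8080 while the client renews with localhost:eureka-client:8080.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the ID mismatch; none of these names come from Eureka itself.
class ToyRegistry {
    private final Set<String> leases = new HashSet<>();

    // The server stores the lease under the ID it derives itself.
    void register(String serverSideId) {
        leases.add(serverSideId);
    }

    // A renewal succeeds only if the ID sent by the client matches a stored lease.
    boolean renew(String clientSideId) {
        return leases.contains(clientSideId);
    }
}

class InstanceIdMismatchDemo {
    // Counts how many heartbeats end in a re-registration.
    static int countReRegistrations(String storedId, String renewId, int heartbeats) {
        ToyRegistry registry = new ToyRegistry();
        registry.register(storedId);
        int reRegistrations = 0;
        for (int i = 0; i < heartbeats; i++) {
            if (!registry.renew(renewId)) {
                // Stands in for the "not found" on renew followed by a fresh register call.
                registry.register(storedId);
                reRegistrations++;
            }
        }
        return reRegistrations;
    }

    public static void main(String[] args) {
        // Mismatched IDs: every heartbeat falls through to a re-registration.
        System.out.println(countReRegistrations("eureka-client:8080", "localhost:eureka-client:8080", 3));
        // Matching IDs: renewals succeed and nothing is re-registered.
        System.out.println(countReRegistrations("eureka-client:8080", "eureka-client:8080", 3));
    }
}
```

Because re-registering stores the lease under the server-side ID again, the next renewal misses in exactly the same way, which is why the loop never settles.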

When running in configuration 3 (server Angel, client Brixton), we initially register on startup with an instanceId = eureka-client:8080. However, when it came time to renew in my setup, the lookup sent an instanceId value of IP:eureka-client:8080 while the server expected an instanceId value of localhost:eureka-client:8080. I can force the IP to resolve to the hostname by adding an entry for the DHCP address to my hosts file, but that only works until the IP is renewed. Once I do that, an additional issue comes up: in this configuration the client registers on startup just fine with a hostname of localhost and an instanceId value of eureka-client:8080, but that doesn't match the instanceId used when it goes to renew the lease, since that is now the IP version. Once the two are consistent, registration and renewal happen correctly. The last bit of trouble I ran into was that if the Eureka server is restarted at this point, the Brixton client won't be able to re-register without a restart of the application. I found this was because of the JSON response that Spring Boot returns on an error response.

In summary,
Config 2:

  • instanceId for the instance doesn't match what eureka server expects causing continuous re-registration

Config 3:

  • instanceId value changes during startup when using DHCP and doesn't match at renewal, and because of the JSON error response the client is not able to re-register.
  • because the client is unable to re-register it is also unable to re-register after eureka server has been restarted.

@rozhok

rozhok commented May 13, 2016

+1 for this, suffering from incompatibility between 1.0.3 server and 1.1.0 client.

@spencergibb spencergibb removed this from the 1.1.0 milestone May 13, 2016
@spencergibb
Member

spencergibb commented May 13, 2016

afe8f4d introduces a fix for 3) by setting the following:

eureka.instance.instanceId=${spring.cloud.client.hostname}:client1
eureka.instance.metadataMap.instanceId=${eureka.instance.instanceId}

client1 is arbitrary, could be port, could be an id from platform, random number etc...

Angel Server expects the value of eureka.instance.hostname to prefix instanceId. The angel client does it by default. @dsyer why did we do that? Was it for running on a single machine?

@shanman190
Author

shanman190 commented May 13, 2016

@spencergibb That part definitely looks good to fix the instanceId problem. Is there a way that you know of to easily disable the 404 json with the Angel eureka server? That would fix the second part of 3 preventing the instance from re-registering.

@spencergibb
Member

@shanman190 I'm not seeing a 404 for 3 with my fix.

@shanman190
Author

shanman190 commented May 14, 2016

@spencergibb The 404 will come from the Eureka server after the server application is restarted and the client sends its next heartbeat. In my example I'm running a single node, so there isn't replication. The problem originates from a change between the Eureka DiscoveryClient.HeartbeatThread (1.1.147) and AbstractJerseyEurekaHttpClient (1.4.6), which is where the heartbeat moved to: the old code looked for a 404 status code coming back from Eureka Server, and the new code looks for a response body with content.

DiscoveryClient.HeartbeatThread:
https://github.com/Netflix/eureka/blob/v1.1.147/eureka-client/src/main/java/com/netflix/discovery/DiscoveryClient.java#L1591

AbstractJerseyEurekaHttpClient:
https://github.com/Netflix/eureka/blob/v1.4.6/eureka-client/src/main/java/com/netflix/discovery/shared/transport/jersey/AbstractJerseyEurekaHttpClient.java#L104

@shanman190
Author

So I realize that my previous comment isn't very clear, so I'm going to add some more information in the hope of clearing it up.

Steps:

  1. Start server
  2. Start client and wait until registration is complete
  3. Restart server, client will no longer be registered, but the client thinks that it is
  4. Client will send heartbeat
  5. Server sends 404 with Spring Boot json response body (here's where things start to go wrong)
  6. AbstractJerseyEurekaHttpClient begins capturing the values of the response. Because the response has a JSON body, it also tries to deserialize that body into an InstanceInfo object, causing an exception to be thrown up the stack.

If there were a way to disable the JSON response body, the exception caused by the incorrect deserialization would be avoided. The Eureka response would then return to the renew method of the DiscoveryClient, which checks for the 404 response code and re-registers the client with the server.
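
The failure path in the steps above can be sketched with a toy client. This is an illustrative model, not the Eureka source: `ToyResponse`, `parseInstanceInfo`, `heartbeatBodyFirst`, and `heartbeatStatusFirst` are hypothetical names, and the JSON string is a made-up stand-in for a Spring Boot error payload. It assumes only what is described above: a client that deserializes any non-empty body fails on a 404 with an error body before the status code is ever checked.

```java
// Toy model of the heartbeat handling; all names here are hypothetical.
class ToyResponse {
    final int status;
    final String body; // null when the response has no entity

    ToyResponse(int status, String body) {
        this.status = status;
        this.body = body;
    }
}

class HeartbeatDemo {
    // Stand-in for deserializing the body into an InstanceInfo-like object.
    static String parseInstanceInfo(String body) {
        if (!body.contains("\"instance\"")) {
            throw new IllegalStateException("body is not an instance payload");
        }
        return body;
    }

    // Body-first handling: any body is parsed before the status is looked at.
    static String heartbeatBodyFirst(ToyResponse response) {
        if (response.body != null) {
            parseInstanceInfo(response.body); // throws on an error payload
        }
        return response.status == 404 ? "re-register" : "ok";
    }

    // Status-first handling: a 404 triggers re-registration regardless of the body.
    static String heartbeatStatusFirst(ToyResponse response) {
        if (response.status == 404) {
            return "re-register";
        }
        if (response.body != null) {
            parseInstanceInfo(response.body);
        }
        return "ok";
    }

    public static void main(String[] args) {
        ToyResponse notFound = new ToyResponse(404, "{\"status\":404,\"error\":\"Not Found\"}");
        System.out.println(heartbeatStatusFirst(notFound));
        try {
            heartbeatBodyFirst(notFound);
        } catch (IllegalStateException e) {
            // The re-register branch is never reached in the body-first variant.
            System.out.println("heartbeat failed: " + e.getMessage());
        }
    }
}
```

In this model a 404 without a body reaches the re-register branch in both variants, which matches the observation that dropping the JSON body from the server's 404 would let the client recover.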

@dsyer
Contributor

dsyer commented May 19, 2016

Thanks for all the analysis. Here's my summary:

  • We should give a low priority to scenario 3 (Brixton client, Angel server) because upgrading the server should always be easy.
  • Commit a0c1b4c should fix scenario 2 (Angel client, Brixton server).
  • The 404 problem in #1033 ("Eureka discovery client cannot handle 404 responses with a body") is actually a barrier for the Brixton client, irrespective of the server anyway, so we should fix that urgently as well (in the server if necessary). It might even fix scenario 3.

@dsyer
Contributor

dsyer commented May 19, 2016

Update: #1033 is only a problem with the Angel server, so it's not a high priority. As far as I can tell we have it all covered in the issue. There might still be some issues with the client thrashing its registration (see comments with log snippets in #1013), so I'm going to leave this open for a bit in case we can work out what that means.

@ouaibsky

ouaibsky commented May 19, 2016

@dsyer Which branch is commit a0c1b4c on?
This scenario (2) is very important because we cannot force all Eureka clients to cut a new release when we upgrade the Eureka server to Brixton.

dsyer added a commit that referenced this issue May 19, 2016
The Angel client has an idiosyncratic way of calculating an instance
id, and the Brixton client aligned with that already, but the
Brixton server did not. This change should make Brixton Eureka
Servers work with Angel clients.

See gh-978
@dsyer
Contributor

dsyer commented May 19, 2016

It's on master (now).

@shanman190
Author

I've tested the issue as well using the latest Brixton snapshot and can confirm that upgrading the eureka server to the Brixton snapshot allows both eureka clients running Angel and Brixton to properly register and re-register after the server has been restarted. This most definitely fixes scenario 2 and gives users an upgrade path from Angel to Brixton when utilizing Eureka service discovery.

@dsyer You're welcome. Glad to help and thank you and @spencergibb for all the work that went into correcting this issue!

@ouaibsky

Will this fix be part of release 1.1.1?
Thx
Christophe

@spencergibb
Member

@ouaibsky yes, @dsyer's fix for item 2 (a0c1b4c) would be part of 1.1.1 (Brixton.SR1). My fix for item 3 (#978 (comment)) would be part of 1.0.x and Angel.SR7.

@ouaibsky

Thx a lot
Christophe

@shanman190
Author

Is there any thought on when the Brixton.SR1 release is going to be?

@dsyer
Contributor

dsyer commented Jun 3, 2016

Probably next week (still waiting for a couple of bug fixes).

@ouaibsky

What is the process for pushing releases to Maven Central?
I can see 1.1.1 and 1.1.2 in spring repo: https://repo.spring.io/release/org/springframework/cloud/spring-cloud-netflix-eureka-server/,

but nothing in central: http://search.maven.org/#search|gav|1|g%3A%22org.springframework.cloud%22%20AND%20a%3A%22spring-cloud-netflix-eureka-server%22

Christophe

@dsyer
Contributor

dsyer commented Jun 13, 2016

The process isn't finished yet.

@nissel

nissel commented Jun 15, 2016

Brixton SR1 fixed the problem for me. Good job, thanks.

@dsyer dsyer closed this as completed Jun 15, 2016
@copa2

copa2 commented Jun 16, 2016

In my opinion this works only when the Angel client has no custom 'eureka.instance.metadataMap.instanceId' set.

In my case the Angel client has:

eureka:  
  instance:
    metadataMap:
      instanceId: ${spring.application.name}:${spring.application.instance_id:${random.value}}

In this case the client tries to register itself every 30 seconds:

2016-06-16 17:37:39.620  INFO 11044 --- [pool-2-thread-1] com.netflix.discovery.DiscoveryClient    : DiscoveryClient_XXX-EUREKA-CLIENT-ANGEL/U245496:xxx-eureka-client-angel:983d24591515138d72191e85eb0479a6 - Re-registering apps/XXX-EUREKA-CLIENT-ANGEL
2016-06-16 17:37:39.620  INFO 11044 --- [pool-2-thread-1] com.netflix.discovery.DiscoveryClient    : DiscoveryClient_XXX-EUREKA-CLIENT-ANGEL/U245496:xxx-eureka-client-angel:983d24591515138d72191e85eb0479a6: registering service...
2016-06-16 17:37:39.621  INFO 11044 --- [pool-2-thread-1] com.netflix.discovery.DiscoveryClient    : DiscoveryClient_XXX-EUREKA-CLIENT-ANGEL/U245496:xxx-eureka-client-angel:983d24591515138d72191e85eb0479a6 - registration status: 204
2016-06-16 17:38:09.661  INFO 11044 --- [pool-2-thread-1] com.netflix.discovery.DiscoveryClient    : DiscoveryClient_XXX-EUREKA-CLIENT-ANGEL/U245496:xxx-eureka-client-angel:983d24591515138d72191e85eb0479a6 - Re-registering apps/XXX-EUREKA-CLIENT-ANGEL
2016-06-16 17:38:09.661  INFO 11044 --- [pool-2-thread-1] com.netflix.discovery.DiscoveryClient    : DiscoveryClient_XXX-EUREKA-CLIENT-ANGEL/U245496:xxx-eureka-client-angel:983d24591515138d72191e85eb0479a6: registering service...
2016-06-16 17:38:09.662  INFO 11044 --- [pool-2-thread-1] com.netflix.discovery.DiscoveryClient    : DiscoveryClient_XXX-EUREKA-CLIENT-ANGEL/U245496:xxx-eureka-client-angel:983d24591515138d72191e85eb0479a6 - registration status: 204

Brixton.SR1 Eureka Server logs:

2016-06-16 17:37:39.620  WARN 11392 --- [nio-8761-exec-5] c.n.e.registry.AbstractInstanceRegistry  : DS: Registry: lease doesn't exist, registering resource: XXX-EUREKA-CLIENT-ANGEL - U245496:xxx-eureka-client-angel:983d24591515138d72191e85eb0479a6
2016-06-16 17:37:39.620  WARN 11392 --- [nio-8761-exec-5] c.n.eureka.resources.InstanceResource    : Not Found (Renew): XXX-EUREKA-CLIENT-ANGEL - U245496:xxx-eureka-client-angel:983d24591515138d72191e85eb0479a6
2016-06-16 17:37:39.621  INFO 11392 --- [nio-8761-exec-7] c.n.e.registry.AbstractInstanceRegistry  : Registered instance XXX-EUREKA-CLIENT-ANGEL/U245496 with status UP (replication=false)
2016-06-16 17:38:09.661  WARN 11392 --- [nio-8761-exec-6] c.n.e.registry.AbstractInstanceRegistry  : DS: Registry: lease doesn't exist, registering resource: XXX-EUREKA-CLIENT-ANGEL - U245496:xxx-eureka-client-angel:983d24591515138d72191e85eb0479a6
2016-06-16 17:38:09.661  WARN 11392 --- [nio-8761-exec-6] c.n.eureka.resources.InstanceResource    : Not Found (Renew): XXX-EUREKA-CLIENT-ANGEL - U245496:xxx-eureka-client-angel:983d24591515138d72191e85eb0479a6
2016-06-16 17:38:09.662  INFO 11392 --- [nio-8761-exec-9] c.n.e.registry.AbstractInstanceRegistry  : Registered instance XXX-EUREKA-CLIENT-ANGEL/U245496 with status UP (replication=false)

com.netflix.eureka.registry.AbstractInstanceRegistry.register will register under the hostname (instanceInfo.getId()) when no instanceId is set (it is never set in Angel clients).

@dsyer
Contributor

dsyer commented Jun 16, 2016

Thanks for the logs. Can you please open a new issue specifically about this scenario?

spencergibb added a commit that referenced this issue Jun 17, 2016
Updates so CloudJacksonJson extension is properly recognized by Eureka
Server as LegacyJacksonJson.

see gh-978
fixes gh-1111