NullPointerException when multiple zones are used #16967

ghorkov · 2016-03-05T04:44:27Z

Elasticsearch version: 2.1.2
JVM version: 1.8.0_74
OS version:Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u3 (2016-01-17) x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
GCE discovery does not work when multiple zones are configured. I followed the steps here: https://www.elastic.co/guide/en/elasticsearch/plugins/current/cloud-gce-usage-discovery-zones.html
I tested my configuration with a single zone: GCE discovery works and the nodes communicate correctly. As soon as I add a second zone, the nodes stop communicating with each other.

I expected to be able to configure all nodes to communicate with each other across all Google Cloud zones. If I needed a new node in a different zone, I could clone my existing instance template and move it to the new zone, and communication between nodes would happen automatically across all zones.

Steps to reproduce:

Install Elasticsearch 2.1.2 and the GCE plugin on 2 Linux instances in Google Cloud
Configure one of the instances to use multiple zones

Provide logs (if relevant):

[2016-03-04 13:25:35,730][WARN ][discovery.gce ] [Firebrand] Exception caught during discovery java.lang.NullPointerException : null
[2016-03-04 13:25:35,732][TRACE][discovery.gce ] [Firebrand] Exception caught during discovery
java.lang.NullPointerException
at com.google.common.collect.Iterables$3.transform(Iterables.java:512)
at com.google.common.collect.Iterables$3.transform(Iterables.java:509)
at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
at com.google.common.collect.Iterators$5.hasNext(Iterators.java:548)
at org.elasticsearch.common.util.CollectionUtils.iterableAsArrayList(CollectionUtils.java:390)
at org.elasticsearch.cloud.gce.GceComputeServiceImpl.instances(GceComputeServiceImpl.java:97)
at org.elasticsearch.discovery.gce.GceUnicastHostsProvider.buildDynamicNodes(GceUnicastHostsProvider.java:123)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPings(UnicastZenPing.java:335)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.ping(UnicastZenPing.java:240)
at org.elasticsearch.discovery.zen.ping.ZenPingService.ping(ZenPingService.java:106)
at org.elasticsearch.discovery.zen.ping.ZenPingService.pingAndWait(ZenPingService.java:84)
at org.elasticsearch.discovery.zen.ZenDiscovery.findMaster(ZenDiscovery.java:879)
at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:335)
at org.elasticsearch.discovery.zen.ZenDiscovery.access$5000(ZenDiscovery.java:75)
at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2016-03-04 13:25:35,743][DEBUG][discovery.gce ] [Firebrand] 0 node(s) added

My configuration yml file:

cluster.name: elastic
network.host: 0.0.0.0
http.port: 9201
http.cors.enabled : true
http.cors.allow-origin : /.*/
transport.tcp.port: 9301
cloud:
  gce:
      project_id: app
      zone: ["asia-east1-a", "us-central1-a"]
discovery:
      type: gce
shield.enabled: false
readonlyrest:
    enable: true
    response_if_req_forbidden: Not found
    access_control_rules:

    - name: Accept only requests with api keys
      type: allow
      api_keys: [XXX]
      methods: [GET,POST,PUT,DELETE,OPTIONS]

By the way: I've also configured the metadata es_port=9301

GCE discovery only works if the zone property is changed to one of the following:

zone: ["asia-east1-a"] 
zone: asia-east1-a

dadoonet · 2016-03-05T10:58:49Z

I looked at the source code and I think this could happen if you have no instances running in one of the zones you listed. Is that your case?
If I'm right, that means we need to handle this case properly instead of throwing an NPE.

Could you confirm that please?

For the record, we have a test which tries settings with 2 zones: https://github.com/elastic/elasticsearch/blob/2.1/plugins/cloud-gce/src/test/java/org/elasticsearch/discovery/gce/GceDiscoverySettingsTests.java#L72.

ghorkov · 2016-03-05T22:08:27Z

Thank you dadoonet,

That is correct. At the moment I don't have any instances running in the second zone. This is my current setup:

zone: asia-east1-a has node1 & node2
zone: us-central1-a doesn't have any active nodes

The problem is that when a second zone is added to the configuration file, communication between the nodes stops. I'm using autoscaling, so depending on traffic I may need a new node in a different zone, or, if traffic is low, the nodes in a certain zone can be shut down, leaving that zone empty.

dadoonet · 2016-03-06T16:03:32Z

Thanks for confirming. Definitely something we need to fix.

When GCE region is empty we get back from the API something like:

```
{ "id": "dummy" }
```

instead of:

```
{ "id": "dummy", "items":[ ] }
```

This generates a NPE when we aggregate all the lists into a single one.

Closes elastic#16967.
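The failure mode described in the commit message above can be illustrated with a minimal sketch. This is not the plugin's actual code or the merged fix: `ZoneResponse`, `collectInstances`, and the string instance names are hypothetical stand-ins, assuming only that an empty zone yields a response whose item list is null rather than empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ZoneAggregationSketch {

    /** Hypothetical stand-in for one zone's API response. */
    static final class ZoneResponse {
        final String id;
        final List<String> items; // null when the response carries no "items" field

        ZoneResponse(String id, List<String> items) {
            this.id = id;
            this.items = items;
        }
    }

    /** Flattens the per-zone instance lists, treating a missing list as empty. */
    static List<String> collectInstances(List<ZoneResponse> responses) {
        List<String> all = new ArrayList<>();
        for (ZoneResponse response : responses) {
            if (response == null || response.items == null) {
                continue; // empty zone: nothing to add, and no NullPointerException
            }
            all.addAll(response.items);
        }
        return all;
    }

    public static void main(String[] args) {
        List<ZoneResponse> responses = Arrays.asList(
            new ZoneResponse("asia-east1-a", Arrays.asList("node1", "node2")),
            new ZoneResponse("us-central1-a", null)); // zone with no running instances
        System.out.println(collectInstances(responses)); // prints [node1, node2]
    }
}
```

Without the null check, iterating over the second zone's list would throw, which matches the symptom in the reported stack trace: discovery aborts and 0 nodes are added even though the first zone has running instances.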

dadoonet · 2016-06-30T09:18:44Z

@ghorkov I was able to reproduce it and came up with fixes for the 5.x and 2.x versions.

ghorkov mentioned this issue Mar 5, 2016

NullPointerException when multiple zones are used elastic/elasticsearch-cloud-gce#57

Closed

clintongormley added :Plugin Cloud GCE >bug help wanted adoptme labels Mar 5, 2016

dadoonet self-assigned this Mar 5, 2016

dadoonet removed the help wanted adoptme label Mar 5, 2016

This was referenced Jun 30, 2016

Fix NPE when GCE region is empty #19175

Merged

Fix NPE when GCE region is empty #19176

Merged

dadoonet closed this as completed in #19176 Jun 30, 2016

clintongormley added the :Distributed Coordination/Snapshot/Restore label and removed the :Plugin Cloud GCE label Feb 14, 2018
