KeyError: 'vm_vendor' when running benchmark against single host cluster with 0.7.0 #316

Closed
coderplay opened this issue Aug 19, 2017 · 7 comments
Labels
bug Something's wrong :Telemetry Telemetry Devices that gather additional metrics
Comments

coderplay commented Aug 19, 2017

Rally version (get with esrally --version):
0.7.0
Invoked command:

esrally --track=geonames --target-hosts=https://elasticsearchhost --pipeline=benchmark-only

Configuration file (located in ~/.rally/rally.ini):

JVM version:
Oracle JDK 1.8.0_144

OS version:
linux kernel: 4.9.38-16.35.amzn1.x86_64

Description of the problem including expected versus actual behavior:
This bug was introduced in 0.7.0; I haven't seen any problem when running with 0.6.2.

Steps to reproduce:

Provide logs (if relevant):

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/esrally/mechanic/mechanic.py", line 243, in receiveMessage
    self.transition_when_all_children_responded(sender, msg, "starting", "nodes_started", self.on_all_nodes_started)
  File "/usr/local/lib/python3.5/site-packages/esrally/mechanic/mechanic.py", line 324, in transition_when_all_children_responded
    transition()
  File "/usr/local/lib/python3.5/site-packages/esrally/mechanic/mechanic.py", line 423, in on_all_nodes_started
    self.cluster = self.cluster_launcher.start()
  File "/usr/local/lib/python3.5/site-packages/esrally/mechanic/launcher.py", line 56, in start
    t.attach_to_cluster(c)
  File "/usr/local/lib/python3.5/site-packages/esrally/mechanic/telemetry.py", line 47, in attach_to_cluster
    device.attach_to_cluster(cluster)
  File "/usr/local/lib/python3.5/site-packages/esrally/mechanic/telemetry.py", line 462, in attach_to_cluster
    self.metrics_store.add_meta_info(metrics.MetaInfoScope.node, node_name, "jvm_vendor", node["jvm"]["vm_vendor"])
KeyError: 'vm_vendor'
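
For context, the failing line does a direct key lookup on node["jvm"]["vm_vendor"], so any node entry missing that field raises exactly this error. A minimal standalone sketch of the failure mode (the node dict below is hypothetical):

# Hypothetical nodes-info entry that lacks a "vm_vendor" field.
node = {"jvm": {"vm_name": "Java HotSpot(TM) 64-Bit Server VM"}}

try:
    vendor = node["jvm"]["vm_vendor"]  # same direct lookup as telemetry.py line 462
except KeyError as e:
    print("KeyError:", e)  # prints: KeyError: 'vm_vendor'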

Describe the feature:

danielmitterdorfer commented Aug 22, 2017

@coderplay Thanks for the report. Can you please tell me which version of Elasticsearch you are benchmarking? (you can just run curl "https://elasticsearchhost:9200/?pretty"). Also, the output of the node info API would be interesting: curl "https://elasticsearchhost:9200/_nodes/jvm?pretty"

@danielmitterdorfer danielmitterdorfer added feedback needed An open question blocks progress :Telemetry Telemetry Devices that gather additional metrics bug Something's wrong labels Aug 22, 2017
coderplay commented Aug 22, 2017

It's an ES 5.3.2 cluster with 3 physical nodes.

Here is the JVM info from the nodes, with some sensitive info removed:

{
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "cluster_name" : "...",
  "nodes" : {
...
      "jvm" : {
        "pid" : 18269,
        "version" : "1.8.0_112",
        "vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
        "vm_version" : "25.112-b15",
        "vm_vendor" : "Oracle Corporation",
        "start_time_in_millis" : 1503357600738,
        "mem" : {
          "heap_init_in_bytes" : 4294967296,
          "heap_max_in_bytes" : 4277534720,
          "non_heap_init_in_bytes" : 2555904,
          "non_heap_max_in_bytes" : 0,
          "direct_max_in_bytes" : 4277534720
        },
        "gc_collectors" : [
          "ParNew",
          "ConcurrentMarkSweep"
        ],
        "memory_pools" : [
          "Code Cache",
          "Metaspace",
          "Compressed Class Space",
          "Par Eden Space",
          "Par Survivor Space",
          "CMS Old Gen"
        ],
        "using_compressed_ordinary_object_pointers" : "true"
      }
    },
...
      "jvm" : {
        "pid" : 17611,
        "version" : "1.8.0_112",
        "vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
        "vm_version" : "25.112-b15",
        "vm_vendor" : "Oracle Corporation",
        "start_time_in_millis" : 1503344772450,
        "mem" : {
          "heap_init_in_bytes" : 4294967296,
          "heap_max_in_bytes" : 4277534720,
          "non_heap_init_in_bytes" : 2555904,
          "non_heap_max_in_bytes" : 0,
          "direct_max_in_bytes" : 4277534720
        },
        "gc_collectors" : [
          "ParNew",
          "ConcurrentMarkSweep"
        ],
        "memory_pools" : [
          "Code Cache",
          "Metaspace",
          "Compressed Class Space",
          "Par Eden Space",
          "Par Survivor Space",
          "CMS Old Gen"
        ],
        "using_compressed_ordinary_object_pointers" : "true"
      }
    },
...
      "jvm" : {
        "pid" : 4931,
        "version" : "1.8.0_112",
        "vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
        "vm_version" : "25.112-b15",
        "vm_vendor" : "Oracle Corporation",
        "start_time_in_millis" : 1503142165784,
        "mem" : {
          "heap_init_in_bytes" : 4294967296,
          "heap_max_in_bytes" : 4277534720,
          "non_heap_init_in_bytes" : 2555904,
          "non_heap_max_in_bytes" : 0,
          "direct_max_in_bytes" : 4277534720
        },
        "gc_collectors" : [
          "ParNew",
          "ConcurrentMarkSweep"
        ],
        "memory_pools" : [
          "Code Cache",
          "Metaspace",
          "Compressed Class Space",
          "Par Eden Space",
          "Par Survivor Space",
          "CMS Old Gen"
        ],
        "using_compressed_ordinary_object_pointers" : "true"
      }
    }
  }
}

But from your source code, I see you are retrieving from _nodes/stats instead of _nodes/jvm. The former doesn't have vm_vendor in its jvm section.
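
A quick way to compare the two payloads (a sketch assuming the elasticsearch-py client; the host is a placeholder):

from elasticsearch import Elasticsearch

es = Elasticsearch(["https://elasticsearchhost:9200"])

# Nodes info API (what /_nodes and /_nodes/jvm serve): includes jvm.vm_vendor.
info = es.nodes.info(node_id="_all")
# Nodes stats API (/_nodes/stats): its per-node "jvm" section carries runtime
# statistics (uptime, heap, GC) and has no vm_vendor field.
stats = es.nodes.stats(node_id="_all")

for node_id, node in info["nodes"].items():
    print(node_id, "info:", node["jvm"].get("vm_vendor"))   # e.g. Oracle Corporation
for node_id, node in stats["nodes"].items():
    print(node_id, "stats:", node["jvm"].get("vm_vendor"))  # None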

@danielmitterdorfer

Thanks for the feedback. It's odd that it fails to extract "vm_vendor" at this point because the field is there. I'll see whether I can reproduce it, or at least make Rally more robust when it cannot extract this data.

Does this happen on every benchmark or occasionally?

@danielmitterdorfer danielmitterdorfer removed the feedback needed An open question blocks progress label Aug 22, 2017
@danielmitterdorfer danielmitterdorfer added this to the 0.7.1 milestone Aug 22, 2017
@coderplay

Every time. Please see the update to my previous comment.

@danielmitterdorfer

But from your source code, I see you are retrieving from _nodes/stats instead of _nodes/jvm. The former doesn't have vm_vendor in its jvm section.

Rally calls nodes.info(node_id="_all") via the Python client, which issues a nodes info API call.

But you are right: I just asked for the output of _nodes/jvm here to avoid you accidentally posting sensitive information.

@danielmitterdorfer

I could not reproduce the problem, but I made sure Rally will no longer fail when this happens. The fix will be included in the next release (0.7.1), which I plan to publish this week. Thanks once more for reporting this.
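
For reference, a defensive variant of the lookup (a sketch only; the actual fix shipped in 0.7.1 may differ):

# Tolerant extraction: dict.get() returns None instead of raising KeyError
# when "vm_vendor" is absent from a node's info payload.
def jvm_vendor(node):
    return node.get("jvm", {}).get("vm_vendor")

print(jvm_vendor({"jvm": {"vm_vendor": "Oracle Corporation"}}))  # Oracle Corporation
print(jvm_vendor({"jvm": {}}))                                   # None, no crash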

@danielmitterdorfer

I've just released the fix with Rally 0.7.1.
