KeyError: 'vm_vendor' when running benchmark against single host cluster with 0.7.0 #316

Closed
coderplay opened this issue Aug 19, 2017 · 7 comments
Labels
bug Something's wrong :Telemetry Telemetry Devices that gather additional metrics
Comments

coderplay commented Aug 19, 2017

Rally version (get with esrally --version):
0.7.0
Invoked command:

esrally --track=geonames --target-hosts=https://elasticsearchhost --pipeline=benchmark-only

Configuration file (located in ~/.rally/rally.ini):

JVM version:
Oracle JDK 1.8.0_144

OS version:
linux kernel: 4.9.38-16.35.amzn1.x86_64

Description of the problem including expected versus actual behavior:
This bug was introduced in 0.7.0; I haven't seen any problem when running with 0.6.2.

Steps to reproduce:

Provide logs (if relevant):

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/esrally/mechanic/mechanic.py", line 243, in receiveMessage
    self.transition_when_all_children_responded(sender, msg, "starting", "nodes_started", self.on_all_nodes_started)
  File "/usr/local/lib/python3.5/site-packages/esrally/mechanic/mechanic.py", line 324, in transition_when_all_children_responded
    transition()
  File "/usr/local/lib/python3.5/site-packages/esrally/mechanic/mechanic.py", line 423, in on_all_nodes_started
    self.cluster = self.cluster_launcher.start()
  File "/usr/local/lib/python3.5/site-packages/esrally/mechanic/launcher.py", line 56, in start
    t.attach_to_cluster(c)
  File "/usr/local/lib/python3.5/site-packages/esrally/mechanic/telemetry.py", line 47, in attach_to_cluster
    device.attach_to_cluster(cluster)
  File "/usr/local/lib/python3.5/site-packages/esrally/mechanic/telemetry.py", line 462, in attach_to_cluster
    self.metrics_store.add_meta_info(metrics.MetaInfoScope.node, node_name, "jvm_vendor", node["jvm"]["vm_vendor"])
KeyError: 'vm_vendor'
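
For context, the failing line does a direct key lookup on node["jvm"]["vm_vendor"], so any node entry missing that field raises exactly this error. A minimal standalone sketch of the failure mode (the node dict below is hypothetical):

# Hypothetical nodes-info entry that lacks a "vm_vendor" field.
node = {"jvm": {"vm_name": "Java HotSpot(TM) 64-Bit Server VM"}}

try:
    vendor = node["jvm"]["vm_vendor"]  # same direct lookup as telemetry.py line 462
except KeyError as e:
    print("KeyError:", e)  # prints: KeyError: 'vm_vendor'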

Describe the feature:

danielmitterdorfer commented Aug 22, 2017

@coderplay Thanks for the report. Can you please tell me which version of Elasticsearch you are benchmarking? (you can just run curl "https://elasticsearchhost:9200/?pretty"). Also, the output of the node info API would be interesting: curl "https://elasticsearchhost:9200/_nodes/jvm?pretty"

@danielmitterdorfer danielmitterdorfer added feedback needed An open question blocks progress :Telemetry Telemetry Devices that gather additional metrics bug Something's wrong labels Aug 22, 2017
coderplay commented Aug 22, 2017

It's an ES 5.3.2 cluster with 3 physical nodes.

Here is the JVM info from the nodes, with some sensitive info removed:

{
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "cluster_name" : "...",
  "nodes" : {
...
      "jvm" : {
        "pid" : 18269,
        "version" : "1.8.0_112",
        "vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
        "vm_version" : "25.112-b15",
        "vm_vendor" : "Oracle Corporation",
        "start_time_in_millis" : 1503357600738,
        "mem" : {
          "heap_init_in_bytes" : 4294967296,
          "heap_max_in_bytes" : 4277534720,
          "non_heap_init_in_bytes" : 2555904,
          "non_heap_max_in_bytes" : 0,
          "direct_max_in_bytes" : 4277534720
        },
        "gc_collectors" : [
          "ParNew",
          "ConcurrentMarkSweep"
        ],
        "memory_pools" : [
          "Code Cache",
          "Metaspace",
          "Compressed Class Space",
          "Par Eden Space",
          "Par Survivor Space",
          "CMS Old Gen"
        ],
        "using_compressed_ordinary_object_pointers" : "true"
      }
    },
...
      "jvm" : {
        "pid" : 17611,
        "version" : "1.8.0_112",
        "vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
        "vm_version" : "25.112-b15",
        "vm_vendor" : "Oracle Corporation",
        "start_time_in_millis" : 1503344772450,
        "mem" : {
          "heap_init_in_bytes" : 4294967296,
          "heap_max_in_bytes" : 4277534720,
          "non_heap_init_in_bytes" : 2555904,
          "non_heap_max_in_bytes" : 0,
          "direct_max_in_bytes" : 4277534720
        },
        "gc_collectors" : [
          "ParNew",
          "ConcurrentMarkSweep"
        ],
        "memory_pools" : [
          "Code Cache",
          "Metaspace",
          "Compressed Class Space",
          "Par Eden Space",
          "Par Survivor Space",
          "CMS Old Gen"
        ],
        "using_compressed_ordinary_object_pointers" : "true"
      }
    },
...
      "jvm" : {
        "pid" : 4931,
        "version" : "1.8.0_112",
        "vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
        "vm_version" : "25.112-b15",
        "vm_vendor" : "Oracle Corporation",
        "start_time_in_millis" : 1503142165784,
        "mem" : {
          "heap_init_in_bytes" : 4294967296,
          "heap_max_in_bytes" : 4277534720,
          "non_heap_init_in_bytes" : 2555904,
          "non_heap_max_in_bytes" : 0,
          "direct_max_in_bytes" : 4277534720
        },
        "gc_collectors" : [
          "ParNew",
          "ConcurrentMarkSweep"
        ],
        "memory_pools" : [
          "Code Cache",
          "Metaspace",
          "Compressed Class Space",
          "Par Eden Space",
          "Par Survivor Space",
          "CMS Old Gen"
        ],
        "using_compressed_ordinary_object_pointers" : "true"
      }
    }
  }
}

But from your source code, I see you are retrieving from _nodes/stats instead of _nodes/jvm. The former doesn't have vm_vendor in its jvm section.
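
A quick way to compare the two payloads (a sketch assuming the elasticsearch-py client; the host is a placeholder):

from elasticsearch import Elasticsearch

es = Elasticsearch(["https://elasticsearchhost:9200"])

# Nodes info API (what /_nodes and /_nodes/jvm serve): includes jvm.vm_vendor.
info = es.nodes.info(node_id="_all")
# Nodes stats API (/_nodes/stats): its per-node "jvm" section carries runtime
# statistics (uptime, heap, GC) and has no vm_vendor field.
stats = es.nodes.stats(node_id="_all")

for node_id, node in info["nodes"].items():
    print(node_id, "info:", node["jvm"].get("vm_vendor"))   # e.g. Oracle Corporation
for node_id, node in stats["nodes"].items():
    print(node_id, "stats:", node["jvm"].get("vm_vendor"))  # None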

@danielmitterdorfer

Thanks for the feedback. It's odd that it fails to extract "vm_vendor" at this point because the field is there. I'll see whether I can reproduce it, or at least make Rally more robust when it cannot extract this data.

Does this happen on every benchmark or occasionally?

@danielmitterdorfer danielmitterdorfer removed the feedback needed An open question blocks progress label Aug 22, 2017
@danielmitterdorfer danielmitterdorfer added this to the 0.7.1 milestone Aug 22, 2017
@coderplay

Every time. Please see the update to my previous comment.

@danielmitterdorfer

But from your source code, I see you are retrieving from _nodes/stats instead of _nodes/jvm. The former doesn't have vm_vendor in its jvm section.

Rally calls nodes.info(node_id="_all") via the Python client, which issues a nodes info API call.

But you are right: I just asked for the output of _nodes/jvm here to avoid you accidentally posting sensitive information.

@danielmitterdorfer

I could not reproduce the problem, but I made sure Rally will no longer fail when this happens. The fix will be included in the next release (0.7.1), which I plan to publish this week. Thanks once more for reporting this.
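
For reference, a defensive variant of the lookup (a sketch only; the actual fix shipped in 0.7.1 may differ):

# Tolerant extraction: dict.get() returns None instead of raising KeyError
# when "vm_vendor" is absent from a node's info payload.
def jvm_vendor(node):
    return node.get("jvm", {}).get("vm_vendor")

print(jvm_vendor({"jvm": {"vm_vendor": "Oracle Corporation"}}))  # Oracle Corporation
print(jvm_vendor({"jvm": {}}))                                   # None, no crash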

@danielmitterdorfer

I've just released the fix with Rally 0.7.1.
