Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[elasticsearch] Restricts 1.x specific stats to v1.x #2232

Merged
merged 3 commits into from
Apr 8, 2016

Conversation

bdharrington7
Copy link
Contributor

Some of the stats I introduced for 1.x are removed in 2.x. 😢
Resolves #2186

@@ -393,6 +396,9 @@ def _define_params(self, version, cluster_stats):
if version >= [1, 0, 0]:
stats_metrics.update(self.ADDITIONAL_METRICS_POST_1_0_0)

if version >= [1, 0, 0] && version < [2, 0, 0]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and instead of && in python ;)

@olivielpeau olivielpeau self-assigned this Jan 29, 2016
@olivielpeau
Copy link
Member

Thanks for taking care of this @bdharrington7!

Only found one small thing to fix, we'll merge once it's fixed :)

@bdharrington7
Copy link
Contributor Author

Thanks @olivielpeau! Could we upload the new versions of Elasticsearch as well? 2.0.2 and 2.1.1

@olivielpeau
Copy link
Member

Yes, makes a lot of sense, I've just uploaded 2.0.2 and 2.1.1 to our mirror, could you add these versions in .travis.yml ?

Had a quick look at our build setup code, it should still work with ES 2.x.

Thanks!

@bdharrington7 bdharrington7 force-pushed the lithium/elastic_stats2x branch from b76194a to 02e0698 Compare January 29, 2016 17:10
@irabinovitch
Copy link
Contributor

@bdharrington7 Thanks again for helping to improve the ES check. When you have a moment could you add the new versions to your travis.yml file?

@bdharrington7
Copy link
Contributor Author

1.7.5 and 2.2?

-Brian

On Feb 5, 2016, at 12:27 PM, Ilan Rabinovitch [email protected] wrote:

@bdharrington7 Thanks again for helping to improve the ES check. When you have a moment could you add the new versions to your travis.yml file?


Reply to this email directly or view it on GitHub.

@olivielpeau
Copy link
Member

@bdharrington7 Thanks for adding the 2.0 and 2.1 versions, I think that they make the tests fail because the default hostname that ES assigns by default to nodes has changed in 2.x (at least that's where the current failure comes from IMO, other issues may show up once this one is fixed).

I'm looking at how we could fix the tests.

@bdharrington7
Copy link
Contributor Author

By default, ES 2.x nodes bind to 127.0.0.1, does the ci tests not spin up a
node on the local machine?
On Mon, Feb 8, 2016 at 2:46 PM Olivier Vielpeau [email protected]
wrote:

@bdharrington7 https://github.com/bdharrington7 Thanks for adding the
2.0 and 2.1 versions, I think that they make the tests fail because the
default hostname that ES assigns by default to nodes has changed in 2.x
(at least that's where the current failure comes from IMO, other issues may
show up once this one is fixed).

I'm looking at how we could fix the tests.


Reply to this email directly or view it on GitHub
#2232 (comment).

@olivielpeau
Copy link
Member

@bdharrington7 It does. But our test expects the metrics to be sent with a hostname that has the value of socket.gethostname() (when the check is configured with cluster_stats: true), which is not 127.0.0.1 in our CI environment.

@olivielpeau
Copy link
Member

@bdharrington7 so a simple way to fix the test would be to replace our test on these lines:

(socket.gethostname(), default_tags)
and
and hostname == socket.gethostname()):

so that when we're on ES 2.x and with cluster_stats: true we change the expected hostname to 127.0.0.1.
I've applied this approach in this commit: e7ae39b , if it looks good to you could you include it in your PR?

@bdharrington7
Copy link
Contributor Author

looks to have passed but some tests flaked out...

@olivielpeau
Copy link
Member

Thanks, I've re-run the tests that flaked out, all are passing now!

One last thing that we should think about now is whether we need to adapt the check to the new behavior of ES on the host name that it returns for every node on the /_nodes/stats endpoint. Given that the check uses that value to identify which node each metric comes from (when the cluster_stats option is enabled), we have to be sure that the value of host makes sense for each node that's hit.

@bdharrington7 Do you happen to know if there's a new config option in ES that allows setting the node's host (as returned by that endpoint), or if there's a new default for that value?

@bdharrington7
Copy link
Contributor Author

Are you talking about the value at nodes.<hash>.name or nodes.<hash>.host? You can set that name in the config: node.name: some_name or node.name: ${HOSTNAME} although I don't know how that would work in your CI

@olivielpeau
Copy link
Member

@bdharrington7: I'm talking about the value at nodes.<hash>.host.

Full explanation: when the check is configured to query an external cluster (cluster_stats set to true), the hostname of the metrics are overridden with the value of nodes.<hash>.host (so, instead of being tagged with the host of the agent, these metrics are tagged with the value of host that ES returns).

This approach doesn't work anymore with ES 2.x as the node that's queried returns 127.0.0.1 for its own value of nodes.<hash>.host instead of returning its "actual" hostname (as ES 1.x was apparently returning). So, with ES 2.x and with cluster_stats enabled, the metrics that apply to the node that's queried are tagged with a hostname of 127.0.0.1 instead of the "actual" hostname of the node, and these metrics would appear as coming from the 127.0.0.1 host on Datadog.

Maybe the check should instead use the value at nodes.<hash>.name, and users would manually have to configure their nodes with node.name: ${HOSTNAME}.

@bdharrington7
Copy link
Contributor Author

Ok, I see what you mean. So what changed then is that in 2.x elasticsearch changed the default binding for network.host from 0.0.0.0 (listen to everything) to 127.0.0.1 (only listen to localhost), as it's more secure. Would the value at node.<hash>.name be sufficient to send for host name? The only gotcha I see there is if you don't set the node.name config to ${HOSTNAME}, you get a default Marvel character name.

@olivielpeau olivielpeau modified the milestones: 5.8.0, 5.7.0 Feb 24, 2016
@bdharrington7
Copy link
Contributor Author

@olivielpeau I noticed that there was a recent release (congrats!), would it be possible to backpatch this change to it? And what else needs to happen in this PR to move it forward?

@olivielpeau
Copy link
Member

@bdharrington7 Sorry for my late response here, I'm going to merge this PR, it'll be part of our next minor release (5.8.0).

Thanks for your explanation on the change of the default binding for network.host, after some more testing it looks like the nodes.<hash>.host field returned by the stats endpoint does have a value that the check can use (as long as the configuration of network.host on ES makes sense).

Thanks again!

@olivielpeau olivielpeau merged commit fb9d49a into DataDog:master Apr 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants