ElasticSearch sink - Improve mappings, timestamps, index separations, etc #1701
Comments
@AlmogBaku need your opinion in this :) |
The reason we have multiple timestamps is to allow Kibana to create different
types...
On Mon, Jun 26, 2017 at 6:09 PM Ricardo Katz ***@***.***> wrote:
@AlmogBaku <https://github.com/almogbaku> need your opinion in this :)
|
OK, maybe I'm missing something, or I don't fully understand how Kibana works, but when you add a new index pattern you have to specify whether the index contains 'time based events' and which field is the time field. If we add a common index pattern (like testevent-*) pointing only to 'CpuMetricsTimestamp', the other events (memory, network) are 'discarded' from Kibana search. If you add a non-timestamped index pattern in Kibana, you can plot graphs, but the behaviour is pretty strange: you have to build graphs with a Histogram on the X axis, and neither the plotting nor the time range selection works well. Do you have any Kibana dashboard example? Thank you! |
I don't have an example.. but I guess adding an extra field couldn't hurt.. However, it's interesting to try and see if having a unified timestamp outside of the |
@AlmogBaku No problem at all. I'll run some tests here, including Kibana dashboards, and check that everything works fine. |
@AlmogBaku I am confused: how does having multiple timestamp fields help Kibana? I actually see several issues with the current Elasticsearch sink implementation:
|
@outcoldman about the issues: Separating the indexes per metric could be applicable, but I think it would make it harder to create Kibana and/or Grafana dashboards. Let's take a look at how Elastic does this, as they have a Metricbeat Plugin for kubernetes/kubelet: Regardless of using kube-state-metrics (this is another subject) or kubelet (and the collected metrics), Metricbeat always exports the metrics to the same Elasticsearch cluster (or any other output) as defined here. The most important thing here is that the metrics use the same Timestamp field, but a different type mapping per metric. Waiting for your thoughts :) |
@rikatz yes, you can send all the metrics to just one index, but as Do your documents have similar mappings? If no, use different indices. With Metricbeat you can: a) specify a dedicated index for every type with the Elasticsearch output. I agree that having a separate index for every type can be hard to manage if you are just playing with the setup, but in real production use it will pay off. Considering that sinks are configured only at the command-argument level - maybe having a For consideration - we can implement another sink, a generic HTTP/JSON one, so it would be possible to use, for example, the Logstash http input, and folks would be able to do anything they want in the pipeline. Btw, considering that ES5 supports ingest pipelines, I guess I can use that now as a workaround for separating types and putting documents in the right indexes. |
btw, regarding this one. If you will have indexes with names like |
Agreed, you can use wildcards in that. Maybe a new argument would help anyway. @AlmogBaku what do you think about this rebasing:
|
@rikatz also I would suggest to update the mappings:
|
@outcoldman let's take this in parts :D First of all, would changing the timestamp schema to something more generic help? For me it would, a lot. We need more opinions before 'breaking' stuff! Then let's see how to refactor the fields / mapping structures (in another PR) and break other stuff :) Let's make a 'list' of things that we think need to be changed (fields to be removed, arguments to be added, things to be changed) and get this done across several PRs to improve the ES5 sink :) |
@outcoldman I have changed the issue description, take a look and see if I've missed something. Anyway, we need some other people from the community (instead of just the 3 of us), as I think some changes would break existing production environments :) |
@rikatz sure, just one common timestamp field will help a lot; that will be a good start. As for breaking changes - I would suggest breaking just once. If this sink has breaking changes with every release, folks will be very unhappy. I would suggest actually making all of them at once and standardizing the schema on best practices from Elasticsearch. Btw, as for mappings - you can take a look at which mappings Elasticsearch is using https://www.elastic.co/guide/en/beats/metricbeat/master/exported-fields-kubernetes.html#_kubernetes_event_message - they do not have any Btw on the list 3rd and 4th is the same. Curious if maybe the metricbeat/kubernetes authors can help with advice as well. cc @exekias @monicasarbu - the community will really appreciate it if you help heapster be much nicer with Elasticsearch, we understand that you want to make your |
Hi there! I would say As for the labels, we use nesting to store them, so they would be something like @outcoldman pointed out to our exported fields list, it's a good reference if you want to match our names. |
About the types and indices, https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch - in ES 6 - there will be no types, only one type per index |
|
@AlmogBaku I think that, with 'types' going away in ES6, it's a good idea to have a per-metric index; the impact of migrating from 5 to 6 would probably be less than expected. About the type, I think you can use it (the _type field) to filter which metrics you want. If this is not the case for Kibana < 5, then this probably needs to be a breaking change. Maybe, after all, we need to decide whether or not to keep supporting ES 2.x versions. Let's close those questions, and we can start working on that :D |
@AlmogBaku |
let's continue the discussion in #1909 and decide what the new schema will be. |
As mentioned here, using / creating a generic 'MetricsTimestamp' field usable by all the metrics, regardless of type, may be better than creating a per-type Timestamp.
Using a per-type Timestamp forces us to create a datasource per metric in Grafana (one for CPU, one for memory, and so on) or even a new index mapping per metric type, when we could instead filter the desired metric by the '_type' field (which is already inserted).
The idea here is to insert a general
point["MetricsTimestamp"] = date.UTC()
in sinks/elasticsearch/driver.go instead of creating aesCommon.MetricFamilyTimestamp(family)
I need your opinion on this, as I'm thinking about starting work on a PR for it. I've taken a look at #1313 but couldn't figure out why the Timestamp is separated, as there's a _type field that can be used to separate everything in Kibana :)
TODO (as suggested by @outcoldman)
Thanks in advance!