Tracking Index Time vs. Event Time #8
Comments
ECS contains the field … I'm not sure I fully understood why you would like to leave …
If I don't have an … If I try to set something before I send the document for indexing, I could get the moment wrong because of timeouts, retries, heavy load, etc. That's a lot of scenarios to code around, and it's so ugly I'm balking at it. I'm thinking of using …
Elasticsearch does not create … From an ECS perspective you should use …
Oh dang, you're right, what was I thinking! :-p Oh, I see, it used to do that, but was deprecated.
As a best-of-all-worlds scenario, and I recognize that this is getting off topic for ECS now, if I had an ingest pipeline in Elasticsearch that added the date, I would get a really good time out of that. I'm going to close this, too, since it's on me to figure out the times, not ECS.
In case anyone comes and looks at this in the future, here's what I did to solve my problem.
We will obviously add more processors to this, but the idea is that Logstash (or whatever you use) can name this pipeline when it sends events to Elasticsearch. Because the event has to actually arrive at Elasticsearch for this pipeline to run, this automatically accounts for timeouts, connection issues, and the like. The …
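The pipeline itself isn't reproduced in this copy of the thread. The following is only a rough sketch. It assumes a single Elasticsearch `set` processor that copies the `_ingest.timestamp` ingest metadata into the document. The pipeline id `add-index-time` and the field name `index_time` are hypothetical and may not match what was actually used here.

```python
import json

# Hypothetical names: neither the pipeline id nor the field name comes from the thread.
PIPELINE_ID = "add-index-time"
INDEX_TIME_FIELD = "index_time"


def index_time_pipeline(field: str = INDEX_TIME_FIELD) -> dict:
    """Build an ingest pipeline body that stamps each document with the
    time the Elasticsearch ingest node received it.

    `{{_ingest.timestamp}}` is resolved on the Elasticsearch side, so a
    document that never reaches the cluster never gets the field.
    """
    return {
        "description": "Record when the document reached Elasticsearch",
        "processors": [
            {
                "set": {
                    "field": field,
                    "value": "{{_ingest.timestamp}}",
                }
            }
        ],
    }


if __name__ == "__main__":
    # The body would be registered with: PUT _ingest/pipeline/<PIPELINE_ID>
    print(f"PUT _ingest/pipeline/{PIPELINE_ID}")
    print(json.dumps(index_time_pipeline(), indent=2))
```

Logstash's `elasticsearch` output can then name the pipeline through its `pipeline` setting. Because Elasticsearch fills in the value itself, a document that fails to arrive is never stamped.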
Hello,
I love this initiative and I wanted to start a dialog about time.
We have some business processes that hinge on the time Elasticsearch indexes an event, and others that hinge on when the event occurred, so we want to track both. The following examples all have the same flavor, but:
- Index time facilitates alerting by allowing you to run a query every `1m` for events that were indexed since `now-1m`. Some of our sources take over an hour to reach us, so it's very hard to know how far back to look if you're using the event time. You have to query `now-2h` and then keep track of whether you've already alerted on something. The time gap can vary, too. If you want to know whether one of your twenty pipelines has died, tracking index time is the best way. If you're using event time and there's a twenty-to-forty-minute gap, it's hard to tell when a feed stops working. There may be another way to do that per feed, but tracking index time makes it trivial.
- The delta between occurrence time and index time gives you a nice metric for how smoothly your ingest pipeline is running. You can watch the gap in Timelion, set up alerts, etc.
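This is a minimal sketch of the uses listed above. It assumes each document carries the event time in `@timestamp` and an index-time field stamped by an ingest pipeline. The name `index_time` is hypothetical. The alert query is an Elasticsearch `range` query using date math. The lag and stale-feed checks are plain Python over values you would already have fetched. For example, per-feed latest index times could come from a `terms` aggregation with a `max` sub-aggregation. That source is an assumption, not something from the thread.

```python
from datetime import datetime, timezone

# Assumed field names: @timestamp = when the event occurred,
# index_time (hypothetical) = when Elasticsearch ingested it.
EVENT_TIME_FIELD = "@timestamp"
INDEX_TIME_FIELD = "index_time"


def recently_indexed_query(window: str = "1m") -> dict:
    """Range query for documents indexed within the last `window`
    (Elasticsearch date math), no matter how old the events are."""
    return {"query": {"range": {INDEX_TIME_FIELD: {"gte": f"now-{window}"}}}}


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is mapped to UTC
    because datetime.fromisoformat only accepts 'Z' from Python 3.11."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def ingest_lag_seconds(doc: dict) -> float:
    """Seconds between when the event occurred and when it was indexed."""
    occurred = parse_ts(doc[EVENT_TIME_FIELD])
    indexed = parse_ts(doc[INDEX_TIME_FIELD])
    return (indexed - occurred).total_seconds()


def stale_feeds(last_indexed: dict, now: datetime, max_silence_s: float) -> list:
    """Names of feeds whose most recent index time is more than
    `max_silence_s` seconds before `now`, i.e. feeds that have probably stopped."""
    return sorted(
        feed
        for feed, ts in last_indexed.items()
        if (now - ts).total_seconds() > max_silence_s
    )


if __name__ == "__main__":
    doc = {
        "@timestamp": "2019-01-01T12:00:00Z",
        "index_time": "2019-01-01T12:45:30Z",
    }
    print(recently_indexed_query())
    print(ingest_lag_seconds(doc))
    now = datetime(2019, 1, 1, 13, 0, tzinfo=timezone.utc)
    last = {
        "feed-a": datetime(2019, 1, 1, 12, 59, tzinfo=timezone.utc),
        "feed-b": datetime(2019, 1, 1, 12, 0, tzinfo=timezone.utc),
    }
    print(stale_feeds(last, now, max_silence_s=600))
```

Querying on index time means a once-a-minute alert never has to guess how far back a slow source might reach. The lag value is the per-document figure you would chart or alert on.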
Anyway, assuming I've convinced you it's valuable to know index time, what can we do? Most people map the time the event happened into `@timestamp`, which I think is what the description of the ECS `@timestamp` field says. (You have "generated" vs. "read" but don't say who's doing the reading.)
I can map timestamps from our data sources into something like `event.timestamp` to track when they occurred. I could see adding something to a pipeline that adds an `index.timestamp` or similar just before indexing to track index time. It wouldn't be totally accurate, but it would be close enough. Leaving `@timestamp` blank gives you the index time correctly and automatically, including handling timeouts and errors (if indexing fails, there's no `@timestamp`), but the name of the field is kind of generic.

I would love to leave `@timestamp` blank, but it kind of goes against common practice. Any thoughts?
Thanks,
Dave Jaccard