Tracking Index Time vs. Event Time #8

ave19 · 2018-05-30T17:17:29Z

Hello,

I love this initiative and I wanted to start a dialog about time.

We have some business processes that hinge on the time elasticsearch indexes an event, and others that hinge on when the event occurred, so we want to track both. The following examples all have the same flavor, but:

Index time facilitates alerting: you can run a query every minute for events that were indexed since now-1m. Some of our sources take over an hour to get to us, so it's very hard to know how far back to look if you're using the event time. You have to query now-2h and then keep track of whether you've already alerted on something.
The time gap can be variable, too. If you want to know if one of your twenty pipelines has died, tracking index time is the best way. If you're using event time and there's a twenty to forty minute gap, it's hard to know when it stops working. Maybe there's another way to do that per feed, but tracking index time makes it trivial.
The delta between occurrence and indexing time gives you a nice metric for how smoothly your ingest pipeline is running. You can watch the gap in timelion, set up alerts, etc.
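
As a rough illustration of the first and third points, here is a minimal Python sketch that builds the "indexed in the last minute" query body and computes the gap between occurrence and indexing. The field name `event.index_time` and the helper names are assumptions made for the example, not something ECS defines; the timestamps are made up.

```python
from datetime import datetime


def indexed_since_query(window="1m", index_time_field="event.index_time"):
    """Build a search body matching documents indexed within the last `window`."""
    # "now-1m" is Elasticsearch date math; the field name is an assumption.
    return {"query": {"range": {index_time_field: {"gte": f"now-{window}"}}}}


def parse_iso(ts):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def ingest_lag_seconds(event_time, index_time):
    """Seconds between when the event occurred and when it was indexed."""
    return (parse_iso(index_time) - parse_iso(event_time)).total_seconds()


print(indexed_since_query())
print(ingest_lag_seconds("2018-05-30T16:10:00.000Z", "2018-05-30T17:17:29.000Z"))
```

Because the query filters on index time, the look-back window can stay at one minute no matter how late a source delivers its events.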

Anyway, assuming I've convinced you it's valuable to know index time, what can we do? Most people map the time the event happened into the @timestamp, as I think the description for ECS @timestamp field says. (You have "generated" vs. "read" but don't say who's doing the reading.)

I can map timestamps from our data sources into something like event.timestamp to track when they occurred. I could see adding something to a pipeline that adds an index.timestamp or something just before indexing to track index time. It wouldn't be totally accurate but close enough. Leaving @timestamp blank gives you an indexed time correctly and automatically, including handling timeouts and errors automatically (if it fails to index, there's no @timestamp) but the name of the field is kind of generic.

I would love to leave @timestamp blank, but it kind of goes against common practice. Any thoughts?

Thanks,
Dave Jaccard

ruflin · 2018-05-31T06:19:01Z

ECS contains the field event.created. That is, for example, the time Filebeat reads a log line, whereas @timestamp is the timestamp from the log line itself, so normally when it was written. I'm wondering if you could reuse event.created for your use case, since for you the event.created time seems to be when the event is indexed.

I'm not sure I fully understand why you would like to leave @timestamp blank. Could you elaborate?

ave19 · 2018-05-31T14:06:55Z

If I don't have an @timestamp in my document/event, then elasticsearch will automatically add one, and that time will be the moment of indexing, which is exactly what I need to know.

If I try to set something before I send the document for indexing, I could get the moment wrong because of timeouts, retries, heavy load, etc. A lot of scenarios to code around. That's so ugly I'm balking at it.

I'm thinking of using event.timestamp to store when the event happened.

ruflin · 2018-06-01T06:20:36Z

Elasticsearch does not create @timestamp on ingestion time. This has to be added by the client / agent ingesting the data.

From an ECS perspective you should use @timestamp for when the event happened and probably something like event.indexed for what you propose above.

ave19 · 2018-06-01T14:06:02Z

Oh dang, you're right, what was I thinking! :-p

Oh, I see, it used to do that, but that behavior was deprecated.

ave19 · 2018-06-01T15:21:31Z

As a best-of-all-worlds scenario, and I recognize that this is getting off topic for ECS now, if I had an ingest pipeline in Elasticsearch that added the date, I would get a really good index time out of that.

I'm going to close this, too, since it's on me to figure out the times, not ECS.

ave19 · 2018-06-03T15:55:54Z

In case anyone comes and looks at this in the future, here's what I did to solve my problem.

PUT _ingest/pipeline/elastic_common_schema
{
  "description": "Elastic Common Schema Translation.",
  "processors": [
    {
      "set": {
        "field": "event.index_time",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}

We will obviously add more processors to this, but the idea is that Logstash (or what have you) can name this pipeline when it sends events to Elasticsearch. Because the event has to actually arrive at Elasticsearch for this pipeline to run, the timestamp automatically accounts for timeouts, connection issues, and the like. The _ingest.timestamp only exists while the event passes through the pipeline and is not itself stored or indexed in Elasticsearch.
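
For anyone wondering what "naming the pipeline" looks like from a client, here is a small Python sketch that builds (but does not send) an index request with the `pipeline` query parameter. The host, the index name `logs-example`, and the helper name are assumptions for illustration, and the `_doc` path reflects newer Elasticsearch versions; older releases put a type name in that position.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_index_request(doc, index="logs-example", pipeline="elastic_common_schema",
                        base_url="http://localhost:9200"):
    """Build (but do not send) an index request that runs `doc` through `pipeline`."""
    # The pipeline is selected per request via the `pipeline` query parameter.
    url = f"{base_url}/{index}/_doc?{urlencode({'pipeline': pipeline})}"
    return Request(url, data=json.dumps(doc).encode("utf-8"),
                   headers={"Content-Type": "application/json"}, method="POST")


req = build_index_request({"message": "example event"})
print(req.get_method(), req.full_url)
```

The document body carries no index-time field of its own; the set processor in the pipeline adds it once the request reaches Elasticsearch.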

ave19 closed this as completed Jun 1, 2018

mbrancato mentioned this issue Feb 14, 2019

Clarify @timestamp vs event.created #326

Closed

webmat mentioned this issue Aug 18, 2020

[meta] Add support for pipeline details in ECS #940

Open
