Tracking Index Time vs. Event Time #8

Closed
ave19 opened this issue May 30, 2018 · 6 comments

ave19 commented May 30, 2018

Hello,

I love this initiative and I wanted to start a dialog about time.

We have some business processes that hinge on the time elasticsearch indexes an event, and others that hinge on when the event occurred, so we want to track both. The following examples all have the same flavor, but:

  • Index time facilitates alerting by allowing you to run a query every 1m for events that were indexed since now-1m. We have some sources that take over an hour to get to us. It's very hard to know how far back you have to look, even if you're using the event time. You have to do now-2h and then keep track of whether you've already alerted on something.

  • The time gap can be variable, too. If you want to know if one of your twenty pipelines has died, tracking index time is the best way. If you're using event time and there's a twenty to forty minute gap, it's hard to know when it stops working. Maybe there's another way to do that per feed, but tracking index time makes it trivial.

  • The delta between occurrence and indexing time gives you a nice metric for how smoothly your ingest pipeline is running. You can watch the gap in timelion, set up alerts, etc.
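
The three points above can be sketched in Python as follows. This is only an illustration under stated assumptions: the field name `event.index_time` is hypothetical, documents are represented as flat dicts with dotted keys, and timestamps are ISO 8601 strings with millisecond precision.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical field names; substitute whatever your mapping uses.
INDEX_TIME_FIELD = "event.index_time"
EVENT_TIME_FIELD = "@timestamp"


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def recently_indexed_query(window: str = "1m") -> dict:
    """Search body matching documents indexed within the last `window`."""
    return {
        "query": {
            "range": {INDEX_TIME_FIELD: {"gte": f"now-{window}", "lt": "now"}}
        }
    }


def ingest_lag(doc: dict) -> timedelta:
    """Delay between when the event occurred and when it was indexed."""
    return parse_ts(doc[INDEX_TIME_FIELD]) - parse_ts(doc[EVENT_TIME_FIELD])


def feed_is_stale(last_index_time: str, now: datetime, max_gap: timedelta) -> bool:
    """True if a feed has indexed nothing for longer than `max_gap`."""
    return now - parse_ts(last_index_time) > max_gap
```

Because the range query is on index time, each one-minute poll sees every newly arrived document exactly once, no matter how old the event itself is, so there is no need to remember what has already been alerted on.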

Anyway, assuming I've convinced you it's valuable to know index time, what can we do? Most people map the time the event happened into the @timestamp, as I think the description for ECS @timestamp field says. (You have "generated" vs. "read" but don't say who's doing the reading.)

I can map timestamps from our data sources into something like event.timestamp to track when they occurred. I could see adding something to a pipeline that adds an index.timestamp or something just before indexing to track index time. It wouldn't be totally accurate but close enough. Leaving @timestamp blank gives you an indexed time correctly and automatically, including handling timeouts and errors automatically (if it fails to index, there's no @timestamp) but the name of the field is kind of generic.

I would love to leave @timestamp blank, but it kind of goes against common practice. Any thoughts?

Thanks,
Dave Jaccard

Contributor

ruflin commented May 31, 2018

ECS contains the field event.created. For example, it is the time at which Filebeat reads a log line, whereas @timestamp is the timestamp from the log line itself, so normally the time it was written. I'm wondering if you could reuse event.created for your use case, since for you the event.created time would be when the event is indexed.

I'm not sure I fully understand why you would like to leave @timestamp blank?

Author

ave19 commented May 31, 2018

If I don't have an @timestamp in my document/event, then elasticsearch will automatically add one, and that time will be the moment of indexing, which is exactly what I need to know.

If I try to set something before I send the document for indexing, I could get the moment wrong because of timeouts, retries, heavy load, etc. A lot of scenarios to code around. That's so ugly I'm balking at it.

I'm thinking of using event.timestamp to store when the event happened.

Contributor

ruflin commented Jun 1, 2018

Elasticsearch does not create @timestamp at ingestion time. It has to be added by the client / agent ingesting the data.

From an ECS perspective you should use @timestamp for when the event happened and probably something like event.indexed for what you propose above.

Author

ave19 commented Jun 1, 2018

Oh dang, you're right, what was I thinking! :-p

Oh, I see, it used to do that, but that behavior was deprecated.

Author

ave19 commented Jun 1, 2018

As a best-of-all-worlds scenario, and I recognize that this is getting off topic for ECS now, if I had an ingest pipeline in Elasticsearch that added the date, I would get a really good time out of that.

I'm going to close this, too, since it's on me to figure out the times, not ECS.

ave19 closed this as completed Jun 1, 2018
Author

ave19 commented Jun 3, 2018

In case anyone comes and looks at this in the future, here's what I did to solve my problem.

PUT _ingest/pipeline/elastic_common_schema
{
  "description": "Elastic Common Schema Translation.",
  "processors": [
    {
      "set": {
        "field": "event.index_time",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}

We will add more processors to this, obviously, but the idea is that Logstash (or what have you) can name this pipeline when it sends events in to Elasticsearch. Because the event has to actually arrive at Elasticsearch for this pipeline to run, the recorded time automatically accounts for timeouts, retries, connection issues and the like. The _ingest.timestamp value only exists during the event's transit of the pipeline and is not stored or indexed in Elasticsearch unless a processor copies it into a field, as the set processor does here.
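
For clients other than Logstash, naming the pipeline comes down to adding the `pipeline` query parameter to the index request. The Python sketch below only builds the request URLs and a dry-run body; it assumes an Elasticsearch version where documents are indexed at `<index>/_doc`, and the base URL and index name in the usage example are made up for illustration.

```python
from urllib.parse import quote, urlencode

PIPELINE_ID = "elastic_common_schema"  # the pipeline defined above


def index_url(base_url: str, index: str, pipeline: str = PIPELINE_ID) -> str:
    """URL for POSTing one document through the named ingest pipeline."""
    query = urlencode({"pipeline": pipeline})
    return f"{base_url.rstrip('/')}/{quote(index, safe='')}/_doc?{query}"


def simulate_url(base_url: str, pipeline: str = PIPELINE_ID) -> str:
    """URL of the simulate API, which runs the pipeline without indexing."""
    return f"{base_url.rstrip('/')}/_ingest/pipeline/{quote(pipeline, safe='')}/_simulate"


def simulate_body(source: dict) -> dict:
    """Request body for the simulate API with a single test document."""
    return {"docs": [{"_source": source}]}


# Usage (hypothetical host and index name):
print(index_url("http://localhost:9200", "logs-2018.06.03"))
print(simulate_url("http://localhost:9200"))
```

Posting `simulate_body(...)` to the simulate URL is a cheap way to check that event.index_time gets populated before pointing real traffic at the pipeline.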
