New option timestamp.precision to control timestamp precision #31682

Merged
merged 13 commits into from
May 23, 2022

Conversation

kvch
Contributor

@kvch kvch commented May 19, 2022

What does this PR do?

This PR makes timestamp precision configurable. It adds a new config option called timestamp.precision. The available values are millisecond, microsecond, and nanosecond; the default is millisecond.
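
As a rough illustration of what the three values mean for a rendered timestamp, here is a minimal sketch using only the Go standard library. The layout strings, the `layouts` map, and `formatTimestamp` are assumptions made for this example; they are not copied from the Beats source.

```go
package main

import (
	"fmt"
	"time"
)

// layouts maps each timestamp.precision value named in the PR description to
// a Go time layout. The layout strings are assumptions for illustration only.
var layouts = map[string]string{
	"millisecond": "2006-01-02T15:04:05.000Z",
	"microsecond": "2006-01-02T15:04:05.000000Z",
	"nanosecond":  "2006-01-02T15:04:05.000000000Z",
}

// sample is an arbitrary instant with non-zero nanoseconds.
var sample = time.Date(2022, time.May, 23, 8, 45, 39, 771123456, time.UTC)

// formatTimestamp renders t in UTC at the requested precision.
// Digits beyond the requested precision are dropped, not rounded.
func formatTimestamp(t time.Time, precision string) (string, error) {
	layout, ok := layouts[precision]
	if !ok {
		return "", fmt.Errorf("unknown timestamp precision %q", precision)
	}
	return t.UTC().Format(layout), nil
}

func main() {
	for _, p := range []string{"millisecond", "microsecond", "nanosecond"} {
		s, err := formatTimestamp(sample, p)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-11s %s\n", p, s)
	}
}
```

Each step up in precision adds three fractional digits to the string, which is the extra storage cost mentioned in the motivation.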

Why is it important?

We made nanosecond support default in #31553. But such timestamps require more storage. Thus, we are making it opt-in.

Checklist

  • My code follows the style guidelines of this project
    - [ ] I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label May 19, 2022
@mergify mergify bot assigned kvch May 19, 2022
@elasticmachine
Collaborator

elasticmachine commented May 19, 2022

💚 Build Succeeded


Build stats

  • Start Time: 2022-05-23T08:45:39.771+0000

  • Duration: 102 min 49 sec

Test stats 🧪

Test Results
Failed 0
Passed 22293
Skipped 1933
Total 24226

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@kvch kvch force-pushed the nanosecond-opt-in-ii branch from 4d55be5 to 82d66a6 Compare May 19, 2022 17:01
@kvch kvch changed the title configurable timestamp precision New option timestamp.precision to control timestamp precision May 19, 2022
@kvch kvch marked this pull request as ready for review May 19, 2022 17:02
@kvch kvch requested review from a team as code owners May 19, 2022 17:02
@kvch kvch requested review from rdner and fearful-symmetry and removed request for a team May 19, 2022 17:02
@kvch kvch added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label May 19, 2022
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label May 19, 2022
@@ -287,6 +287,10 @@ heartbeat.jobs:
# sub-dictionary. Default is false.
#fields_under_root: false

# Configure the precision of all timestamps in Heartbeat.
Contributor

@andrewvc andrewvc May 19, 2022


Am I right in understanding that this basically just changes how time.Time fields are serialized when used with common.Time? If so, this should work.

However, we should add a note that time fields will keep their us (microsecond) paths, such as monitor.duration.us.

Contributor

Never mind, I think we should be fine here; all the us stuff is durations, not timestamps.

@andrewkroh andrewkroh self-requested a review May 19, 2022 22:37
Member

@andrewkroh andrewkroh left a comment


I assume (but am not sure) this is meant to control the precision of timestamps used in events sent by outputs. We need to add a test case for this. I don't think changing the implementation of common.Time#MarshalJSON is sufficient.

The outputs that allow configurable formats have a concept of a codec that is involved in controlling the output format. Perhaps the configuration for this belongs in the codecs that are responsible for marshaling the data.

For example the outputs/codec/json already supports some configuration in the timestamp format by allowing the local time zone to be used.

// create new encoder with custom time.Time encoding
e.folder, err = gotype.NewIterator(visitor,
	gotype.Folders(
		codec.MakeUTCOrLocalTimestampEncoder(e.config.LocalTime),
		codec.MakeBCTimestampEncoder(),
	),
)

And the format of the timestamps used by the Elasticsearch output are controlled here:

b.folder, err = gotype.NewIterator(visitor,
	gotype.Folders(
		codec.MakeTimestampEncoder(),
		codec.MakeBCTimestampEncoder()))

return nil
}

func SetTimestampPrecision(c *conf.C) error {
Member

This seems important, and I think it needs some documentation. It should state that it modifies the behavior of common.Time's MarshalJSON() and String() methods. And I think it should call out that it's not safe to call this after any other threads are using common.Time, because it modifies the global formatter that they all share.
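
To make the concern concrete, here is a minimal sketch of the pattern being described: a package-level layout that every value's String() method consults. The `Time` type, `layout` variable, and `SetLayout` function are hypothetical stand-ins written for this example, not the code in the PR.

```go
package main

import (
	"fmt"
	"time"
)

// Time is a hypothetical stand-in for common.Time.
type Time time.Time

// layout is shared by every Time value in the process.
// The millisecond default here is an assumption for the example.
var layout = "2006-01-02T15:04:05.000Z"

// SetLayout swaps the shared layout. It is deliberately unsynchronized, so it
// is only safe to call during start-up, before other goroutines format
// timestamps; calling it later would be a data race.
func SetLayout(l string) { layout = l }

// String formats the timestamp in UTC using the shared layout.
func (t Time) String() string { return time.Time(t).UTC().Format(layout) }

// ts is an arbitrary instant with non-zero nanoseconds.
var ts = Time(time.Date(2022, time.May, 20, 13, 59, 0, 123456789, time.UTC))

func main() {
	fmt.Println(ts) // formatted with whatever layout is currently set
	SetLayout("2006-01-02T15:04:05.000000000Z")
	fmt.Println(ts) // same value, different output after the global change
}
```

The second call prints a different string for the same value, which is why a doc comment on such a setter would need to spell out both the affected methods and the start-up-only restriction.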

@andrewvc
Contributor

Thinking about this more, I'm wondering if we can opt out of this for heartbeat. This isn't a feature any of our users want / need, and it only creates a support burden. If we can decrease heartbeat's surface area where possible that's a win.

@kvch
Contributor Author

kvch commented May 20, 2022

@andrewkroh @andrewvc For the record the timestamps Beats produce are already in nanosecond precision as implemented in #31553

This PR is just making it configurable for users.

Thinking about this more, I'm wondering if we can opt out of this for heartbeat. This isn't a feature any of our users want / need, and it only creates a support burden. If we can decrease heartbeat's surface area where possible that's a win.

What do you mean by opt out? Do you want to opt out of configuring timestamp precision and stay with the new nanosecond setting? Or do you want to stay on the previous millisecond precision? If you want to opt out of nanosecond precision timestamps, this is the PR that implements it for you. :)

@kvch
Contributor Author

kvch commented May 20, 2022

I assume (but am not sure) this is meant to control the precision of timestamps used in events sent by outputs. We need to add a test case for this.

@andrewkroh Initially, that was my intention. However, changing only the precision of the @timestamp field of events is just a partial solution. If an input or dataset supports different precisions, we should also make that available to users and integration developers. It seems a bit weird that, for example, someone changes the timestamps in an ingest pipeline and loses the higher-precision value.

My reasoning is that all timestamps were globally millisecond precision in Beats. The precisions of timestamps should be configured together where possible. It looks inconsistent if one timestamp is nanosecond, another one is in milliseconds, etc.

I don't think changing the implementation of common.Time#MarshalJSON is sufficient.

It's not only changing MarshalJSON; the PR also affects the String function. Event transformers and serializers use the String function of common.Time to get the string representation of the field. (TBH I haven't seen anyone using MarshalJSON to marshal timestamp fields.)

I indeed missed MakeUTCOrLocalTimestampEncoder, I will pass down the configuration there as well. Thanks for finding it.

@kvch
Contributor Author

kvch commented May 20, 2022

Note to self: check if datetime has to be updated in libs as well. Yes...

@kvch kvch requested a review from andrewkroh May 20, 2022 13:59
Member

@andrewkroh andrewkroh left a comment


I would rather see the output format controlled without the use of a global state in a common package. The global nature makes test isolation difficult because one test can modify a global and affect other tests. This is less of a concern, but it also creates a concurrency issue since there is no synchronization on the global. Rather than using a global, the desired format could be passed into each output codec instance.

I have a similar feeling about the global nature of timestamp.precision. If this were scoped to the output config this might give a bit more flexibility in the behavior if it were ever needed (like allowing independent precision for the monitoring output vs the event output).
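
A minimal sketch of the alternative suggested here, where each codec instance is handed its own timestamp format instead of reading a global. The `Encoder` type, `NewEncoder`, and the layout strings are hypothetical names chosen for this example; they are not the existing outputs/codec API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Encoder is a hypothetical output codec that owns its timestamp layout, so
// two outputs (or two tests) can use different precisions without sharing state.
type Encoder struct {
	layout string
}

// NewEncoder returns an encoder that formats timestamps with the given layout.
func NewEncoder(layout string) *Encoder {
	return &Encoder{layout: layout}
}

// Encode serializes a minimal event as JSON with a formatted @timestamp.
func (e *Encoder) Encode(ts time.Time, msg string) ([]byte, error) {
	return json.Marshal(map[string]string{
		"@timestamp": ts.UTC().Format(e.layout),
		"message":    msg,
	})
}

// sample is an arbitrary instant with non-zero nanoseconds.
var sample = time.Date(2022, time.May, 23, 8, 45, 39, 771123456, time.UTC)

func main() {
	events := NewEncoder("2006-01-02T15:04:05.000Z")           // e.g. the event output
	monitoring := NewEncoder("2006-01-02T15:04:05.000000000Z") // e.g. the monitoring output

	for _, enc := range []*Encoder{events, monitoring} {
		b, err := enc.Encode(sample, "hello")
		if err != nil {
			panic(err)
		}
		fmt.Println(string(b))
	}
}
```

Because the layout lives on the instance, there is nothing to synchronize and nothing for one test to leak into another.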

@@ -87,7 +171,7 @@ func ParseTime(timespec string) (Time, error) {
}

 func (t Time) String() string {
-	str, _ := defaultTimeFormatter.Format(time.Time(t).UTC())
+	str, _ := timeFormatter.Format(time.Time(t).UTC())
Member

It's not only changing MarshalJSON; the PR also affects the String function. Event transformers and serializers use the String function of common.Time to get the string representation of the field.

Apart from some test code that uses the stdlib json encoder, I can't think of anything that should be dependent upon common.Time's format in the output path (including processors). Could common.Time retain its static behavior of generating ISO8601 timestamps with nanosecond precision without affecting the output format?

@andrewkroh andrewkroh dismissed their stale review May 23, 2022 12:35

I'm not blocking the change.

@kvch
Contributor Author

kvch commented May 23, 2022

I would rather see the output format controlled without the use of a global state in a common package. The global nature makes test isolation difficult because one test can modify a global and affect other tests. This is less of a concern, but it also creates a concurrency issue since there is no synchronization on the global. Rather than using a global, the desired format could be passed into each output codec instance.

I agree. My initial implementation included a new common.Time struct that included a precision, so each timestamp can have a separate precision. However, after refactoring the code for a long time, I reconsidered my plans because it required tremendous change. But your idea sounds simpler. I will merge this PR as is for now. I have to adopt the changes in elastic-agent-libs as well. There I will use your approach and then adopt it in beats. Thank you!

I have a similar feeling about the global nature of timestamp.precision. If this were scoped to the output config this might give a bit more flexibility in the behavior if it were ever needed (like allowing independent precision for the monitoring output vs the event output).

On the input side users can rely on the timestamp processor to modify the precision of the timestamp. With that in mind, I think we can get away with setting the granularity globally to nanoseconds, and then decrease it on the input level if needed.
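
The idea of keeping nanoseconds globally and lowering precision for a particular timestamp can be sketched with the standard library's time.Time.Truncate. This only illustrates the arithmetic; the `units` map and `truncateTo` function are hypothetical and do not show how the timestamp processor is implemented.

```go
package main

import (
	"fmt"
	"time"
)

// units maps a precision name to the duration a timestamp is truncated to.
var units = map[string]time.Duration{
	"millisecond": time.Millisecond,
	"microsecond": time.Microsecond,
	"nanosecond":  time.Nanosecond,
}

// sample is an arbitrary instant with non-zero nanoseconds.
var sample = time.Date(2022, time.May, 23, 8, 45, 39, 771123456, time.UTC)

// truncateTo drops every digit finer than the requested precision.
func truncateTo(t time.Time, precision string) (time.Time, error) {
	unit, ok := units[precision]
	if !ok {
		return time.Time{}, fmt.Errorf("unknown precision %q", precision)
	}
	return t.Truncate(unit), nil
}

func main() {
	for _, p := range []string{"nanosecond", "microsecond", "millisecond"} {
		t, err := truncateTo(sample, p)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-11s %s\n", p, t.Format(time.RFC3339Nano))
	}
}
```

Lowering precision this way is lossy in one direction only: a value captured at nanosecond precision can be truncated later, but a millisecond value cannot be made more precise.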

@kvch kvch merged commit b63e5be into elastic:main May 23, 2022
Labels
Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

5 participants