-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix unit test to support Logstash 8 nano timestamp #67
Fix unit test to support Logstash 8 nano timestamp #67
Conversation
spec/filters/fingerprint_spec.rb
Outdated
let(:fingerprint_method) { "SHA1" } | ||
let(:data) { { "@timestamp" => epoch_time } } | ||
it "fingerprints the timestamp correctly" do | ||
expect(fingerprint).to eq('1d5379ec92d86a67cfc642d55aa050ca312d3b9a') | ||
digest = OpenSSL::Digest::SHA1.new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the goal with having explicit values was that we could compute the digests "manually" independently from the code we use in the plugin(e.g. https://cimi.io/murmurhash3js-revisited/). It's not clear if this change causes the digests of a few tests to change, or if it's just a code reorganization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, the test fails in Logstash 8 due to the breaking change of nanosecond timestamp in elastic/logstash#12797 which changes the string format of @timestamp
, as a result, the hash value change in Logstash 8.
I can make a version specific hash value in test, or getting the value from OpenSSL::Digest::SHA1.new
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 because LS8's Timestamp#to_s
does not include a trailing .000
for whole-second values, and does include .XXX
for timestamps with fractional seconds, this breaks the fingerprinting of timestamps in 1 out of 1000 events.
Can we normalize the serialization in this library to include a minimum precision?
We could do it with refinements like:
diff --git a/lib/logstash/filters/fingerprint.rb b/lib/logstash/filters/fingerprint.rb
index e027109..2e39d46 100644
--- a/lib/logstash/filters/fingerprint.rb
+++ b/lib/logstash/filters/fingerprint.rb
@@ -24,6 +24,18 @@ require "logstash/plugin_mixins/ecs_compatibility_support"
# To generate UUIDs, prefer the <<plugins-filters-uuid,uuid filter>>.
class LogStash::Filters::Fingerprint < LogStash::Filters::Base
+ module MinimumSerializationLengthTimestamp
+ THREE = java.time.format.DateTimeFormatterBuilder.new.appendInstant(3).toFormatter
+
+ refine LogStash::Timestamp do
+ def to_s
+ return super unless nsec == 0
+
+ THREE.format(to_java.toInstant)
+ end
+ end
+ end
+
INTEGER_MAX_32BIT = (1 << 31) - 1
INTEGER_MIN_32BIT = -(1 << 31)
@@ -119,6 +131,9 @@ class LogStash::Filters::Fingerprint < LogStash::Filters::Base
end
end
+
+ using MinimumSerializationLengthTimestamp if LOGSTASH_VERSION.split('.').first.to_i >= 8
+
def filter(event)
case @method
when :UUID
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leave a test record here
Logstash 7.17.0
irb(main):001:0> LogStash::Timestamp.new('2000-01-01T05:00:00.12Z').to_s
=> "2000-01-01T05:00:00.120Z"
irb(main):002:0> LogStash::Timestamp.new('2000-01-01T05:00:00.000Z').to_s
=> "2000-01-01T05:00:00.000Z"
irb(main):003:0> LogStash::Timestamp.new('2000-01-01T05:00:00.123456Z').to_s
=> "2000-01-01T05:00:00.123Z"
Logstash 8.1.1
irb(main):001:0> LogStash::Timestamp.new('2000-01-01T05:00:00.12Z').to_s
=> "2000-01-01T05:00:00.120Z"
irb(main):002:0> LogStash::Timestamp.new('2000-01-01T05:00:00.000Z').to_s
=> "2000-01-01T05:00:00Z"
irb(main):003:0> LogStash::Timestamp.new('2000-01-01T05:00:00.123456Z').to_s
=> "2000-01-01T05:00:00.123456Z"
7.17 always gives three fractional digit while 8 formats fractional digits in groups of three.
The patch is good enough to make zero to three fractional digits backward compatible.
@yaauie I think your patch is the best fit for user. I am wondering is there any reason not to put using MinimumSerializationLengthTimestamp
right after MinimumSerializationLengthTimestamp
module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refinement scoping can get... weird. They're active until the lexical scope in which the using
is sent is closed, and affect methods that are defined while they are active, so my first thought was to minimize the scope in which it can be active by sending the using
after defining other methods.
Looking at the implementation though, I think you are right that they should be grouped together -- there is little risk of accidental timestamp serialization during the lexical scope or using the methods defined in this lexical scope.
The following diff makes the build green.
diff --git a/lib/logstash/filters/fingerprint.rb b/lib/logstash/filters/fingerprint.rb
index e027109..cf0f20c 100644
--- a/lib/logstash/filters/fingerprint.rb
+++ b/lib/logstash/filters/fingerprint.rb
@@ -24,6 +24,27 @@ require "logstash/plugin_mixins/ecs_compatibility_support"
# To generate UUIDs, prefer the <<plugins-filters-uuid,uuid filter>>.
class LogStash::Filters::Fingerprint < LogStash::Filters::Base
+ ##
+ # Logstash 8+ has variable-length serialization of timestamps
+ # that do not include subsecond info for whole-second timestamps.
+ # For backward-compatibility we refine the implementation to use
+ # our own three-decimal-place formatter for whole-second
+ # timestamps.
+ if LOGSTASH_VERSION.split('.').first.to_i >= 8
+ module MinimumSerializationLengthTimestamp
+ THREE_DECIMAL_INSTANT_FORMATTER = java.time.format.DateTimeFormatterBuilder.new.appendInstant(3).toFormatter
+
+ refine LogStash::Timestamp do
+ def to_s
+ return super unless nsec == 0
+
+ THREE_DECIMAL_INSTANT_FORMATTER.format(to_java.toInstant)
+ end
+ end
+ end
+ using MinimumSerializationLengthTimestamp
+ end
+
INTEGER_MAX_32BIT = (1 << 31) - 1
INTEGER_MIN_32BIT = -(1 << 31)
I would like to see:
- some additional specs with fractional seconds, passing both before and after this change, AND
- some nano-precise specs that only run on Logstash 8+ (since the extra precision is truncated while parsing in
Timestamp#new
on 7.x).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7.17.0
irb(main):007:0> LogStash::Timestamp.new(Time.at(946702800.12)).to_s
=> "2000-01-01T05:00:00.120Z"
8.1.1
irb(main):038:0> LogStash::Timestamp.new(Time.at(946702800.12)).to_s
=> "2000-01-01T05:00:00.120000004Z"
LogStash::Timestamp of 2000-01-01T05:00:00.12Z
is consistent with the patch. However, LogStash::Timestamp( Time -> RubyTime -> Instant ), RubyTime convert .12
to millisecond and 4 nanasecond.
I think Logstash core is a better place to handle this case. I believe most of the use cases are converting from datetime string, except customized script in ruby filter, so the patch is still valuable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yaauie I have created an issue to track the timestamp elastic/logstash#14088
Do you think we can merge the patch now?
Co-authored-by: Ry Biesemeyer <[email protected]>
fix red ci
Closes: #68