Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unit test to support Logstash 8 nano timestamp #67

Merged
merged 6 commits into from
Jun 16, 2022
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
## Next release
- Fixed unit test to support Logstash 8 timestamp [#87](https://github.com/logstash-plugins/logstash-filter-fingerprint/pull/67)

## 3.4.0
- Added support for 128bit murmur variant [#66](https://github.com/logstash-plugins/logstash-filter-fingerprint/pull/66).

Expand Down
14 changes: 9 additions & 5 deletions spec/filters/fingerprint_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -273,31 +273,35 @@
end

context 'Timestamps' do
epoch_time = Time.at(0).gmtime
let(:epoch_time) { Time.at(0).gmtime }
let(:logstash_epoch_time) { LogStash::Timestamp.new(epoch_time).to_s }
let(:config) { super().merge("source" => ['@timestamp']) }

describe 'OpenSSL Fingerprinting' do
let(:config) { super().merge("key" => '0123') }
let(:key) { '0123' }
let(:config) { super().merge("key" => key) }
let(:fingerprint_method) { "SHA1" }
let(:data) { { "@timestamp" => epoch_time } }
it "fingerprints the timestamp correctly" do
expect(fingerprint).to eq('1d5379ec92d86a67cfc642d55aa050ca312d3b9a')
digest = OpenSSL::Digest::SHA1.new
Copy link
Member

@jsvd jsvd May 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the goal with having explicit values was that we could compute the digests "manually" independently from the code we use in the plugin(e.g. https://cimi.io/murmurhash3js-revisited/). It's not clear if this change causes the digests of a few tests to change, or if it's just a code reorganization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the test fails in Logstash 8 due to the breaking change of nanosecond timestamp in elastic/logstash#12797 which changes the string format of @timestamp, as a result, the hash value change in Logstash 8.

I can make a version specific hash value in test, or getting the value from OpenSSL::Digest::SHA1.new

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 because LS8's Timestamp#to_s does not include a trailing .000 for whole-second values, and does include .XXX for timestamps with fractional seconds, this breaks the fingerprinting of timestamps in 1 out of 1000 events.

Can we normalize the serialization in this library to include a minimum precision?

We could do it with refinements like:

diff --git a/lib/logstash/filters/fingerprint.rb b/lib/logstash/filters/fingerprint.rb
index e027109..2e39d46 100644
--- a/lib/logstash/filters/fingerprint.rb
+++ b/lib/logstash/filters/fingerprint.rb
@@ -24,6 +24,18 @@ require "logstash/plugin_mixins/ecs_compatibility_support"
 # To generate UUIDs, prefer the <<plugins-filters-uuid,uuid filter>>.
 class LogStash::Filters::Fingerprint < LogStash::Filters::Base
 
+  module MinimumSerializationLengthTimestamp
+    THREE = java.time.format.DateTimeFormatterBuilder.new.appendInstant(3).toFormatter
+
+    refine LogStash::Timestamp do
+      def to_s
+        return super unless nsec == 0
+
+        THREE.format(to_java.toInstant)
+      end
+    end
+  end
+
   INTEGER_MAX_32BIT = (1 << 31) - 1
   INTEGER_MIN_32BIT = -(1 << 31)
 
@@ -119,6 +131,9 @@ class LogStash::Filters::Fingerprint < LogStash::Filters::Base
     end
   end
 
+
+  using MinimumSerializationLengthTimestamp if LOGSTASH_VERSION.split('.').first.to_i >= 8
+
   def filter(event)
     case @method
     when :UUID

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave a test record here

Logstash 7.17.0

irb(main):001:0> LogStash::Timestamp.new('2000-01-01T05:00:00.12Z').to_s
=> "2000-01-01T05:00:00.120Z"
irb(main):002:0> LogStash::Timestamp.new('2000-01-01T05:00:00.000Z').to_s
=> "2000-01-01T05:00:00.000Z"
irb(main):003:0> LogStash::Timestamp.new('2000-01-01T05:00:00.123456Z').to_s
=> "2000-01-01T05:00:00.123Z"

Logstash 8.1.1

irb(main):001:0> LogStash::Timestamp.new('2000-01-01T05:00:00.12Z').to_s
=> "2000-01-01T05:00:00.120Z"
irb(main):002:0> LogStash::Timestamp.new('2000-01-01T05:00:00.000Z').to_s
=> "2000-01-01T05:00:00Z"
irb(main):003:0> LogStash::Timestamp.new('2000-01-01T05:00:00.123456Z').to_s
=> "2000-01-01T05:00:00.123456Z"

7.17 always gives three fractional digit while 8 formats fractional digits in groups of three.
The patch is good enough to make zero to three fractional digits backward compatible.

@yaauie I think your patch is the best fit for user. I am wondering is there any reason not to put using MinimumSerializationLengthTimestamp right after MinimumSerializationLengthTimestamp module?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refinement scoping can get... weird. They're active until the lexical scope in which the using is sent is closed, and affect methods that are defined while they are active, so my first thought was to minimize the scope in which it can be active by sending the using after defining other methods.

Looking at the implementation though, I think you are right that they should be grouped together -- there is little risk of accidental timestamp serialization during the lexical scope or using the methods defined in this lexical scope.

The following diff makes the build green.

diff --git a/lib/logstash/filters/fingerprint.rb b/lib/logstash/filters/fingerprint.rb
index e027109..cf0f20c 100644
--- a/lib/logstash/filters/fingerprint.rb
+++ b/lib/logstash/filters/fingerprint.rb
@@ -24,6 +24,27 @@ require "logstash/plugin_mixins/ecs_compatibility_support"
 # To generate UUIDs, prefer the <<plugins-filters-uuid,uuid filter>>.
 class LogStash::Filters::Fingerprint < LogStash::Filters::Base
 
+  ##
+  # Logstash 8+ has variable-length serialization of timestamps
+  # that do not include subsecond info for whole-second timestamps.
+  # For backward-compatibility we refine the implementation to use
+  # our own three-decimal-place formatter for whole-second
+  # timestamps.
+  if LOGSTASH_VERSION.split('.').first.to_i >= 8
+    module MinimumSerializationLengthTimestamp
+      THREE_DECIMAL_INSTANT_FORMATTER = java.time.format.DateTimeFormatterBuilder.new.appendInstant(3).toFormatter
+
+      refine LogStash::Timestamp do
+        def to_s
+          return super unless nsec == 0
+
+          THREE_DECIMAL_INSTANT_FORMATTER.format(to_java.toInstant)
+        end
+      end
+    end
+    using MinimumSerializationLengthTimestamp
+  end
+
   INTEGER_MAX_32BIT = (1 << 31) - 1
   INTEGER_MIN_32BIT = -(1 << 31)
 

I would like to see:

  • some additional specs with fractional seconds, passing both before and after this change, AND
  • some nano-precise specs that only run on Logstash 8+ (since the extra precision is truncated while parsing in Timestamp#new on 7.x).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7.17.0

irb(main):007:0> LogStash::Timestamp.new(Time.at(946702800.12)).to_s
=> "2000-01-01T05:00:00.120Z"

8.1.1

irb(main):038:0> LogStash::Timestamp.new(Time.at(946702800.12)).to_s
=> "2000-01-01T05:00:00.120000004Z"

LogStash::Timestamp of 2000-01-01T05:00:00.12Z is consistent with the patch. However, LogStash::Timestamp( Time -> RubyTime -> Instant ), RubyTime convert .12 to millisecond and 4 nanasecond.
I think Logstash core is a better place to handle this case. I believe most of the use cases are converting from datetime string, except customized script in ruby filter, so the patch is still valuable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yaauie I have created an issue to track the timestamp elastic/logstash#14088
Do you think we can merge the patch now?

hash = OpenSSL::HMAC.hexdigest(digest, key, logstash_epoch_time).force_encoding(Encoding::UTF_8)
expect(fingerprint).to eq hash
end
end

describe 'MURMUR3 Fingerprinting' do
let(:fingerprint_method) { "MURMUR3" }
let(:data) { { "@timestamp" => epoch_time } }
it "fingerprints the timestamp correctly" do
expect(fingerprint).to eq(743372282)
expect(fingerprint).to eq MurmurHash3::V32.str_hash(logstash_epoch_time)
end
end

describe 'MURMUR3_128 Fingerprinting' do
let(:fingerprint_method) { "MURMUR3_128" }
let(:data) { { "@timestamp" => epoch_time } }
it "fingerprints the timestamp correctly" do
expect(fingerprint).to eq("37785b62a8cae473acc315d39b66d86e")
expect(fingerprint).to eq MurmurHash3::V128.str_hexdigest(logstash_epoch_time, 2)
end
end
end
Expand Down