compressable:Zstd comp support #4657

Merged (7 commits) on Jan 21, 2025

Athishpranav2003
Contributor

Which issue(s) this PR fixes:
Fixes #4162

What this PR does / why we need it:
Adds support for Zstd as a new compression method for handling messages.

Docs Changes:
TODO

Release Note:
N/A

@Athishpranav2003
Contributor Author

@daipom @ashie I need some comments on this since it's a very new compression method. I tried to make sure the existing support is not broken and added this feature on top of it. I still need to do some more work, but wanted to get some comments from you first.

@Athishpranav2003 Athishpranav2003 marked this pull request as draft October 3, 2024 18:16
@Athishpranav2003 Athishpranav2003 force-pushed the zstd_comp_support branch 2 times, most recently from 6bf74ae to 0b79317 Compare October 3, 2024 19:04
@daipom daipom self-requested a review October 4, 2024 01:29
@daipom
Contributor

daipom commented Oct 4, 2024

Thanks! I will see this this weekend!

@Athishpranav2003
Contributor Author

@daipom did you get a chance to look at this?

@daipom
Contributor

daipom commented Oct 17, 2024

Sorry, I haven't made time for this. 😢
I'll make time for it this week.

@Athishpranav2003
Contributor Author

It's fine, I guess.
Even I was very busy in the last 2 weeks 👍

@daipom
Contributor

daipom commented Oct 21, 2024

@Athishpranav2003
I'm sorry for my late response. 😢

I have confirmed the entire direction!
It's great! Thanks for starting this enhancement!

Sorry I haven't made time to see the detailed implementation, such as Compressable module,
but the overall design of Fluentd would be essential for us now.
So, now, I comment on the overall direction of further modification.

It looks basically good as in_forward support!
All that is left is to support Output/Buffer/Chunk logic for output plugins. (Currently, this logic assumes gzip only.)
To do so, it would be a good idea to support out_forward first.

The core logic would be the following.

  • EventStream#to_compressed_msgpack_stream
    • def to_compressed_msgpack_stream(time_int: false, packer: nil)
        packed = to_msgpack_stream(time_int: time_int, packer: packer)
        compress(packed)
      end
  • Output#generate_format_proc
    • FORMAT_MSGPACK_STREAM = ->(e){ e.to_msgpack_stream(packer: Fluent::MessagePackFactory.thread_local_msgpack_packer) }
      FORMAT_COMPRESSED_MSGPACK_STREAM = ->(e){ e.to_compressed_msgpack_stream(packer: Fluent::MessagePackFactory.thread_local_msgpack_packer) }
      FORMAT_MSGPACK_STREAM_TIME_INT = ->(e){ e.to_msgpack_stream(time_int: true, packer: Fluent::MessagePackFactory.thread_local_msgpack_packer) }
      FORMAT_COMPRESSED_MSGPACK_STREAM_TIME_INT = ->(e){ e.to_compressed_msgpack_stream(time_int: true, packer: Fluent::MessagePackFactory.thread_local_msgpack_packer) }
      def generate_format_proc
        if @buffer && @buffer.compress == :gzip
          @time_as_integer ? FORMAT_COMPRESSED_MSGPACK_STREAM_TIME_INT : FORMAT_COMPRESSED_MSGPACK_STREAM
        else
          @time_as_integer ? FORMAT_MSGPACK_STREAM_TIME_INT : FORMAT_MSGPACK_STREAM
        end
      end
  • Chunk::Decompressable
    • module Decompressable
        include Fluent::Plugin::Compressable
        def append(data, **kwargs)
          if kwargs[:compress] == :gzip
            io = StringIO.new
            Zlib::GzipWriter.wrap(io) do |gz|
              data.each do |d|
                gz.write d
              end
            end
            concat(io.string, data.size)
          else
            super
          end
        end
        def open(**kwargs, &block)
          if kwargs[:compressed] == :gzip
            super
          else
            super(**kwargs) do |chunk_io|
              output_io = if chunk_io.is_a?(StringIO)
                            StringIO.new
                          else
                            Tempfile.new('decompressed-data')
                          end
              output_io.binmode if output_io.is_a?(Tempfile)
              decompress(input_io: chunk_io, output_io: output_io)
              output_io.seek(0, IO::SEEK_SET)
              yield output_io
            end
          end
        end
        def read(**kwargs)
          if kwargs[:compressed] == :gzip
            super
          else
            decompress(super)
          end
        end
        def write_to(io, **kwargs)
          open(compressed: :gzip) do |chunk_io|
            if kwargs[:compressed] == :gzip
              IO.copy_stream(chunk_io, io)
            else
              decompress(input_io: chunk_io, output_io: io)
            end
          end
        end
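
To make the gzip-only assumption in the snippets above concrete, here is a minimal sketch of compression helpers that dispatch on a type argument instead of hard-coding gzip. `CompressionSketch` and its method names are hypothetical and are not Fluentd's Compressable API. The zstd branch assumes the zstd-ruby gem and its `Zstd.compress` / `Zstd.decompress` calls, and raises if the gem cannot be loaded.

```ruby
require 'stringio'
require 'zlib'

# Hypothetical sketch; CompressionSketch is not Fluentd's Compressable module.
module CompressionSketch
  module_function

  # Returns true only when the optional zstd-ruby gem can be loaded.
  def zstd_available?
    require 'zstd-ruby'
    true
  rescue LoadError
    false
  end

  # Compresses a String with the requested method.
  def compress(data, type: :gzip)
    case type
    when :gzip
      io = StringIO.new(String.new)
      Zlib::GzipWriter.wrap(io) { |gz| gz.write(data) }
      io.string
    when :zstd
      raise ArgumentError, 'zstd-ruby gem is not installed' unless zstd_available?
      Zstd.compress(data)
    else
      raise ArgumentError, "unknown compression type: #{type}"
    end
  end

  # Decompresses a String produced by `compress` with the same type.
  def decompress(data, type: :gzip)
    case type
    when :gzip
      # Reads a single gzip member only, which is enough for a round trip.
      reader = Zlib::GzipReader.new(StringIO.new(data))
      begin
        reader.read
      ensure
        reader.close
      end
    when :zstd
      raise ArgumentError, 'zstd-ruby gem is not installed' unless zstd_available?
      Zstd.decompress(data)
    else
      raise ArgumentError, "unknown compression type: #{type}"
    end
  end
end

packed = CompressionSketch.compress('hello fluentd', type: :gzip)
puts CompressionSketch.decompress(packed, type: :gzip)
```

With helpers shaped like this, a check such as `@buffer.compress == :gzip` could become a check for "anything other than :text", with the type passed through to the helper. That is only one possible direction, not necessarily what this PR does.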

It is complicated, but the compression of Buffer and Chunk and the flush process of output plugins are closely related.
The following existing implementations may be helpful for understanding how output plugins behave:

  • unless @as_secondary
      if @compress == :gzip && @buffer.compress == :text
        @buffer.compress = :gzip
      elsif @compress == :text && @buffer.compress == :gzip
        log.info "buffer is compressed. If you also want to save the bandwidth of a network, Add `compress` configuration in <match>"
      end
    end
  • chunk.open(compressed: @compress) do |chunk_io|
      entries = [0xdb, chunk_io.size].pack('CN')
      sock.write entries.force_encoding(Encoding::UTF_8) # 2. entries: String (str32)
      IO.copy_stream(chunk_io, sock) # writeRawBody(packed_es)
    end
  • writer = case
             when @compress_method.nil?
               method(:write_without_compression)
             when @compress_method == :gzip
               if @buffer.compress != :gzip || @recompress
                 method(:write_gzip_with_compression)
               else
                 method(:write_gzip_from_gzipped_chunk)
               end
             else
               raise "BUG: unknown compression method #{@compress_method}"
             end
  • def write_without_compression(path, chunk)
      File.open(path, "ab", @file_perm) do |f|
        if @need_ruby_on_macos_workaround
          content = chunk.read()
          f.puts content
        else
          chunk.write_to(f)
        end
      end
    end
    def write_gzip_with_compression(path, chunk)
      File.open(path, "ab", @file_perm) do |f|
        gz = Zlib::GzipWriter.new(f)
        chunk.write_to(gz, compressed: :text)
        gz.close
      end
    end
    def write_gzip_from_gzipped_chunk(path, chunk)
      File.open(path, "ab", @file_perm) do |f|
        chunk.write_to(f, compressed: :gzip)
      end
    end

@Athishpranav2003
Contributor Author

@daipom thanks for the review.

Yeah, this will kick off a lot more changes in the overall events pipeline.
For now I have just added the compression support to the Compressable module and used it in in_forward. I will try to look at Buffer as well in the meantime (it might need more time since I got busy with a project). I guess splitting it into multiple PRs would be easier, since I can finish small pieces one by one (separate functionality) and they would be easier to review.

@Athishpranav2003
Contributor Author

@daipom can you please check the change in the Compression module as well. Once if we merge this we can proceed with the other development around this

@daipom
Contributor

daipom commented Oct 25, 2024

@Athishpranav2003

can you please check the change in the Compression module as well

Sure! I will review the Compressable module!

Once if we merge this we can proceed with the other development around this

To judge if we can merge this only with the support of the module and in_forward, we need to reach an agreement on updating Forward Protocol Specification.

It would be necessary to update CompressedPackedForward Mode.
By allowing zstd as the value for the key compressed, it would be possible to add Zstd support while keeping compatibility, but I'm not sure yet.
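
As a rough illustration of the compatibility idea, the sketch below maps the `compressed` option of a forward-protocol message to an internal symbol. The helper name `compression_from_option`, the constant, and the choice to treat a missing key as uncompressed are assumptions made for illustration. The only thing taken from the existing specification is that `gzip` is the current value for this key.

```ruby
# Hypothetical helper; not part of Fluentd or the protocol specification.
SUPPORTED_COMPRESSIONS = %w[gzip zstd].freeze

# Maps the option hash of a forward message to a compression symbol.
def compression_from_option(option)
  value = option && option['compressed']
  # Assumption: no `compressed` key means the payload is not compressed.
  return :text if value.nil?

  unless SUPPORTED_COMPRESSIONS.include?(value)
    raise ArgumentError, "unsupported compressed value: #{value.inspect}"
  end

  value.to_sym
end

p compression_from_option({ 'compressed' => 'gzip', 'size' => 2 })
p compression_from_option({ 'compressed' => 'zstd', 'size' => 2 })
```

Existing senders that only emit `gzip` are unaffected by a receiver like this, which is the sense in which adding a new value could stay compatible.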

@Athishpranav2003
Contributor Author

Sure @daipom

If it's needed then I can split the PR into 2 where this will only focus on zstd and other one for in-forward

@Athishpranav2003
Contributor Author

@daipom how should we proceed with this PR?

@daipom
Contributor

daipom commented Oct 30, 2024

Sorry, please give me a few more days 😢

Contributor

@daipom daipom left a comment

@Athishpranav2003
Sorry for my late response 😢
I have commented on some points, but basically it looks good! Thanks!

About our future policies, I think it is better not to separate the PRs, as these changes need to be considered in combination with updating Forward-Protocol.

How about supporting the out_forward by implementing Output/Buffer/Chunk logic in this PR?

My concern is whether we can support Buffer/Chunk zstd compression.
To support it, the compressed data must be able to be concatenated.
gzip allows it.
It appears that we can concat zstd files without any problems, but I wonder if this is an officially supported specification of zstd and zstd-ruby gem.

$ echo "Hello world" > 1
$ echo "Hello Fluentd" > 2
$ zstd -f 1
$ zstd -f 2
$ cat 1.zst 2.zst > 3.zst
$ zstd -d 3.zst
$ cat 3
Hello world
Hello Fluentd
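
For comparison, the concatenation property that Buffer/Chunk relies on for gzip can be checked with Ruby's standard library alone. This is a minimal sketch: the helper names `gzip` and `gunzip_all` are made up for illustration, and the loop reads one gzip member at a time and rewinds over any bytes the reader consumed past the end of that member. A matching check for zstd would need the zstd-ruby gem and is not shown here.

```ruby
require 'stringio'
require 'zlib'

# Compresses a String into a single gzip member.
def gzip(data)
  io = StringIO.new(String.new)
  Zlib::GzipWriter.wrap(io) { |gz| gz.write(data) }
  io.string
end

# Decompresses a String made of one or more concatenated gzip members.
def gunzip_all(compressed)
  io = StringIO.new(compressed)
  out = String.new
  loop do
    reader = Zlib::GzipReader.new(io)
    out << reader.read
    # Bytes read past the end of this member belong to the next one.
    unused = reader.unused
    reader.finish
    io.pos -= unused.length unless unused.nil?
    break if io.eof?
  end
  out
end

joined = gzip("Hello world\n") + gzip("Hello Fluentd\n")
puts gunzip_all(joined)
```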

@Athishpranav2003
Contributor Author

bundle exec rake test TEST=test/plugin/test_compressable.rb                      2856ms  Sun 03 Nov 2024 11:29:54 AM IST
/usr/bin/ruby -w -I"lib:test" -Eascii-8bit:ascii-8bit /usr/share/gems/gems/rake-13.2.1/lib/rake/rake_test_loader.rb "test/plugin/test_compressable.rb" 
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/rubygems_ext.rb:250: warning: method redefined; discarding old encode_with
/usr/local/share/ruby/site_ruby/rubygems/dependency.rb:341: warning: previous definition of encode_with was here
/usr/share/ruby/bundled_gems.rb:75:in `require': libruby.so.3.2: cannot open shared object file: No such file or directory - /usr/lib64/gems/ruby/yajl-ruby-1.4.3/yajl/yajl.so (LoadError)
	from /usr/share/ruby/bundled_gems.rb:75:in `block (2 levels) in replace_require'
	from /usr/share/gems/gems/yajl-ruby-1.4.3/lib/yajl.rb:1:in `<top (required)>'
	from /usr/share/ruby/bundled_gems.rb:75:in `require'
	from /usr/share/ruby/bundled_gems.rb:75:in `block (2 levels) in replace_require'
	from /home/aggressive_racer1/projects/fluentd/lib/fluent/config/literal_parser.rb:20:in `<top (required)>'
	from /usr/share/ruby/bundled_gems.rb:75:in `require'
	from /usr/share/ruby/bundled_gems.rb:75:in `block (2 levels) in replace_require'
	from /home/aggressive_racer1/projects/fluentd/lib/fluent/config/element.rb:18:in `<top (required)>'
	from /usr/share/ruby/bundled_gems.rb:75:in `require'
	from /usr/share/ruby/bundled_gems.rb:75:in `block (2 levels) in replace_require'
	from /home/aggressive_racer1/projects/fluentd/test/helper.rb:42:in `<top (required)>'
	from /home/aggressive_racer1/projects/fluentd/test/plugin/test_compressable.rb:1:in `require_relative'
	from /home/aggressive_racer1/projects/fluentd/test/plugin/test_compressable.rb:1:in `<top (required)>'
	from /usr/share/ruby/bundled_gems.rb:75:in `require'
	from /usr/share/ruby/bundled_gems.rb:75:in `block (2 levels) in replace_require'
	from /usr/share/gems/gems/rake-13.2.1/lib/rake/rake_test_loader.rb:21:in `block in <main>'
	from /usr/share/gems/gems/rake-13.2.1/lib/rake/rake_test_loader.rb:6:in `select'
	from /usr/share/gems/gems/rake-13.2.1/lib/rake/rake_test_loader.rb:6:in `<main>'
rake aborted!
Command failed with status (1): [ruby -w -I"lib:test" -Eascii-8bit:ascii-8bit /usr/share/gems/gems/rake-13.2.1/lib/rake/rake_test_loader.rb "test/plugin/test_compressable.rb" ]
/usr/share/gems/gems/rake-13.2.1/exe/rake:27:in `<top (required)>'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/cli/exec.rb:58:in `load'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/cli/exec.rb:58:in `kernel_load'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/cli/exec.rb:23:in `run'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/cli.rb:455:in `exec'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/vendor/thor/lib/thor/command.rb:28:in `run'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/vendor/thor/lib/thor.rb:527:in `dispatch'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/cli.rb:35:in `dispatch'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/vendor/thor/lib/thor/base.rb:584:in `start'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/cli.rb:29:in `start'
/usr/share/gems/gems/bundler-2.5.16/exe/bundle:28:in `block in <top (required)>'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/friendly_errors.rb:117:in `with_friendly_errors'
/usr/share/gems/gems/bundler-2.5.16/exe/bundle:20:in `<top (required)>'
/usr/bin/bundle:25:in `load'
/usr/bin/bundle:25:in `<main>'
Tasks: TOP => test => base_test
(See full trace by running task with --trace)

@daipom I addressed your comments but couldn't test it locally due to some issue.
yajl-ruby is already installed on my system, but there seems to be some other issue.

Contributor

@daipom daipom left a comment

Did this code not work?

#4657 (comment)

@Athishpranav2003
Contributor Author

I guess there is a small careless mistake.
The problem is that local testing is not working at all. Could you help me with that later?

I will check after 18th
Got packed with exams now :(

@daipom
Contributor

daipom commented Nov 5, 2024

Hmm, these changes would be necessary, but it looks like the error in your environment is caused by something else.

#4657 (comment)

/usr/share/ruby/bundled_gems.rb:75:in `require': libruby.so.3.2: cannot open shared object file: No such file or directory - /usr/lib64/gems/ruby/yajl-ruby-1.4.3/yajl/yajl.so (LoadError)

@daipom
Contributor

daipom commented Nov 5, 2024

I will check after 18th
Got packed with exams now :(

OK! I'm sorry my response was so slow, and it took so long.
Good luck with the exams!

@Athishpranav2003
Contributor Author

@daipom the gem issue got resolved after an update; it was probably some mismatched versions.

All tests are passing now

Signed-off-by: Athish Pranav D <[email protected]>
@Athishpranav2003
Contributor Author

Athishpranav2003 commented Nov 18, 2024

@daipom I have some doubts here.

In the chunk's open method for decompression, we don't have any type identification.
How can we identify the chunk's compression type when the open/read method is called?

def open(**kwargs, &block)
  if kwargs[:compressed] == :gzip
    super
  else
    super(**kwargs) do |chunk_io|
      output_io = if chunk_io.is_a?(StringIO)
                    StringIO.new
                  else
                    Tempfile.new('decompressed-data')
                  end
      output_io.binmode if output_io.is_a?(Tempfile)
      decompress(input_io: chunk_io, output_io: output_io)
      output_io.seek(0, IO::SEEK_SET)
      yield output_io
    end
  end
end
def read(**kwargs)
  if kwargs[:compressed] == :gzip
    super
  else
    decompress(super)
  end
end
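
One possible way to approach this question, offered as an assumption rather than as what the PR ended up doing, is to let the chunk remember its compression type at construction time and pick the decompression behaviour once. The sketch below uses hypothetical names (`SketchChunk`, `GzipReadable`) and only the standard library's `Zlib.gzip` / `Zlib.gunzip`; a zstd variant would be a second module chosen the same way.

```ruby
require 'zlib'

# Hypothetical names throughout; this is not Fluentd's Chunk class.
module GzipReadable
  # Returns raw gzip bytes when asked for :gzip, plain data otherwise.
  def read(compressed: :text)
    compressed == :gzip ? super() : Zlib.gunzip(super())
  end
end

class SketchChunk
  attr_reader :compress

  def initialize(payload, compress: :text)
    @payload = payload
    @compress = compress
    # Choose the decompression behaviour once, when the type is known.
    extend GzipReadable if compress == :gzip
  end

  # Base behaviour: return the stored bytes unchanged.
  def read(**_kwargs)
    @payload
  end
end

chunk = SketchChunk.new(Zlib.gzip('hello'), compress: :gzip)
puts chunk.read                            # decompressed view
puts chunk.read(compressed: :gzip).bytesize # stored gzip bytes, passed through
```

Because the type is fixed per chunk, `open` and `read` never need to guess the format from the bytes.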

@Athishpranav2003
Contributor Author

@daipom can you check the PR now?

@Athishpranav2003
Contributor Author

@daipom added UTs as well.
PR seems to be complete now
let me know if i missed anything

@daipom
Contributor

daipom commented Nov 27, 2024

Sorry I have been busy this week 😢
Thanks! I will check it out!

@Athishpranav2003
Contributor Author

hey @daipom sorry to ping again.
would be good if i can wrap this up soon

@daipom
Contributor

daipom commented Dec 20, 2024

@Athishpranav2003

hey @daipom sorry to ping again. would be good if i can wrap this up soon

Sorry for my late response 😢
I was busy releasing Fluentd v1.18 and fluent-package v5.2, sorry.
Now, those are done.
I will see this in a few days.

@Athishpranav2003
Contributor Author

Yeah,
I have been seeing your contributions for a while.
It's fine with me.

@daipom
Contributor

daipom commented Dec 26, 2024

@Athishpranav2003
Sorry for my late response.
Overall, it looks great! Thanks so much!

About out_file and out_forward, it would be better that we do not support setting a different compression format from the buffer.
There would be no particular benefit in allowing it.
Those plugins should raise ConfigError if a different compression format is set (except :text).

For example, for out_forward, we should check the format as follows.

        if @buffer.compress == :text
          @buffer.compress = @compress unless @compress == :text
        else
          if @compress == :text
            log.info "buffer is compressed.  If you also want to save the bandwidth of a network, Add `compress` configuration in <match>"
          elsif @compress != @buffer.compress
            raise Fluent::ConfigError, "You cannot specify different compression formats for Buffer (Buffer: #{@buffer.compress}, Self: #{@compress})"
          end
        end
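
The rule in the snippet above can also be written as a small pure function, which makes the allowed combinations easy to check. This is a sketch under assumptions: `resolve_buffer_compress` and `SketchConfigError` are hypothetical names, with the error class standing in for `Fluent::ConfigError` so the example runs on its own.

```ruby
# Hypothetical stand-in for Fluent::ConfigError so the sketch is self-contained.
class SketchConfigError < StandardError; end

# Returns the compression type the buffer should end up with, or raises
# when the plugin and the buffer name two different compressed formats.
def resolve_buffer_compress(buffer_compress, plugin_compress)
  if buffer_compress == :text
    # An uncompressed buffer simply adopts the plugin's setting.
    plugin_compress
  elsif plugin_compress == :text || plugin_compress == buffer_compress
    # The buffer stays as configured; the plugin adds nothing conflicting.
    buffer_compress
  else
    raise SketchConfigError,
          "You cannot specify different compression formats for Buffer " \
          "(Buffer: #{buffer_compress}, Self: #{plugin_compress})"
  end
end

p resolve_buffer_compress(:text, :zstd)
p resolve_buffer_compress(:gzip, :text)
```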

As for implementation, there appear to be no other problems.
(There are some points where it would be nice to make things simpler, but they are not essential.)

If you have any other concerns, please let me know.

We need to update the protocol before we merge this.
I will consider how to proceed with it.

To judge if we can merge this only with the support of the module and in_forward, we need to reach an agreement on updating Forward Protocol Specification.

It would be necessary to update CompressedPackedForward Mode. By allowing zstd as the value for the key compressed, it would be possible to add Zstd support while keeping compatibility, but I'm not sure yet.

@Athishpranav2003
Contributor Author

@daipom I have addressed your comments. Feel free to point out the nits as well.

@Athishpranav2003 Athishpranav2003 marked this pull request as ready for review December 26, 2024 11:16
@Athishpranav2003
Contributor Author

@daipom we need to precisely update this section ryt? https://github.com/fluent/fluentd/wiki/Forward-Protocol-Specification-v1.5#compressedpackedforward-mode

@daipom
Contributor

daipom commented Jan 6, 2025

@daipom we need to precisely update this section ryt? https://github.com/fluent/fluentd/wiki/Forward-Protocol-Specification-v1.5#compressedpackedforward-mode

Yes!!

I have made an issue for this.

* [Update Forward Protocol Specification for zstd compression support #4758](https://github.com/fluent/fluentd/issues/4758)

Let's summarize the new protocol there, for now.

@Athishpranav2003
I have made a draft. Could you please check it?

#4758 (comment)

@daipom daipom added this to the v1.19.0 milestone Jan 6, 2025
daipom
daipom previously approved these changes Jan 6, 2025
Contributor

@daipom daipom left a comment

LGTM! Thanks for this significant improvement!!
This should be merged into the next minor version v1.19.
Also, we need to wait for #4758.

Contributor

@daipom daipom left a comment

LGTM! Thanks so much!

In #4758, we now have a clear direction for the protocol changes, so I am merging this PR.

@daipom daipom merged commit 30c3ce0 into fluent:master Jan 21, 2025
10 checks passed
Successfully merging this pull request may close these issues.

Add ZSTD compression support to forward plugin
2 participants