S3: can't use multipart due to large compression factor #674
Comments
Please see the new docs I have added here: fluent/fluent-bit-docs#1127. Since your compression factor is very large, the suggestion to increase … A short-term workaround for you might be to switch to … Let me know if you have any more questions/confusions/feedback and we can discuss it.
Suppose we set use_put_object On and a 256 MB total_file_size; can we expect that buffering will be done on disk (store_dir)? Is the store_dir_limit_size config unlimited in the latest stable version? Checking https://docs.fluentbit.io/manual/v/1.9-pre/pipeline/outputs/s3, store_dir_limit_size does not exist there yet. Thanks @PettitWesley.
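For concreteness, a minimal sketch of the kind of output stanza this question is about; bucket name, region, paths, and sizes are illustrative assumptions, not an actual configuration:

```
[OUTPUT]
    Name                 s3
    Match                *
    # bucket, region, and sizes below are illustrative assumptions
    bucket               my-log-bucket
    region               us-east-1
    use_put_object       On
    compression          gzip
    total_file_size      256M
    upload_timeout       10m
    # buffering happens under store_dir; store_dir_limit_size caps its size
    # (present in FLB 1.9.10 / AWS for Fluent Bit 2.31.x stable)
    store_dir            /var/fluent-bit/s3
    store_dir_limit_size 5G
```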
Also, when configuring the s3 output plugin with public.ecr.aws/aws-observability/aws-for-fluent-bit:stable, should we use the 2.1 documentation (https://docs.fluentbit.io/manual/pipeline/outputs/s3) or the 1.9 documentation (https://docs.fluentbit.io/manual/v/1.9-pre/) as a reference?
@danielejuan-metr Unfortunately, the AWS release has drifted from upstream; we're mostly 1.9-based with some more recent AWS features added. Our release notes explain the net changes in each release:
Our stable is now 2.31.11, which is based on FLB 1.9.10 plus our custom patches; 1.9.10 already has store_dir_limit_size: https://github.com/fluent/fluent-bit/tree/v1.9.10 I'm sorry for this drift... I know it's not super convenient right now. Hopefully in the future we will get back to just re-releasing upstream versions.
Yes, it will buffer an entire 256 MB (if your upload_timeout gives it enough time to), then compress it, and then send the compressed file all at once.
AWS release 2.31.11/stable contains all of the AWS features you see in the 2.1 docs for S3, so use the 2.1 documentation.
@PettitWesley, hoping to reuse this thread for our next question regarding the s3 plugin. We see that the S3 output plugin supports workers, with a limit of 1. Suppose we enable this: does the worker only receive chunks, write them to the store_dir, compress, and send to S3? If that sequence is accurate, and compression and sending to S3 are slow, will the Fluent Bit engine's backpressure grow, with chunks buffered under the [SERVICE] storage.path when filesystem buffering is enabled?
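As a point of reference, a hedged sketch of enabling the single worker on the S3 output; values other than workers are assumptions:

```
[OUTPUT]
    Name            s3
    Match           *
    # bucket and region are assumptions
    bucket          my-log-bucket
    region          us-east-1
    compression     gzip
    total_file_size 256M
    # the S3 output supports at most one worker
    workers         1
```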
This is largely correct, except that the send can happen in the timer thread, which runs at an interval to see if any files are ready to send. Currently, in the AWS distro, S3 uses async HTTP, which means it won't block any thread while it's waiting to send: #702. Please read this: https://docs.fluentbit.io/manual/pipeline/outputs/s3#differences-between-s3-and-other-fluent-bit-outputs I'm working on a major refactor of the S3 output which should make it more reliable and make all of its operations run inside the worker thread. It will still support only 1 worker, but longer term it will enable me to add support for multiple workers.
Potentially. In practice I've never seen this. As noted above, the send happens async. The compression step does take non-trivial CPU and is synchronous... so that may slow things down. Please let me know if this makes sense and if you have any more questions. EDIT: We just checked the code and the timer thread should be running in the worker thread as well.
For additional context on the question above, we are trying to identify the bottleneck in our test setup. In our volume testing, with around 27 MB/s of tail ingestion, we see the filesystem buffer (storage.path) growing to gigabytes of data. We only have s3 as our output, with workers set to 1 and compression enabled. Given your response above, it seems compression is our bottleneck, since sending to S3 is async.
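For reference, the filesystem-buffering setup described here would look roughly like the following; paths, tags, and limits are assumptions, not the actual test configuration:

```
[SERVICE]
    # chunks that back up are spilled to this path
    storage.path              /var/fluent-bit/state
    storage.sync              normal
    storage.backlog.mem_limit 50M

[INPUT]
    Name          tail
    # path and tag are assumptions
    Path          /var/log/containers/*.log
    Tag           app.*
    storage.type  filesystem
```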
Also for confirmation: we use the app name in our S3 log filename prefix. Does this mean that each file has its own timeout timer and file-size computation? Thanks for the response @PettitWesley!

Update: We tested without compression and throughput was the same. Checking the releases in the repo, there were benchmark results up to 30 MB/s. Was there any concern/issue with throughputs higher than 30 MB/s?
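For context, a per-app prefix is typically expressed through s3_key_format with tag variables; a minimal sketch, where the tag structure and bucket are assumptions:

```
[OUTPUT]
    Name          s3
    Match         app.*
    # bucket and region are assumptions
    bucket        my-log-bucket
    region        us-east-1
    compression   gzip
    # if tags look like app.<app-name>.<suffix>, $TAG[1] is the app name
    s3_key_format /logs/$TAG[1]/%Y/%m/%d/%H%M%S-$UUID.gz
    s3_key_format_tag_delimiters .
```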
Describe the question/issue
We are testing the S3 plugin for Fluent Bit in AWS EKS. Is it possible to enable both compression and multipart upload in the latest stable release?
With the output configuration below, Fluent Bit compresses each chunk and a PutObject upload is used (as logged below). We were expecting the chunks to be compressed but multipart upload to be used until the total file size is reached. Is this a misunderstanding on our part?
Due to the behavior above, the s3 bucket contains a lot of small gz files.
Configuration
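A representative S3 output stanza for the setup described above might look like the following sketch; all values are assumptions, not the actual configuration:

```
[OUTPUT]
    Name              s3
    Match             *
    # all values here are assumptions, not the actual configuration
    bucket            my-log-bucket
    region            us-east-1
    compression       gzip
    total_file_size   256M
    upload_chunk_size 6M
    upload_timeout    10m
    store_dir         /var/fluent-bit/s3
```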
Fluent Bit Log Output
Fluent Bit Version Info
public.ecr.aws/aws-observability/aws-for-fluent-bit:stable
Cluster Details