[AWS] Remove duplicated number_of_workers settings from the custom logs integration #7319
Conversation
I added an extra `number_of_workers` advanced configuration setting by mistake while adding it to a group of CloudWatch-based integrations missing it. This change removes the extra definition. I used the later setting description because it contains more details.
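For context, the duplicated advanced setting in a data stream manifest would look roughly like this (a minimal sketch with hypothetical field values and descriptions; the actual CloudWatch-based manifests differ in detail):

```yaml
# Hypothetical excerpt from a data stream manifest.yml (illustration only)
streams:
  - input: aws-cloudwatch
    vars:
      - name: number_of_workers
        type: integer
        title: Number of Workers
        show_user: false
        description: Number of workers that will process the CloudWatch log groups.
      # ... other advanced settings ...
      - name: number_of_workers   # duplicate definition added by mistake; this PR removes one of them
        type: integer
        title: Number of Workers
        show_user: false
        description: Number of workers that will process the CloudWatch log groups in parallel.
```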
@kaiyan-sheng, I would love to hear from you about both changes, particularly the S3 one. Please take a look at this draft when you have time 🙇 |
I am trying to group the S3 options by source:

- sqs queue
- bucket
  - aws bucket arn
  - non-aws bucket name

With this approach, we can define shared options like `number_of_workers` only once. This should streamline the options and avoid duplicated definitions; see the sketch below.
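A rough sketch of the template grouping I have in mind (illustrative only; the real `aws-s3.yml.hbs` template has more options and a different structure):

```yaml
# Hypothetical Handlebars-templated YAML for the aws-s3 stream (sketch, not the actual template)
{{#if queue_url}}
queue_url: {{queue_url}}
{{else}}
# bucket source: either an AWS bucket ARN or a non-AWS bucket name
{{#if bucket_arn}}
bucket_arn: {{bucket_arn}}
{{/if}}
{{#if non_aws_bucket_name}}
non_aws_bucket_name: {{non_aws_bucket_name}}
{{/if}}
{{#if bucket_list_prefix}}
bucket_list_prefix: {{bucket_list_prefix}}
{{/if}}
{{/if}}
# shared option defined exactly once, regardless of the source
{{#if number_of_workers}}
number_of_workers: {{number_of_workers}}
{{/if}}
```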
@kaiyan-sheng, in particular, I don't understand the role of this part: `integrations/packages/aws_logs/data_stream/generic/agent/stream/aws-s3.yml.hbs`, lines 43 to 57 (at 1b8de56).
Since the input requires one of them to work properly:
But I may be missing something. |
This is another example of how elastic/package-spec#421 would prevent bugs. It's not quite shadowing, but it's a direct duplicate. |
This fixes part of #6148. |
@zmoog Seems like we added |
Why set
This input needs strictly one of the following:
And I guess the |
This config parameter for |
If all |
Yes, that's what I mean. It should not be available only when queue_url, bucket_arn, and non_aws_bucket_name are NOT set. This bucket_list_prefix specifies the prefix for objects in an S3 bucket, so we need either an SQS queue URL that points to the S3 bucket or an S3 bucket given directly. |
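In other words, a valid bucket-polling configuration pairs `bucket_list_prefix` with one of the bucket sources, roughly like this (a minimal sketch with hypothetical values, not a complete stream configuration):

```yaml
# Polling an S3 bucket directly: the prefix filters which objects are listed
bucket_arn: my-logs-bucket            # hypothetical bucket
bucket_list_prefix: access-logs/2023/ # hypothetical prefix
bucket_list_interval: 120s
number_of_workers: 5
```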
Co-authored-by: Davide Girardi <[email protected]>
I am trying to make the template more readable.
It seems the aws-s3 input only uses the |
@kaiyan-sheng, do you think this change is complete, or am I overlooking something? What are the next steps? Please help me find all the test scenarios I need to cover. Here's what I can think of right now:
|
/test |
Manual tests

Here are the input configs from an agent policy used for testing the changes in this PR. @kaiyan-sheng, are there other settings I should consider adding?

Poll from an AWS bucket

```yaml
inputs:
  - id: aws-s3-aws_logs-afe56f1c-6312-411f-a8c8-b369327c943f
    name: aws_logs-1
    revision: 5
    type: aws-s3
    use_output: default
    meta:
      package:
        name: aws_logs
        version: 0.5.1
    data_stream:
      namespace: default
    package_policy_id: afe56f1c-6312-411f-a8c8-b369327c943f
    streams:
      - id: aws-s3-aws_logs.generic-afe56f1c-6312-411f-a8c8-b369327c943f
        data_stream:
          dataset: aws_logs.generic
        access_key_id: <REDACTED>
        secret_access_key: <REDACTED>
        parsers: null
        sqs.max_receive_count: 5
        max_bytes: 10MiB
        max_number_of_messages: 5
        tags:
          - forwarded
        publisher_pipeline.disable_host: true
        file_selectors: null
        bucket_arn: mbranca-esf-logs
        bucket_list_prefix: 2023-02-14-13-41-08-79BF7A8FA7821B47_D_6
        number_of_workers: 5
        sqs.wait_time: 20s
        bucket_list_interval: 120s
```

Process object creation notifications from an SQS queue

```yaml
inputs:
  - id: aws-s3-aws_logs-afe56f1c-6312-411f-a8c8-b369327c943f
    name: aws_logs-1
    revision: 6
    type: aws-s3
    use_output: default
    meta:
      package:
        name: aws_logs
        version: 0.5.1
    data_stream:
      namespace: default
    package_policy_id: afe56f1c-6312-411f-a8c8-b369327c943f
    streams:
      - id: aws-s3-aws_logs.generic-afe56f1c-6312-411f-a8c8-b369327c943f
        data_stream:
          dataset: aws_logs.generic
        file_selectors: null
        access_key_id: <REDACTED>
        queue_url: 'https://sqs.eu-west-1.amazonaws.com/1234567890/mbranca-esf-logs'
        secret_access_key: <REDACTED>
        parsers: null
        sqs.wait_time: 20s
        sqs.max_receive_count: 5
        max_bytes: 10MiB
        max_number_of_messages: 5
        tags:
          - preserve_original_event
          - forwarded
        publisher_pipeline.disable_host: true
```
@mauiroma Input configs and test cases look good to me. Thanks for working on it! |
I have one more.

Poll from a non-AWS bucket

I created a non-AWS bucket using the S3-compatible Object Storage service from Linode (check the public note zmoog/public-notes#46 for more details). I used the following aws-s3 settings from the agent policy:

```yaml
inputs:
  - id: aws-s3-aws_logs-afe56f1c-6312-411f-a8c8-b369327c943f
    name: aws_logs-1
    revision: 9
    type: aws-s3
    use_output: default
    meta:
      package:
        name: aws_logs
        version: 0.5.1
    data_stream:
      namespace: default
    package_policy_id: afe56f1c-6312-411f-a8c8-b369327c943f
    streams:
      - id: aws-s3-aws_logs.generic-afe56f1c-6312-411f-a8c8-b369327c943f
        data_stream:
          dataset: aws_logs.generic
        access_key_id: <REDACTED>
        secret_access_key: <REDACTED>
        parsers: null
        sqs.max_receive_count: 5
        max_bytes: 10MiB
        non_aws_bucket_name: mbranca-esf-logs
        max_number_of_messages: 5
        tags:
          - preserve_original_event
          - forwarded
        publisher_pipeline.disable_host: true
        file_selectors: null
        endpoint: 'https://eu-central-1.linodeobjects.com'
        bucket_list_prefix: 2023-02-14-13-41-08-79BF7A8FA7821B47_D
        number_of_workers: 5
        sqs.wait_time: 20s
        bucket_list_interval: 120s
```

Then I uploaded a couple of access log files from my collection:

And here is the result in Elasticsearch:
What does this PR do?
Addresses two distinct problems happening to the CloudWatch and S3 integrations.
CloudWatch integration
In a previous PR, I added an extra `number_of_workers` advanced configuration setting by mistake while adding it to a group of CloudWatch-based integrations that were missing it. This PR applies the following changes:
S3 integration
If we set the "Bucket List Prefix" option, `number_of_workers` is defined twice, causing a "duplicated mapping key" error; see the sketch below. This PR re-groups the S3 settings to avoid defining the `number_of_workers` setting multiple times.
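For illustration, the rendered stream configuration ends up roughly like this (a simplified, hypothetical sketch), which YAML parsers reject as a duplicated mapping key:

```yaml
# Hypothetical rendered aws-s3 stream (illustration only)
bucket_arn: my-logs-bucket
bucket_list_prefix: access-logs/
number_of_workers: 5     # emitted by the bucket options group
bucket_list_interval: 120s
number_of_workers: 5     # emitted again when "Bucket List Prefix" is set -> "duplicated mapping key"
```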
Checklist

- I have added an entry to the `changelog.yml` file.

How to test this PR locally
TBA