Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AWS] Remove duplicated number_of_workers settings from the custom logs integration #7319

Merged

Conversation

zmoog
Copy link
Contributor

@zmoog zmoog commented Aug 9, 2023

What does this PR do?

Addresses two distinct problems happening to the CloudWatch and S3 integrations.

CloudWatch integration

In a previous PR, I added an extra number_of_workers advanced configuration setting by mistake while adding it to a group of CloudWatch-based integrations missing it.

This change applies to following changes:

  • Removes the extra definition.
  • Reuse the later setting description because it contains more details.

S3 integration

If we set the "Bucket List Prefix" option, the number_of_workers is defined twice creating the "duplicated mapping key" error.

This PR re-groups the S3 settings to avoid defining the number_of_workers setting multiple times.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

How to test this PR locally

TBA

I added an extra `number_of_workers` advanced configuration setting by
mistake while adding it to a group of CloudWatch-based integrations
missing it.

This change removes the extra definition.

I used the later setting description because it contains more details.
@zmoog zmoog added bug Something isn't working, use only for issues Integration:aws AWS labels Aug 9, 2023
@zmoog zmoog self-assigned this Aug 9, 2023
@elasticmachine
Copy link

elasticmachine commented Aug 9, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-08-21T17:54:55.174+0000

  • Duration: 22 min 52 sec

Test stats 🧪

Test Results
Failed 0
Passed 3
Skipped 0
Total 3

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@elasticmachine
Copy link

elasticmachine commented Aug 9, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 100.0% (0/0) 💚
Files 100.0% (0/0) 💚 5.556
Classes 100.0% (0/0) 💚 5.556
Methods 100.0% (3/3) 💚 14.047
Lines 100.0% (0/0) 💚 13.976
Conditionals 100.0% (0/0) 💚

@zmoog zmoog force-pushed the zmoog/aws-logs-remove-extra-number-of-workers branch from 0bf8f8c to 40a502f Compare August 9, 2023 10:21
@zmoog zmoog changed the title [AWS] Remove the accidental extra number_of_workers settings from the custom logs integration [AWS] Remove duplicated number_of_workers settings from the custom logs integration Aug 9, 2023
@zmoog
Copy link
Contributor Author

zmoog commented Aug 9, 2023

@kaiyan-sheng, I would love to hear from you about both changes, particularly the S3 one. Please take a look at this draft when you have time 🙇

@zmoog zmoog force-pushed the zmoog/aws-logs-remove-extra-number-of-workers branch from 40a502f to 0e206ea Compare August 9, 2023 11:26
I am trying to group the S3 options by source:

- sqs queue
- bucket
  - aws bucket arn
  - non-aws bucket name

With this approach, we can define shared options like
`number_of_workers` only once. This should streamline the options and
avoid duplicated definitions.
@zmoog zmoog force-pushed the zmoog/aws-logs-remove-extra-number-of-workers branch from 0e206ea to 77d4e0f Compare August 9, 2023 11:35
@zmoog
Copy link
Contributor Author

zmoog commented Aug 9, 2023

@kaiyan-sheng, in particular, I don't understand the role of this part:

{{#unless queue_url}}
{{#unless non_aws_bucket_name}}
{{#unless bucket_arn }}
{{#if bucket_list_prefix }}
bucket_list_prefix: {{ bucket_list_prefix }}
{{/if}}
{{#if number_of_workers }}
number_of_workers: {{ number_of_workers }}
{{/if}}
{{#if bucket_list_interval }}
bucket_list_interval: {{ bucket_list_interval }}
{{/if}}
{{/unless}}
{{/unless}}
{{/unless}}

Since the input requires one of them to work properly:

if len(enabled) == 0 {
	return errors.New("neither queue_url, bucket_arn nor non_aws_bucket_name were provided")
}

But I may missing something.

@andrewkroh
Copy link
Member

andrewkroh commented Aug 9, 2023

This is another example of how elastic/package-spec#421 would prevent bugs. It's not quite shadowing, but it's a direct duplicate.

@andrewkroh
Copy link
Member

This fixes part of #6148.

@kaiyan-sheng
Copy link
Contributor

@zmoog Seems like we added bucket_list_prefix in elastic/beats#28252 a long time ago but didn't fix the config.go to check for this additional config option.

@zmoog
Copy link
Contributor Author

zmoog commented Aug 9, 2023

Seems like we added bucket_list_prefix in elastic/beats#28252 a long time ago but didn't fix the config.go to check for this additional config option.

Why set bucket_list_prefix only when queue_url, bucket_arn, and non_aws_bucket_name are not set?

 {{#unless queue_url}} 
 {{#unless non_aws_bucket_name}} 
 {{#unless bucket_arn }} 

 {{#if bucket_list_prefix }} 
 bucket_list_prefix: {{ bucket_list_prefix }} 
 {{/if}} 

 ...

 {{/unless}} 
 {{/unless}} 
 {{/unless}} 

This input needs strictly one of the following:

  • queue_url
  • bucket_arn
  • non_aws_bucket_name

And I guess the bucket_list_prefix only works when the input is polling the bucket content and not when it's processing SQS notifications.

@kaiyan-sheng
Copy link
Contributor

This config parameter for ListObjects API to limit the response to objects that only have this bucket_list_prefix. I don't think this should be limited by {{#unless queue_url}} {{#unless non_aws_bucket_name}} {{#unless bucket_arn }} because this config should work for both when s3 input is used with SQS and when s3 input is used directly with S3 bucket. @P1llus Do you recall use case for specifying prefix with both s3-sqs and s3?

@zmoog
Copy link
Contributor Author

zmoog commented Aug 9, 2023

If all queue_url, bucket_arn, and non_aws_bucket_name are NOT set, the input will not work. So why setting bucket_list_prefix?

@kaiyan-sheng
Copy link
Contributor

If all queue_url, bucket_arn, and non_aws_bucket_name are NOT set, the input will not work. So why setting bucket_list_prefix?

Yes that's what I mean. It should not be only available when queue_url, bucket_arn, and non_aws_bucket_name are NOT set. This bucket_list_prefix is to specify the prefix for objects in a S3 bucket so we need either sqs queue URL set to point to the S3 or give a S3 bucket directly.

@zmoog zmoog marked this pull request as ready for review August 15, 2023 15:33
@zmoog zmoog requested a review from a team as a code owner August 15, 2023 15:33
zmoog and others added 2 commits August 21, 2023 18:18
I am trying to make template more readable.
@zmoog
Copy link
Contributor Author

zmoog commented Aug 21, 2023

It should not be only available when queue_url, bucket_arn, and non_aws_bucket_name are NOT set. This bucket_list_prefix is to specify the prefix for objects in a S3 bucket so we need either sqs queue URL set to point to the S3 or give a S3 bucket directly.

It seems the aws-s3 input only uses the bucket_list_prefix config option in the polling scenario. So it seems okay to only set it in the shared polling option section.

@zmoog
Copy link
Contributor Author

zmoog commented Aug 21, 2023

@kaiyan-sheng, do you think this change is complete, or am I overlooking something? What are the next steps?

Please, help me finding all the test scenario I need to cover.

Here's what I can think of right now:

  • Poll from an AWS bucket
  • Poll from a non-AWS bucket
  • Process objects creation notifications from an SQS queue

@zmoog
Copy link
Contributor Author

zmoog commented Aug 21, 2023

/test

@zmoog
Copy link
Contributor Author

zmoog commented Aug 21, 2023

Manual tests

Here are the input config from an agent policy used for testing che changes in this PR.

@kaiyan-sheng, are there others settings I should consider adding?

Poll from an AWS bucket

inputs:
  - id: aws-s3-aws_logs-afe56f1c-6312-411f-a8c8-b369327c943f
    name: aws_logs-1
    revision: 5
    type: aws-s3
    use_output: default
    meta:
      package:
        name: aws_logs
        version: 0.5.1
    data_stream:
      namespace: default
    package_policy_id: afe56f1c-6312-411f-a8c8-b369327c943f
    streams:
      - id: aws-s3-aws_logs.generic-afe56f1c-6312-411f-a8c8-b369327c943f
        data_stream:
          dataset: aws_logs.generic
        access_key_id: <REDACTED>
        secret_access_key: <REDACTED>
        parsers: null
        sqs.max_receive_count: 5
        max_bytes: 10MiB
        max_number_of_messages: 5
        tags:
          - forwarded
        publisher_pipeline.disable_host: true
        file_selectors: null
        bucket_arn: mbranca-esf-logs
        bucket_list_prefix: 2023-02-14-13-41-08-79BF7A8FA7821B47_D_6
        number_of_workers: 5
        sqs.wait_time: 20s
        bucket_list_interval: 120s

Process objects creation notifications from an SQS queue

inputs:
  - id: aws-s3-aws_logs-afe56f1c-6312-411f-a8c8-b369327c943f
    name: aws_logs-1
    revision: 6
    type: aws-s3
    use_output: default
    meta:
      package:
        name: aws_logs
        version: 0.5.1
    data_stream:
      namespace: default
    package_policy_id: afe56f1c-6312-411f-a8c8-b369327c943f
    streams:
      - id: aws-s3-aws_logs.generic-afe56f1c-6312-411f-a8c8-b369327c943f
        data_stream:
          dataset: aws_logs.generic
        file_selectors: null
        access_key_id: <REDACTED>
        queue_url: 'https://sqs.eu-west-1.amazonaws.com/1234567890/mbranca-esf-logs'
        secret_access_key: <REDACTED>
        parsers: null
        sqs.wait_time: 20s
        sqs.max_receive_count: 5
        max_bytes: 10MiB
        max_number_of_messages: 5
        tags:
          - preserve_original_event
          - forwarded
        publisher_pipeline.disable_host: true

@kaiyan-sheng
Copy link
Contributor

@mauiroma Input configs and test cases look good to me. Thanks for working on it!

@zmoog
Copy link
Contributor Author

zmoog commented Aug 21, 2023

I have one more thing to test.

Poll from a non-AWS bucket

I created a non-AWS bucket using the S3-compatible service Object Storage from Linode (check the public note zmoog/public-notes#46 for more details.

I used the following aws-s3 settings from the agent policy:

inputs:
  - id: aws-s3-aws_logs-afe56f1c-6312-411f-a8c8-b369327c943f
    name: aws_logs-1
    revision: 9
    type: aws-s3
    use_output: default
    meta:
      package:
        name: aws_logs
        version: 0.5.1
    data_stream:
      namespace: default
    package_policy_id: afe56f1c-6312-411f-a8c8-b369327c943f
    streams:
      - id: aws-s3-aws_logs.generic-afe56f1c-6312-411f-a8c8-b369327c943f
        data_stream:
          dataset: aws_logs.generic
        access_key_id: <REDACTED>
        secret_access_key: <REDACTED>
        parsers: null
        sqs.max_receive_count: 5
        max_bytes: 10MiB
        non_aws_bucket_name: mbranca-esf-logs
        max_number_of_messages: 5
        tags:
          - preserve_original_event
          - forwarded
        publisher_pipeline.disable_host: true
        file_selectors: null
        endpoint: 'https://eu-central-1.linodeobjects.com'
        bucket_list_prefix: 2023-02-14-13-41-08-79BF7A8FA7821B47_D
        number_of_workers: 5
        sqs.wait_time: 20s
        bucket_list_interval: 120s

Then I uploaded a couple of access logs files from my collection:

CleanShot 2023-08-22 at 00 24 44@2x

And here is the result in Elasticsearch:

CleanShot 2023-08-22 at 00 24 51@2x

@zmoog zmoog merged commit 9875be8 into elastic:main Aug 21, 2023
@zmoog zmoog deleted the zmoog/aws-logs-remove-extra-number-of-workers branch August 21, 2023 22:34
@zmoog zmoog restored the zmoog/aws-logs-remove-extra-number-of-workers branch August 21, 2023 22:41
@zmoog zmoog deleted the zmoog/aws-logs-remove-extra-number-of-workers branch August 21, 2023 22:41
@andrewkroh andrewkroh added Integration:aws_logs Custom AWS Logs and removed Integration:aws AWS labels Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working, use only for issues Integration:aws_logs Custom AWS Logs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants