Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libbeat] Add lowercase_fields and uppercase_fields processors #34022

Closed
wants to merge 1 commit into from

Conversation

davidifr
Copy link
Contributor

What does this PR do?

The PR adds lowercase and uppercase processors to libbeat.

Why is it important?

It's a basic requirement that came up from multiple users which are currently using different solutions in order to work around this missing feature (using script or rename processors for example).

Checklist

  • [ V] My code follows the style guidelines of this project
  • [ V] I have commented my code, particularly in hard-to-understand areas
  • [ V] I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • [ V] I have added tests that prove my fix is effective or that my feature works
  • [ V] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Users who desire to lowercase/uppercase some fields are currently required to use the script or rename processors.
With this PR, users will no longer need to use the above mentioned work arounds.

Screenshots

Logs

@davidifr davidifr requested a review from a team as a code owner December 12, 2022 14:36
@davidifr davidifr requested review from rdner and faec and removed request for a team December 12, 2022 14:36
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Dec 12, 2022
@mergify
Copy link
Contributor

mergify bot commented Dec 12, 2022

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @davidifr? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine
Copy link
Collaborator

❕ Build Aborted

The PR is not allowed to run in the CI yet

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Start Time: 2022-12-12T14:36:58.913+0000

  • Duration: 5 min 14 sec

Steps errors 2

Expand to view the steps failures

Load a resource file from a library
  • Took 0 min 0 sec . View more details here
  • Description: approval-list/elastic/beats.yml
Error signal
  • Took 0 min 0 sec . View more details here
  • Description: githubApiCall: The REST API call https://api.github.com/orgs/elastic/members/davidifr return the message : java.lang.Exception: httpRequest: Failure connecting to the service https://api.github.com/orgs/elastic/members/davidifr : httpRequest: Failure connecting to the service https://api.github.com/orgs/elastic/members/davidifr : Code: 404Error: {"message":"User does not exist or is not a member of the organization","documentation_url":"https://docs.github.com/rest/reference/orgs#check-organization-membership-for-a-user"}

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@sonarqubecloud
Copy link

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
41.2% 41.2% Duplication

@belimawr belimawr added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Dec 12, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Dec 12, 2022
@cmacknz cmacknz added the Team:Elastic-Agent Label for the Agent team label Dec 12, 2022
Copy link
Member

@rdner rdner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for your contribution and especially for the full test coverage.

I think we should definitely improve the docs and naming of the processors. And we could refactor these 2 processor implementation into one parameterising the field changing function to avoid code duplication.

Comment on lines +22 to +27
[source,json]
-------------------------------------------------------------------------------
{
"a.b": {}
}
-------------------------------------------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is very confusing.

  1. What is it supposed to demonstrate? Is it the end result?
  2. Does it lowercase the last field name (after the last .) or all the keys on the path? We need more examples covering these edge cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes, the end result.
  2. only the last field in a given path.

I agree, I should definitely improve the documentation and provide more examples.

Comment on lines +22 to +27
[source,json]
-------------------------------------------------------------------------------
{
"a.B": {}
}
-------------------------------------------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same comment from the lowercase doc applies here. We need more examples "before" and "after" and to explicitly explain what segments of the dot notated paths are affected.

Copy link
Contributor Author

@davidifr davidifr Dec 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, before and after cases would make a good examples for users.


var lower string
if strings.ContainsRune(field, '.') {
// In case of nested fields provided, we need to make sure to only modify the latest field in the chain
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my earlier comments I meant that this should have been added to the docs.

Also, how are we sure it's the desired behaviour for most of the users? Why not all the keys in the path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add it to the docs as well.

Regarding the desired behavior, I think lowercasing/uppercasing all the keys in the path is kind of restricting in case of a user only wanting to apply the action to one specific field,
While providing the ability to only apply the action to the latest field, may give the user a more flexibility and control in to which fields does the user want to apply the action to.

What do you think?

Comment on lines +110 to +112
lastIndexRuneFunc := func(r rune) bool { return r == '.' }
idx := strings.LastIndexFunc(field, lastIndexRuneFunc)
lower = field[:idx+1] + strings.ToLower(field[idx+1:])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure it's worth using runes here but I think it's okay.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually wasn't sure it's required either, I will change that.

<titleabbrev>lowercase_fields</titleabbrev>
++++

The `lowercase_fields` processor specifies a list of fields to lowercase.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading this I thought it would lowercase the values, not keys. I think we should explicitly explain what this processor is changing. The same goes for the upercase-fields processor.

"github.com/pkg/errors"
)

type lowerCaseProcessor struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be called lowerCaseFieldsProcessor in case we would want lowerCaseValuesProcessor in the future.

"github.com/pkg/errors"
)

type upperCaseProcessor struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be called upperCaseFieldsProcessor in case we would want upperCaseValuesProcessor in the future.

}

for _, field := range p.Fields {
if err := p.upperCaseField(event, field); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of code duplication in these two processors. The only part that differs is what function we're applying to the field name (last segment of the path).

I think it would be better to implement a changeFieldProcessor and a parameter called changeFunc. We would avoid a lot of code duplication if we did this.

It will be still exposed as upperCaseFieldProcessor and lowerCaseFieldProcessor for external use but in their constructor function they would just create and return a changeFieldProcessor with toLower or toUpper function as a parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, since this comment and your first comment are kind of confusing to me.

You mentioned on your first comment that these 2 processors can be replaced by one so why do I need to keep their *CaseFieldProcessor struct?

Also, where should I put the changeFieldProcessor? should I create it in another file?
Or should I create a new file change_fields.go that will include the changeFieldProcessor struct with both of the lowerCase/upperCase constructors that will return a changeFieldProcessor each with their relevant changeFunc?

@rdner rdner added the Filebeat Filebeat label Dec 16, 2022
@mergify
Copy link
Contributor

mergify bot commented Jan 11, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b issue-22254 upstream/issue-22254
git merge upstream/main
git push upstream issue-22254

@mr1716
Copy link
Contributor

mr1716 commented Jan 20, 2023

@davidifr any sense of when development will be continued? If not, no big deal.

@cmacknz
Copy link
Member

cmacknz commented Apr 20, 2023

Closing this one. Please reopen it when it ready for review again.

@cmacknz cmacknz closed this Apr 20, 2023
@mr1716
Copy link
Contributor

mr1716 commented May 25, 2023

@davidifr have you had a chance to work on this recently?

@zez3
Copy link

zez3 commented Sep 20, 2024

I think @davidifr is no longer an active account. It would be nice of someone would take over and finish this

@mr1716
Copy link
Contributor

mr1716 commented Sep 26, 2024

@zez3 agreed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Filebeat Filebeat Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Filebeat Processors - make uppercase and lowercase processors (as in ES ingest nodes) available to filebeat
7 participants