[pkg/ottl] Normalize replace_all_patterns function behavior #32896

Open
krokwen opened this issue May 7, 2024 · 9 comments
Labels
enhancement New feature or request pkg/ottl priority:p2 Medium

Comments


krokwen commented May 7, 2024

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

I'm implementing a log-sanitizing processor based on the transformprocessor.

One of the goals is to replace the value of every 'token' field with the string 'redacted'.

The first obvious solution is replace_all_patterns(attributes, "key", ".*token.*", "redacted"), but that only renames the matched keys.
Looking at the function source, the behavior changes when a converter function is passed: with replace_all_patterns(attributes, "key", ".*token.*", "redacted", SHA1) the keys are left as is and the values are changed instead. One caveat: the new value isn't the SHA1 hash of the original value but the hash of the string 'redacted', which doesn't make sense.
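
To make the two outcomes concrete, here is a toy Python model of "key" mode as it is described above. It is a sketch of the reported behavior only, not the pkg/ottl implementation; the names `replace_keys_as_reported` and `sha1_hex` are made up for illustration, and it assumes the pattern matches the whole key.

```python
import hashlib
import re


def sha1_hex(text):
    """Hex SHA-1 digest of a string (stand-in for the SHA1 converter)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def replace_keys_as_reported(attrs, pattern, replacement, convert=None):
    """Toy model of "key" mode as reported in this issue (hypothetical helper)."""
    regex = re.compile(pattern)
    result = {}
    for key, value in attrs.items():
        if not regex.search(key):
            # Non-matching entries pass through unchanged.
            result[key] = value
        elif convert is None:
            # No converter: the key is rewritten and the value is kept.
            result[regex.sub(replacement, key)] = value
        else:
            # Converter given: the key is kept and the value is overwritten
            # with the converted *replacement* string, not the original value.
            result[key] = convert(replacement)
    return result


attrs = {"auth.token.id": "s3cr3t", "user": "alice"}
print(replace_keys_as_reported(attrs, ".*token.*", "redacted"))
print(replace_keys_as_reported(attrs, ".*token.*", "redacted", sha1_hex))
```

In the second call the stored digest is the same for every matched key, because it depends only on the string 'redacted'.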

Describe the solution you'd like

Split it into two functions, 'replace_all_keys' and 'replace_all_values'. Each would still accept a 'key' or 'value' mode that selects what the pattern is matched against, but would modify only keys or only values, respectively. E.g.:

- replace_all_keys(attributes, "key", ".*token.*", "redacted") # replaces the key with the replacement if the key matches
- replace_all_keys(attributes, "value", ".*token.*", "redacted") # replaces the key with the replacement if the value matches
- replace_all_values(attributes, "key", ".*token.*", "redacted") # replaces the value with the replacement if the key matches
- replace_all_values(attributes, "value", ".*token.*", "redacted") # replaces the value with the replacement if the value matches

This approach would allow any converter function to be used without changing which part of the entry is modified.
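
The four combinations proposed above could behave roughly like the following Python sketch. Neither function exists in OTTL today; the names come from the proposal, and the helper `_matches` and the choice to replace the whole key or value (rather than only the matched substring) are assumptions made for illustration.

```python
import re


def _matches(regex, mode, key, value):
    """Match the pattern against the key or the value, depending on mode."""
    if mode == "key":
        return regex.search(key) is not None
    if mode == "value":
        return isinstance(value, str) and regex.search(value) is not None
    raise ValueError("mode must be 'key' or 'value'")


def replace_all_keys(attrs, mode, pattern, replacement):
    """Proposed semantics: only keys are ever modified."""
    regex = re.compile(pattern)
    # Note: several matching entries would collapse onto the same new key.
    return {
        (replacement if _matches(regex, mode, k, v) else k): v
        for k, v in attrs.items()
    }


def replace_all_values(attrs, mode, pattern, replacement):
    """Proposed semantics: only values are ever modified."""
    regex = re.compile(pattern)
    return {
        k: (replacement if _matches(regex, mode, k, v) else v)
        for k, v in attrs.items()
    }


attrs = {"api.token": "abc", "note": "my token here"}
print(replace_all_values(attrs, "key", ".*token.*", "redacted"))
```

The third bullet, replace_all_values in "key" mode, is the case the sanitizing use case needs: the key stays intact and only its value is redacted.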

Describe alternatives you've considered

My final solution is to apply the SHA1 function with a replacement format and then replace the result one more time with the string 'redacted', because it doesn't make sense to store a hash of the replacement string.

- replace_all_patterns(attributes, "key", ".*token.*", "redacted", SHA1, "redacted %s")
- replace_all_patterns(attributes, "value", "^redacted.*", "redacted")
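
The effect of these two statements can be traced with a small Python model. This is a sketch that assumes the "key"-mode-with-converter behavior described earlier in this issue; the helper names are hypothetical and nothing here is taken from the pkg/ottl source.

```python
import hashlib
import re


def key_mode_with_converter(attrs, pattern, replacement, fmt):
    """Step 1 as reported: matching keys keep their name and the value
    becomes fmt % sha1(replacement)."""
    regex = re.compile(pattern)
    digest = hashlib.sha1(replacement.encode("utf-8")).hexdigest()
    return {k: (fmt % digest if regex.search(k) else v) for k, v in attrs.items()}


def value_mode(attrs, pattern, replacement):
    """Step 2: substitute the pattern inside every string value."""
    regex = re.compile(pattern)
    return {
        k: (regex.sub(replacement, v) if isinstance(v, str) else v)
        for k, v in attrs.items()
    }


attrs = {"auth.token": "s3cr3t", "user": "alice"}
step1 = key_mode_with_converter(attrs, ".*token.*", "redacted", "redacted %s")
step2 = value_mode(step1, "^redacted.*", "redacted")
print(step1)
print(step2)
```

One side effect follows from the second pattern: any unrelated attribute whose value already starts with "redacted" is rewritten as well.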

Additional context

No response

@krokwen krokwen added enhancement New feature or request needs triage New item requiring triage labels May 7, 2024
Contributor

github-actions bot commented May 7, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@krokwen krokwen changed the title Normalize replace_all_patterns function behavior [pkg/ottl] Normalize replace_all_patterns function behavior May 7, 2024
@TylerHelmuth
Member

You've come upon something OTTL doesn't do well: conditions on specific items in a dynamic list.

Ultimately the way we wish we could handle this situation is with something like:

replace_all_patterns(attributes, "values", "redacted") where IsMatch(<key from attributes>, ".*token.*")

which we cannot express yet. I think #29289 would give us the solution.

Your proposed solution would also work, but I'd rather solve the underlying problem which is that we cannot express how to access a map's key(s) in a condition.

@TylerHelmuth
Member

Side note: that is a very clever workaround, one that relies on what I would consider a bug. I think we should patch that once we have a real solution for the kind of transformation you need. In my opinion, the expected outcome of using function or replacementFormat in replace_all_patterns with mode=key should be to run the new key value through the function.
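
One possible reading of that expected outcome, sketched in Python: in "key" mode the converter is applied to the new key text and the value is left alone. The function name `replace_keys_expected` is hypothetical, and `str.upper` stands in for a real converter such as SHA1 so the result is easy to read.

```python
import re


def replace_keys_expected(attrs, pattern, replacement, convert):
    """Sketch of the expected "key" mode: convert the new key, keep the value."""
    regex = re.compile(pattern)
    result = {}
    for key, value in attrs.items():
        if regex.search(key):
            # Expand the replacement for each match, run it through the
            # converter, and use the result as the new key text.
            key = regex.sub(lambda m: convert(m.expand(replacement)), key)
        result[key] = value
    return result


attrs = {"auth.token": "s3cr3t", "user": "alice"}
print(replace_keys_expected(attrs, ".*token.*", "redacted", str.upper))
```

Under this reading the workaround above would stop working, since the value would no longer be touched in "key" mode.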

@TylerHelmuth TylerHelmuth added priority:p2 Medium and removed needs triage New item requiring triage labels May 7, 2024
@TylerHelmuth
Member

@rnishtala-sumo curious on your thoughts.

@rnishtala-sumo
Contributor

rnishtala-sumo commented May 8, 2024

One thing about the reported issue, which may not be obvious from the docs: at present it makes sense to use the optional function together with matched group(s) in the replacement string.

For example:

replace_pattern(name, "^kube_([0-9A-Za-z]+_)", "$$1.", SHA256, "k8s.%s")

The above hashes a substring from the original.

It then makes sense that one would want to apply the function on the replacement string.
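
A rough Python model of the replace_pattern example above, under the assumption that the replacement template is expanded with the captured group first, then hashed, then placed into the format string. The helper names are hypothetical; `\1` is Python's spelling of the capture-group reference written as `$$1` in the collector config (where `$$` escapes `$`).

```python
import hashlib
import re


def sha256_hex(text):
    """Hex SHA-256 digest of a string (stand-in for the SHA256 converter)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def replace_pattern_with_converter(text, pattern, template, convert, fmt):
    """Assumed order: expand template -> convert -> format -> substitute."""
    regex = re.compile(pattern)

    def rewrite(match):
        expanded = match.expand(template)  # the captured group plus "."
        return fmt % convert(expanded)

    return regex.sub(rewrite, text)


name = "kube_namespace_pod"
print(replace_pattern_with_converter(
    name, r"^kube_([0-9A-Za-z]+_)", r"\1.", sha256_hex, "k8s.%s"))
```

Here the hashed input is derived from the original name through the capture group, which is why the digest differs per input, unlike a constant replacement such as 'redacted'.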

> In my opinion the expected outcome of using function or replacePattern in replace_all_patterns with mode=key should be to run the new key value through the function.

I agree. The function is applied to the key, but it seems to replace the value with the updated key, which is an issue.

Contributor

github-actions bot commented Jul 8, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@jordankay13

jordankay13 commented Aug 30, 2024

An alternative workaround to the one posted is to use the cache map. We also update the key with a redaction.redacted prefix, but here's ours:

          # Copy all attributes to the cache
          - merge_maps(cache, attributes, "insert")
          # Keep all allowed attributes
          - keep_matching_keys(attributes, "${env:ALLOWED_KEYS_REGEX}")
          # Delete all allowed attributes from the cache
          - delete_matching_keys(cache, "${env:ALLOWED_KEYS_REGEX}")
          # Redact all attributes in the cache
          - 'replace_all_matches(cache, "*", "redacted: https://github.com/......../allowed-keys.txt")'
          # Update key with redaction prefix
          - replace_all_patterns(cache, "key", "^(.*)$$", "redaction.redacted.$$1")
          # Merge redacted attributes from cache back into span attributes
          - merge_maps(attributes, cache, "insert")
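
The data flow of those statements can be followed in a plain-Python sketch. It assumes the cache map starts empty, that "insert" only adds keys that are not already present, and that only string values are overwritten by the redaction notice; `redact_disallowed` and its parameters are hypothetical names, not part of OTTL.

```python
import re


def redact_disallowed(attributes, allowed_keys_regex, notice):
    """Model of the cache-based workaround on a plain dict."""
    allowed = re.compile(allowed_keys_regex)
    # Copy all attributes to the cache (cache assumed empty beforehand).
    cache = dict(attributes)
    # Keep only allowed attributes on the original map.
    kept = {k: v for k, v in attributes.items() if allowed.search(k)}
    # Delete all allowed attributes from the cache.
    cache = {k: v for k, v in cache.items() if not allowed.search(k)}
    # Redact every remaining string value in the cache.
    cache = {k: (notice if isinstance(v, str) else v) for k, v in cache.items()}
    # Prefix every cached key to mark it as redacted.
    cache = {"redaction.redacted." + k: v for k, v in cache.items()}
    # Merge back with "insert" semantics: existing keys are not overwritten.
    merged = dict(kept)
    for k, v in cache.items():
        merged.setdefault(k, v)
    return merged


attrs = {"http.method": "GET", "user.email": "a@example.com"}
print(redact_disallowed(attrs, r"^http\.", "redacted"))
```

Unlike the hash-based workaround, this approach is an allow-list: everything not matched by the allowed-keys pattern is redacted, not only 'token' fields.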

Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Oct 30, 2024
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 29, 2024
@TylerHelmuth TylerHelmuth reopened this Jan 7, 2025