Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harmonize on name and meaning of IGNORE_URLS-ish config option #144

Closed
beniwohli opened this issue Sep 12, 2019 · 18 comments · Fixed by #333
Closed

Harmonize on name and meaning of IGNORE_URLS-ish config option #144

beniwohli opened this issue Sep 12, 2019 · 18 comments · Fixed by #333

Comments

@beniwohli
Copy link
Contributor

beniwohli commented Sep 12, 2019

Description of the issue

Most (all?) agents have a config option ignore_urls (or similar, see below) to ignore certain transactions by URL, using pattern matching. Unfortunately, there are subtle and not-so-subtle differences across agents:

  • Java and Go: a list of wildcard matchers
  • Node: A list of items that can be either regexes or strings
  • Ruby: a list of regexes. Also, the option is called ignore_url_patterns
  • Python: A list of regexes. Also, the option is called transaction_ignore_patterns. Oh, and it's matched against the transaction name, not the URL (historically, this is so it can also be used for non-HTTP transactions, like background jobs)
  • then I stopped...

These differences in both naming and semantics make it difficult to use this config option in a remote config scenario. Therefore, we should harmonize.

Semantics is probably the easier of the two: I propose that we follow the lead of the Java and Go agent, and use wildcard matching on the path part of the URL itself (not the parametrized URL pattern).

As for naming, using IGNORE_URLS or IGNORE_URL_PATTERNS poses the difficulty that they are already used with different semantics by some agents (the former by Node, the latter by Ruby). Therefore, I propose to use TRANSACTION_IGNORE_URLS, assuming that all agents only apply this config option to transactions, not errors or metrics (if that's not the case in your current implementation, please speak up!).

What we are voting on

All agents will add a new config option, TRANSACTION_IGNORE_URLS, which will use glob matching. It will be case insensitive by default. There is no required deprecation path for old config values, each agent team can decide that for themselves.

Vote

Agent Yes No Indifferent N/A Link to agent issue Central Kibana Config
.NET
elastic/apm-agent-dotnet#688
Go
elastic/apm-agent-go#792
Java
elastic/apm-agent-java#1313
Node.js
elastic/apm-agent-nodejs#1689
Python
elastic/apm-agent-python#772
Ruby
elastic/apm-agent-ruby#835
RUM
PHP
elastic/apm-agent-php#82
@mikker
Copy link
Contributor

mikker commented Sep 12, 2019

I agree on adding TRANSACTION_ to the option name to be explicit about what it actually does.

I guess if it ends in _URL it should probably only match URLs and not transaction names? These are not alike in Ruby (nor Java AFAIK).

Wildcard matchers are probably a better idea than regexes IMO. I suppose even only supporting * could be enough?

@mikker
Copy link
Contributor

mikker commented Sep 12, 2019

Strictly speaking, I think we're matching paths and not URLs? /ping and not http://example.com/ping. /ping would work although not a full URL.

@beniwohli
Copy link
Contributor Author

beniwohli commented Sep 12, 2019

@mikker regarding wildcards, I think we should all try and match the behavior of WildcardMatcher in the Java agent, including optional case insensitivity.

Regarding URL, yes, my proposal is to match against the "raw" path, thanks for the clarification. I'll update the description.

@hmdhk
Copy link
Contributor

hmdhk commented Sep 18, 2019

IMO, the IGNORE_URLS as is defined for the backend agents (which applies to incoming request) is not a good fit for the RUM agent. Please raise your concerns regarding this and we can reconsider.

@beniwohli
Copy link
Contributor Author

@jahtalab can you go into a bit more details of why you think it isn't a good fit? You talked a bit about it in our agents meeting, but for posterity's sake, and possibly as a base for further discussion, it would be nice if you wrote it up.

Personally, I think the most important thing we need to make sure is harmonized amongst agents is user expectations. If I add /admin/* to TRANSACTION_IGNORE_URLS, I don't want to see any transactions from that URL. If that's what happens on the RUM agent, I think it would be fine to use the same setting there, even if the exact interpretation and implementation may be somewhat different.

@hmdhk
Copy link
Contributor

hmdhk commented Sep 18, 2019

Conceptually I tend to separate the configuration used for incoming request in the backend and the url of a page visited in the browser. Having two different meaning for the same configuration option tends to create problems down the line. For example, in the backend agent the request URL is used by default as the transaction name but this is not the case for the RUM agent and we let the user to set the name of the page load transaction.

Furthermore, the RUM agent has ignoreTransactions config option which is a better fit since a transaction is not always guaranteed to have a url (e.g. route change transactions) but it will always have a name and at the same time the RUM agent doesn't face the challenges backend agents face with not being able to filter out transactions by name.

@basepi
Copy link
Contributor

basepi commented Jan 14, 2020

Updated the "What we are voting on" piece of the issue, with the decisions made in today's agents meeting. I made an executive decision that it should be case insensitive by default. If you disagree please comment here. :)

@felixbarny
Copy link
Member

As "glob matching" or wildcard matching can mean lots of things, let's define that as well.

My take on it:

The wildcard character *, matches zero or more characters. A wildcard may be at the beginning, in the middle or at the end of a matcher. Examples: /foo/*/bar/*/baz*, *foo*.
Matching is case insensitive by default. Prepending a matcher with (?-i) makes the matching case sensitive.
Single character wildcards (?) are not supported.
There is no escape character, so matching a literal * is not supported.

These constraints make it reasonable to both implement a matcher from scratch, like in https://github.com/elastic/apm-agent-java/blob/master/apm-agent-core/src/main/java/co/elastic/apm/agent/matcher/WildcardMatcher.java and to build a wildcard matcher based on regex. The latter would work by regex-quoting the input string and then replacing \* with .*.

@basepi
Copy link
Contributor

basepi commented Jan 14, 2020

@felixbarny How worried are we about "accidentally" allowing for things like single character wildcards?

Python has fnmatch which is usually used for globbing. But I don't think you can turn features off, so should we plan on creating our own implementation or have hidden behavior? I'm not convinced that the more constrained specification is worth implementing it from scratch in each agent, even if it's only a few hundred lines of code.

@axw
Copy link
Member

axw commented Jan 15, 2020

If there are quirks, people will inevitably come to rely on them (insert XKCD spacebar comic here). We could just say that they shouldn't do that, but I think it would be preferable to be consistent and exact. I think you could still use fnmatch, replacing ? with [?] and [ with [[], and get consistent behaviour.

@beniwohli
Copy link
Contributor Author

FWIW, we already do have an implementation of wildcard matching in the Python agent

https://github.com/elastic/apm-agent-python/blob/793800f698a427d4df6208e8ce1d7ecc304b6639/elasticapm/utils/__init__.py#L151

@mikker
Copy link
Contributor

mikker commented Jan 15, 2020

I copied Python's approach for the Ruby agent, basically pulling all the *s, escaping anything else that would have special meaning, putting .* where the *s were, converting to regex.

(Very smart, @beniwohli!)

(Update: I see you described the approach already, Felix. Still smart 😉)

@felixbarny
Copy link
Member

@elastic/apm-agent-devs please leave your vote in the issue description 🙂

@beniwohli
Copy link
Contributor Author

Should we only match against the path, or also the query string?

Pro include query string: easy to add something like ?ignore_apm=true to any URL, e.g. for uptime checkers
Contra include query string: any URL with a query string would break the matching unless the user doesn't append a * to every configured URL.

I'm tending towards not including the query string

@felixbarny
Copy link
Member

I'm tending towards not including the query string

++

@gregkalapos
Copy link
Contributor

gregkalapos commented Jul 16, 2020

As it seems there is an agreement on this with the new config name and behavior described in the description. (Except RUM, it's different...).

Some agents already linked implementation issues, I suggest the rest also creates and links those and we close this and start implementing it soon.

Reason I bring this up is that people are asking for this in .NET and I waited to see what we agree on here (I expect PHP will have the same situation). I will implement it for .NET according to the description above, but as it seems .NET will be the only one doing it this way and I see a little bit of a risk that we'll end up now with N+1 way of doing this.

So in sum: if you agree please add implementation issue for your corresponding agent and schedule it; and let's close this one since we agree on the new name and behavior which is in the issue description.

@gregkalapos
Copy link
Contributor

gregkalapos commented Jul 28, 2020

We discussed this is today's agent meeting.

  • The issue description has the new name and the definition of the setting
  • No objection to it, we go with that
  • Opened implementation issues for all remaining agents
  • RUM might consider supporting this option as the transaction names are set based on URL by default
  • Currently targeted for 7.9 7.10.

@felixbarny
Copy link
Member

When implementing this change, agents should test against this common set of test cases to ensure interoperability: #313

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants