Harmonize on name and meaning of IGNORE_URLS-ish config option #144

beniwohli · 2019-09-12T09:55:20Z

Description of the issue

Most (all?) agents have a config option ignore_urls (or similar, see below) to ignore certain transactions by URL, using pattern matching. Unfortunately, there are subtle and not-so-subtle differences across agents:

Java and Go: a list of wildcard matchers
Node: A list of items that can be either regexes or strings
Ruby: a list of regexes. Also, the option is called ignore_url_patterns
Python: A list of regexes. Also, the option is called transaction_ignore_patterns. Oh, and it's matched against the transaction name, not the URL (historically, this is so it can also be used for non-HTTP transactions, like background jobs)
then I stopped...

These differences in both naming and semantics make it difficult to use this config option in a remote config scenario. Therefore, we should harmonize.

Semantics is probably the easier of the two: I propose that we follow the lead of the Java and Go agent, and use wildcard matching on the path part of the URL itself (not the parametrized URL pattern).

As for naming, using IGNORE_URLS or IGNORE_URL_PATTERNS poses the difficulty that they are already used with different semantics by some agents (the former by Node, the latter by Ruby). Therefore, I propose to use TRANSACTION_IGNORE_URLS, assuming that all agents only apply this config option to transactions, not errors or metrics (if that's not the case in your current implementation, please speak up!).

What we are voting on

All agents will add a new config option, TRANSACTION_IGNORE_URLS, which will use glob matching. It will be case insensitive by default. There is no required deprecation path for old config values; each agent team can decide that for themselves.

Vote

Agent	Link to agent issue	Central Kibana Config
.NET	elastic/apm-agent-dotnet#688	⚪
Go	elastic/apm-agent-go#792	⚪
Java	elastic/apm-agent-java#1313	✅
Node.js	elastic/apm-agent-nodejs#1689	✅
Python	elastic/apm-agent-python#772	⚪
Ruby	elastic/apm-agent-ruby#835	⚪
RUM
PHP	elastic/apm-agent-php#82	⚪


mikker · 2019-09-12T10:19:01Z

I agree on adding TRANSACTION_ to the option name to be explicit about what it actually does.

I guess if it ends in _URL it should probably only match URLs and not transaction names? These are not alike in Ruby (nor Java AFAIK).

Wildcard matchers are probably a better idea than regexes IMO. I suppose even only supporting * could be enough?

mikker · 2019-09-12T10:20:15Z

Strictly speaking, I think we're matching paths and not URLs? /ping and not http://example.com/ping. /ping would work although not a full URL.

beniwohli · 2019-09-12T10:26:01Z

@mikker regarding wildcards, I think we should all try and match the behavior of WildcardMatcher in the Java agent, including optional case insensitivity.

Regarding URL, yes, my proposal is to match against the "raw" path, thanks for the clarification. I'll update the description.

hmdhk · 2019-09-18T12:16:40Z

IMO, IGNORE_URLS as it is defined for the backend agents (where it applies to incoming requests) is not a good fit for the RUM agent. Please raise your concerns regarding this and we can reconsider.

beniwohli · 2019-09-18T13:35:52Z

@jahtalab can you go into a bit more detail on why you think it isn't a good fit? You talked a bit about it in our agents meeting, but for posterity's sake, and possibly as a base for further discussion, it would be nice if you wrote it up.

Personally, I think the most important thing we need to make sure is harmonized amongst agents is user expectations. If I add /admin/* to TRANSACTION_IGNORE_URLS, I don't want to see any transactions from that URL. If that's what happens on the RUM agent, I think it would be fine to use the same setting there, even if the exact interpretation and implementation may be somewhat different.

hmdhk · 2019-09-18T14:17:56Z

Conceptually, I tend to separate the configuration used for incoming requests in the backend from the URL of a page visited in the browser. Having two different meanings for the same configuration option tends to create problems down the line. For example, in the backend agents the request URL is used by default as the transaction name, but this is not the case for the RUM agent, where we let the user set the name of the page load transaction.

Furthermore, the RUM agent has an ignoreTransactions config option, which is a better fit: a transaction is not always guaranteed to have a URL (e.g. route change transactions), but it will always have a name. The RUM agent also doesn't face the challenge backend agents have of not being able to filter out transactions by name.

basepi · 2020-01-14T16:11:04Z

Updated the "What we are voting on" piece of the issue, with the decisions made in today's agents meeting. I made an executive decision that it should be case insensitive by default. If you disagree please comment here. :)

felixbarny · 2020-01-14T16:54:34Z

As "glob matching" or wildcard matching can mean lots of things, let's define that as well.

My take on it:

The wildcard character * matches zero or more characters. A wildcard may be at the beginning, in the middle or at the end of a matcher. Examples: /foo/*/bar/*/baz*, *foo*.
Matching is case insensitive by default. Prepending a matcher with (?-i) makes the matching case sensitive.
Single character wildcards (?) are not supported.
There is no escape character, so matching a literal * is not supported.

These constraints make it reasonable to both implement a matcher from scratch, like in https://github.com/elastic/apm-agent-java/blob/master/apm-agent-core/src/main/java/co/elastic/apm/agent/matcher/WildcardMatcher.java and to build a wildcard matcher based on regex. The latter would work by regex-quoting the input string and then replacing \* with .*.
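
To make the regex-based route concrete, here is a minimal Python sketch of the rules listed above. It is an illustration only, not the Java agent's WildcardMatcher or any agent's actual code; the names compile_wildcard and is_ignored are made up for this example. Instead of regex-quoting the whole string and then replacing \* with .*, it splits on * and quotes each literal piece, which should be equivalent.

```python
import re

# Prefix from the proposal above that switches a matcher to case-sensitive mode.
CASE_SENSITIVE_PREFIX = "(?-i)"


def compile_wildcard(pattern: str) -> "re.Pattern[str]":
    """Turn a wildcard matcher into a compiled regex (hypothetical helper)."""
    flags = re.DOTALL | re.IGNORECASE  # case insensitive by default
    if pattern.startswith(CASE_SENSITIVE_PREFIX):
        pattern = pattern[len(CASE_SENSITIVE_PREFIX):]
        flags = re.DOTALL
    # Only "*" is special: quote every literal piece and join the pieces with ".*".
    # "?" ends up quoted too, so single-character wildcards are not supported.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(regex, flags)


def is_ignored(patterns, path: str) -> bool:
    """Return True if the whole path matches at least one wildcard matcher."""
    return any(compile_wildcard(p).fullmatch(path) for p in patterns)


if __name__ == "__main__":
    print(is_ignored(["/admin/*"], "/ADMIN/users"))
    print(is_ignored(["(?-i)/admin/*"], "/ADMIN/users"))
    print(is_ignored(["/a?b"], "/axb"))
```

fullmatch is used so that a matcher without a leading or trailing * has to cover the whole path; whether that is the desired anchoring is an assumption of this sketch.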

basepi · 2020-01-14T18:07:32Z

@felixbarny How worried are we about "accidentally" allowing for things like single character wildcards?

Python has fnmatch which is usually used for globbing. But I don't think you can turn features off, so should we plan on creating our own implementation or have hidden behavior? I'm not convinced that the more constrained specification is worth implementing it from scratch in each agent, even if it's only a few hundred lines of code.

axw · 2020-01-15T04:57:01Z

If there are quirks, people will inevitably come to rely on them (insert XKCD spacebar comic here). We could just say that they shouldn't do that, but I think it would be preferable to be consistent and exact. I think you could still use fnmatch, replacing ? with [?] and [ with [[], and get consistent behaviour.
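
A rough sketch of that fnmatch idea, for illustration only: escape_for_fnmatch and wildcard_match are hypothetical names, and the (?-i) prefix from the proposal is not handled here. The escaping is done in a single pass so that the brackets added for ? are not escaped a second time.

```python
import re
from fnmatch import fnmatchcase


def escape_for_fnmatch(pattern: str) -> str:
    """Neutralize the fnmatch features we don't want: "?" and "[...]" (hypothetical helper)."""
    # Wrap each "?" and "[" in a bracket expression so fnmatch treats it literally.
    return re.sub(r"[?\[]", lambda m: "[" + m.group(0) + "]", pattern)


def wildcard_match(pattern: str, path: str) -> bool:
    """Case-insensitive match where "*" is the only wildcard."""
    # fnmatch.fnmatch normalizes case in an OS-dependent way, so lower-case both
    # sides ourselves and use the case-sensitive fnmatchcase.
    return fnmatchcase(path.lower(), escape_for_fnmatch(pattern).lower())


if __name__ == "__main__":
    print(wildcard_match("/admin/*", "/Admin/users"))
    print(wildcard_match("/a?b", "/axb"))
    print(wildcard_match("/a[bc]", "/ab"))
```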

beniwohli · 2020-01-15T07:21:35Z

FWIW, we already do have an implementation of wildcard matching in the Python agent

https://github.com/elastic/apm-agent-python/blob/793800f698a427d4df6208e8ce1d7ecc304b6639/elasticapm/utils/__init__.py#L151

mikker · 2020-01-15T08:51:05Z

I copied Python's approach for the Ruby agent, basically pulling all the *s, escaping anything else that would have special meaning, putting .* where the *s were, converting to regex.

(Very smart, @beniwohli!)

(Update: I see you described the approach already, Felix. Still smart 😉)

felixbarny · 2020-03-24T16:02:07Z

@elastic/apm-agent-devs please leave your vote in the issue description 🙂

beniwohli · 2020-03-25T12:12:42Z

Should we only match against the path, or also the query string?

Pro include query string: easy to add something like ?ignore_apm=true to any URL, e.g. for uptime checkers
Contra include query string: any URL with a query string would break the matching unless the user appends a * to every configured URL.

I'm tending towards not including the query string
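
To illustrate the "path only" option, here is a small Python sketch using the standard library's urlsplit. The function name match_target is made up for this example, and how each agent actually obtains the path is an assumption that will differ per framework.

```python
from urllib.parse import urlsplit


def match_target(url: str) -> str:
    """Return the part of the URL a matcher would be compared against: the path only."""
    # urlsplit separates scheme, host, path, query and fragment; we keep just the path.
    return urlsplit(url).path


if __name__ == "__main__":
    for url in (
        "http://example.com/ping",
        "http://example.com/ping?ignore_apm=true",
        "/ping?x=1",
    ):
        print(url, "->", match_target(url))
```

With this approach a configured /ping covers all three URLs, at the cost of not being able to target ?ignore_apm=true specifically.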

felixbarny · 2020-03-25T14:26:28Z

> I'm tending towards not including the query string

++

gregkalapos · 2020-07-16T09:37:43Z

It seems there is agreement on the new config name and behavior described in the issue description (except for RUM, which is different).

Some agents have already linked implementation issues. I suggest the rest also create and link theirs, and that we close this issue and start implementing soon.

The reason I bring this up is that people are asking for this in .NET, and I waited to see what we agree on here (I expect PHP will be in the same situation). I will implement it for .NET according to the description above, but it seems .NET will be the only one doing it this way, and I see a bit of a risk that we'll now end up with N+1 ways of doing this.

So in sum: if you agree, please add an implementation issue for your agent and schedule it, and let's close this one, since we agree on the new name and behavior in the issue description.

gregkalapos · 2020-07-28T15:21:36Z

We discussed this in today's agent meeting.

The issue description has the new name and the definition of the setting
No objection to it, we go with that
Opened implementation issues for all remaining agents
RUM might consider supporting this option as the transaction names are set based on URL by default
Currently targeted for ~~7.9~~ 7.10.

felixbarny · 2020-08-11T10:18:31Z

When implementing this change, agents should test against this common set of test cases to ensure interoperability: #313

beniwohli added apm-agents poll labels Sep 12, 2019

beniwohli mentioned this issue Sep 12, 2019

Agent Configuration in Kibana, GA #138

Closed

7 tasks

beniwohli mentioned this issue Sep 20, 2019

Harmonize on TRANSACTION_MAX_SPANS #148

Closed

This was referenced Jan 14, 2020

Add config TRANSACTION_IGNORE_URLS elastic/apm-agent-dotnet#688

Closed

Trace only some endpoints in .Net Core elastic/apm-agent-dotnet#685

Closed

beniwohli mentioned this issue Mar 25, 2020

Implement transaction_ignore_urls elastic/apm-agent-python#772

Closed

lreuven mentioned this issue Mar 25, 2020

Implement transaction_ignore_urls elastic/apm-agent-nodejs#1689

Closed

felixbarny added this to the 7.8 milestone Mar 27, 2020

SergeyKleyman mentioned this issue Jul 14, 2020

Configuration option: TRANSACTION_IGNORE_URLS elastic/apm-agent-php#82

Closed

2 tasks

gregkalapos mentioned this issue Jul 16, 2020

Add TransactionIgnoreUrls setting elastic/apm-agent-dotnet#904

Merged

3 tasks

felixbarny modified the milestones: 7.8, 7.9 Jul 28, 2020

felixbarny mentioned this issue Jul 28, 2020

Add TRANSACTION_IGNORE_URLS option elastic/apm-agent-java#1313

Closed

This was referenced Jul 28, 2020

Implement transaction_ignore_urls elastic/apm-agent-go#792

Closed

Implement transaction_ignore_urls elastic/apm-agent-ruby#835

Closed

gregkalapos closed this as completed Jul 28, 2020

felixbarny modified the milestones: 7.9, 7.10 Aug 6, 2020

felixbarny mentioned this issue Aug 11, 2020

Add wildcard matcher tests #313

Merged

This was referenced Aug 11, 2020

Add transaction_ignore_urls config elastic/apm-agent-ruby#844

Merged

Adding more config options to central config for backend agents #318

Closed

felixbarny linked a pull request Aug 26, 2020 that will close this issue

Add transaction_ignore_urls spec #333

Merged

felixbarny mentioned this issue Aug 26, 2020

Add transaction_ignore_urls spec #333

Merged

astorm mentioned this issue Nov 18, 2020

feat: transaction_ignore_urls wildcard matching elastic/apm-agent-nodejs#1876

Merged

12 tasks
