Reduce metric cardinality by default #1551

Open
roobre opened this issue Nov 26, 2024 · 2 comments
Labels
feature A new feature

Comments

@roobre
Member

roobre commented Nov 26, 2024

Feature Description

Currently, xk6-browser generates several metrics (such as browser_data_sent, browser_http_req_duration, browser_data_received) for each request that the browser makes to the server. For very simple webpages, this is not an issue. However, for more complex webpages, the number of requests for images, styles, scripts, fonts, etc. can grow very rapidly: a simple script that navigates to www.amazon.com and waits there for a bit generated metrics for 380 different URLs.

This number compounds when we want to figure out the number of timeseries generated:

  • For each URL, multiple metrics are generated
  • Some webpages will request the same resource multiple times, each time appending a "cache-busting" query argument to it (e.g. /something?ts=1732637991). This means the same resource will generate multiple timeseries
  • As the script gets more complex and more pages are visited, subsequent pages can request more images
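To make the compounding concrete, here is a rough sketch of the arithmetic. The URLs are made-up sample data, and the count is only a lower bound, since it ignores any other tags that also multiply the number of timeseries. It also shows how much collapsing cache-busting query arguments alone would save.

```javascript
// Hypothetical sample data: URLs requested during one page visit.
const requestedUrls = [
  'https://example.com/',
  'https://example.com/app.js?ts=1732637991',
  'https://example.com/app.js?ts=1732637992',
  'https://example.com/logo.png',
];

// The per-request metrics named in this issue.
const metricNames = [
  'browser_data_sent',
  'browser_data_received',
  'browser_http_req_duration',
];

// Drop the query string and fragment so cache-busted variants collapse.
function stripQuery(rawUrl) {
  const u = new URL(rawUrl);
  return u.origin + u.pathname;
}

// Lower bound on timeseries: (distinct URL values) x (metrics per URL).
function countSeries(urls, metrics, normalize = (u) => u) {
  const distinct = new Set(urls.map(normalize));
  return distinct.size * metrics.length;
}

console.log(countSeries(requestedUrls, metricNames)); // 12
console.log(countSeries(requestedUrls, metricNames, stripQuery)); // 9
```

With 380 distinct URLs and the same three metrics, the same calculation already gives at least 1140 timeseries for a single page.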

The fantastic URL grouping API introduced in https://github.com/grafana/k6/releases/tag/v0.55.0 is a great way to provide fine-grained control and let users select which URLs they are interested in specifically. However, this API does not help to improve the out-of-the-box experience, when a user is creating their first check without having learned about the in-depth APIs that k6 browser offers. Currently, it is possible for a user to greatly increase their metrics spend by taking the example script and swapping the URL with theirs, which is not the best experience.
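For reference, this is roughly what opting into URL grouping looks like today, as far as I understand the v0.55.0 release notes. The group name and the regular expression are illustrative assumptions, and the handler is written as a standalone function so it can be read on its own; in a real script it would be registered with page.on('metric', ...).

```javascript
// Handler for the 'metric' event. In a k6 browser script it would be
// registered as: page.on('metric', groupCacheBustedRequests);
function groupCacheBustedRequests(metric) {
  metric.tag({
    // Hypothetical group name that replaces the raw URL in matching metrics.
    name: 'something-cache-busted',
    // Illustrative pattern: one resource requested with a changing ts argument.
    matches: [
      { url: /^https:\/\/example\.com\/something\?ts=\d+$/, method: 'GET' },
    ],
  });
}
```

The point is that the user has to know this API exists and write the patterns themselves, which is exactly what a first-time user will not have done.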

I think we should provide an out-of-the-box filter that prevents generating metrics for URLs that we estimate are not relevant for simple use cases. Users who want more detailed metrics should have a way to opt in to them.

Suggested Solution (optional)

Figuring out which URLs are "not relevant for the simple use cases" is of course a difficult matter. As a suggestion (which needs to be validated), we might be able to get a decent heuristic if we can leverage the "initiator" for a request. If we can obtain this information from the browser, we can assume that for the basic use case, a user will be most interested in knowing metrics for URLs that:

  • Are navigated to directly, through page.goto()
  • Are navigated to as a result of (scripted) user input, such as page.locator().click()
  • Are the result of HTTP redirects, such as 301 or 307

As opposed to requests originating from:

  • src attributes for img, script, style, etc.
  • CSS styles, such as background-image, @font-face, etc.

For a simple use case, this could be implemented as a filter, alternative to URL grouping. For example, something like:

export const options = {
  scenarios: {
    foo: {
      executor: 'shared-iterations',
      options: {
        browser: {
          type: 'chromium',
          metricsFor: ['navigation', 'input'],
        },
      },
    },
  },
};

To avoid breaking existing behavior, the default value for metricsFor could be obtained from an environment variable, such as K6_BROWSER_METRICS_FOR=navigation,input. This would allow managed services like Synthetic Monitoring to specify a custom default, while still allowing users to override it with their own preferences. Note that the metricsFor option is just a crude example; this could just as well be an option on the browser context, an API call, or whatever else is idiomatic for k6 browser.
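A minimal sketch of the precedence I have in mind: a value set by the script wins over the environment variable, which wins over the built-in default. Everything here is hypothetical, including the function names and the 'all' sentinel that stands for today's unfiltered behavior. In a k6 script, the env argument would be __ENV.

```javascript
// Hypothetical sentinel meaning "no filtering", i.e. the current behavior.
const BUILTIN_DEFAULT = ['all'];

// Parse a comma-separated value such as "navigation,input" into a list,
// ignoring surrounding whitespace and empty entries.
function parseList(value) {
  return value
    .split(',')
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Resolve the effective setting: script value > environment > built-in default.
function resolveMetricsFor(scriptValue, env) {
  if (Array.isArray(scriptValue) && scriptValue.length > 0) {
    return scriptValue;
  }
  const fromEnv = typeof env.K6_BROWSER_METRICS_FOR === 'string'
    ? parseList(env.K6_BROWSER_METRICS_FOR)
    : [];
  return fromEnv.length > 0 ? fromEnv : BUILTIN_DEFAULT;
}

// A managed service sets the default; the script does not override it.
console.log(resolveMetricsFor(undefined, { K6_BROWSER_METRICS_FOR: 'navigation,input' }));
// The script opts back in to everything, regardless of the environment.
console.log(resolveMetricsFor(['all'], { K6_BROWSER_METRICS_FOR: 'navigation,input' }));
```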

Already existing or connected issues / PRs (optional)

@roobre roobre added the feature A new feature label Nov 26, 2024
@roobre roobre changed the title Provide a way to opt-in or out of metrics for non-user initiated requests Provide a way to reduce metric cardinality by default Nov 26, 2024
@roobre roobre changed the title Provide a way to reduce metric cardinality by default Reduce metric cardinality by default Nov 26, 2024
@ankur22
Collaborator

ankur22 commented Nov 27, 2024

Currently requests already have a resourceType which can be:

	ResourceTypeDocument           ResourceType = "Document"
	ResourceTypeStylesheet         ResourceType = "Stylesheet"
	ResourceTypeImage              ResourceType = "Image"
	ResourceTypeMedia              ResourceType = "Media"
	ResourceTypeFont               ResourceType = "Font"
	ResourceTypeScript             ResourceType = "Script"
	ResourceTypeTextTrack          ResourceType = "TextTrack"
	ResourceTypeXHR                ResourceType = "XHR"
	ResourceTypeFetch              ResourceType = "Fetch"
	ResourceTypePrefetch           ResourceType = "Prefetch"
	ResourceTypeEventSource        ResourceType = "EventSource"
	ResourceTypeWebSocket          ResourceType = "WebSocket"
	ResourceTypeManifest           ResourceType = "Manifest"
	ResourceTypeSignedExchange     ResourceType = "SignedExchange"
	ResourceTypePing               ResourceType = "Ping"
	ResourceTypeCSPViolationReport ResourceType = "CSPViolationReport"
	ResourceTypePreflight          ResourceType = "Preflight"
	ResourceTypeOther              ResourceType = "Other"

The first step could be to tag the metrics with their respective resourceType. Looking at k6, I can't find a suitable existing tag that we could piggyback on, so instead I propose that we create a new tag such as resourceType.

The second step is to provide an API that enables the filtering of metrics.

I think the general feeling is that we want to avoid working with the options block if at all possible. If there is a good reason for it then it might be considered.

An env var approach could be a good starting point for setting an environment up easily so that only certain resourceTypes are allowed through. To then let the user override the env var allowlist, could we extend the existing page.on('metric') or an existing k6 metric filter?

Example of extending page.on('metric')

page.on('metric', (metric) => {
    const d = metric.abort({
        tag: 'resourceType',
        value: 'Stylesheet',
    });

    if (d) {
        console.log('metric was aborted');
    }
})

In k6 there are a couple of ways to drop tags:

However, neither of these provides a way to drop a tag that matches a particular value. An existing issue could have the answer: a new API that works with k6 metric tags throughout the whole of k6, rather than just the browser module.

@roobre
Member Author

roobre commented Nov 28, 2024

Thank you for the super fast research @ankur22!

The list of resource types looks very extensive and I think it offers a promising angle to tackle this. I'm meeting with some teammates to discuss this in more depth next Monday, but to me this seems like the way to go. Tentatively, I think that allowing Document,XHR,Fetch,Prefetch,WebSocket by default would leave us in a good spot.
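As a rough illustration of that default, assuming metrics end up tagged with resourceType as proposed above. The sample shape and the function name are made up for the example and are not an existing k6 API.

```javascript
// Tentative default allowlist from this comment.
const DEFAULT_ALLOWED_TYPES = new Set([
  'Document',
  'XHR',
  'Fetch',
  'Prefetch',
  'WebSocket',
]);

// Hypothetical sample shape: { name, tags: { url, resourceType } }.
// Returns true when the sample's resourceType is on the allowlist.
function keepSample(sample, allowed = DEFAULT_ALLOWED_TYPES) {
  return allowed.has(sample.tags.resourceType);
}

// Made-up samples for a single page load.
const samples = [
  { name: 'browser_http_req_duration', tags: { url: 'https://example.com/', resourceType: 'Document' } },
  { name: 'browser_http_req_duration', tags: { url: 'https://example.com/app.js', resourceType: 'Script' } },
  { name: 'browser_http_req_duration', tags: { url: 'https://example.com/logo.png', resourceType: 'Image' } },
  { name: 'browser_http_req_duration', tags: { url: 'https://example.com/api/cart', resourceType: 'Fetch' } },
];

const kept = samples.filter((s) => keepSample(s));
console.log(kept.map((s) => s.tags.url));
```

Only the document and the fetch call survive here; scripts, images, fonts and stylesheets would be dropped unless the user opts in to them.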

> I think the general feeling is that we want to avoid working with the options block if at all possible. If there is a good reason for it then it might be considered.

I agree with this. Avoiding it also makes it more natural for the code to override what is set in the environment variable. If we used the options block, giving the env var lower precedence than the option would be inconsistent, so if we avoid it, all the better!

Regarding the API, I'm happy to meet again and discuss it to make sure that we have something that feels ergonomic. While I strongly think we should limit the URLs we emit metrics for by default, I also think it should be easy for users to opt in to having all URLs metered if they want to. We can have a chat and try to figure out an API that satisfies both goals (default off, easy opt-in).
