URL Grouping/Aggregation #371

Closed
4 tasks done
Tracked by #1460
tom-miseur opened this issue Jun 3, 2022 · 8 comments
Labels
blocked We need further action from something/someone to be able to work on the issue evaluate feature A new feature next Might be eligible for the next planning (not guaranteed!) user request Requested by the community

Comments

@tom-miseur

tom-miseur commented Jun 3, 2022

It is often useful to aggregate endpoint URLs that contain dynamic values. This is critical in the k6 Cloud due to the limits we have in place to prevent tests from emitting too-many-metrics/too-many-urls.

The URL Grouping documentation provides a solution for k6 scripts using the http module, but because xk6-browser operates at the browser-level, there is no opportunity for the user to apply the name tag to requests that require it.

The situation is compounded by the fact that xk6-browser gains visibility of all HTTP requests incurred by the browser, including 3rd party hosts that would not normally be interacted with at all using HTTP k6 scripts.

Potential solutions

Allowlist/blocklist hosts in xk6-browser

A cursory browse through Playwright docs suggests there is no convenient way of preventing/allowing requests to certain hosts, e.g. through specifying regular expressions. There is, however, a request interception mechanism involving Page.route or BrowserContext.route that could be used to abort requests that don't fit the criteria.
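As a rough sketch of the interception approach, the helper below builds a handler that could be passed to Playwright's page.route or BrowserContext.route. It relies only on the route.request().url(), route.continue() and route.abort() calls from Playwright; the helper name makeHostFilter and the example host pattern are hypothetical, and nothing in this issue confirms that xk6-browser exposes the same routing API.

```javascript
// Hypothetical helper: builds a Playwright-style route handler that lets
// requests through only when their hostname matches an allowlist pattern.
function makeHostFilter(allowedHostPatterns) {
  return (route) => {
    const { hostname } = new URL(route.request().url());
    const allowed = allowedHostPatterns.some((re) => re.test(hostname));
    // Continue requests to allowed hosts and abort everything else, so no
    // traffic is ever sent to third-party hosts.
    return allowed ? route.continue() : route.abort();
  };
}

// Usage with Playwright (not executed here; the pattern is an assumption):
//   await page.route('**/*', makeHostFilter([/(^|\.)example\.com$/]));
```

A blocklist variant would simply invert the check and abort on a match.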

Pros:

  • would appear to support regex allow/block-listing which should be fairly easy for users to apply
  • doesn't actually send requests to undesired hosts, so no need to wait for #1321 and no need to worry about errors from 3rd party hosts (e.g. errors caused by rate limiting)

Cons:

  • users need to encounter the problem before then figuring out how to resolve it

Allowlist/blocklist hosts after-the-fact

This means xk6-browser still sends requests to the additional hosts, but that traffic can be filtered out of results.

Pros:

  • the user wouldn't need to run the test again to have filtering applied

Cons:

  • requests are sent to 3rd parties who may have rate limiting/bot protections in place that cause errors
  • k6 OSS would need some mechanism to ignore metrics from certain hosts (#1321)
  • k6 Cloud would need to be able to filter out hosts (unless #1321 would result in k6 Cloud not receiving the metrics at all which is quite likely)

Aggregation Rules

This would involve the user specifying URL grouping regular expressions (likely in options) ahead of time. Before any metric is generated, we check if the URL matches any of the patterns and apply the transformation as necessary.

Example:

export const options = {
  aggregations: [
    { regex: 'http:\/\/ecommerce\.test\.k6\.io\/checkout\/order-received\/.*\/\?key=.*', replace: '[id]' }
  ]
}

// http://ecommerce.test.k6.io/checkout/order-received/124/?key=bgravga43g43 -> http://ecommerce.test.k6.io/checkout/order-received/[id]/?key=[id]
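For illustration only, here is one way the matching step could work. This is a sketch, not an existing k6 feature: the function name applyAggregations is hypothetical, and the rule shape differs from the example above in that it assumes a real RegExp with capture groups around the static parts plus a standard String.prototype.replace template.

```javascript
// Hypothetical rule shape: capture the static parts of the URL and rebuild
// it with a placeholder where the dynamic values were.
const rules = [
  {
    regex: /^(http:\/\/ecommerce\.test\.k6\.io\/checkout\/order-received\/)[^/]+(\/\?key=).*$/,
    replace: '$1[id]$2[id]',
  },
];

// Returns the aggregated URL for the first matching rule, or the URL
// unchanged when no rule matches.
function applyAggregations(url, aggregationRules) {
  for (const { regex, replace } of aggregationRules) {
    if (regex.test(url)) {
      return url.replace(regex, replace);
    }
  }
  return url;
}

console.log(
  applyAggregations(
    'http://ecommerce.test.k6.io/checkout/order-received/124/?key=bgravga43g43',
    rules
  )
);
```

Stopping at the first matching rule keeps the per-URL cost bounded by the number of rules, which matters given the performance concern listed below.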

Pros:

  • fairly straightforward to use; possibly even easier to implement than tagging requests with name
  • would be applicable to both http and xk6-browser
  • also solves the edge case where redirect requests contain dynamic IDs (you can apply a name tag to the request that initiates the redirect chain, but then all requests in that chain end up with the same name tag)

Cons:

  • requests are sent to 3rd parties who may have rate limiting/bot protections in place that cause errors
  • users need to encounter the problem before then figuring out how to resolve it
  • performance is likely going to be a concern here, given that all URLs would need to be evaluated against one or more regular expressions

Tasks

@tom-miseur tom-miseur added the feature A new feature label Jun 3, 2022
@imiric
Contributor

imiric commented Jun 6, 2022

As mentioned over Slack, support for k6's blockHostnames option was added in #204, and released in v0.2.0. So you can give that a try right now and see if it helps.

That said, we'll still have to implement URL grouping by name, since that's currently not possible.

Using regex for this would be the more flexible option, but sticking with globbing patterns, as with blockHostnames, would be more user-friendly. This feature would also be useful for plain k6 scripts, where evaluating a regex for each URL might be too CPU-intensive, so globbing would also perform better there. Performance isn't as important for xk6-browser, since we don't make requests with nearly the same frequency, so regex might work for us as well, but globbing seems like the way to go.
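To make the globbing option concrete, here is a minimal sketch of wildcard matching that avoids the regex engine entirely. It is not how k6 implements blockHostnames; the function name globMatch and the sample patterns are assumptions, and only the `*` wildcard (any run of characters, including none) is supported.

```javascript
// Hypothetical glob matcher: '*' matches any run of characters.
function globMatch(pattern, text) {
  const parts = pattern.split('*');
  if (parts.length === 1) return pattern === text; // no wildcard present

  const first = parts[0];
  const last = parts[parts.length - 1];
  // The literal prefix and suffix must match without overlapping.
  if (text.length < first.length + last.length) return false;
  if (!text.startsWith(first) || !text.endsWith(last)) return false;

  // The remaining literal pieces must appear in order between them.
  let pos = first.length;
  const end = text.length - last.length;
  for (const piece of parts.slice(1, -1)) {
    const idx = text.indexOf(piece, pos);
    if (idx === -1 || idx + piece.length > end) return false;
    pos = idx + piece.length;
  }
  return true;
}

console.log(globMatch('*.example.com', 'cdn.example.com'));
console.log(globMatch('https://example.com/checkout/*', 'https://example.com/checkout/124'));
```

The whole match is a handful of substring scans, which is the kind of cost profile that makes globbing attractive for high-frequency requests.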

If we want to use the global options object, this will have to be implemented in k6 instead, since extensions don't have access to change it. It's worth discussing this with k6 devs, so @na--, WDYT? Would this feature also be useful for k6? If so, we should implement it there first, and then reuse the option in xk6-browser, in the same way we did for blockHostnames. If not, then this will have to be an xk6-browser-specific option, likely part of the BrowserContext options.

@imiric imiric added the evaluate label Jun 6, 2022
@na--
Member

na-- commented Jun 6, 2022

Hmm, I don't have a very strong opinion here, but I'd prefer if we can avoid doing this via a new global option, at least until we have a clear idea of how to implement that optimally... 🤔

Global options are always a heavy maintenance burden over time and they are often not flexible enough to address all use cases. In some cases they are unavoidable, but in general I think we've found that programmable APIs are both easier to maintain and more flexible.

In this case, maybe a new callback to the browser.newContext() parameters could be used? I am not familiar enough with xk6-browser to know if this is a good or even possible solution, just throwing it out there as a potential solution through the API instead of through the global config

@inancgumus inancgumus added the next Might be eligible for the next planning (not guaranteed!) label Jun 9, 2022
@inancgumus inancgumus added this to the v0.5.0 milestone Jun 9, 2022
@inancgumus inancgumus modified the milestones: v0.5.0, v0.6.0 Jun 23, 2022
@inancgumus inancgumus modified the milestones: v0.6.0, v0.7.0 Nov 8, 2022
@inancgumus inancgumus self-assigned this Nov 8, 2022
@inancgumus inancgumus added the blocked We need further action from something/someone to be able to work on the issue label Nov 10, 2022
@dgzlopes
Member

dgzlopes commented Dec 1, 2022

Sorry! I somehow missed responding to this one 😞

I thought it could be interesting to have an automatic way of doing this. After all, we have the metrics data and all the URLs in k6! (at least for some time).

Maybe we could have the option to aggregate "high cardinality data" that would check the latest URLs and remove the highly changing part (and replace it with id_X or something).
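A rough sketch of what such automatic aggregation could look like follows. Everything here is an assumption made for illustration: the function name, the threshold, the choice to compare path segments by position, and the id_<position> placeholder. Query strings are dropped to keep the example short.

```javascript
// Hypothetical sketch: replace path segments that take many distinct
// values across recently seen URLs with a placeholder.
function groupHighCardinalityUrls(urls, maxDistinct = 3) {
  const parsed = urls.map((u) => {
    const { origin, pathname } = new URL(u);
    return { origin, segments: pathname.split('/').filter(Boolean) };
  });

  // Collect the distinct values seen at each (origin, depth, position).
  const seen = new Map();
  const keyFor = (origin, depth, i) => `${origin}|${depth}|${i}`;
  for (const { origin, segments } of parsed) {
    segments.forEach((seg, i) => {
      const key = keyFor(origin, segments.length, i);
      if (!seen.has(key)) seen.set(key, new Set());
      seen.get(key).add(seg);
    });
  }

  // Rewrite each URL, swapping high-cardinality segments for id_<position>.
  return parsed.map(({ origin, segments }) => {
    const path = segments.map((seg, i) =>
      seen.get(keyFor(origin, segments.length, i)).size > maxDistinct ? `id_${i}` : seg
    );
    return `${origin}/${path.join('/')}`;
  });
}

console.log(groupHighCardinalityUrls([
  'https://example.com/orders/1',
  'https://example.com/orders/2',
  'https://example.com/orders/3',
  'https://example.com/orders/4',
  'https://example.com/about',
]));
```

A real implementation would need a sliding window over the latest URLs rather than a fixed batch, since the set of URLs is not known up front.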

There is a "similar" feature in Grafana that lets you dedup Loki logs based on the signature.

@dgzlopes
Member

dgzlopes commented Dec 1, 2022

Internally, if I remember correctly, we had something similar for Prometheus metrics labels, too (In Python).

@inancgumus inancgumus removed their assignment Dec 5, 2022
@inancgumus inancgumus removed this from the v0.7.0 milestone Feb 7, 2023
@ka3de ka3de added the user request Requested by the community label Jul 28, 2023
@inancgumus inancgumus removed the next Might be eligible for the next planning (not guaranteed!) label Sep 22, 2023
@ankur22 ankur22 added the next Might be eligible for the next planning (not guaranteed!) label Sep 18, 2024
@ankur22 ankur22 assigned ankur22 and unassigned ankur22 Sep 26, 2024
@ankur22
Collaborator

ankur22 commented Sep 26, 2024

After some discussions we want to showcase the following API that will soon be available to allow grouping of metrics which are tagged with url. It differs from how the k6/http module groups metrics with high cardinality urls for good reason.

Here's the API (some details may slightly change):

export default async function() {
  const context = await browser.newContext();
  const page = await context.newPage();

  // Register a callback on the page object to be executed whenever a
  // metric is about to be emitted: offering the user the ability to build
  // their own logic and grouping of URLs. 
  page.on('metric', metric => {
    let regex = /^https:\/\/example\.com\/checkout\/[0-9a-f]*$/;

    // Group all browser metrics whose url tag matches the regex under the
    // name "example-checkout", which would allow the customer to build a
    // graph by querying for "example-checkout".
    if (regex.test(metric.tags['url'])) {
      metric.tags['name'] = 'example-checkout';
    }
  });

  await page.goto('https://example.com');
  await page.close();
}

The new API extends the page.on API (that already exists) to intercept and modify the metrics that are being emitted for the current page. In the example above we're working with the raw metric object. The user experience might not be tight, but it does give the user a lot of control over the metric.

NOTE: This will only intercept metrics that the browser module emits, which currently are:

  • browser_data_sent
  • browser_http_req_duration
  • browser_data_received
  • browser_http_req_failed
  • browser_web_vital_*

We also hope to offer an easier-to-use helper function on metric to reduce the boilerplate code:

page.on('metric', metric => {
  metric.groupURLTag({
    urls: [
      {url: /^https:\/\/example\.com\/[0-9a-f]*\/checkout\/[0-9a-f]*$/, name: "account-basket"},
      {url: /^https:\/\/example\.com\/catalogue\?session=[0-9a-f]*$/, name: "catalogue"},
    ],
  });
});
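To illustrate the matching such a helper might perform, here is a minimal sketch in plain JavaScript. It is not the browser module's implementation: resolveGroupName is a hypothetical function, and the first-match-wins and fall-back-to-the-original-URL behaviours are assumptions.

```javascript
// The same rule list as in the helper example above.
const urlGroups = [
  { url: /^https:\/\/example\.com\/[0-9a-f]*\/checkout\/[0-9a-f]*$/, name: 'account-basket' },
  { url: /^https:\/\/example\.com\/catalogue\?session=[0-9a-f]*$/, name: 'catalogue' },
];

// Hypothetical: returns the name of the first rule whose regex matches the
// URL, or the URL itself when nothing matches (so the tag is left as-is).
function resolveGroupName(url, groups) {
  const hit = groups.find((group) => group.url.test(url));
  return hit ? hit.name : url;
}

console.log(resolveGroupName('https://example.com/ab12/checkout/9f', urlGroups));
console.log(resolveGroupName('https://example.com/catalogue?session=deadbeef', urlGroups));
```

Because the rules are checked in order, more specific patterns should be listed before more general ones.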

@ankur22 ankur22 self-assigned this Sep 27, 2024
@jewbetcha

Jumping in here from the k8s monitoring team, we would be very interested in this!

@ankur22 ankur22 mentioned this issue Oct 3, 2024
3 tasks
@ankur22
Collaborator

ankur22 commented Oct 3, 2024

While implementing this feature, I've had to change it slightly: the metric internals are no longer exposed, so the user can't amend them directly. Instead the focus is only on groupURLTag, which was the primary use case for this feature request. The reason for not exporting the metric itself is that there's still some uncertainty about the metric structure. I think the structure of the metric object that we want to expose requires a bit more thought, and there needs to be a clear reason why we're exposing it.

@inancgumus inancgumus mentioned this issue Oct 4, 2024
@ankur22
Collaborator

ankur22 commented Oct 7, 2024

The final API looks like this:

  page.on('metric', (metric) => {
    metric.Tag({
      urls: [
        {url: /^https:\/\/test\.k6\.io\/\?q=[0-9a-z]+$/, name:'test'},
      ]
    });
  });

There's an example you can work with to get you started (remember to change the import to k6/browser).

8 participants