Provide standardized traffic metrics #554

tomkerkhove · 2021-02-16T06:12:06Z

What would you like to be added:
Provide standardized traffic metrics for all gateways to implement so that other tools can rely on a common way to get the metrics.

Ideally, this would fully align with the metrics that the SMI spec uses, making it even easier to integrate.

I've proposed extending SMI beyond service meshes, but then this project started, so it might be better to align rather than reinvent.

Why is this needed:

Tools & platforms need a unified way to get traffic metrics for all gateways, regardless of which gateway is actually in use.

In my case, we are building HTTP-based autoscaling for KEDA (experimental), so we will rely on things such as SMI, but here we were hoping to rely on a standard/SDK for getting the metrics as well, instead of re-implementing this for every gateway.

/cc @michelleN @bridgetkromhout

robscott · 2021-02-17T01:06:46Z

Hey @tomkerkhove, thanks for bringing this up! We're always interested in finding any overlap with SMI Spec or any other related projects and trying to develop a standard that works for all use cases. In this case, metrics are something that we haven't prioritized yet, but something that could be in scope in the future. This actually came up in Slack a couple weeks ago and I raised it at our community meeting as well.

A few quick follow up questions related to SMI metrics:

Did you envision Gateway API supporting the same set of metrics? A subset/superset? Are there metrics that would be more relevant to only certain use cases?
What level of standardization do you think would be ideal to provide?
Would it be possible to expose this with standard Prometheus metrics instead of a CRD?
In your experience, how difficult has it been for different implementations to provide this standardized set of metrics?
Are there any areas you'd like to improve on with the metrics in SMI?

tomkerkhove · 2021-02-18T20:08:59Z

Thanks for your reply!

In terms of which metrics, I don't have much of a preference, but the number of requests is the minimum for me.

A CRD is ideal for us/me as we do not want to force Prometheus on everyone. For example, some companies rely on their cloud provider for that so don't need it (including me).

In terms of standardization, I'd hope every Gateway API implementation has to (or will) provide these metrics, ideally in a way that is compatible with SMI, so that we get the same metrics experience for service meshes, ingresses/gateways, and eventually service-to-service traffic as well.

beriberikix · 2021-02-18T23:42:11Z

I would suggest success/failure vs. raw count, since success rate or # of requests can both be calculated from it.
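To illustrate the point, here is a minimal Python sketch of how a consumer could derive both values from a success/failure pair. The `TrafficCounts` type and its field names are hypothetical, made up for this example; they are not part of SMI or Gateway API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrafficCounts:
    """Hypothetical container for the two counters a gateway would expose."""

    success: int
    failure: int

    @property
    def total(self) -> int:
        # The raw request count is just the sum of both counters.
        return self.success + self.failure

    @property
    def success_rate(self) -> Optional[float]:
        # The rate is undefined when no requests have been observed.
        if self.total == 0:
            return None
        return self.success / self.total


counts = TrafficCounts(success=95, failure=5)
print(counts.total)         # 100
print(counts.success_rate)  # 0.95
```

The reverse does not hold: a raw request count alone cannot be split back into successes and failures.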

tomkerkhove · 2021-02-19T05:58:12Z

Yes, but ideally they are there out of the box so that no computation needs to be done, which makes them easier to use.

bowei · 2021-02-19T17:54:49Z

Given the SMI experience, are there any issues with using CRs (which are backed by etcd) to store metrics that are fast-changing and tend to be ephemeral? I know that the K8s metrics resources have special handling to avoid overwhelming the API server.

(https://github.com/kubernetes/metrics/blob/master/pkg/apis/metrics/v1beta1/types.go)

evankanderson · 2021-03-03T19:18:26Z

Other art happening concurrently:

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/http-metrics.md

I'd share bowei's concern about the apiserver not necessarily being the best place for fast-moving metrics.

fejta-bot · 2021-06-01T20:52:27Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

tomkerkhove · 2021-06-01T20:58:23Z

Closing, as we discussed in standup that this is not on the radar for now.

tomkerkhove · 2021-12-22T15:12:00Z

Since some time has passed, I'm wondering if there are plans to integrate traffic metrics into the spec. If so, it would be nice to reuse some of the SMI spec or the OpenTelemetry conventions.

howardjohn · 2021-12-22T15:14:14Z

I am admittedly not very familiar with them, but my understanding is that OpenTelemetry has also defined some standardized request metric schemas, which could be a reasonable API if this project decides to 'endorse' a metrics scheme.

tomkerkhove · 2021-12-22T15:45:36Z

That's definitely correct, but what if it's not endorsed and rather part of the spec to provide these?

This is what SMI does, and it allows end users to rely on a standard way of getting metrics, regardless of which standard is used for the semantics. Tooling knows the metrics are there for every Gateway API implementation.

k8s-triage-robot · 2022-01-21T16:31:57Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · 2022-02-20T16:40:09Z

/close

k8s-ci-robot · 2022-02-20T16:40:20Z

@k8s-triage-robot: Closing this issue.

tomkerkhove · 2022-06-02T12:52:34Z

Given the current state of the project, I think it's time to reconsider this for consistent metrics across implementations.

/reopen

k8s-ci-robot · 2022-06-02T12:52:44Z

@tomkerkhove: Reopened this issue.

k8s-triage-robot · 2022-07-02T13:43:05Z

/close

k8s-ci-robot · 2022-07-02T13:43:14Z

@k8s-triage-robot: Closing this issue.

mikemorris · 2022-07-06T15:39:43Z

@tomkerkhove Some of the folks who have been involved in SMI are trying to get a new working group started to discuss E/W mesh applications of Gateway API, which sounds like it may be of interest to you: kubernetes/community#6724

In early discussions though, we decided to not focus on a spec for telemetry at this time - it's been an under-implemented/adopted part of SMI, and the divergence in implementations, vendors and evolving standards has made it challenging to build the consensus needed for a standard to become widely adopted.

It has been encouraging to see some of the work OpenTelemetry has been doing, and I think for the near future it would be best to focus on implementation/adoption within that group, with the goal of laying the groundwork to eventually enable broader adoption in projects like Gateway API, rather than starting a parallel effort.

tomkerkhove · 2022-07-07T08:27:56Z

Thanks for the update.

I strongly believe Gateway API is more than service meshes, and that this is a common misconception, but I will jump into that thread and see what comes of it. In the end, if it's purely focusing on service meshes, then SMI has already covered that.

youngnick · 2022-07-08T03:35:13Z

I one thousand percent agree that Gateway API is more than service meshes, but I'm supportive of the new WG because the core API has so many todos already that we're not going to have bandwidth to properly address service mesh use cases for some time. Having a WG that works on the service mesh problems and how to integrate the work SMI has already done with Gateway API and report back will be super useful.

robscott · 2024-12-12T00:23:59Z

For anyone still interested in this - there's a related discussion in OpenTelemetry now: open-telemetry/semantic-conventions#1675

tomkerkhove added the kind/feature label Feb 16, 2021

k8s-ci-robot added the lifecycle/stale label Jun 1, 2021

tomkerkhove closed this as completed Jun 1, 2021

tomkerkhove reopened this Dec 22, 2021

k8s-ci-robot added lifecycle/rotten and removed lifecycle/stale labels Jan 21, 2022

k8s-ci-robot closed this as completed Feb 20, 2022

k8s-ci-robot reopened this Jun 2, 2022

k8s-ci-robot closed this as completed Jul 2, 2022

tedvanderveen mentioned this issue Jul 18, 2024

Add integration with Application Gateway for Containers for HTTP ingress kedacore/http-add-on#933

Closed
