Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: attributes to model k8s gateway API #1675

Open
michaelsafyan opened this issue Dec 11, 2024 · 7 comments
Open

Feature request: attributes to model k8s gateway API #1675

michaelsafyan opened this issue Dec 11, 2024 · 7 comments
Labels
area:k8s enhancement New feature or request experts needed This issue or pull request is outside an area where general approvers feel they can approve triage:needs-triage

Comments

@michaelsafyan
Copy link
Contributor

michaelsafyan commented Dec 11, 2024

Area(s)

area:k8s

Is your change request related to a problem? Please describe.

If a loadbalancer is generated by the K8s Gateway API, there is currently not a standard way for the load balancer to include information in the spans that it generates to tie the spans/LB back to the K8s Gateway Config that generated it.

The goal is to make it very easy for users to know that:

  1. The loadbalancer was created from k8s gateway API
  2. Which of the various k8s YAML configs created that specific load balancer

Describe the solution you'd like

Incorporating feedback from conversation below:

  • k8s.gateway.class.name: The name of the GatewayClass referenced by the Gateway
  • k8s.gateway.class.namespace: Namespace for the corresponding name.
  • k8s.gateway.gateway.name: The name (metadata.name in the YAML) of the Gateway
  • k8s.gateway.gateway.namespace: Namespace for the corresponding name
  • k8s.gateway.route.kind: The route type (e.g. HTTPRoute, GRPCRoute, etc.)
  • k8s.gateway.route.name: The route name (metdata.name in the YAML) of the route
  • k8s.gateway.route.namespace: Namespace for the corresponding name

My assumption is that an initial version of this should be:

  1. Strictly "experimental"
  2. Either "Optional" or "Recommended if available" (but NOT required)

We should require at least one (and ideally two) stable vendor implementation setting these properties before promoting a property in the list above from "experimental" to "stable".

Describe alternatives you've considered

Alternatives Considered

More general

Like:

  • k8s.config.api.version: value of the apiVersion in the config YAML
  • k8s.config.kind: value of the kind in the config YAML

The problem with this is that multiple separate config YAMLs can contribute to a single real, materialized load balancer

More expansive/specific

Like including the proposal and additionally adding things like:

  • k8s.gateway.route.grpcroute.filter.*
  • k8s.gateway.route.grpcroute.matches.*
  • k8s.gateway.route.grpcroute.backendref.*

The problem with this is that it is unclear if any of this would be needed. Better to start with something smaller and then add more modelling in later should it prove to be needed.

Additional context

No response

@michaelsafyan michaelsafyan added enhancement New feature or request experts needed This issue or pull request is outside an area where general approvers feel they can approve triage:needs-triage labels Dec 11, 2024
@robscott
Copy link

This is cool to see! Thanks for starting the conversation @michaelsafyan.

At first glance this seems reasonable. Just a couple nits:

  1. Both the Gateway and Route should also have a namespace tracked
  2. Route type should probably be "kind" to more clearly align with k8s terms

Looping in a few more Gateway folks to take a look:

cc @arkodg @howardjohn @dprotaso @youngnick

@arkodg
Copy link

arkodg commented Dec 12, 2024

+1 to standardizing these attributes
Envoy Gateway has done something similar for access logs https://gateway.envoyproxy.io/docs/tasks/observability/proxy-accesslog/#additional-metadata
https://gateway.envoyproxy.io/contributions/design/metadata/

@howardjohn
Copy link

From the Gateway API side we should be careful we don't accidentally introduce conventions that sound good at the time but accidentally lock us into design decisions we don't want.

For example, k8s.gateway.gateway.name sounds great.. but what if we have gateway merging, or ListenerSet, or some future thing?

The backendRef stuff is a more obvious way this can go wrong. Its not uncommon for an implementation to share the same "backend" internally (such as Envoy Cluster) between multiple distinct routes for instance.

TBH I don't necessarily have any concrete "we must not have this attribute" but we should consider these types of things for anything we add.

@michaelsafyan
Copy link
Contributor Author

Thanks, @howardjohn .

Re: Gateway merging ... would it allay some of the concern if we ensured that these were strictly optional (not required)? One the one hand, we definitely don't want to prevent merging (or other optimizations/implementations). On the other hand, if such merging wasn't being done (or if the merging saved enough information to allow attribution to the original source), it would be useful if we could allow a standard way of providing that provenance information if it is available.

Thoughts?

@costinm
Copy link

costinm commented Dec 13, 2024

I am more of a "are we 100% sure it is required and users don't have any other ways" then "maybe someone will need it, why not add" - a lot of this info is redundant or can be found by other tools, doesn't need to be included in every single trace.

What would be the absolute minimum info that a user would need ? They know the backend - because the backend generates it own span. I believe the LB workload is already clearly identified. Based on that - in most cases a user can determine the Gateway CR, and from there the routes.

It may be worth indicating which route was hit, because routes can be complex - but adding a requirement that all load balancers can backtrack the original config is a very expensive demand. Some are implemented by translating configs and in not all cases the source config can be traced ( assuming the source is the K8S API - which is not always the case since various things generate and delete routes - like acme ).

I understand it is nice to have all the data immediately available, traced back to the source - in case someone needs it when looking at the trace and doesn't want to do extra work. But it has some costs, and it may be better to start with the most minimal set of info - and validate that different gateway implementations can propagate all this info ( and maybe what is the cost in gateway config size and trace storage ).

@costinm
Copy link

costinm commented Dec 13, 2024

I'm also a bit concerned about every API defining its own names for the same thing. Most APIs configuring a proxy have some 'route name' - wouldn't be better to just have a route.name - and fill it with whatever is the route name in the API ( even in K8S there are many APIs doing routing, and outside even more, many vendors and languages have multiple sets).

Having a way to identify the LB (which may or may not run in a K8S namespace), and a way to identify the route (without making it specific to K8S HttpRoute) may avoid a lot of name pollution and be more flexible.

@michaelsafyan
Copy link
Contributor Author

adding a requirement that all load balancers can backtrack the original config is a very expensive demand

Just want to clarify that no such requirement is being proposed.

Rather, if this information is available and a vendor providing a load balancer has the ability to and wants to report this information, I am seeking a way where this could be reported in a common/standard format.

Without conventions, any loadbalancer provider wanting to report this information would have to do so in a vendor-specific way, which is not ideal for consistency across different load balancers or different observability vendors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:k8s enhancement New feature or request experts needed This issue or pull request is outside an area where general approvers feel they can approve triage:needs-triage
Projects
Status: No status
Development

No branches or pull requests

5 participants