
Blog article on "Demystifying activator on the data path" #5709

Merged
merged 18 commits into from
Mar 20, 2024

skonto
Contributor

@skonto skonto commented Oct 11, 2023

  • Introduces a new article on how Knative Pod Autoscaler decides when to add Activator on path and dives into certain concepts of how autoscaling works.
  • Adds an example of a scaled service and follows the steps taken by the autoscaler to deal with the traffic demand.
  • It is a common user question as to when a ksvc is in proxy mode and when it goes to serve mode. I think we need to clarify certain stuff for better adoption and this is a starting article.

@knative-prow knative-prow bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 11, 2023
@netlify

netlify bot commented Oct 11, 2023

Deploy Preview for knative ready!

Built without sensitive environment variables

Name Link
🔨 Latest commit 2214e4b
🔍 Latest deploy log https://app.netlify.com/sites/knative/deploys/65faa883dec52100086d0379
😎 Deploy Preview https://deploy-preview-5709--knative.netlify.app

@knative-prow knative-prow bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 11, 2023
@knative-prow knative-prow bot requested review from nak3 and snneji October 11, 2023 12:51
@knative-prow knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 11, 2023
@skonto skonto requested review from ReToCode, psschwei, dprotaso and matzew and removed request for nak3 and snneji October 11, 2023 12:51
Member

@ReToCode ReToCode left a comment

Quite some comments, feel free to take up what you think is useful. I like the approach and your work on this to make it more clear.


_In this blog post you will learn how to recognize when activator is on the data path and what it triggers that behavior._

A knative service can operate in two modes:proxy mode and serve mode.
Member

nit: Knative
and add a blank after :

Contributor Author

done

When in proxy mode Activator is on the data path and it will stay on path until certain conditions are met.
Member

on the data path (which means the incoming requests are routed through the Activator component)

When in proxy mode Activator is on the data path and it will stay on path until certain conditions are met.
Member

certain conditions (more on this later)

Contributor Author

done.


When these conditions are met Activator is removed from the path and the service transitions to serve mode.
Member

path -> data path

Contributor Author

done

Although it was not always like that when a service scales from/to zero activator is added by default to the path.
Member

path -> data path

Contributor Author

done

```bash
hey -z 600s -c 20 -q 1 -host "autoscale-go.default.example.com" "http://192.168.39.43:32718?sleep=1000"
```

Initially activator when get a request in it sends stats to the autoscaler which tries to
Member

Can you reword this a bit?

Since the pod is not up yet: EBC = floor(0*10 - 19.874 - 10) = -30

Given the new statistics kpa decides to scale to 3 pods.
Member

maybe add, "as one pod is set to a target of 10 requests"

```json
{"severity":"INFO","timestamp":"2023-10-10T15:32:57.241421042Z","logger":"autoscaler","caller":"kpa/scaler.go:370","message":"Scaling from 1 to 3","commit":"f1617ef","knative.dev/controller":"knative.dev.serving.pkg.reconciler.autoscaling.kpa.Reconciler","knative.dev/kind":"autoscaling.internal.knative.dev.PodAutoscaler","knative.dev/traceid":"6dcf87c9-15d8-41d3-95ae-5ca9b3d90705","knative.dev/key":"default/autoscale-go-00001"}
```
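Little's law (average concurrency = arrival rate * latency) offers a rough cross-check between this `hey` load and the concurrency the autoscaler observes. The sketch assumes three things: `-c 20` starts 20 workers, `-q 1` limits each worker to one request per second, and the sample app's `sleep=1000` parameter makes each request take about one second. Under those assumptions, about 20 requests are in flight at any time, which matches the observed panic value of 19.874 used above. The function name is our own.

```python
def expected_concurrency(workers: int, qps_per_worker: float, latency_s: float) -> float:
    """Estimate average in-flight requests for a rate-limited load generator.

    Each hey worker waits for a response before sending its next request,
    so its effective rate is capped by both -q and 1 / latency.
    """
    per_worker_rate = min(qps_per_worker, 1.0 / latency_s)  # requests/s per worker
    arrival_rate = workers * per_worker_rate                 # requests/s overall
    # Little's law: average concurrency = arrival rate * time each request spends in the system
    return arrival_rate * latency_s


# hey -c 20 -q 1 against ?sleep=1000, assuming each request takes about 1 second
print(expected_concurrency(workers=20, qps_per_worker=1, latency_s=1.0))  # 20.0
```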

But let's see why is this so. The log above comes from the multi-scaler which reports
Member

But let's see why this is the case.
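The Python sketch below reproduces the excess burst capacity (EBC) arithmetic. It assumes this formula:

EBC = floor(ready_pods * per_pod_capacity - observed_panic_concurrency - target_burst_capacity)

The floor applies to the whole expression. That placement gives -30 for the no-pod case; flooring only 19.874 would give -29. It also gives 0 for the three-pod case logged later (ObsPanicValue=19.968). The special cases for a target burst capacity of 0 and -1 follow the Knative autoscaling documentation. Function and parameter names are our own, and this is not the Knative Go implementation.

```python
import math


def excess_burst_capacity(ready_pods: int, per_pod_capacity: float,
                          panic_concurrency: float, target_burst_capacity: float) -> float:
    """Sketch of the EBC arithmetic; not the Knative Go implementation."""
    if target_burst_capacity == 0:
        # TBC = 0: EBC is pinned to 0, so the activator is only needed at zero scale.
        return 0.0
    if target_burst_capacity < 0:
        # TBC = -1: EBC stays negative, so the activator is always on the data path.
        return -1.0
    total_capacity = ready_pods * per_pod_capacity
    # The floor applies to the whole difference, not just to the observed value.
    return float(math.floor(total_capacity - panic_concurrency - target_burst_capacity))


# No pod ready yet: floor(0*10 - 19.874 - 10) = floor(-29.874)
print(excess_burst_capacity(0, 10.0, 19.874, 10.0))  # -30.0
# Three pods ready: floor(3*10 - 19.968 - 10) = floor(0.032)
print(excess_burst_capacity(3, 10.0, 19.968, 10.0))  # 0.0
```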

The target value is the utilization in terms of concurrency and that is 0.7*(revision_target).
In this case this is 7. So we have for example for the panic window: ceil(19.874/7)=3

**Note:** if RPS is used then the utilization factor is 0.75.
Member

That might need more explaining
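The utilization step can be made concrete with a minimal sketch of the desired-pod-count arithmetic. It assumes two things: the per-pod target is the revision target multiplied by a utilization factor of 0.7, and the result is rounded up. It ignores the extra handling of corner cases and min/max scale limits, and the names are illustrative only.

```python
import math


def desired_pod_count(observed_concurrency: float, revision_target: float,
                      utilization: float = 0.7) -> int:
    """Pods needed so that each pod runs at revision_target * utilization."""
    per_pod_target = revision_target * utilization  # 10 * 0.7 = 7 concurrent requests
    return math.ceil(observed_concurrency / per_pod_target)


# Panic-window value from the logs: ceil(19.874 / 7) = ceil(2.839...) = 3
print(desired_pod_count(19.874, 10))  # 3
```

With a revision target of 10, each pod is sized for 7 concurrent requests. Roughly 20 in-flight requests therefore need three pods, rather than the two that a full target of 10 would suggest.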

@@ -0,0 +1,182 @@
# Demystifying Activator on path
Member

after reading the full text, it might be good to add the generic concept before going into the details. Something like

The activator acts as a component on the data path to enable traffic buffering when a service is scaled to zero. One lesser-known feature of the activator is that it can act as a request buffer that handles back pressure, with the goal of not overloading a Knative service. For this, a Knative service can define how much traffic it can handle using annotations (link to docs here). The autoscaler component will use this information to calculate the number of pods needed to handle the incoming traffic for a specific Knative service.

Contributor

It might even be good to add some more generic info about the activator to the intro (like why it's needed for scale to zero), or maybe link out to the overview in the serving repo

@ReToCode
Member

/cc @rhuss

@knative-prow knative-prow bot requested a review from rhuss October 11, 2023 13:50
Contributor

@rhuss rhuss left a comment

Thanks @skonto, very important topic! Highly appreciated!

I will do a second round of review later, as I don't want to mix my comments with Reto's. For now, I just added some technical comments, using GitHub's suggestion feature so that proposals can be applied with one click.

Another recommendation I usually follow, to allow seamless reviews of markdown/asciidoc on GitHub, is to write every sentence on a single line without a line break. That makes diffs much easier to understand and works nicely with the suggestion feature.

More on this: https://asciidoctor.org/docs/asciidoc-recommended-practices/#one-sentence-per-line

blog/docs/articles/demystifying-activator-on-path.md (6 outdated, resolved review comments)
@skonto
Contributor Author

skonto commented Oct 13, 2023

Thanks for the review, guys. This is still in draft mode; I will ping you for the second round, as I need to adjust the content a bit and some important parts are still missing.

@skonto skonto marked this pull request as ready for review November 14, 2023 13:18
@knative-prow knative-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 14, 2023
@knative-prow knative-prow bot requested review from nak3 and snneji November 14, 2023 13:18
@skonto skonto changed the title Blog article on "Demystifying activator on path" [wip] Blog article on "Demystifying activator on path" Nov 14, 2023
@knative-prow knative-prow bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 14, 2023
Contributor

@psschwei psschwei left a comment

nice post! will do a more thorough review once it's no longer WIP, but had a couple of general comments on the draft:

@@ -0,0 +1,182 @@
# Demystifying Activator on path
Contributor

It might even be good to add some more generic info about the activator to the intro (like why it's needed for scale to zero), or maybe link out to the overview in the serving repo

When the pod is up we have:

```bash
$ oc get po
```

Contributor

nit: probably should stick with either kubectl or oc for these commands

Contributor Author

Oops :) Yeah, makes sense.

Contributor Author

replaced with kubectl.

```
autoscale-go-00001 Proxy 2 autoscale-go-00001 autoscale-go-00001-private Unknown NoHealthyBackends
```

Let's send some traffic (experiment was run on Minikube):
Contributor

maybe mention that you'll be using hey to send that traffic (with a link to the tool, as some readers might not know what it is)

Comment on lines 143 to 149

```
"severity": "DEBUG",
"timestamp": "2023-10-10T15:29:37.241575214Z",
"logger": "autoscaler",
"caller": "scaling/autoscaler.go:286",
"message": "PodCount=1 Total1PodCapacity=10.000 ObsStableValue=0.000 ObsPanicValue=0.000 TargetBC=10.000 ExcessBC=0.000",
"commit": "f1617ef",
"knative.dev/key": "default/autoscale-go-00001"
```
Contributor

may want to consider filtering out parts here that aren't relevant to what you want to show, as they might obscure the info you want to highlight to the reader

Contributor Author

I cleaned up the log msgs.
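Each of these debug lines carries every input to the EBC calculation. A small helper can therefore extract the `Key=value` pairs from the `message` field and recompute `ExcessBC` as a cross-check. This is a hypothetical helper, not part of Knative. It assumes the space-separated format shown above and a positive `TargetBC`; the 0 and -1 special cases are not handled.

```python
import math


def parse_autoscaler_message(message: str) -> dict[str, float]:
    """Turn 'PodCount=1 Total1PodCapacity=10.000 ...' into {'PodCount': 1.0, ...}."""
    fields = {}
    for pair in message.split():
        key, _, value = pair.partition("=")
        fields[key] = float(value)
    return fields


def recompute_ebc(fields: dict[str, float]) -> float:
    # Assumed formula, valid for a positive TargetBC only:
    # floor(PodCount * Total1PodCapacity - ObsPanicValue - TargetBC)
    total_capacity = fields["PodCount"] * fields["Total1PodCapacity"]
    return float(math.floor(total_capacity - fields["ObsPanicValue"] - fields["TargetBC"]))


msg = ("PodCount=1 Total1PodCapacity=10.000 ObsStableValue=0.000 "
       "ObsPanicValue=0.000 TargetBC=10.000 ExcessBC=0.000")
fields = parse_autoscaler_message(msg)
print(recompute_ebc(fields), fields["ExcessBC"])  # 0.0 0.0
```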

This Pull Request is stale because it has been open for 90 days with
no activity. It will automatically close after 30 more days of
inactivity. Reopen with /reopen. Mark as fresh by adding the
comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 15, 2024
@rhuss
Contributor

rhuss commented Feb 19, 2024

/remove-lifecycle stale

@skonto what is the status of your blog post? Are you still up for publishing it? That would be great as I think this is a very important topic that many people would be interested in.

@knative-prow knative-prow bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 19, 2024
@skonto skonto force-pushed the autoscaler_blog_1 branch from 178a1b0 to fa34bdf Compare March 19, 2024 10:14
@skonto skonto changed the title [wip] Blog article on "Demystifying activator on path" Blog article on "Demystifying activator on path" Mar 19, 2024
@knative-prow knative-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 19, 2024
@skonto
Contributor Author

skonto commented Mar 19, 2024

@skonto what is the status of your blog post? Are you still up for publishing it?

@rhuss I talked with @dprotaso; I will try to publish it this week.

@skonto
Contributor Author

skonto commented Mar 19, 2024

It might even be good to add some more generic info about the activator to the intro (like why it's needed for scale to zero), or maybe link out to the overview in the serving repo

Added a link to the docs we have.

@skonto
Contributor Author

skonto commented Mar 19, 2024

@psschwei @ReToCode @rhuss @dprotaso pls review so we can merge it. It would be nice to have this in, as it covers a few things that some users have asked about recently.

@skonto skonto changed the title Blog article on "Demystifying activator on path" Blog article on "Demystifying activator on the data path" Mar 19, 2024
Member

@ReToCode ReToCode left a comment

other than that, LGTM.

Member

@ReToCode ReToCode left a comment

/lgtm
/approve

/hold not sure if you want others to comment. Otherwise feel free to unhold.

@knative-prow knative-prow bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 20, 2024
@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Mar 20, 2024
knative-prow bot commented Mar 20, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ReToCode, skonto

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@psschwei
Contributor

I'm taking a look now

Contributor

@psschwei psschwei left a comment

This is fantastic! I've left a number of nits / suggestions / minor tweaks in the comments below, but feel free to disregard any/all of them or tweak them as you see fit, they're more editorial suggestions than requested changes.


_In this blog post, you will learn how to recognize when the activator is on the data path and what triggers that behavior._

The [activator](https://github.com/knative/serving/tree/main/docs/scaling#activator) acts as a component on the data path to enable traffic buffering when a service is scaled to zero.
Contributor

nit

Suggested change
The [activator](https://github.com/knative/serving/tree/main/docs/scaling#activator) acts as a component on the data path to enable traffic buffering when a service is scaled to zero.
The [activator](https://github.com/knative/serving/tree/main/docs/scaling#activator) is placed on the data path to enable traffic buffering when a service is scaled to zero.

This reads a little clearer to me

Comment on lines +12 to +17
In detail, when serving traffic, a Knative service can operate in two modes: the `proxy` mode and the `serve` mode.
When in proxy mode, the activator is on the data path (which means the incoming requests are routed through the activator component), and it will stay on the path until certain conditions are met (more on this later).
If these conditions are met, the activator will be removed from the data path, and the service will transition to serve mode.
For example, when a service scales from/to zero, the activator is added to the data path by default.
This default setting often confuses users, as the activator will not be removed from the path unless enough capacity is available.
This is by intention, as one of the activator's roles (as mentioned above) is to offer back pressure capabilities so that a Knative service is not overloaded by incoming traffic.
Contributor

I would reverse the order of these two paragraphs -- the info on proxy/serve modes and activator on/off path is useful background (esp. for folks who don't know much about how the activator works), so I'd cover that first and then go into handling back pressure, etc.
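The proxy/serve decision described in this paragraph can be condensed into a small predicate. The simplified sketch below assumes the two conditions documented for Knative's pod autoscaler. The activator is put on the data path when the revision is (or should be) scaled to zero, or when the excess burst capacity is negative. The function name and arguments are hypothetical, and the real reconciler handles further cases.

```python
def sks_mode(desired_pods: int, excess_burst_capacity: float) -> str:
    """Hypothetical helper: choose the ServerlessService mode for a revision."""
    if desired_pods == 0:
        # Scaled to zero: the activator buffers requests until a pod is ready.
        return "Proxy"
    if excess_burst_capacity < 0:
        # Not enough spare capacity for the target burst: keep the activator
        # on the data path so it can apply back pressure.
        return "Proxy"
    return "Serve"


print(sks_mode(1, -30.0))  # Proxy: no spare capacity while the first pod starts
print(sks_mode(3, 0.0))    # Serve: enough capacity, the activator can leave the path
```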

The default pod autoscaler in Knative (KPA) is a sophisticated algorithm that uses metrics from pods to make scaling decisions.
Let's see in detail what happens when a new Knative service is created.

Once the user creates a new service the corresponding Knative reconciler creates a Knative `Configuration` and a Knative `Route` for that service.
Contributor

maybe also a link to the components spec

Then the Configuration reconciler creates a `Revision` resource and the reconciler for the latter will create a PodAutoscaler(PA) resource along with the K8s deployment for the service.
Contributor

Suggested change
Then the Configuration reconciler creates a `Revision` resource and the reconciler for the latter will create a PodAutoscaler(PA) resource along with the K8s deployment for the service.
Then the Configuration reconciler creates a `Revision` resource and the reconciler for the latter will create a `PodAutoscaler` (PA) resource along with the K8s deployment for the service.
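The resource chain described above can be pictured as a small parent-to-child map. This is only an illustration that restates the text (which reconciler creates what) using plain strings. It is not real Kubernetes objects or client code.

```python
from collections import deque

# Which resources each reconciler creates, as described in the text above.
CREATES = {
    "Service": ["Configuration", "Route"],
    "Configuration": ["Revision"],
    "Revision": ["PodAutoscaler", "Deployment"],
}


def resource_tree_order(root: str) -> list[str]:
    """Breadth-first walk: every resource is listed after the one that creates it."""
    order, queue = [], deque([root])
    while queue:
        kind = queue.popleft()
        order.append(kind)
        queue.extend(CREATES.get(kind, []))
    return order


print(resource_tree_order("Service"))
# ['Service', 'Configuration', 'Route', 'Revision', 'PodAutoscaler', 'Deployment']
```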

Comment on lines +171 to +173
!!! note

The experiment was run on Minikube and the [hey](https://github.com/rakyll/hey) tool was used for generating the traffic.
Contributor

I would move this note to right around the graphs, as that's the first place that results / data are mentioned



Initially activator when receives a request, sends stats to the autoscaler which tries to scale from zero based on some initial scale (default 1):
Contributor

Suggested change
Initially activator when receives a request, sends stats to the autoscaler which tries to scale from zero based on some initial scale (default 1):
Initially when the activator receives a request, it sends stats to the autoscaler which tries to scale from zero based on some initial scale (default 1):

Comment on lines +231 to +232
Roughly the final desired number is (there is more logic that covers corner cases and checking against min/max scale limits)
derived from the dppc we saw earlier.
Contributor

Suggested change
Roughly the final desired number is (there is more logic that covers corner cases and checking against min/max scale limits)
derived from the dppc we saw earlier.
Roughly the final desired number is derived from the dppc we saw earlier (there is more logic that covers corner cases and checking against min/max scale limits).

In this case the target value is 0.7*10=7. So we have for example for the panic window: dppc=ceil(19.874/7)=3
Contributor

Suggested change
In this case the target value is 0.7*10=7. So we have for example for the panic window: dppc=ceil(19.874/7)=3
In this case the target value is 0.7*10=7. So we have for example for the panic window: `dppc=ceil(19.874/7)=3`

Comment on lines +252 to +259
Then when we reach the required pod count and metrics are stable we get EBC=floor(3*10 - 19.968 - 10)=0:

```
"timestamp": "2023-10-10T15:33:59.24118625Z",
"logger": "autoscaler",
"message": "PodCount=3 Total1PodCapacity=10.000 ObsStableValue=19.602 ObsPanicValue=19.968 TargetBC=10.000 ExcessBC=0.000",
```

Contributor

Suggested change
Then when we reach the required pod count and metrics are stable we get EBC=floor(3*10 - 19.968 - 10)=0:
```
"timestamp": "2023-10-10T15:33:59.24118625Z",
"logger": "autoscaler",
"message": "PodCount=3 Total1PodCapacity=10.000 ObsStableValue=19.602 ObsPanicValue=19.968 TargetBC=10.000 ExcessBC=0.000",
```
Then when we reach the required pod count and metrics are stable we get:

"timestamp": "2023-10-10T15:33:59.24118625Z",
"logger": "autoscaler",
"message": "PodCount=3 Total1PodCapacity=10.000 ObsStableValue=19.602 ObsPanicValue=19.968 TargetBC=10.000 ExcessBC=0.000",


EBC=floor(3*10 - 19.968 - 10)=0

Contributor

the formatting is weird here due to the multiple blocks, but general idea was to move the EBC calc after the logs, to make the structure the same as the other sections

@skonto
Contributor Author

skonto commented Mar 20, 2024

@psschwei thanks for the comments. I will unhold so that @dprotaso can reference the blog post in his KubeCon talk, and then I will polish it in a follow-up PR shortly.

/unhold

@knative-prow knative-prow bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 20, 2024
@knative-prow knative-prow bot merged commit 97d53f0 into knative:main Mar 20, 2024
19 checks passed
prushh pushed a commit to prushh/docs that referenced this pull request Apr 30, 2024
* demystifying activator on path

* some fixes

* Apply suggestions from code review - Roland

Co-authored-by: Roland Huß <[email protected]>

* Update blog/docs/articles/demystifying-activator-on-path.md

Co-authored-by: Roland Huß <[email protected]>

* update

* fix

* add some diagrams

* add timeline and several fixes

* updates

* address some comments

* add ref to the sample app

* grammar, typos and other fixes

* more grammar fixes

* change personal link

* make title specific

* more fixes

* explain a bit what app was used for tests

* address comments

---------

Co-authored-by: Roland Huß <[email protected]>
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
4 participants