docs: fix small typos #625

Open

wants to merge 1 commit into base: master

2 changes: 1 addition & 1 deletion latest/bpg/cost/cost_opt_compute.adoc
@@ -98,7 +98,7 @@ Using node groups can help the underlying compute resources do the expected thin

The Cluster Autoscaler can add and remove node capacity from a cluster based on new pods needing to be scheduled or nodes being underutilized. It does not take a wholistic view of pod placement after it has been scheduled to a node. If you are using the Cluster Autoscaler you should also look at the https://github.com/kubernetes-sigs/descheduler[Kubernetes descheduler] to avoid wasting capacity in your cluster.

-If you have 10 nodes in a cluster and each node is 60% utilized you are not using 40% of the provisioned capacity in the cluster. With the Cluster Autoscaler you can set the utilization threashold per node to 60%, but that would only try to scale down a single node after utilization dropped below 60%.
+If you have 10 nodes in a cluster and each node is 60% utilized you are not using 40% of the provisioned capacity in the cluster. With the Cluster Autoscaler you can set the utilization threshold per node to 60%, but that would only try to scale down a single node after utilization dropped below 60%.
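
For reference, this threshold is a Cluster Autoscaler startup flag; a minimal sketch (the 0.6 value mirrors the 60% example above):

----
# Nodes below this utilization fraction become candidates for scale-down
--scale-down-utilization-threshold=0.6
----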

With the descheduler it can look at cluster capacity and utilization after pods have been scheduled or nodes have been added to the cluster. It attempts to keep the total capacity of the cluster above a specified threshold. It can also remove pods based on node taints or new nodes that join the cluster to make sure pods are running in their optimal compute environment. Note that, descheduler does not schedule replacement of evicted pods but relies on the default scheduler for that.
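
As an illustration, a descheduler policy for compacting underutilized nodes could be sketched as follows (this assumes the v1alpha2 policy API; the 60% thresholds are example values, and `HighNodeUtilization` is one of several available strategies):

----
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: default
    pluginConfig:
      # Evict pods from nodes below 60% utilization so they can be packed
      # onto fewer nodes, letting the Cluster Autoscaler remove the rest
      - name: "HighNodeUtilization"
        args:
          thresholds:
            "cpu": 60
            "memory": 60
    plugins:
      balance:
        enabled:
          - "HighNodeUtilization"
----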

4 changes: 2 additions & 2 deletions latest/bpg/networking/monitoring.adoc
@@ -121,7 +121,7 @@ The controller is configured to manage AWS resources in region: "us-east-1"
The ACK controller has been successfully installed and ACK can now be used to provision an Amazon Managed Service for Prometheus workspace.
----
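
For instance, a workspace could be declared with a manifest along these lines (a sketch assuming the ACK `prometheusservice` v1alpha1 API; the name and alias are illustrative):

----
apiVersion: prometheusservice.services.k8s.aws/v1alpha1
kind: Workspace
metadata:
  name: my-amp-workspace       # illustrative name
spec:
  alias: my-amp-workspace      # human-readable alias for the AMP workspace
----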

-Let's now create a yaml file for provisioning the alert manager defnition and rule groups.
+Let's now create a yaml file for provisioning the alert manager definition and rule groups.
Save the below file as `rulegroup.yaml`

----
@@ -250,4 +250,4 @@ image::mon_cw_metrics.png[CW_NW_Performance]
We can clearly see that there were no packets dropped as the value is zero. If you are using Nitro-based instances, you can create a similar dashboard for `conntrack_allowance_available` and pro-actively monitor the connections in your EC2 instance. You can further extend this by configuring alerts in Amazon Managed Grafana to send notifications to Slack, SNS, Pagerduty etc.
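
One way to alert on this pro-actively is a CloudWatch alarm, sketched below (the `CWAgent` namespace and `InstanceId` dimension depend on your CloudWatch agent configuration; the instance ID, threshold, and SNS topic are placeholders):

----
aws cloudwatch put-metric-alarm \
  --alarm-name conntrack-allowance-low \
  --namespace CWAgent \
  --metric-name conntrack_allowance_available \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistic Minimum \
  --period 60 \
  --evaluation-periods 5 \
  --threshold 100000 \
  --comparison-operator LessThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:111122223333:alerts-topic
----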


-📝 https://github.com/aws/aws-eks-best-practices/tree/master/latest/bpg/networking/monitoring.adoc[Edit this page on GitHub]
+📝 https://github.com/aws/aws-eks-best-practices/tree/master/latest/bpg/networking/monitoring.adoc[Edit this page on GitHub]
4 changes: 2 additions & 2 deletions latest/bpg/scalability/control-plane.adoc
@@ -38,7 +38,7 @@ You can remove nodes when they have no running workloads using the scale down th

=== Use pod disruption budgets and safe node shutdown

-Removing pods and nodes from a Kubernetes cluster requires controllers to make updates to multiple resources (e.g. EndpointSlices). Doing this frequently or too quickly can cause API server throttling and application outages as changes propogate to controllers. https://kubernetes.io/docs/concepts/workloads/pods/disruptions/[Pod Disruption Budgets] are a best practice to slow down churn to protect workload availability as nodes are removed or rescheduled in a cluster.
+Removing pods and nodes from a Kubernetes cluster requires controllers to make updates to multiple resources (e.g. EndpointSlices). Doing this frequently or too quickly can cause API server throttling and application outages as changes propagate to controllers. https://kubernetes.io/docs/concepts/workloads/pods/disruptions/[Pod Disruption Budgets] are a best practice to slow down churn to protect workload availability as nodes are removed or rescheduled in a cluster.
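
A minimal PodDisruptionBudget sketch (the label selector and `minAvailable` value are illustrative):

----
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2        # keep at least 2 replicas up during voluntary disruptions
  selector:
    matchLabels:
      app: web           # illustrative label; match your workload's pods
----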

== Use Client-Side Cache when running Kubectl

@@ -314,4 +314,4 @@ If you call the API without any arguments it will be the most resource intensive
----
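
For example (the namespace and label below are hypothetical), scoping list calls keeps them far cheaper than unscoped ones:

----
# Most expensive: an unscoped LIST across every namespace
kubectl get pods --all-namespaces

# Cheaper: restrict the query by namespace and label
kubectl get pods -n my-app -l app=web
----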


-📝 https://github.com/aws/aws-eks-best-practices/tree/master/latest/bpg/scalability/control-plane.adoc[Edit this page on GitHub]
+📝 https://github.com/aws/aws-eks-best-practices/tree/master/latest/bpg/scalability/control-plane.adoc[Edit this page on GitHub]
4 changes: 2 additions & 2 deletions latest/bpg/scalability/quotas.adoc
@@ -210,7 +210,7 @@ We have seen EKS customers impacted by the quotas listed below for other AWS ser

== AWS Request Throttling

-AWS services also implement request throttling to ensure that they remain performant and available for all customers. Simliar to Service Quotas, each AWS service maintains their own request throttling thresholds. Consider reviewing the respective AWS Service documentation if your workloads will need to quickly issue a large number of API calls or if you notice request throttling errors in your application.
+AWS services also implement request throttling to ensure that they remain performant and available for all customers. Similar to Service Quotas, each AWS service maintains their own request throttling thresholds. Consider reviewing the respective AWS Service documentation if your workloads will need to quickly issue a large number of API calls or if you notice request throttling errors in your application.

EC2 API requests around provisioning EC2 network interfaces or IP addresses can encounter request throttling in large clusters or when clusters scale drastically. The table below shows some of the API actions that we have seen customers encounter request throttling from.
You can review the EC2 rate limit defaults and the steps to request a rate limit increase in the https://docs.aws.amazon.com/AWSEC2/latest/APIReference/throttling.html[EC2 documentation on Rate Throttling].
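
A common mitigation is client-side retry with backoff. For the AWS CLI and SDKs this can be enabled through environment variables, as in the sketch below (the values are illustrative):

----
# Adaptive retry mode backs off automatically on throttling errors
export AWS_RETRY_MODE=adaptive
export AWS_MAX_ATTEMPTS=10

# Example EC2 call that can be throttled in large clusters
aws ec2 describe-network-interfaces --max-results 100
----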
@@ -252,4 +252,4 @@ You can review the EC2 rate limit defaults and the steps to request a rate limit
* In EKS environment, etcd storage limit is *8 GiB* as per https://etcd.io/docs/v3.5/dev-guide/limit/#storage-size-limit[upstream guidance]. Please monitor metric `etcd_db_total_size_in_bytes` to track etcd db size. You can refer to https://github.com/etcd-io/etcd/blob/main/contrib/mixin/mixin.libsonnet#L213-L240[alert rules] `etcdBackendQuotaLowSpace` and `etcdExcessiveDatabaseGrowth` to setup this monitoring.
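
A Prometheus alert for this could be sketched as follows (the 7 GiB warning threshold is an assumption, chosen just below the 8 GiB limit):

----
groups:
  - name: etcd-storage
    rules:
      - alert: EtcdDatabaseApproachingSizeLimit
        # Fire when the etcd database exceeds 7 GiB of the 8 GiB EKS limit
        expr: etcd_db_total_size_in_bytes > 7 * 1024 * 1024 * 1024
        for: 10m
        labels:
          severity: warning
----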


-📝 https://github.com/aws/aws-eks-best-practices/tree/master/latest/bpg/scalability/quotas.adoc[Edit this page on GitHub]
+📝 https://github.com/aws/aws-eks-best-practices/tree/master/latest/bpg/scalability/quotas.adoc[Edit this page on GitHub]