Horizontal Pod Autoscaling (HPA) in Azure AKS

Hey folks!! It's been a couple of months since I could do a write-up on a topic. Having been closely watching the rise and rise of Kubernetes-based solutions of late, I found the autoscaling concepts pretty exciting.
There are three different ways of scaling in Kubernetes:
- Horizontal Pod Autoscaler (HPA) – scales the number of pods based on observed resource utilization or custom metrics
- Cluster Autoscaler – scales the number of nodes in the cluster
- Vertical Pod Autoscaler (VPA) – automatically adjusts the resource requests and limits of pods based on their actual usage
The Horizontal Pod Autoscaler (HPA) automatically scales the number of pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metric).
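As a quick aside, an HPA can also be created imperatively with kubectl; the command below is a throwaway sketch of the same policy that the rest of this post builds out declaratively (webmvc is the deployment we will target later):

kubectl autoscale deployment webmvc --cpu-percent=50 --min=1 --max=5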
For the scope of this blog, let's have a look at how HPA works on Azure AKS. To show practically how HPA works on AKS, I have used the publicly available e-commerce reference application eShopOnContainers. eShopOnContainers is a microservice-based e-commerce website deployed on Docker containers that can be tested on Linux or Windows based virtual machines.
Below are the key considerations based on which the application was deployed on Azure AKS:
- Helm chart based application deployment
- NGINX ingress containers acting as the API gateway/load balancer
- Grafana and Prometheus running on containers to monitor resource usage
- SQL Server and Redis on containers for the data store and cart management
- Linux-based Azure VMs (DS2 v2) for both the system and worker nodes

Below are the steps I followed to deploy the AKS cluster:
1. Provision an Azure AKS cluster with Kubernetes version 1.18.10.
2. Create separate namespaces for the application and for monitoring.
3. Create the Kubernetes cluster in AKS as per the instructions here: https://github.com/dotnet-architecture/eShopOnContainers/wiki/Deploy-to-Azure-Kubernetes-Service-(AKS)#create-the-kubernetes-cluster-in-aks
4. Deploy the public images from Docker Hub as described here: https://github.com/dotnet-architecture/eShopOnContainers/wiki/Deploy-to-Azure-Kubernetes-Service-(AKS)#install-eshoponcontainers-using-helm
5. Install Grafana using the CLI script below:

$dnsName = $(az aks show --name az-sea-aks-cluster --resource-group az-sea-aks-eshop-rg --query 'addonProfiles.httpApplicationRouting.config.HTTPApplicationRoutingZoneName' -o tsv)
helm install stable/grafana --generate-name --set "service.type=LoadBalancer,persistence.enabled=true,persistence.size=10Gi,persistence.accessModes[0]=ReadWriteOnce,plugins=grafana-azure-monitor-datasource\,grafana-kubernetes-app,ingress.enabled=true,ingress.annotations.kubernetes\.io/ingress\.class=addon-http-application-routing,ingress.hosts[0]=grafana.$dnsName" --namespace monitoring

6. Install Prometheus using the CLI script below:

helm install stable/prometheus --namespace monitoring --set rbac.create=true --generate-name
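Once Grafana and Prometheus are installed, a quick sanity check confirms that everything in the monitoring namespace came up correctly:

kubectl get pods --namespace monitoring
kubectl get svc --namespace monitoring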
Now comes the fun part: implementing HPA! I have chosen the webmvc container, originally deployed as part of steps 3 and 4 above, to implement HPA. Below are the points to note before implementing HPA:
- Ensure all pods have resource requests specified
- Install metrics-server (AKS deploys this by default, so it is already available in the cluster; see the verification commands after this list)
- Configure custom or external metrics if required (here the metric is configured in values.yaml)
- Configure a cool-down period (the stabilizationWindowSeconds setting in hpa.yaml)
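You can confirm that metrics-server is running and serving pod metrics with the commands below (replace the placeholder with whatever application namespace you created in step 2):

kubectl get deployment metrics-server --namespace kube-system
kubectl top pods --namespace <application-namespace>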
Steps to deploy HPA
Create a template YAML file for the HPA (hpa.yaml) and save it in the .\deploy\k8s\helm\webmvc\templates folder:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "webmvc.fullname" . }}
  labels:
    app: {{ template "webmvc.name" . }}
    chart: {{ template "webmvc.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "webmvc.fullname" . }}
  minReplicas: {{ .Values.hpa.minReplicas }}
  maxReplicas: {{ .Values.hpa.maxReplicas }}
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.hpa.targetCPUUtilizationPercentage }}
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
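Since values.yaml (added below) exposes an hpa.enabled flag, a common Helm pattern (optional here, and a sketch rather than part of the original chart) is to wrap the whole template in a conditional so the HPA object is only rendered when the flag is true:

{{- if .Values.hpa.enabled }}
# ... the HorizontalPodAutoscaler manifest from above ...
{{- end }}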
Add the below section to the values.yaml file in .\deploy\k8s\helm\webmvc to declaratively set the CPU limits and requests. Requests and limits are the mechanisms Kubernetes uses to control resources such as CPU and memory. A request is what the container is guaranteed to get: Kubernetes will only schedule a pod onto a node that can provide the resources it requests. A limit, on the other hand, makes sure a container never goes above a certain value; the container is only allowed to go up to the limit and is then restricted (CPU usage is throttled, while exceeding the memory limit gets the container terminated).
hpa:
  enabled: true
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50

resources:
  limits:
    cpu: 250m
    memory: 256Mi
  requests:
    cpu: 100m
    memory: 128Mi
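Note that these resources values only take effect if the chart's deployment template actually references them; verify this against the webmvc chart's own deployment.yaml. As a sketch, the standard Helm pattern looks like this (indentation depends on the template):

# excerpt from .\deploy\k8s\helm\webmvc\templates\deployment.yaml (sketch)
      containers:
        - name: {{ .Chart.Name }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}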
Since targetCPUUtilizationPercentage is set to 50%, once the average CPU utilization of the webmvc pods crosses 50% of their requested CPU, scaling kicks in and creates additional pods, up to the maxReplicas value above.
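Under the hood, the HPA controller computes the desired replica count from the ratio of current to target utilization (this is the documented Kubernetes algorithm):

desiredReplicas = ceil( currentReplicas × currentUtilization / targetUtilization )

For example, if the single webmvc pod is pinned at its 250m CPU limit, its utilization relative to the 100m request is 250%, so the HPA asks for ceil(1 × 250 / 50) = 5 replicas, which is exactly the maxReplicas cap set above.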
Now, how do we monitor the pod autoscaling? A quick check is possible from the CLI, and for richer visualization there are Grafana dashboards.
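From the CLI, the HPA and the pod count can be watched live with the commands below (the app=webmvc label selector is an assumption based on the chart labels in the template above; adjust the namespace to yours):

kubectl get hpa --namespace <application-namespace> --watch
kubectl get pods --namespace <application-namespace> -l app=webmvc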
For the dashboards, I logged in to Grafana (I will write a separate blog post detailing the setup of Grafana dashboards for AKS); briefly, I used the pre-built dashboard here.


I used Apache JMeter to perform a load test on the website deployed on AKS (under the /webmvc prefix). Here are the results: within the next 5 minutes, the webmvc deployment scales out to 5 pods, as shown below.

Within a span of 10 minutes after the load test completes, the scale-down behaviour policy specified in hpa.yaml via stabilizationWindowSeconds kicks in and scales the pods back down from 5 to 1.

So, effectively, the HPA is a solid scaling solution if you have a declarative way of specifying resource requests for the various microservices that make up your application, and the scale-up and scale-down of pods can be managed and monitored easily through tools such as Grafana and Prometheus.
Happy Learning!!