Horizontal Pod Autoscaling (HPA) in Azure AKS

Hey folks!! It has been a couple of months since I could do a write-up on a topic. Having been closely watching the rise and rise of Kubernetes-based solutions of late, I found the autoscaling concepts pretty exciting.

There are three different ways of scaling in Kubernetes:

  1. Horizontal Pod Autoscaler (HPA) – scales the number of pods based on observed metrics such as CPU utilization
  2. Cluster Autoscaler – scales the number of nodes in the cluster based on pending pods
  3. Vertical Pod Autoscaler (VPA) – automatically adjusts pod resource requests based on actual usage

The Horizontal Pod Autoscaler (HPA) automatically scales the number of pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with custom metrics support, on other application-provided metrics).
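As a quick sketch, the same behaviour can be created imperatively with kubectl; the deployment name `webmvc` here matches the service we scale later, but any deployment with resource requests would do:

```shell
# Imperative equivalent of an HPA manifest (sketch; assumes a
# deployment named "webmvc" already exists in the current namespace
# and that metrics-server is running in the cluster).
kubectl autoscale deployment webmvc --cpu-percent=50 --min=1 --max=5

# Inspect the resulting autoscaler object.
kubectl get hpa webmvc
```

In this post, though, we will define the HPA declaratively through the Helm chart so it is versioned with the rest of the deployment.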

For the scope of this blog, let's look at how HPA works on Azure AKS. To demonstrate it practically, I have used the publicly available e-commerce reference application eShopOnContainers. eShopOnContainers is a microservice-based e-commerce website deployed on Docker containers and can be tested on Linux- or Windows-based virtual machines.

Below are the key choices made when deploying the application on Azure AKS.

  • Helm-chart-based application deployment
  • NGINX ingress containers as API gateways / load balancer
  • Grafana and Prometheus on containers for monitoring resource usage
  • SQL Server and Redis on containers for the data store and cart management
  • Linux-based Azure VMs (DS2 v2) for both system and worker nodes
Credits to the original reference architecture of eShopOnContainers, extended here with the monitoring components.

Below are the steps I followed to deploy the AKS cluster:

  1. Create an Azure AKS cluster with Kubernetes version 1.18.10
  2. Create separate namespaces for the application and for monitoring
  3. Set up the AKS cluster as per the instructions here https://github.com/dotnet-architecture/eShopOnContainers/wiki/Deploy-to-Azure-Kubernetes-Service-(AKS)#create-the-kubernetes-cluster-in-aks
  4. Deploy the public images from Docker Hub here https://github.com/dotnet-architecture/eShopOnContainers/wiki/Deploy-to-Azure-Kubernetes-Service-(AKS)#install-eshoponcontainers-using-helm
  5. Install Grafana using the CLI script below
$dnsName = $(az aks show  --name az-sea-aks-cluster --resource-group az-sea-aks-eshop-rg --query 'addonProfiles.httpApplicationRouting.config.HTTPApplicationRoutingZoneName' -o tsv)

helm install stable/grafana --generate-name --set "service.type=LoadBalancer,persistence.enabled=true,persistence.size=10Gi,persistence.accessModes[0]=ReadWriteOnce,plugins=grafana-azure-monitor-datasource\,grafana-kubernetes-app,ingress.enabled=true,ingress.annotations.kubernetes\.io/ingress\.class=addon-http-application-routing,ingress.hosts[0]=grafana.$dnsName" --namespace monitoring

6. Install Prometheus using the CLI script below

helm install stable/prometheus --namespace monitoring --set rbac.create=true --generate-name
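A quick sanity check that the monitoring stack came up (since both charts were installed with --generate-name, the release names carry a generated suffix — use helm list to find them):

```shell
# Sketch: confirm the Grafana and Prometheus pods are running
# in the monitoring namespace created earlier.
kubectl get pods --namespace monitoring

# List the Helm releases to find the generated release names.
helm list --namespace monitoring
```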

Now comes the fun part: implementing HPA! I have chosen the webmvc container, originally deployed as part of steps 3 and 4 above, to implement HPA. Below are the points to note before implementing HPA:

  • Ensure all pods have resource requests specified
  • Install metrics-server (already installed above as part of the cluster setup)
  • Configure custom or external metrics if needed (the CPU metric here is configured in values.yaml)
  • Configure a cool-down period (set the stabilizationWindowSeconds flag in hpa.yaml)

Steps to deploy HPA

Create a template YAML file for the HPA (hpa.yaml) and save it in the .\deploy\k8s\helm\webmvc\templates folder:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "webmvc.fullname" . }}
  labels:
    app: {{ template "webmvc.name" . }}
    chart: {{ template "webmvc.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "webmvc.fullname" . }}
  minReplicas: {{ .Values.hpa.minReplicas }}
  maxReplicas: {{ .Values.hpa.maxReplicas }}
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: 
          type: Utilization
          averageUtilization: {{ .Values.hpa.targetCPUUtilizationPercentage }}
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300

Add the section below to the values.yaml file at .\deploy\k8s\helm\webmvc to declaratively set the CPU limits and requests. Requests and limits are the mechanisms Kubernetes uses to control resources such as CPU and memory. A request is what the container is guaranteed to get: if a container requests a resource, Kubernetes will only schedule it on a node that can provide that resource. A limit, on the other hand, ensures a container never goes above a certain value; the container is allowed to use resources up to the limit and is then restricted.

hpa:
  enabled: true
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50

resources: 
  limits: 
    cpu: 250m
    memory: 256Mi
  requests: 
    cpu: 100m
    memory: 128Mi
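With the new template and values in place, the chart needs to be re-deployed for the HPA to be created. A sketch of that step — the release name `eshop-webmvc` is an assumption here; use `helm list` to find the actual release name in your cluster:

```shell
# Re-deploy the webmvc chart so the new hpa.yaml template and the
# hpa/resources values take effect. Release name is illustrative.
helm upgrade eshop-webmvc .\deploy\k8s\helm\webmvc

# Confirm the HPA object now exists and has picked up its targets.
kubectl get hpa
```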

Since targetCPUUtilizationPercentage is set to 50%, once the observed CPU utilization crosses 50% the scale-out kicks in, creating pods up to the maxReplicas value set above.
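The replica count the HPA targets follows the standard formula desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped between minReplicas and maxReplicas. A quick sketch with illustrative numbers and the 50% target above:

```shell
# HPA scaling formula: desired = ceil(current_replicas * current / target),
# then clamped to [minReplicas, maxReplicas]. Numbers are illustrative.
current_replicas=2
current_cpu=120   # observed average CPU utilization, percent
target_cpu=50     # targetCPUUtilizationPercentage from values.yaml
max_replicas=5

# Integer ceiling division: ceil(2 * 120 / 50) = ceil(4.8) = 5.
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))

# Clamp to maxReplicas.
if [ "$desired" -gt "$max_replicas" ]; then desired=$max_replicas; fi
echo "$desired"   # 5
```

So a sustained spike well above the 50% target drives the deployment straight to its maxReplicas ceiling, which is exactly what we observe in the load test below.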

Now, how do we monitor the pod autoscaling? Using Grafana dashboards.

I log in to Grafana (I will have a separate blog post detailing the setup of Grafana dashboards for AKS); briefly, I used the pre-built dashboard here.

Cluster statistics in the Grafana dashboard
The webmvc pod has been filtered for clarity

I use Apache JMeter to perform a load test on the website deployed on AKS (with the /webmvc prefix). Here are the results: within the next 5 minutes, the webmvc container gets scaled out to 5 pods, as shown below.

Within about 10 minutes of the load test completing, the scale-down behaviour policy specified in hpa.yaml via stabilizationWindowSeconds kicks in and scales the pods down from 5 to 1.
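Besides Grafana, the same scale-out and scale-in can be watched live from the CLI; a sketch (the `app=webmvc` label selector is an assumption based on the chart's label template):

```shell
# Watch the HPA react during and after the load test: REPLICAS should
# climb toward 5 under load and fall back to 1 once the 300-second
# stabilization window has expired.
kubectl get hpa --watch

# Point-in-time view of the webmvc pods (label selector assumed
# from the chart's "app" label template).
kubectl get pods -l app=webmvc
```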

So, effectively, HPA is a sound scaling solution if you declaratively specify resource requests for the various microservices that make up your application, and you can manage and monitor the scale-out and scale-in of pods easily through tools such as Grafana and Prometheus.

Happy Learning!!
