VPA + HPA Together: The Right Way to Autoscale Both

K8sSRE

VPA adjusts pod resources (CPU/memory requests and limits). HPA adjusts pod count. Using them together is tricky: if both react to the same metric, they fight over it. Here’s how to make them cooperate.

TL;DR

  • VPA = vertical scaling (resource requests/limits)
  • HPA = horizontal scaling (replica count)
  • Don’t let both scale on CPU
  • VPA for memory, HPA for CPU (recommended)
  • Or use VPA in recommendation-only mode

The Problem

HPA: "CPU is high, add more replicas"
VPA: "CPU is high, increase CPU requests"

Both trigger → pods get more CPU AND more replicas
→ Massive over-provisioning

Worse, HPA computes utilization as usage ÷ request, so every time VPA rewrites the CPU request it changes HPA's denominator. The two controllers feed back into each other and never settle.
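Before picking a fix, it helps to know whether any workload is already targeted by both controllers. A minimal detection sketch in plain shell — in a live cluster the two name lists would come from `kubectl`, as the comments suggest; `overlap` is just a name for this sketch:

```shell
# Sketch: find workloads targeted by BOTH a VPA and an HPA. The two name lists
# would come from the cluster, e.g.:
#   vpas=$(kubectl get vpa -A -o jsonpath='{.items[*].spec.targetRef.name}')
#   hpas=$(kubectl get hpa -A -o jsonpath='{.items[*].spec.scaleTargetRef.name}')
overlap() {
  # Word-split both lists onto separate lines; names on both lists (duplicates
  # after sorting) are the potential conflicts.
  printf '%s\n' $1 $2 | sort | uniq -d
}

overlap "api-server queue-worker" "api-server frontend"  # prints: api-server
```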

Solution 1: Split by Metric

VPA scales memory, HPA scales on CPU:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: Auto  # Applies changes by evicting and recreating pods
  resourcePolicy:
    containerPolicies:
      - containerName: api
        controlledResources: ["memory"]  # Only memory
        minAllowed:
          memory: 128Mi
        maxAllowed:
          memory: 4Gi

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu  # Only CPU
        target:
          type: Utilization
          averageUtilization: 70

Solution 2: VPA Recommendations Only

VPA in “Off” mode provides recommendations without acting:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"  # Recommendations only; keep the quotes, bare Off parses as a YAML boolean
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Apply VPA recommendations during maintenance windows:

#!/bin/bash
# apply-vpa-recommendations.sh <vpa-name> <namespace>
set -euo pipefail

VPA_NAME=$1
NAMESPACE=$2

# The Deployment name comes from the VPA's targetRef -- don't assume it
# matches the VPA name.
TARGET=$(kubectl get vpa "$VPA_NAME" -n "$NAMESPACE" \
  -o jsonpath='{.spec.targetRef.name}')

REC=$(kubectl get vpa "$VPA_NAME" -n "$NAMESPACE" \
  -o jsonpath='{.status.recommendation.containerRecommendations[0].target}')

CPU=$(echo "$REC" | jq -r '.cpu')
MEMORY=$(echo "$REC" | jq -r '.memory')

echo "Recommended for $TARGET: cpu=$CPU, memory=$MEMORY"

kubectl patch deployment "$TARGET" -n "$NAMESPACE" --type=json -p="[
  {\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/resources/requests/cpu\", \"value\": \"$CPU\"},
  {\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/resources/requests/memory\", \"value\": \"$MEMORY\"}
]"
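One wrinkle when reading the output: VPA reports targets in Kubernetes quantity notation, so memory often comes back as decimal kilobytes (e.g. `262144k` rather than `250Mi`). The raw value is still valid in a patch, but for human-readable logs a small converter helps. A sketch covering just the common suffixes — `to_mi` is hypothetical, and the integer math truncates:

```shell
# Convert a Kubernetes memory quantity to Mi for display. Note the units:
# "k" is decimal (1000 bytes) while Ki/Mi/Gi are binary powers of 1024.
to_mi() {
  case "$1" in
    *Gi) echo "$(( ${1%Gi} * 1024 ))Mi" ;;
    *Mi) echo "$1" ;;
    *Ki) echo "$(( ${1%Ki} / 1024 ))Mi" ;;            # truncates
    *k)  echo "$(( (${1%k} * 1000) / 1048576 ))Mi" ;;  # truncates
    *)   echo "$1" ;;                                  # leave unknown forms as-is
  esac
}

to_mi 262144k   # prints: 250Mi
to_mi 2Gi       # prints: 2048Mi
```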

Solution 3: KEDA with Custom Metrics

Use KEDA for event-driven scaling, VPA for resources:

# VPA for resources
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: worker-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  updatePolicy:
    updateMode: Auto

---
# KEDA for queue-based scaling
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: queue-worker
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
    - type: rabbitmq
      metadata:
        host: amqp://rabbitmq.default.svc:5672  # credentials usually come via a TriggerAuthentication
        queueName: jobs
        queueLength: "10"  # target messages per replica

Solution 4: Goldilocks

Goldilocks runs VPA in recommendation mode and provides a dashboard:

helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm upgrade --install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks --create-namespace

# Label namespace to enable
kubectl label ns production goldilocks.fairwinds.com/enabled=true

Best Practices

WORKLOAD TYPE           RECOMMENDATION
=============           ==============
Stateless API           HPA on CPU, VPA on memory
Batch/Workers           KEDA on queue depth, VPA on both
Memory-intensive        VPA on memory, HPA on custom metric
GPU                     VPA Off (fixed resources)

Configuration Matrix

# CPU-bound (web servers)
VPA: memory only, Auto mode
HPA: cpu at 70%

# Memory-bound (caches, JVM)
VPA: memory only, Auto mode
HPA: custom metric (requests/sec)

# Queue workers
VPA: both, Auto mode
KEDA: queue length

# Mixed workloads
VPA: Off mode (recommendations)
HPA: cpu at 70%
Apply VPA recommendations weekly
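For the "apply weekly" step, one option is to run the script from a CronJob inside the maintenance window. A sketch only — the image, ServiceAccount, and the ConfigMap carrying apply-vpa-recommendations.sh are assumptions, and the RBAC behind them is not shown:

# Sketch: weekly application of VPA recommendations
apiVersion: batch/v1
kind: CronJob
metadata:
  name: apply-vpa-recommendations
spec:
  schedule: "0 4 * * 0"  # Sunday 04:00, inside the maintenance window
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: vpa-applier  # assumed; needs get on vpa, patch on deployments
          restartPolicy: Never
          containers:
            - name: apply
              image: bitnami/kubectl:latest  # assumed image with kubectl (script also needs jq)
              command: ["/scripts/apply-vpa-recommendations.sh", "api-server-vpa", "production"]
              volumeMounts:
                - name: scripts
                  mountPath: /scripts
          volumes:
            - name: scripts
              configMap:
                name: vpa-scripts  # assumed ConfigMap holding the script
                defaultMode: 0755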

Monitoring

# Prometheus rules
groups:
  - name: autoscaling
    rules:
      - alert: VPARecommendationDrift
        expr: |
          abs(
            kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target{resource="cpu"}
            -
            kube_pod_container_resource_requests{resource="cpu"}
          ) / kube_pod_container_resource_requests{resource="cpu"} > 0.5
        for: 24h
        labels:
          severity: info
        annotations:
          summary: "VPA recommendation differs >50% from current requests"

      - alert: HPAAtMaxReplicas
        expr: kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_spec_max_replicas
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.horizontalpodautoscaler }} at max replicas"
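The drift alert compares a ratio, not an absolute gap, so a 50m difference matters far more against a 100m request than against a 2-core one. The same arithmetic in plain shell, with illustrative values — `drift_fires` is just a name for this sketch:

```shell
# Same check as the VPARecommendationDrift alert:
# |recommended - requested| / requested > 0.5. Units cancel, so millicores,
# cores, or bytes all work as long as both arguments use the same unit.
drift_fires() {
  awk -v rec="$1" -v req="$2" 'BEGIN {
    d = rec - req; if (d < 0) d = -d
    if (d / req > 0.5) print "fires"; else print "ok"
  }'
}

drift_fires 150 100   # ok    (exactly 50%, not strictly greater)
drift_fires 250 100   # fires (150% above the request)
```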

Full Example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3  # Will be managed by HPA
  template:
    spec:
      containers:
        - name: api
          resources:
            requests:
              cpu: 100m     # Static baseline; HPA scales replicas on CPU utilization
              memory: 256Mi # Will be managed by VPA
            limits:
              memory: 512Mi # Scaled proportionally by VPA

---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
      - containerName: api
        controlledResources: ["memory"]
        minAllowed:
          memory: 128Mi
        maxAllowed:
          memory: 2Gi

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 30
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
    scaleUp:
      stabilizationWindowSeconds: 0
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Right-size resources. Scale replicas. Together.
