VPA + HPA Together: The Right Way to Autoscale Both
VPA adjusts pod resources (CPU/memory requests and limits); HPA adjusts pod count. Run them together naively and both react to the same CPU signal and fight each other. Here's how to make them cooperate.
TL;DR
- VPA = vertical scaling (resource requests/limits)
- HPA = horizontal scaling (replica count)
- Don’t let both scale on CPU
- VPA for memory, HPA for CPU (recommended)
- Or use VPA in recommendation-only mode
The Problem
HPA: "CPU is high, add more replicas"
VPA: "CPU is high, increase CPU requests"
Both trigger → pods get more CPU AND more replicas
→ Massive over-provisioning
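A back-of-the-envelope sketch makes the magnitude concrete. The numbers below are illustrative assumptions, not measurements from a real cluster; the replica formula is HPA's documented `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`:

```python
import math

# Illustrative starting point (assumed numbers, not from a real cluster)
replicas = 4
cpu_request_per_pod_m = 500      # millicores requested per pod
target_utilization_pct = 70
observed_utilization_pct = 140   # CPU usage is 2x the target

# HPA's formula: desiredReplicas = ceil(current * observed / target)
hpa_replicas = math.ceil(replicas * observed_utilization_pct / target_utilization_pct)

# Meanwhile VPA, seeing the same spike, might double the per-pod request
vpa_request_m = cpu_request_per_pod_m * 2

before_m = replicas * cpu_request_per_pod_m   # provisioned CPU before the spike
after_m = hpa_replicas * vpa_request_m        # provisioned CPU after BOTH react
print(f"HPA alone would run {hpa_replicas} pods")
print(f"provisioned CPU: {before_m}m -> {after_m}m ({after_m // before_m}x)")
```

Each controller's reaction alone would have been enough; together they multiply, here 4x the original provisioned CPU for 2x the load.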
Solution 1: Split by Metric
VPA scales memory, HPA scales on CPU:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
    - containerName: api
      controlledResources: ["memory"]  # Only memory
      minAllowed:
        memory: 128Mi
      maxAllowed:
        memory: 4Gi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu  # Only CPU
      target:
        type: Utilization
        averageUtilization: 70
```
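With the metrics split like this, CPU requests stay fixed while HPA reconciles, so the feedback loop converges. A minimal simulation of repeated HPA reconciles (assumed steady load; ignores HPA's ~10% tolerance band for simplicity) shows the replica count settling rather than ratcheting:

```python
import math

def hpa_step(replicas: int, total_load_m: int, request_m: int, target_pct: int,
             min_r: int = 2, max_r: int = 20) -> int:
    """One HPA reconcile: desired = ceil(current * utilization / target)."""
    utilization_pct = 100 * total_load_m / (replicas * request_m)
    desired = math.ceil(replicas * utilization_pct / target_pct)
    return max(min_r, min(max_r, desired))

# CPU requests are stable because VPA only manages memory here (assumption)
replicas, request_m = 2, 500   # 2 pods x 500m requested
total_load_m = 1400            # steady incoming CPU demand in millicores

for _ in range(5):
    replicas = hpa_step(replicas, total_load_m, request_m, target_pct=70)
print(replicas)  # settles at a stable replica count
```

If VPA were also raising CPU requests each round, `request_m` would change under HPA's feet and the two controllers would chase each other instead of converging.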
Solution 2: VPA Recommendations Only
VPA in “Off” mode provides recommendations without acting:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"  # Recommendations only
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```
Apply VPA recommendations during maintenance windows:
```bash
#!/bin/bash
# apply-vpa-recommendations.sh <vpa-name> <deployment-name> <namespace>
# The VPA and Deployment names differ (e.g. api-server-vpa vs api-server),
# so both must be passed explicitly.
set -euo pipefail

VPA_NAME="$1"
DEPLOYMENT="$2"
NAMESPACE="$3"

# Read the target recommendation for the first container
REC=$(kubectl get vpa "$VPA_NAME" -n "$NAMESPACE" \
  -o jsonpath='{.status.recommendation.containerRecommendations[0].target}')
CPU=$(echo "$REC" | jq -r '.cpu')
MEMORY=$(echo "$REC" | jq -r '.memory')
echo "Recommended: cpu=$CPU, memory=$MEMORY"

kubectl patch deployment "$DEPLOYMENT" -n "$NAMESPACE" --type=json -p="[
  {\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/resources/requests/cpu\", \"value\": \"$CPU\"},
  {\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/resources/requests/memory\", \"value\": \"$MEMORY\"}
]"
```
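The same extraction the script does with jq can be sketched in Python against a canned payload. The JSON shape matches `.status.recommendation.containerRecommendations[0].target`; the quantity values themselves are made up for illustration:

```python
import json

# Sample VPA recommendation target (values assumed; VPA emits Kubernetes
# quantities such as "587m" for CPU and "262144k" for memory)
status_json = '{"cpu": "587m", "memory": "262144k"}'

target = json.loads(status_json)

# Build the same JSON Patch the shell script assembles by hand
patch = [
    {"op": "replace",
     "path": "/spec/template/spec/containers/0/resources/requests/cpu",
     "value": target["cpu"]},
    {"op": "replace",
     "path": "/spec/template/spec/containers/0/resources/requests/memory",
     "value": target["memory"]},
]
print(json.dumps(patch, indent=2))
```

Building the patch as a data structure and serializing it avoids the escaped-quote churn of inlining JSON in shell.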
Solution 3: KEDA with Custom Metrics
Use KEDA for event-driven scaling, VPA for resources:
```yaml
# VPA for resources
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: worker-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  updatePolicy:
    updateMode: Auto
---
# KEDA for queue-based scaling
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: queue-worker
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
  - type: rabbitmq
    metadata:
      host: amqp://rabbitmq.default.svc:5672
      queueName: jobs
      queueLength: "10"
```
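As a rough approximation of what this trigger does (KEDA feeds the queue metric to an HPA it manages, so the exact math runs through the HPA controller), the target is about one replica per `queueLength` pending messages, clamped to the configured bounds:

```python
import math

def keda_desired_replicas(queue_length: int, per_replica: int = 10,
                          min_r: int = 1, max_r: int = 50) -> int:
    """Approximation of a queueLength trigger: one replica per
    `per_replica` pending messages, clamped to min/maxReplicaCount."""
    return max(min_r, min(max_r, math.ceil(queue_length / per_replica)))

print(keda_desired_replicas(0))    # minReplicaCount floor
print(keda_desired_replicas(95))
print(keda_desired_replicas(900))  # maxReplicaCount ceiling
```

Because the trigger metric is queue depth, not CPU, VPA can safely manage both CPU and memory here without creating a feedback loop.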
Solution 4: Goldilocks
Goldilocks runs VPA in recommendation mode and provides a dashboard:
```bash
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm upgrade --install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks --create-namespace

# Label namespace to enable
kubectl label ns production goldilocks.fairwinds.com/enabled=true
```
Best Practices
```
WORKLOAD TYPE      RECOMMENDATION
=============      ==============
Stateless API      HPA on CPU, VPA on memory
Batch/Workers      KEDA on queue depth, VPA on both
Memory-intensive   VPA on memory, HPA on custom metric
GPU                VPA Off (fixed resources)
```
Configuration Matrix
```
# CPU-bound (web servers)
VPA: memory only, Auto mode
HPA: cpu at 70%

# Memory-bound (caches, JVM)
VPA: memory only, Auto mode
HPA: custom metric (requests/sec)

# Queue workers
VPA: both, Auto mode
KEDA: queue length

# Mixed workloads
VPA: Off mode (recommendations)
HPA: cpu at 70%
# Apply VPA recommendations weekly
```
Monitoring
```yaml
# Prometheus rules
groups:
- name: autoscaling
  rules:
  - alert: VPARecommendationDrift
    expr: |
      abs(
        kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target{resource="cpu"}
        -
        kube_pod_container_resource_requests{resource="cpu"}
      ) / kube_pod_container_resource_requests{resource="cpu"} > 0.5
    for: 24h
    labels:
      severity: info
    annotations:
      summary: "VPA recommendation differs >50% from current requests"
  - alert: HPAAtMaxReplicas
    expr: kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_spec_max_replicas
    for: 30m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.horizontalpodautoscaler }} at max replicas"
```
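The drift expression reduces to simple relative-error arithmetic; here it is spelled out with illustrative values (100m requested, 180m recommended) so the 0.5 threshold is easy to sanity-check:

```python
def recommendation_drift(recommended_m: float, requested_m: float) -> float:
    """Relative drift used by the VPARecommendationDrift alert:
    |recommended - requested| / requested."""
    return abs(recommended_m - requested_m) / requested_m

# Illustrative values: requests at 100m, VPA now recommends 180m
drift = recommendation_drift(recommended_m=180, requested_m=100)
print(f"drift={drift:.0%}, alert={'yes' if drift > 0.5 else 'no'}")
```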
Full Example
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3  # Will be managed by HPA
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
      - name: api
        image: example/api-server:1.0  # placeholder image
        resources:
          requests:
            cpu: 100m      # Not managed by VPA; HPA scales on CPU utilization
            memory: 256Mi  # Will be managed by VPA
          limits:
            memory: 512Mi  # Scaled proportionally by VPA
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
    - containerName: api
      controlledResources: ["memory"]
      minAllowed:
        memory: 128Mi
      maxAllowed:
        memory: 2Gi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 30
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
    scaleUp:
      stabilizationWindowSeconds: 0
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```
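The `scaleDown.stabilizationWindowSeconds: 300` above works by having the HPA act on the highest desired-replica value seen during the trailing window, so brief dips in load don't flap the replica count. A minimal sketch of that mechanic (window measured in reconcile ticks here rather than seconds, and the replica trace is assumed):

```python
from collections import deque

def stabilized_scale_down(history: list[int], window: int = 5) -> list[int]:
    """Sketch of scaleDown stabilization: the effective replica count at
    each tick is the MAX desired value over the trailing window, so a
    short dip in the desired count never reaches the Deployment."""
    recent: deque = deque(maxlen=window)
    effective = []
    for desired in history:
        recent.append(desired)
        effective.append(max(recent))
    return effective

# Desired replicas dip briefly from 10 to 4, then recover (assumed trace)
print(stabilized_scale_down([10, 10, 4, 4, 10, 10]))
```

With `scaleUp` set to a window of 0, increases still apply immediately; only shrinking is damped.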
References
- VPA Docs: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler
- HPA Docs: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale
- KEDA: https://keda.sh
- Goldilocks: https://goldilocks.docs.fairwinds.com