Configure Kubernetes HPA with custom application metrics. Scale on queue depth, response latency, or business KPIs — not just CPU.
## Task
Generate a Kubernetes HPA configuration that scales on custom application metrics, served to the custom metrics API via Prometheus and the Prometheus Adapter.
## Requirements
- Kubernetes 1.28+
- Prometheus + Prometheus Adapter
- Custom metrics from application
## Manifests to Generate
### 1. Application Deployment with metrics endpoint
```yaml
# Pod exposes /metrics endpoint with:
# - http_request_duration_seconds (histogram)
# - message_queue_depth (gauge)
# - active_connections (gauge)
# - business_orders_per_minute (counter)
```
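A minimal sketch of what the Deployment might look like; the name `web-app`, the image, and the port number are placeholders, not values from the task:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app            # placeholder name
  labels:
    app: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web-app
          image: example/web-app:1.0   # placeholder image
          ports:
            - name: metrics            # named port the ServiceMonitor scrapes
              containerPort: 9090
          resources:
            requests:                  # accurate requests matter for HPA math
              cpu: 250m
              memory: 256Mi
```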
### 2. ServiceMonitor (for Prometheus)
```yaml
# Scrape the app's /metrics every 15s
```
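One possible shape for the ServiceMonitor, assuming a Service labeled `app: web-app` exposes a named port `metrics`, and that the Prometheus instance selects monitors by a `release: prometheus` label (both are assumptions about the environment):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: web-app
  labels:
    release: prometheus   # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: web-app        # selects the Service in front of the pods
  endpoints:
    - port: metrics       # named port on the Service
      path: /metrics
      interval: 15s
```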
### 3. Prometheus Adapter ConfigMap
```yaml
# Map Prometheus metrics to Kubernetes custom metrics API
# So HPA can query: custom.metrics.k8s.io/v1beta1
```
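A sketch of the adapter rules, following the prometheus-adapter config format. It exposes `message_queue_depth` per pod and derives the `http_request_duration_seconds_p95` metric the HPA below queries from the latency histogram; the ConfigMap name and namespace are assumptions:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter   # assumed name; must match the adapter's --config flag / Helm values
  namespace: monitoring
data:
  config.yaml: |
    rules:
      # Expose queue depth as a per-pod custom metric
      - seriesQuery: 'message_queue_depth{namespace!="",pod!=""}'
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
      # Derive a P95 latency metric from the histogram buckets
      - seriesQuery: 'http_request_duration_seconds_bucket{namespace!="",pod!=""}'
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^http_request_duration_seconds_bucket$"
          as: "http_request_duration_seconds_p95"
        metricsQuery: 'histogram_quantile(0.95, sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (le, <<.GroupBy>>))'
```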
### 4. HPA with multiple metrics
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app            # placeholder name
spec:
  scaleTargetRef:          # required: which workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_request_duration_seconds_p95
        target:
          type: AverageValue
          averageValue: 200m   # scale up if P95 latency exceeds 200ms
    - type: Pods
      pods:
        metric:
          name: message_queue_depth
        target:
          type: AverageValue
          averageValue: "10"   # scale up if queue exceeds 10 messages per pod
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 50            # max 50% increase per period
          periodSeconds: 60    # required field: policy period (here, per minute)
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min before scaling down
```
## Implementation Notes
1. Always set scaleDown stabilization (prevents flapping)
2. Use multiple metrics — HPA scales to the HIGHEST demand
3. Test with load generator: verify scaling triggers at expected thresholds
4. Set resource requests accurately — HPA uses them for calculations
5. Include PodDisruptionBudget: minAvailable 50%
6. Monitor HPA decisions: `kubectl describe hpa`
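For note 5, a PodDisruptionBudget with `minAvailable: 50%` could look like the following sketch (the name and `app: web-app` selector are placeholders that must match the Deployment's pod labels):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app
spec:
  minAvailable: 50%
  selector:
    matchLabels:
      app: web-app
```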