Configure Kubernetes HPA with custom application metrics. Scale on queue depth, response latency, or business KPIs — not just CPU.
## Task
Generate a Kubernetes HPA configuration that scales on custom application metrics, served to the custom metrics API via Prometheus and the Prometheus Adapter.
## Requirements
- Kubernetes 1.28+
- Prometheus + Prometheus Adapter
- Custom metrics from application
## Manifests to Generate
### 1. Application Deployment with metrics endpoint
```yaml
# Pod exposes /metrics endpoint with:
# - http_request_duration_seconds (histogram)
# - message_queue_depth (gauge)
# - active_connections (gauge)
# - business_orders_per_minute (counter)
```
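A minimal sketch of what the Deployment might look like; the name `web-app`, the image, and the port number are placeholders, not values from the task:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app            # placeholder name
  labels:
    app: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web-app
          image: example/web-app:1.0   # placeholder image
          ports:
            - name: metrics            # named port the ServiceMonitor scrapes
              containerPort: 9090
          resources:
            requests:                  # accurate requests matter for HPA math
              cpu: 250m
              memory: 256Mi
```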
### 2. ServiceMonitor (for Prometheus)
```yaml
# Scrape the app's /metrics every 15s
```
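One possible shape for the ServiceMonitor, assuming a Service labeled `app: web-app` exposes a named port `metrics`, and that the Prometheus instance selects monitors by a `release: prometheus` label (both are assumptions about the environment):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: web-app
  labels:
    release: prometheus   # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: web-app        # selects the Service in front of the pods
  endpoints:
    - port: metrics       # named port on the Service
      path: /metrics
      interval: 15s
```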
### 3. Prometheus Adapter ConfigMap
```yaml
# Map Prometheus metrics to Kubernetes custom metrics API
# So HPA can query: custom.metrics.k8s.io/v1beta1
```
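A sketch of the adapter rules, following the prometheus-adapter config format. It exposes `message_queue_depth` per pod and derives the `http_request_duration_seconds_p95` metric the HPA below queries from the latency histogram; the ConfigMap name and namespace are assumptions:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter   # assumed name; must match the adapter's --config flag / Helm values
  namespace: monitoring
data:
  config.yaml: |
    rules:
      # Expose queue depth as a per-pod custom metric
      - seriesQuery: 'message_queue_depth{namespace!="",pod!=""}'
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
      # Derive a P95 latency metric from the histogram buckets
      - seriesQuery: 'http_request_duration_seconds_bucket{namespace!="",pod!=""}'
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^http_request_duration_seconds_bucket$"
          as: "http_request_duration_seconds_p95"
        metricsQuery: 'histogram_quantile(0.95, sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (le, <<.GroupBy>>))'
```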
### 4. HPA with multiple metrics
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app            # placeholder name
spec:
  scaleTargetRef:          # required: which workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_request_duration_seconds_p95
        target:
          type: AverageValue
          averageValue: 200m   # scale up if P95 latency exceeds 200ms
    - type: Pods
      pods:
        metric:
          name: message_queue_depth
        target:
          type: AverageValue
          averageValue: "10"   # scale up if queue exceeds 10 messages per pod
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 50            # max 50% increase per period
          periodSeconds: 60    # required field: policy period (here, per minute)
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min before scaling down
```
## Implementation Notes
1. Always set scaleDown stabilization (prevents flapping)
2. Use multiple metrics — HPA scales to the HIGHEST demand
3. Test with load generator: verify scaling triggers at expected thresholds
4. Set resource requests accurately — HPA uses them for calculations
5. Include PodDisruptionBudget: minAvailable 50%
6. Monitor HPA decisions: `kubectl describe hpa`
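For note 5, a PodDisruptionBudget with `minAvailable: 50%` could look like the following sketch (the name and `app: web-app` selector are placeholders that must match the Deployment's pod labels):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app
spec:
  minAvailable: 50%
  selector:
    matchLabels:
      app: web-app
```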