Deploy a CPU‑intensive demo app (autoscale-probe) and a separate Grafana deployment.
Each component is exposed through its own OCI Layer‑7 Load Balancer on port 80.
- App: `http://<APP_LB_IP>/` (port 80 → container 8080)
- Health: `http://<APP_LB_IP>/healthz`
- Grafana: `http://<GRAFANA_LB_IP>/` (port 80 → container 3000; default credentials `admin`/`admin`, change them in production)
The manifest includes Metrics Server, Prometheus, and a HorizontalPodAutoscaler (HPA). Ensure the OCI Cluster Autoscaler add‑on is enabled.
- Overview
- Architecture
- What’s in the Manifest
- Prerequisites
- Deploy
- Grab the Load Balancer IPs
- Use the App and Grafana
- Sample Grafana Dashboard
- Generate Load & Watch Autoscaling
- Enable the OCI Cluster Autoscaler
- Troubleshooting
- Cleanup
This setup targets OKE and demonstrates HPA‑driven scaling of a CPU‑bound app while visualizing metrics in Grafana.
Grafana runs in its own Deployment and is published by a separate Service of type LoadBalancer.
Metrics sources used by the bundled Grafana dashboard:
- CPU & Memory Utilization (per‑node + cluster): from kubelet cAdvisor (`container_*`, `machine_*`).
- Node readiness & capacity/allocatable: custom gauges exported by `autoscale-probe` (`k8s_node_*`).
- Pods Up & Pod Health: from Prometheus `up{job="autoscale-probe"}` (labels added via relabeling).
```
                     +-----------------------------+     +-----------------------------+
                     |        OCI LB (App)         |     |      OCI LB (Grafana)       |
Internet ----------->| Port 80 ---> / (App)        |     | Port 80 ---> / (Grafana)    |
                     +-----------------------------+     +-----------------------------+
                                  |                                   |
                     Service: autoscale-probe-lb          Service: grafana
                     (ns: autoscale)                      (ns: monitoring)
                                  |                                   |
                     +-------------------------+         +-------------------------+
                     |  Pod: autoscale-probe   |         |      Pod: grafana       |
                     |  Container: app (8080)  |         |   Container: grafana    |
                     +-------------------------+         +-------------------------+
```
Namespaces:
- autoscale : app + Services (LB + metrics) + HPA
- monitoring : Grafana + Prometheus + Grafana provisioning ConfigMaps
- kube-system : CertManager + metrics-server + Cluster Autoscaler (enable addons in OKE)
- Namespace `autoscale`
  - Deployment `autoscale-probe` (container: `app`)
  - Service `autoscale-probe-lb` (type LoadBalancer, port 80 → `app`)
  - Service `autoscale-probe-metrics` (ClusterIP for Prometheus scraping)
  - HorizontalPodAutoscaler `autoscale-probe-hpa` (60% CPU, 1→20 replicas)
  - The Cluster Autoscaler adds nodes when pods become unschedulable and removes them when nodes are underused.
- Namespace `monitoring`
  - Deployment `grafana` + Service `grafana` (LoadBalancer on port 80)
  - Grafana provisioning ConfigMaps:
    - `grafana-datasource` → Prometheus datasource
    - `grafana-dashboard` → “OKE Autoscale” dashboard JSON
    - `grafana-dash-provider` → dashboard file provider
  - Deployment `prometheus` + Service `prometheus` (ClusterIP)
- Namespace `kube-system`
  - `cert-manager` (enable the add-on in OKE)
  - `metrics-server` (enable the add-on in OKE)
  - Cluster Autoscaler (enable the add-on in OKE)
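For orientation, the HPA described by the manifest could be expressed roughly like this (a sketch based on the values above, assuming the `autoscaling/v2` API; the actual `oke-autoscale.yaml` is authoritative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: autoscale-probe-hpa
  namespace: autoscale
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: autoscale-probe
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # scale when average pod CPU exceeds 60%
```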
- OKE cluster with the OCI Cloud Controller Manager (for load balancers)
- `kubectl` configured to point at your cluster
- (Optional) `hey` for load generation
- Egress access for image pulls (ghcr.io, Grafana, Prometheus, Kubernetes images)

Security: Grafana defaults to `admin`/`admin` in this demo. For production, inject credentials via a `Secret` and environment variables, or use an IdP.
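One way to replace the default credentials: a `Secret` plus Grafana's standard `GF_SECURITY_ADMIN_*` environment variables. This is a sketch; the Secret name is illustrative, and you would wire it into the `grafana` Deployment in the manifest:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: grafana-admin          # illustrative name
  namespace: monitoring
type: Opaque
stringData:
  admin-user: admin
  admin-password: change-me    # set a real password here
---
# Then, in the grafana container spec:
#   env:
#     - name: GF_SECURITY_ADMIN_USER
#       valueFrom:
#         secretKeyRef: { name: grafana-admin, key: admin-user }
#     - name: GF_SECURITY_ADMIN_PASSWORD
#       valueFrom:
#         secretKeyRef: { name: grafana-admin, key: admin-password }
```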
Clone the repo:

```
git clone https://github.com/cj667113/OCI_K8_AUTOSCALING_APP.git
cd OCI_K8_AUTOSCALING_APP
```

Apply the manifest:

```
kubectl apply -f oke-autoscale.yaml
```

Watch for the load balancer IPs:

```
kubectl -n autoscale get svc autoscale-probe-lb -w
kubectl -n monitoring get svc grafana -w
```

When an EXTERNAL-IP appears on each Service, use those values as `APP_LB_IP` and `GRAFANA_LB_IP`.
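If you prefer scripting to watching, the IPs can be captured once provisioning completes (a sketch; assumes the OCI load balancers publish an IP rather than a hostname):

```
APP_LB_IP=$(kubectl -n autoscale get svc autoscale-probe-lb \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
GRAFANA_LB_IP=$(kubectl -n monitoring get svc grafana \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "App:     http://$APP_LB_IP/"
echo "Grafana: http://$GRAFANA_LB_IP/"
```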
```
App (root)   : http://<APP_LB_IP>/
Health       : http://<APP_LB_IP>/healthz
Grafana (UI) : http://<GRAFANA_LB_IP>/   (admin / admin)
```

Prometheus is internal (ClusterIP):

```
kubectl -n monitoring get svc prometheus
```

If you update the dashboard ConfigMap and don’t see changes, reload Grafana:

```
kubectl -n monitoring rollout restart deploy/grafana
```
A sample Grafana dashboard is included via ConfigMaps (see `grafana-dashboard` and `grafana-datasource` in the manifest).
The HPA targets 60% CPU with `minReplicas: 1`, `maxReplicas: 20`.
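The scaling decision follows the standard HPA formula, desired = ceil(currentReplicas × currentUtilization / target), clamped to the 1–20 replica range. A quick sketch with made-up numbers (90% observed average CPU against the 60% target):

```shell
current_replicas=4
current_cpu=90    # observed average CPU utilization, percent (illustrative)
target_cpu=60     # HPA target from the manifest

# ceil(current_replicas * current_cpu / target_cpu) via integer arithmetic
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "desired replicas: $desired"   # -> 6
```

The controller then clamps the result to `[minReplicas, maxReplicas]`, i.e. 1 to 20 here.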
Generate sustained CPU load for 500s with concurrency 72:
```
hey -z 500s -c 72 -disable-keepalive "http://<APP_LB_IP>/burn?cpu_ms=100"
```

Observe metrics and scaling:

```
watch -n 1 kubectl top pods -A
kubectl -n autoscale get hpa autoscale-probe
kubectl get nodes -w
```

Docs:
OCI CLI example:
```
oci ce cluster update-addon --addon-name ClusterAutoscaler --from-json file://<path-to-config-file> --cluster-id <cluster-ocid>
```

Restart the autoscaler after changing its configuration:

```
kubectl -n kube-system rollout restart deploy/cluster-autoscaler
```

Tail the autoscaler logs:

```
kubectl -n kube-system logs -f deploy/cluster-autoscaler | egrep -i "scale-down|unneeded|removing|utilization|NoScaleDown"
```

No EXTERNAL-IP on Service

```
kubectl -n autoscale describe svc autoscale-probe-lb
kubectl -n monitoring describe svc grafana
```

- Verify the OCI Cloud Controller Manager is running and LB quota/permissions are OK
- Ensure subnet/security lists allow inbound port 80
Grafana not loading
- Confirm security lists / firewall allow inbound traffic to `<GRAFANA_LB_IP>:80`
- Check the logs: `kubectl -n monitoring logs deploy/grafana`

HPA shows unknown metrics

- `kubectl get apiservices | grep metrics` → `v1beta1.metrics.k8s.io` must be Available
- `kubectl top pods -A` should return CPU/Memory (metrics-server healthy)
- `kubectl -n autoscale describe hpa autoscale-probe` for details
```
kubectl delete -f oke-autoscale.yaml
```

To redeploy and verify everything comes back up:

```
kubectl apply -f oke-autoscale.yaml
kubectl -n kube-system rollout restart deploy/cluster-autoscaler
kubectl get nodes
watch -n 1 kubectl top pods -A
kubectl get pods -n kube-system
kubectl get pods -n autoscale
kubectl get pods -n monitoring
kubectl top pods -A | head
kubectl -n kube-system logs -f deploy/cluster-autoscaler | egrep -i "scale-down|unneeded|removing|utilization|NoScaleDown"
kubectl get nodes -w

# Load test (replace APP_LB_IP)
hey -z 500s -c 72 -disable-keepalive "http://<APP_LB_IP>/burn?cpu_ms=100"
```

Enjoy! 🎉