BLOG09b: CPU & Memory Metrics — The Real Heartbeat of HPA

1. CPU & Memory Metrics — The Real Heartbeat of HPA

HPA listens to metrics like a stethoscope. But unlike humans, its heartbeats come in millicores and MiB.


CPU — the millicore universe

Kubernetes measures CPU in millicores (the "m" suffix).

  • 1000 millicores = 1 CPU core

  • A CPU request of 250m = 25% of one core

  • A limit of 2000m = 2 full cores

⚡ Why millicores?

Because pods sip CPU in tiny gulps, not whole cores. Millicores give Kubernetes fine-grained control.
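In a pod spec, these millicore values go in the container's resources block (a minimal sketch; the values are the examples above):

```yaml
# Illustrative resources block — values match the examples above
resources:
  requests:
    cpu: 250m     # 25% of one core
  limits:
    cpu: 2000m    # 2 full cores
```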


CPU Utilization Formula (absolutely key!)

If a pod requests 500m CPU…

And it is using 250m according to metrics-server…

Then:

utilization = usage / request = 250m / 500m = 50%

HPA compares actual utilization to the target (e.g., 70%).

➡️ If actual > target → scale up
➡️ If actual < target → scale down

Crucial: CPU limits are not used for autoscaling. HPA compares usage vs requests only.
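That comparison can be sketched in a few lines of Python (the 500m/250m numbers come from the example above; the target is the 70% mentioned earlier):

```python
# Sketch of the HPA utilization check, using the numbers from the example above.
request_millicores = 500   # the pod's CPU request
usage_millicores = 250     # current usage reported by metrics-server
target_percent = 70        # HPA target utilization

utilization = 100 * usage_millicores / request_millicores

if utilization > target_percent:
    decision = "scale up"
elif utilization < target_percent:
    decision = "scale down"
else:
    decision = "hold"

print(f"{utilization:.0f}% vs {target_percent}% target -> {decision}")
```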


2. Memory — the MiB universe

Memory in Kubernetes is measured in:

  • Bytes

  • MiB (Mebibytes = 1,048,576 bytes)

  • GiB

Kubernetes uses MiB, not MB, to stay faithful to binary.
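The MiB/MB difference is easy to check (a small illustration, not Kubernetes code):

```python
# Binary (Mi) vs decimal (M) memory units — illustration only.
MIB = 1024 ** 2   # 1 MiB = 1,048,576 bytes (what Kubernetes' "Mi" suffix means)
MB = 1000 ** 2    # 1 MB  = 1,000,000 bytes (decimal)

limit_mi = 512
print(limit_mi * MIB)   # bytes actually granted by a 512Mi limit
print(limit_mi * MB)    # bytes if you wrongly assumed decimal MB — about 4.9% less
```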

Unlike CPU, memory is not a compressible resource.

If a pod exceeds its CPU limit, it is throttled and slows down. If your application uses more memory than its limit, the pod will be OOMKilled (terminated due to Out-Of-Memory).

Memory autoscaling is rare

Unlike CPU:

  • Memory doesn’t fluctuate rapidly

  • Memory keeps increasing and never “goes down” unless app frees it

  • Memory spikes often mean memory leaks (not load)

HPA usually uses CPU-based scaling, not memory.


3. HPA Cooldown Period — the “patience timer”

HPA isn’t trigger-happy. It waits… watches… and then scales.

There are three key timing concepts:


A. Stabilization Window (default 300s for scale-down only)

This is the cooldown period.

  • When load drops, HPA waits 5 minutes before scaling down.

  • Prevents flapping — scaling down the moment load dips, only to scale right back up seconds later

You can customize it via behavior.scaleDown.stabilizationWindowSeconds in the HPA spec.
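A minimal sketch of a custom scale-down cooldown (autoscaling/v2; the HPA name and target Deployment are placeholders):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: prime-app-hpa          # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: prime-app            # placeholder Deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120   # shorten the default 300s cooldown
```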


B. Scale-Up Window (default 0 seconds — effectively immediate)

This one is for aggressiveness.

If load increases:

  • HPA re-evaluates CPU roughly every 15 seconds (the controller's sync period)

  • The scale-up stabilization window defaults to 0s, so sustained overload → near-instant scale-up

Scale-up is fast. Scale-down is slow.

This is on purpose.
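The aggressive side is tunable too, via behavior.scaleUp (a sketch; the policy values are illustrative):

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0   # react immediately (the default for scale-up)
    policies:
      - type: Pods
        value: 4                    # add at most 4 pods...
        periodSeconds: 15           # ...per 15-second window
```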


C. Kubelet Metrics Delay (~15–30s)

Metrics-server scrapes kubelet:

  • Every 15 seconds

  • HPA reads metrics around every 15 seconds

So total time from CPU spike → HPA decision is:

15s (scrape) + 15s (HPA polling) = ~30 seconds

That’s why you feel a slight lag when spamming your prime number app.


4. Why HPA scaling works best with CPU & not memory

CPU scaling signals:

  • CPU increases when more users hit the API

  • CPU decreases when load drops

  • Very reactive

  • Good for “spiky” workloads

Memory:

  • Memory often increases due to cache or leaks

  • Doesn’t drop until GC frees it

  • Scaling on memory often leads to unnecessary pods

This is why production HPAs almost always use CPU.


5. The "Moving Average" Mental Model (How HPA thinks)

HPA doesn’t react to one measurement.

It reacts to:

  • Sustained demand (scale-up)

  • Sustained calm (scale-down)
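A rough sketch of this mental model (not the controller's real code): for scale-down, HPA effectively takes the highest replica recommendation seen during the stabilization window, so one quiet sample can't shrink the deployment.

```python
from collections import deque

def scale_down_recommendation(history: deque, new_recommendation: int, window: int = 5) -> int:
    """Keep the last `window` recommendations and return the max,
    mimicking how the stabilization window damps scale-down."""
    history.append(new_recommendation)
    while len(history) > window:
        history.popleft()
    return max(history)

history = deque()
for rec in [8, 8, 3, 2, 2]:   # load drops, but recent history still says 8
    replicas = scale_down_recommendation(history, rec)
print(replicas)  # still 8 — scale-down waits for sustained calm
```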


6. Your Prime App Autoscaling Example

Your prime calculator app has superb CPU behavior:

  • The algorithm is CPU-bound

  • Every call generates predictable CPU load

  • Perfect for HPA

If your target is 70% CPU, and each pod uses:

  • ~300m CPU under load

  • request = 200m

Then utilization = 150% → scale-up.

A storm of pods is born. 🌪️
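Plugging those numbers into the standard HPA formula, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), shows why (assuming, for illustration, 2 current replicas):

```python
import math

current_replicas = 2        # assumed starting replica count (illustrative)
current_utilization = 150   # 300m usage / 200m request, in percent
target_utilization = 70     # HPA target

desired = math.ceil(current_replicas * current_utilization / target_utilization)
print(desired)  # 5 — more than double the pods
```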

When load drops:

  • They stay alive for 300 seconds

  • Then scale-down gradually

