BLOG04: What Is Tainting?

A taint is a rule applied to a node that says:

“Do NOT schedule pods here unless they explicitly tolerate this taint.”

It is used to protect nodes or restrict pod placement.


Basic Taint Format

kubectl taint nodes <node-name> key=value:effect

Effects:

  • NoSchedule → new pods without a matching toleration cannot be scheduled on the node

  • PreferNoSchedule → the scheduler tries to avoid the node, but this is not strictly enforced

  • NoExecute → evicts existing pods that do not tolerate the taint and blocks new ones
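
For example, applying, checking, and removing a taint looks like this (node1 and app=critical are placeholder names; a trailing "-" removes the taint):

# apply a NoSchedule taint to the node
kubectl taint nodes node1 app=critical:NoSchedule

# check which taints the node currently has
kubectl describe node node1 | grep Taints

# remove the taint again (note the trailing "-")
kubectl taint nodes node1 app=critical:NoSchedule-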


Part 1: Practical Hands-On Tainting Scenario

Scenario A — Reserve a node for a specific workload

Problem:

You have a powerful node (node3) and want only critical apps (e.g., billing, payments) to run on it.

Step 1 — Taint the node
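
A minimal sketch, using the role=critical key/value that also appears in the summary table below (any key/value of your choosing works the same way):

# reserve node3 for critical workloads
kubectl taint nodes node3 role=critical:NoSchedule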

Now no pod will run on node3 unless it tolerates this taint.

Step 2 — Toleration in pod spec
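
A sketch of the matching toleration in the billing/payments pod spec, assuming the role=critical:NoSchedule taint from Step 1:

# allows (but does not force) scheduling onto the tainted node3
tolerations:
- key: "role"
  operator: "Equal"
  value: "critical"
  effect: "NoSchedule"

A toleration only permits the pod to land on node3; if the critical pods must always run there, pair the toleration with a nodeSelector or node affinity.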

Outcome: Only pods with this toleration will schedule on that node. Other pods will avoid it completely.


Scenario B — Node dedicated for Logging / Monitoring

You want to dedicate node-log-1 only to the logging/monitoring stack (EFK, Loki, Prometheus).

Apply taint:
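
A sketch using the dedicated=logging key/value from the summary table:

# keep non-logging pods off node-log-1
kubectl taint nodes node-log-1 dedicated=logging:NoSchedule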

Logging DaemonSet tolerates it:
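
A sketch of the toleration in the DaemonSet's pod template (spec.template.spec), assuming the dedicated=logging:NoSchedule taint above:

# lets the logging pods schedule onto the dedicated node
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "logging"
  effect: "NoSchedule"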

Outcome:

  • Only logging workloads run on node-log-1.

  • Normal application pods stay away.


Scenario C — GPU Node Tainting (Very common)

GPU nodes must run only AI/ML workloads, not normal web apps.

Taint:
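
A sketch using the nvidia.com/gpu=true key/value from the summary table (gpu-node-1 is a placeholder node name):

# keep non-GPU workloads off the GPU node
kubectl taint nodes gpu-node-1 nvidia.com/gpu=true:NoSchedule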

GPU training pods tolerate:
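
A sketch of the toleration in the training pod spec, assuming the taint above:

# lets AI/ML training pods schedule onto GPU nodes
tolerations:
- key: "nvidia.com/gpu"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"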

Outcome:

  • Only GPU workloads land on GPU nodes.

  • Prevents accidental scheduling of normal workloads on expensive nodes.


Scenario D — Node maintenance (drain alternative)

You want to do some maintenance but don’t want new pods to land there yet.

Taint:
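
A sketch using the maintenance=true key/value from the summary table:

# stop new pods from scheduling; existing pods keep running
kubectl taint nodes <node-name> maintenance=true:NoSchedule

(kubectl cordon <node-name> achieves a similar result by marking the node unschedulable.)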

New pod scheduling stops, but existing pods keep running (until you drain the node).

This is used when:

  • You want to gracefully remove a node from service

  • You want a time gap before running kubectl drain


Scenario E — Force pods to evict immediately (NoExecute)

Use this for destructive operations:
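
A sketch using the maintenance=now:NoExecute taint from the summary table:

# evict existing pods immediately and block new ones
kubectl taint nodes <node-name> maintenance=now:NoExecute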

Outcome:

  • Existing pods are evicted immediately

  • New pods cannot schedule

Internally, this is similar to how Kubernetes handles a NotReady node after a crash: the node controller applies the node.kubernetes.io/not-ready:NoExecute taint, and pods are evicted once their tolerationSeconds run out.
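
A sketch of the default toleration Kubernetes typically injects into pods for that taint (eviction waits 300 seconds after the node goes NotReady):

# injected by the DefaultTolerationSeconds admission controller
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300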


Scenario F — Spot instances (cloud autoscaling)

In AWS/GCP, spot (preemptible) nodes are cheap but can be reclaimed at any time.

To keep stateful workloads off them:

Taint spot nodes:
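
A sketch using the lifecycle=spot key/value from the summary table (<spot-node-name> is a placeholder):

# keep stateful / critical pods off the spot node
kubectl taint nodes <spot-node-name> lifecycle=spot:NoSchedule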

Only stateless workloads tolerate it:
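
A sketch of the toleration in the stateless microservice pod spec, assuming the taint above:

# allows stateless pods onto spot nodes
tolerations:
- key: "lifecycle"
  operator: "Equal"
  value: "spot"
  effect: "NoSchedule"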

Real-life usage:

  • EKS managed node groups can apply this taint to spot nodes via the node group's taint configuration

  • Only non-critical microservices run there


Summary Table — When We Use Taints in Real Life

Scenario                            | Why                                      | Example
Reserve node for critical workloads | High CPU/Memory reserved                 | role=critical:NoSchedule
Dedicated logging/monitoring node   | Avoid mixing logs with app traffic       | dedicated=logging
GPU nodes                           | Prevent normal pods from using GPU nodes | nvidia.com/gpu=true
Maintenance mode                    | Stop scheduling but keep current pods    | maintenance=true
Immediate eviction                  | Emergency, draining quickly              | maintenance=now:NoExecute
Cloud spot nodes                    | Only stateless workloads                 | lifecycle=spot
Control-plane taints                | Protect API server nodes                 | node-role.kubernetes.io/control-plane:NoSchedule
