BLOG04: What Is Tainting?

A taint is a rule applied to a node that says:

“Do NOT schedule pods here unless they explicitly tolerate this taint.”

It is used to protect nodes or restrict pod placement.


Basic Taint Format

kubectl taint nodes <node-name> key=value:effect

Effects:

  • NoSchedule → new pods without a matching toleration cannot be scheduled on the node

  • PreferNoSchedule → the scheduler tries to avoid the node, but this is not strictly enforced

  • NoExecute → evicts existing pods that do not tolerate the taint and blocks new ones
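
For example, applying, checking, and removing a taint looks like this (node1 and app=critical are placeholder names; a trailing "-" removes the taint):

# apply a NoSchedule taint to the node
kubectl taint nodes node1 app=critical:NoSchedule

# check which taints the node currently has
kubectl describe node node1 | grep Taints

# remove the taint again (note the trailing "-")
kubectl taint nodes node1 app=critical:NoSchedule-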


Part 1: Practical Hands-On Tainting Scenario

Scenario A — Reserve a node for a specific workload

Problem:

You have a powerful node (node3) and want only critical apps (e.g., billing, payments) to run on it.

Step 1 — Taint the node
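
A minimal sketch, using the role=critical key/value that also appears in the summary table below (any key/value of your choosing works the same way):

# reserve node3 for critical workloads
kubectl taint nodes node3 role=critical:NoSchedule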

Now no pod will run on node3 unless it tolerates this taint.

Step 2 — Toleration in pod spec
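
A sketch of the matching toleration in the billing/payments pod spec, assuming the role=critical:NoSchedule taint from Step 1:

# allows (but does not force) scheduling onto the tainted node3
tolerations:
- key: "role"
  operator: "Equal"
  value: "critical"
  effect: "NoSchedule"

A toleration only permits the pod to land on node3; if the critical pods must always run there, pair the toleration with a nodeSelector or node affinity.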

Outcome: Only pods with this toleration will schedule on that node. Other pods will avoid it completely.


Scenario B — Node dedicated for Logging / Monitoring

You want to dedicate node-log-1 only to the logging/monitoring stack (EFK, Loki, Prometheus).

Apply taint:
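
A sketch using the dedicated=logging key/value from the summary table:

# keep non-logging pods off node-log-1
kubectl taint nodes node-log-1 dedicated=logging:NoSchedule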

Logging DaemonSet tolerates it:
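
A sketch of the toleration in the DaemonSet's pod template (spec.template.spec), assuming the dedicated=logging:NoSchedule taint above:

# lets the logging pods schedule onto the dedicated node
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "logging"
  effect: "NoSchedule"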

Outcome:

  • Only logging workloads run on node-log-1.

  • Normal application pods stay away.


Scenario C — GPU Node Tainting (Very common)

GPU nodes must run only AI/ML workloads, not normal web apps.

Taint:
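
A sketch using the nvidia.com/gpu=true key/value from the summary table (gpu-node-1 is a placeholder node name):

# keep non-GPU workloads off the GPU node
kubectl taint nodes gpu-node-1 nvidia.com/gpu=true:NoSchedule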

GPU training pods tolerate:
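
A sketch of the toleration in the training pod spec, assuming the taint above:

# lets AI/ML training pods schedule onto GPU nodes
tolerations:
- key: "nvidia.com/gpu"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"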

Outcome:

  • Only GPU workloads land on GPU nodes.

  • Prevents accidental scheduling of normal workloads on expensive nodes.


Scenario D — Node maintenance (drain alternative)

You want to do some maintenance but don’t want new pods to land there yet.

Taint:
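
A sketch using the maintenance=true key/value from the summary table:

# stop new pods from scheduling; existing pods keep running
kubectl taint nodes <node-name> maintenance=true:NoSchedule

(kubectl cordon <node-name> achieves a similar result by marking the node unschedulable.)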

New pod scheduling stops, but existing pods keep running (until you drain the node).

This is used when:

  • You want to gracefully remove a node from service

  • You want a time gap before running kubectl drain


Scenario E — Force pods to evict immediately (NoExecute)

Use this for destructive operations:
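
A sketch using the maintenance=now:NoExecute taint from the summary table:

# evict existing pods immediately and block new ones
kubectl taint nodes <node-name> maintenance=now:NoExecute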

Outcome:

  • Existing pods are evicted immediately

  • New pods cannot schedule

Internally, this is similar to how Kubernetes handles a NotReady node after a crash: the node controller applies the node.kubernetes.io/not-ready:NoExecute taint, and pods are evicted once their tolerationSeconds run out.
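
A sketch of the default toleration Kubernetes typically injects into pods for that taint (eviction waits 300 seconds after the node goes NotReady):

# injected by the DefaultTolerationSeconds admission controller
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300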


Scenario F — Spot instances (cloud autoscaling)

In AWS/GCP, spot (preemptible) nodes are cheap but can be reclaimed at any time.

To keep stateful workloads off them:

Taint spot nodes:
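
A sketch using the lifecycle=spot key/value from the summary table (<spot-node-name> is a placeholder):

# keep stateful / critical pods off the spot node
kubectl taint nodes <spot-node-name> lifecycle=spot:NoSchedule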

Only stateless workloads tolerate it:
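
A sketch of the toleration in the stateless microservice pod spec, assuming the taint above:

# allows stateless pods onto spot nodes
tolerations:
- key: "lifecycle"
  operator: "Equal"
  value: "spot"
  effect: "NoSchedule"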

Real-life usage:

  • EKS managed node groups can apply this taint to spot nodes via the node group's taint configuration

  • Only non-critical microservices run there


Summary Table — When We Use Taints in Real Life

Scenario                            | Why                                      | Example
Reserve node for critical workloads | High CPU/Memory reserved                 | role=critical:NoSchedule
Dedicated logging/monitoring node   | Avoid mixing logs with app traffic       | dedicated=logging
GPU nodes                           | Prevent normal pods from using GPU nodes | nvidia.com/gpu=true
Maintenance mode                    | Stop scheduling but keep current pods    | maintenance=true
Immediate eviction                  | Emergency, draining quickly              | maintenance=now:NoExecute
Cloud spot nodes                    | Only stateless workloads                 | lifecycle=spot
Control-plane taints                | Protect API server nodes                 | node-role.kubernetes.io/control-plane:NoSchedule
