BLOG20: Why We Need StatefulSet

What is a StatefulSet?

A StatefulSet is a Kubernetes workload controller used to manage stateful applications — applications that need stable identity, persistent storage, and ordered deployment.

Key Features

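  1. Stable network identity → Pod names follow a predictable format built from the StatefulSet name and an ordinal index: pod-name-0, pod-name-1, pod-name-2

  2. Stable storage → Each Pod gets its own PersistentVolumeClaim (created from volumeClaimTemplates), and the bound volume is kept when the Pod restarts or is rescheduled.

  3. Ordered deployment & scaling → Pod-0 starts first, then Pod-1, etc. By default, each Pod must be Running and Ready before the next one is created.

  4. Ordered updates & termination → Pods are updated and removed in reverse ordinal order, highest index first.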
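The ordering rules in the list above can be sketched in a few lines of Python. This is only a model of the default behaviour, not controller code; the StatefulSet name web and the function names are made up for illustration.

```python
def startup_order(name: str, replicas: int) -> list[str]:
    """Pods are created from ordinal 0 upward."""
    return [f"{name}-{i}" for i in range(replicas)]


def scale_down_order(name: str, current: int, target: int) -> list[str]:
    """Pods are removed from the highest ordinal downward."""
    return [f"{name}-{i}" for i in range(current - 1, target - 1, -1)]


# "web" is a hypothetical StatefulSet name.
print(startup_order("web", 3))        # ['web-0', 'web-1', 'web-2']
print(scale_down_order("web", 3, 1))  # ['web-2', 'web-1']
```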

When to Use a StatefulSet? (Real Use Cases)

StatefulSet is ideal when each pod must keep its own data or identity.

Real Use Cases

  1. Databases

    • MySQL Cluster

    • PostgreSQL with streaming replication

    • MongoDB Replica Set

    • Cassandra, Redis, etcd

  2. Distributed systems that need stable identity

    • Kafka brokers (kafka-0, kafka-1…)

    • ZooKeeper quorum

    • Elasticsearch nodes

  3. Systems that maintain local state

    • Storage systems

    • Caches that cannot lose local data

    • Message Queue clusters

Example: In Kafka, each broker must have a fixed ID and storage volume. A StatefulSet ensures that if kafka-2 is restarted, it still comes back as kafka-2 with its old data.
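Why does kafka-2 find its old data again? The claim for each Pod is named after the volume claim template and the Pod name, so a recreated Pod with the same name binds to the same claim. A minimal sketch, assuming a volume claim template called data (that name is an assumption, not something from this post):

```python
def pod_name(statefulset: str, ordinal: int) -> str:
    return f"{statefulset}-{ordinal}"


def pvc_name(claim_template: str, statefulset: str, ordinal: int) -> str:
    # Claims are named <volumeClaimTemplate name>-<pod name>.
    return f"{claim_template}-{pod_name(statefulset, ordinal)}"


# "data" is a hypothetical volumeClaimTemplate name.
before_restart = (pod_name("kafka", 2), pvc_name("data", "kafka", 2))
after_restart = (pod_name("kafka", 2), pvc_name("data", "kafka", 2))

print(before_restart)                   # ('kafka-2', 'data-kafka-2')
print(before_restart == after_restart)  # True
```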


What is a DaemonSet?

A DaemonSet ensures that one copy of a Pod runs on every node (or on selected nodes) in the cluster.

Key Features

  1. Schedules exactly one pod per eligible node (nodes can be filtered with a node selector, affinity, or taints and tolerations)

  2. Automatically adds/removes pods as nodes join/leave

  3. Ideal for node-level agents
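The "follows the nodes" behaviour can be modelled as a simple filter over the node list. This is a rough sketch of the idea only, not the real scheduler logic; the node names are hypothetical, and only exact-match label selectors are handled.

```python
from typing import Optional


def desired_daemon_pods(
    nodes: dict[str, dict[str, str]],
    node_selector: Optional[dict[str, str]] = None,
) -> list[str]:
    """Return the nodes that should run one copy of the agent pod."""
    selector = node_selector or {}
    return sorted(
        node
        for node, labels in nodes.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


# Hypothetical cluster: node name -> node labels.
nodes = {
    "node-a": {"kubernetes.io/os": "linux"},
    "node-b": {"kubernetes.io/os": "linux"},
}
print(desired_daemon_pods(nodes))  # ['node-a', 'node-b']

# A node joins: the desired set grows without any replica count change.
nodes["node-c"] = {"kubernetes.io/os": "windows"}
print(desired_daemon_pods(nodes))  # ['node-a', 'node-b', 'node-c']

# With a selector, only matching nodes get a pod.
print(desired_daemon_pods(nodes, {"kubernetes.io/os": "linux"}))  # ['node-a', 'node-b']
```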

When to Use a DaemonSet? (Real Use Cases)

DaemonSets are for workloads that must run on every node.

Real Use Cases

  1. Log Collection Agents

    • Fluentd / Fluent Bit

    • Logstash

    • Filebeat

  2. Monitoring and Metrics

    • Prometheus Node Exporter

    • Datadog Agent

    • New Relic Infra Agent

  3. Networking Components

    • CNI plugins (Calico, Weave, Cilium)

    • kube-proxy

  4. Security Agents

    • Falco

    • Anti-virus / node scanners

  5. Storage Drivers

    • Ceph Agent

    • CSI node plugin

Example: If you use Fluent Bit to collect logs from /var/log on every node, a DaemonSet ensures each node has a collector pod automatically.


StatefulSet vs DaemonSet (Simple Comparison)

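| Feature   | StatefulSet                      | DaemonSet                                      |
|-----------|----------------------------------|------------------------------------------------|
| Purpose   | Stateful apps, maintain identity | Node-level agent per node                      |
| Pod Names | Fixed (app-0, app-1)             | Generated suffix (app-xxxxx), one pod per node |
| Storage   | Persistent per pod               | Usually no persistent storage                  |
| Scaling   | Manual (replicas)                | Automatic, follows nodes                       |
| Examples  | DBs, Kafka, ZooKeeper            | Logging, monitoring, networking                |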


Quick, Practical Mnemonic

StatefulSet = Stable Identity

DaemonSet = One Pod Per Node


What is “Stable Network Identity”?

It means a Pod gets a permanent hostname and DNS name that does not change (only the Pod IP behind it may change), even if:

✔ The pod is deleted and recreated

✔ The pod is rescheduled to another node

✔ The cluster restarts

This is critical for apps that need to identify each peer in a cluster.

Example

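If you deploy a StatefulSet named mysql with 3 replicas, Kubernetes creates three pods: mysql-0, mysql-1, and mysql-2.

Their DNS hostnames follow the pattern <pod-name>.<headless-service-name>.<namespace>.svc.<cluster-domain>, where the cluster domain is cluster.local by default.

These DNS names never change as long as the StatefulSet and its headless Service exist.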
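To make the mysql example concrete, here is a small sketch that prints the per-pod DNS names. It assumes the headless Service is also called mysql, the namespace is default, and the cluster domain is cluster.local; none of these are stated in the example, so adjust them for your cluster.

```python
def pod_dns_names(
    statefulset: str,
    service: str,
    replicas: int,
    namespace: str = "default",
    cluster_domain: str = "cluster.local",
) -> list[str]:
    """Build <pod>.<service>.<namespace>.svc.<cluster-domain> for each ordinal."""
    return [
        f"{statefulset}-{i}.{service}.{namespace}.svc.{cluster_domain}"
        for i in range(replicas)
    ]


# Assumed: headless Service "mysql" in namespace "default".
for host in pod_dns_names("mysql", "mysql", 3):
    print(host)
# mysql-0.mysql.default.svc.cluster.local
# mysql-1.mysql.default.svc.cluster.local
# mysql-2.mysql.default.svc.cluster.local
```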


Why Deployment/ReplicaSet CAN’T provide stable identity

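A Deployment/ReplicaSet manages pods like cattle: every pod is interchangeable and can be replaced at any time.

Pod Names Change

If a pod is deleted or evicted, the ReplicaSet creates a replacement pod with a different name of the form <deployment-name>-<pod-template-hash>-<random-suffix>.

The suffix is randomly generated, so the new name is not predictable.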

No Fixed DNS Entry

Each new pod gets a new IP, and because:

  • Pod IPs are ephemeral

  • Pod names change randomly

You cannot rely on any pod to have a consistent identity.

ReplicaSet purpose

ReplicaSet only ensures N running replicas. It doesn’t care which pod is “pod-0” or “pod-1”.
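A toy model of that "count only" behaviour: the function below looks at how many pods are running, never at which ones, and gives every replacement a fresh random suffix. The suffix generator and the pod names are illustrative assumptions; Kubernetes uses its own name generator.

```python
import random
import string


def replicaset_reconcile(
    prefix: str, desired: int, running: list[str], rng: random.Random
) -> list[str]:
    """Return names for new pods so that the total reaches `desired`."""
    alphabet = string.ascii_lowercase + string.digits
    missing = max(0, desired - len(running))
    return [
        prefix + "-" + "".join(rng.choice(alphabet) for _ in range(5))
        for _ in range(missing)
    ]


# Hypothetical state: 3 replicas wanted, one pod was just lost.
running = ["app-7d4f9", "app-k2x8q"]
print(replicaset_reconcile("app", 3, running, random.Random()))
```

The output is a single name such as app- followed by five random characters, and it differs from run to run, which is exactly why peers cannot depend on it.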


How StatefulSet Achieves Stable Network Identity

StatefulSet has two mechanisms:


1. Fixed Pod Names (Ordinal Indexing)

Pods are created sequentially: pod-0, then pod-1, then pod-2.

If pod-1 restarts, it returns with exactly the same name: pod-1.
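Ordinal indexing means the controller always knows which names should exist, so a lost pod is recreated under its old name. A minimal sketch of that check (the function name is made up, and real reconciliation does much more):

```python
def statefulset_missing_pods(name: str, replicas: int, running: set[str]) -> list[str]:
    """Return the expected pod names that are not currently running."""
    expected = [f"{name}-{i}" for i in range(replicas)]
    return [pod for pod in expected if pod not in running]


# pod-1 has gone away; the only name to recreate is pod-1 itself.
print(statefulset_missing_pods("pod", 3, {"pod-0", "pod-2"}))  # ['pod-1']
```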


2. Stable DNS via Headless Service

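StatefulSets require a Headless Service (clusterIP: None), referenced by the StatefulSet's serviceName field.

This creates a DNS entry for each pod of the form <pod-name>.<service-name>.<namespace>.svc.cluster.local.

Even after restarts or node failures, this DNS name remains valid; it simply resolves to the pod's new IP.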


Why do stateful apps need this?

Example 1: Cassandra

Nodes form a ring, and each node is responsible for a specific token range.

If node names changed, the ring would break.


Example 2: Kafka

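Each broker has a permanent ID; a common convention is to derive it from the pod ordinal (kafka-0 → broker 0, kafka-1 → broker 1).

If Kafka pods kept getting new names (like in a Deployment), consumers and producers would lose their connections to the brokers, because the addresses they were given would stop resolving.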
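One common way to get a fixed broker ID is to read the ordinal off the end of the pod hostname at startup. The helper below is a sketch of that convention, not Kafka's or Kubernetes' own code, and the function name is hypothetical.

```python
def ordinal_from_hostname(hostname: str) -> int:
    """Extract the trailing ordinal from a StatefulSet pod hostname."""
    _, separator, ordinal = hostname.rpartition("-")
    if not separator or not ordinal.isdecimal():
        raise ValueError(f"{hostname!r} does not end in -<ordinal>")
    return int(ordinal)


print(ordinal_from_hostname("kafka-2"))      # 2
print(ordinal_from_hostname("my-kafka-10"))  # 10
```

A Deployment pod name ends in a random suffix rather than a number, so this lookup would fail there.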


Example 3: MongoDB Replica Set

Replica set members are registered by hostname (the host field of each member in the replica set configuration).

If pod identity changed, the replica set config would become invalid.
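Here is a sketch of what such a member list looks like when it is built from StatefulSet pod names. The replica set name rs0, the StatefulSet and Service name mongo, and the helper itself are assumptions for illustration; the short <pod>.<service> form resolves only from within the same namespace.

```python
def replica_set_config(
    rs_name: str, statefulset: str, service: str, replicas: int, port: int = 27017
) -> dict:
    """Build a MongoDB-style replica set config using per-pod DNS names."""
    return {
        "_id": rs_name,
        "members": [
            {"_id": i, "host": f"{statefulset}-{i}.{service}:{port}"}
            for i in range(replicas)
        ],
    }


# "rs0" and "mongo" are hypothetical names.
config = replica_set_config("rs0", "mongo", "mongo", 3)
for member in config["members"]:
    print(member)
# {'_id': 0, 'host': 'mongo-0.mongo:27017'}
# {'_id': 1, 'host': 'mongo-1.mongo:27017'}
# {'_id': 2, 'host': 'mongo-2.mongo:27017'}
```

Because the hostnames are baked into the config, they only stay correct if mongo-0, mongo-1, and mongo-2 keep their names across restarts.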


Summary Table

| Feature      | StatefulSet                        | Deployment/ReplicaSet |
|--------------|------------------------------------|-----------------------|
| Pod Identity | Stable, predictable (app-0, app-1) | Random (app-xxxxx)    |
| DNS          | Per-pod DNS                        | Only service DNS      |
| Storage      | Per-pod persistent volume          | Shared/ephemeral      |
| Order        | Ordered create/delete              | No order              |
| Use Case     | Databases, message queues          | Web apps, APIs        |


One-Line Answer

Stable network identity means pods get fixed names and DNS records. StatefulSet maintains this using ordered creation, fixed naming, and a headless service — something Deployment/ReplicaSet cannot do because they treat pods as interchangeable.
