BLOG16: What is Kubernetes Operator Pattern?

In Kubernetes, operator and controller are core patterns used to automate application and infrastructure lifecycle management. They are often confused, but each has a specific meaning.

This document provides a clear explanation of these concepts.


What is the Kubernetes Controller?

A controller is a control-loop program inside Kubernetes that continuously monitors the cluster state and tries to move it toward the desired state.

How a Controller Works

Every controller follows the reconcile loop pattern:

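  1. Observe: Read the desired state and the current state through the API server (which persists both in etcd).

  2. Compare: Check whether the actual state in the cluster matches the desired state.

  3. Act: If they differ, make changes to move the actual state toward the desired state.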
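The three steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Kubernetes code: ClusterState, reconcile, and the pod-N naming scheme are hypothetical stand-ins for what a real controller reads and writes through the API server.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    """Toy stand-in for the state a controller sees through the API server."""
    desired_replicas: int
    running_pods: list = field(default_factory=list)


def _unused_name(existing):
    """Pick the first pod-N name that is not taken yet (hypothetical scheme)."""
    i = 0
    while f"pod-{i}" in existing:
        i += 1
    return f"pod-{i}"


def reconcile(state):
    """Run one Observe -> Compare -> Act pass and return the actions taken."""
    actions = []
    # Observe: read the desired and the actual state.
    desired = state.desired_replicas
    actual = len(state.running_pods)
    # Compare + Act: create missing pods ...
    while actual < desired:
        name = _unused_name(state.running_pods)
        state.running_pods.append(name)
        actions.append(f"create {name}")
        actual += 1
    # ... or remove surplus ones.
    while actual > desired:
        name = state.running_pods.pop()
        actions.append(f"delete {name}")
        actual -= 1
    return actions


state = ClusterState(desired_replicas=3, running_pods=["pod-0", "pod-2"])
print(reconcile(state))  # ['create pod-1']
print(reconcile(state))  # [] because nothing is left to do
```

A real controller runs this pass every time a watched object changes rather than once.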

Built-in controllers in Kubernetes

Examples:

  • Deployment controller

  • ReplicaSet controller

  • StatefulSet controller

  • Job controller

  • Node controller

  • PV/PVC controller

All of these continuously ensure Kubernetes resources work as expected.

Simple Example

With spec.replicas set to 3, the ReplicaSet controller ensures there are always 3 Pods running. If one Pod dies, the controller creates a replacement.


What is the Kubernetes Operator Pattern?

An Operator is a special kind of controller that manages application logic beyond what built-in controllers can do.

It extends Kubernetes using Custom Resource Definitions (CRDs).

An Operator = CRD + Custom Controller

| Component | Purpose |
|---|---|
| CRD | Defines a new Kubernetes resource type (e.g., MySQLCluster) |
| Controller | Contains logic to reconcile that custom resource |

The Operator Pattern automates complex lifecycle tasks like:

  • Installing and configuring an application

  • Upgrading application versions

  • Backups & restores

  • Failover & replication management

  • Scaling decisions

  • Health checking & self-healing

Basically, an operator codifies human operational knowledge.

Real Examples of Operators

| Operator | What It Manages |
|---|---|
| Prometheus Operator | Deploys, upgrades, and manages Prometheus & Alertmanager |
| Elasticsearch Operator | Cluster lifecycle & scaling |
| Cert-Manager Operator | Manages TLS certificates |
| Vault Operator | Unseals Vault and manages its lifecycle |
| MongoDB Operator | Mongo sharding, replication, backups |


Key Difference: Controller vs Operator

| Feature | Controller (built-in) | Operator |
|---|---|---|
| Scope | Kubernetes built-in resources | Custom application-specific resources |
| Managed by | Kubernetes core | Custom code (written by you or vendors) |
| Uses CRD | No | Yes |
| Complexity | Simple (replicas, scheduling) | High (backups, upgrades, failovers, etc.) |
| Purpose | Maintain basic K8s objects | Automate application lifecycle |


Example: MySQL Operator

Desired State (defined by user):
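As a rough illustration, the desired state for a MySQLCluster could look like the resource below, written as a Python dict with the same shape as a YAML manifest. The API group example.com, the resource name orders-db, and every field under spec are assumptions made for this sketch, not the schema of any particular MySQL operator.

```python
# Hypothetical custom resource: the group "example.com", the name, and all
# fields under "spec" are illustrative assumptions.
mysql_cluster = {
    "apiVersion": "example.com/v1",
    "kind": "MySQLCluster",
    "metadata": {"name": "orders-db", "namespace": "default"},
    "spec": {
        "replicas": 3,      # how many MySQL instances the user wants
        "version": "8.0",   # which MySQL version to run
        "storage": "10Gi",  # volume size per instance
    },
}


def desired_summary(resource):
    """Describe what the user asked for, using only the spec."""
    spec = resource["spec"]
    name = resource["metadata"]["name"]
    return f"{name}: {spec['replicas']} replicas of MySQL {spec['version']}"


print(desired_summary(mysql_cluster))  # orders-db: 3 replicas of MySQL 8.0
```

The user only states what they want; how to get there is left to the Operator.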

Actual State (managed by Operator):

  • Creates StatefulSets internally

  • Configures replication

  • Sets up PersistentVolumes

  • Manages pod restarts

  • Performs rolling upgrades

  • Takes backups

This automation would normally require a human DBA.


Summary

Controller

  • Reconciliation loop

  • Manages built-in Kubernetes objects

  • Core part of Kubernetes

Operator

  • Extends Kubernetes with CRDs

  • Includes a custom controller with domain-specific knowledge

  • Automates full lifecycle of complex applications (DBs, Observability, Security tools, etc.)


It’s called the Operator Pattern because it models human operators who traditionally run production systems — and encodes their operational knowledge into software.

The following sections explain the origin and reasoning:


Why the name "Operator"?

In traditional infrastructure and application management, there are people in roles such as:

  • system operators

  • database operators

  • ops engineers

  • SREs

These human operators perform actions like:

  • installing the application

  • upgrading it

  • backing it up

  • tuning configuration

  • monitoring & healing failures

The Kubernetes community wanted a way to automate these same tasks using software.

The resulting design pattern mimics a human operator’s behaviour, which is why it is named the Operator Pattern.


What makes it a "pattern"?

A design pattern is a reusable way of solving a class of problems.

The Operator Pattern provides the reusable idea:

“Model applications as custom resources and write controllers that reconcile them to desired state.”

This combination (CRD + Controller + Application Logic) repeats across many use-cases, so it qualifies as a pattern in the same sense as Singleton or Observer.


What did Operators originally solve? (History)

Before Operators:

  • Kubernetes could manage pods, replica sets, deployments

  • But it could NOT manage application logic, like:

    • bootstrap DB cluster

    • restore backup

    • rotate certificates

    • shard a database

    • failover a primary node

These required a human operator.

Engineers at CoreOS (later acquired by Red Hat) introduced the idea in 2016:

"Let’s embed human operator knowledge in a controller"

Hence the term Operator.


The core meaning

  • It replaces/augments human operators

  • Encodes operational intelligence into the cluster

  • Automates lifecycle tasks that normally require expertise

  • Uses Kubernetes-native APIs (CRDs)


Example to show the naming logic

Human Operator Task

A MySQL DBA would:

  • install MySQL

  • configure replication

  • manage failover

  • take backups

Kubernetes Operator

A MySQL Operator:

  • watches MySQLCluster custom resources

  • configures replication

  • performs rolling updates

  • automates failover

  • manages backups

It operates the application → therefore, Operator.


Simple analogy

Think of a Kubernetes Operator as a robot version of a human operator.


Operator Pattern vs Controller Pattern (Deep but Simple Explanation)

Operators and controllers are related, but not the same. Every Operator includes a controller, but not every controller is an Operator.

The following sections provide a detailed comparison.


1. What is the Controller Pattern?

A controller is a Kubernetes control-loop that:

  • Watches a Kubernetes resource

  • Compares desired state vs actual state

  • Reconciles until they match

This pattern is built into Kubernetes itself.

Example

The Deployment controller:

  • sees .spec.replicas = 3

  • ensures, through a ReplicaSet, that 3 Pods always exist

Controllers manage Kubernetes-native resources, like:

  • Pods

  • ReplicaSets

  • Deployments

  • Nodes

  • Services


2. What is the Operator Pattern?

The Operator Pattern extends the controller pattern.

It adds:

  1. CRDs (Custom Resource Definitions)

  2. Custom controllers

  3. Domain-specific operational logic

Out of the box, Kubernetes only knows generic operations such as deploying and scaling workloads. The Operator Pattern teaches it application-specific logic.

Example

A PostgreSQL Operator can:

  • Initialize a primary/replica cluster

  • Perform rolling upgrades

  • Handle automatic failover

  • Create periodic backups

  • Integrate with S3

This is far beyond what native controllers can do.


3. Why Operator Pattern Exists (The Real Reason)

Before Operators, only human operators could:

  • Deploy complex apps

  • Fix cluster-level failures

  • Run backups

  • Manage clusters (DBs, queues, caches)

Developers wanted:

“A Kubernetes-native way to automate what human operators do.”

This is why the pattern is called Operator Pattern.

It automates operational knowledge.


4. Side-by-Side Comparison

| Feature | Controller Pattern | Operator Pattern |
|---|---|---|
| Defines new resource type | No | Yes (via CRD) |
| Built into Kubernetes | Yes | No (custom) |
| Complexity | Basic | Advanced |
| Automates lifecycle | Only basic (replicas, scheduling) | Full lifecycle (install, backup, upgrade) |
| Designed for | Kubernetes primitives | Complex applications |
| Who uses | Kubernetes itself | Platform engineers / vendors |


5. Visual Architecture

Controllers operate Kubernetes. Operators operate applications.


6. How to Write an Operator (Simple Steps)

Using Operator SDK (Go-based):

Step 1 — Create a CRD

Example:
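A CRD is itself a Kubernetes object with a fixed shape. The sketch below writes one as a Python dict (same structure as the YAML manifest) and adds a tiny validator that imitates a small part of what the API server does with the schema. The group example.com and the spec fields replicas and version are hypothetical, and validate_spec is an illustration only, not a Kubernetes API.

```python
# CustomResourceDefinition as a dict. Group, names, and schema fields are
# hypothetical; the overall layout follows apiextensions.k8s.io/v1.
mysql_cluster_crd = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    "metadata": {"name": "mysqlclusters.example.com"},  # <plural>.<group>
    "spec": {
        "group": "example.com",
        "scope": "Namespaced",
        "names": {
            "kind": "MySQLCluster",
            "plural": "mysqlclusters",
            "singular": "mysqlcluster",
        },
        "versions": [{
            "name": "v1",
            "served": True,
            "storage": True,
            "schema": {"openAPIV3Schema": {
                "type": "object",
                "properties": {"spec": {
                    "type": "object",
                    "required": ["replicas"],
                    "properties": {
                        "replicas": {"type": "integer", "minimum": 1},
                        "version": {"type": "string"},
                    },
                }},
            }},
        }],
    },
}


def validate_spec(crd, resource):
    """Check a resource's spec against a small subset of the CRD schema."""
    schema = crd["spec"]["versions"][0]["schema"]["openAPIV3Schema"]
    spec_schema = schema["properties"]["spec"]
    spec = resource.get("spec", {})
    errors = []
    for name in spec_schema.get("required", []):
        if name not in spec:
            errors.append(f"spec.{name} is required")
    python_types = {"integer": int, "string": str}
    for name, rules in spec_schema["properties"].items():
        if name not in spec:
            continue
        value = spec[name]
        if not isinstance(value, python_types[rules["type"]]):
            errors.append(f"spec.{name} must be of type {rules['type']}")
        elif "minimum" in rules and value < rules["minimum"]:
            errors.append(f"spec.{name} must be >= {rules['minimum']}")
    return errors


print(validate_spec(mysql_cluster_crd, {"spec": {"replicas": 0}}))
```

Once a CRD like this is installed, the API server accepts MySQLCluster objects and rejects ones that violate the schema before the controller ever sees them.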

Step 2 — Write a controller (in Go with Operator SDK, or in Python with a framework such as Kopf)

Your code:

  • Watches MySQLCluster resources

  • Creates StatefulSets, PVCs, Services

  • Ensures the DB cluster is healthy

  • Handles updates

  • Takes backups automatically

Step 3 — Build & deploy into K8s


7. Operator vs Helm vs GitOps (important differentiation)

| Tool | Purpose |
|---|---|
| Helm | Installs apps using templates (static) |
| GitOps | Manages desired state from Git |
| Operator | Automates the application lifecycle (dynamic, intelligent) |

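Operators actively manage a running application. Helm and GitOps tools apply the manifests they are given but have no knowledge of the application's internals.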

Example: Helm installs MongoDB, but it cannot:

  • heal a degraded replica set

  • trigger failover

  • ensure data replication

  • rotate certs

A MongoDB operator can.


Summary – Easy to Remember

Controller Pattern

  • Control loop → makes real state = desired state

  • Works on native resources

  • Kubernetes uses it internally

Operator Pattern

  • Extends Kubernetes API

  • Automates application lifecycle

  • CRD + Custom Controller

  • Encodes human operator knowledge

  • Designed for complex apps like DBs, queues, storage


1. How the Reconciliation Loop Actually Works (Internals)

The reconciliation loop is the brain behind every controller and operator.

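Reconciliation = moving the actual state toward the desired state

A controller continuously compares the desired state (the resource's spec) with the actual state (what it observes in the cluster) and takes action to bring them together.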


Step-by-Step: Internal Mechanics

Step 1 — Watch

The operator registers informers to watch specific resources:

  • Your custom resource (MySQLCluster)

  • Resources it owns (Pods, PVCs, Services)

Whenever something changes, a reconcile event is triggered.
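How events turn into reconcile calls can be sketched with a small deduplicating queue. This is a simplified stand-in for the work queue that controller libraries maintain, not their actual implementation; WorkQueue, on_event, and the event dict layout (including the owner field) are hypothetical.

```python
from collections import deque


class WorkQueue:
    """FIFO queue of resource keys ("namespace/name") without duplicates."""

    def __init__(self):
        self._queue = deque()
        self._pending = set()

    def add(self, key):
        # A key that is already waiting is not queued a second time.
        if key not in self._pending:
            self._pending.add(key)
            self._queue.append(key)

    def get(self):
        key = self._queue.popleft()
        self._pending.discard(key)
        return key

    def __len__(self):
        return len(self._queue)


def on_event(queue, event):
    """Watch handler: map any event to the key of the custom resource."""
    obj = event["object"]
    # Child resources (Pods, PVCs, Services) point back to their owner, so a
    # change to a child triggers reconciliation of the owning MySQLCluster.
    owner = obj.get("owner")
    queue.add(owner if owner else f"{obj['namespace']}/{obj['name']}")


queue = WorkQueue()
on_event(queue, {"type": "MODIFIED",
                 "object": {"namespace": "default", "name": "orders-db"}})
on_event(queue, {"type": "DELETED",
                 "object": {"namespace": "default", "name": "orders-db-1",
                            "owner": "default/orders-db"}})
print(len(queue), queue.get())  # 1 default/orders-db
```

Note that the queue carries only the key, not the event. The reconcile function re-reads the current state itself, so a burst of changes collapses into a single pass.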


Step 2 — Fetch Current State

Inside Reconcile() you fetch:

  • the custom resource (CR) object

  • the actual StatefulSets, Pods, PVCs, Secrets, etc.

  • their status


Step 3 — Compare Desired vs Actual

Example:

spec.replicas = 3 but actual pods = 2 → mismatch


Step 4 — Take Action

Operator creates/patches/deletes resources.

Examples:

  • Create missing pods

  • Replace failed primary DB node

  • Restart a pod for upgrade

  • Create backup job

  • Create/rotate secrets


Step 5 — Requeue

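The operator may requeue the reconciliation so that it runs again after X seconds, even if no watch event arrives in the meantime. In controller-runtime, this is done by returning a result with RequeueAfter set.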
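A delayed requeue can be modelled with a small priority queue keyed by due time. The sketch below is a simulation with an explicit clock value, not controller-runtime code; Result, DelayQueue, run_once, and the 30-second delay are assumptions chosen for illustration.

```python
import heapq
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    # Seconds until the next pass; None means "wait for the next watch event".
    requeue_after: Optional[float] = None


class DelayQueue:
    """Holds keys until their due time has passed."""

    def __init__(self):
        self._heap = []

    def add_after(self, key, now, delay):
        heapq.heappush(self._heap, (now + delay, key))

    def pop_due(self, now):
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[1])
        return due


def reconcile(key, cluster):
    """Ask for another look in 30 seconds while the cluster is not ready."""
    if cluster[key]["ready"]:
        return Result()
    return Result(requeue_after=30.0)


def run_once(key, cluster, queue, now):
    result = reconcile(key, cluster)
    if result.requeue_after is not None:
        queue.add_after(key, now, result.requeue_after)
    return result


cluster = {"default/orders-db": {"ready": False}}
queue = DelayQueue()
run_once("default/orders-db", cluster, queue, now=0.0)
print(queue.pop_due(now=10.0))  # []
print(queue.pop_due(now=30.0))  # ['default/orders-db']
```

Requeueing is useful for conditions that produce no event of their own, such as waiting for a database to finish initializing.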


Key Idea:

Reconciliation must be idempotent

Running it 100 times against the same desired state must leave the cluster in the same end state as running it once.
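Idempotency is easy to check in a sketch: a second pass over an already-reconciled state should find nothing to do. The function below works on plain dicts that map a child resource name to its spec; reconcile_children and the resource names are hypothetical.

```python
def reconcile_children(desired, actual):
    """Make `actual` match `desired` in place and return the actions taken."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actual[name] = dict(spec)
            actions.append(("create", name))
        elif actual[name] != spec:
            actual[name] = dict(spec)
            actions.append(("patch", name))
    # Anything that exists but is no longer wanted gets removed.
    for name in sorted(set(actual) - set(desired)):
        del actual[name]
        actions.append(("delete", name))
    return actions


desired = {
    "statefulset/orders-db": {"replicas": 3},
    "service/orders-db": {"port": 3306},
}
actual = {
    "statefulset/orders-db": {"replicas": 2},
    "secret/old-credentials": {},
}

print(reconcile_children(desired, actual))  # three actions on the first pass
print(reconcile_children(desired, actual))  # [] on every later pass
```

The key design choice is that each action is derived from the difference between the two states, never from "what happened last time". That is what makes repeated or retried passes safe.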


2. How Operators Handle Upgrades & Backups

Operators encode domain knowledge.

Below is how real-world operators handle complex tasks.


Upgrades (Rolling Upgrade Logic)

Example: Upgrading a PostgreSQL cluster from 13 → 14 (simplified; a real major-version upgrade usually needs extra steps such as pg_upgrade)

Operator performs:

  1. Mark cluster as Upgrading

  2. Validate version compatibility

  3. Drain traffic from replica

  4. Upgrade replica node 1

  5. Wait for it to become healthy

  6. Upgrade replica node 2

  7. Wait again

  8. Promote replica to primary

  9. Upgrade old primary last

  10. Update the custom resource's .status.version

With an HA setup, this sequence avoids downtime apart from a brief switchover when the replica is promoted.
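The ordering rule behind this sequence (replicas first, primary last) can be captured in a small planning function. This is a sketch under stated assumptions: plan_rolling_upgrade, the node names, and the step strings are hypothetical, versions are plain integers, and a real operator would execute each step and check health rather than just list it.

```python
def plan_rolling_upgrade(nodes, target):
    """Return the ordered upgrade steps for a primary/replica cluster.

    `nodes` maps a node name to {"role": "primary" | "replica", "version": int}.
    """
    primaries = [n for n, info in nodes.items() if info["role"] == "primary"]
    replicas = sorted(n for n, info in nodes.items() if info["role"] == "replica")
    # Validate before touching anything.
    if len(primaries) != 1:
        raise ValueError("expected exactly one primary")
    if not replicas:
        raise ValueError("a rolling upgrade needs at least one replica")
    if any(info["version"] > target for info in nodes.values()):
        raise ValueError("downgrades are not supported")

    steps = ["mark cluster Upgrading"]
    # Replicas go first, one at a time; already-upgraded ones are skipped,
    # so the plan can be recomputed safely after an interruption.
    for name in replicas:
        if nodes[name]["version"] != target:
            steps.append(f"drain {name}")
            steps.append(f"upgrade {name} to {target}")
            steps.append(f"wait until {name} is healthy")
    # Only then is the primary role moved and the old primary upgraded.
    old_primary = primaries[0]
    steps.append(f"promote {replicas[0]} to primary")
    steps.append(f"upgrade {old_primary} to {target}")
    steps.append(f"wait until {old_primary} is healthy")
    steps.append(f"set status.version to {target}")
    return steps


nodes = {
    "db-0": {"role": "primary", "version": 13},
    "db-1": {"role": "replica", "version": 13},
    "db-2": {"role": "replica", "version": 13},
}
for step in plan_rolling_upgrade(nodes, target=14):
    print(step)
```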


Backups

Most operators follow this pattern:

Backup Trigger

  • automatically, based on a schedule defined in the custom resource

  • or manually, by creating a dedicated custom resource (for example, a Backup object)

Backup Process

Operator:

  • Creates a Kubernetes Job

  • Mounts PVC or connects to DB

  • Executes backup commands

  • Uploads backup to S3, GCS, MinIO, NFS, etc.

  • Updates Backup.status
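The first step of the backup process, turning a Backup resource into a Kubernetes Job, might look like the sketch below. The Job layout (batch/v1, template, restartPolicy) follows the standard Job API; everything about the Backup resource is hypothetical, including the fields clusterName, destination, and image, the bucket URL, and the container arguments.

```python
def build_backup_job(backup):
    """Build a batch/v1 Job manifest (as a dict) from a Backup resource."""
    name = backup["metadata"]["name"]
    spec = backup["spec"]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": f"{name}-job",
            "namespace": backup["metadata"]["namespace"],
            "labels": {"backup": name},
        },
        "spec": {
            "backoffLimit": 2,  # retry a failed backup pod at most twice
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "backup",
                        "image": spec["image"],
                        # Hypothetical arguments understood by the image.
                        "args": [
                            "--cluster", spec["clusterName"],
                            "--destination", spec["destination"],
                        ],
                    }],
                },
            },
        },
    }


# Hypothetical Backup custom resource created by a user or by a schedule.
backup = {
    "apiVersion": "example.com/v1",
    "kind": "Backup",
    "metadata": {"name": "orders-db-nightly", "namespace": "default"},
    "spec": {
        "clusterName": "orders-db",
        "destination": "s3://example-bucket/orders-db",
        "image": "example.com/mysql-backup:latest",
    },
}

job = build_backup_job(backup)
print(job["metadata"]["name"])  # orders-db-nightly-job
```

After submitting the Job, the operator would watch it and copy the outcome into Backup.status.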

Restore Process

Operator:

  • Stops cluster

  • Restores PVC from stored backup

  • Recreates StatefulSets

  • Rebuilds replica topology


3. How to Write an Operator (Step-by-Step Using Operator SDK)

Prerequisites:

  • Go 1.22+

  • Docker/Podman

  • Kubernetes cluster

  • Operator SDK installed


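Step 1 — Create Operator Project

Scaffold a new project with operator-sdk init, passing a domain and Go module path of your choice via --domain and --repo.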


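Step 2 — Create API (CRD) + Controller

Run operator-sdk create api with --group, --version, and --kind MySQLCluster (plus --resource --controller). This generates the API types in api/v1/mysqlcluster_types.go and a controller skeleton in controllers/mysqlcluster_controller.go, the two files edited in the next steps.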


Step 3 — Define the CRD Schema

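In api/v1/mysqlcluster_types.go, define the fields of the spec (what the user asks for, such as the replica count) and of the status (what the operator reports back).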


Step 4 — Implement Reconcile Logic

The Reconcile function in controllers/mysqlcluster_controller.go follows this simplified outline:

  • fetch CR

  • fetch resources

  • create/patch/delete

  • update status
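The four steps above can be traced in a Python sketch that runs against an in-memory stand-in for the API server. This is not the Go code that Operator SDK scaffolds: FakeClient, NotFound, and the single owned StatefulSet are simplifying assumptions, and cleanup on deletion (usually handled through owner references and garbage collection) is left out.

```python
class NotFound(Exception):
    pass


class FakeClient:
    """In-memory stand-in for the Kubernetes API; not a real client library."""

    def __init__(self):
        self.objects = {}  # (kind, name) -> dict

    def get(self, kind, name):
        try:
            return self.objects[(kind, name)]
        except KeyError:
            raise NotFound(f"{kind}/{name}") from None

    def apply(self, kind, name, obj):
        self.objects[(kind, name)] = obj


def reconcile(client, name):
    # 1. Fetch the custom resource; if it is gone there is nothing to do.
    try:
        cr = client.get("MySQLCluster", name)
    except NotFound:
        return "ignored"

    # 2. Fetch the resources this operator owns (here: one StatefulSet).
    try:
        current = client.get("StatefulSet", name)
    except NotFound:
        current = None

    # 3. Create or patch so that the owned resource matches the spec.
    wanted = {"replicas": cr["spec"]["replicas"]}
    if current is None:
        client.apply("StatefulSet", name, wanted)
        outcome = "created"
    elif current != wanted:
        client.apply("StatefulSet", name, wanted)
        outcome = "patched"
    else:
        outcome = "unchanged"

    # 4. Report what was observed back on the custom resource.
    cr["status"] = {"replicas": wanted["replicas"], "phase": "Ready"}
    return outcome


client = FakeClient()
client.apply("MySQLCluster", "orders-db", {"spec": {"replicas": 3}})
print(reconcile(client, "orders-db"))  # created
print(reconcile(client, "orders-db"))  # unchanged
```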


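Step 5 — Build & Deploy Operator

Build and push the operator image, then deploy it. In a scaffolded project this is typically make docker-build docker-push followed by make deploy, each with IMG set to your image.

This installs: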

  • CRD

  • RBAC roles

  • Deployment for your operator


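Step 6 — Apply a Sample Custom Resource

Apply a MySQLCluster manifest with kubectl apply -f (the scaffold places sample manifests under config/samples/).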

The operator will now build a MySQL cluster.


4. Top 20 Production-Grade Operators in the World Today

Widely adopted operators, grouped by category:

Database Operators

  1. MongoDB Community Operator

  2. Percona Operators (MySQL, PostgreSQL, MongoDB)

  3. Crunchy Data PostgreSQL Operator

  4. Vitess Operator

  5. MariaDB Operator

  6. CockroachDB Operator

  7. Redis Operator

  8. Cassandra Operator


Observability & Logging Operators

  9. Prometheus Operator

  10. Loki Operator

  11. Grafana Operator

  12. Fluent Operator (Fluent Bit/Fluentd)


Security Operators

  13. Cert-Manager Operator

  14. Vault Operator

  15. Kyverno Policy Engine

  16. OPA Gatekeeper Operator


Messaging & Streaming

  17. Strimzi Kafka Operator

  18. RabbitMQ Cluster Operator

  19. NATS Operator


Infrastructure

  20. Rook-Ceph Operator (Storage)

Extra commonly used:

  • ArgoCD Operator

  • Istio Operator

  • etcd Operator

  • MinIO Tenant Operator

  • Elasticsearch Operator


Final Summary

Operator Pattern = CRD + Controller + Operational Knowledge

It enables:

  • backups

  • upgrades

  • autoscaling

  • failover

  • lifecycle management

Everything a human operator used to do.
