Deploy at Scale: Kubernetes Orchestration Deep Dive

by Syntax Void · Kubernetes · 11 min read
Kubernetes is not complicated. It is an API server that reconciles desired state with actual state, driven by a library of controllers. Everything else — admission webhooks, custom resources, operator patterns — is a consequence of this architecture. Once that model is clear, every Kubernetes problem becomes tractable.

This is what most Kubernetes tutorials never teach you.

The Reconciliation Loop

Every Kubernetes controller implements the same pattern:

  1. Watch the Kubernetes API for resources of a specific type
  2. Compare the desired state (the spec) against the actual state (the status)
  3. Act to make the actual state match the desired state
  4. Loop indefinitely

The ReplicaSet controller watches for ReplicaSets. When it sees a ReplicaSet with replicas: 3 but only 2 Pods running, it creates a new Pod. When it sees 4 Pods running, it deletes one. It does nothing else.
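The four steps above can be sketched as a toy reconciler. This is an illustration of the loop's shape, not the real ReplicaSet controller — the type and function names here are invented for the example:

```go
package main

import "fmt"

// replicaSet models only the fields the loop cares about: the spec
// (desired replicas) and the observed state (running Pods).
type replicaSet struct {
	desired int
	running int
}

// reconcileOnce performs one pass of steps 2–3: compare the spec to
// the observed state and take one action to close the gap.
func reconcileOnce(rs *replicaSet) string {
	switch {
	case rs.running < rs.desired:
		rs.running++
		return "created Pod"
	case rs.running > rs.desired:
		rs.running--
		return "deleted Pod"
	default:
		return "in sync"
	}
}

func main() {
	rs := &replicaSet{desired: 3, running: 2}
	// Step 4: loop until converged. A real controller never stops —
	// it keeps watching for the next divergence.
	for {
		action := reconcileOnce(rs)
		fmt.Println(action)
		if action == "in sync" {
			break
		}
	}
}
```

Note that each pass takes exactly one corrective action and then re-observes; the controller never assumes its action succeeded.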

This simple model is why Kubernetes is resilient: every controller is constantly healing. It is also why Kubernetes debugging is hard: state changes cascade across multiple controllers, and the causal chain is not obvious.

Resource Management: The Root of Most Production Problems

The single most common source of production Kubernetes incidents is incorrect resource configuration. Specifically: not setting resource limits, or setting them wrong.

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "1000m"

Requests determine scheduling: the scheduler places Pods on nodes with sufficient available requests. Limits determine runtime behavior: a Pod exceeding its memory limit is OOMKilled; a Pod exceeding its CPU limit is throttled.

The dangerous mistake: Setting no limits. A runaway Pod can consume all node resources and starve other Pods. Even in development clusters, always set limits.

The subtle mistake: Setting requests == limits for CPU (part of what earns the “Guaranteed” QoS class, which also requires it for memory). This prevents CPU throttling but also makes your Pods difficult to pack efficiently. For CPU, a 4:1 ratio of limit to request is usually appropriate. For memory, requests == limits is safer — overcommitting memory leads to OOMKills, not throttling.
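Putting both recommendations together, a container spec following these rules of thumb might look like this (the specific values are illustrative — size them from your own profiling):

```yaml
resources:
  requests:
    memory: "512Mi"   # memory: requests == limits, no overcommit
    cpu: "250m"       # what the scheduler reserves
  limits:
    memory: "512Mi"   # exceeding this means OOMKill, so match the request
    cpu: "1000m"      # 4:1 burst headroom; exceeding this means throttling
```

The CPU headroom lets the Pod absorb bursts without being throttled, while the memory ceiling equals the reservation so the node never promises memory it cannot deliver.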

Pod Disruption Budgets: Protecting Availability

Kubernetes will evict Pods during node maintenance, upgrades, and autoscaler scale-down. Without a PodDisruptionBudget (PDB), your service can lose all replicas simultaneously during a rolling node upgrade.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api

This tells Kubernetes to refuse any voluntary eviction that would leave fewer than 2 api Pods available. For a Deployment with 3 replicas, only 1 Pod can be evicted at a time.

Critical gotcha: PDBs only protect against voluntary disruptions (node drains, autoscaler scale-down). They do not prevent involuntary disruptions (node failures, OOMKills). Do not confuse the two when reasoning about availability guarantees.
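One related note: minAvailable as an absolute number goes stale if the Deployment is later scaled. The same protection can be expressed relatively with maxUnavailable, which tracks the replica count automatically:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  maxUnavailable: 1   # at most 1 api Pod voluntarily disrupted at a time,
                      # regardless of how many replicas the Deployment has
  selector:
    matchLabels:
      app: api
```

With 3 replicas this behaves like the minAvailable: 2 example above, but it keeps working unchanged if you scale to 10.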

Custom Controllers: Extending Kubernetes

The operator pattern extends Kubernetes with domain-specific automation. You define a Custom Resource Definition (CRD) that represents your domain concept (a database cluster, a certificate, a workflow), and a controller that reconciles it.

The controller SDK in Go (controller-runtime) provides the scaffolding:

import (
    "context"
    "time"

    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"

    myv1 "example.com/project/api/v1" // your CRD's generated types; path is illustrative
)

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    db := &myv1.Database{}
    if err := r.Get(ctx, req.NamespacedName, db); err != nil {
        // The resource may have been deleted; NotFound is not an error.
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Reconcile desired state
    if err := r.ensureStatefulSet(ctx, db); err != nil {
        return ctrl.Result{}, err
    }

    if err := r.ensureService(ctx, db); err != nil {
        return ctrl.Result{}, err
    }

    // Update status
    db.Status.Phase = "Running"
    return ctrl.Result{RequeueAfter: 30 * time.Second}, r.Status().Update(ctx, db)
}

The RequeueAfter is important: it ensures the controller periodically re-reconciles even if no events are generated. External state (a database that fails, a certificate that expires) will not emit Kubernetes events — you need proactive polling.

Horizontal Pod Autoscaling: Beyond CPU

The default HPA scales on CPU utilization. For most web services, CPU is a lagging indicator. By the time CPU is high, latency has already degraded and users are suffering.

Configure custom metrics for more responsive scaling:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: pubsub.googleapis.com|subscription|num_undelivered_messages
          selector:
            matchLabels:
              resource.labels.subscription_id: api-queue
        target:
          type: AverageValue
          averageValue: 100

This scales your service based on queue depth — a leading indicator. When messages pile up, scale out before latency spikes.
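For an AverageValue target, the HPA's core calculation is: desired replicas = ceil(total metric value ÷ target average), clamped to the min/max bounds. A sketch of that arithmetic, using the queue-depth numbers from the spec above:

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the HPA's AverageValue formula:
// ceil(totalMetric / targetAverage), clamped to [min, max].
func desiredReplicas(totalMetric, targetAverage float64, min, max int) int {
	n := int(math.Ceil(totalMetric / targetAverage))
	if n < min {
		return min
	}
	if n > max {
		return max
	}
	return n
}

func main() {
	// 2,500 undelivered messages at a target of 100 per Pod → 25 replicas.
	fmt.Println(desiredReplicas(2500, 100, 3, 50))
	// Queue nearly empty → clamped to minReplicas.
	fmt.Println(desiredReplicas(40, 100, 3, 50))
}
```

The real controller adds tolerance windows and stabilization to avoid flapping, but this is the arithmetic that decides the target replica count.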

Conclusion

Kubernetes rewards engineers who understand its reconciliation model. Once you internalize that the system is always trying to converge desired state to actual state, everything else follows naturally: why PDBs must be set proactively, why resource limits are not optional, why custom controllers are just loops. Build that mental model, and you can operate any Kubernetes workload with confidence.