The Power Of Many: ReplicaSets

The ReplicaSet is a very useful basic building block in Kubernetes that other objects, like the Deployment object, rely on. As a kind of Pod manager running in your cluster, a ReplicaSet makes sure the desired number and type of a certain Pod is always up and running. Its functionality is based on the notion of desired vs. observed state, so it also provides a fantastic opportunity to talk about the basics of reconciliation loop awesomeness.

In case you have taken a look at some of the manifests files used in scope of the previous blog posts (such as this one, for example), you’ll no doubt have noticed the object employed to run the sample workload is the Deployment object. The way it’s set up – having a Pod template baked into it – may seem to imply the Deployment manages these Pods directly, but that’s not the case – in fact, the Deployment manages and configures a ReplicaSet, and it is the ReplicaSet that manages the Pods. As it turns out, in Kubernetes, the ReplicaSet is a basic building block for running and managing workloads that other, higher-level objects – such as the Deployment object – rely upon. In order to lay the foundation for covering the latter in future content, the following sections will introduce you to the ins and outs of the ReplicaSet object – the problem it solves, how it works, its specification, and how to interact with it.

Limits Of Pod Manifests

Imagine you had a microservice doing some kind of search for images (not an OCI container image but, you know, an image to look at) and you wanted to run five instances of it. We’ve learned previously that Pods are the basic unit for running workloads in Kubernetes, so you might be tempted to write a PodSpec like the following…

apiVersion: v1
kind: Pod
metadata:
  name: image-search-1
spec:
  containers:
    - name: awesome-image-search-service
      # Pretend this were an image ref to your awesome image search service
      image: antsinmyey3sjohnson/hello-container-service:1.0
      ports:
        - name: http
          containerPort: 8081
      resources:
        limits:
          cpu: 200m
          memory: 128Mi

… and submit it to the API server, which would give you one instance of your workload. You could then add four more specs just like the one above, modify the Pod name so it becomes unique in the namespace, and then submit those four, too. Voilà, five instances of your microservice!

Of course, this way of creating the desired number of instances – or replicas – has its drawbacks:

  • It’s tedious. Imagine you had to create 20 replicas for 10 microservices – manually creating (and updating!) those PodSpecs would get old pretty quickly even with a powerful text editor.
  • It’s error-prone as copy-paste issues would quickly make their appearance.
  • The Pods created in this fashion won’t get automatically restarted.

This last point is the most noteworthy. Pods are one-off singletons, and Kubernetes treats them as ephemeral and disposable units (unless governed by a StatefulSet, which we won’t look at in this blog post). When you created those Pods manually by submitting a PodSpec to the API server, there wouldn’t be a kind of manager making sure the desired number of replicas is always running – in fact, there wouldn’t even be a way to tell Kubernetes about what desired number means in that case to begin with. Essentially, that manager part would be yourself – constantly monitoring your herd of Pods and making sure the desired number is always up and running.
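
To make the tedium concrete, here's a toy Python sketch (all names hypothetical, no Kubernetes client involved) of what manually stamping out five near-identical PodSpecs amounts to:

```python
# Toy sketch: generating five near-identical Pod manifests from a base template.
# This is the manual drudgery a ReplicaSet automates for you.
import copy

pod_template = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": ""},
    "spec": {
        "containers": [
            {
                "name": "awesome-image-search-service",
                "image": "antsinmyey3sjohnson/hello-container-service:1.0",
            }
        ]
    },
}

def render_pod_specs(base, count):
    """Produce 'count' PodSpecs, each with a unique name in the namespace."""
    specs = []
    for i in range(1, count + 1):
        spec = copy.deepcopy(base)
        spec["metadata"]["name"] = f"image-search-{i}"
        specs.append(spec)
    return specs

specs = render_pod_specs(pod_template, 5)
print([s["metadata"]["name"] for s in specs])
```

Even wrapped in a helper function like this, you'd still have to submit, track, and replace each of these Pods yourself.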

Wouldn’t it be just great to have a cluster-internal entity lifting that burden from your shoulders?

The Pod Manager You Can Trust

This is where the ReplicaSet comes in – it continuously monitors your set of Pods and makes sure the number of Pod replicas running always matches the desired number of replicas. Thus, a ReplicaSet could be seen as a kind of “Pod manager” watching the entire cluster for your Pods, spawning more or deleting some of them whenever the desired number of replicas has changed in one or the other direction. For example, if a Pod under management by a ReplicaSet crashes or gets torn down, the ReplicaSet will submit a request to create a new one to the Kubernetes API server in order to re-align desired and current state. This is the reason why a ReplicaSet should be used even if you want to run only a single Pod.

Desired And Observed State

If you were given the task of making sure a certain type of Pod is always running a certain number of times, and of adjusting that number if need be, you’d do so by simply comparing the number of Pod replicas currently running with whatever number you were told is the desired number. ReplicaSets work in exactly this way – they observe the current state and compare it with the desired state, which, in the case of ReplicaSets, comprises the type of Pod plus the number of replicas running.

The notion of desired vs. observed or current state is expressed in Kubernetes in the form of so-called reconciliation loops. Because those loops and the concept they embody are so central to the self-healing capabilities of Kubernetes, they will be explored in depth in their own blog post, but here’s the basic idea: A Controller implements a reconciliation loop that constantly compares the desired state (which is the state provided to the Kubernetes API by the user or a machine-based actor) to the current state (which is the state currently observed in the cluster). There are many different Controllers that fulfil different responsibilities (and users can even write their own), so what precisely said state comprises can vary significantly between Controllers. What they have in common is that they are constantly running and trying to align the current state with the desired state, if possible.
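
As a rough illustration of the idea, here's a minimal, purely in-memory sketch of one pass of such a loop; a real Controller watches the Kubernetes API server rather than plain Python lists:

```python
# Minimal sketch of a reconciliation pass: align the observed Pod count
# with the desired count by spawning or deleting replicas.
def reconcile(desired_replicas, observed_pods, spawn, delete):
    """One pass of the loop: compare desired vs. observed state and act on the difference."""
    diff = desired_replicas - len(observed_pods)
    if diff > 0:
        for _ in range(diff):
            spawn()  # too few replicas: create more
    elif diff < 0:
        for pod in observed_pods[:abs(diff)]:
            delete(pod)  # too many replicas: remove the surplus

# Toy cluster state
pods = ["pod-a"]
spawn = lambda: pods.append(f"pod-{len(pods)}")
delete = lambda p: pods.remove(p)

reconcile(3, list(pods), spawn, delete)
print(len(pods))  # observed state now matches desired state: 3
```

A real Controller runs passes like this continuously, reacting to events from the API server, so any deviation between the two states is only ever temporary.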

How A ReplicaSet Finds Its Pods

A previous blog post introduced labels and label selectors as an important concept in Kubernetes enabling its objects to form meaningful relationships while maintaining loose coupling. A use case for labels and label selectors we’ve already explored is how a Service finds the Pods it should forward traffic to, and you’ve probably guessed that the relationship between ReplicaSets and the Pods they manage is expressed by means of labels, too. Beyond that, with the introduction of the metadata.ownerReferences property, the notion of ownership was added, thus somewhat increasing the coupling between owner and owned object (here: the ReplicaSet and its Pods, respectively). On the other hand, one might argue the ownership reference is merely a technicality, since ownership is still established through the same label mechanism and thus does not really increase coupling.

Coupling discussions aside, let’s take a look at a small example. As always, there’s a little manifests file waiting for you over on my GitHub. We’ll take a more detailed look at its contents a bit further down the line; for now, it’s sufficient to know the file defines a Namespace and a simple ReplicaSet. Let’s submit the file to the API server and take a look at the results:

# As always...
alias k=kubectl

# Apply manifests file
$ k apply -f

# View created objects
$ k -n replicaset-example get all
NAME                         READY   STATUS    RESTARTS   AGE
pod/hello-replicaset-8wp8k   1/1     Running   0          3s
pod/hello-replicaset-ptw7c   1/1     Running   0          3s
pod/hello-replicaset-p9ljr   1/1     Running   0          3s

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/hello-replicaset   3         3         3       3s

The DESIRED vs. CURRENT columns the output displays for the ReplicaSet are a result of the ReplicaSet Controller’s workings, which have aligned the current state with the desired state. With the three Pods in place, we can take a look at the ReplicaSet’s label selector and use it to find its Pods:

# Query label selector
$ k -n replicaset-example get rs hello-replicaset -o custom-columns=NAME:.metadata.name,LABELS:.spec.selector.matchLabels
NAME               LABELS
hello-replicaset   map[app:hello-replicaset]

# Use label selector to retrieve Pods under management by ReplicaSet
$ k -n replicaset-example get po --selector="app=hello-replicaset"
NAME                     READY   STATUS    RESTARTS   AGE
hello-replicaset-8wp8k   1/1     Running   0          86s
hello-replicaset-ptw7c   1/1     Running   0          86s
hello-replicaset-p9ljr   1/1     Running   0          86s

ReplicaSets find their Pods in a very similar way – they query the API server for a list of all Pods in the current namespace, remove inactive Pods from the returned list, and then filter for Pods whose labels match their label selector. In addition to that, the ReplicaSet will try to claim all Pods thus identified, meaning it will acquire every Pod matching its label selector that doesn’t yet have a Controller-type owner reference in its metadata.ownerReferences list by inserting itself into that list. Although ownership is established through the label mechanism, it is ultimately this ownership link the ReplicaSet uses to monitor the acquired set of Pods, rather than the label-based querying itself. Beyond this, the Controller-type owner reference is important in the context of Garbage Collection and for preventing different Controller implementations from fighting over the same or an overlapping set of Pods.
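
This filter-then-acquire behavior can be sketched as follows; note this is a simplified toy model whose field names only mimic, and are not, the real Kubernetes API types:

```python
# Sketch of filter-then-acquire: filter Pods by label selector,
# then claim those that don't yet have a Controller-type owner.
def claim_pods(rs_name, selector, all_pods):
    """Return the names of Pods the ReplicaSet could claim, marking them as owned."""
    claimed = []
    for pod in all_pods:
        if pod["phase"] not in ("Running", "Pending"):
            continue  # skip inactive Pods
        if not all(pod["labels"].get(k) == v for k, v in selector.items()):
            continue  # labels don't match the selector
        if any(ref.get("controller") for ref in pod["ownerReferences"]):
            continue  # already owned by some Controller
        pod["ownerReferences"].append(
            {"kind": "ReplicaSet", "name": rs_name, "controller": True}
        )
        claimed.append(pod["name"])
    return claimed

pods = [
    {"name": "p1", "phase": "Running", "labels": {"app": "hello"}, "ownerReferences": []},
    {"name": "p2", "phase": "Running", "labels": {"app": "other"}, "ownerReferences": []},
    {"name": "p3", "phase": "Failed", "labels": {"app": "hello"}, "ownerReferences": []},
]
print(claim_pods("hello-rs", {"app": "hello"}, pods))  # ['p1']
```

Only p1 gets claimed: p2 fails the selector match, and p3 is inactive. Running the claim again would yield nothing, since p1 now carries a Controller-type owner reference.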

The ReplicaSet Spec

As we’ve previously uncovered, the preferred way to create and modify objects in Kubernetes is by submitting a plain-text specification of their desired state to the API server, and the ReplicaSet is no exception to this rule. Like all specifications, a ReplicaSet specification, too, requires the apiVersion, kind, and metadata properties, as well as a spec section. Within the latter, the spec.selector and spec.template properties are mandatory. Thus, a minimal ReplicaSet spec could look like the following (this manifest is not entirely identical to the one contained in the file you’ve previously applied – in addition to what you see below, the latter defines spec.replicas as well as resource limits for the container, but since those properties are not mandatory, they do not appear in the “minimal” ReplicaSet shown below):

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: hello-replicaset
  namespace: replicaset-example
spec:
  selector:
    matchLabels:
      app: hello-replicaset
  template:
    metadata:
      labels:
        app: hello-replicaset
    spec:
      containers:
        - name: hello-service
          image: antsinmyey3sjohnson/hello-container-service@sha256:e9de17b4fbfc6a1f52d07609e285dab3818412016200f189c68e441bf23effb3
          ports:
            - name: http
              containerPort: 8081

The two mandatory properties in the spec section perform the following tasks:

  • spec.selector: The means by which the ReplicaSet finds and acquires the Pods it monitors and manages.
  • spec.template: One dimension of what the ReplicaSet Controller understands as state. If the observed number of replicas is lower than the desired number of replicas, new Pods will be created based on this template, which therefore defines the type of Pod this ReplicaSet manages.

You might be wondering why spec.replicas is not among the mandatory properties – after all, it defines the second and remaining dimension of state as understood by the ReplicaSet Controller. Of course, a ReplicaSet needs a well-defined number of replicas (even if that number is zero), so if omitted, the value will default to one.

Working With ReplicaSets

In the following sections, we’re going to interact with the previously created ReplicaSet in order to explore its characteristics and capabilities (in case you haven’t created the ReplicaSet yet, apply the manifests file introduced above using k apply -f).

Describing A ReplicaSet

The describe command provides us with a lot of useful details about a ReplicaSet:

# Get list of ReplicaSets
$ k -n replicaset-example get rs
NAME               DESIRED   CURRENT   READY   AGE
hello-replicaset   3         3         3       43s

# Retrieve ReplicaSet details
$ k -n replicaset-example describe rs hello-replicaset
Name:         hello-replicaset
Namespace:    replicaset-example
Selector:     app=hello-replicaset
Labels:       <none>
Annotations:  <none>
Replicas:     3 current / 3 desired
Pods Status:  3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  # ... omitted for brevity
Events:
  Type     Reason            Age   From                   Message
  ----     ------            ----  ----                   -------
  Normal   SuccessfulCreate  62s   replicaset-controller  Created pod: hello-replicaset-8lm9n
  Normal   SuccessfulCreate  62s   replicaset-controller  Created pod: hello-replicaset-nsbwp
  Normal   SuccessfulCreate  62s   replicaset-controller  Created pod: hello-replicaset-mzcdp

Among other things, this output informs us about the state of all replicas managed by this ReplicaSet, as well as the label selector it uses to acquire its Pods. What’s also interesting is the list of events describing the actions of the ReplicaSet. Above, upon fresh creation of the ReplicaSet, it created three replicas, so there are three Pod creation events. If you delete a Pod acquired by this ReplicaSet, it will notice it’s down one replica and spawn a new one, in which case the list of events will contain another Pod creation event:

# Delete Pod
$ k -n replicaset-example delete pod hello-replicaset-nsbwp
pod "hello-replicaset-nsbwp" deleted

# Query ReplicaSet description again
$ k -n replicaset-example describe rs hello-replicaset
# ... other output omitted
  Type     Reason            Age    From                   Message
  ----     ------            ----   ----                   -------
  Normal   SuccessfulCreate  5m51s  replicaset-controller  Created pod: hello-replicaset-8lm9n
  Normal   SuccessfulCreate  5m51s  replicaset-controller  Created pod: hello-replicaset-nsbwp
  Normal   SuccessfulCreate  5m51s  replicaset-controller  Created pod: hello-replicaset-mzcdp
  Normal   SuccessfulCreate  83s    replicaset-controller  Created pod: hello-replicaset-vw9p8

Similarly, all errors the ReplicaSet faces will be listed here.

Scaling ReplicaSets

The central capability of a ReplicaSet – enabled by its reconciliation loop, which understands the number of desired and currently available replicas as one dimension of its state – is that it can spawn new replicas or delete existing ones in case the number of available replicas deviates from the number of desired replicas. Thus, to scale a ReplicaSet up or down, the only thing we have to do is adjust the number of desired replicas, and there are two ways of achieving this: imperative scaling and declaratively adjusting the ReplicaSet spec.

Imperative Scaling

The kubectl client offers the scale command for all objects that know the notion of a number of replicas (Deployment, ReplicaSet, ReplicationController, and StatefulSet). In our case, we want to scale a ReplicaSet:

# Scale ReplicaSet from 3 to 5 replicas
$ k -n replicaset-example scale rs hello-replicaset --replicas=5
replicaset.apps/hello-replicaset scaled

# View list of Pods
$ k -n replicaset-example get po --selector="app=hello-replicaset"
NAME                     READY   STATUS    RESTARTS   AGE
hello-replicaset-mzcdp   1/1     Running   0          17m
hello-replicaset-8lm9n   1/1     Running   0          17m
hello-replicaset-vw9p8   1/1     Running   0          13m
hello-replicaset-6tfq8   1/1     Running   0          11s
hello-replicaset-6g476   1/1     Running   0          11s

As you can see, the ReplicaSet has spawned two new replicas based on the Pod template defined by the spec.template property in its specification (and, in accordance with this, the event list the kubectl describe command yields for this ReplicaSet contains two new successful Pod creation events).

Scaling a ReplicaSet in this way is quick and very straightforward, but its imperative nature can be problematic. For example, say you had two Kubernetes clusters, one of which is supposed to mirror the state of the other so that the two are exact copies of one another (one of my current clients does this with four clusters). Let’s further assume a GitOps approach is employed: declarative descriptions of the desired cluster state are checked into version control, and a continuous delivery tool like Argo CD makes sure the cluster state always mirrors that description. If your ReplicaSet on cluster A is hit with a sudden increase in load and you use the imperative kubectl scale command to quickly spawn more Pods, the state on cluster A deviates not only from the state on cluster B, but also from the state described in version control. So if the load persists and the next synchronization takes place, your ReplicaSet will be scaled down again. The much safer bet is therefore to perform scaling declaratively.

Declarative Scaling

Declarative scaling simply means that the number of replicas is provided in the ReplicaSet specification, which is the declarative, plain-text description of the ReplicaSet’s desired state. The previously introduced manifests file defines three replicas, so after having imperatively scaled the ReplicaSet to five in scope of the section above, it will be scaled back down to said three if you apply the file again:

# Apply manifests file
$ k apply -f
namespace/replicaset-example unchanged
replicaset.apps/hello-replicaset configured

# Retrieve Pod list
$ k -n replicaset-example get po --selector="app=hello-replicaset"
NAME                     READY   STATUS    RESTARTS   AGE
hello-replicaset-mzcdp   1/1     Running   0          57m
hello-replicaset-8lm9n   1/1     Running   0          57m
hello-replicaset-vw9p8   1/1     Running   0          53m

This means if you wanted to scale the ReplicaSet by changing its declarative state description, you’d simply have to adjust the number of replicas in the spec.replicas property:

# Download manifests file
$ curl > replicaset.yaml

# Increase number of replicas to 5
$ sed 's/replicas: 3/replicas: 5/' replicaset.yaml > adjusted-replicaset.yaml

# Apply adjusted manifests file
$ k apply -f adjusted-replicaset.yaml
namespace/replicaset-example unchanged
replicaset.apps/hello-replicaset configured

# Check number of Pods
$ k -n replicaset-example get po --selector="app=hello-replicaset"
NAME                     READY   STATUS    RESTARTS   AGE
hello-replicaset-mzcdp   1/1     Running   0          71m
hello-replicaset-8lm9n   1/1     Running   0          71m
hello-replicaset-vw9p8   1/1     Running   0          67m
hello-replicaset-tmdkb   1/1     Running   0          21s
hello-replicaset-fqjxj   1/1     Running   0          21s

Coming back to our previous example where two Kubernetes clusters needed to be kept in sync: If you modified the declarative description of the desired number of replicas, updated it in version control, and let Argo CD synchronize the state, there wouldn’t be any room for errors such as updating one cluster and forgetting about the other, or an imperatively created state deviation getting overwritten by an Argo CD synchronization. Updating the declarative state description is therefore a lot more elegant and should be preferred over making changes imperatively.

This is a good opportunity to answer a question you might have had in the context of scaling: Which replicas get deleted if I scale down a ReplicaSet? The general answer is that the ReplicaSet’s reconciliation loop is based on the assumption that the Pods it manages are literally replicas, i.e. any one Pod out of this set is entirely the same as any other with regards to its state (which, in turn, is the reason why a ReplicaSet works great for stateless or nearly stateless applications, but is not suited to running stateful applications). Thus, a ReplicaSet can delete any of those Pods when the number of currently running replicas is higher than the number of desired replicas. That said, the implementation of the ReplicaSet Controller suggests that younger replicas are preferred for deletion, i.e. they will be deleted first (for Pod deletion, see this function in the Controller’s source code).
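
The preference can be illustrated with a small sketch; note the real ReplicaSet Controller ranks Pods by several criteria (readiness, restart counts, creation time, among others), of which only the age preference is modeled here:

```python
# Toy model of scale-down victim selection: of the criteria the real Controller
# uses, we only model the "younger Pods are deleted first" preference.
def pick_scale_down_victims(pods, excess):
    """Return 'excess' Pods to delete, preferring the most recently created ones."""
    by_age = sorted(pods, key=lambda p: p["age_seconds"])  # youngest first
    return [p["name"] for p in by_age[:excess]]

pods = [
    {"name": "hello-replicaset-old", "age_seconds": 3600},
    {"name": "hello-replicaset-mid", "age_seconds": 600},
    {"name": "hello-replicaset-new", "age_seconds": 30},
]
print(pick_scale_down_victims(pods, 2))  # ['hello-replicaset-new', 'hello-replicaset-mid']
```

Preferring younger replicas makes intuitive sense: the longest-running Pods are the most likely to be warmed up and serving traffic reliably.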

Some Label-Foo

The fact that ReplicaSets acquire their Pods based on their labels enables two interesting use cases: pod adoption and pod quarantining.

Pod Adoption

I’ve prepared another manifests file defining a Namespace, a ReplicaSet, and a Pod. In particular, the number of desired replicas is three. We’re going to look at this file in just a minute, but let’s apply it to our cluster first and see what happens:

# Apply manifests file
$ k apply -f
namespace/pod-adoption-example created
replicaset.apps/adopting-replicaset created
pod/adopted-pod created

# View list of Pods
$ k -n pod-adoption-example get po 
NAME                        READY   STATUS        RESTARTS   AGE
adopting-replicaset-qcwws   1/1     Running       0          6s
adopting-replicaset-swjtt   1/1     Running       0          6s
adopting-replicaset-x6zwz   1/1     Terminating   0          6s
adopted-pod                 1/1     Running       0          6s

That’s odd! The manifests file defines a Pod and a ReplicaSet having three replicas, so four Pods should be running at all times – right? Then why is one Terminating immediately after the manifests file has been applied? Let’s check the Pod list again:

$ k -n pod-adoption-example get po
NAME                        READY   STATUS    RESTARTS   AGE
adopting-replicaset-qcwws   1/1     Running   0          2m7s
adopting-replicaset-swjtt   1/1     Running   0          2m7s
adopted-pod                 1/1     Running   0          2m7s

As you’d expect, the Terminating Pod has, well, terminated, and there are now only three Pods running. Why is this the case? Let’s take a look at the ReplicaSet’s label selector:

$ k -n pod-adoption-example get rs adopting-replicaset -o custom-columns=NAME:.metadata.name,SELECTOR:.spec.selector.matchLabels
NAME                  SELECTOR
adopting-replicaset   map[app:hello-service]

Next, which labels are present on adopted-pod?

$ k -n pod-adoption-example get po adopted-pod --show-labels
NAME          READY   STATUS    RESTARTS   AGE     LABELS
adopted-pod   1/1     Running   0          4m10s   app=hello-service

This explains what happened: The ReplicaSet first spawned three new Pods, then ran another filter on the Pod list within the current namespace and noticed there were four Pods matching its label selector. It then acquired the one Pod not yet under its management, which was possible because the Pod created explicitly by means of the PodSpec in the manifests file didn’t carry any Controller-type owner reference in its metadata.ownerReferences list (freshly created, it didn’t carry any references in there at all, for that matter), so no other Controller blocked the ReplicaSet from taking ownership.

We can verify this by taking a look at said list:

$ k -n pod-adoption-example get po adopted-pod -o yaml
apiVersion: v1
kind: Pod
metadata:
  # ...
  labels:
    app: hello-service
  name: adopted-pod
  namespace: pod-adoption-example
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: adopting-replicaset
    uid: b5a6dc5d-469f-4617-9fde-3505e1bc7c35
  resourceVersion: "78648"
  uid: bd693099-0bdc-401c-8182-1a0909597b5c
# ...

Since the ReplicaSet is now one replica over the desired number, it terminates one of the existing Pods in order to get down to three replicas, which gives us the final Pod list.

A word of caution: The behavior demonstrated above means you have to be careful with labels on manually created Pods (or, more generally, all Pods not carrying a Controller-type owner reference), as existing ReplicaSet Controllers (or indeed all Controllers exhibiting this filter-then-acquire kind of behavior) will adopt such Pods if the Pods’ labels match their selector.

Pod Quarantining

What we’ve done above also works the other way around: We can disassociate a Pod from its owning ReplicaSet by removing the labels matching the ReplicaSet’s selector.

Let’s disassociate the adopted-pod from the ReplicaSet:

# Perform in-place edit of Pod labels ('hello-service' -> 'quarantined-hello-service')
$ k -n pod-adoption-example edit po adopted-pod
# ...
    app: quarantined-hello-service
# ...
pod/adopted-pod edited

# Retrieve Pod list
$ k -n pod-adoption-example get po --show-labels
NAME                        READY   STATUS    RESTARTS   AGE   LABELS
adopting-replicaset-qcwws   1/1     Running   0          25m   app=hello-service
adopting-replicaset-swjtt   1/1     Running   0          25m   app=hello-service
adopted-pod                 1/1     Running   0          25m   app=quarantined-hello-service
adopting-replicaset-trzfc   1/1     Running   0          33s   app=hello-service

The ReplicaSet noticed there were only two Pods left matching its label selector, so it spawned a new one to re-align the observed with the desired state. Meanwhile, the disassociated Pod simply keeps running, but it is no longer under management by the ReplicaSet, which we can verify by taking a look at its metadata.ownerReferences list:

$ k -n pod-adoption-example get po adopted-pod -o custom-columns=OWNERS:.metadata.ownerReferences
OWNERS
<none>

This can be useful, for example, when a Pod misbehaves – disassociating it from its ReplicaSet will cause the latter to spawn a new, healthy one, but the misbehaving Pod will still be available for live investigation and troubleshooting.
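
The quarantine flow can be summarized in a toy model (the same kind of simplified data structures as before – plain dictionaries, not real API objects): relabeling a Pod takes it out of the selector's match set, so one reconcile pass spawns a replacement while the relabeled Pod keeps running.

```python
# Toy illustration of quarantining: change a Pod's label so it no longer
# matches the selector, then reconcile back up to three matching replicas.
def matching(pods, selector):
    """Return the Pods whose labels satisfy every term of the selector."""
    return [p for p in pods if all(p["labels"].get(k) == v for k, v in selector.items())]

selector = {"app": "hello-service"}
pods = [
    {"name": "rs-a", "labels": {"app": "hello-service"}},
    {"name": "rs-b", "labels": {"app": "hello-service"}},
    {"name": "suspect", "labels": {"app": "hello-service"}},
]

# Quarantine the misbehaving Pod by rewriting its label
pods[2]["labels"]["app"] = "quarantined-hello-service"

# Reconcile back to three matching replicas
while len(matching(pods, selector)) < 3:
    pods.append({"name": f"rs-new-{len(pods)}", "labels": dict(selector)})

print(len(pods), len(matching(pods, selector)))  # 4 3
```

Four Pods are now running in total, but only three of them count toward the desired state – the quarantined one sticks around for inspection.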

Cleaning Up

You can delete all resources created and used in scope of the previous sections by running the following two commands:

$ k delete -f
namespace "replicaset-example" deleted
replicaset.apps "hello-replicaset" deleted

$ k delete -f
namespace "pod-adoption-example" deleted
replicaset.apps "adopting-replicaset" deleted
pod "adopted-pod" deleted

Summary

A ReplicaSet is responsible for managing a set of Pods. It does so by using a reconciliation loop that constantly compares the desired and the observed state of the Pods in question.

Reconciliation loops are a very powerful concept in Kubernetes crucial to the system’s self-healing capabilities, and what each so-called Controller implementing a reconciliation loop understands as state can vary significantly depending on its responsibilities. The ReplicaSet Controller understands the number of replicas and the type of Pod as state, and whenever the currently running number of replicas deviates from the desired number of replicas, the Controller will spawn new ones of the desired type, or delete existing ones. Both dimensions of state are provided in the ReplicaSet’s manifest using the spec.replicas and spec.template properties, respectively.

A ReplicaSet finds the set of Pods it should be managing by means of labels. Additionally, whenever a ReplicaSet finds new Pods matching its label selector, it will try to acquire them and insert itself into the metadata.ownerReferences list of all Pods in question, thus establishing an ownership link that is then used to monitor the acquired Pods. Acquiring a Pod in this way only works if its metadata.ownerReferences list does not yet contain a Controller-type item, so in addition to enabling Pod monitoring, the ownership link prevents different Controllers from fighting over the same Pod.

ReplicaSets can be scaled either by using the kubectl scale command or by updating the ReplicaSet’s declarative state description. Because kubectl scale is imperative, using it for scaling operations opens the door to errors, such as modifying the desired number of replicas on one Kubernetes cluster and forgetting to make the same modification on another in an environment where both clusters should be exact copies of each other. For this reason, and because of the general advantages of declarative state descriptions, a ReplicaSet should always be scaled declaratively by updating its spec.replicas property rather than imperatively.

The Deployment object is a higher-level abstraction frequently employed in Kubernetes that builds on ReplicaSets. In the next blog post, we’ll take a look at how Deployments leverage ReplicaSets and how they complement their functionality.