Ceph

PLEASE NOTE: This document applies to version v0.8, not to the latest stable release, v1.9.

Disaster Recovery

Restoring Mon Quorum

In rare circumstances, the mons may lose quorum. If they cannot re-form quorum on their own, a manual procedure can restore it. The only requirement is that at least one mon is still healthy. The following steps remove the unhealthy mons from quorum so that a single mon can form quorum by itself, then grow the quorum back to its original size.

For example, if you have three mons and lose quorum, you will need to remove the two bad mons from quorum, notify the good mon that it is the only mon in quorum, and then restart the good mon.
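
The arithmetic behind this example can be sketched in a few lines of shell. Ceph mons need a strict majority of the mons listed in the monmap to form quorum, so one healthy mon out of three is not enough, while one out of one is. The function names `quorum_size` and `has_quorum` are hypothetical helpers for illustration only and are not part of Rook or Ceph.

```shell
# Smallest number of mons that is a strict majority of the monmap.
quorum_size() {
  local mon_count="$1"
  echo $(( mon_count / 2 + 1 ))
}

# Exit status 0 if the healthy mons can form quorum, 1 otherwise.
has_quorum() {
  local mon_count="$1"
  local healthy="$2"
  [ "$healthy" -ge "$(quorum_size "$mon_count")" ]
}

quorum_size 3   # prints 2
has_quorum 3 1 || echo "1 healthy mon out of 3 cannot form quorum"
has_quorum 1 1 && echo "1 healthy mon out of 1 forms quorum"
```

This is why the procedure shrinks the monmap to the single healthy mon: once the monmap lists only one mon, that mon is a majority by itself.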

Stop the operator

First, stop the operator so that it does not try to fail over the mons while the monmap is being modified.

kubectl -n rook-ceph-system delete deployment rook-ceph-operator
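
Before touching the monmap, it may be worth confirming that the operator pod is really gone. The following is a minimal sketch, not part of Rook: `operator_stopped` is a hypothetical helper, and it assumes the operator pods carry the label `app=rook-ceph-operator`. Check the labels in your own operator.yaml before relying on it.

```shell
# Poll until no operator pod is listed in the given namespace.
# Returns 0 once the list is empty, 1 if pods remain after all retries.
# Assumption: operator pods are labeled app=rook-ceph-operator.
operator_stopped() {
  local namespace="${1:-rook-ceph-system}"
  local retries="${2:-30}"
  local pods
  while [ "$retries" -gt 0 ]; do
    pods="$(kubectl -n "$namespace" get pods -l app=rook-ceph-operator -o name)"
    if [ -z "$pods" ]; then
      return 0
    fi
    retries=$(( retries - 1 ))
    sleep 2
  done
  return 1
}
```

A call such as `operator_stopped rook-ceph-system 30 || echo "operator pod is still running"` waits up to about a minute before reporting a problem.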

Inject a new monmap

WARNING: Injecting a monmap must be done very carefully. If run incorrectly, your cluster could be permanently destroyed.

The Ceph monmap keeps track of the mon quorum. We will update the monmap to only contain the healthy mon. In this example, the healthy mon is rook-ceph-mon1, while the unhealthy mons are rook-ceph-mon0 and rook-ceph-mon2.

Connect to the pod of a healthy mon and run the following commands.

kubectl -n rook-ceph exec -it <mon-pod> bash

# set a few simple variables
# cluster_namespace must match the namespace of your cluster (rook-ceph in this example)
cluster_namespace=rook-ceph
good_mon_id=rook-ceph-mon1
monmap_path=/tmp/monmap

# make sure the quorum lock file does not exist
rm -f /var/lib/rook/${good_mon_id}/data/store.db/LOCK

# extract the monmap to a file
ceph-mon -i ${good_mon_id} --extract-monmap ${monmap_path} \
  --cluster=${cluster_namespace} --mon-data=/var/lib/rook/${good_mon_id}/data \
  --conf=/var/lib/rook/${good_mon_id}/${cluster_namespace}.config \
  --keyring=/var/lib/rook/${good_mon_id}/keyring \
  --monmap=/var/lib/rook/${good_mon_id}/monmap

# review the contents of the monmap
monmaptool --print ${monmap_path}

# remove the bad mon(s) from the monmap, one command per mon:
#   monmaptool ${monmap_path} --rm <bad_mon>

# in this example we remove mon0 and mon2:
monmaptool ${monmap_path} --rm rook-ceph-mon0
monmaptool ${monmap_path} --rm rook-ceph-mon2

# inject the monmap into the good mon
ceph-mon -i ${good_mon_id} --inject-monmap ${monmap_path} \
  --cluster=${cluster_namespace} --mon-data=/var/lib/rook/${good_mon_id}/data \
  --conf=/var/lib/rook/${good_mon_id}/${cluster_namespace}.config \
  --keyring=/var/lib/rook/${good_mon_id}/keyring
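
When there are more than a couple of bad mons, the removal commands shown above can be generated rather than typed by hand. The following is a minimal sketch under that assumption: `print_rm_commands` is a hypothetical helper, not a Rook or Ceph tool. It only prints the commands, so they can be reviewed before being run in the mon shell.

```shell
# Print one "monmaptool --rm" command for every listed mon except the healthy one.
# Arguments: <monmap_path> <good_mon_id> <mon_id>...
# Nothing is executed; the output is meant to be reviewed first.
print_rm_commands() {
  local monmap_path="$1"
  local good_mon_id="$2"
  shift 2
  local mon
  for mon in "$@"; do
    if [ "$mon" != "$good_mon_id" ]; then
      echo "monmaptool ${monmap_path} --rm ${mon}"
    fi
  done
}

print_rm_commands /tmp/monmap rook-ceph-mon1 \
  rook-ceph-mon0 rook-ceph-mon1 rook-ceph-mon2
# prints:
# monmaptool /tmp/monmap --rm rook-ceph-mon0
# monmaptool /tmp/monmap --rm rook-ceph-mon2
```

Printing instead of executing is deliberate: given the warning above, each removal should be checked against the output of `monmaptool --print` before it is applied.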

Exit the shell to continue.

Edit the rook configmap for mons

Edit the configmap that the operator uses to track the mons.

kubectl -n rook-ceph edit configmap rook-ceph-mon-endpoints

In the data element you will see three mons, such as the following (or more, depending on your monCount):

data: rook-ceph-mon0=10.100.35.200:6790;rook-ceph-mon1=10.100.35.233:6790;rook-ceph-mon2=10.100.35.12:6790

Delete the bad mons from the list so that only the good mon remains, for example:

data: rook-ceph-mon1=10.100.35.233:6790

Save the file and exit.
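
The expected value of the data element can also be computed ahead of the edit and compared with what you typed. This is a minimal sketch that assumes the semicolon-separated `name=ip:port` format shown above; `keep_good_mon` is a hypothetical helper and does not modify the configmap.

```shell
# Print only the endpoint entry that belongs to the healthy mon.
# Arguments: <semicolon-separated endpoint list> <good_mon_id>
keep_good_mon() {
  local endpoints="$1"
  local good_mon_id="$2"
  local entry
  local IFS=';'            # split the list on semicolons
  for entry in $endpoints; do
    case "$entry" in
      "${good_mon_id}="*) echo "$entry" ;;
    esac
  done
}

endpoints="rook-ceph-mon0=10.100.35.200:6790;rook-ceph-mon1=10.100.35.233:6790;rook-ceph-mon2=10.100.35.12:6790"
keep_good_mon "$endpoints" rook-ceph-mon1
# prints rook-ceph-mon1=10.100.35.233:6790
```

Matching on the full `name=` prefix avoids keeping a mon whose name merely starts with the same characters as the good mon's name.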

Restart the mon

You will need to restart the good mon pod so that it picks up the changes. Delete the good mon pod, and Kubernetes will automatically restart the mon.

kubectl -n rook-ceph delete pod -l mon=rook-ceph-mon1

Start the Rook toolbox and verify the status of the cluster.

ceph -s

The status should show one mon in quorum. If the status looks good, your cluster should be healthy again.

Restart the operator

Start the Rook operator again to resume monitoring the health of the cluster.

# create the operator; it is safe to ignore errors reporting that resources already exist
kubectl create -f operator.yaml

The operator will automatically add more mons to increase the quorum size again, depending on the monCount.