Ceph
PLEASE NOTE: This document applies to v0.8 and not to the latest stable release, v1.9.
Using the Container Linux Update Operator with Rook
If you run Container Linux (CoreOS) with the update engine enabled, nodes can reboot in quick succession, leaving the Rook cluster too little time to recover between reboots. The Container Linux Update Operator solves this: with it, you can prevent a node from rebooting until the Ceph cluster is healthy.
Prerequisites
- An operational Container Linux Kubernetes cluster (successfully tested with Kubernetes 1.8.4)
- A working Rook cluster
- The update-engine.service systemd unit on each machine should be unmasked, enabled and started in systemd
- The locksmithd.service systemd unit on each machine should be masked and stopped in systemd
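The unit states listed above can be set with a short script like the following. It is a minimal sketch, assuming it runs as root on each Container Linux node; the function name prepare_update_units and the SYSTEMCTL override (for example SYSTEMCTL=echo to print the commands instead of running them) are illustrative additions, not part of Rook or the update operator.

```shell
#!/bin/sh
# Hypothetical helper: put the update units into the state the prerequisites
# expect. Run as root on every node. SYSTEMCTL is left unquoted on purpose so
# it can be replaced (e.g. SYSTEMCTL=echo) for a dry run.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

prepare_update_units() {
  # update-engine downloads and installs OS updates: unmask, enable, start it.
  $SYSTEMCTL unmask update-engine.service || return 1
  $SYSTEMCTL enable update-engine.service || return 1
  $SYSTEMCTL start update-engine.service || return 1
  # locksmithd would otherwise coordinate reboots itself; the update operator
  # takes over that role, so stop and mask it.
  $SYSTEMCTL stop locksmithd.service || return 1
  $SYSTEMCTL mask locksmithd.service
}
```

Run prepare_update_units once on every node before starting the update operator.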
Start the update operator
Read the Container Linux Update Operator README carefully before you start. Clone its repository and change into the examples directory.
Open the file update-operator.yaml and change the container's command from:
command:
- "/bin/update-operator"
to:
command:
- "/bin/update-operator"
- "--before-reboot-annotations"
- "ceph-before-reboot-check"
- "--after-reboot-annotations"
- "ceph-after-reboot-check"
You can also add the -v 6 argument for more extensive logging.
Now create the update operator by running the following commands:
kubectl create -f namespace.yaml
kubectl create -f cluster-role.yaml
kubectl create -f cluster-role-binding.yaml
kubectl create -f update-operator.yaml
kubectl create -f update-agent.yaml
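The same five commands can be wrapped so that creation stops at the first failure instead of continuing with a partially created setup. This is a hypothetical helper (create_in_order and create_update_operator are illustrative names), assuming it is run from the examples directory of the cloned repository; KUBECTL can be overridden, for example with KUBECTL=echo, for a dry run.

```shell
#!/bin/sh
# Hypothetical helper: create manifests in the given order, stopping at the
# first failure. KUBECTL is left unquoted so a stub (e.g. echo) can be used.
KUBECTL="${KUBECTL:-kubectl}"

create_in_order() {
  for manifest in "$@"; do
    echo "creating $manifest" >&2
    $KUBECTL create -f "$manifest" || return 1
  done
}

create_update_operator() {
  # namespace.yaml goes first: the namespaced objects need it to exist.
  create_in_order namespace.yaml cluster-role.yaml cluster-role-binding.yaml \
    update-operator.yaml update-agent.yaml
}
```

Afterwards, kubectl -n reboot-coordinator get pods should list the operator and agent pods.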
These files create a new namespace, reboot-coordinator, in which the update operator runs, configured to wait for the node annotations ceph-before-reboot-check and ceph-after-reboot-check. Next, create the files in the cluster/examples/coreos folder of the Rook repository. Here is a short description of what each file does:
- rbac.yaml: This file contains the necessary RBAC settings.
- ceph-after-reboot-script.yaml: This file creates a ConfigMap containing a bash script, which is mounted in the rook-toolbox image as an executable file.
- ceph-before-reboot-script.yaml: This file creates a ConfigMap containing a bash script, which is mounted in the rook-toolbox image as an executable file.
- before-reboot-daemonset.yaml: This file creates a DaemonSet that waits for a node to be labeled before-reboot=true and then checks the Ceph status. If everything is in order, it annotates the node with ceph-before-reboot-check=true.
- after-reboot-daemonset.yaml: This file creates a DaemonSet that waits for a node to be labeled after-reboot=true and then unsets the noout flag for the Ceph OSDs. If everything is in order, it annotates the node with ceph-after-reboot-check=true.
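To make the division of labor between the two DaemonSets concrete, the following hypothetical sketch shows what their checks could look like. It is not the script shipped in ceph-before-reboot-script.yaml or ceph-after-reboot-script.yaml: it assumes that "everything is in order" means the output of ceph health starts with HEALTH_OK, and it takes the node name and the value of the ceph-no-noout annotation (described below) as arguments. The function names and the CEPH/KUBECTL overrides are illustrative.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the shipped Rook scripts.
# Assumption: "healthy" means the output of `ceph health` starts with HEALTH_OK.

# Overridable commands for dry runs (left unquoted so a function name works).
CEPH="${CEPH:-ceph}"
KUBECTL="${KUBECTL:-kubectl}"

# $1: node name; $2: value of the node's ceph-no-noout annotation (may be empty)
ceph_before_reboot_check() {
  node="$1"
  no_noout="$2"
  health="$($CEPH health)" || return 1
  case "$health" in
    HEALTH_OK*) ;;
    *)
      # Not healthy: do not annotate, so the update operator keeps waiting.
      echo "Ceph is not healthy yet: $health" >&2
      return 1
      ;;
  esac
  if [ "$no_noout" != "true" ]; then
    # Keep Ceph from marking this node's OSDs out while the node reboots.
    $CEPH osd set noout || return 1
  fi
  $KUBECTL annotate node "$node" ceph-before-reboot-check=true --overwrite
}

# $1: node name
ceph_after_reboot_check() {
  node="$1"
  # The node is back: allow Ceph to mark down OSDs out again.
  $CEPH osd unset noout || return 1
  $KUBECTL annotate node "$node" ceph-after-reboot-check=true --overwrite
}
```

Because a failed before-reboot check leaves the node unannotated, the update operator keeps the reboot on hold; a caller would retry ceph_before_reboot_check until Ceph reports HEALTH_OK. How the node name reaches the pod (for example via the downward API field spec.nodeName) is an assumption of this sketch.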
Set the node annotation ceph-no-noout=true to prevent the ceph-before-reboot-check from setting the OSD noout flag. Use this annotation only when you are removing a node from the cluster: the cluster then starts rebalancing immediately instead of waiting for the node to come back up.
kubectl create -f rbac.yaml
kubectl create -f ceph-after-reboot-script.yaml
kubectl create -f ceph-before-reboot-script.yaml
kubectl create -f before-reboot-daemonset.yaml
kubectl create -f after-reboot-daemonset.yaml
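When you do remove a node from the cluster, the ceph-no-noout annotation can be set and cleared with kubectl annotate. The helper names below are hypothetical; only the kubectl annotate invocations matter, and KUBECTL can be overridden (for example KUBECTL=echo) to print them instead of running them.

```shell
#!/bin/sh
# Hypothetical helpers for the ceph-no-noout node annotation.
KUBECTL="${KUBECTL:-kubectl}"

# Only for nodes being removed: the before-reboot check then skips the noout
# flag, so Ceph starts rebalancing right away.
mark_node_for_removal() {
  $KUBECTL annotate node "$1" ceph-no-noout=true --overwrite
}

# Remove the annotation again; a trailing "-" on the key deletes it.
unmark_node_for_removal() {
  $KUBECTL annotate node "$1" ceph-no-noout-
}
```

For example, mark_node_for_removal worker-3 annotates a (hypothetical) node named worker-3.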
Destroy the update operator
To remove everything created in this guide, first delete the Rook resources:
kubectl delete -f before-reboot-daemonset.yaml
kubectl delete -f after-reboot-daemonset.yaml
kubectl delete -f ceph-after-reboot-script.yaml
kubectl delete -f ceph-before-reboot-script.yaml
kubectl delete -f rbac.yaml
Then you can safely delete the update operator itself.
From the Container Linux Update Operator repository you cloned earlier, change into the examples folder again and run the following commands:
kubectl delete -f update-agent.yaml
kubectl delete -f update-operator.yaml
kubectl delete -f cluster-role-binding.yaml
kubectl delete -f cluster-role.yaml
kubectl delete -f namespace.yaml
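The teardown of the update operator deletes its manifests in exactly the reverse of the order in which they were created, so namespace.yaml is removed last. A small, hypothetical helper can derive the deletion order from the creation order; it assumes manifest file names without spaces, and KUBECTL can be overridden for a dry run.

```shell
#!/bin/sh
# Hypothetical helper: delete manifests in the reverse of the given order.
# Assumes file names contain no spaces (the list is word-split below).
KUBECTL="${KUBECTL:-kubectl}"

delete_in_reverse() {
  reversed=""
  for manifest in "$@"; do
    # Prepend each name, so the last manifest created ends up first.
    reversed="$manifest $reversed"
  done
  for manifest in $reversed; do
    $KUBECTL delete -f "$manifest" || return 1
  done
}
```

Called as delete_in_reverse namespace.yaml cluster-role.yaml cluster-role-binding.yaml update-operator.yaml update-agent.yaml, it issues the same five kubectl delete commands as above, in the same order.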