Rook and Ceph upgrades are designed to ensure data remains available even while the upgrade is proceeding. Rook will perform the upgrades in a rolling fashion such that application pods are not disrupted. To ensure the upgrades are seamless, it is important to begin the upgrades with Ceph in a fully healthy state. Let's first review some ways that you can verify the health of your cluster.
If you run into any issues during the upgrade, see the troubleshooting documentation.
## Pods all Running
In a healthy Rook cluster, all pods in the Rook namespace should be in the `Running` (or in some cases `Completed`) state and have few, if any, pod restarts.
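A quick way to check this is to list the pods in the cluster namespace; `rook-ceph` below is an assumed namespace, so substitute yours if it differs:

```shell
# List all pods in the Rook cluster namespace, showing their
# state (STATUS column) and restart counts (RESTARTS column).
# "rook-ceph" is the conventional namespace; adjust as needed.
kubectl -n rook-ceph get pods
```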
The Rook toolbox contains the Ceph tools that can give you status details of the cluster with the `ceph status` command. Let's look at an output sample and review some of the details:
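One way to run the command, assuming the toolbox was deployed with its usual `app=rook-ceph-tools` label in the `rook-ceph` namespace:

```shell
# Find the toolbox pod by its label, then run `ceph status` inside it.
# Namespace and label are the conventional defaults; adjust if yours differ.
TOOLS_POD=$(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}')
kubectl -n rook-ceph exec -it "$TOOLS_POD" -- ceph status
```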
The output should look similar to the following:
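This is an illustrative sample of a healthy cluster; your cluster ID, daemon names, counts, and ages will differ:

```
  cluster:
    id:     a3f4d47c-...
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 2h)
    mgr: a(active, since 2h)
    mds: myfs:1 {0=myfs-a=up:active} 1 up:standby-replay
    osd: 3 osds: 3 up (since 2h), 3 in (since 2h)
    rgw: 1 daemon active (my.store.a)

  data:
    pools:   4 pools, 97 pgs
    objects: 254 objects, 123 KiB
    usage:   3.0 GiB used, 297 GiB / 300 GiB avail
    pgs:     97 active+clean
```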
In the output above, note the following indications that the cluster is in a healthy state:
- Cluster health: The overall cluster status is `HEALTH_OK` and there are no warning or error status messages displayed.
- Monitors (mon): All of the monitors are included in the `quorum` list.
- Manager (mgr): The Ceph manager is in the `active` state.
- OSDs (osd): All OSDs are `up` and `in`.
- Placement groups (pgs): All PGs are in the `active+clean` state.
- (If applicable) Ceph filesystem metadata server (mds): all MDSes are `active` for all filesystems.
- (If applicable) Ceph object store RADOS gateways (rgw): all daemons are `active`.
If the `ceph status` output has deviations from the general good health described above, there may be an issue that needs to be investigated further. There are other commands you may run for more details on the health of the system, such as `ceph osd status`. See the Ceph troubleshooting docs for help.
## Upgrading an unhealthy cluster
Rook will prevent the upgrade of the Ceph daemons if the health is in a `HEALTH_ERR` state. If you desire to proceed with the upgrade anyway, you will need to set either `skipUpgradeChecks: true` or `continueUpgradeAfterChecksEvenIfNotHealthy: true` as described in the cluster CR settings.
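These settings live under `spec` in the CephCluster resource. A minimal sketch, assuming the conventional `rook-ceph` cluster name and namespace:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph       # assumed cluster name
  namespace: rook-ceph  # assumed namespace
spec:
  # Either setting bypasses health gating during upgrade; use with care
  # and only when you understand why the cluster is unhealthy.
  skipUpgradeChecks: true
  # continueUpgradeAfterChecksEvenIfNotHealthy: true
```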
The container version running in a specific pod in the Rook cluster can be verified in its pod spec output. For example, for the monitor pod `mon-b`, we can verify the container version it is running with the below commands:
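A sketch of such a check, assuming the `rook-ceph` namespace and the usual `rook-ceph-mon-b` pod-name prefix:

```shell
# Find the mon-b pod by name prefix, then print the image of its
# first container (the Ceph container).
POD_NAME=$(kubectl -n rook-ceph get pod -o custom-columns=name:.metadata.name --no-headers | grep rook-ceph-mon-b)
kubectl -n rook-ceph get pod "$POD_NAME" -o jsonpath='{.spec.containers[0].image}'
```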
The status and container versions for all Rook pods can be collected all at once with the following commands:
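One way to do this, assuming the `rook-ceph` namespace, is a `jsonpath` listing of every pod's phase and container image:

```shell
# Print each pod's name, phase, and first-container image, one pod per line.
kubectl -n rook-ceph get pod -o jsonpath='{range .items[*]}{.metadata.name}{"\n\t"}{.status.phase}{"\t\t"}{.spec.containers[0].image}{"\n"}{end}'
```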
The `rook-version` label exists on Ceph resources. A summary of the resource controllers can be gained with the commands below. These will report the requested, updated, and currently available replicas for various Rook resources, in addition to the version of Rook for resources managed by Rook. Note that the operator and toolbox deployments do not have a `rook-version` label set.
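A sketch of such a summary, again assuming the `rook-ceph` namespace; the `req/upd/avl` columns are the requested, updated, and available replica counts:

```shell
# Deployments: name, requested/updated/available replicas, rook-version label.
kubectl -n rook-ceph get deployments -o jsonpath='{range .items[*]}{.metadata.name}{"  \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{"  \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'

# Jobs (e.g. OSD prepare jobs): name, succeeded count, rook-version label.
kubectl -n rook-ceph get jobs -o jsonpath='{range .items[*]}{.metadata.name}{"  \tsucceeded: "}{.status.succeeded}{"      \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
```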
## Rook Volume Health
Any pod that is using a Rook volume should also remain healthy:
- The pod should be in the `Running` state with few, if any, restarts
- There should be no errors in its logs
- The pod should still be able to read and write to the attached Rook volume.
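A quick smoke test for the last point; the pod name `my-app-pod` and mount path `/data` below are hypothetical placeholders for your own application pod and its Rook volume mount:

```shell
# Write a marker file to the mounted Rook volume and read it back.
# Replace "my-app-pod" and "/data" with your pod and mount path.
kubectl exec my-app-pod -- sh -c 'echo ok > /data/.upgrade-check && cat /data/.upgrade-check'
```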