Ceph

PLEASE NOTE: This document applies to v1.6 version and not to the latest stable release v1.9

CSI Common Issues

Issues when provisioning volumes with the Ceph CSI driver can happen for many reasons such as:

  • Network connectivity between CSI pods and ceph
  • Cluster health issues
  • Slow operations
  • Kubernetes issues
  • Ceph-CSI configuration or bugs

The following troubleshooting steps can help identify a number of issues.

Block (RBD)

If you are mounting block volumes (usually RWO), these are referred to as RBD volumes in Ceph. See the sections below for RBD if you are having block volume issues.

Shared Filesystem (CephFS)

If you are mounting shared filesystem volumes (usually RWX), these are referred to as CephFS volumes in Ceph. See the sections below for CephFS if you are having filesystem volume issues.

Network Connectivity

The Ceph monitors are the most critical component of the cluster to check first. Retrieve the mon endpoints from the services:

kubectl -n rook-ceph get svc -l app=rook-ceph-mon
NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
rook-ceph-mon-a   ClusterIP   10.104.165.31   <none>        6789/TCP,3300/TCP   18h
rook-ceph-mon-b   ClusterIP   10.97.244.93    <none>        6789/TCP,3300/TCP   21s
rook-ceph-mon-c   ClusterIP   10.99.248.163   <none>        6789/TCP,3300/TCP   8s

If host networking is enabled in the CephCluster CR, you will instead need to find the node IPs for the hosts where the mons are running.

The clusterIP is the mon IP and 3300 is the port that will be used by Ceph-CSI to connect to the ceph cluster. These endpoints must be accessible by all clients in the cluster, including the CSI driver.

If you are seeing issues provisioning the PVC then you need to check the network connectivity from the provisioner pods.

  • For CephFS PVCs, check network connectivity from the csi-cephfsplugin container of the csi-cephfsplugin-provisioner pods
  • For Block PVCs, check network connectivity from the csi-rbdplugin container of the csi-rbdplugin-provisioner pods

For redundancy, there are two provisioner pods for each type. Make sure to test connectivity from all provisioner pods.

Connect to the provisioner pods and verify the connection to the mon endpoints such as the following:

# Connect to the csi-cephfsplugin container in the provisioner pod
kubectl -n rook-ceph exec -ti deploy/csi-cephfsplugin-provisioner -c csi-cephfsplugin -- bash

# Test the network connection to the mon endpoint
curl 10.104.165.31:3300 2>/dev/null
ceph v2

If you see the response “ceph v2”, the connection succeeded. If there is no response then there is a network issue connecting to the ceph cluster.

Check network connectivity for all monitor IP’s and ports which are passed to ceph-csi.

Ceph Health

Sometimes an unhealthy Ceph cluster can contribute to the issues in creating or mounting the PVC. Check that your Ceph cluster is healthy by connecting to the Toolbox and running the ceph commands:

ceph health detail
HEALTH_OK

Slow Operations

Even slow ops in the ceph cluster can contribute to the issues. In the toolbox, make sure that no slow ops are present and the ceph cluster is healthy

ceph -s
cluster:
 id:     ba41ac93-3b55-4f32-9e06-d3d8c6ff7334
 health: HEALTH_WARN
         30 slow ops, oldest one blocked for 10624 sec, mon.a has slow ops

If Ceph is not healthy, check the following health for more clues:

  • The Ceph monitor logs for errors
  • The OSD logs for errors
  • Disk Health
  • Network Health

Ceph Troubleshooting

Check if the RBD Pool exists

Make sure the pool you have specified in the storageclass.yaml exists in the ceph cluster.

Suppose the pool name mentioned in the storageclass.yaml is replicapool. It can be verified to exist in the toolbox:

ceph osd lspools
1 device_health_metrics
2 replicapool

If the pool is not in the list, create the CephBlockPool CR for the pool if you have not already. If you have already created the pool, check the Rook operator log for errors creating the pool.

Check if the Filesystem exists

For the shared filesystem (CephFS), check that the filesystem and pools you have specified in the storageclass.yaml exist in the Ceph cluster.

Suppose the fsName name mentioned in the storageclass.yaml is myfs. It can be verified in the toolbox:

ceph fs ls
name: myfs, metadata pool: myfs-metadata, data pools: [myfs-data0 ]

Now verify the pool mentioned in the storageclass.yaml exists, such as the example myfs-data0.

ceph osd lspools
1 device_health_metrics
2 replicapool
3 myfs-metadata0
4 myfs-data0

The pool for the filesystem will have the suffix -data0 compared the filesystem name that is created by the CephFilesystem CR.

subvolumegroups

If the subvolumegroup is not specified in the ceph-csi configmap (where you have passed the ceph monitor information), Ceph-CSI creates the default subvolumegroup with the name csi. Verify that the subvolumegroup exists:

ceph fs subvolumegroup ls myfs
[
   {
       "name": "csi"
   }
]

If you don’t see any issues with your Ceph cluster, the following sections will start debugging the issue from the CSI side.

Provisioning Volumes

At times the issue can also exist in the Ceph-CSI or the sidecar containers used in Ceph-CSI.

Ceph-CSI has included number of sidecar containers in the provisioner pods such as: csi-attacher, csi-resizer, csi-provisioner, csi-cephfsplugin, csi-snapshotter, and liveness-prometheus.

The CephFS provisioner core CSI driver container name is csi-cephfsplugin as one of the container names. For the RBD (Block) provisioner you will see csi-rbdplugin as the container name.

Here is a summary of the sidecar containers:

csi-provisioner

The external-provisioner is a sidecar container that dynamically provisions volumes by calling ControllerCreateVolume() and ControllerDeleteVolume() functions of CSI drivers. More details about external-provisioner can be found here.

If there is an issue with PVC Create or Delete, check the logs of the csi-provisioner sidecar container.

kubectl -n rook-ceph logs deploy/csi-rbdplugin-provisioner -c csi-provisioner

csi-resizer

The CSI external-resizer is a sidecar container that watches the Kubernetes API server for PersistentVolumeClaim updates and triggers ControllerExpandVolume operations against a CSI endpoint if the user requested more storage on the PersistentVolumeClaim object. More details about external-provisioner can be found here.

If any issue exists in PVC expansion you can check the logs of the csi-resizer sidecar container.

kubectl -n rook-ceph logs deploy/csi-rbdplugin-provisioner -c csi-resizer

csi-snapshotter

The CSI external-snapshotter sidecar only watches for VolumeSnapshotContent create/update/delete events. It will talk to ceph-csi containers to create or delete snapshots. More details about external-snapshotter can be found here.

In Kubernetes 1.17 the volume snapshot feature was promoted to beta. In Kubernetes 1.20, the feature gate is enabled by default on standard Kubernetes deployments and cannot be turned off.

Make sure you have installed the correct snapshotter CRD version. If you have not installed the snapshotter controller, see the Snapshots guide.

kubectl get crd | grep snapshot
volumesnapshotclasses.snapshot.storage.k8s.io    2021-01-25T11:19:38Z
volumesnapshotcontents.snapshot.storage.k8s.io   2021-01-25T11:19:39Z
volumesnapshots.snapshot.storage.k8s.io          2021-01-25T11:19:40Z

The above CRDs must have the matching version in your snapshotclass.yaml or snapshot.yaml. Otherwise, the VolumeSnapshot and VolumesnapshotContent will not be created.

The snapshot controller is responsible for creating both VolumeSnapshot and VolumesnapshotContent object. If the objects are not getting created, you may need to check the logs of the snapshot-controller container.

Rook only installs the snapshotter sidecar container, not the controller. It is recommended that Kubernetes distributors bundle and deploy the controller and CRDs as part of their Kubernetes cluster management process (independent of any CSI Driver).

If your Kubernetes distribution does not bundle the snapshot controller, you may manually install these components.

If any issue exists in the snapshot Create/Delete operation you can check the logs of the csi-snapshotter sidecar container.

kubectl -n rook-ceph logs deploy/csi-rbdplugin-provisioner -c csi-snapshotter

If you see an error such as:

GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID
0001-0009-rook-ceph-0000000000000001-8d0ba728-0e17-11eb-a680-ce6eecc894de already >exists.

The issue typically is in the Ceph cluster or network connectivity. If the issue is in Provisioning the PVC Restarting the Provisioner pods help(for CephFS issue restart csi-cephfsplugin-provisioner-xxxxxx CephFS Provisioner. For RBD, restart the csi-rbdplugin-provisioner-xxxxxx pod. If the issue is in mounting the PVC, restart the csi-rbdplugin-xxxxx pod (for RBD) and the csi-cephfsplugin-xxxxx pod for CephFS issue.

Mounting the volume to application pods

When a user requests to create the application pod with PVC, there is a three-step process

  • CSI driver registration
  • Create volume attachment object
  • Stage and publish the volume

csi-driver registration

csi-cephfsplugin-xxxx or csi-rbdplugin-xxxx is a daemonset pod running on all the nodes where your application gets scheduled. If the plugin pods are not running on the node where your application is scheduled might cause the issue, make sure plugin pods are always running.

Each plugin pod has two important containers: one is driver-registrar and csi-rbdplugin or csi-cephfsplugin. Sometimes there is also a liveness-prometheus container.

driver-registrar

The node-driver-registrar is a sidecar container that registers the CSI driver with Kubelet. More details can be found here.

If any issue exists in attaching the PVC to the application pod check logs from driver-registrar sidecar container in plugin pod where your application pod is scheduled.

kubectl -n rook-ceph logs deploy/csi-rbdplugin -c driver-registrar
I0120 12:28:34.231761  124018 main.go:112] Version: v2.0.1
I0120 12:28:34.233910  124018 connection.go:151] Connecting to unix:///csi/csi.sock
I0120 12:28:35.242469  124018 node_register.go:55] Starting Registration Server at: /registration/rook-ceph.rbd.csi.ceph.com-reg.sock
I0120 12:28:35.243364  124018 node_register.go:64] Registration Server started at: /registration/rook-ceph.rbd.csi.ceph.com-reg.sock
I0120 12:28:35.243673  124018 node_register.go:86] Skipping healthz server because port set to: 0
I0120 12:28:36.318482  124018 main.go:79] Received GetInfo call: &InfoRequest{}
I0120 12:28:37.455211  124018 main.go:89] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
E0121 05:19:28.658390  124018 connection.go:129] Lost connection to unix:///csi/csi.sock.
E0125 07:11:42.926133  124018 connection.go:129] Lost connection to unix:///csi/csi.sock.

You should see the response RegistrationStatus{PluginRegistered:true,Error:,} in the logs to confirm that plugin is registered with kubelet.

If you see a driver not found an error in the application pod describe output. Restarting the csi-xxxxplugin-xxx pod on the node may help.

Volume Attachment

Each provisioner pod also has a sidecar container called csi-attacher.

csi-attacher

The external-attacher is a sidecar container that attaches volumes to nodes by calling ControllerPublish and ControllerUnpublish functions of CSI drivers. It is necessary because the internal Attach/Detach controller running in Kubernetes controller-manager does not have any direct interfaces to CSI drivers. More details can be found here.

If any issue exists in attaching the PVC to the application pod first check the volumettachment object created and also log from csi-attacher sidecar container in provisioner pod.

kubectl get volumeattachment
NAME                                                                   ATTACHER                        PV                                         NODE       ATTACHED   AGE
csi-75903d8a902744853900d188f12137ea1cafb6c6f922ebc1c116fd58e950fc92   rook-ceph.cephfs.csi.ceph.com   pvc-5c547d2a-fdb8-4cb2-b7fe-e0f30b88d454   minikube   true       4m26s
kubectl logs po/csi-rbdplugin-provisioner-d857bfb5f-ddctl -c csi-attacher

CephFS Stale operations

Check for any stale mount commands on the csi-cephfsplugin-xxxx pod on the node where your application pod is scheduled.

You need to exec in the csi-cephfsplugin-xxxx pod and grep for stale mount operators.

Identify the csi-cephfsplugin-xxxx pod running on the node where your application is scheduled with kubectl get po -o wide and match the node names.

kubectl exec -it csi-cephfsplugin-tfk2g -c csi-cephfsplugin -- sh
ps -ef |grep mount

root          67      60  0 11:55 pts/0    00:00:00 grep mount
ps -ef |grep ceph

root           1       0  0 Jan20 ?        00:00:26 /usr/local/bin/cephcsi --nodeid=minikube --type=cephfs --endpoint=unix:///csi/csi.sock --v=0 --nodeserver=true --drivername=rook-ceph.cephfs.csi.ceph.com --pidlimit=-1 --metricsport=9091 --forcecephkernelclient=true --metricspath=/metrics --enablegrpcmetrics=true
root          69      60  0 11:55 pts/0    00:00:00 grep ceph

If any commands are stuck check the dmesg logs from the node. Restarting the csi-cephfsplugin pod may also help sometimes.

If you don’t see any stuck messages, confirm the network connectivity, Ceph health, and slow ops.

RBD Stale operations

Check for any stale map/mkfs/mount commands on the csi-rbdplugin-xxxx pod on the node where your application pod is scheduled.

You need to exec in the csi-rbdplugin-xxxx pod and grep for stale operators like (rbd map, rbd unmap, mkfs, mount and umount).

Identify the csi-rbdplugin-xxxx pod running on the node where your application is scheduled with kubectl get po -o wide and match the node names.

kubectl exec -it csi-rbdplugin-vh8d5 -c csi-rbdplugin -- sh
ps -ef |grep map
root     1297024 1296907  0 12:00 pts/0    00:00:00 grep map
ps -ef |grep mount
root        1824       1  0 Jan19 ?        00:00:00 /usr/sbin/rpc.mountd
ceph     1041020 1040955  1 07:11 ?        00:03:43 ceph-mgr --fsid=ba41ac93-3b55-4f32-9e06-d3d8c6ff7334 --keyring=/etc/ceph/keyring-store/keyring --log-to-stderr=true --err-to-stderr=true --mon-cluster-log-to-stderr=true --log-stderr-prefix=debug  --default-log-to-file=false --default-mon-cluster-log-to-file=false --mon-host=[v2:10.111.136.166:3300,v1:10.111.136.166:6789] --mon-initial-members=a --id=a --setuser=ceph --setgroup=ceph --client-mount-uid=0 --client-mount-gid=0 --foreground --public-addr=172.17.0.6
root     1297115 1296907  0 12:00 pts/0    00:00:00 grep mount
ps -ef |grep mkfs
root     1297291 1296907  0 12:00 pts/0    00:00:00 grep mkfs
ps -ef |grep umount
root     1298500 1296907  0 12:01 pts/0    00:00:00 grep umount
ps -ef |grep unmap
root     1298578 1296907  0 12:01 pts/0    00:00:00 grep unmap

If any commands are stuck check the dmesg logs from the node. Restarting the csi-rbdplugin pod also may help sometimes.

If you don’t see any stuck messages, confirm the network connectivity, Ceph health, and slow ops.

dmesg logs

Check the dmesg logs on the node where pvc mounting is failing or the csi-rbdplugin container of the csi-rbdplugin-xxxx pod on that node.

dmesg

RBD Commands

If nothing else helps, get the last executed command from the ceph-csi pod logs and run it manually inside the provisioner or plugin pod to see if there are errors returned even if they couldn’t be seen in the logs.

$ rbd ls --id=csi-rbd-node -m=10.111.136.166:6789 --key=AQDpIQhg+v83EhAAgLboWIbl+FL/nThJzoI3Fg==

Where -m is one of the mon endpoints and the --key is the key used by the CSI driver for accessing the Ceph cluster.