# Ceph NFS Server CRD

## Overview
Rook allows exporting NFS shares of a CephFilesystem or CephObjectStore through the CephNFS custom resource definition. This will spin up a cluster of NFS Ganesha servers that coordinate with one another via shared RADOS objects. The servers will be configured for NFSv4.1+ access only, as serving earlier protocols can inhibit responsiveness after a server restart.
WARNING: Due to a number of Ceph issues and changes, Rook officially only supports Ceph v16.2.7 or higher for CephNFS. If you are using an earlier version, upgrade your Ceph version following the advice given in Rook’s v1.8 NFS docs.
## Samples
```yaml
apiVersion: ceph.rook.io/v1
kind: CephNFS
metadata:
  name: my-nfs
  namespace: rook-ceph
spec:
  # Settings for the NFS server
  server:
    active: 1

    placement:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: role
              operator: In
              values:
              - nfs-node
      topologySpreadConstraints:
      tolerations:
      - key: nfs-node
        operator: Exists
      podAffinity:
      podAntiAffinity:

    annotations:
      my-annotation: something

    labels:
      my-label: something

    resources:
      limits:
        cpu: "500m"
        memory: "1024Mi"
      requests:
        cpu: "500m"
        memory: "1024Mi"

    priorityClassName:

    logLevel: NIV_INFO
```
## NFS Settings

### Server

The `server` spec sets configuration for Rook-created NFS-Ganesha servers.
- `active`: The number of active NFS servers. Rook supports creating more than one active NFS server, but cannot guarantee high availability. For values greater than 1, see the known issue below.
- `placement`: Kubernetes placement restrictions to apply to NFS server Pod(s). This is similar to placement defined for daemons configured by the CephCluster CRD.
- `annotations`: Kubernetes annotations to apply to NFS server Pod(s)
- `labels`: Kubernetes labels to apply to NFS server Pod(s)
- `resources`: Kubernetes resource requests and limits to set on NFS server Pod(s)
- `priorityClassName`: Set priority class name for the NFS server Pod(s)
- `logLevel`: The log level that NFS-Ganesha servers should output. Default value: `NIV_INFO`. Supported values: `NIV_NULL | NIV_FATAL | NIV_MAJ | NIV_CRIT | NIV_WARN | NIV_EVENT | NIV_INFO | NIV_DEBUG | NIV_MID_DEBUG | NIV_FULL_DEBUG | NB_LOG_LEVEL`
## Creating Exports
When a CephNFS is first created, all NFS daemons within the CephNFS cluster will share a configuration with no exports defined.
### Using the Ceph Dashboard
Exports can be created via the Ceph dashboard for Ceph v16 as well. To enable and use the Ceph dashboard in Rook, see here.
### Using the Ceph CLI
The Ceph CLI can be used from the Rook toolbox pod to create and manage NFS exports. To do so, first ensure that the necessary Ceph mgr modules are enabled and that the Ceph orchestrator backend is set to Rook.
#### Enable the Ceph orchestrator if necessary

- Required for Ceph v16.2.7 and below
- Optional for Ceph v16.2.8 and above
- Must be disabled for Ceph v17.2.0 due to a Ceph regression

```console
ceph mgr module enable rook
ceph mgr module enable nfs
ceph orch set backend rook
```
Ceph’s NFS CLI can create NFS exports that are backed by CephFS (a CephFilesystem) or Ceph Object Gateway (a CephObjectStore). `cluster_id` or `cluster-name` in the Ceph NFS docs normally refers to the name of the NFS cluster, which is the CephNFS name in the Rook context.

For creating an NFS export for the CephNFS and CephFilesystem example manifests, the below command can be used. This creates an export for the `/test` pseudo path.

```console
ceph nfs export create cephfs my-nfs /test myfs
```
The below command will list the current NFS exports for the example CephNFS cluster, which will give the output shown for the current example.

```console
ceph nfs export ls my-nfs
```

```json
[
  "/test"
]
```
The simple `/test` export’s info can be listed as well. Notice from the example that only NFS protocol v4 via TCP is supported.
```console
ceph nfs export info my-nfs /test
```

```json
{
  "export_id": 1,
  "path": "/",
  "cluster_id": "my-nfs",
  "pseudo": "/test",
  "access_type": "RW",
  "squash": "none",
  "security_label": true,
  "protocols": [
    4
  ],
  "transports": [
    "TCP"
  ],
  "fsal": {
    "name": "CEPH",
    "user_id": "nfs.my-nfs.1",
    "fs_name": "myfs"
  },
  "clients": []
}
```
If you are done managing NFS exports and don’t need the Ceph orchestrator module enabled for anything else, it may be preferable to disable the Rook and NFS mgr modules to free up a small amount of RAM in the Ceph mgr Pod.
```console
ceph orch set backend ""
ceph mgr module disable rook
```
## Mounting exports
Each CephNFS server has a unique Kubernetes Service. This is because NFS clients can’t readily handle NFS failover. CephNFS services are named with the pattern `rook-ceph-nfs-<cephnfs-name>-<id>`, where `<id>` is a unique letter ID (e.g., a, b, c, etc.) for a given NFS server. For example, `rook-ceph-nfs-my-nfs-a`.
For each NFS client, choose an NFS service to use for the connection. With NFS v4, you can mount an export by its path using a mount command like below. You can mount all exports at once by omitting the export path and leaving the directory as just `/`.

```console
mount -t nfs4 -o proto=tcp <nfs-service-address>:/<export-path> <mount-location>
```
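For example, a hypothetical mount of the `/test` export from inside the cluster might look like the following sketch; the Service lookup and the address shown are illustrative, not output from a real cluster.

```console
# Look up the ClusterIP of the example CephNFS Service (illustrative).
kubectl --namespace rook-ceph get service rook-ceph-nfs-my-nfs-a -o jsonpath='{.spec.clusterIP}'

# Assuming the command above printed 10.96.144.23 (hypothetical address),
# mount the /test export at /mnt/nfs-test.
mkdir -p /mnt/nfs-test
mount -t nfs4 -o proto=tcp 10.96.144.23:/test /mnt/nfs-test
```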
### Exposing the NFS server outside of the Kubernetes cluster
Use a LoadBalancer Service to expose an NFS server (and its exports) outside of the Kubernetes cluster. The Service’s endpoint can be used as the NFS service address when mounting the export manually. We provide an example Service here: `deploy/examples/nfs-load-balancer.yaml`.
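The sketch below shows roughly what such a Service can look like for the example CephNFS. The selector labels are assumptions about how Rook labels its NFS server Pods; treat the referenced example manifest as the canonical version.

```yaml
# Minimal sketch of a LoadBalancer Service for the CephNFS server "my-nfs-a".
# The selector labels below are assumptions; verify them against the Pods Rook creates.
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-nfs-my-nfs-load-balancer
  namespace: rook-ceph
spec:
  type: LoadBalancer
  ports:
    - name: nfs
      port: 2049
  selector:
    app: rook-ceph-nfs
    ceph_nfs: my-nfs
    instance: a
```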
## Scaling the active server count
It is possible to scale the size of the cluster up or down by modifying the `spec.server.active` field. Scaling the cluster size up can be done at will. Once the new server comes up, clients can be assigned to it immediately.
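For example, a sketch of raising the active count of the example CephNFS to 2 with `kubectl patch` (see the warning below before running more than one active server):

```console
# Illustrative only: patch the CephNFS resource named my-nfs to run two active servers.
kubectl --namespace rook-ceph patch cephnfs my-nfs --type merge \
  --patch '{"spec": {"server": {"active": 2}}}'
```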
Scaling down the cluster requires that clients be migrated from the servers that will be eliminated to the others. That process is currently a manual one and should be performed before reducing the size of the cluster. The CRD always eliminates the highest-index servers first, in reverse order from how they were started.
WARNING: see the known issue below about setting this value greater than one.
## Known issues

### `server.active` count greater than 1
- Active-active scale out does not work well with the NFS protocol. If one NFS server in a cluster is offline, other servers may block client requests until the offline server returns, which may not always happen due to the Kubernetes scheduler.
  - Workaround: It is safest to run only a single NFS server, but we do not limit this if it benefits your use case.
### Ceph v17.2.0
- Ceph NFS management with the Rook mgr module enabled has a breaking regression with the Ceph Quincy v17.2.0 release.
- Workaround: Leave Ceph’s Rook orchestrator mgr module disabled. If you have enabled it, you must disable it using the snippet below from the toolbox.

```console
ceph orch set backend ""
ceph mgr module disable rook
```
## Advanced configuration

All CephNFS daemons are configured using shared RADOS objects stored in a Ceph pool named `.nfs`. Users can modify the configuration object for each CephNFS cluster if they wish to customize the configuration.
### Changing configuration of the `.nfs` pool

By default, Rook creates the `.nfs` pool with Ceph’s default configuration. If you wish to change the configuration of this pool (for example to change its failure domain or replication factor), you can create a CephBlockPool with the `spec.name` field set to `.nfs`. This pool must be replicated and cannot be erasure coded. `deploy/examples/nfs.yaml` contains a sample for reference.
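A minimal sketch of such a pool is shown below; the Kubernetes object name and the replication settings are illustrative, and the referenced example manifest remains the canonical reference.

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  # ".nfs" is not a valid Kubernetes resource name, so the object gets a
  # different name and spec.name overrides the actual Ceph pool name.
  name: builtin-nfs
  namespace: rook-ceph
spec:
  name: .nfs
  failureDomain: host
  replicated:
    size: 3
    requireSafeReplicaSize: true
```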
### Adding custom NFS-Ganesha config file changes
The NFS-Ganesha config file format for these objects is documented in the NFS-Ganesha project.
Use Ceph’s `rados` tool from the toolbox to interact with the configuration object. The below command will get you started by dumping the contents of the config object to stdout. The output will look something like the example shown if you have already created two exports as documented above. It is best not to modify any of the export objects created by Ceph so as not to cause errors with Ceph’s export management.

```console
rados --pool <pool> --namespace <namespace> get conf-nfs.<cephnfs-name> -
```

```
%url "rados://<pool>/<namespace>/export-1"
%url "rados://<pool>/<namespace>/export-2"
```
`rados ls` and `rados put` are other commands you will want for working with the other shared configuration objects.
Of note, it is possible to pre-populate the NFS configuration and export objects prior to creating CephNFS server clusters.
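As a sketch, pre-populating or replacing the shared config object could look like the following; the pool, namespace, and local file name are placeholders. Note that putting the object replaces its current contents, including the `%url` lines for existing exports, so dump it first as shown above.

```console
# Illustrative only: upload a locally prepared NFS-Ganesha config as the shared
# config object. <pool> is normally .nfs and <namespace> is the CephNFS name.
rados --pool <pool> --namespace <namespace> put conf-nfs.<cephnfs-name> my-ganesha-additions.conf
```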
## Ceph CSI NFS provisioner and NFS CSI driver
EXPERIMENTAL: this feature is experimental, and we do not guarantee it is bug-free, nor will we support upgrades to future versions
In version 1.9.1, Rook is able to deploy the experimental NFS Ceph CSI driver. This requires Ceph CSI version 3.6.0 or above. We recommend Ceph v16.2.7 or above.
For this section, we will refer to Rook’s deployment examples in the deploy/examples directory.
The Ceph CSI NFS provisioner and driver require additional RBAC to operate. Apply the `deploy/examples/csi/nfs/rbac.yaml` manifest to deploy the additional resources.
Rook will only deploy the Ceph CSI NFS provisioner and driver components when the `ROOK_CSI_ENABLE_NFS` config is set to `"true"` in the `rook-ceph-operator-config` configmap. Change the value in your manifest, or patch the resource as below.

```console
kubectl --namespace rook-ceph patch configmap rook-ceph-operator-config --type merge --patch '{"data":{"ROOK_CSI_ENABLE_NFS": "true"}}'
```
NOTE: The rook-ceph operator Helm chart will deploy the required RBAC and enable the driver components if `csi.nfs.enabled` is set to `true`.
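For Helm-based installs, a minimal sketch of the corresponding values override, using the chart value path named in the note above, would be:

```yaml
# values.yaml override for the rook-ceph operator Helm chart (sketch)
csi:
  nfs:
    enabled: true
```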
In order to create NFS exports via the CSI driver, you must first create a CephFilesystem to serve as the underlying storage for the exports, and you must create a CephNFS to run an NFS server that will expose the exports.
From the examples, `filesystem.yaml` creates a CephFilesystem called `myfs`, and `nfs.yaml` creates an NFS server called `my-nfs`.
You may need to enable or disable the Ceph orchestrator. Follow the same steps documented above based on your Ceph version and desires.
You must also create a storage class. Ceph CSI is designed to support any arbitrary Ceph cluster, but we are focused here only on Ceph clusters deployed by Rook. Let’s take a look at a portion of the example storage class found at `deploy/examples/csi/nfs/storageclass.yaml` and break down how the values are determined.
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-nfs
provisioner: rook-ceph.nfs.csi.ceph.com # [1]
parameters:
  nfsCluster: my-nfs # [2]
  server: rook-ceph-nfs-my-nfs-a # [3]
  clusterID: rook-ceph # [4]
  fsName: myfs # [5]
  pool: myfs-replicated # [6]

  # [7] (entire csi.storage.k8s.io/* section immediately below)
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
# ... some fields omitted ...
```
1. `provisioner`: `rook-ceph.nfs.csi.ceph.com` because `rook-ceph` is the namespace where the CephCluster is installed
2. `nfsCluster`: `my-nfs` because this is the name of the CephNFS
3. `server`: `rook-ceph-nfs-my-nfs-a` because Rook creates this Kubernetes Service for the CephNFS named `my-nfs`
4. `clusterID`: `rook-ceph` because this is the namespace where the CephCluster is installed
5. `fsName`: `myfs` because this is the name of the CephFilesystem used to back the NFS exports
6. `pool`: `myfs-replicated` because `myfs` is the name of the CephFilesystem defined in `fsName` and because `replicated` is the name of a data pool defined in the CephFilesystem
7. `csi.storage.k8s.io/*`: note that these values are shared with the Ceph CSI CephFS provisioner
See `deploy/examples/csi/nfs/pvc.yaml` for an example of how to create a PVC that will create an NFS export. The export will be created and a PV created for the PVC immediately, even without a Pod to mount the PVC. The `share` parameter set on the resulting PV contains the share path, which can be used as the export path when mounting the export manually.
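A minimal sketch of such a PVC, assuming the `rook-nfs` storage class from above (the referenced example manifest is the canonical version):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  storageClassName: rook-nfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
```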
See `deploy/examples/csi/nfs/pod.yaml` for an example of how a PVC can be connected to an application pod.
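As a sketch, a pod references the PVC like any other filesystem-backed claim; the names below are illustrative, not taken from the example manifest.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nfs-demo-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: nfs-vol
          mountPath: /data
  volumes:
    - name: nfs-vol
      persistentVolumeClaim:
        claimName: nfs-pvc
```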