# NVMe over Fabrics (NVMe-oF)

!!! warning
    This feature is experimental.
NVMe over Fabrics (NVMe-oF) allows RBD volumes to be exposed and accessed via the NVMe/TCP protocol. This enables both Kubernetes pods within the cluster and external clients outside the cluster to connect to Ceph block storage using standard NVMe-oF initiators, providing high-performance block storage access over the network.
The NVMe-oF integration in Rook serves two primary purposes:

- **External Client Access**: Rook serves as a backend for external clients outside the cluster, enabling non-Kubernetes workloads to access Ceph block storage through standard NVMe-oF initiators. This allows organizations to leverage their Ceph storage infrastructure for both containerized and traditional workloads.
- **In-Cluster Consumption**: Pods inside the Kubernetes cluster can consume storage via the NVMe-oF protocol, providing an alternative to traditional RBD mounts with potential performance benefits for certain workloads.

Both use cases are supported, allowing you to choose the access method that fits your requirements and deployment scenario.
For more background and design details, see the NVMe-oF gateway design doc. For the Ceph-CSI NVMe-oF design proposal, see the ceph-csi NVMe-oF proposal.
## Prerequisites

This guide assumes a Rook cluster as explained in the Quickstart Guide.
## Create a Pool

Before creating the NVMe-oF gateway, create a CephBlockPool that will be used by the gateway:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: nvmeof
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
```

Create the pool:

```console
kubectl create -f deploy/examples/csi/nvmeof/nvmeof-pool.yaml
```
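To confirm the pool was created, the CephBlockPool status can be checked (the `Ready` phase may take a few moments to appear):

```console
# Check that the pool reports a Ready phase
kubectl -n rook-ceph get cephblockpool nvmeof
```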
## Create the NVMe-oF Gateway

The CephNVMeOFGateway CRD manages the NVMe-oF gateway infrastructure. For each gateway, the operator will automatically create a Deployment running the gateway daemon and a Service exposing it.

Create the gateway:
```yaml
apiVersion: ceph.rook.io/v1
kind: CephNVMeOFGateway
metadata:
  name: nvmeof
  namespace: rook-ceph
spec:
  # Container image for the NVMe-oF gateway daemon
  image: quay.io/ceph/nvmeof:1.5
  # Pool name that will be used by the NVMe-oF gateway
  pool: nvmeof
  # ANA (Asymmetric Namespace Access) group name
  group: group-a
  # Number of gateway instances to run
  instances: 1
  hostNetwork: false
```

Apply the gateway configuration:

```console
kubectl create -f deploy/examples/nvmeof-test.yaml
```

Verify the gateway is running:

```console
kubectl get pod -n rook-ceph -l app=rook-ceph-nvmeof
```
Example output:

```console
NAME                                         READY   STATUS    RESTARTS   AGE
rook-ceph-nvmeof-nvmeof-a-85844ff6b8-4r8gj   1/1     Running   0          91s
```
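The operator should also have created a Service in front of the gateway pods. As a quick sanity check, it can be spotted by name:

```console
# List services created for the gateway
kubectl get svc -n rook-ceph | grep nvmeof
```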
## Deploy the NVMe-oF CSI Driver

The NVMe-oF CSI driver is deployed via the ceph-csi operator. Apply the Driver CR for NVMe-oF, which will trigger the creation of the Ceph-CSI NVMe-oF deployment and daemonset:

```console
kubectl create -f deploy/examples/csi/nvmeof/driver.yaml
```

Verify the CSI operator created the controller and node plugins:

```console
kubectl get pods -n rook-ceph | grep nvmeof
```

Example output:

```console
rook-ceph.nvmeof.csi.ceph.com-ctrlplugin-d9d77fb7c-kkl28   5/5   Running   0   60s
rook-ceph.nvmeof.csi.ceph.com-nodeplugin-xvt5g             2/2   Running   0   60s
```
## Create a StorageClass

Create a StorageClass that uses the NVMe-oF CSI driver:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-nvmeof
parameters:
  clusterID: rook-ceph
  pool: nvmeof
  subsystemNQN: nqn.2016-06.io.spdk:cnode1.rook-ceph
  nvmeofGatewayAddress: "rook-ceph-nvmeof-nvmeof-a.rook-ceph.svc.cluster.local"
  nvmeofGatewayPort: "5500"
  listeners: |
    [
      {
        "hostname": "rook-ceph-nvmeof-nvmeof-a"
      }
    ]
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-modify-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-modify-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-expand-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-expand-secret-namespace: rook-ceph
  imageFormat: "2"
  imageFeatures: layering,deep-flatten,exclusive-lock,object-map,fast-diff
provisioner: rook-ceph.nvmeof.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
```

!!! note
    The provisioner name `rook-ceph.nvmeof.csi.ceph.com` is prefixed with the operator namespace.

Create the StorageClass:

```console
kubectl create -f deploy/examples/csi/nvmeof/storageclass.yaml
```
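To confirm the driver registered under the namespace-prefixed name, the CSIDriver objects can be listed (an optional sanity check):

```console
# The NVMe-oF driver should appear as rook-ceph.nvmeof.csi.ceph.com
kubectl get csidriver | grep nvmeof
```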
## Create a PVC

Create a PVC using the NVMe-oF storage class:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nvmeof-external-volume
  namespace: default
spec:
  storageClassName: ceph-nvmeof
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 128Mi
```

!!! note
    This PVC is created for CSI driver provisioning. The volume will be accessible via the NVMe-oF protocol by both Kubernetes pods within the cluster and external clients outside the cluster using standard NVMe-oF initiators.

Create the PVC:

```console
kubectl create -f deploy/examples/csi/nvmeof/pvc.yaml
```
Verify the PVC is bound:

```console
kubectl get pvc nvmeof-external-volume
```

Example output:

```console
NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nvmeof-external-volume   Bound    pvc-b4108580-5cfa-46d3-beff-320088a5bf3c   128Mi      RWO            ceph-nvmeof    20m
```
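Behind the scenes, provisioning creates an RBD image in the `nvmeof` pool. If the Rook toolbox from the Quickstart is deployed (the standard `rook-ceph-tools` deployment is assumed here), the image can be listed:

```console
# List RBD images in the pool backing the NVMe-oF namespaces
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- rbd ls nvmeof
```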
## Consume the Volume in a Pod

Create a pod that consumes the NVMe-oF volume:

```console
kubectl create -f deploy/examples/csi/nvmeof/pod.yaml
```

Verify the pod is running:

```console
kubectl get pods -n default nvmeof-test-pod
```

Example output:

```console
NAME              READY   STATUS    RESTARTS   AGE
nvmeof-test-pod   1/1     Running   0          60s
```
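To confirm the NVMe namespace was attached to the pod, a quick check from inside the container can help (this assumes the container image includes `lsblk`; adjust for your pod spec):

```console
# Show block devices visible to the pod; an nvme device should back the volume
kubectl exec -n default nvmeof-test-pod -- lsblk
```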
## External Client Access

Once the PVC is created and bound, the volume is available via NVMe-oF and can be accessed by both Kubernetes pods within the cluster and external clients outside the cluster.

External clients outside the Kubernetes cluster can connect to the gateway using standard NVMe-oF procedures. The client requires the `nvme-tcp` kernel module to be loaded and `nvme-cli` to be installed.

From the external client, discover the available NVMe-oF subsystems:

```console
nvme discover -t tcp -a <gateway-service-ip> -s 5500
```

Replace `<gateway-service-ip>` with the gateway service ClusterIP or another accessible endpoint.
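For example, the ClusterIP can be read from the gateway Service created earlier (the service name is taken from the examples above; adjust for your gateway):

```console
# Print the ClusterIP of the gateway service
kubectl get svc -n rook-ceph rook-ceph-nvmeof-nvmeof-a -o jsonpath='{.spec.clusterIP}'
```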
Connect to the discovered subsystem:

```console
nvme connect -t tcp -n <subsystem-nqn> -a <gateway-ip> -s 5500
```

Replace:

- `<subsystem-nqn>` with the `subsystemNQN` value from your StorageClass (e.g., `nqn.2016-06.io.spdk:cnode1.rook-ceph`)
- `<gateway-ip>` with the gateway service IP or pod IP

Once connected, the NVMe namespace will appear as a block device on the client:

```console
lsblk | grep nvme
```

The device will typically appear as `/dev/nvmeXnY`, where `X` is the controller number and `Y` is the namespace ID.
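If multiple NVMe devices are present, `nvme list` shows model and namespace details that help identify the Ceph-backed device:

```console
# List connected NVMe devices with namespace details
sudo nvme list
```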
If you want to format and mount the device:

```console
# Format the device
sudo mkfs.ext4 /dev/nvmeXnY

# Mount the device
sudo mkdir /mnt/nvmeof
sudo mount /dev/nvmeXnY /mnt/nvmeof
```
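When the client is finished with the volume, unmount it and disconnect from the subsystem (standard `nvme-cli` usage, with the same NQN passed to the connect command):

```console
# Unmount and detach the namespace from this client
sudo umount /mnt/nvmeof
sudo nvme disconnect -n <subsystem-nqn>
```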
## High Availability

For production deployments, configure multiple gateway instances for high availability:

- Set `instances: 2` or higher in the CephNVMeOFGateway spec
- Add the hostname of every gateway instance to the StorageClass `listeners` array

Example with multiple instances:

```yaml
spec:
  instances: 2
  # ... other settings
```

Then update the StorageClass listeners to include all gateway hostnames:

```yaml
listeners: |
  [
    {
      "hostname": "rook-ceph-nvmeof-nvmeof-a"
    },
    {
      "hostname": "rook-ceph-nvmeof-nvmeof-b"
    }
  ]
```
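On an external client connected through multiple gateways, path status can be inspected with `nvme list-subsys` (assuming native NVMe multipath is enabled on the client); with ANA, one path per connected gateway is expected:

```console
# Show the subsystem and its paths across the gateways
sudo nvme list-subsys
```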
## Troubleshooting

Check the gateway logs:

```console
kubectl logs -n rook-ceph -l app=rook-ceph-nvmeof --tail=100
```

Check the CSI controller plugin logs:

```console
kubectl logs -n rook-ceph deploy/rook-ceph.nvmeof.csi.ceph.com-ctrlplugin --tail=100
```

Inspect the gateway service:

```console
kubectl describe service -n rook-ceph rook-ceph-nvmeof-nvmeof-a
```

Inspect the PVC events:

```console
kubectl describe pvc nvmeof-external-volume
```

Ensure the rook-ceph-csi-config ConfigMap exists and contains the cluster configuration:

```console
kubectl get configmap -n rook-ceph rook-ceph-csi-config -o yaml
```
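The gateway CR status may also surface reconcile errors (assuming the CRD registers the `cephnvmeofgateway` resource name):

```console
# Review the gateway CR status and recent events
kubectl describe cephnvmeofgateway -n rook-ceph nvmeof
```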
## Cleanup

!!! warning
    Deleting the PVC will also delete the underlying RBD image and NVMe namespace. Ensure you have backups if needed.

To clean up all of the artifacts created:

```console
# Delete the test pod
kubectl delete -f deploy/examples/csi/nvmeof/pod.yaml

# Delete the PVC
kubectl delete pvc nvmeof-external-volume

# Delete the StorageClass
kubectl delete storageclass ceph-nvmeof

# Delete the NVMe-oF CSI operator resources
kubectl delete -f deploy/examples/csi/nvmeof/csi-operator-nvmeof.yaml

# Delete the NVMe-oF gateway
kubectl delete -f deploy/examples/nvmeof-test.yaml

# Delete the block pool (optional)
kubectl delete -f deploy/examples/csi/nvmeof/nvmeof-pool.yaml
```