Best practice for fault-tolerant Redis with KubeRay #2684

Open · wants to merge 2 commits into master
127 changes: 127 additions & 0 deletions docs/best-practice/gcs-fault-tolerance-persistent-redis.md
@@ -0,0 +1,127 @@
# Fault-tolerant GCS with persistent Redis

Using Redis to back up the Global Control Service (GCS) with KubeRay provides
fault tolerance in the event that the Ray head is lost, allowing a new Ray
head to rebuild its state by reading from Redis.

However, if Redis itself loses data, the Ray head state is lost as well.

You may want further protection in the event that your Redis cluster experiences
partial or total failure. This guide documents how to configure and tune Redis
for a highly available Ray Cluster with KubeRay.

Tuning your Ray cluster to be highly available can safeguard long-running jobs
against unexpected failures and allow you to run on commodity hardware or
preemptible machines.

## Solution overview

KubeRay supports using Redis to persist the GCS, which moves the point of
failure (for data loss) outside the Ray cluster. We still have to configure
Redis itself to be resilient to failures.
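
For orientation, these are the RayCluster fields that enable GCS fault
tolerance, excerpted and trimmed from the full sample manifest referenced at
the end of this guide:

```
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  annotations:
    ray.io/ft-enabled: "true"   # enable Ray GCS fault tolerance
spec:
  headGroupSpec:
    rayStartParams:
      # Must match the "requirepass" value in redis.conf.
      redis-password: $REDIS_PASSWORD
    template:
      spec:
        containers:
          - name: ray-head
            env:
              # Ray reads RAY_REDIS_ADDRESS to find the external Redis instance;
              # "redis" is the ClusterIP Service created alongside the cluster.
              - name: RAY_REDIS_ADDRESS
                value: redis:6379
```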

Our solution will provision a [Persistent
Volume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) backed
by hardware storage, which Redis will use to write regular snapshots. If Redis
(or its host node) is lost, the Redis deployment can be restored from the
snapshot.

While Redis supports clustering, KubeRay only supports standalone (single
replica) Redis, so clustering is omitted.

## Persistent storage

Specialty storage volumes (such as Google Cloud Storage FUSE or S3) do not support
append operations, which Redis uses to efficiently write its Append Only File
(AOF) log. When using these options, it is recommended to disable AOF.
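
If you do run on such storage, a minimal redis.conf change (snapshot-only
persistence) would look like this:

```
# Rely on periodic RDB snapshots only; no append-only file.
appendonly no
save 60 1000
```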

With GCP GKE and Azure AKS, the default storage classes are [persistent
disks](https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes)
and [SSD Azure
disks](https://learn.microsoft.com/en-us/azure/aks/azure-csi-disk-storage-provision)
respectively, and the only configuration needed to provision a disk is as
follows:

```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: standard-rwo
```

On AWS EKS, you must additionally [create a storage
class](https://docs.aws.amazon.com/eks/latest/userguide/create-storage-class.html)
yourself.
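
As a sketch, assuming the EBS CSI driver is installed in your EKS cluster, a
gp3-backed storage class might look like the following (the class name
`ebs-sc` is illustrative); the `redis-data` claim would then set
`storageClassName: ebs-sc`:

```
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com        # AWS EBS CSI driver
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3                         # general-purpose SSD
```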

## Tuning backups

Redis supports database dumps at set intervals, which is good for fast recovery
and high performance during normal operation.

Redis also supports journaling at frequent intervals (or continuously), which
can provide stronger durability at the cost of more disk writes (i.e., slower
performance).

A good starting point for backups is to enable both, which can be done like so:

```
# Dump a backup every 60s, if there are 1000 writes since the prev. backup.
save 60 1000
dbfilename dump.rdb

# Enable the append-only log file.
appendonly yes
appendfilename "appendonly.aof"

```

In this recommended configuration, a full snapshot is written every 60 seconds
(provided at least 1000 writes have occurred) and the append-only log is synced
to disk every second, a reasonable balance between disk space, latency, and
data safety.

There are additional options for configuring the AOF; the defaults are shown here:

```
# Sync the log to disk every second.
# Alternatives are "no" and "always" (every write).
appendfsync everysec
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
```

You can view the full configuration reference for Redis 7.4.0
[here](https://raw.githubusercontent.com/redis/redis/refs/tags/7.4.0/redis.conf).


If your job is generally idempotent and can resume from several minutes of state
loss, you may prefer to disable the append-only log.

If you prefer your job to lose as little state as possible, then you may prefer
to set `appendfsync` to `always`, so that all writes are persisted immediately.
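
Either choice is a one- or two-line change to the Redis configuration shown
above:

```
# Minimal state loss: sync the AOF on every write (slowest).
appendfsync always

# Or, if minutes of loss are acceptable: keep RDB snapshots and drop the AOF.
# appendonly no
```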

## Putting it together

Edit [the full YAML](../../config/samples/ray-cluster.persistent-redis.yaml) to
your satisfaction and apply it:

```
kubectl apply -f config/samples/ray-cluster.persistent-redis.yaml
```

Verify that a disk has been provisioned and Redis is running:

```
kubectl get persistentvolumes
kubectl get pods
# Should see redis-0 running.
```
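
Optionally, once a job has written state to the GCS, you can confirm that keys
are present in Redis; a quick check (using the sample's default password) is:

```
kubectl exec redis-0 -- redis-cli -a 5241590000000000 dbsize
# A non-zero key count indicates the GCS is being persisted to Redis.
```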

After running a job with some state in the GCS, you will be able to delete the
Ray head pod as well as the Redis pod without data loss.
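
One way to exercise this, assuming the `ray.io/node-type=head` label that
KubeRay applies to head pods (confirm pod names with `kubectl get pods` first):

```
# Simulate failures: delete the Ray head pod and the Redis pod.
kubectl delete pod -l ray.io/node-type=head
kubectl delete pod redis-0

# Watch both pods come back; the new head reconnects to Redis and
# restores the GCS state from the persisted snapshot/AOF.
kubectl get pods -w
```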
193 changes: 193 additions & 0 deletions ray-operator/config/samples/ray-cluster.persistent-redis.yaml
@@ -0,0 +1,193 @@
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  annotations:
    ray.io/ft-enabled: "true" # enable Ray GCS FT
    # In most cases, you don't need to set `ray.io/external-storage-namespace` because KubeRay will
    # automatically set it to the UID of RayCluster. Only modify this annotation if you fully understand
    # the behaviors of the Ray GCS FT and RayService to avoid misconfiguration.
    # [Example]:
    # ray.io/external-storage-namespace: "my-raycluster-storage"
  name: raycluster-external-redis
spec:
  rayVersion: '2.9.0'
  headGroupSpec:
    # The `rayStartParams` are used to configure the `ray start` command.
    # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
    # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
    rayStartParams:
      # Setting "num-cpus: 0" to avoid any Ray actors or tasks being scheduled on the Ray head Pod.
      num-cpus: "0"
      # redis-password should match "requirepass" in redis.conf in the ConfigMap below.
      # Ray 2.3.0 changes the default redis password from "5241590000000000" to "".
      redis-password: $REDIS_PASSWORD
    # Pod template
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
            resources:
              limits:
                cpu: "1"
              requests:
                cpu: "1"
            env:
              # Ray will read the RAY_REDIS_ADDRESS environment variable to establish
              # a connection with the Redis server. In this instance, we use the "redis"
              # Kubernetes ClusterIP service name, also created by this YAML, as the
              # connection point to the Redis server.
              - name: RAY_REDIS_ADDRESS
                value: redis:6379
              # This environment variable is used in the `rayStartParams` above.
              - name: REDIS_PASSWORD
                valueFrom:
                  secretKeyRef:
                    name: redis-password-secret
                    key: password
            ports:
              - containerPort: 6379
                name: redis
              - containerPort: 8265
                name: dashboard
              - containerPort: 10001
                name: client
            volumeMounts:
              - mountPath: /tmp/ray
                name: ray-logs
        volumes:
          - name: ray-logs
            emptyDir: {}
  workerGroupSpecs:
    # the pod replicas in this group typed worker
    - replicas: 1
      minReplicas: 1
      maxReplicas: 10
      groupName: small-group
      # The `rayStartParams` are used to configure the `ray start` command.
      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
      rayStartParams: {}
      # Pod template
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0
              volumeMounts:
                - mountPath: /tmp/ray
                  name: ray-logs
              resources:
                limits:
                  cpu: "1"
                requests:
                  cpu: "1"
          volumes:
            - name: ray-logs
              emptyDir: {}
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: redis-config
  labels:
    app: redis
data:
  redis.conf: |-
    dir /data
    port 6379
    bind 0.0.0.0
    protected-mode no
    requirepass 5241590000000000
    pidfile /data/redis-6379.pid
    # Dump a backup every 60s, if there are 1000 writes since the prev. backup.
    save 60 1000
    dbfilename dump.rdb
    # Enable the append-only log file.
    appendonly yes
    # Sync the log to disk every second.
    # Alternatives are "no" and "always" (every write).
    appendfsync everysec
    # These are the default values, change if desired.
    appendfilename "appendonly.aof"
    auto-aof-rewrite-percentage 100
    auto-aof-rewrite-min-size 64mb
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  labels:
    app: redis
spec:
  type: ClusterIP
  ports:
    - name: redis
      port: 6379
  selector:
    app: redis
---
# This volume claim will use the default storage class for your cluster/provider.
# On GCP, this is a persistent disk, which is more performant than GCS FUSE
# (which does not support append operations).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  # choose a storageClassName provided by your Kubernetes:
  #storageClassName: standard-rwo
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  labels:
    app: redis
spec:
  # Governing Service for this StatefulSet (the ClusterIP Service above);
  # required by most Kubernetes versions.
  serviceName: redis
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7.4.0
          command:
            - "sh"
            - "-c"
            - "redis-server /usr/local/etc/redis/redis.conf"
          ports:
            - containerPort: 6379
          volumeMounts:
            - name: config
              mountPath: /usr/local/etc/redis/redis.conf
              subPath: redis.conf
            - name: redis-data
              mountPath: /data
      volumes:
        - name: config
          configMap:
            name: redis-config
        - name: redis-data
          persistentVolumeClaim:
            claimName: redis-data
---
# Redis password
apiVersion: v1
kind: Secret
metadata:
  name: redis-password-secret
type: Opaque
data:
  # echo -n "5241590000000000" | base64
  password: NTI0MTU5MDAwMDAwMDAwMA==