Best practice for fault-tolerant Redis with KubeRay #2684

Open · wants to merge 2 commits into master
127 changes: 127 additions & 0 deletions docs/best-practice/gcs-fault-tolerance-persistent-redis.md
@@ -0,0 +1,127 @@
# Fault-tolerant GCS with persistent Redis

Using Redis to back up the Global Control Service (GCS) with KubeRay provides
fault tolerance in the event that the Ray head is lost, allowing a new Ray
head to rebuild its state by reading from Redis.

However, if Redis itself loses data, the Ray head state is lost as well.

You may want further protection in the event that your Redis cluster experiences
partial or total failure. This guide documents how to configure and tune Redis
for a highly available Ray Cluster with KubeRay.

Tuning your Ray cluster to be highly available can safeguard long-running jobs
against unexpected failures and allow you to run on commodity hardware or
preemptible machines.

## Solution overview

KubeRay supports using Redis to persist the GCS, which moves the point of
failure (for data loss) outside the Ray cluster. We still have to configure
Redis itself to be resilient to failures.
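
For orientation, these are the RayCluster fields that enable GCS fault
tolerance, excerpted and trimmed from the full sample manifest referenced at
the end of this guide:

```
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  annotations:
    ray.io/ft-enabled: "true"   # enable Ray GCS fault tolerance
spec:
  headGroupSpec:
    rayStartParams:
      # Must match the "requirepass" value in redis.conf.
      redis-password: $REDIS_PASSWORD
    template:
      spec:
        containers:
          - name: ray-head
            env:
              # Ray reads RAY_REDIS_ADDRESS to find the external Redis instance;
              # "redis" is the ClusterIP Service created alongside the cluster.
              - name: RAY_REDIS_ADDRESS
                value: redis:6379
```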

Our solution will provision a [Persistent
Volume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) backed
by hardware storage, which Redis will use to write regular snapshots. If Redis
(or its host node) is lost, the Redis deployment can be restored from the
snapshot.

While Redis supports clustering, KubeRay only supports standalone (single
replica) Redis, so clustering is omitted.

## Persistent storage

Specialty storage volumes (such as Google Cloud Storage FUSE or S3) do not support
append operations, which Redis uses to efficiently write its Append Only File
(AOF) log. When using these options, it is recommended to disable AOF.
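
If you do run on such storage, a minimal redis.conf change (snapshot-only
persistence) would look like this:

```
# Rely on periodic RDB snapshots only; no append-only file.
appendonly no
save 60 1000
```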

With GCP GKE and Azure AKS, the default storage classes are [persistent
disks](https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes)
and [SSD Azure
disks](https://learn.microsoft.com/en-us/azure/aks/azure-csi-disk-storage-provision)
respectively, and the only configuration needed to provision a disk is as
follows:

```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: standard-rwo
```

On AWS EKS, you must additionally [create a storage
class](https://docs.aws.amazon.com/eks/latest/userguide/create-storage-class.html)
yourself.
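
As a sketch, assuming the EBS CSI driver is installed in your EKS cluster, a
gp3-backed storage class might look like the following (the class name
`ebs-sc` is illustrative); the `redis-data` claim would then set
`storageClassName: ebs-sc`:

```
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com        # AWS EBS CSI driver
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3                         # general-purpose SSD
```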

## Tuning backups

Redis supports database dumps at set intervals, which is good for fast recovery
and high performance during normal operation.

Redis also supports journaling at frequent intervals (or continuously), which
can provide stronger durability at the cost of more disk writes (i.e., slower
performance).

A good starting point for backups is to enable both, which can be done like so:

```
# Dump a backup every 60s, if there are 1000 writes since the prev. backup.
save 60 1000
dbfilename dump.rdb

# Enable the append-only log file.
appendonly yes
appendfilename "appendonly.aof"

```

In this recommended configuration, a full snapshot is written every 60 seconds
(provided at least 1000 writes have occurred) and the append-only log is synced
to disk every second, a reasonable balance between disk space, latency, and
data safety.

There are additional options for configuring the AOF; the defaults are shown here:

```
# Sync the log to disk every second.
# Alternatives are "no" and "always" (every write).
appendfsync everysec
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
```

You can view the full configuration reference for Redis 7.4.0
[here](https://raw.githubusercontent.com/redis/redis/refs/tags/7.4.0/redis.conf).


If your job is generally idempotent and can resume from several minutes of state
loss, you may prefer to disable the append-only log.

If you prefer your job to lose as little state as possible, then you may prefer
to set `appendfsync` to `always`, so that all writes are persisted immediately.
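
Either choice is a one- or two-line change to the Redis configuration shown
above:

```
# Minimal state loss: sync the AOF on every write (slowest).
appendfsync always

# Or, if minutes of loss are acceptable: keep RDB snapshots and drop the AOF.
# appendonly no
```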

## Putting it together

Edit [the full YAML](../../config/samples/ray-cluster.persistent-redis.yaml) to
your satisfaction and apply it:

```
kubectl apply -f config/samples/ray-cluster.persistent-redis.yaml
```

Verify that a disk has been provisioned and Redis is running:

```
kubectl get persistentvolumes
kubectl get pods
# Should see redis-0 running.
```
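
Optionally, once a job has written state to the GCS, you can confirm that keys
are present in Redis; a quick check (using the sample's default password) is:

```
kubectl exec redis-0 -- redis-cli -a 5241590000000000 dbsize
# A non-zero key count indicates the GCS is being persisted to Redis.
```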

After running a job with some state in the GCS, you will be able to delete the
Ray head pod as well as the Redis pod without data loss.
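
One way to exercise this, assuming the `ray.io/node-type=head` label that
KubeRay applies to head pods (confirm pod names with `kubectl get pods` first):

```
# Simulate failures: delete the Ray head pod and the Redis pod.
kubectl delete pod -l ray.io/node-type=head
kubectl delete pod redis-0

# Watch both pods come back; the new head reconnects to Redis and
# restores the GCS state from the persisted snapshot/AOF.
kubectl get pods -w
```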
193 changes: 193 additions & 0 deletions ray-operator/config/samples/ray-cluster.persistent-redis.yaml
@@ -0,0 +1,193 @@
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  annotations:
    ray.io/ft-enabled: "true" # enable Ray GCS FT
    # In most cases, you don't need to set `ray.io/external-storage-namespace` because KubeRay will
    # automatically set it to the UID of RayCluster. Only modify this annotation if you fully understand
    # the behaviors of the Ray GCS FT and RayService to avoid misconfiguration.
    # [Example]:
    # ray.io/external-storage-namespace: "my-raycluster-storage"
  name: raycluster-external-redis
spec:
  rayVersion: '2.9.0'
  headGroupSpec:
    # The `rayStartParams` are used to configure the `ray start` command.
    # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
    # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
    rayStartParams:
      # Setting "num-cpus: 0" to avoid any Ray actors or tasks being scheduled on the Ray head Pod.
      num-cpus: "0"
      # redis-password should match "requirepass" in redis.conf in the ConfigMap below.
      # Ray 2.3.0 changes the default redis password from "5241590000000000" to "".
      redis-password: $REDIS_PASSWORD
    # Pod template
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
            resources:
              limits:
                cpu: "1"
              requests:
                cpu: "1"
            env:
              # Ray will read the RAY_REDIS_ADDRESS environment variable to establish
              # a connection with the Redis server. In this instance, we use the "redis"
              # Kubernetes ClusterIP service name, also created by this YAML, as the
              # connection point to the Redis server.
              - name: RAY_REDIS_ADDRESS
                value: redis:6379
              # This environment variable is used in the `rayStartParams` above.
              - name: REDIS_PASSWORD
                valueFrom:
                  secretKeyRef:
                    name: redis-password-secret
                    key: password
            ports:
              - containerPort: 6379
                name: redis
              - containerPort: 8265
                name: dashboard
              - containerPort: 10001
                name: client
            volumeMounts:
              - mountPath: /tmp/ray
                name: ray-logs
        volumes:
          - name: ray-logs
            emptyDir: {}
  workerGroupSpecs:
    # the pod replicas in this group typed worker
    - replicas: 1
      minReplicas: 1
      maxReplicas: 10
      groupName: small-group
      # The `rayStartParams` are used to configure the `ray start` command.
      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
      rayStartParams: {}
      # Pod template
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0
              volumeMounts:
                - mountPath: /tmp/ray
                  name: ray-logs
              resources:
                limits:
                  cpu: "1"
                requests:
                  cpu: "1"
          volumes:
            - name: ray-logs
              emptyDir: {}
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: redis-config
  labels:
    app: redis
data:
  redis.conf: |-
    dir /data
    port 6379
    bind 0.0.0.0
    protected-mode no
    requirepass 5241590000000000
    pidfile /data/redis-6379.pid
    # Dump a backup every 60s, if there are 1000 writes since the prev. backup.
    save 60 1000
    dbfilename dump.rdb
    # Enable the append-only log file.
    appendonly yes
    # Sync the log to disk every second.
    # Alternatives are "no" and "always" (every write).
    appendfsync everysec
    # These are the default values, change if desired.
    appendfilename "appendonly.aof"
    auto-aof-rewrite-percentage 100
    auto-aof-rewrite-min-size 64mb
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  labels:
    app: redis
spec:
  type: ClusterIP
  ports:
    - name: redis
      port: 6379
  selector:
    app: redis
---
# This volume claim will use the default storage class for your cluster/provider.
# On GCP, this is a persistent disk, which is more performant than GCS FUSE
# (which does not support append operations).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  # choose a storageClassName provided by your Kubernetes:
  #storageClassName: standard-rwo
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  labels:
    app: redis
spec:
  # Governing Service for this StatefulSet (the ClusterIP Service above);
  # required by most Kubernetes versions.
  serviceName: redis
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7.4.0
          command:
            - "sh"
            - "-c"
            - "redis-server /usr/local/etc/redis/redis.conf"
          ports:
            - containerPort: 6379
          volumeMounts:
            - name: config
              mountPath: /usr/local/etc/redis/redis.conf
              subPath: redis.conf
            - name: redis-data
              mountPath: /data
      volumes:
        - name: config
          configMap:
            name: redis-config
        - name: redis-data
          persistentVolumeClaim:
            claimName: redis-data
---
# Redis password
apiVersion: v1
kind: Secret
metadata:
  name: redis-password-secret
type: Opaque
data:
  # echo -n "5241590000000000" | base64
  password: NTI0MTU5MDAwMDAwMDAwMA==