Stash v1beta1 design discussion #648
Comments
Stash has traditionally targeted file backups 'within' a workload, injecting sidecars to gain access to the mounted disk within the Pod. I believe this was motivated by the early days of k8s, when most disks were exclusively mounted on a Pod (ReadWriteOnce) and sometimes not auto-provisioned (via PVC/PV). However, a great many persistent file storage options are now auto-provisioned (PVCs) and ReadWriteMany, where the same PVC/PV volume or folder can be mounted multiple times, e.g. NFS, EFS, CephFS. If Stash is being redesigned, it would be nice to have an option for PVCs where:
|
@whereisaaron We have updated the design doc. It includes your requested feature. We made some significant changes to the previous design. Please review and give us your valuable feedback. |
Do you already have an ETA for this new version? |
@lorenzomorandini we are targeting March. |
Backing up target PVC’s looks good. I assume you can restore to a PVC target the same way? The StashTemplate multi-step stuff looks perhaps too complex? Is it really needed? Are the examples useful? E.g. you only need to init a repo once. |
@whereisaaron yes. You can restore to a PVC in the same way. Check the Recover Volume part. You are right, the multi-step stuff looks a bit complex. But users don't have to worry about it; Stash will take care of it. It gives us re-usability of backup components in our other tools. It also gives users the ability to configure their own backup process using Stash. It also makes it easy to add support for a new database, as that will not require changing anything in the operator. We just have to add an Action and a StashTemplate. |
I'm really impressed by this, and it looks like the wishlist of many here. Again, thanks for the amazing work you are doing, this is beautiful! First, I'd like to comment that it looks like the new backup mechanism from AWS, and I'm excited about both :) For default backup, I'd use the same strategy as for StorageClass, for instance an annotation. This way, you avoid the creation of another CRD - DefaultBackupConfiguration. I also think that, with the introduction of VolumeSnapshot in 1.12, you have to cope with that in your design. I'm a baremetal hobbyist and an AWS professional, I'd love to use Stash in both environments :) And this last comment leads me to the last project I'm working on. Hopefully, next Monday, we'll discuss it in sig-apps. I'd love for you to come and discuss that topic. |
I worked a bit more on it. I have some more comments. An To go back to the AWS blogpost, I think we need the following:
I think it is kind of a cool idea to imagine KubeDB deploys the BackupProcedure of PG for instance, and as a user, when I deploy my PG instance, I configure the I still don't grasp the need to backup workload data, as workload should have volumes. |
@pierreozoux thank you for this great feedback. I really loved the naming. I will discuss this with my team. I think we can make some adjustments following your feedback before we announce this design doc to the public.
That's the plan for integration with KubeDB.
Yes. We are doing this for |
Folks, fyi, @hossainemruz and I did a round of review today and left the comments in the pr #647. |
👍 |
RestoreSession
Discussion Recording: https://www.youtube.com/watch?v=BLS_Iipe6ew |
|
Improvement:
|
Restic issues
|
Stash Design Overview
We are going to overhaul the design of Stash to simplify the backup and recovery process and to support some of the most requested features. This doc discusses which features Stash is going to support and how these features may work.
We have introduced some new crds such as Function and Task, and made the whole process more modular. This will make it easy to add support for new features, and users will also be able to customize the backup process. Furthermore, this will make Stash resources interoperable between different tools and might even allow Stash resources to be used as functions in a serverless setting.
We are hoping this design will graduate to GA, so we are taking security seriously. We are going to make sure that nobody can bypass cluster security using Stash. This might require removing some existing features (for example, restore from a different namespace). However, we will provide an alternative way to cover those use cases.
Goal
The goal of this new design is to support the following features:
Schedule Backup and Restore Workload Data
Backup Workload Data
Users will be able to back up data from a running workload.
What does the user have to do?
Create a Repository crd.
Create a BackupConfiguration crd pointing to the targeted workload.
Sample Repository crd:
Sample BackupConfiguration crd:
How will it work?
Stash will watch for BackupConfiguration crds. When it finds a BackupConfiguration crd, it will inject a sidecar container into the workload and start a cron for scheduled backup.
The cron will create a BackupSession crd.
The sidecar container watches for BackupSession crds. If it finds one, it will take a backup instantly and update the BackupSession status accordingly.
Sample BackupSession crd:
Restore Workload Data
Users will be able to restore backed up data either into a separate volume or into the same workload from which the backup was taken. Here is an example of recovering into the same workload.
What does the user have to do?
Create a RestoreSession crd with the target field pointing to the workload.
Sample RestoreSession crd to restore into the same workload:
How will it work?
When Stash finds a RestoreSession crd created to restore into a workload, it will inject an init-container into the targeted workload.
The init-container will restore data inside the workload.
Schedule Backup and Restore PVC
Backup PVC
Users will also be able to back up a stand-alone pvc. This is useful for ReadOnlyMany or ReadWriteMany type pvcs.
What does the user have to do?
Create a Repository crd for the respective backend.
Create a BackupConfiguration crd with the target field pointing to the volume.
Sample BackupConfiguration crd to back up a PVC:
How will it work?
Stash will create a CronJob using the information of the respective Task crd specified by the task field.
The CronJob will take periodic backups of the target volume.
Restore PVC
Users will be able to restore backed up data into a volume.
What does the user have to do?
Create a RestoreSession crd with the target field pointing to the target volume where the recovered data will be stored.
Sample RestoreSession crd to restore into a volume:
How will it work?
When Stash finds a RestoreSession crd created to restore into a volume, it will launch a Job to restore into that volume.
Schedule Backup and Restore Database
Backup Database
Users will be able to back up a database using Stash.
What does the user have to do?
Create a Repository crd for the respective backend.
Create an AppBinding crd which holds connection information for the database. If the database is deployed with KubeDB, the AppBinding crd will be created automatically for each database.
Create a BackupConfiguration crd pointing to the AppBinding crd.
Sample AppBinding crd:
Sample BackupConfiguration crd for database backup:
How will it work?
When Stash finds a BackupConfiguration crd for database backup, it will launch a CronJob to take periodic backups of this database.
Restore Database
Users will be able to initialize a database from a backed up snapshot.
What does the user have to do?
Create a RestoreSession crd with the target field pointing to the respective AppBinding crd of the target database.
Sample RestoreSession crd to restore a database:
How will it work?
Schedule Backup Cluster YAMLs
Users will be able to back up the YAMLs of the cluster resources. However, Stash currently will not restore a cluster from the YAMLs automatically, so the user will have to re-create the resources manually.
In future, Stash might be able to back up and restore not only YAMLs but also an entire cluster.
What does the user have to do?
Create a Repository crd for the respective backend.
Create a BackupConfiguration crd with the task field pointing to a Task crd that backs up the cluster.
Sample BackupConfiguration crd to back up YAMLs of cluster resources:
How will it work?
Stash will create a CronJob using the information of the Task crd specified through the task field.
The CronJob will take periodic backups of the cluster.
Trigger Backup Instantly
Users will be able to trigger a scheduled backup instantly.
What does the user have to do?
Create a BackupSession crd pointing to the target BackupConfiguration crd.
Sample BackupSession crd for triggering instant backup:
How will it work?
If the target is backed up through a sidecar container, the sidecar container will take an instant backup as it watches for BackupSession crds.
If the target is backed up through a CronJob, Stash will launch another job to take an instant backup of the target.
Default Backup
Users will also be able to configure a default backup for the cluster, so that a user no longer needs to create Repository and BackupConfiguration crds for every workload they want to back up. Instead, the user only needs to add some annotations to the target workload.
What does the user have to do?
Create a BackupTemplate crd which will hold backend information and backup information.
Add the default backup annotations to the target workload, volume, or AppBinding crd.
Default Backup of Workload Data
Sample BackupTemplate crd to back up workload data:
Sample workload with annotations for default backup:
Default Backup of a PVC
Sample BackupTemplate crd for stand-alone pvc backup:
Sample PVC with annotation for default backup:
Default Backup of Database
Sample BackupTemplate crd for database backup:
Sample AppBinding crd with annotations for default backup:
How will it work?
Stash will watch for workloads, volumes, and AppBinding crds. When Stash finds a workload, volume, or AppBinding crd with these annotations, it will create a Repository crd and a BackupConfiguration crd using the information from the respective Task.
Auto Restore
Users will also be able to configure automatic recovery for a particular workload. Each time the workload restarts, it will first restore data from the backup, and then the workload's original containers will start.
What does the user have to do?
Sample workload with annotation to restore on restart:
How will it work?
When Stash finds a RestoreSession crd configured for auto recovery, it will inject an init-container into the target.
The init-container will perform recovery on each restart.
Stash cli/kubectl-plugin
We are going to provide a Stash plugin for kubectl. This will help to perform the following operations:
RestoreSession object.
Function
Functions are independent single-container workload specifications that each perform only a single task. For example, pgBackup takes a backup of a PostgreSQL database and clusterBackup takes a backup of the YAMLs of cluster resources. A Function crd has some variable fields with a $ prefix which have to be resolved while creating the respective workload. You can consider these variable fields as inputs for a Function.
Some example Function definitions are given below:
clusterBackup
pgBackup
pgRecovery
stashPostBackup
stashPostRecovery
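To make the idea of $-prefixed variable fields concrete, here is a minimal sketch in Python. It is not Stash code: the dictionary layout, the image name, and the variable names (databaseName, outputDir) are assumptions made up for illustration. It only shows how such placeholders could be filled in from inputs before a workload is created.

```python
from string import Template

# Hypothetical, simplified stand-in for a Function spec. The image name,
# argument names and variable names are illustrative assumptions, not the
# actual Stash schema.
PG_BACKUP_FUNCTION = {
    "name": "pgBackup",
    "image": "example.invalid/pg-backup:latest",
    "args": [
        "backup",
        "--database=${databaseName}",
        "--output-dir=${outputDir}",
    ],
}


def resolve_function(function, inputs):
    """Return a copy of `function` whose args have every $-prefixed
    variable replaced by the matching value in `inputs`.

    Raises KeyError if a variable has no corresponding input.
    """
    resolved = dict(function)
    resolved["args"] = [
        Template(arg).substitute(inputs) for arg in function["args"]
    ]
    return resolved


resolved = resolve_function(
    PG_BACKUP_FUNCTION,
    {"databaseName": "orders", "outputDir": "/tmp/backup"},
)
print(resolved["args"])
```

The original spec is left untouched, so the same Function can be resolved again with different inputs for another target.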
Task
A complete backup process may need to perform multiple functions. For example, if you want to back up a PostgreSQL database, you need to initialize a Repository, then back up the database, and finally update the Repository and BackupSession status to report that the backup is completed or push backup metrics to a pushgateway. A Task specifies these functions sequentially along with their inputs.
We have chosen to break the complete backup process into several independent steps so that those individual functions can be used with tools other than Stash. It also makes it easy to add support for new features. For example, to add support for a new database backup, we will just need to add a Function and a Task crd. We will no longer need to change anything in the Stash operator code. This will also help users to back up databases that are not officially supported by Stash.
Some sample Tasks are given below:
pgBackup
pgRecovery
clusterBackup
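The way a Task chains Functions can be sketched roughly as follows. This is an illustration only: plain Python callables stand in for the containers a Function would describe, the parameter name outputDir is hypothetical, and stopping at the first failing step is an assumption of this sketch rather than something the design doc states.

```python
# Hypothetical Task: an ordered list of steps, each naming a Function and
# supplying its inputs. The parameter names are illustrative assumptions.
PG_BACKUP_TASK = [
    {"function": "pgBackup", "params": {"outputDir": "/tmp/backup"}},
    {"function": "stashPostBackup", "params": {"outputDir": "/tmp/backup"}},
]


def run_task(task, functions):
    """Run the steps of `task` in order and return the names that succeeded.

    `functions` maps a Function name to a callable that takes the step's
    params and returns True on success. Execution stops at the first
    failing step (an assumption of this sketch).
    """
    completed = []
    for step in task:
        run = functions[step["function"]]
        if not run(step["params"]):
            break
        completed.append(step["function"])
    return completed


log = []


def fake_pg_backup(params):
    # Stand-in for the real backup container: just record what would run.
    log.append(("dump", params["outputDir"]))
    return True


def fake_post_backup(params):
    # Stand-in for the status/metrics update step.
    log.append(("report", params["outputDir"]))
    return True


completed = run_task(
    PG_BACKUP_TASK,
    {"pgBackup": fake_pg_backup, "stashPostBackup": fake_post_backup},
)
print(completed)
```

Because the runner only looks up Functions by name, supporting a new kind of backup in this sketch means registering another callable and writing another step list, which mirrors the argument above for adding only a Function and a Task crd.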