Backup & Restore

This document identifies what must be backed up in a Civitas Core V2 deployment and describes procedures for backup and restore.

What Must Be Backed Up

| Data | Component | Storage Type | Criticality |
| --- | --- | --- | --- |
| PostgreSQL databases | CloudNativePG cluster | PersistentVolumeClaim | Critical — contains Keycloak realms/users, Portal data, Authz policies, FROST data |
| Kafka data | Strimzi Kafka brokers | PersistentVolumeClaim | High — contains event streams, Apicurio schema registry (KafkaSQL) |
| Etcd data | Etcd StatefulSet | PersistentVolumeClaim | High — contains APISIX route configurations |
| Kubernetes Secrets | secrets component | Kubernetes API (etcd) | Critical — contains generated passwords, TLS certs, database credentials |
| Deployment configuration | deployment/ directory | Local filesystem / Git | Critical — environment-specific Helmfile configuration |
Warning: Kubernetes Secrets are not stored in PersistentVolumes but in the cluster's etcd. They must be backed up separately, either via Velero or by exporting them manually.

Backup with Velero

Velero is the recommended tool for backing up and restoring Kubernetes resources and persistent volumes. It supports scheduled backups, volume snapshots, and disaster recovery.

Prerequisites

  • Velero installed in the cluster (installation guide)
  • A storage backend configured (S3-compatible object storage, Azure Blob, GCS, etc.)
  • Volume snapshot support (CSI snapshotter or Velero's Restic/Kopia integration)

Example Velero Configuration

Install Velero with an S3-compatible backend:

velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.11.0 \
--bucket civitas-backups \
--secret-file ./credentials-velero \
--backup-location-config region=eu-central-1,s3ForcePathStyle=true,s3Url=https://s3.example.com \
--snapshot-location-config region=eu-central-1 \
--use-node-agent \
--default-volumes-to-fs-backup

Scheduled Backups

Create a daily backup schedule for the Civitas namespace:

velero schedule create civitas-daily \
--schedule="0 2 * * *" \
--include-namespaces <instanceSlug> \
--ttl 720h \
--default-volumes-to-fs-backup

This creates a backup every day at 02:00 UTC, retaining backups for 30 days.
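If you manage Velero resources declaratively (e.g., via GitOps), the same schedule can be expressed as a Schedule custom resource instead of the CLI call. A sketch, assuming Velero is installed in the velero namespace:

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: civitas-daily
  namespace: velero
spec:
  schedule: "0 2 * * *"            # daily at 02:00 UTC
  template:
    includedNamespaces:
      - <instanceSlug>
    ttl: 720h0m0s                  # retain backups for 30 days
    defaultVolumesToFsBackup: true
```

Applying this manifest produces the same backups (named civitas-daily-<timestamp>) as the velero schedule create command above.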

Manual Backup

velero backup create civitas-manual-$(date +%Y%m%d) \
--include-namespaces <instanceSlug> \
--default-volumes-to-fs-backup \
--wait

Verify the backup:

velero backup describe civitas-manual-<date>
velero backup logs civitas-manual-<date>

Component-Specific Backup Procedures

PostgreSQL (CloudNativePG)

CloudNativePG supports continuous backup to object storage natively. This is the preferred method for PostgreSQL backups as it provides point-in-time recovery (PITR).

Enabling CNPG Backup

Configure backup in your environment's postgres.yaml.gotmpl:

postgres:
  cluster:
    rawValues:
      cluster:
        backup:
          barmanObjectStore:
            destinationPath: "s3://civitas-pg-backups/"
            endpointURL: "https://s3.example.com"
            s3Credentials:
              accessKeyId:
                name: pg-backup-s3
                key: ACCESS_KEY_ID
              secretAccessKey:
                name: pg-backup-s3
                key: SECRET_ACCESS_KEY
            retentionPolicy: "30d"

Create the S3 credentials secret:

kubectl create secret generic pg-backup-s3 \
--from-literal=ACCESS_KEY_ID='your-access-key' \
--from-literal=SECRET_ACCESS_KEY='your-secret-key' \
-n <instanceSlug>
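Continuous WAL archiving provides point-in-time recovery, but CNPG can additionally take recurring base backups via a ScheduledBackup resource. A sketch, assuming the cluster is named postgres-cluster:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: pg-nightly
  namespace: <instanceSlug>
spec:
  schedule: "0 0 2 * * *"      # six-field cron (seconds first): daily at 02:00
  backupOwnerReference: self
  cluster:
    name: postgres-cluster
```

Note that CNPG's schedule field uses a six-field cron expression with a leading seconds field, unlike Velero's standard five-field cron.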

Manual CNPG Backup

Trigger an on-demand backup:

kubectl apply -f - <<EOF
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: pg-backup-$(date +%Y%m%d)
  namespace: <instanceSlug>
spec:
  cluster:
    name: postgres-cluster
  method: barmanObjectStore
EOF

Check backup status:

kubectl get backup -n <instanceSlug>

Kafka (Strimzi)

Kafka data is stored in PersistentVolumeClaims. Backup options:

  1. Velero volume snapshots (recommended): Back up the Kafka PVCs as part of the namespace-level Velero backup.
  2. Topic-level backup: Use tools like MirrorMaker 2 for cross-cluster replication.
Info: Kafka's log retention policy (log.retention.ms: 2592000000 = 30 days by default) means older messages are automatically deleted. For long-term retention, configure topic-level retention or use an external sink.
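For topics that must outlive the broker-level retention window, retention can be overridden per topic through a Strimzi KafkaTopic resource. A sketch in which the topic name and the strimzi.io/cluster label value are placeholders to adapt to your deployment:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: important-events              # placeholder topic name
  namespace: <instanceSlug>
  labels:
    strimzi.io/cluster: kafka-cluster # must match your Kafka CR name
spec:
  partitions: 3
  replicas: 3
  config:
    retention.ms: -1                  # -1 = retain messages indefinitely
```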

Kubernetes Secrets

Secrets are auto-generated by the secrets component on first deployment. If lost, they cannot be regenerated with the same values. Back them up explicitly:

# Export all secrets from the instance namespace
kubectl get secrets -n <instanceSlug> -o yaml > secrets-backup-$(date +%Y%m%d).yaml
Warning: Store secret backups encrypted and in a secure location. They contain passwords for all platform components.
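One way to keep the export encrypted at rest is symmetric encryption with OpenSSL. A self-contained sketch (the filenames and passphrase file are illustrative; in practice prefer a dedicated tool such as SOPS or age, and never commit the passphrase alongside the backup):

```shell
# Create an example export and a passphrase file (illustrative only)
echo "example: manifest" > secrets-backup-example.yaml
echo "change-me" > backup-passphrase.txt

# Encrypt with AES-256-CBC and PBKDF2 key derivation
openssl enc -aes-256-cbc -pbkdf2 -salt \
  -in secrets-backup-example.yaml \
  -out secrets-backup-example.yaml.enc \
  -pass file:./backup-passphrase.txt

# Decrypt when the backup is needed (flags must match the encrypt step)
openssl enc -d -aes-256-cbc -pbkdf2 \
  -in secrets-backup-example.yaml.enc \
  -out secrets-backup-decrypted.yaml \
  -pass file:./backup-passphrase.txt
```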

Velero includes Kubernetes Secrets by default when backing up a namespace.

Restore Procedures

Full Restore with Velero

Order of operations:

  1. Ensure the target cluster meets all prerequisites
  2. Install Velero with the same backend configuration as the source cluster
  3. Restore the namespace:

     velero restore create civitas-restore \
     --from-backup civitas-daily-<timestamp> \
     --include-namespaces <instanceSlug> \
     --wait

  4. Verify the restore:

     # Check all pods are running
     kubectl get pods -n <instanceSlug>

     # Check Helm releases
     helm list -n <instanceSlug> -a

     # Check CNPG cluster status
     kubectl get cluster -n <instanceSlug>

     # Check Kafka cluster status
     kubectl get kafka -n <instanceSlug>

  5. Run helmfile apply to reconcile any drift:

     helmfile -f deployment/helmfile.yaml apply -e <environment>
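The restore in step 3 can also be expressed declaratively as a Restore custom resource, assuming Velero runs in the velero namespace:

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: civitas-restore
  namespace: velero
spec:
  backupName: civitas-daily-<timestamp>
  includedNamespaces:
    - <instanceSlug>
```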

PostgreSQL Restore (CNPG)

To restore PostgreSQL from a CNPG Barman backup, configure recovery in your environment:

postgres:
  cluster:
    recovery:
      s3:
        accessKey: "your-access-key"
        secretKey: "your-secret-key"
    rawValues:
      cluster:
        bootstrap:
          recovery:
            source: postgres-cluster
            recoveryTarget:
              targetTime: "2025-01-15T10:00:00Z" # Point-in-time recovery
        externalClusters:
          - name: postgres-cluster
            barmanObjectStore:
              destinationPath: "s3://civitas-pg-backups/"
              endpointURL: "https://s3.example.com"
              s3Credentials:
                accessKeyId:
                  name: pg-backup-s3
                  key: ACCESS_KEY_ID
                secretAccessKey:
                  name: pg-backup-s3
                  key: SECRET_ACCESS_KEY

Then redeploy the PostgreSQL component:

helmfile -f deployment/helmfile.yaml apply -e <environment> --selector component=postgres

Secrets-Only Restore

If you only need to restore secrets (e.g., after accidental deletion), apply the backup file created during the Kubernetes Secrets backup step:

kubectl apply -f secrets-backup-<date>.yaml

Then restart affected workloads so they pick up the restored secrets (operator-managed StatefulSets such as PostgreSQL and Kafka may need to be restarted through their respective operators):

kubectl rollout restart deployment -n <instanceSlug>

Success Criteria

A restore is considered successful when:

  • All pods are in Running / Completed state
  • kubectl get cluster -n <instanceSlug> reports the PostgreSQL cluster status as Cluster in healthy state
  • kubectl get kafka -n <instanceSlug> shows Kafka cluster as Ready
  • Portal is accessible at https://portal.<domain> and login works
  • Keycloak admin console is accessible at https://idm.<domain>/admin
  • Previously created data (users, entities, configurations) is present