Troubleshooting
Pods Not Starting
Symptom: Pods stuck in Pending, CrashLoopBackOff, or ImagePullBackOff.
Diagnosis:
# Check pod events
kubectl describe pod <pod-name> -n <instanceSlug>
# Check pod logs
kubectl logs <pod-name> -n <instanceSlug>
# Check events in the namespace
kubectl get events -n <instanceSlug> --sort-by='.lastTimestamp'
**Common causes:**
- **Pending:** Insufficient resources (CPU/memory). Scale your cluster or reduce resource requests.
- **ImagePullBackOff:** Container image not accessible. Check image names in `components/<name>/images.yaml` and ensure network/registry access.
- **CrashLoopBackOff:** Application error. Check logs for configuration issues or missing dependencies.
Tip: If you see a `keycloak-config-cli` pod stuck in CrashLoopBackOff with errors about missing secrets, make sure you added the secrets as described in the prerequisites documentation.
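As a rough sketch, the three causes above can be folded into a small shell helper that annotates pod listings with the matching hint. The function names `triage_hint` and `triage_pods` are hypothetical and not part of this project. The sketch assumes the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...).

```shell
# Map a pod STATUS value to the matching cause from this section.
# Returns non-zero for any status this section does not cover.
triage_hint() {
  case "$1" in
    Pending)          echo "insufficient CPU/memory: scale the cluster or reduce resource requests" ;;
    ImagePullBackOff) echo "image not accessible: check image names and registry/network access" ;;
    CrashLoopBackOff) echo "application error: check logs for configuration issues or missing dependencies" ;;
    *)                return 1 ;;
  esac
}

# Read `kubectl get pods --no-headers` output on stdin and print
# "name<TAB>status<TAB>hint" for every pod in one of the three states.
triage_pods() {
  while read -r name _ready status _rest; do
    hint=$(triage_hint "$status") || continue
    printf '%s\t%s\t%s\n' "$name" "$status" "$hint"
  done
}
```

It could be used as `kubectl get pods -n <instanceSlug> --no-headers | triage_pods`. Pods in any other state, including healthy ones, are skipped.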
Helmfile Sync Fails
Symptom: `helmfile sync` or `helmfile apply` exits with an error even though the Kubernetes cluster itself shows no abnormalities.
Diagnosis:
# Run with verbose output
helmfile -f deployment/helmfile.yaml sync -e local --debug
# Validate templates without deploying
helmfile -f deployment/helmfile.yaml lint -e local
# Render templates to check for errors
helmfile -f deployment/helmfile.yaml template -e local
**Common causes:**
- **Template rendering error:** Syntax error in `.gotmpl` files. Check the error message for the file and line number.
- **Timeout:** A release took too long to become ready. The default timeout is 300 seconds (configurable in `defaults/helm-defaults.yaml`). Increase it if needed.
- **CRD not installed:** Some components (PostgreSQL, Kafka) require CRDs from their operators. Ensure the operator part deploys before the cluster part.
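Template errors are cheaper to catch before anything is deployed. The following is a minimal sketch that runs the `lint` and `template` commands from the diagnosis steps in order and stops at the first failure. The function name `preflight` and its arguments are hypothetical; the file path and the `local` environment are taken from the commands above.

```shell
# Run the non-deploying checks first; stop at the first one that fails.
# $1: helmfile environment (defaults to "local")
# $2: path to the helmfile (defaults to deployment/helmfile.yaml)
preflight() {
  env_name="${1:-local}"
  file="${2:-deployment/helmfile.yaml}"
  for step in lint template; do
    echo "==> helmfile $step" >&2
    # Rendered output is discarded; only the exit status and stderr matter here.
    helmfile -f "$file" "$step" -e "$env_name" >/dev/null || {
      echo "helmfile $step failed; fix this before running sync" >&2
      return 1
    }
  done
}
```

A possible usage is `preflight local && helmfile -f deployment/helmfile.yaml sync -e local`, so that `sync` only runs once both checks pass.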
Increasing Timeouts
If deployments time out during `helmfile sync`, increase the timeout in `defaults/helm-defaults.yaml`:
helmDefaults:
  wait: true
  waitForJobs: true
  timeout: 600 # seconds (default: 300)
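Before changing the value, it can help to confirm what is currently configured. This is a small sketch, assuming the file keeps a single `timeout:` key in the layout shown above. The helper name `helm_timeout` is hypothetical.

```shell
# Print the first `timeout:` value found in the given helm defaults file.
# $1: path to the file (defaults to defaults/helm-defaults.yaml)
helm_timeout() {
  awk '$1 == "timeout:" { print $2; exit }' "${1:-defaults/helm-defaults.yaml}"
}
```

Calling `helm_timeout defaults/helm-defaults.yaml` prints the configured number of seconds, or nothing if the key is absent.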