
ADR 011: Select Observability Dashboard

Date: 2025-09-16

Status: Accepted

Decision Makers: @JulianSobott @cr0ssing @luckey @DerLinne

Context

We need a tool to monitor and explore the platform. In Civitas v1, Grafana was used for this. The tool should be able to (a) explore logs, (b) explore metrics, and (c) create dashboards.

Checked Architecture Principles

  • [none] Model-centric data flow – not applicable
  • [partial] Distributed architecture with unified user experience – Grafana is only used internally for monitoring.
  • [full] Modular design – Grafana only visualizes data that is stored in other services. Underlying services such as Loki, or Grafana itself, could be replaced without affecting much of the infrastructure.
  • [full] Integration capability through defined interfaces – Grafana has many connectors to the most common data sources.
  • [full] Open source as the default – Fully open source, with a strong community and long-term sustainability. However, an enterprise/cloud version also exists.
  • [full] Cloud-native architecture – Runs in Docker and can be deployed via Helm.
  • [full] Prefer standard solutions over custom development – Grafana is a widely used observability tool
  • [full] Self-contained deployment – Can be deployed and operated entirely inside Kubernetes.
  • [full] Technological consistency to ensure maintainability
  • [full] Multi-tenancy – Grafana allows the creation of organizations that isolate data sources and dashboards.
  • [partial] Security by design – Supports OAuth2, audit logs, and RBAC, but permissions and roles are rather limited.

Decision

Grafana should be used as the tool for observability and monitoring. Most devs are familiar with it, and it satisfies most of our requirements.

Consequences

Monitoring tools for metrics and logs should be compatible with Grafana.
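
To illustrate what "compatible with Grafana" can mean in practice, the following is a minimal sketch that generates a data source provisioning file. It assumes Loki for logs (mentioned above) and Prometheus for metrics; Prometheus, the service URLs, and the helper names `datasource` and `provisioning_config` are assumptions made for illustration and are not part of this decision.

```python
import json


def datasource(name, ds_type, url, org_id=1, is_default=False):
    """Build one entry for a Grafana data source provisioning file."""
    return {
        "name": name,
        "type": ds_type,        # Grafana plugin id, e.g. "loki" or "prometheus"
        "access": "proxy",      # the Grafana backend proxies the requests
        "url": url,             # hypothetical in-cluster service URL
        "orgId": org_id,        # organization that owns the data source
        "isDefault": is_default,
    }


def provisioning_config(datasources):
    """Wrap the entries in the top-level provisioning structure."""
    return {"apiVersion": 1, "datasources": list(datasources)}


# Assumed backends: Loki for logs, Prometheus for metrics.
config = provisioning_config([
    datasource("Loki", "loki", "http://loki:3100"),
    datasource("Prometheus", "prometheus", "http://prometheus:9090", is_default=True),
])

if __name__ == "__main__":
    # JSON is also valid YAML, so the output can be saved as a .yaml file.
    print(json.dumps(config, indent=2))
```

The output could be placed in Grafana's `provisioning/datasources` directory. A candidate monitoring backend would fit this decision if it can be described by such an entry, either through a built-in data source type or through a plugin.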

Alternatives

  • OpenSearch Dashboards: Tightly coupled to OpenSearch and focused primarily on logs.
  • Zabbix: Less suited to rich, interactive dashboards over logs or traces; more oriented toward threshold-based alerts.
