CIVITAS/CORE V2 Platform Architecture
The following figure depicts the target architecture of the platform. For the current state of development and the roadmap, please visit our GitLab.
Architecture Areas
The platform architecture follows the Architecture Principles. It is the product of our Architecture Decision Records (ADRs). It implements the full set of platform capabilities as defined in the Capability Map. The architecture is structured into multiple areas, each with a specific responsibility.
Platform Access
The Platform Access area is the entrypoint to the platform for users and external systems via an API gateway. It enforces authentication and authorization for all platform components. It provides API routes for platform management and data access. It can be used to configure individual routes for datasets.
Dataset & Platform Management
The Dataset & Platform Management area provides a central UI for platform and data management. It is used to create, manage, and process data models using Datasets, Datasources, and Datastructures. It follows a service-oriented architecture style. It propagates user events (for example, “role created” or “dataset created”) and data models to other platform components via the message bus. It also provides identity and access management as well as interfaces for monitoring platform components.
Data Flow Management
The Data Flow Management area implements the data flows defined by Datasets. It follows an event-driven architecture style to achieve loose coupling and feature extensibility. The message bus transports both data models and payload data as part of these data flows. This area orchestrates data flows between components. Data models are consumed by configuration adapters to configure platform components. Various standard components are used to provide persistence and standard API implementations, including the ability to deploy Postgres databases for components in the platform.
Data Presentation
The Data Presentation area is loosely coupled with the platform via public interfaces. It is configured by models consumed by its configuration adapters. It offers web clients to visualize, explore, and discover datasets.
Public Interfaces & Data Flow
Public interfaces are the interfaces that are exposed outside of platform components and outside of the Kubernetes cluster. There are two types of public interfaces. The first type covers interfaces to read and write payload data and to read dataset metadata (for example OGC WMS/WFS/W[X]S, STA, NGSI-LD, and interfaces for connectors, as well as the DCAT-AP.de API). The second type covers interfaces to manage the platform (for example Datasets, Datasources, Datastructures, users, roles, and groups via the Portal Backend API).
Data-consuming components in the Data Presentation area read data from these public interfaces. Data is routed by Redpanda Connect along the data flows defined in datasets; for this, the data is transported via the message bus. Users, roles, and groups are managed in the Portal Frontend; the Keycloak admin UI is not supposed to be used directly. The platform’s domain-specific view of the users, roles, and groups data is persisted in the database of the Portal Backend. This view is synced via the Keycloak Configuration Adapter (see Propagation of Platform Configuration) to Keycloak, which performs the actual authentication and authorization.
Propagation of Platform Configuration
Platform management follows the concept of the model-centric data flow as depicted in the following figure:
A data flow can consist of multiple well-defined steps. Data is ingested via a standard API or an inbound connector. The data can then be transformed (for example filtering, analysis, aggregation, or changes to the datastructure) before it is persisted in a platform-managed storage. The platform offers various persistence storages, such as FROST-Server, Stellio, or a Postgres database. For publication, the data can be transformed again to comply with the datastructures of the chosen standard APIs or outbound connectors. The data is then made available via public interfaces (for example OGC WMS/WFS, SensorThings API, or NGSI-LD) or via specialized outbound connectors. To derive higher-value data, data can be transformed and persisted multiple times within a single data flow. Each step in the data flow is optional, meaning data does not need to be transformed or persisted if this is not required. Each step in the data flow is configured by a model. Models can reference each other; for example, the input datastructure of a transformation and the output datastructure of the previous step (such as a connector) are the same, so this datastructure can be defined once and be referenced by the models of both steps. The models are managed by one or more platform components.
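The step sequence described above can be sketched as a small pipeline model. This is only an illustration under assumptions: the names `Step`, `DataFlow`, and `run` are hypothetical and do not reflect the platform's actual model format, and persistence is simulated with a plain list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


@dataclass
class Step:
    """One optional step of a data flow (hypothetical model, for illustration)."""
    name: str
    # A transform maps a record to a new record, or to None to filter it out.
    transform: Optional[Callable[[Record], Optional[Record]]] = None
    # If set, the records leaving this step are appended to this storage.
    storage: Optional[List[Record]] = None


@dataclass
class DataFlow:
    steps: List[Step] = field(default_factory=list)

    def run(self, records: List[Record]) -> List[Record]:
        current = records
        for step in self.steps:
            if step.transform is not None:
                transformed = (step.transform(r) for r in current)
                current = [r for r in transformed if r is not None]
            if step.storage is not None:
                step.storage.extend(current)
        return current


# Example: filter out empty readings, persist, then reshape for publication.
raw_store: List[Record] = []
flow = DataFlow(steps=[
    Step("filter", transform=lambda r: r if r["value"] is not None else None),
    Step("persist", storage=raw_store),
    Step("publish-shape", transform=lambda r: {"result": r["value"], "sensor": r["id"]}),
])
published = flow.run([{"id": "s1", "value": 21.5}, {"id": "s2", "value": None}])
```

Because every step is optional, a flow without any steps simply passes the ingested records through unchanged.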
The platform is configured through models defined by the user. Components that cannot consume these models directly are configured by their specific configuration adapters, which consume the models on their behalf and configure the component. The models are transported within the platform via the message bus. Models can be defined by the user at runtime via the Portal Frontend in Datasets, Datasources, and Datastructures. Changes to models are sent to the Portal Backend. The Portal Backend propagates (user) events in the platform via the message bus. The Portal Backend also sends models to Eclipse Fennec Model Atlas to be converted into a common format and to be linked with other models. Eclipse Fennec Model Atlas uses Apicurio Registry as its persistence layer and handles versioning as well. Apicurio Registry itself persists the models in Apache Kafka for high availability.
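The role of a configuration adapter can be illustrated with an in-memory sketch. Everything here is an assumption made for illustration: `MessageBus` stands in for the real message bus, the topic name `models.dataset` and the model fields (`id`, `name`, `version`) are hypothetical, and the "component" being configured is just a dictionary.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Model = Dict[str, object]


class MessageBus:
    """In-memory stand-in for the message bus (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Model], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Model], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, model: Model) -> None:
        for handler in self._subscribers[topic]:
            handler(model)


class ConfigurationAdapter:
    """Consumes models on behalf of a component that cannot read them itself."""

    def __init__(self, bus: MessageBus, topic: str) -> None:
        self.component_config: Dict[str, Model] = {}
        bus.subscribe(topic, self.apply)

    def apply(self, model: Model) -> None:
        # Translate the model into component-specific configuration.
        # A later version of the same model overwrites the earlier one.
        self.component_config[str(model["id"])] = {
            "layer_name": model["name"],
            "version": model["version"],
        }


bus = MessageBus()
adapter = ConfigurationAdapter(bus, topic="models.dataset")
bus.publish("models.dataset", {"id": "ds-1", "name": "air-quality", "version": 1})
bus.publish("models.dataset", {"id": "ds-1", "name": "air-quality", "version": 2})
```

The publisher never addresses the component directly; it only knows the topic, which is what keeps the components loosely coupled.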
Authentication & Authorization Flow
This section describes where authentication and authorization are performed in the platform. The Portal Frontend authenticates the user using Keycloak (see Authentication Flow). Every user action performed in the Portal Frontend and sent to the Portal Backend is authenticated using a JWT issued by Keycloak. These user actions are authorized by the Portal Backend using the Policy Decision Point (PDP). The PDP implements the Authorization Model to determine the user’s permissions. Every API request for payload data is authenticated by Keycloak and authorized by the PDP.
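The separation between authentication ("who is calling?") and authorization ("may they do this?") can be sketched as two distinct checks. This is a simplified illustration, not the platform's implementation: the token table and permission table are hypothetical in-memory stand-ins for JWT validation against Keycloak and for the PDP.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical stand-ins: a real deployment validates a signed JWT issued by
# Keycloak and queries the PDP; here both are plain in-memory lookups.
VALID_TOKENS: Dict[str, str] = {"token-alice": "alice", "token-bob": "bob"}
PERMISSIONS: Dict[str, Set[Tuple[str, str]]] = {
    "alice": {("read", "dataset:air-quality"), ("write", "dataset:air-quality")},
    "bob": {("read", "dataset:air-quality")},
}


def authenticate(token: str) -> Optional[str]:
    """Step 1: return the user for a valid token, or None."""
    return VALID_TOKENS.get(token)


def authorize(user: str, action: str, resource: str) -> bool:
    """Step 2: PDP-style decision for an already authenticated user."""
    return (action, resource) in PERMISSIONS.get(user, set())


def handle_request(token: str, action: str, resource: str) -> int:
    """Return an HTTP-style status code for the request."""
    user = authenticate(token)
    if user is None:
        return 401  # not authenticated
    if not authorize(user, action, resource):
        return 403  # authenticated, but not permitted
    return 200
```

A request with an unknown token yields 401, while a known user without the required permission yields 403.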
Components & Capabilities
This section briefly describes the components of the platform and their responsibilities.
Platform Access
Apache APISIX (API Management)
Apache APISIX is an open-source API gateway for traffic management, security, and observability. In CIVITAS/CORE it is used as the centralized entrypoint to route and protect APIs (authentication, authorization, rate limiting, monitoring, etc.) and to expose datasource-specific APIs dynamically.
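As an illustration of a dynamically exposed route, the following sketch builds a route definition in the shape used by the APISIX Admin API (`uri`, `upstream`, `plugins`). The URI scheme `/datasets/<id>/*`, the upstream address, and the helper name `dataset_route` are assumptions made for this example; how CIVITAS/CORE actually names and registers its routes is not specified here.

```python
import json
from typing import Dict


def dataset_route(dataset_id: str, upstream_node: str, requests_per_minute: int = 60) -> Dict:
    """Build a route definition in the shape accepted by the APISIX Admin API.

    The URI scheme (/datasets/<id>/*) is hypothetical.
    """
    return {
        "uri": f"/datasets/{dataset_id}/*",
        "methods": ["GET"],
        "upstream": {
            "type": "roundrobin",
            "nodes": {upstream_node: 1},  # "host:port" -> weight
        },
        "plugins": {
            # Rate limiting per client address.
            "limit-count": {
                "count": requests_per_minute,
                "time_window": 60,
                "rejected_code": 429,
                "key": "remote_addr",
            },
        },
    }


route = dataset_route("air-quality", "frost-server:8080")
print(json.dumps(route, indent=2))
```

A body like this would typically be sent to the Admin API with a PUT request to `/apisix/admin/routes/<route id>`; authentication and authorization plugins would be added to the `plugins` section in the same way.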
Dataset & Platform Management
Open Policy Agent (Policy Decision Point)
Open Policy Agent is the platform’s authorization service: it evaluates “who may do what” against the CIVITAS/CORE authorization model (see Authorization Data Model). In CIVITAS/CORE it is used by APISIX to authorize user actions and API requests across management and data interfaces.
Auth Adapter (User Permission Retriever)
The Auth Adapter is a component that retrieves the user permissions from the Portal Backend database and passes them to the Open Policy Agent. It enables the Policy Decision Point to make authorization decisions based on role assignments and possibly other information specific to the platform's domain model.
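One way to picture the interaction with Open Policy Agent is through OPA's Data API, where a query carries its facts under an `input` key and the decision comes back under `result`. The sketch below only builds and interprets these JSON bodies without any network call. The input fields (`user`, `roles`, `action`, `resource`) are hypothetical, and whether the Auth Adapter passes permissions as query input or loads them into OPA as data is an assumption not settled by this document.

```python
import json
from typing import List


def build_opa_query(user: str, roles: List[str], action: str, resource: str) -> str:
    """Serialize an OPA Data API request body: the facts go under "input"."""
    return json.dumps({
        "input": {
            "user": user,
            "roles": sorted(roles),
            "action": action,
            "resource": resource,
        }
    })


def is_allowed(opa_response_body: str) -> bool:
    """Interpret an OPA Data API response.

    A missing "result" key means the policy decision is undefined,
    which is treated as a denial here.
    """
    return json.loads(opa_response_body).get("result") is True
```

Treating an undefined decision as a denial keeps the check fail-closed.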
Keycloak (Identity Management)
Keycloak is an open-source Identity and Access Management (IAM) solution providing SSO and OAuth2/OpenID Connect flows. In CIVITAS/CORE it is used to authenticate users and issue JWTs for the Portal Frontend/Backend and for API access. Keycloak’s admin UI is not intended for day-to-day operations.
Portal Frontend (Central User Interface)
The Portal Frontend is the central web UI for operating CIVITAS/CORE. It provides a single source of truth for datasets, datasources, datastructures, users, roles, groups, and other entities of the domain model. It is developed and maintained by the CIVITAS/CORE development team.
Portal Backend (Platform Management Backend)
The Portal Backend provides the management APIs and business logic for platform administration and configuration propagation. In CIVITAS/CORE it persists the entities of the platform domain model, publishes events/models to the message bus, and enforces authorization (via the PDP). It is developed and maintained by the CIVITAS/CORE development team.
Eclipse Fennec Model Atlas (Model Transformation & Linkage)
Eclipse Fennec Model Atlas is a model transformation and linkage component used to convert data models from various formats into a common representation and link them with related models. For this, it uses the Eclipse Modeling Framework (EMF). In CIVITAS/CORE it is used to validate and process datastructure and pipeline models so downstream components can be configured consistently.
Apicurio Registry (Model Versioning)
Apicurio Registry is an artifact/schema registry for storing and versioning models. In CIVITAS/CORE it is used as the persistence/versioning layer for models and their evolution over time. It uses Apache Kafka as a storage backend.
Data Flow Management
Apache Kafka (Message Bus)
Apache Kafka is a distributed event streaming platform. In CIVITAS/CORE it is used as the message bus to transport events, models, and data in data flows, enabling loosely coupled, event-driven data flow orchestration.
Redpanda Connect (Data Flow Orchestration + Connectors)
Redpanda Connect is a stream processing and connector framework for building ingestion and transformation pipelines. In CIVITAS/CORE it is used to implement dataset-defined data flows and to integrate external sources/sinks via connectors. For that, it orchestrates the data flows between components of the platform.
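A Redpanda Connect pipeline is described by a configuration with `input`, `pipeline`, and `output` sections. The following sketch builds such a configuration as a Python dictionary for a flow that reads from one topic, reshapes each message with a Bloblang `mapping`, and writes to another topic. The topic names, the consumer group naming, and the mapped fields are hypothetical, and the component and field names should be checked against the Redpanda Connect version in use.

```python
import json
from typing import Dict


def data_flow_config(broker: str, source_topic: str, target_topic: str) -> Dict:
    """Build a Redpanda Connect style configuration (illustrative sketch)."""
    return {
        "input": {
            "kafka": {
                "addresses": [broker],
                "topics": [source_topic],
                "consumer_group": f"flow-{source_topic}",
            }
        },
        "pipeline": {
            "processors": [
                # Bloblang mapping: keep only the fields needed downstream.
                {"mapping": "root.sensor = this.id\nroot.result = this.value"},
            ]
        },
        "output": {
            "kafka": {
                "addresses": [broker],
                "topic": target_topic,
            }
        },
    }


config = data_flow_config("kafka:9092", "raw.air-quality", "clean.air-quality")
print(json.dumps(config, indent=2))
```

Generating the configuration from a dataset model, rather than writing it by hand, is what would let a configuration adapter keep the pipelines in step with the models.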
CloudNativePG (Postgres DB Operator)
CloudNativePG is a Kubernetes operator for running and managing PostgreSQL clusters. In CIVITAS/CORE it is used to provision and operate Postgres databases for platform components in a standardized, automated way.
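With CloudNativePG, a database is requested declaratively through a `Cluster` resource. As a minimal sketch, the function below builds such a manifest as a Python dictionary. The cluster name and the default sizes are hypothetical, and a real deployment would usually set further options.

```python
import json
from typing import Dict


def postgres_cluster(name: str, instances: int = 3, storage_size: str = "1Gi") -> Dict:
    """Build a minimal CloudNativePG Cluster manifest (illustrative sketch)."""
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            "instances": instances,            # number of PostgreSQL instances
            "storage": {"size": storage_size},  # volume size per instance
        },
    }


manifest = postgres_cluster("portal-backend-db")
print(json.dumps(manifest, indent=2))
```

Since JSON is valid YAML, the printed manifest could be applied to a cluster that has the operator installed.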
GeoServer (OGC API Broker)
GeoServer is an open-source geospatial server that publishes spatial data via OGC standards such as WMS and WFS. In CIVITAS/CORE it is used as the OGC API broker to expose geospatial datasets via standardized interfaces.
FROST-Server (SensorThings API Broker)
FROST-Server is a complete server implementation of the OGC SensorThings API. In CIVITAS/CORE it is used as the SensorThings API broker to store and serve IoT/sensor observations via STA.
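To illustrate how a client reads observations via STA, the sketch below builds a SensorThings API request URL using the standard resource path `Datastreams(<id>)/Observations` and the `$top` and `$orderby` query options. The base URL and the datastream ID are hypothetical.

```python
from urllib.parse import quote


def observations_url(base_url: str, datastream_id: int, top: int = 10) -> str:
    """Build a SensorThings API URL for the latest observations of a datastream."""
    # Newest observations first; the space in the $orderby value is percent-encoded.
    query = f"$top={top}&$orderby={quote('phenomenonTime desc')}"
    return f"{base_url.rstrip('/')}/v1.1/Datastreams({datastream_id})/Observations?{query}"


url = observations_url("https://frost.example.org/FROST-Server/", 7, top=5)
print(url)
```

The same pattern extends to other query options such as `$filter` or `$select`.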
FIWARE Stellio (NGSI-LD Broker)
Stellio is an NGSI-LD compatible context broker for managing and querying context information as linked data. In CIVITAS/CORE it is used as the NGSI-LD broker to ingest, store, and provide query access to NGSI-LD entities and relationships.
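The distinction between entities, properties, and relationships in NGSI-LD can be shown with a small builder for an entity payload. The entity type, identifiers, and attribute names in the example are hypothetical; the structure (`id`, `type`, `Property` with `value`, `Relationship` with `object`, and the core `@context`) follows the NGSI-LD information model.

```python
from typing import Dict, Optional

NGSI_LD_CORE_CONTEXT = "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"


def ngsi_ld_entity(
    entity_type: str,
    local_id: str,
    properties: Dict[str, object],
    relationships: Optional[Dict[str, str]] = None,
) -> Dict[str, object]:
    """Build an NGSI-LD entity payload (illustrative sketch)."""
    entity: Dict[str, object] = {
        "id": f"urn:ngsi-ld:{entity_type}:{local_id}",
        "type": entity_type,
        "@context": [NGSI_LD_CORE_CONTEXT],
    }
    for name, value in properties.items():
        entity[name] = {"type": "Property", "value": value}
    for name, target in (relationships or {}).items():
        # A relationship points at another entity by its URN.
        entity[name] = {"type": "Relationship", "object": target}
    return entity


sensor = ngsi_ld_entity(
    "AirQualitySensor",
    "s1",
    properties={"temperature": 21.5},
    relationships={"locatedIn": "urn:ngsi-ld:District:center"},
)
```

Relationships are what turn a set of entities into linked data that the broker can query across.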
Data Presentation
Grafana (Dashboard Engine)
Grafana is an observability and dashboarding platform for visualizing metrics, logs, and traces. In CIVITAS/CORE it is used to build dashboards from the payload data of datasets.
Apache Superset (Dashboard Engine)
Apache Superset is an open-source BI and data exploration platform. In CIVITAS/CORE it is used to explore datasets and build analytical dashboards on top of the exposed data interfaces.
Masterportal (Map Client)
Masterportal is an open-source, configurable web map client/geoportal framework. In CIVITAS/CORE it is used to visualize geospatial datasets and provide map-based exploration in the presentation layer.
Data Catalog (Metadata Catalog)
The Data Catalog is a web interface that serves as the metadata catalog to search, explore, and share public and private dataset metadata. In CIVITAS/CORE it is the user interface that not only presents datasets but also enables the reuse of dataset definitions and their datastructures and datasources.