CIVITAS/CORE V2 Platform Architecture

The following figure depicts the target architecture of the platform. For the current state of development and the roadmap, please visit our GitLab.

Figure: Model-Centric Data Flow (layers)

Architecture Areas

The platform architecture follows the Architecture Principles. It is the product of our Architecture Decision Records (ADRs). It implements the full set of platform capabilities as defined in the Capability Map. The architecture is structured into multiple areas, each with a specific responsibility.

Platform Access

The Platform Access area is the entry point to the platform for users and external systems, provided via an API gateway. It enforces authentication and authorization for all platform components, provides API routes for platform management and data access, and can be used to configure individual routes for datasets.

Dataset & Platform Management

The Dataset & Platform Management area provides a central UI for platform and data management. It is used to create, manage, and process data models using Datasets, Datasources, and Datastructures. It follows a service-oriented architecture style. It propagates user events (for example, “role created” or “dataset created”) and data models to other platform components via the message bus. It also provides identity and access management as well as interfaces for monitoring platform components.

Data Flow Management

The Data Flow Management area implements the data flows defined by Datasets. It follows an event-driven architecture style to achieve loose coupling and feature extensibility. The message bus transports both data models and payload data as part of these data flows. This area orchestrates data flows between components. Data models are consumed by configuration adapters to configure platform components. Various standard components are used to provide persistence and standard API implementations, including the ability to deploy Postgres databases for components in the platform.
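
The loose coupling described above can be illustrated with a minimal in-memory sketch. It is not the platform's implementation: the platform uses Apache Kafka as its message bus, whereas `InMemoryBus`, the topic names, and the message fields below are hypothetical stand-ins chosen only to show that models and payload data travel over the same bus to independent subscribers.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]


class InMemoryBus:
    """Stand-in for the message bus: delivers each message to every subscriber of its topic."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict[str, Any]) -> None:
        # The publisher does not know who consumes the message.
        for handler in self._subscribers[topic]:
            handler(message)


bus = InMemoryBus()
applied_models: list[str] = []
stored_payloads: list[dict[str, Any]] = []

# A configuration adapter reacts to model changes (hypothetical topic name).
bus.subscribe("models.dataset", lambda msg: applied_models.append(msg["id"]))
# A persistence component reacts to payload data (hypothetical topic name).
bus.subscribe("payload.air-quality", stored_payloads.append)

bus.publish("models.dataset", {"id": "air-quality", "version": 1})
bus.publish("payload.air-quality", {"station": "A1", "pm10": 17.5})
```

Adding a further consumer only requires another `subscribe` call; the publishers remain unchanged, which is the extensibility property the event-driven style aims for.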

Data Presentation

The Data Presentation area is loosely coupled with the platform via public interfaces. It is configured by models consumed by its configuration adapters. It offers web clients to visualize, explore, and discover datasets.

Public Interfaces & Data Flow

Public interfaces are the interfaces that are exposed outside of platform components and outside of the Kubernetes cluster. There are two types of public interfaces: interfaces to read and write payload data and to read dataset metadata (for example OGC WMS/WFS/W[X]S, STA, NGSI-LD, and interfaces for connectors, as well as the DCAT-AP.de API), and interfaces to manage the platform (for example Datasets, Datasources, Datastructures, users, roles, and groups via the Portal Backend API).

Data-consuming components in the Data Presentation area read data from these public interfaces. Redpanda Connect routes data along the data flows defined in Datasets, using the message bus as transport. Users, roles, and groups are managed in the Portal Frontend; the Keycloak admin UI is not supposed to be used directly. The platform's domain-specific view of the users, roles, and groups is persisted in the database of the Portal Backend. This view is synced via the Keycloak Configuration Adapter (see Propagation of Platform Configuration) to Keycloak, which performs the actual authentication and authorization.

Propagation of Platform Configuration

Platform management follows the concept of the model-centric data flow as depicted in the following figure:

A data flow can consist of multiple well-defined steps. Data is ingested via a standard API or an inbound connector. The data can then be transformed (for example by filtering, analysis, aggregation, or changes to the datastructure) before it is persisted in platform-managed storage. The platform offers various persistence stores, such as FROST-Server, Stellio, or a Postgres database. For publication, the data can be transformed again to comply with the datastructures of the chosen standard APIs or outbound connectors. The data is then made available via public interfaces (for example OGC WMS/WFS, SensorThings API, or NGSI-LD) or via specialized outbound connectors. To derive higher-value data, data can be transformed and persisted multiple times within a single data flow. Each step in the data flow is optional, meaning data does not need to be transformed or persisted if this is not required.

Each step in the data flow is configured by a model. Models can reference each other: for example, the output datastructure of one step (such as a connector) and the input datastructure of the following transformation are the same, so the datastructure can be defined once and referenced by the models of both steps. The models are managed by one or more platform components.
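
The optional steps of a data flow can be sketched as follows. This is an illustration only: in the platform, data flows are orchestrated by Redpanda Connect and configured by models, while the `DataFlow` class, its field names, and the sample records here are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Record = dict


@dataclass
class DataFlow:
    """A data flow in which the transformation and persistence steps are all optional."""
    transform_before: Optional[Callable[[Record], Optional[Record]]] = None
    persist: Optional[Callable[[Record], None]] = None
    transform_after: Optional[Callable[[Record], Optional[Record]]] = None
    published: list = field(default_factory=list)

    def ingest(self, record: Record) -> None:
        if self.transform_before is not None:
            record = self.transform_before(record)
            if record is None:  # the transformation filtered the record out
                return
        if self.persist is not None:
            self.persist(record)
        if self.transform_after is not None:
            record = self.transform_after(record)
            if record is None:
                return
        self.published.append(record)


storage: list = []


def drop_invalid(record: Record) -> Optional[Record]:
    # Filtering step: negative measurements are discarded.
    return record if record["pm10"] >= 0 else None


def to_publication_shape(record: Record) -> Record:
    # Reshape the stored record to the datastructure of the outbound API.
    return {"id": record["station"], "value": record["pm10"]}


flow = DataFlow(
    transform_before=drop_invalid,
    persist=storage.append,
    transform_after=to_publication_shape,
)
flow.ingest({"station": "A1", "pm10": 17.5})
flow.ingest({"station": "A2", "pm10": -1.0})  # filtered out, never stored

# A flow without any configured step simply passes data through.
passthrough = DataFlow()
passthrough.ingest({"station": "A3", "pm10": 4.0})
```

Each step is a plain function, so a step that is not configured is simply skipped rather than replaced by a no-op component.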

The platform is configured through models defined by the user. Components that cannot consume these models directly are configured by their specific configuration adapters, which consume the models on their behalf and configure the component. The models are transported within the platform via the message bus. Models can be defined by the user at runtime via the Portal Frontend in Datasets, Datasources, and Datastructures. Changes to models are sent to the Portal Backend. The Portal Backend propagates (user) events in the platform via the message bus. The Portal Backend also sends models to Eclipse Fennec Model Atlas to be converted into a common format and to be linked with other models. Eclipse Fennec Model Atlas uses Apicurio Registry as its persistence layer and handles versioning as well. Apicurio Registry itself persists the models in Apache Kafka for high availability.
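
The role of a configuration adapter can be sketched as follows, under several assumptions: the JSON message layout (`id`, `version`, `fields`), the generated component settings, and the rule of ignoring older versions are all hypothetical and serve only to show the idea of translating a generic model into component-specific configuration.

```python
import json
from typing import Any


class ConfigurationAdapter:
    """Consumes models on behalf of a component that cannot read them itself."""

    def __init__(self) -> None:
        self.component_config: dict[str, dict[str, Any]] = {}
        self._versions: dict[str, int] = {}

    def on_model(self, raw_message: bytes) -> bool:
        """Apply a model received from the bus; return True if the config changed."""
        model = json.loads(raw_message)
        model_id, version = model["id"], model["version"]
        # Assumption: replays and older versions are ignored.
        if version <= self._versions.get(model_id, 0):
            return False
        self._versions[model_id] = version
        # Translate the generic model into the component's own settings.
        self.component_config[model_id] = {
            "table": model_id.replace("-", "_"),
            "columns": [f["name"] for f in model["fields"]],
        }
        return True


adapter = ConfigurationAdapter()
v1 = json.dumps(
    {"id": "air-quality", "version": 1, "fields": [{"name": "station"}]}
).encode()
v2 = json.dumps(
    {"id": "air-quality", "version": 2,
     "fields": [{"name": "station"}, {"name": "pm10"}]}
).encode()

first = adapter.on_model(v1)
second = adapter.on_model(v2)
replayed = adapter.on_model(v1)  # older version arrives again and is ignored
```

Keeping the translation inside the adapter means the configured component itself needs no knowledge of the platform's model format.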

Authentication & Authorization Flow

This section describes where authentication and authorization are performed in the platform. The Portal Frontend authenticates the user using Keycloak (see Authentication Flow). Every user action performed in the Portal Frontend and sent to the Portal Backend is authenticated using a JWT issued by Keycloak. These user actions are authorized by the Portal Backend using the Policy Decision Point (PDP). The PDP implements the Authorization Model to determine the user’s permissions. Every API request for payload data is authenticated by Keycloak and authorized by the PDP.
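
The order of the two checks can be sketched as follows. The sketch does not use Keycloak or Open Policy Agent: the token lookup stands in for JWT validation, the role table stands in for the Authorization Model, and all names (`Principal`, `handle_request`, the role and permission values) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Principal:
    user: str
    roles: frozenset


# Stand-in for JWT validation against Keycloak: a lookup of known tokens.
VALID_TOKENS = {"token-alice": Principal("alice", frozenset({"dataset-editor"}))}

# Stand-in for the Authorization Model evaluated by the PDP.
ROLE_PERMISSIONS = {"dataset-editor": {("dataset", "read"), ("dataset", "update")}}


def authenticate(token: str) -> Optional[Principal]:
    """Return the caller's identity, or None if the token is not valid."""
    return VALID_TOKENS.get(token)


def pdp_allows(principal: Principal, resource: str, action: str) -> bool:
    """Allow the action if any of the principal's roles grants it."""
    return any(
        (resource, action) in ROLE_PERMISSIONS.get(role, set())
        for role in principal.roles
    )


def handle_request(token: str, resource: str, action: str) -> int:
    """Authenticate first, then authorize; return an HTTP-style status code."""
    principal = authenticate(token)
    if principal is None:
        return 401  # not authenticated
    if not pdp_allows(principal, resource, action):
        return 403  # authenticated, but not permitted
    return 200
```

With these sample tables, `handle_request("token-alice", "dataset", "update")` is allowed, a `"delete"` action by the same user is rejected by the PDP stand-in, and an unknown token fails before the PDP is consulted.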

Components & Capabilities

This section briefly describes the components of the platform and their responsibilities.

Platform Access

Apache APISIX (API Management)

Apache APISIX is an open-source API gateway for traffic management, security, and observability. In CIVITAS/CORE it is used as the centralized entry point to route and protect APIs (authentication, authorization, rate limiting, monitoring, etc.) and to expose datasource-specific APIs dynamically.
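
As a hedged illustration of a dynamically exposed, protected route, the following sketch builds a route definition as a plain dictionary. The field names (`uri`, `methods`, `plugins`, `upstream`, and the `limit-count` settings) follow the APISIX route object as commonly documented and should be verified against the APISIX Admin API reference; the URI layout, the upstream host, and the helper `dataset_route` are assumptions for this example and not part of CIVITAS/CORE.

```python
import json


def dataset_route(dataset_id: str, upstream_host: str, requests_per_minute: int) -> dict:
    """Build a route definition for one dataset (hypothetical URI layout)."""
    return {
        "name": f"dataset-{dataset_id}",
        "uri": f"/datasets/{dataset_id}/*",
        "methods": ["GET"],
        "plugins": {
            # Rate limiting per client address within a 60-second window.
            "limit-count": {
                "count": requests_per_minute,
                "time_window": 60,
                "rejected_code": 429,
                "key": "remote_addr",
            }
        },
        # Forward matching requests to the component serving the dataset.
        "upstream": {"type": "roundrobin", "nodes": {upstream_host: 1}},
    }


route = dataset_route("air-quality", "frost-server:8080", 120)
print(json.dumps(route, indent=2))
```

A configuration adapter could generate such a definition from a Dataset model and submit it to the gateway; how CIVITAS/CORE actually does this is not specified here.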

Dataset & Platform Management

Open Policy Agent (Policy Decision Point)

Auth Adapter (User Permission Retriever)

Keycloak (Identity Management)

Portal Frontend (Central User Interface)

Portal Backend (Platform Management Backend)

Eclipse Fennec Model Atlas (Model Transformation & Linkage)

Apicurio Registry (Model Versioning)

Data Flow Management

Apache Kafka (Message Bus)

Redpanda Connect (Data Flow Orchestration + Connectors)

CloudNativePG (Postgres DB Operator)

Geoserver (OGC API Broker)

FROST-Server (SensorThings API Broker)

FIWARE Stellio (NGSI-LD Broker)

Data Presentation

Grafana (Dashboard Engine)

Apache Superset (Dashboard Engine)

Masterportal (Map Client)

Data Catalog (Metadata Catalog)
