ADR 037: Use Redpanda Connect as the Default ETL Pipeline Runtime for Event-driven Data Ingestion and Transformation

Date: 2026-03-10

Status: Proposed

Decision Makers: CIVITAS/CORE Architecture Board

Context

CIVITAS/CORE v2 requires a runtime engine for ETL-style and event-driven data pipelines that can ingest data from heterogeneous sources, transform payloads, and deliver results reliably into the platform event backbone and downstream targets.

The selected runtime must fit the architectural direction of CIVITAS/CORE v2:

  • model-centric platform architecture
  • strong alignment with event-driven integration patterns
  • cloud-native and headless deployment in Kubernetes
  • operation as part of a modular platform rather than as a standalone data product
  • support for low-latency streaming use cases as well as practical transformation pipelines
  • operational simplicity for platform teams
  • extensibility through configuration rather than extensive custom code

Within the project, Redpanda Connect has already been explored through proofs of concept, UI work, demo stacks, test environments, and architecture work around concrete pipeline structures.

Redpanda Connect is a headless data streaming and transformation engine with a broad catalog of inputs, outputs, and processors. It supports Kafka-compatible inputs and outputs, declarative transformations using Bloblang and mapping processors, broker patterns such as fan-out, dynamic inputs and outputs via APIs, health endpoints, metrics, and tracing.

At the same time, CIVITAS/CORE has historically considered other ETL add-ons such as Apache Airflow and Node-RED. The decision therefore needs to compare Redpanda Connect with plausible alternatives rather than assume it is the only viable option.

Checked Architecture Principles

  • [partial] Model-centric data flow
  • [full] Distributed architecture with unified user experience
  • [full] Modular design
  • [full] Integration capability through defined interfaces
  • [partial] Open source as the default
  • [full] Cloud-native architecture
  • [full] Prefer standard solutions over custom development
  • [full] Self-contained deployment
  • [full] Technological consistency to ensure maintainability
  • [partial] Multi-tenancy
  • [partial] Security by design

Comments on partial ratings:

  • Model-centric data flow: The runtime itself is configuration-driven, not model-driven. Pipeline definitions in CIVITAS/CORE therefore need to remain derived from platform domain models and metadata instead of treating Redpanda Connect YAML as the source of truth.
  • Open source as the default: Redpanda Connect is split into a community edition and an enterprise edition. The default runtime choice largely satisfies the open-source preference, but some advanced capabilities are only available under a different licensing model than the community edition.
  • Multi-tenancy: Tenant separation is not provided by the runtime alone and must be enforced through deployment topology, namespaces, credentials, topics, quotas, and policies.
  • Security by design: The engine offers useful runtime features, but secure operation still depends on CIVITAS/CORE controls such as secret handling, network policies, RBAC, workload isolation, and a controlled connector catalog.
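
The multi-tenancy comment above can be illustrated with a minimal sketch of how a provisioning component might derive tenant-scoped names before any pipeline is deployed. The naming scheme, the TenantScope type, and both function names are hypothetical and are not part of CIVITAS/CORE or Redpanda Connect; the sketch only shows that separation is computed and enforced by the platform, not by the runtime.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantScope:
    """Hypothetical container for the names a tenant's pipelines may use."""
    namespace: str       # Kubernetes namespace for the tenant's workloads
    topic_prefix: str    # every topic the tenant touches must start with this
    consumer_group: str  # consumer group for one pipeline of this tenant


def tenant_scope(tenant_id: str, pipeline_id: str) -> TenantScope:
    # Normalize the tenant id to lowercase letters, digits, and hyphens.
    slug = re.sub(r"[^a-z0-9-]+", "-", tenant_id.lower()).strip("-")
    if not slug:
        raise ValueError("tenant_id must contain at least one alphanumeric character")
    return TenantScope(
        namespace=f"tenant-{slug}",
        topic_prefix=f"{slug}.",
        consumer_group=f"{slug}.{pipeline_id}",
    )


def assert_topic_allowed(scope: TenantScope, topic: str) -> None:
    # Reject any topic outside the tenant's prefix before a config is generated.
    if not topic.startswith(scope.topic_prefix):
        raise PermissionError(f"topic '{topic}' is outside prefix '{scope.topic_prefix}'")


scope = tenant_scope("City_A", "air-quality")
assert_topic_allowed(scope, "city-a.sensors.raw")
print(scope)
```

A check like this would complement, not replace, broker-side ACLs, quotas, and network policies.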

Decision

CIVITAS/CORE v2 adopts Redpanda Connect as the default runtime engine for ETL-style and event-driven pipelines.

This means:

  • Redpanda Connect is the preferred execution engine for ingestion, routing, mapping, filtering, enrichment, and sink delivery of streaming and near-real-time platform pipelines.
  • Pipeline definitions in CIVITAS/CORE remain platform-owned artifacts. The platform domain model, UI and editor, and model registry remain the authoritative layer; Redpanda Connect configurations are derived runtime artifacts.
  • Redpanda Connect is used primarily as a headless execution substrate, not as the user-facing modeling environment.
  • Kafka-compatible integration is treated as a first-class pattern because it aligns with CIVITAS/CORE's event backbone and Redpanda Connect's native strengths.
  • Other pipeline engines may still be supported in the future, but they are exceptions or extension points rather than the default execution path.
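
To make the "derived runtime artifact" idea concrete, the following is a minimal sketch of how a platform-owned pipeline model could be rendered into a Redpanda Connect configuration. PipelineModel and render_connect_config are hypothetical names, not existing CIVITAS/CORE code. The component and field names (kafka_franz, seed_brokers, mapping, broker with pattern fan_out) reflect the Redpanda Connect catalog as understood at the time of writing and should be verified against the version actually deployed. The config is emitted as JSON, which is a subset of YAML.

```python
import json
from dataclasses import dataclass


@dataclass
class PipelineModel:
    """Hypothetical, engine-agnostic description of one pipeline."""
    pipeline_id: str
    source_topic: str
    sink_topics: list[str]
    mapping: str = "root = this"  # Bloblang mapping; identity by default


def render_connect_config(model: PipelineModel, brokers: list[str]) -> dict:
    """Derive a Redpanda Connect config dict from the platform model."""
    if not model.sink_topics:
        raise ValueError("a pipeline needs at least one sink topic")

    def kafka_out(topic: str) -> dict:
        return {"kafka_franz": {"seed_brokers": brokers, "topic": topic}}

    if len(model.sink_topics) == 1:
        output = kafka_out(model.sink_topics[0])
    else:
        # Several sinks: deliver each message to all of them (fan-out).
        output = {
            "broker": {
                "pattern": "fan_out",
                "outputs": [kafka_out(t) for t in model.sink_topics],
            }
        }

    return {
        "input": {
            "kafka_franz": {
                "seed_brokers": brokers,
                "topics": [model.source_topic],
                "consumer_group": model.pipeline_id,
            }
        },
        "pipeline": {"processors": [{"mapping": model.mapping}]},
        "output": output,
    }


model = PipelineModel(
    pipeline_id="air-quality-ingest",
    source_topic="sensors.raw",
    sink_topics=["sensors.clean", "sensors.audit"],
    mapping='root = this\nroot.ingested_at = now()',
)
print(json.dumps(render_connect_config(model, ["redpanda:9092"]), indent=2))
```

Because the model carries no Redpanda Connect syntax apart from the Bloblang mapping string, a different renderer could target another engine, although mappings would then need translation.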

Consequences

  • The pipeline domain model and editor must stay independent from Redpanda Connect syntax and map platform concepts to runtime artifacts explicitly.
  • Provisioning components must generate, validate, deploy, update, and version Redpanda Connect configurations in Kubernetes.
  • Security controls such as secret injection, connector allowlists, egress restrictions, identity separation, and workload isolation become mandatory platform concerns.
  • Health checks, metrics, and tracing can be integrated into the central observability stack, which improves platform operations.
  • Redpanda Connect becomes the default for streaming and integration-oriented ETL, but not automatically for every orchestration problem. Long-running, human-in-the-loop, calendar-driven, or heavily scheduled workflows may justify a different tool.
  • The platform should keep its pipeline model sufficiently engine-agnostic so that a future runtime change remains possible.
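
As an illustration of the validation and connector-allowlist consequences, the sketch below checks a generated configuration against an allowlist before deployment. The allowlist contents and the function names are assumptions made for this example. The check is deliberately simplified: it inspects only top-level input, processor, and output types plus the children of a broker output, and it does not descend into processors attached to individual inputs or outputs.

```python
# Hypothetical allowlist; a real catalog would be managed by the platform.
ALLOWED = {
    "input": {"kafka_franz"},
    "processor": {"mapping"},
    "output": {"kafka_franz", "broker"},
}

# Keys that may sit next to the component type and are not types themselves.
NON_TYPE_KEYS = {"label", "processors"}


def component_type(component: dict) -> str:
    """Return the single type key of a component such as {'mapping': '...'}."""
    keys = [k for k in component if k not in NON_TYPE_KEYS]
    if len(keys) != 1:
        raise ValueError(f"expected exactly one component type, got {keys}")
    return keys[0]


def find_violations(config: dict, allowed: dict = ALLOWED) -> list[str]:
    violations: list[str] = []

    def check(kind: str, component: dict, path: str) -> None:
        ctype = component_type(component)
        if ctype not in allowed[kind]:
            violations.append(f"{path}: {kind} '{ctype}' is not allowlisted")
        # A broker output wraps further outputs, so check them recursively.
        if kind == "output" and ctype == "broker":
            for i, child in enumerate(component["broker"].get("outputs", [])):
                check("output", child, f"{path}.broker.outputs[{i}]")

    check("input", config["input"], "input")
    for i, proc in enumerate(config.get("pipeline", {}).get("processors", [])):
        check("processor", proc, f"pipeline.processors[{i}]")
    check("output", config["output"], "output")
    return violations


candidate = {
    "input": {"kafka_franz": {"seed_brokers": ["redpanda:9092"], "topics": ["sensors.raw"]}},
    "pipeline": {"processors": [{"mapping": "root = this"}]},
    "output": {
        "broker": {
            "pattern": "fan_out",
            "outputs": [
                {"kafka_franz": {"seed_brokers": ["redpanda:9092"], "topic": "sensors.clean"}},
                {"http_client": {"url": "https://example.invalid/hook"}},
            ],
        }
    },
}
for violation in find_violations(candidate):
    print(violation)
```

A provisioning component could refuse to deploy any configuration for which the returned list is non-empty, in addition to enforcing egress restrictions at the network level.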

Alternatives

  • Apache Airflow: Rejected as the default runtime. It is strongest for DAG-based orchestration and scheduled workflows, but is a less natural fit as the primary engine for low-latency event-stream integration.
  • Apache NiFi: Rejected as the default runtime. It is powerful and feature-rich, but introduces an additional control surface and a user-facing flow paradigm that overlaps with CIVITAS/CORE's own modeling ambitions.
  • Kafka Connect: Rejected as the default runtime. It is well suited for connector-based source and sink integration, but less expressive for richer transformations and flow composition. In addition, license requirements make it less attractive for CIVITAS/CORE.
  • Custom-built pipeline engine: Rejected because it would create avoidable maintenance burden and duplicate capabilities already available in mature tooling.

See also

  • ADR 036: Event driven Communication and Loose Coupling