Version: 2.0-beta

ADR 023: OPA backed by AuthZ database adapter as PDP

Date: 2026-01-28

Status: Accepted

Decision Makers: @cr0ssing

Context

CIVITAS/CORE 2.0 is an urban data platform that manages DataSpaces, DataSets, and associated metadata across multiple platform services. The platform requires an authorization architecture that:

Complies with BSI TR-03187 (AR-12: established security standards; AR-19: centralized authorization for all platform components)
Enforces a Relationship-Based Access Control (ReBAC) model: Users belong to Groups, Groups are assigned Roles with Scope (Tenant/DataSpace/DataSet), Roles carry Permissions
Integrates with an existing PostgreSQL authorization schema and Keycloak for authentication
Works with Apache APISIX as the already-selected API gateway
Supports multiple frontends (management portal, data visualization) and backend services
Provides fail-secure behavior and audit logging

The authorization data model is already defined in PostgreSQL and follows a multi-hop relationship chain: User → Groups → Assignments (scoped to Tenant, DataSpace, or DataSet) → Roles → Permissions. Binary assignments (system roles assigned to groups via group_roles) and ternary assignments (data/governance roles assigned to groups with a scope via assignments) coexist. Keycloak issues JWTs that identify the user; all authorization data -- group memberships, role assignments, and permissions -- lives in PostgreSQL and is queried at request time. The architecture treats PostgreSQL as the single source of truth.
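The relationship chain described above can be illustrated with a minimal Python sketch. It is not the adapter's implementation: the in-memory dictionaries stand in for PostgreSQL tables, only the names group_roles and assignments come from this ADR, and all column layouts, group names, role names, and the system.manage permission are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory stand-ins for the PostgreSQL tables (shapes are assumed).
USER_GROUPS = {"alice": {"analysts", "platform-admins"}}
GROUP_ROLES = {"platform-admins": {"system-admin"}}   # binary: group -> system role
ASSIGNMENTS = [                                        # ternary: group, role, scope
    ("analysts", "data-reader", "DATASPACE", "ds-1"),
]
ROLE_PERMISSIONS = {
    "system-admin": {"system.manage"},                 # hypothetical permission name
    "data-reader": {"data.read"},
}


@dataclass(frozen=True)
class Grant:
    permission: str
    scope_type: Optional[str]  # None for unscoped system roles
    scope_id: Optional[str]


def resolve_grants(user: str) -> set[Grant]:
    """Walk User -> Groups -> (group_roles | assignments) -> Roles -> Permissions."""
    grants: set[Grant] = set()
    groups = USER_GROUPS.get(user, set())

    # Binary assignments: system roles attached directly to a group.
    for group in groups:
        for role in GROUP_ROLES.get(group, set()):
            for permission in ROLE_PERMISSIONS.get(role, set()):
                grants.add(Grant(permission, None, None))

    # Ternary assignments: roles attached to a group together with a scope.
    for group, role, scope_type, scope_id in ASSIGNMENTS:
        if group in groups:
            for permission in ROLE_PERMISSIONS.get(role, set()):
                grants.add(Grant(permission, scope_type, scope_id))

    return grants
```

An unknown user resolves to an empty set of grants, which is the behavior a deny-by-default policy would rely on.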

Frontend applications are built with Next.js and use NextAuth.js as a Backend-for-Frontend (BFF). NextAuth handles OAuth2 session management (token acquisition, callbacks, refresh) via direct communication with Keycloak, and proxies backend API calls through APISIX where authorization is enforced.

Checked Architecture Principles

Rating	Principle
full	Model-centric data flow
full	Distributed architecture with unified user experience
full	Modular design
full	Integration capability through defined interfaces
full	Open source as the default
full	Cloud-native architecture
full	Prefer standard solutions over custom development
full	Self-contained deployment
full	Technological consistency to ensure maintainability
partial	Multi-tenancy
full	Security by design

Multi-tenancy (partial): The authorization schema supports a TENANT scope level in the hierarchy (TENANT > DATASPACE > DATASET), and Rego policies implement tenant-scoped authorization checks. However, full tenant isolation (separate Keycloak realms, tenant-scoped database rows, tenant routing) is not yet implemented. The architecture does not preclude multi-tenancy and the scope hierarchy was designed with it in mind.

Decision

We adopt a centralized, gateway-enforced authorization architecture using Open Policy Agent (OPA) as the Policy Decision Point (PDP) and Apache APISIX as the Policy Enforcement Point (PEP), with a Spring Boot adapter service bridging OPA to the existing PostgreSQL authorization schema.

Architecture

Request flow:

The browser sends requests to the NextAuth.js BFF, which proxies backend API calls through APISIX.
APISIX validates the JWT signature via Keycloak's JWKS endpoint (OpenID Connect plugin) and encodes the claims into an X-Userinfo header.
APISIX forwards the request context (path, method, headers) to OPA for an authorization decision.
OPA extracts the user identity from the JWT claims, extracts the resource type and ID from the request URI, and maps the HTTP method to an action (GET → read, POST/PUT → write, DELETE → delete). OPA then queries the adapter service for the user's group memberships, role assignments (both system roles via group_roles and scoped roles via assignments), and permissions. PostgreSQL is the single source of truth for all authorization data.
OPA evaluates the Rego policy: it checks assignments at the most specific scope first (DATASET if applicable), then DATASPACE, then TENANT. Permissions from all of the user's groups are combined via union (most permissive).
APISIX enforces the decision: forward to the backend on allow, return 403 on deny.
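The scope evaluation in the policy step can be sketched as follows. This is Python rather than Rego, chosen purely for illustration, and it rests on assumptions not stated in this ADR: that a grant at a broader scope also applies to the resources nested inside it, and that the parent lookup and the identifiers (set-9, ds-1, t-1, analysts, stewards) exist in this form.

```python
# Hypothetical resource hierarchy: DataSet -> DataSpace -> Tenant.
PARENT = {
    ("DATASET", "set-9"): ("DATASPACE", "ds-1"),
    ("DATASPACE", "ds-1"): ("TENANT", "t-1"),
}

# Hypothetical resolved data: (group, scope_type, scope_id) -> permissions.
GROUP_SCOPE_PERMISSIONS = {
    ("analysts", "DATASPACE", "ds-1"): {"data.read"},
    ("stewards", "TENANT", "t-1"): {"data.read", "data.write"},
}


def scope_chain(scope_type: str, scope_id: str) -> list[tuple[str, str]]:
    """Return the scopes to check, most specific first."""
    chain = [(scope_type, scope_id)]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def is_allowed(groups: set[str], permission: str, scope_type: str, scope_id: str) -> bool:
    """Allow if the union of all groups' permissions at any scope in the chain matches."""
    for scope in scope_chain(scope_type, scope_id):
        granted: set[str] = set()
        for group in groups:
            # Union across groups: the most permissive combination wins.
            granted |= GROUP_SCOPE_PERMISSIONS.get((group, *scope), set())
        if permission in granted:
            return True
    return False  # nothing matched: deny
```

Because permissions are combined by union and there are no deny rules in this sketch, the order of the scope checks affects only how early the function returns, not the outcome.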

Route-to-permission mapping:

APISIX sends raw HTTP context (path, method) to OPA. The authorization schema defines its own permission model (e.g., data.read, data.write, data.delete). The Rego policy is responsible for mapping between these two:

Parsing the request path to extract resource type (e.g., /api/dataspaces/123 → DATASPACE) and resource ID
Mapping HTTP methods to actions (GET → read, POST/PUT → write, DELETE → delete)
Translating actions to the permission names used in the database

This mapping logic lives in Rego code. If the API surface or permission model changes, the Rego policy must be updated accordingly.
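To make the three mapping steps concrete, here is a minimal Python sketch of the same logic. The actual mapping lives in Rego; this is only an illustration. The /api/dataspaces path, the method-to-action table, and the data.read/data.write/data.delete names come from this ADR, while the datasets path segment, the regular expression, and the function name are assumptions.

```python
import re
from typing import Optional

METHOD_TO_ACTION = {"GET": "read", "POST": "write", "PUT": "write", "DELETE": "delete"}

# Path segment -> resource type. Only "dataspaces" appears in the ADR;
# "datasets" is an assumed counterpart.
SEGMENT_TO_RESOURCE = {"dataspaces": "DATASPACE", "datasets": "DATASET"}

PATH_PATTERN = re.compile(r"^/api/(?P<segment>[a-z]+)/(?P<id>[^/]+)/?$")


def required_permission(method: str, path: str) -> Optional[tuple[str, str, str]]:
    """Map (method, path) to (permission, resource_type, resource_id).

    Returns None when the request cannot be mapped, which a caller
    should treat as a deny.
    """
    action = METHOD_TO_ACTION.get(method.upper())
    match = PATH_PATTERN.match(path)
    if action is None or match is None:
        return None
    resource_type = SEGMENT_TO_RESOURCE.get(match.group("segment"))
    if resource_type is None:
        return None
    return (f"data.{action}", resource_type, match.group("id"))
```

For example, required_permission("GET", "/api/dataspaces/123") yields ("data.read", "DATASPACE", "123"), while an unmapped method such as PATCH yields None.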

This approach assumes that (backend route, HTTP verb) pairs map cleanly to authorization permissions -- a GET on a resource means "read", a DELETE means "delete", etc. This holds for straightforward CRUD APIs but may not generalize to all backends. APIs with complex operations (e.g., a POST that triggers a workflow spanning multiple resources, or a single endpoint with action parameters in the body) would require more sophisticated mapping logic or a different authorization pattern (e.g., explicit permission checks in application code, which would conflict with AR-19).

Technology Selection: Why OPA

Selection criteria

The PDP must satisfy:

AR-12 (established standards): Industry-standard, proven technology -- no custom authorization logic
AR-19 (centralized authorization): Single service decides all authorization; no policy code in backends
APISIX integration: Native plugin or straightforward HTTP integration
PostgreSQL compatibility: Query existing schema without requiring data migration or dual-store sync
Production track record: Proven at scale in comparable environments

Why Open Policy Agent

OPA is a CNCF graduated project (the highest maturity level), used in production by Netflix, Pinterest, Cloudflare, and Goldman Sachs. It satisfies all selection criteria:

AR-12: Industry standard with active security review process
AR-19: Standalone service; application code contains zero authorization logic
APISIX: Ships with a native OPA plugin (opa)
PostgreSQL: Rego policies make HTTP calls to the adapter service, which queries PostgreSQL -- no data duplication
Track record: Battle-tested at scale; extensive documentation and community

Additional benefits:

Declarative policies: Rego separates policy from application code; policies are versioned and tested independently
Fail-secure: default allow := false yields a deny whenever no allow rule evaluates to true, including when required authorization data is missing
Observability: Supports decision logging (requires configuration and a log receiver), Prometheus metrics, distributed tracing
Flexibility: Can evolve to ABAC, time-based access, or additional enforcement points without architectural changes
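The fail-secure property listed above can be illustrated with a small Python sketch. It mirrors the intent of a deny-by-default rule and does not describe how OPA evaluates policies internally; the decide helper and its check callback are hypothetical.

```python
from typing import Callable


def decide(check: Callable[[], object]) -> bool:
    """Return True only if the check completes and explicitly returns True.

    Any exception (for example, an unreachable adapter) and any
    non-True result are treated as a deny.
    """
    try:
        return check() is True
    except Exception:
        return False
```

The point of the pattern is that allow is the only outcome that has to be earned: errors, timeouts, and missing data all collapse into the same deny.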

Alternatives considered

Alternative	Strengths	Why discarded
Keycloak Authorization Services	Already using Keycloak for authentication; built-in UI for policy management; supports resources, scopes, and policies	Data model mismatch: Keycloak's authorization model is resource/scope-based, not relationship-based. The existing PostgreSQL schema implements ReBAC with multi-hop relationships (User → Group → Assignment → Role → Permission) and a scope hierarchy (TENANT/DATASPACE/DATASET) that doesn't map cleanly to Keycloak's concepts. Would require either abandoning the existing schema or complex bidirectional sync. Keycloak becomes a single point of failure for both authn and authz.
Cerbos	YAML policies (easier than Rego), native PostgreSQL adapter	No APISIX plugin (requires custom integration), smaller community, less proven at scale. Custom gateway work negates YAML simplicity.
OpenFGA	Purpose-built for ReBAC, graph-based queries	Requires syncing all authorization data from PostgreSQL into OpenFGA's store, introducing eventual consistency risks and dual-store operational burden. No APISIX plugin.
Ory Keto	ReBAC-focused, part of Ory ecosystem	Same dual-store sync issues as OpenFGA. No APISIX plugin. Smaller community than OPA.
Casbin	Supports multiple models (ACL, RBAC, ABAC), PostgreSQL adapter, lightweight	Library, not a service -- must be embedded in each backend, violating AR-19 (centralized authorization). No APISIX plugin.
Custom Spring Boot service	Direct PostgreSQL access, familiar technology	Violates AR-12 (not an established standard). Requires custom APISIX plugin. Puts custom security code on the critical path with no built-in policy testing or audit logging.

Consequences

Affected components

APISIX: All routes must include the OPA plugin configuration. New services added to the platform get authorization enforcement by adding the plugin to their routes.
NextAuth.js BFF: Proxies backend API calls through APISIX. NextAuth's own /api/auth/* routes (sign-in, callback, session, sign-out) are handled locally and communicate directly with Keycloak.
Backend services (CIVITAS Portal, Protected Backend, future services): Must NOT implement authorization logic. Services rely on APISIX having enforced authorization before the request arrives. Services only configure Spring Security for JWT authentication (defense-in-depth).
Adapter Service: New authorization data requirements (e.g., new scope types, new entity relationships) require adding REST endpoints here and corresponding Rego queries in OPA.
OPA policies: The Rego policy files are the single source of truth for all authorization rules; the authorization data itself remains in PostgreSQL. Changes to access control semantics (e.g., adding DENY permissions, enabling group hierarchy inheritance) are confined to the policy files.
Operations: OPA and the adapter service are infrastructure components that must be monitored and kept available. OPA supports decision logging for audit trails (requires configuration and a log receiver).

Positive effects

Single place to audit all authorization rules (Rego policies)
Adding new services to the platform requires only APISIX route configuration, not authorization code
Policy changes can be deployed independently of application releases
OPA's test framework provides regression safety for authorization logic
OPA supports decision logging for audit trails (requires configuration and a log receiver)
Production-proven technology reduces risk of security vulnerabilities

Negative effects

Authorization checks add network latency (APISIX → OPA → Adapter → PostgreSQL); mitigated by co-locating services and caching (Caffeine in the adapter service, materialized views in PostgreSQL; Redis or OPA bundles for more advanced scenarios)
Team must learn Rego; mitigated by documentation and an existing policy codebase as a reference
Adapter service is an additional component to maintain; mitigated by its narrow scope (read-only queries against the authorization schema)
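The caching mitigation mentioned above can be sketched as a small time-to-live cache. The adapter is described as using Caffeine (a Java library); the Python class below is only an analogue of expire-after-write caching with explicit invalidation, and the class name, method names, and injectable clock are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TtlCache:
    """Cache loader results per key for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                      # still fresh: skip the database
        value = loader(key)                      # expired or missing: reload
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop a key immediately, e.g. after a role assignment changes."""
        self._entries.pop(key, None)
```

A short TTL bounds how long a revoked permission can still be honored, while invalidate supports the event-driven invalidation listed under risks; the right TTL is a trade-off between latency and staleness that this sketch does not decide.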

Risks and mitigations

Risk	Impact	Likelihood	Mitigation
Rego policy bugs cause incorrect authorization decisions	High	Medium	OPA test framework for policy unit tests; peer review; staged rollouts
Adapter service becomes a performance bottleneck	Medium	Medium	Caffeine caching in adapter; materialized views in PostgreSQL; monitor latency; scale horizontally
OPA policy complexity grows unmanageable	Medium	Low	Modular policy structure; documentation; policy linting
Network failures between components	High	Low	Retry logic; health checks; circuit breakers; fail-secure default
Cache invalidation issues cause stale decisions	High	Medium	Event-driven invalidation; short TTLs; monitoring
