ADR 023: OPA backed by AuthZ database adapter as PDP

Date: 2026-01-28

Status: Accepted

Decision Makers: @cr0ssing

Context

CIVITAS/CORE 2.0 is an urban data platform that manages DataSpaces, DataSets, and associated metadata across multiple platform services. The platform requires an authorization architecture that:

  • Complies with BSI TR-03187 (AR-12: established security standards; AR-19: centralized authorization for all platform components)
  • Enforces a Relationship-Based Access Control (ReBAC) model: Users belong to Groups, Groups are assigned Roles with Scope (Tenant/DataSpace/DataSet), Roles carry Permissions
  • Integrates with an existing PostgreSQL authorization schema and Keycloak for authentication
  • Works with Apache APISIX as the already-selected API gateway
  • Supports multiple frontends (management portal, data visualization) and backend services
  • Provides fail-secure behavior and audit logging

The authorization data model is already defined in PostgreSQL and follows a multi-hop relationship chain: User → Groups → Assignments (scoped to Tenant, DataSpace, or DataSet) → Roles → Permissions. Two kinds of assignment coexist: binary assignments (system roles assigned to groups via group_roles) and ternary assignments (data/governance roles assigned to groups with a scope via assignments). Keycloak issues JWTs that identify the user; all authorization data (group memberships, role assignments, and permissions) lives in PostgreSQL and is queried at request time. The architecture treats PostgreSQL as the single source of truth for this data.
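
To make the relationship chain concrete, the following minimal sketch resolves permissions over in-memory stand-ins for the tables. It is illustrative only: the class and function names, the role names, and the `system.manage` permission are hypothetical, and the real schema is not shown in this ADR.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assignment:
    """Ternary assignment: a group holds a role at one scope (hypothetical shape)."""
    group: str
    role: str
    scope_type: str  # "TENANT", "DATASPACE", or "DATASET"
    scope_id: str


@dataclass
class AuthzData:
    """In-memory stand-in for the PostgreSQL authorization tables."""
    user_groups: dict = field(default_factory=dict)       # user -> set of groups
    group_roles: dict = field(default_factory=dict)       # group -> set of system roles (binary)
    assignments: list = field(default_factory=list)       # list of Assignment (ternary)
    role_permissions: dict = field(default_factory=dict)  # role -> set of permissions


def system_permissions(data, user):
    """Follow User -> Groups -> group_roles -> Roles -> Permissions (no scope)."""
    perms = set()
    for group in data.user_groups.get(user, set()):
        for role in data.group_roles.get(group, set()):
            perms |= data.role_permissions.get(role, set())
    return perms


def scoped_permissions(data, user, scope_type, scope_id):
    """Follow User -> Groups -> assignments -> Roles -> Permissions at one scope."""
    groups = data.user_groups.get(user, set())
    perms = set()
    for a in data.assignments:
        if a.group in groups and (a.scope_type, a.scope_id) == (scope_type, scope_id):
            perms |= data.role_permissions.get(a.role, set())
    return perms
```

With a user in an `editors` group that holds a `data_editor` role on DataSpace `123`, `scoped_permissions(data, user, "DATASPACE", "123")` would return that role's permissions, while the same call for an unrelated scope returns an empty set.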

Frontend applications are built with Next.js and use NextAuth.js as a Backend-for-Frontend (BFF). NextAuth handles OAuth2 session management (token acquisition, callbacks, refresh) via direct communication with Keycloak, and proxies backend API calls through APISIX where authorization is enforced.

Checked Architecture Principles

Rating | Principle
--- | ---
full | Model-centric data flow
full | Distributed architecture with unified user experience
full | Modular design
full | Integration capability through defined interfaces
full | Open source as the default
full | Cloud-native architecture
full | Prefer standard solutions over custom development
full | Self-contained deployment
full | Technological consistency to ensure maintainability
partial | Multi-tenancy
full | Security by design

Multi-tenancy (partial): The authorization schema supports a TENANT scope level in the hierarchy (TENANT > DATASPACE > DATASET), and Rego policies implement tenant-scoped authorization checks. However, full tenant isolation (separate Keycloak realms, tenant-scoped database rows, tenant routing) is not yet implemented. The architecture does not preclude multi-tenancy and the scope hierarchy was designed with it in mind.

Decision

We adopt a centralized, gateway-enforced authorization architecture using Open Policy Agent (OPA) as the Policy Decision Point (PDP) and Apache APISIX as the Policy Enforcement Point (PEP), with a Spring Boot adapter service bridging OPA to the existing PostgreSQL authorization schema.

Architecture

[Figure: architecture diagram (adr-002-architecture.svg)]

Request flow:

  1. The browser sends requests to the NextAuth.js BFF, which proxies backend API calls through APISIX.
  2. APISIX validates the JWT signature via Keycloak's JWKS endpoint (OpenID Connect plugin) and encodes the claims into an X-Userinfo header.
  3. APISIX forwards the request context (path, method, headers) to OPA for an authorization decision.
  4. OPA extracts the user identity from the JWT claims, extracts the resource type and ID from the request URI, and maps the HTTP method to an action (GET → read, POST/PUT → write, DELETE → delete). OPA then queries the adapter service for the user's group memberships, role assignments (both system roles via group_roles and scoped roles via assignments), and permissions. The adapter reads these from PostgreSQL, the single source of truth for all authorization data.
  5. OPA evaluates the Rego policy: it checks assignments at the most specific scope first (DATASET if applicable), then DATASPACE, then TENANT. Permissions from all of the user's groups are combined via union (most permissive).
  6. APISIX enforces the decision: forward to the backend on allow, return 403 on deny.
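
The decision in steps 4 to 6 can be sketched as follows. This is a Python illustration of logic that lives in Rego, under several assumptions that the ADR does not state: that the X-Userinfo header carries base64-encoded JSON claims, that the `sub` claim identifies the user, and that a hypothetical `lookup` callable stands in for the adapter query and returns one permission set per group. It also reads "most specific scope first" as a short-circuit order over the union of group permissions.

```python
import base64
import json

SCOPE_ORDER = ("DATASET", "DATASPACE", "TENANT")  # most specific first


def decode_userinfo(header_value):
    """Decode claims, assuming the header is base64-encoded JSON."""
    padded = header_value + "=" * (-len(header_value) % 4)  # restore stripped padding
    return json.loads(base64.b64decode(padded))


def decide(userinfo_header, required_permission, scope_chain, lookup):
    """Return True to allow, False to deny (the gateway then answers 403).

    scope_chain maps scope type to the ID that applies to the requested
    resource, e.g. {"DATASET": "42", "DATASPACE": "7", "TENANT": "t1"}.
    lookup(user, scope_type, scope_id) is a hypothetical adapter call that
    returns an iterable of permission sets, one per group of the user.
    """
    user = decode_userinfo(userinfo_header)["sub"]  # assumed identity claim
    for scope_type in SCOPE_ORDER:
        scope_id = scope_chain.get(scope_type)
        if scope_id is None:
            continue  # this scope level does not apply to the resource
        granted = set().union(*lookup(user, scope_type, scope_id))  # union across groups
        if required_permission in granted:
            return True
    return False  # nothing granted the permission: deny
```

Because the group permissions are combined by union, the order of the scope checks affects only how early the loop can stop, not the outcome.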

Route-to-permission mapping:

APISIX sends raw HTTP context (path, method) to OPA. The authorization schema defines its own permission model (e.g., data.read, data.write, data.delete). The Rego policy is responsible for mapping between these two:

  • Parsing the request path to extract resource type (e.g., /api/dataspaces/123 → DATASPACE) and resource ID
  • Mapping HTTP methods to actions (GET → read, POST/PUT → write, DELETE → delete)
  • Translating actions to the permission names used in the database

This mapping logic lives in Rego code. If the API surface or permission model changes, the Rego policy must be updated accordingly.
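
The mapping described above can be sketched as below, in Python rather than Rego for illustration. The route table is an assumption: only /api/dataspaces/<id> appears in this ADR, so the `datasets` entry and the `data.` permission prefix for every resource type are hypothetical.

```python
import re

# Hypothetical route table; only the dataspaces route is named in the ADR.
RESOURCE_TYPES = {"dataspaces": "DATASPACE", "datasets": "DATASET"}
METHOD_ACTIONS = {"GET": "read", "POST": "write", "PUT": "write", "DELETE": "delete"}
PATH_PATTERN = re.compile(r"^/api/(?P<collection>[a-z]+)/(?P<id>[^/]+)$")


def required_permission(method, path):
    """Return (permission, resource_type, resource_id), or None when unmapped."""
    match = PATH_PATTERN.match(path)
    action = METHOD_ACTIONS.get(method.upper())
    if match is None or action is None:
        return None  # unknown path shape or HTTP method
    resource_type = RESOURCE_TYPES.get(match.group("collection"))
    if resource_type is None:
        return None  # unknown resource collection
    return (f"data.{action}", resource_type, match.group("id"))
```

Returning None for anything the table does not cover lets the caller treat unmapped requests as denied, which keeps the mapping consistent with a default-deny policy.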

This approach assumes that each (backend route, HTTP verb) pair maps cleanly to one authorization permission: a GET on a resource means "read", a DELETE means "delete", and so on. This holds for straightforward CRUD APIs but may not generalize to all backends. APIs with complex operations (e.g., a POST that triggers a workflow spanning multiple resources, or a single endpoint with action parameters in the body) would require more sophisticated mapping logic or a different authorization pattern, such as explicit permission checks in application code, which would conflict with AR-19.

Technology Selection: Why OPA

Selection criteria

The PDP must satisfy:

  1. AR-12 (established standards): Industry-standard, proven technology -- no custom authorization logic
  2. AR-19 (centralized authorization): Single service decides all authorization; no policy code in backends
  3. APISIX integration: Native plugin or straightforward HTTP integration
  4. PostgreSQL compatibility: Query existing schema without requiring data migration or dual-store sync
  5. Production track record: Proven at scale in comparable environments

Why Open Policy Agent

OPA is a CNCF graduated project (the highest maturity level), used in production by Netflix, Pinterest, Cloudflare, and Goldman Sachs. It satisfies all selection criteria:

  • AR-12: Industry standard with active security review process
  • AR-19: Standalone service; application code contains zero authorization logic
  • APISIX: Ships with a native OPA plugin (opa)
  • PostgreSQL: Rego policies make HTTP calls to the adapter service, which queries PostgreSQL -- no data duplication
  • Track record: Battle-tested at scale; extensive documentation and community

Additional benefits:

  • Declarative policies: Rego separates policy from application code; policies are versioned and tested independently
  • Fail-secure: the rule default allow := false denies every request that no rule explicitly allows, so a missing or undefined result never grants access
  • Observability: Supports decision logging (requires configuration and a log receiver), Prometheus metrics, distributed tracing
  • Flexibility: Can evolve to ABAC, time-based access, or additional enforcement points without architectural changes
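
The fail-secure property can be pictured with a small Python analogy. It is not how OPA implements defaults; `fail_secure` is a hypothetical helper that mirrors the effect of default allow := false by turning every failure or non-boolean outcome into a denial.

```python
def fail_secure(decision_fn):
    """Wrap a decision function so that any failure yields a denial."""
    def wrapped(*args, **kwargs):
        try:
            result = decision_fn(*args, **kwargs)
        except Exception:
            return False  # lookup or evaluation failed: deny
        return result is True  # only an explicit True allows
    return wrapped
```

Only an explicit `True` passes through; exceptions, `None`, and truthy values such as a non-empty string all result in a denial.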

Alternatives considered

Alternative | Strengths | Why discarded
--- | --- | ---
Keycloak Authorization Services | Already using Keycloak for authentication; built-in UI for policy management; supports resources, scopes, and policies | Data model mismatch: Keycloak's authorization model is resource/scope-based, not relationship-based. The existing PostgreSQL schema implements ReBAC with multi-hop relationships (User → Group → Assignment → Role → Permission) and a scope hierarchy (TENANT/DATASPACE/DATASET) that doesn't map cleanly to Keycloak's concepts. Would require either abandoning the existing schema or complex bidirectional sync. Keycloak becomes a single point of failure for both authn and authz.
Cerbos | YAML policies (easier than Rego), native PostgreSQL adapter | No APISIX plugin (requires custom integration), smaller community, less proven at scale. Custom gateway work negates YAML simplicity.
OpenFGA | Purpose-built for ReBAC, graph-based queries | Requires syncing all authorization data from PostgreSQL into OpenFGA's store, introducing eventual consistency risks and dual-store operational burden. No APISIX plugin.
Ory Keto | ReBAC-focused, part of Ory ecosystem | Same dual-store sync issues as OpenFGA. No APISIX plugin. Smaller community than OPA.
Casbin | Supports multiple models (ACL, RBAC, ABAC), PostgreSQL adapter, lightweight | Library, not a service -- must be embedded in each backend, violating AR-19 (centralized authorization). No APISIX plugin.
Custom Spring Boot service | Direct PostgreSQL access, familiar technology | Violates AR-12 (not an established standard). Requires custom APISIX plugin. Puts custom security code on the critical path with no built-in policy testing or audit logging.

Consequences

Affected components

  • APISIX: All routes must include the OPA plugin configuration. New services added to the platform get authorization enforcement by adding the plugin to their routes.
  • NextAuth.js BFF: Proxies backend API calls through APISIX. NextAuth's own /api/auth/* routes (sign-in, callback, session, sign-out) are handled locally and communicate directly with Keycloak.
  • Backend services (CIVITAS Portal, Protected Backend, future services): Must NOT implement authorization logic. Services rely on APISIX having enforced authorization before the request arrives. Services only configure Spring Security for JWT authentication (defense-in-depth).
  • Adapter Service: New authorization data requirements (e.g., new scope types, new entity relationships) require adding REST endpoints here and corresponding Rego queries in OPA.
  • OPA policies: The Rego policy files are the single source of truth for all authorization rules. Changes to access control semantics (e.g., adding DENY permissions, enabling group hierarchy inheritance) are isolated to policy.
  • Operations: OPA and the adapter service are infrastructure components that must be monitored and kept available. OPA supports decision logging for audit trails (requires configuration).

Positive effects

  • Single place to audit all authorization rules (Rego policies)
  • Adding new services to the platform requires only APISIX route configuration, not authorization code
  • Policy changes can be deployed independently of application releases
  • OPA's test framework provides regression safety for authorization logic
  • OPA supports decision logging for audit trails (requires configuration and a log receiver)
  • Production-proven technology reduces risk of security vulnerabilities

Negative effects

  • Authorization checks add network latency (APISIX → OPA → Adapter → PostgreSQL); mitigated by co-locating services and caching (Caffeine in the adapter service, materialized views in PostgreSQL; Redis or OPA bundles for more advanced scenarios)
  • Team must learn Rego; mitigated by documentation and an existing policy codebase as a reference
  • Adapter service is an additional component to maintain; mitigated by its narrow scope (read-only queries against the authorization schema)
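
The caching mitigation trades latency against staleness. The sketch below shows that trade-off with a hypothetical `TtlCache` class in Python; the adapter itself would use Caffeine, whose configuration is not shown in this ADR. Entries expire after a short TTL and can also be dropped explicitly when the underlying data changes.

```python
import time


class TtlCache:
    """Minimal time-bounded cache with explicit invalidation (illustrative)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (expires_at, value)

    def get(self, key, loader):
        """Return the cached value, calling loader(key) on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop one entry, e.g. when a change event arrives for that key."""
        self._entries.pop(key, None)
```

A shorter TTL bounds how long a revoked permission can still be honored, at the cost of more queries against PostgreSQL; calling `invalidate` on change events narrows that window without shortening the TTL.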

Risks and mitigations

Risk | Impact | Likelihood | Mitigation
--- | --- | --- | ---
Rego policy bugs cause incorrect authorization decisions | High | Medium | OPA test framework for policy unit tests; peer review; staged rollouts
Adapter service becomes a performance bottleneck | Medium | Medium | Caffeine caching in adapter; materialized views in PostgreSQL; monitor latency; scale horizontally
OPA policy complexity grows unmanageable | Medium | Low | Modular policy structure; documentation; policy linting
Network failures between components | High | Low | Retry logic; health checks; circuit breakers; fail-secure default
Cache invalidation issues cause stale decisions | High | Medium | Event-driven invalidation; short TTLs; monitoring

See also