ADR 023: OPA backed by AuthZ database adapter as PDP
Date: 2026-01-28
Status: Accepted
Decision Makers: @cr0ssing
Context
CIVITAS/CORE 2.0 is an urban data platform that manages DataSpaces, DataSets, and associated metadata across multiple platform services. The platform requires an authorization architecture that:
- Complies with BSI TR-03187 (AR-12: established security standards; AR-19: centralized authorization for all platform components)
- Enforces a Relationship-Based Access Control (ReBAC) model: Users belong to Groups, Groups are assigned Roles with Scope (Tenant/DataSpace/DataSet), Roles carry Permissions
- Integrates with an existing PostgreSQL authorization schema and Keycloak for authentication
- Works with Apache APISIX as the already-selected API gateway
- Supports multiple frontends (management portal, data visualization) and backend services
- Provides fail-secure behavior and audit logging
The authorization data model is already defined in PostgreSQL and follows a multi-hop relationship chain: User → Groups → Assignments (scoped to Tenant, DataSpace, or DataSet) → Roles → Permissions. Two kinds of assignment coexist: binary assignments (system roles assigned to groups via group_roles) and ternary assignments (data/governance roles assigned to groups with a scope via assignments). Keycloak issues JWTs that identify the user; all authorization data (group memberships, role assignments, and permissions) lives in PostgreSQL and is queried at request time. The architecture treats PostgreSQL as the single source of truth for this data.
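To make the relationship chain concrete, the following Python sketch walks it over in-memory stand-ins for the tables. It is an illustration only: the class, the dictionary names, the sample users and groups, and the `system.manage` permission are assumptions, not the actual schema.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-ins for the PostgreSQL tables (illustrative names).
@dataclass(frozen=True)
class Assignment:
    group: str
    role: str
    scope_type: str  # "TENANT", "DATASPACE", or "DATASET"
    scope_id: str

USER_GROUPS = {"alice": {"editors", "ops"}}
GROUP_ROLES = {"ops": {"system_admin"}}  # binary: group -> system role
ASSIGNMENTS = [Assignment("editors", "data_editor", "DATASPACE", "123")]  # ternary
ROLE_PERMISSIONS = {
    "system_admin": {"system.manage"},  # assumed permission name
    "data_editor": {"data.read", "data.write"},
}

def system_permissions(user):
    """User -> Groups -> group_roles -> Roles -> Permissions (no scope)."""
    perms = set()
    for group in USER_GROUPS.get(user, set()):
        for role in GROUP_ROLES.get(group, set()):
            perms |= ROLE_PERMISSIONS.get(role, set())
    return perms

def scoped_permissions(user, scope_type, scope_id):
    """User -> Groups -> Assignments (matching scope) -> Roles -> Permissions."""
    groups = USER_GROUPS.get(user, set())
    perms = set()
    for a in ASSIGNMENTS:
        if a.group in groups and (a.scope_type, a.scope_id) == (scope_type, scope_id):
            perms |= ROLE_PERMISSIONS.get(a.role, set())
    return perms
```

In this sketch a user with no matching group or assignment simply resolves to an empty permission set, which is the input a deny-by-default policy expects.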
Frontend applications are built with Next.js and use NextAuth.js as a Backend-for-Frontend (BFF). NextAuth handles OAuth2 session management (token acquisition, callbacks, refresh) via direct communication with Keycloak, and proxies backend API calls through APISIX where authorization is enforced.
Checked Architecture Principles
| Rating | Principle |
|---|---|
| full | Model-centric data flow |
| full | Distributed architecture with unified user experience |
| full | Modular design |
| full | Integration capability through defined interfaces |
| full | Open source as the default |
| full | Cloud-native architecture |
| full | Prefer standard solutions over custom development |
| full | Self-contained deployment |
| full | Technological consistency to ensure maintainability |
| partial | Multi-tenancy |
| full | Security by design |
Multi-tenancy (partial): The authorization schema supports a TENANT scope level in the hierarchy (TENANT > DATASPACE > DATASET), and Rego policies implement tenant-scoped authorization checks. However, full tenant isolation (separate Keycloak realms, tenant-scoped database rows, tenant routing) is not yet implemented. The architecture does not preclude multi-tenancy and the scope hierarchy was designed with it in mind.
Decision
We adopt a centralized, gateway-enforced authorization architecture using Open Policy Agent (OPA) as the Policy Decision Point (PDP) and Apache APISIX as the Policy Enforcement Point (PEP), with a Spring Boot adapter service bridging OPA to the existing PostgreSQL authorization schema.
Architecture
Request flow:
- The browser sends requests to the NextAuth.js BFF, which proxies backend API calls through APISIX.
- APISIX validates the JWT signature via Keycloak's JWKS endpoint (OpenID Connect plugin) and encodes the claims into an `X-Userinfo` header.
- APISIX forwards the request context (path, method, headers) to OPA for an authorization decision.
- OPA extracts the user identity from the JWT claims, extracts the resource type and ID from the request URI, and maps the HTTP method to an action (GET → read, POST/PUT → write, DELETE → delete). OPA then queries the adapter service for the user's group memberships, role assignments (both system roles via `group_roles` and scoped roles via `assignments`), and permissions. PostgreSQL is the single source of truth for all authorization data.
- OPA evaluates the Rego policy: it checks assignments at the most specific scope first (DATASET if applicable), then DATASPACE, then TENANT. Permissions from all of the user's groups are combined via union (most permissive).
- APISIX enforces the decision: forward to the backend on allow, return 403 on deny.
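The scope ordering and the union across groups can be sketched as follows. This is a Python illustration, not the Rego policy itself: the `decide` function, the shape of `GRANTS`, and the group names and IDs are hypothetical. The sketch also assumes that `GRANTS` is already filtered to the requesting user's groups and that the first scope level granting the permission decides the request.

```python
# Hypothetical data shape: permissions granted per scope, keyed by group.
# In the real system this data would come from the adapter service.
GRANTS = {
    ("DATASPACE", "123"): {
        "editors": {"data.read", "data.write"},
        "viewers": {"data.read"},
    },
    ("TENANT", "t1"): {"admins": {"data.delete"}},
}

def decide(permission, scope_chain, grants):
    """Return (allowed, deciding_scope).

    scope_chain lists (scope_type, scope_id) pairs, most specific first,
    e.g. DATASET, then DATASPACE, then TENANT.
    """
    for scope in scope_chain:
        by_group = grants.get(scope, {})
        # Union across all of the user's groups: the most permissive result wins.
        combined = set().union(*by_group.values())
        if permission in combined:
            return True, scope
    # No scope level granted the permission: deny.
    return False, None

# Scope chain for a request on a dataset inside dataspace 123 of tenant t1.
CHAIN = [("DATASET", "7"), ("DATASPACE", "123"), ("TENANT", "t1")]
```

Because permissions are only ever added by union, the check order affects which scope is reported as deciding, not whether access is granted. That would change if DENY permissions were introduced.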
Route-to-permission mapping:
APISIX sends raw HTTP context (path, method) to OPA. The authorization schema defines its own permission model (e.g., data.read, data.write, data.delete). The Rego policy is responsible for mapping between these two:
- Parsing the request path to extract resource type (e.g., `/api/dataspaces/123` → DATASPACE) and resource ID
- Mapping HTTP methods to actions (GET → read, POST/PUT → write, DELETE → delete)
- Translating actions to the permission names used in the database
This mapping logic lives in Rego code. If the API surface or permission model changes, the Rego policy must be updated accordingly.
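As a rough illustration of this mapping (written in Python rather than Rego), the sketch below turns a method and path into the permission that would have to be held. Only the `/api/dataspaces/<id>` route comes from the example above; the `/api/datasets/<id>` route, the `required_permission` function, and the `data.<action>` naming rule are assumptions.

```python
import re

# Assumed route layout; the datasets route is hypothetical.
ROUTES = [
    (re.compile(r"^/api/dataspaces/(?P<id>[^/]+)$"), "DATASPACE"),
    (re.compile(r"^/api/datasets/(?P<id>[^/]+)$"), "DATASET"),
]

# HTTP method -> action, as described in the text.
METHOD_ACTIONS = {"GET": "read", "POST": "write", "PUT": "write", "DELETE": "delete"}

def required_permission(method, path):
    """Return the resource and permission for a request, or None if unmapped.

    A None result should be treated as a denial by the caller.
    """
    action = METHOD_ACTIONS.get(method.upper())
    if action is None:
        return None
    for pattern, resource_type in ROUTES:
        match = pattern.match(path)
        if match:
            return {
                "resource_type": resource_type,
                "resource_id": match.group("id"),
                "permission": f"data.{action}",  # assumed naming rule
            }
    return None
```

Unknown methods and unknown paths both yield `None`, so anything outside the mapped API surface is denied by default.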
This approach assumes that each (backend route, HTTP verb) pair maps cleanly to an authorization permission: a GET on a resource means "read", a DELETE means "delete", and so on. This holds for straightforward CRUD APIs but may not generalize to all backends. APIs with complex operations (e.g., a POST that triggers a workflow spanning multiple resources, or a single endpoint with action parameters in the body) would require more sophisticated mapping logic or a different authorization pattern (e.g., explicit permission checks in application code, which would conflict with AR-19).
Technology Selection: Why OPA
Selection criteria
The PDP must satisfy:
- AR-12 (established standards): Industry-standard, proven technology -- no custom authorization logic
- AR-19 (centralized authorization): Single service decides all authorization; no policy code in backends
- APISIX integration: Native plugin or straightforward HTTP integration
- PostgreSQL compatibility: Query existing schema without requiring data migration or dual-store sync
- Production track record: Proven at scale in comparable environments
Why Open Policy Agent
OPA is a CNCF graduated project (the highest maturity level), used in production by Netflix, Pinterest, Cloudflare, and Goldman Sachs. It satisfies all selection criteria:
- AR-12: Industry standard with active security review process
- AR-19: Standalone service; application code contains zero authorization logic
- APISIX: Ships with a native OPA plugin (`opa`)
- PostgreSQL: Rego policies make HTTP calls to the adapter service, which queries PostgreSQL -- no data duplication
- Track record: Battle-tested at scale; extensive documentation and community
Additional benefits:
- Declarative policies: Rego separates policy from application code; policies are versioned and tested independently
- Fail-secure: `default allow := false` ensures denial on error
- Observability: Supports decision logging (requires configuration and a log receiver), Prometheus metrics, distributed tracing
- Flexibility: Can evolve to ABAC, time-based access, or additional enforcement points without architectural changes
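The fail-secure principle can also be applied on the calling side. The Python sketch below treats anything other than an explicit allow as a denial. The `is_allowed` function and the `query_pdp` callable are hypothetical stand-ins for whatever component asks the PDP for a decision, and the `{"allow": ...}` response shape is an assumption.

```python
def is_allowed(query_pdp, request):
    """Return True only if the PDP explicitly answers {"allow": True}.

    Errors, timeouts, malformed responses, and missing fields all deny.
    """
    try:
        decision = query_pdp(request)
        # Require a real boolean True; truthy strings such as "yes" do not count.
        return isinstance(decision, dict) and decision.get("allow") is True
    except Exception:
        # Any failure while asking the PDP results in denial.
        return False
```

The deliberate choice here is that there is no code path that grants access by omission: every branch other than the explicit `True` check ends in a denial.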
Alternatives considered
| Alternative | Strengths | Why discarded |
|---|---|---|
| Keycloak Authorization Services | Already using Keycloak for authentication; built-in UI for policy management; supports resources, scopes, and policies | Data model mismatch: Keycloak's authorization model is resource/scope-based, not relationship-based. The existing PostgreSQL schema implements ReBAC with multi-hop relationships (User → Group → Assignment → Role → Permission) and a scope hierarchy (TENANT/DATASPACE/DATASET) that doesn't map cleanly to Keycloak's concepts. Would require either abandoning the existing schema or complex bidirectional sync. Keycloak becomes a single point of failure for both authn and authz. |
| Cerbos | YAML policies (easier than Rego), native PostgreSQL adapter | No APISIX plugin (requires custom integration), smaller community, less proven at scale. Custom gateway work negates YAML simplicity. |
| OpenFGA | Purpose-built for ReBAC, graph-based queries | Requires syncing all authorization data from PostgreSQL into OpenFGA's store, introducing eventual consistency risks and dual-store operational burden. No APISIX plugin. |
| Ory Keto | ReBAC-focused, part of Ory ecosystem | Same dual-store sync issues as OpenFGA. No APISIX plugin. Smaller community than OPA. |
| Casbin | Supports multiple models (ACL, RBAC, ABAC), PostgreSQL adapter, lightweight | Library, not a service -- must be embedded in each backend, violating AR-19 (centralized authorization). No APISIX plugin. |
| Custom Spring Boot service | Direct PostgreSQL access, familiar technology | Violates AR-12 (not an established standard). Requires custom APISIX plugin. Puts custom security code on the critical path with no built-in policy testing or audit logging. |
Consequences
Affected components
- APISIX: All routes must include the OPA plugin configuration. New services added to the platform get authorization enforcement by adding the plugin to their routes.
- NextAuth.js BFF: Proxies backend API calls through APISIX. NextAuth's own `/api/auth/*` routes (sign-in, callback, session, sign-out) are handled locally and communicate directly with Keycloak.
- Backend services (CIVITAS Portal, Protected Backend, future services): Must NOT implement authorization logic. Services rely on APISIX having enforced authorization before the request arrives. Services only configure Spring Security for JWT authentication (defense-in-depth).
- Adapter Service: New authorization data requirements (e.g., new scope types, new entity relationships) require adding REST endpoints here and corresponding Rego queries in OPA.
- OPA policies: The Rego policy files are the single source of truth for all authorization rules, while the authorization data they evaluate stays in PostgreSQL. Changes to access control semantics (e.g., adding DENY permissions, enabling group hierarchy inheritance) are isolated to policy.
- Operations: OPA and the adapter service are infrastructure components that must be monitored and kept available. OPA supports decision logging for audit trails (requires configuration).
Positive effects
- Single place to audit all authorization rules (Rego policies)
- Adding new services to the platform requires only APISIX route configuration, not authorization code
- Policy changes can be deployed independently of application releases
- OPA's test framework provides regression safety for authorization logic
- OPA supports decision logging for audit trails (requires configuration and a log receiver)
- Production-proven technology reduces the risk of security vulnerabilities compared with custom authorization code
Negative effects
- Authorization checks add network latency, because each decision traverses APISIX → OPA → Adapter → PostgreSQL; mitigated by co-locating services and caching (Caffeine in the adapter service, materialized views in PostgreSQL; Redis or OPA bundles for more advanced scenarios)
- Team must learn Rego; mitigated by documentation and an existing policy codebase as a reference
- Adapter service is an additional component to maintain; mitigated by its narrow scope (read-only queries against the authorization schema)
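The caching mitigation trades latency against freshness. The Python sketch below illustrates a time-limited cache with explicit invalidation. It is not the adapter's Caffeine configuration: the `TtlCache` class, its method names, and the injectable `clock` parameter are assumptions made for illustration.

```python
import time

class TtlCache:
    """Minimal time-limited cache with explicit invalidation (illustrative)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (expires_at, value)

    def get(self, key, loader):
        """Return the cached value, calling loader(key) on a miss or after expiry."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop an entry immediately, e.g. when an assignment changes."""
        self._entries.pop(key, None)
```

A short TTL bounds how long a revoked permission can still be honored, while `invalidate` models the case where a change event removes the entry before the TTL expires.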
Risks and mitigations
| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Rego policy bugs cause incorrect authorization decisions | High | Medium | OPA test framework for policy unit tests; peer review; staged rollouts |
| Adapter service becomes a performance bottleneck | Medium | Medium | Caffeine caching in adapter; materialized views in PostgreSQL; monitor latency; scale horizontally |
| OPA policy complexity grows unmanageable | Medium | Low | Modular policy structure; documentation; policy linting |
| Network failures between components | High | Low | Retry logic; health checks; circuit breakers; fail-secure default |
| Cache invalidation issues cause stale decisions | High | Medium | Event-driven invalidation; short TTLs; monitoring |
See also
- BSI TR-03187 Technical Guideline
- Open Policy Agent
- APISIX OPA Plugin
- CIVITAS/CORE Architecture Documentation
- Ticket #428: Authorization PoC