Version: 2.0-beta

ADR 022: Backend JSON-LD API for Datasets

Date: 2025-12-02

Status: Reviewed

Decision Makers: @lukas.wydra @jonasfrz

Context

The backend requires a standardized approach for serializing and exposing API resources related to datasets and their associated classes. To ensure semantic interoperability and align with Linked Data principles, we need a serialization format that supports rich metadata and extensibility. Additionally, API consumers often require nested resources in a single request to reduce round-trips and improve performance.

Checked Architecture Principles

[full] Model-centric data flow – JSON-LD enables semantic, model-driven data representation with context and linked vocabularies
[full] Distributed architecture with unified user experience – JSON-LD supports distributed data integration through standardized contexts
[full] Modular design – Serialization and persistence layers are separated; libraries integrate cleanly with Spring Data JPA
[full] Integration capability through defined interfaces – JSON-LD is a W3C standard enabling cross-platform semantic interoperability
[full] Open source as the default – Apache Jena and Spring Data JPA Entity Graph are fully open source
[full] Cloud-native architecture – Lightweight libraries compatible with containerized and scalable deployments
[full] Prefer standard solutions over custom development – Leverages W3C JSON-LD standard and established JPA patterns. Apache Jena has a broad community behind it.
[full] Self-contained deployment – Libraries can be bundled and deployed without external dependencies
[full] Technological consistency to ensure maintainability – Integrates with the existing Spring Boot and JPA stack; some minor work remains to make Apache Jena comfortable to use, e.g. implementing annotations that provide JSON-LD metadata directly in the POJOs
[none] Multi-tenancy – Not directly relevant to serialization format choice
[partial] Security by design – JSON-LD serialization itself is neutral; security depends on implementation of access controls and data filtering

Decision

Serialization

Adopt JSON-LD 1.1 as the API serialization format for all endpoints.

DCAT compliant API

Use Apache Jena as the serialization library for the DCAT API:

Achieves near-complete conformance with the JSON-LD 1.1 specification through its use of Titanium JSON-LD
Actively maintained with regular updates and community support
Provides robust Java API for JSON-LD processing and transformation

To provide an ergonomic serialization API, Apache Jena will be integrated with a custom Jackson ObjectMapper:

DTOs are annotated with custom annotations that provide JSON-LD metadata (e.g., @context, @type, @id)
The custom ObjectMapper translates annotated DTOs into JSON-LD using Apache Jena's processing capabilities
A PoC showing how to serialize DCAT-compatible POJOs using Apache Jena can be found here
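As an illustration of the annotation approach, the following plain-Java sketch reads JSON-LD metadata from an annotated DTO into an ordered map. The annotation names (JsonLdType, JsonLdId, JsonLdProperty), the DatasetDto class, the JsonLdMetadataReader helper and the context URL are hypothetical and not part of Jackson or Apache Jena. In the actual design, the custom ObjectMapper would hand such a node on to Apache Jena for JSON-LD processing; that step is deliberately left out here.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class-level annotation carrying the node's @context and @type.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface JsonLdType {
    String context();
    String type();
}

// Hypothetical field-level annotation marking the field that holds the node's @id.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface JsonLdId {}

// Hypothetical field-level annotation mapping a field to a JSON-LD term.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface JsonLdProperty {
    String value();
}

// Example DTO; the context URL is a placeholder, not a published DCAT context.
@JsonLdType(context = "https://example.org/contexts/dcat.jsonld", type = "dcat:Dataset")
class DatasetDto {
    @JsonLdId
    final String id;

    @JsonLdProperty("dct:title")
    final String title;

    DatasetDto(String id, String title) {
        this.id = id;
        this.title = title;
    }
}

// Hypothetical helper: collects JSON-LD keywords and annotated properties into an ordered map.
final class JsonLdMetadataReader {
    private JsonLdMetadataReader() {}

    static Map<String, Object> toNode(Object dto) {
        Class<?> cls = dto.getClass();
        JsonLdType typeInfo = cls.getAnnotation(JsonLdType.class);
        if (typeInfo == null) {
            throw new IllegalArgumentException(cls.getName() + " lacks @JsonLdType");
        }
        Map<String, Object> node = new LinkedHashMap<>();
        node.put("@context", typeInfo.context());
        node.put("@type", typeInfo.type());
        try {
            for (Field field : cls.getDeclaredFields()) {
                field.setAccessible(true);
                if (field.isAnnotationPresent(JsonLdId.class)) {
                    node.put("@id", field.get(dto));
                }
                JsonLdProperty property = field.getAnnotation(JsonLdProperty.class);
                if (property != null) {
                    node.put(property.value(), field.get(dto));
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("Cannot read fields of " + cls.getName(), e);
        }
        return node;
    }
}
```

For example, JsonLdMetadataReader.toNode(new DatasetDto("urn:dataset:1", "Weather data")) yields a map with the keys @context, @type, @id and dct:title. Keeping the metadata on the DTO places the mapping next to the fields it describes instead of in a separate mapping layer.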

B4F-API

For B4F endpoints, each node must contain a node identifier (@id)
Using Apache Jena, as well as the @context and @type fields, is optional for the B4F-API
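To make the mandatory @id rule checkable, for instance in tests of B4F endpoints, a minimal validator could walk a payload that has been parsed into plain Map and List values (as Jackson produces when reading into Object) and report every node without a non-blank @id. The B4fNodeIdValidator class is hypothetical. This sketch treats every JSON object except value objects (@value) as a node and skips @context; it does not handle other JSON-LD containers such as @list or language maps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical validator for the B4F rule "every node carries an @id".
final class B4fNodeIdValidator {
    private B4fNodeIdValidator() {}

    // Returns slash-separated paths of all nodes lacking a non-blank "@id" ("/" is the root).
    static List<String> findNodesWithoutId(Object payload) {
        List<String> violations = new ArrayList<>();
        walk(payload, "", violations);
        return violations;
    }

    private static void walk(Object value, String path, List<String> violations) {
        if (value instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) value;
            // Value objects such as {"@value": 3} are literals, not nodes.
            if (map.containsKey("@value")) {
                return;
            }
            Object id = map.get("@id");
            if (!(id instanceof String) || ((String) id).isBlank()) {
                violations.add(path.isEmpty() ? "/" : path);
            }
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                String key = String.valueOf(entry.getKey());
                // @context describes vocabulary, not data, so it is not inspected.
                if (!"@context".equals(key)) {
                    walk(entry.getValue(), path + "/" + key, violations);
                }
            }
        } else if (value instanceof List) {
            List<?> list = (List<?>) value;
            for (int i = 0; i < list.size(); i++) {
                walk(list.get(i), path + "/" + i, violations);
            }
        }
    }
}
```

An empty result means the payload satisfies the rule; otherwise the returned paths point directly at the offending nodes.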

API interface

JSON-LD only defines the structure of an endpoint's payload, not the endpoints themselves, so it does not constrain how we design this interface. For now, the following initial set of conventions is proposed:

All GET endpoints in the scope of this ADR shall support an include query parameter to specify nested resources, leveraging the JSON-LD 1.1 included nodes feature (@included) for efficient data aggregation
For all endpoints expected to return collections, pagination shall be implemented using Spring Data's Pageable support, which provides uniform handling of the query parameters page, size and sort.
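The include convention can be sketched as follows, assuming the primary resource references related nodes only by @id and the full related nodes are collected under the JSON-LD 1.1 @included keyword. The IncludedNodes helper, its method names and the relation names are hypothetical; in a Spring controller the raw include value would typically arrive via @RequestParam, while page, size and sort would be bound to a Pageable by Spring Data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for the include query parameter and JSON-LD included nodes.
final class IncludedNodes {
    private IncludedNodes() {}

    // Parses e.g. "distribution,publisher" into an ordered set, ignoring blanks.
    static Set<String> parseInclude(String include) {
        Set<String> relations = new LinkedHashSet<>();
        if (include == null) {
            return relations;
        }
        for (String part : include.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                relations.add(trimmed);
            }
        }
        return relations;
    }

    // Copies the primary node and adds the requested related nodes, deduplicated by @id,
    // under "@included". Unknown relation names are rejected instead of silently ignored.
    static Map<String, Object> withIncluded(Map<String, Object> primary,
                                            Map<String, List<Map<String, Object>>> relatedByRelation,
                                            Set<String> include) {
        Map<String, Map<String, Object>> includedById = new LinkedHashMap<>();
        for (String relation : include) {
            List<Map<String, Object>> nodes = relatedByRelation.get(relation);
            if (nodes == null) {
                throw new IllegalArgumentException("Unsupported include: " + relation);
            }
            for (Map<String, Object> node : nodes) {
                includedById.putIfAbsent(String.valueOf(node.get("@id")), node);
            }
        }
        Map<String, Object> response = new LinkedHashMap<>(primary);
        if (!includedById.isEmpty()) {
            response.put("@included", new ArrayList<>(includedById.values()));
        }
        return response;
    }
}
```

Rejecting unknown relation names in this sketch gives clients immediate feedback on typos in the include parameter, and deduplicating by @id keeps a node that is reachable via several relations from appearing twice.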

Persistence

Implement performant resolution of nested resources using Spring Data JPA Entity Graphs:

Dynamic entity graphs avoid the N+1 query problem when fetching related entities
Entity graph structure can be dynamically constructed by parsing the include query parameter
Provides declarative control over fetch strategies without polluting the domain model with fetch annotations
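A minimal sketch of the parsing step, assuming dotted paths such as distribution.accessService denote nested relations: the include value is turned into a tree whose levels correspond to an entity graph and its subgraphs. The FetchNode and IncludeGraphParser names and the whitelist of allowed paths are assumptions for illustration; the whitelist is one way to bound the over-fetching risk noted under Consequences.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// One level of the requested fetch tree; mirrors a JPA EntityGraph or Subgraph.
final class FetchNode {
    // TreeMap keeps attribute order deterministic, which simplifies logging and tests.
    final Map<String, FetchNode> children = new TreeMap<>();

    FetchNode child(String attribute) {
        return children.computeIfAbsent(attribute, a -> new FetchNode());
    }

    // Renders e.g. "distribution(accessService),publisher".
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, FetchNode> entry : children.entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(entry.getKey());
            if (!entry.getValue().children.isEmpty()) {
                sb.append('(').append(entry.getValue()).append(')');
            }
        }
        return sb.toString();
    }
}

// Hypothetical parser from the include query parameter to a fetch tree.
final class IncludeGraphParser {
    private IncludeGraphParser() {}

    static FetchNode parse(String include, Set<String> allowedPaths) {
        FetchNode root = new FetchNode();
        if (include == null || include.isBlank()) {
            return root;
        }
        for (String raw : include.split(",")) {
            String path = raw.trim();
            if (path.isEmpty()) {
                continue;
            }
            // Only whitelisted paths may be fetched, bounding the size of the graph.
            if (!allowedPaths.contains(path)) {
                throw new IllegalArgumentException("Unsupported include path: " + path);
            }
            FetchNode current = root;
            for (String attribute : path.split("\\.")) {
                current = current.child(attribute);
            }
        }
        return root;
    }
}
```

Each level of the resulting tree could then be mapped onto the standard JPA API, e.g. starting from EntityManager.createEntityGraph(Dataset.class) (Dataset being a hypothetical entity), calling addAttributeNodes for leaf attributes and addSubgraph for attributes with nested includes, and passing the graph to the query via the jakarta.persistence.fetchgraph hint.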

Consequences

API responses provide semantic context and support Linked Data consumption patterns
Clients can optimize data retrieval by specifying required nested resources in a single request
The include parameter reduces API round-trips and improves performance for complex object graphs
Developers must understand JSON-LD concepts and entity graph configuration to implement endpoints effectively
Dynamic entity graphs require careful tuning to avoid over-fetching data

Alternatives

Serialization:

Plain JSON with custom nesting: Simpler but lacks semantic interoperability and standardization
GraphQL: Offers flexible querying but requires separate query language and additional infrastructure
HAL or JSON:API: Provide hypermedia support but lack the semantic richness or ecosystem of JSON-LD

JSON-LD Libraries:

RDF4J: Active project supporting a broader variety of RDF serialization formats (Turtle, N-Triples, RDF/XML, etc.). However, it's more heavyweight and focuses on direct management of RDF triple stores, with no native PostgreSQL support, making it less suitable for our JPA-based architecture
Jackson JSON-LD: Less feature-complete, lower conformance to JSON-LD 1.1 specification
jb4jsonld-jackson, Pinto: Small projects with limited maintenance activity and no JSON-LD 1.1 support
Custom implementation: High development effort, difficult to maintain standard compliance

Persistence:

Manual entity fetching with JOIN FETCH queries: Requires verbose, repetitive JPQL for each use case
Static entity graphs via @NamedEntityGraph: Less flexible, requires predefined fetch strategies
DTO projections: Type-safe but requires mapping boilerplate and multiple projection classes per entity
