ADR 022: Backend JSON-LD API for Datasets
Date: 2025-12-02
Status: Reviewed
Decision Makers: @lukas.wydra @jonasfrz
Context
The backend requires a standardized approach for serializing and exposing API resources related to datasets and their associated classes. To ensure semantic interoperability and align with Linked Data principles, we need a serialization format that supports rich metadata and extensibility. Additionally, API consumers often require nested resources in a single request to reduce round-trips and improve performance.
Checked Architecture Principles
- [full] Model-centric data flow – JSON-LD enables semantic, model-driven data representation with context and linked vocabularies
- [full] Distributed architecture with unified user experience – JSON-LD supports distributed data integration through standardized contexts
- [full] Modular design – Serialization and persistence layers are separated; libraries integrate cleanly with Spring Data JPA
- [full] Integration capability through defined interfaces – JSON-LD is a W3C standard enabling cross-platform semantic interoperability
- [full] Open source as the default – Apache Jena and Spring Data JPA Entity Graph are fully open source
- [full] Cloud-native architecture – Lightweight libraries compatible with containerized and scalable deployments
- [full] Prefer standard solutions over custom development – Leverages W3C JSON-LD standard and established JPA patterns. Apache Jena has a broad community behind it.
- [full] Self-contained deployment – Libraries can be bundled and deployed without external dependencies
- [full] Technological consistency to ensure maintainability – Integrates with the existing Spring Boot and JPA stack. Minor work remains to make Apache Jena comfortable to use, e.g. implementing annotations that provide metadata in the POJOs themselves
- [none] Multi-tenancy – Not directly relevant to serialization format choice
- [partial] Security by design – JSON-LD serialization itself is neutral; security depends on implementation of access controls and data filtering
Decision
Serialization
Adopt JSON-LD 1.1 as the API serialization format for all endpoints.
DCAT compliant API
Use Apache Jena as the serialization library for the DCAT API:
- Achieves near-complete conformance with the JSON-LD 1.1 specification through usage of Titanium JSON-LD
- Actively maintained with regular updates and community support
- Provides a robust Java API for JSON-LD processing and transformation
To provide an ergonomic serialization API, Apache Jena will be integrated with a custom Jackson ObjectMapper:
- DTOs are annotated with custom annotations that provide JSON-LD metadata (e.g., @context, @type, @id)
- The custom ObjectMapper translates annotated DTOs into JSON-LD format using Apache Jena's processing capabilities
- A PoC on how to serialize DCAT-compatible POJOs using Apache Jena can be found here
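As an illustration of the annotation idea above, the following is a minimal sketch in plain Java. The annotation names (JsonLdType, JsonLdId, JsonLdProperty), the DatasetDto class and the reflective toNode helper are hypothetical and not taken from the PoC. The sketch builds the JSON-LD node as a Map and deliberately leaves out Jackson and Apache Jena, which the actual ObjectMapper integration would use.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical annotations; the ADR does not fix their names.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface JsonLdType {
    String value();
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface JsonLdId {
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface JsonLdProperty {
    String value();
}

// Hypothetical DTO carrying its JSON-LD metadata in annotations.
@JsonLdType("dcat:Dataset")
final class DatasetDto {
    @JsonLdId
    final String iri;

    @JsonLdProperty("dct:title")
    final String title;

    DatasetDto(String iri, String title) {
        this.iri = iri;
        this.title = title;
    }
}

final class JsonLdSketch {
    // Prefixes of the DCAT and DCMI Terms vocabularies.
    static final Map<String, String> CONTEXT = Map.of(
            "dcat", "http://www.w3.org/ns/dcat#",
            "dct", "http://purl.org/dc/terms/");

    /** Translates an annotated DTO into a map shaped like a JSON-LD node object. */
    static Map<String, Object> toNode(Object dto) {
        Map<String, Object> node = new LinkedHashMap<>();
        node.put("@context", CONTEXT);
        JsonLdType type = dto.getClass().getAnnotation(JsonLdType.class);
        if (type != null) {
            node.put("@type", type.value());
        }
        try {
            for (Field field : dto.getClass().getDeclaredFields()) {
                field.setAccessible(true);
                if (field.isAnnotationPresent(JsonLdId.class)) {
                    node.put("@id", field.get(dto));
                }
                JsonLdProperty property = field.getAnnotation(JsonLdProperty.class);
                if (property != null) {
                    node.put(property.value(), field.get(dto));
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("Cannot read DTO field", e);
        }
        return node;
    }
}
```

Calling JsonLdSketch.toNode(new DatasetDto("https://example.org/datasets/42", "Air quality")) yields a map with the keys @context, @type, @id and dct:title. In the proposed design the same metadata would be handed to Apache Jena rather than assembled by hand.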
B4F-API
- For B4F endpoints, each node must contain a node identifier (@id)
- Usage of Apache Jena and of the @context and @type fields is optional for the B4F API
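The @id rule for B4F payloads could be checked mechanically, for example in a test. The following sketch assumes the payload has already been parsed into nested Map and List values; the class and method names are hypothetical. It treats every map as a node, except @context definitions and value objects (maps containing @value), which is a simplification of the JSON-LD data model.

```java
import java.util.List;
import java.util.Map;

final class NodeIdCheck {
    /** Returns true if every node object in the parsed payload has a non-empty "@id". */
    static boolean everyNodeHasId(Object value) {
        if (value instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) value;
            if (map.containsKey("@value")) {
                return true; // value object, not a node
            }
            Object id = map.get("@id");
            if (!(id instanceof String) || ((String) id).trim().isEmpty()) {
                return false;
            }
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                if ("@context".equals(entry.getKey())) {
                    continue; // context definitions are not nodes
                }
                if (!everyNodeHasId(entry.getValue())) {
                    return false;
                }
            }
            return true;
        }
        if (value instanceof List) {
            for (Object item : (List<?>) value) {
                if (!everyNodeHasId(item)) {
                    return false;
                }
            }
            return true;
        }
        return true; // strings, numbers, booleans, null
    }
}
```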
API interface
JSON-LD only concerns itself with the structure of an endpoint's payload, not with the endpoints themselves. It therefore does not constrain the design of this interface. For now, the following initial set of conventions is proposed:
- All GET endpoints in the scope of this ADR shall support an include query parameter to specify nested resources, leveraging JSON-LD's included nodes feature for efficient data aggregation
- For all endpoints that are expected to return Collections, pagination shall be implemented using Spring Data's support for Pageable, leading to uniform support for the query parameters page, size and sort.
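To make the include convention more concrete, the sketch below shows how a response node might carry a requested nested resource under the JSON-LD 1.1 keyword @included. The request shape (for example GET /datasets/42?include=distribution), the IRIs and the class name are assumptions for illustration. The payload is built as a plain Map instead of going through Apache Jena.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class IncludedSketch {
    // Example IRIs, purely illustrative.
    static final String DATASET_ID = "https://example.org/datasets/42";
    static final String DISTRIBUTION_ID = "https://example.org/distributions/7";

    /** Builds the dataset node; the distribution is embedded only when requested. */
    static Map<String, Object> datasetNode(Set<String> include) {
        Map<String, Object> node = new LinkedHashMap<>();
        node.put("@id", DATASET_ID);
        node.put("@type", "dcat:Dataset");
        // The relation itself is always present as a reference by @id.
        node.put("dcat:distribution", Map.of("@id", DISTRIBUTION_ID));

        if (include.contains("distribution")) {
            Map<String, Object> distribution = new LinkedHashMap<>();
            distribution.put("@id", DISTRIBUTION_ID);
            distribution.put("@type", "dcat:Distribution");
            distribution.put("dct:title", "CSV export");
            // Full node objects of requested relations go into the included block.
            node.put("@included", List.of(distribution));
        }
        return node;
    }
}
```

Without the include parameter the client only receives the reference and would need a second request to resolve it. With it, the referenced node arrives in the same payload and can be matched by its @id.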
Persistence
Implement performant resolution of nested resources using Spring Data JPA Entity Graphs:
- Dynamic entity graphs eliminate N+1 query problems when fetching related entities
- Entity graph structure can be dynamically constructed by parsing the include query parameter
- Provides declarative control over fetch strategies without polluting domain model with fetch annotations
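The parsing step could look roughly like the sketch below. The ADR does not define the syntax of the include parameter; comma-separated paths with dot-separated segments (e.g. distributions,publisher.contact) are assumed here, and the IncludeTree class is hypothetical. The resulting tree mirrors the shape of an entity graph: first-level keys would become attribute nodes and nested levels would become subgraphs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class IncludeTree {
    // Child attribute name -> nested includes below that attribute.
    final Map<String, IncludeTree> children = new LinkedHashMap<>();

    /** Parses an include value such as "distributions,publisher.contact" into a tree. */
    static IncludeTree parse(String include) {
        IncludeTree root = new IncludeTree();
        if (include == null || include.trim().isEmpty()) {
            return root;
        }
        for (String path : include.split(",")) {
            IncludeTree current = root;
            for (String segment : path.split("\\.")) {
                String name = segment.trim();
                if (name.isEmpty()) {
                    continue; // tolerate stray separators
                }
                // Shared prefixes (publisher.contact, publisher.address) reuse one branch.
                current = current.children.computeIfAbsent(name, key -> new IncludeTree());
            }
        }
        return root;
    }
}
```

A repository layer could then walk this tree and translate each level into the corresponding entity graph calls, so that all requested relations are fetched together instead of lazily one by one.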
Consequences
- API responses provide semantic context and support Linked Data consumption patterns
- Clients can optimize data retrieval by specifying required nested resources in a single request
- The include parameter reduces API round-trips and improves performance for complex object graphs
- Developers must understand JSON-LD concepts and entity graph configuration to implement endpoints effectively
- Dynamic entity graphs require careful tuning to avoid over-fetching data
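One possible mitigation for the over-fetching risk is to validate the include parameter against an explicit allow-list per endpoint before any entity graph is built. The sketch below is an assumption about how such a guard might look, not part of the decision; the class name, the comma-separated syntax and the limit on the number of paths are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class IncludeGuard {
    /**
     * Returns the requested include paths if all of them are allowed,
     * otherwise throws IllegalArgumentException.
     */
    static List<String> validate(String include, Set<String> allowedPaths, int maxPaths) {
        List<String> accepted = new ArrayList<>();
        if (include == null || include.trim().isEmpty()) {
            return accepted;
        }
        for (String raw : include.split(",")) {
            String path = raw.trim();
            if (path.isEmpty() || accepted.contains(path)) {
                continue; // ignore empty entries and duplicates
            }
            if (!allowedPaths.contains(path)) {
                throw new IllegalArgumentException("Unsupported include path: " + path);
            }
            accepted.add(path);
            if (accepted.size() > maxPaths) {
                throw new IllegalArgumentException("Too many include paths, limit is " + maxPaths);
            }
        }
        return accepted;
    }
}
```

An endpoint would typically map the IllegalArgumentException to a 400 response, so that clients learn which nested resources are supported.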
Alternatives
Serialization:
- Plain JSON with custom nesting: Simpler but lacks semantic interoperability and standardization
- GraphQL: Offers flexible querying but requires separate query language and additional infrastructure
- HAL or JSON:API: Provide hypermedia support but lack the semantic richness or ecosystem of JSON-LD
JSON-LD Libraries:
- RDF4J: Active project supporting a broader variety of RDF serialization formats (Turtle, N-Triples, RDF/XML, etc.). However, it's more heavyweight and focuses on direct management of RDF triple stores, with no native PostgreSQL support, making it less suitable for our JPA-based architecture
- Jackson JSON-LD: Less feature-complete, lower conformance to JSON-LD 1.1 specification
- jb4jsonld-jackson, Pinto: Small projects with limited maintenance activity and no JSON-LD 1.1 support
- Custom implementation: High development effort, difficult to maintain standard compliance
Persistence:
- Manual entity fetching with JOIN FETCH queries: Requires verbose, repetitive JPQL for each use case
- Static entity graphs via @NamedEntityGraph: Less flexible, requires predefined fetch strategies
- DTO projections: Type-safe but requires mapping boilerplate and multiple projection classes per entity