
ADR 022: Backend JSON-LD API for Datasets

Date: 2025-12-02

Status: Reviewed

Decision Makers: @lukas.wydra @jonasfrz

Context

The backend requires a standardized approach for serializing and exposing API resources related to datasets and their associated classes. To ensure semantic interoperability and align with Linked Data principles, we need a serialization format that supports rich metadata and extensibility. Additionally, API consumers often require nested resources in a single request to reduce round-trips and improve performance.

Checked Architecture Principles

  • [full] Model-centric data flow – JSON-LD enables semantic, model-driven data representation with context and linked vocabularies
  • [full] Distributed architecture with unified user experience – JSON-LD supports distributed data integration through standardized contexts
  • [full] Modular design – Serialization and persistence layers are separated; libraries integrate cleanly with Spring Data JPA
  • [full] Integration capability through defined interfaces – JSON-LD is a W3C standard enabling cross-platform semantic interoperability
  • [full] Open source as the default – Apache Jena and Spring Data JPA Entity Graph are fully open source
  • [full] Cloud-native architecture – Lightweight libraries compatible with containerized and scalable deployments
  • [full] Prefer standard solutions over custom development – Leverages the W3C JSON-LD standard and established JPA patterns. Apache Jena has a broad community behind it.
  • [full] Self-contained deployment – Libraries can be bundled and deployed without external dependencies
  • [full] Technological consistency to ensure maintainability – Integrates with the existing Spring Boot and JPA stack. Some minor work remains to make Apache Jena comfortable to use, e.g. implementing annotations so that the POJOs themselves provide the required metadata
  • [none] Multi-tenancy – Not directly relevant to serialization format choice
  • [partial] Security by design – JSON-LD serialization itself is neutral; security depends on implementation of access controls and data filtering

Decision

Serialization

Adopt JSON-LD 1.1 as the API serialization format for all endpoints.

DCAT-compliant API

Use Apache Jena as the serialization library for the DCAT API.

To provide an ergonomic serialization API, Apache Jena will be integrated with a custom Jackson ObjectMapper:

  • DTOs are annotated with custom annotations that provide JSON-LD metadata (e.g., @context, @type, @id)
  • The custom ObjectMapper translates annotated DTOs into JSON-LD using Apache Jena's processing capabilities
  • A PoC on how to serialize DCAT-compatible POJOs using Apache Jena can be found here
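A minimal sketch of the annotation side of this design, using only the JDK. The annotation names (@JsonLdType, @JsonLdId, @JsonLdProperty), the DatasetDto class and the JsonLdMapper helper are hypothetical and not taken from the PoC. The reflection-based conversion into a Map merely stands in for the custom Jackson ObjectMapper; the hand-off to Apache Jena is not shown.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical annotation: RDF class of the DTO, e.g. a DCAT term such as "dcat:Dataset".
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface JsonLdType {
    String value();
}

// Hypothetical annotation: marks the field that holds the node identifier (@id).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface JsonLdId {
}

// Hypothetical annotation: maps a field to a prefixed term resolvable via @context.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface JsonLdProperty {
    String value();
}

// Example DTO; the identifiers used with it are illustrative only.
@JsonLdType("dcat:Dataset")
class DatasetDto {
    @JsonLdId
    final String iri;

    @JsonLdProperty("dct:title")
    final String title;

    DatasetDto(String iri, String title) {
        this.iri = iri;
        this.title = title;
    }
}

// Hypothetical stand-in for the custom ObjectMapper: plain reflection, no Jackson or Jena.
final class JsonLdMapper {
    // Prefixes for the DCAT and Dublin Core Terms vocabularies used above.
    private static final Map<String, String> CONTEXT = Map.of(
            "dcat", "http://www.w3.org/ns/dcat#",
            "dct", "http://purl.org/dc/terms/");

    static Map<String, Object> toNode(Object dto) {
        Class<?> type = dto.getClass();
        JsonLdType rdfType = type.getAnnotation(JsonLdType.class);
        if (rdfType == null) {
            throw new IllegalArgumentException(type.getName() + " has no @JsonLdType");
        }
        Map<String, Object> node = new LinkedHashMap<>();
        node.put("@context", CONTEXT);
        node.put("@type", rdfType.value());
        for (Field field : type.getDeclaredFields()) {
            Object value = read(field, dto);
            if (field.isAnnotationPresent(JsonLdId.class)) {
                node.put("@id", value);
            } else if (value != null && field.isAnnotationPresent(JsonLdProperty.class)) {
                node.put(field.getAnnotation(JsonLdProperty.class).value(), value);
            }
        }
        return node;
    }

    private static Object read(Field field, Object target) {
        try {
            field.setAccessible(true);
            return field.get(target);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("Cannot read " + field.getName(), e);
        }
    }
}
```

Calling JsonLdMapper.toNode on a DatasetDto yields a node with @context, @type, @id and dct:title keys; in the decided design, this metadata would instead be consumed by the ObjectMapper and processed with Apache Jena.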

B4F-API

  • For B4F endpoints, each node must contain a node identifier (@id)
  • Using Apache Jena and the @context and @type fields is optional for the B4F API
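Because @id is the one mandatory field for B4F nodes, a check like the following could run in tests or before responses are written. This is a sketch under the assumption that payloads are available as nested Map/List structures; the B4fIdValidator name is hypothetical, and treating every JSON object except value, list and set objects as a node is a simplification (for example, a top-level @graph wrapper without @id would also be reported).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reports B4F node objects that lack the mandatory @id.
final class B4fIdValidator {
    // Returns slash-separated paths of offending objects ("/" for the root).
    static List<String> missingIds(Object json) {
        List<String> violations = new ArrayList<>();
        walk(json, "", violations);
        return violations;
    }

    private static void walk(Object value, String path, List<String> violations) {
        if (value instanceof Map<?, ?> map) {
            if (isNodeObject(map) && !map.containsKey("@id")) {
                violations.add(path.isEmpty() ? "/" : path);
            }
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                // The context describes terms, not data nodes, so it is not checked.
                if ("@context".equals(entry.getKey())) {
                    continue;
                }
                walk(entry.getValue(), path + "/" + entry.getKey(), violations);
            }
        } else if (value instanceof List<?> list) {
            for (int i = 0; i < list.size(); i++) {
                walk(list.get(i), path + "/" + i, violations);
            }
        }
    }

    // Simplification: value, list and set objects are JSON-LD constructs, not nodes.
    private static boolean isNodeObject(Map<?, ?> map) {
        return !map.containsKey("@value") && !map.containsKey("@list") && !map.containsKey("@set");
    }
}
```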

API interface

JSON-LD only concerns the structure of an endpoint's payload, not the endpoints themselves, so it does not constrain how we design this interface. For now, the following initial set of conventions is proposed:

  • All GET endpoints in the scope of this ADR shall support an include query parameter for specifying nested resources, using JSON-LD 1.1 included nodes (@included) to aggregate the related data in a single response
  • All endpoints expected to return collections shall implement pagination via Spring Data's Pageable support, giving uniform support for the query parameters page, size and sort
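To illustrate how the include parameter and included nodes could fit together, the sketch below parses a comma-separated include value with dotted paths (e.g. classes,classes.attributes), follows {"@id": ...} references from the primary node, and places each reached node once into @included. The path syntax, the IncludeResponseAssembler name and the in-memory store (standing in for the persistence layer) are assumptions for illustration; pagination via Pageable is not covered here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: resolves ?include=... against an in-memory store of nodes.
final class IncludeResponseAssembler {
    // "classes, classes.attributes" -> [classes, classes.attributes]
    static Set<String> parseInclude(String include) {
        Set<String> paths = new LinkedHashSet<>();
        if (include == null || include.isBlank()) {
            return paths;
        }
        for (String part : include.split(",")) {
            String path = part.trim();
            if (!path.isEmpty()) {
                paths.add(path);
            }
        }
        return paths;
    }

    // Keeps the primary node as-is (references stay {"@id": ...}) and adds every
    // node reached via an include path exactly once to the "@included" array.
    static Map<String, Object> assemble(Map<String, Object> primary,
                                        Map<String, Map<String, Object>> store,
                                        String include) {
        Map<String, Map<String, Object>> included = new LinkedHashMap<>();
        for (String path : parseInclude(include)) {
            List<Map<String, Object>> frontier = List.of(primary);
            for (String segment : path.split("\\.")) {
                List<Map<String, Object>> next = new ArrayList<>();
                for (Map<String, Object> node : frontier) {
                    for (String id : referencedIds(node.get(segment))) {
                        Map<String, Object> target = store.get(id);
                        if (target != null) {
                            included.putIfAbsent(id, target);
                            next.add(target);
                        }
                    }
                }
                frontier = next;
            }
        }
        Map<String, Object> response = new LinkedHashMap<>(primary);
        if (!included.isEmpty()) {
            response.put("@included", new ArrayList<>(included.values()));
        }
        return response;
    }

    // Accepts a single {"@id": ...} reference or a list of such references.
    private static List<String> referencedIds(Object value) {
        List<String> ids = new ArrayList<>();
        if (value instanceof Map<?, ?> ref && ref.get("@id") instanceof String id) {
            ids.add(id);
        } else if (value instanceof List<?> list) {
            for (Object item : list) {
                ids.addAll(referencedIds(item));
            }
        }
        return ids;
    }
}
```

Keeping references as bare {"@id": ...} objects in the primary node and listing full nodes once under @included avoids duplicating a node that is reachable via several paths.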

Persistence

Implement performant resolution of nested resources using Spring Data JPA Entity Graphs:

  • Dynamic entity graphs eliminate N+1 query problems when fetching related entities
  • Entity graph structure can be dynamically constructed by parsing the include query parameter
  • Provides declarative control over fetch strategies without polluting the domain model with fetch annotations
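The key step in building a dynamic entity graph is merging the dotted include paths into one attribute tree. The sketch below does this without a JPA dependency: GraphNode and IncludeGraphBuilder are hypothetical names. With Jakarta Persistence, the root's children would be added through EntityGraph.addSubgraph (or addAttributeNodes for leaves), deeper levels through Subgraph.addSubgraph, and the graph passed to the query via the jakarta.persistence.fetchgraph hint. How the chosen Spring Data JPA Entity Graph library exposes this is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, JPA-free model of an entity graph: attribute name -> nested subgraph.
final class GraphNode {
    final Map<String, GraphNode> children = new LinkedHashMap<>();

    GraphNode child(String attribute) {
        return children.computeIfAbsent(attribute, name -> new GraphNode());
    }
}

final class IncludeGraphBuilder {
    // Merges dotted include paths into one tree, so "classes" and "classes.attributes"
    // share a single "classes" node rather than producing duplicate fetch paths.
    static GraphNode build(Iterable<String> includePaths) {
        GraphNode root = new GraphNode();
        for (String path : includePaths) {
            GraphNode current = root;
            for (String segment : path.split("\\.")) {
                current = current.child(segment);
            }
        }
        return root;
    }

    // Renders the tree compactly, e.g. "classes(attributes),license", for logs and tests.
    static String render(GraphNode node) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, GraphNode> entry : node.children.entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(entry.getKey());
            if (!entry.getValue().children.isEmpty()) {
                sb.append('(').append(render(entry.getValue())).append(')');
            }
        }
        return sb.toString();
    }
}
```

For example, the include value classes,classes.attributes,license renders as classes(attributes),license: one fetch path per association rather than one query per related entity.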

Consequences

  • API responses provide semantic context and support Linked Data consumption patterns
  • Clients can optimize data retrieval by specifying required nested resources in a single request
  • The include parameter reduces API round-trips and improves performance for complex object graphs
  • Developers must understand JSON-LD concepts and entity graph configuration to implement endpoints effectively
  • Dynamic entity graphs require careful tuning to avoid over-fetching data
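One possible way to keep dynamic entity graphs from over-fetching is to validate the include parameter before any query is built, for example against an allow-list of paths and a maximum nesting depth. This is a sketch of that idea only: the IncludePolicy name, the allow-list approach and the depth limit are assumptions, not part of this decision.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical guard against over-fetching: an allow-list plus a maximum nesting depth.
final class IncludePolicy {
    private final Set<String> allowedPaths;
    private final int maxDepth;

    IncludePolicy(Set<String> allowedPaths, int maxDepth) {
        this.allowedPaths = Set.copyOf(allowedPaths);
        this.maxDepth = maxDepth;
    }

    // Returns one message per rejected path; an empty list means the request is accepted.
    List<String> check(String include) {
        List<String> problems = new ArrayList<>();
        if (include == null || include.isBlank()) {
            return problems;
        }
        for (String part : include.split(",")) {
            String path = part.trim();
            if (path.isEmpty()) {
                continue;
            }
            int depth = path.split("\\.").length;
            if (depth > maxDepth) {
                problems.add(path + ": nesting depth " + depth + " exceeds " + maxDepth);
            } else if (!allowedPaths.contains(path)) {
                problems.add(path + ": not an allowed include path");
            }
        }
        return problems;
    }
}
```

A controller could translate a non-empty result into a 400 Bad Request; that mapping is likewise an assumption.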

Alternatives

Serialization:

  • Plain JSON with custom nesting: Simpler but lacks semantic interoperability and standardization
  • GraphQL: Offers flexible querying but requires separate query language and additional infrastructure
  • HAL or JSON:API: Provide hypermedia support but lack the semantic richness or ecosystem of JSON-LD

JSON-LD Libraries:

  • RDF4J: Active project supporting a broader variety of RDF serialization formats (Turtle, N-Triples, RDF/XML, etc.). However, it's more heavyweight and focuses on direct management of RDF triple stores, with no native PostgreSQL support, making it less suitable for our JPA-based architecture
  • Jackson JSON-LD: Less feature-complete, lower conformance to JSON-LD 1.1 specification
  • jb4jsonld-jackson, Pinto: Small projects with limited maintenance activity and no JSON-LD 1.1 support
  • Custom implementation: High development effort, difficult to maintain standard compliance

Persistence:

  • Manual entity fetching with JOIN FETCH queries: Requires verbose, repetitive JPQL for each use case
  • Static entity graphs via @NamedEntityGraph: Less flexible, requires predefined fetch strategies
  • DTO projections: Type-safe but requires mapping boilerplate and multiple projection classes per entity

See also