Version: V2-Next

PostgreSQL Artifact Registry

Overview

Model Management persists all model artifacts in a PostgreSQL-backed artifact registry that it owns. Flyway creates and migrates the schema automatically on startup. The application needs a configured datasource for normal operation; without one it falls back to the degraded no-op client described under Service boundary.

The registry stores:

- the logical identity of each artifact (format-independent)
- immutable artifact versions, each with an authored primary format
- per-version, per-format representations: JSON Schema, raw XSD, core-json
- the XSD namespace lookup
- the reference graph between artifacts

API DTOs are projections built at the boundary from the stored artifact, not ORM entities. This avoids duplicated truth: MappingDto.source is derived from config.source, PipelineDto.mappingRefs from the pipeline nodes, etc. — the artifact document remains canonical and the projections cannot drift from it.

Design principles

CORE URNs are the stable public identity of every artifact — the format is never part of the identity. The same DataStructure may be authored as JSON Schema in one version and as XSD in another.
Artifact payloads are canonical JSON documents, stored as jsonb.
XSD payloads are stored as raw text; a JSON Schema representation is generated from them on read (and cached).
Each version records its authored primary format; the format-specific content lives in artifact_representation (one row per format).
Versioning is explicit, backend-owned and modelled in the database.
References are stored relationally for fast dependency/impact queries, and the complete graph is kept — cycles included.
Persistence access uses a thin JdbcClient repository layer; large dynamic JSON trees are stored as jsonb/text rather than decomposed into entity fields (no JPA/Hibernate).

Tables

The schema lives in db/migration/V1__artifact_registry.sql (Flyway).

`artifact` — logical identity

create table artifact (
  id              uuid primary key,
  logical_urn     text not null unique,
  artifact_type   text not null,   -- datastructure | dataset | mapping | pipeline | datasource | datasink
  name            text not null,
  title           text,
  description     text,
  current_version text,            -- the default read version
  created_at      timestamptz not null,
  updated_at      timestamptz not null
);

artifact_type is the functional CORE category. Identity is format-agnostic, so the format is not stored here. Both JSON Schema and XSD DataStructures have artifact_type = 'datastructure'; the format is recorded per version (artifact_version.primary_format) and per representation (below). The logical_urn always uses the datastructure segment; a :xsd: artifact-type URN is rejected with HTTP 400.

Indexes: artifact_type, name, updated_at.

`artifact_version` — immutable version metadata

create table artifact_version (
  id             uuid primary key,
  artifact_id    uuid not null references artifact(id) on delete cascade,
  version        text not null,
  primary_format text not null,   -- jsonschema | core-json | xsd
  title          text,            -- versioned metadata (set on rename)
  description    text,
  created_at     timestamptz not null,
  created_by     text,
  unique (artifact_id, version)
);

A version records its authored (primary) format and points at its representations; the content lives in artifact_representation.
primary_format drives DataStructureDto.format and the default of GET /schema (without a format parameter).
title/description are versioned metadata set by a rename: a rename creates a new patch version with the new title, format-independently (so XSD versions are renamable too); older versions keep their own title. DataStructureDto.title is the version's title when set, else the URN name segment.
artifact.current_version points at the latest version.

`artifact_representation` — per-version, per-format content

create table artifact_representation (
  id            uuid primary key,
  version_id    uuid not null references artifact_version(id) on delete cascade,
  format        text not null,   -- jsonschema | core-json | xsd
  content_type  text,
  content_jsonb jsonb,
  content_text  text,
  content_hash  text,
  generation    text not null default 'stored',  -- stored | generated
  created_at    timestamptz not null,
  unique (version_id, format)
);

One row per format of a version. JSON content goes into content_jsonb; raw XSD goes into content_text.
Each version has exactly one authored representation (generation = 'stored'), in its primary_format. A JSON Schema derived from an XSD may later be persisted with generation = 'generated'; a generated representation never counts as authored content.
content_hash makes repeated imports of identical authored content idempotent (no spurious new version).
unique (version_id, format) permits at most one representation per format per version; a different format in a new version is always allowed (v1 XSD, v2 JSON-only is valid).
A GIN index on content_jsonb supports payload-level search.

`artifact_reference` — dependency edges

create table artifact_reference (
  id                 uuid primary key,
  from_version_id    uuid not null references artifact_version(id) on delete cascade,
  target_urn         text not null,
  target_artifact_id uuid references artifact(id) on delete set null,
  target_version_id  uuid references artifact_version(id) on delete set null,
  reference_type     text not null,  -- schema-ref | dataset-ref | pipeline-node | xsd-import | mapping-source | mapping-target | datasource-ref | datasink-ref
  reference_name     text,
  sort_order         int,
  created_at         timestamptz not null
);

The complete reference graph is stored, including cycles.
target_urn is stored verbatim as authored — a pinned …:1.0.0 or the …:latest token — never normalised to the logical form.
target_artifact_id resolves the reference to the target artifact's logical identity; target_version_id resolves a pinned reference to its concrete version (null for latest, logical, or a not-yet-imported pin). Both are nullable / on delete set null, and are back-filled when a previously-missing target artifact or target version is later created.
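
The resolution rules above can be sketched as a small function. This is an illustration under stated assumptions, not the registry's actual code: the in-memory `registry` dict, the helper names, and the example URN layout are all hypothetical. The only convention taken from the text is that a pinned version or the `latest` token is the trailing URN segment.

```python
import re
from typing import Optional

SEMVER = re.compile(r"^\d+\.\d+\.\d+$")

def split_target(target_urn: str) -> tuple[str, Optional[str]]:
    """Split an authored target URN into (logical URN, version token or None)."""
    head, _, tail = target_urn.rpartition(":")
    if head and (tail == "latest" or SEMVER.match(tail)):
        return head, tail
    return target_urn, None  # logical reference without a version segment

def resolve_target(target_urn: str, registry: dict[str, dict]) -> tuple[Optional[str], Optional[str]]:
    """Return (target_artifact_id, target_version_id); either may be None.

    `registry` is a hypothetical stand-in for the tables:
    logical URN -> {"id": artifact id, "versions": {version: version id}}.
    """
    logical, token = split_target(target_urn)
    artifact = registry.get(logical)
    if artifact is None:
        return None, None              # target not imported yet, back-filled later
    if token is None or token == "latest":
        return artifact["id"], None    # logical and :latest references stay unpinned
    return artifact["id"], artifact["versions"].get(token)  # None for a missing pin
```

Running the same function again once the missing artifact or version exists corresponds to the back-fill described above; `target_urn` itself is never rewritten.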

`xsd_namespace` — namespace lookup

create table xsd_namespace (
  namespace   text primary key,
  artifact_id uuid not null references artifact(id) on delete cascade,
  version_id  uuid references artifact_version(id) on delete set null,
  updated_at  timestamptz not null
);

A durable namespace→artifact index for XSD-backed DataStructures, so xs:import resolution does not depend on a startup scan completing.
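
A minimal sketch of how an `xs:import` namespace might be resolved with this table. It assumes that CORE URNs start with `urn:core:`; the function name and the dict standing in for `xsd_namespace` are hypothetical.

```python
from typing import Optional

CORE_PREFIX = "urn:core:"  # assumed prefix of CORE URNs

def resolve_import(namespace: str, namespace_index: dict[str, str]) -> Optional[tuple[str, str]]:
    """Resolve an xs:import/@namespace value.

    `namespace_index` stands in for the xsd_namespace table
    (namespace -> artifact_id). Returns a (kind, value) pair or None.
    """
    if namespace.startswith(CORE_PREFIX):
        # A CORE-URN namespace links the target DataStructure directly.
        return ("urn", namespace)
    artifact_id = namespace_index.get(namespace)
    if artifact_id is None:
        return None  # unknown classic namespace: the edge stays unresolved
    return ("artifact_id", artifact_id)
```

Because the lookup reads a persisted table rather than an in-memory map, it works before any startup scan has finished.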

DTO mapping

DTOs are built from artifact plus the selected artifact_version.

| DTO | Stored | Derived on read |
| --- | --- | --- |
| `DataStructureDto` | the version's representation (`content_jsonb`/`content_text`, `content_type`), `artifact_version.primary_format` and `title` | `id` (versioned URN), `logicalId`, `version`, `title` (version title metadata when set, else the URN name segment), `format` (= primary format), `availableFormats` (stored + derivable; an XSD version also lists `jsonschema`) |
| `DataSetDto` | manifest in `content_jsonb` (id/title/version/`*Refs`) | `dataset-ref` rows are a derived index of the manifest's `*Refs` |
| `MappingDto` | `config` = full document | `source` = `config.source`, `target` = `config.target` |
| `PipelineDto` | `config` = full document | `dataSourceRefs`/`dataSinkRefs`/`mappingRefs` from `config.nodes[]` by kind |
| `DataSourceDto` / `DataSinkDto` | `config` = full document | `connectionType` = `config.connectionType` |
| `Envelope<T>` | (nothing) | wraps a DTO with dependencies/dependents (graph), versions, and a `compatibility` marker (always `"NONE"`) |
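
The "derived on read" column can be illustrated with two projections. This is a sketch, not the service's DTO code: the node fields `kind` and `ref` and the kind values `source`, `sink` and `mapping` are assumptions made for the example.

```python
def mapping_dto(config: dict) -> dict:
    """Project a Mapping document; source/target are read from config, not stored twice."""
    return {
        "config": config,
        "source": config.get("source"),
        "target": config.get("target"),
    }

def pipeline_dto(config: dict) -> dict:
    """Project a Pipeline document; the ref lists are grouped from config.nodes[] by kind."""
    # Hypothetical node shape: {"kind": "source" | "sink" | "mapping", "ref": "<urn>"}
    fields = {"source": "dataSourceRefs", "sink": "dataSinkRefs", "mapping": "mappingRefs"}
    dto = {"config": config, "dataSourceRefs": [], "dataSinkRefs": [], "mappingRefs": []}
    for node in config.get("nodes", []):
        field = fields.get(node.get("kind"))
        if field is not None and "ref" in node:
            dto[field].append(node["ref"])
    return dto
```

Since every derived field is recomputed from `config` on each read, a projection cannot disagree with the stored document.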

Reference extraction

Edges are derived from the artifact content when it is written and persisted as artifact_reference rows:

| `reference_type` | Source |
| --- | --- |
| `schema-ref` | `$ref` values (CORE URNs) in a JSON Schema |
| `xsd-import` | `xs:import/@namespace`: a CORE-URN namespace links that DataStructure directly (any format); a classic XML namespace is resolved via `xsd_namespace` |
| `mapping-source` / `mapping-target` | `config.source` / `config.target` of a Mapping |
| `pipeline-node` | source/sink/mapping refs of `config.nodes[]` |
| `dataset-ref`, `datasource-ref`, `datasink-ref` | the DataSet manifest's `*Refs` arrays |

The DataSet manifest stays the canonical document; the relational rows are a derived index, written in the same transaction.
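
As an illustration of the `schema-ref` case, the sketch below walks a JSON Schema and collects `$ref` values that are CORE URNs. It assumes CORE URNs start with `urn:core:`; the function name and the shape of the returned rows are hypothetical and only mirror a few `artifact_reference` columns.

```python
CORE_PREFIX = "urn:core:"  # assumed prefix of CORE URNs

def extract_schema_refs(schema) -> list[dict]:
    """Collect CORE-URN $ref values from a JSON Schema as edge rows.

    Local refs such as "#/$defs/x" are skipped; duplicates collapse to one edge.
    The target URN is kept verbatim (pinned or :latest).
    """
    found: dict[str, None] = {}  # dict keeps first-seen order

    def walk(node) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "$ref" and isinstance(value, str):
                    if value.startswith(CORE_PREFIX):
                        found.setdefault(value, None)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(schema)
    return [
        {"reference_type": "schema-ref", "target_urn": urn, "sort_order": index}
        for index, urn in enumerate(found)
    ]
```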

Versioning

Versioning is owned by Model Management — clients never choose the next version directly, they declare how it should be bumped.

The logical URN identifies the artifact independent of version; the versioned URN identifies one immutable version.
The first version of a new artifact is 1.0.0.
An update creates a new artifact_version; the number is computed from current_version and the requested bump:
```
1.2.3 + patch -> 1.2.4
1.2.3 + minor -> 1.3.0
1.2.3 + major -> 2.0.0
```
Older versions are never overwritten; current_version is the default read.
The bump is request metadata, not part of the payload — a query parameter on the update endpoints:
```
PUT /api/v1/datastructures?id=urn:core:…&versionBump=minor
```
A request targeting a versioned URN resolves the logical identity and bumps from current_version; the version segment of the URN is not used as the new version number.
A rename (PATCH) is a patch-version bump that records the new title/description as versioned metadata, format-independently — so XSD DataStructures are renamable too (the XSD content is unchanged); older versions keep their title.
A read id (and a reference) may use the :latest token, resolving to the current (highest-SemVer) version. :latest is never a writable identity — a client-supplied :latest (like a legacy :xsd:) identity is rejected with 400.
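
The bump arithmetic and the `:latest` resolution described above can be sketched as follows. The function names are hypothetical, and the sketch assumes plain `major.minor.patch` version strings without pre-release or build suffixes.

```python
def next_version(current: str, bump: str) -> str:
    """Compute the next version from current_version and the requested bump."""
    major, minor, patch = (int(part) for part in current.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump: {bump}")

def resolve_latest(versions: list[str]) -> str:
    """Pick the highest SemVer, comparing numerically rather than as text."""
    return max(versions, key=lambda v: tuple(int(part) for part in v.split(".")))
```

Comparing the parts as integers matters: as strings, `1.9.0` would sort above `1.10.0`.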

There is no schema-compatibility calculation: the Envelope.compatibility field is always "NONE" and there are no compatibility endpoints.

Write flow

Every artifact write runs in one transaction:

1. Normalise the incoming ID to a logical CORE URN.
2. Validate the payload with the existing validation services.
3. Reject a :xsd: artifact-type URN with HTTP 400 (before any write).
4. Upsert artifact (insert on first sight; first version 1.0.0).
5. If the current version's authored representation in the same format has an identical content hash → no-op (idempotent). A different format is always a new version.
6. Otherwise compute the next SemVer version and insert a new artifact_version with primary_format = the written format.
7. Insert the version's authored artifact_representation (generation = 'stored').
8. Replace the version's artifact_reference rows from the extracted references.
9. Advance artifact.current_version.
10. For XSDs, upsert the xsd_namespace row (scoped to the version).

A storage failure rolls the whole write back and surfaces as HTTP 502.
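
The core of the write flow (rejection, idempotency check, version bump) can be sketched with an in-memory stand-in. This is not the PostgreSQL implementation: the class, its dict layout and the example URNs are hypothetical, and validation, reference extraction, the namespace upsert and transactional rollback are left out.

```python
import hashlib

def _bump(current: str, bump: str) -> str:
    major, minor, patch = (int(part) for part in current.split("."))
    return {
        "major": f"{major + 1}.0.0",
        "minor": f"{major}.{minor + 1}.0",
        "patch": f"{major}.{minor}.{patch + 1}",
    }[bump]

class InMemoryRegistry:
    """Hypothetical stand-in for artifact / artifact_version / artifact_representation."""

    def __init__(self) -> None:
        # logical URN -> {"current": version, "versions": {version: {...}}}
        self.artifacts: dict[str, dict] = {}

    def write(self, logical_urn: str, fmt: str, content: str, bump: str = "patch") -> str:
        # Reject unwritable identities before touching any state (HTTP 400 in the API).
        if ":xsd:" in logical_urn or logical_urn.endswith(":latest"):
            raise ValueError("400: not a writable identity")
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        artifact = self.artifacts.setdefault(logical_urn, {"current": None, "versions": {}})
        current = artifact["current"]
        if current is None:
            version = "1.0.0"  # first version of a new artifact
        else:
            stored = artifact["versions"][current]
            if stored["format"] == fmt and stored["hash"] == digest:
                return current  # identical authored content: idempotent no-op
            version = _bump(current, bump)
        artifact["versions"][version] = {"format": fmt, "hash": digest, "content": content}
        artifact["current"] = version  # advance the default read version
        return version
```

Writing the same content twice returns the same version, while the same bytes written in a different format produce a new version, as the idempotency step requires.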

Read flow

Versioned URN → load that artifact_version.
Logical URN → load artifact.current_version's version.
Pick the representation: GET /schema without a format returns the authored (primary) representation; ?format=json-schema returns the JSON Schema (an XSD version is converted on read); ?format=xsd returns the raw XSD (404 if the version has no XSD representation). formatsOf(urn) reports the stored formats plus the derivable jsonschema for XSD versions.
Convert to the API DTO (or raw JSON / raw XSD) at the boundary.
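
A sketch of the representation choice in the read flow. The dict layout of a version, the `NotFound` exception (standing in for HTTP 404) and the injected `xsd_to_json_schema` converter are assumptions for illustration; the real conversion is not shown here.

```python
from typing import Callable, Optional

class NotFound(Exception):
    """Stands in for an HTTP 404 response."""

def formats_of(version: dict) -> list[str]:
    """Stored formats plus the derivable jsonschema for XSD versions."""
    formats = list(version["representations"])
    if "xsd" in formats and "jsonschema" not in formats:
        formats.append("jsonschema")
    return formats

def pick_representation(
    version: dict,
    requested: Optional[str],
    xsd_to_json_schema: Callable[[str], str],
) -> tuple[str, str]:
    """Return (format, content) for GET /schema with an optional format parameter."""
    stored = version["representations"]
    if requested is None:
        fmt = version["primary_format"]  # default: the authored representation
        return fmt, stored[fmt]
    if requested == "xsd":
        if "xsd" not in stored:
            raise NotFound("version has no XSD representation")
        return "xsd", stored["xsd"]
    if requested == "json-schema":
        if "jsonschema" in stored:
            return "jsonschema", stored["jsonschema"]
        if "xsd" in stored:
            return "jsonschema", xsd_to_json_schema(stored["xsd"])  # converted on read
        raise NotFound("no JSON Schema available for this version")
    raise ValueError(f"unsupported format: {requested}")
```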

Conditional requests (ETags)

The same content_hash that makes writes idempotent doubles as the HTTP ETag. ArtifactRegistryClient.contentHash(urn, format) returns the hash of the stored representation a read would serve (the requested format, or the authored format when omitted); a derived representation has no stored hash and so no ETag.

The API layer turns this into the conditional-request contract — a strong ETag "sha256:<hex>" on single-artifact reads, If-None-Match → 304, and an optional If-Match → 412 on writes — see Conditional requests (ETags). Because the hash is over the serialised authored content it is byte-stable but not canonical: two semantically equal JSON documents with different key order can hash differently (acceptable for caching and lost-update detection).
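
The contract can be sketched in a few lines. The sketch assumes the hash is a SHA-256 over the UTF-8 bytes of the serialised content; the function names and the tuple return values are hypothetical, and only the `"sha256:<hex>"` tag shape and the 304/412 outcomes come from the text above.

```python
import hashlib
import json
from typing import Optional

def etag(serialised: str) -> str:
    """Strong ETag over the serialised authored content."""
    return '"sha256:' + hashlib.sha256(serialised.encode("utf-8")).hexdigest() + '"'

def conditional_get(serialised: str, if_none_match: Optional[str]):
    """Return (status, etag, body); 304 carries no body."""
    tag = etag(serialised)
    if if_none_match == tag:
        return 304, tag, None
    return 200, tag, serialised

def conditional_put(stored: str, if_match: Optional[str]) -> int:
    """Return 412 when an If-Match header is present and stale, else 200."""
    if if_match is not None and if_match != etag(stored):
        return 412
    return 200

# Byte-stable but not canonical: same data, different key order, different tag.
first = json.dumps({"a": 1, "b": 2})
second = json.dumps({"b": 2, "a": 1})
tags_differ = etag(first) != etag(second)
```

A canonical hash would need key-sorted serialisation before hashing; the text above accepts the non-canonical form as sufficient for caching and lost-update detection.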

Stage workflow (M3)

artifact_version.state carries a per-version lifecycle state (default released). ArtifactRegistryClient.transitionVersion moves a version along the fixed draft → review → approved → released → deprecated → retired path in a single transaction, appending an artifact_version_event audit row and — on release — advancing the artifact's released_version pointer (read back by resolveReleased). Transitions are idempotent (a no-op when already in the target state) and reject disallowed moves.

The workflow is opt-in (modelforge.workflow.enabled, default false): off, every write is stored released, so current_version is always the released version (status quo); on, writes start as draft. See Stage workflow (M3).
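
A sketch of the transition rule as a small state machine. It assumes that only the immediate next state on the path is an allowed move; the text above does not spell out the allowed-move table, so treat that as a guess. The dict layout and function name are hypothetical.

```python
PATH = ["draft", "review", "approved", "released", "deprecated", "retired"]

def transition_version(artifact: dict, version: str, target: str, events: list) -> bool:
    """Move one version to `target`; return False for an idempotent no-op.

    Assumption: only the next state on PATH is allowed.
    `events` stands in for the artifact_version_event audit table.
    """
    record = artifact["versions"][version]
    current = record["state"]
    if target == current:
        return False  # already in the target state
    if target not in PATH or PATH.index(target) != PATH.index(current) + 1:
        raise ValueError(f"disallowed transition: {current} -> {target}")
    record["state"] = target
    events.append({"version": version, "from": current, "to": target})
    if target == "released":
        artifact["released_version"] = version  # advance the released pointer
    return True
```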

Dependency graph & cache

The durable source of dependency edges is artifact_reference. Two in-memory indexes serve runtime queries and are rebuilt from PostgreSQL on startup (on ApplicationReadyEvent, after Flyway has migrated):

DependencyGraphService — a version-precise graph whose nodes are versioned URNs and whose edges are kept verbatim (pinned or :latest); rebuilt from each artifact's current version on startup. It serves dependencies, dependents, impact and transitive queries — reporting concrete versioned IDs (a logical or :latest query resolves to the current version) — and tolerates cycles.
ModelStore — the DataSet read cache, reloaded from the registry.

Both are restored after a restart because their source data lives in PostgreSQL.
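
Cycle tolerance in the transitive queries comes down to tracking visited nodes. The sketch below uses a plain adjacency dict in place of DependencyGraphService; the function names and node IDs are hypothetical.

```python
from collections import deque

def transitive_dependencies(graph: dict[str, list[str]], start: str) -> list[str]:
    """Breadth-first walk over outgoing edges; the visited set makes cycles safe."""
    seen = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order

def dependents(graph: dict[str, list[str]], target: str) -> list[str]:
    """Direct dependents: every node with an edge pointing at `target`."""
    return sorted(source for source, targets in graph.items() if target in targets)
```

An impact query is the same walk over the reversed edges.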

Service boundary

The single seam between the domain services and storage is the ArtifactRegistryClient interface. Exactly one of the following implementations is wired at runtime:

PostgresArtifactRegistryClient: active when model-forge.registry.backend is postgres and a datasource is configured.
NoopArtifactRegistryClient: a degraded Null Object, active when model-forge.registry.backend is anything other than postgres or no datasource is configured. Reads return empty results and writes fail with HTTP 503.

Domain behaviour (validation, URN creation, reference extraction, graph rebuild, dereferenced/bundled views, DataSet resolution) stays in the services.
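
The Null Object arrangement can be sketched like this. The method names `fetch`, `list_all` and `store`, the `RegistryUnavailable` exception and the `postgres_factory` parameter are hypothetical; the real interface is larger and is wired by the application framework rather than by a selector function.

```python
from abc import ABC, abstractmethod

class RegistryUnavailable(Exception):
    """Raised by the no-op client on writes; maps to HTTP 503."""
    status = 503

class ArtifactRegistryClient(ABC):
    @abstractmethod
    def fetch(self, urn: str): ...

    @abstractmethod
    def list_all(self, artifact_type: str): ...

    @abstractmethod
    def store(self, urn: str, content: str): ...

class NoopArtifactRegistryClient(ArtifactRegistryClient):
    """Degraded Null Object: reads are empty, writes are refused."""

    def fetch(self, urn: str):
        return None

    def list_all(self, artifact_type: str):
        return []

    def store(self, urn: str, content: str):
        raise RegistryUnavailable("no artifact registry backend configured")

def select_client(backend: str, datasource, postgres_factory) -> ArtifactRegistryClient:
    """Pick exactly one implementation from the configuration."""
    if backend == "postgres" and datasource is not None:
        return postgres_factory(datasource)
    return NoopArtifactRegistryClient()
```

Callers depend only on the interface, so the domain services need no special cases for the degraded mode.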

Implementation notes

Access: a thin JdbcClient repository layer; jsonb is written with cast(:json as jsonb) and read back as text. No JPA/Hibernate.
Migrations: plain-SQL Flyway migrations under db/migration.
Search: scalar metadata (URN, name, title, description, type) first; a GIN index on content_jsonb is available for payload-level queries.
Tests: Testcontainers-PostgreSQL integration tests (*IT, run in the postgres-it CI job) exercise store/fetch/list/version/delete for every type, backend SemVer, content-hash idempotency, XSD import + namespace lookup, the full cyclic graph, transactional rollback, DataSet resolution, the dependency queries, pipeline publication, format-agnostic identity (:xsd: rejection, formatsOf, a per-version format change), cross-format links via a CORE-URN xs:import namespace, and durability across a Model Management and PostgreSQL restart.
