Version: V2-Next

Model-Centric Data Flow

Model Management treats DataStructures as living artifacts rather than static files. Every schema has a stable global identity, can be versioned and referenced, and can be projected into multiple consumable representations without modifying the original model.

The core principle: the schema is the contract — between producers (DataSources), transformations (Mappings) and consumers (DataSinks). All views are different windows onto the same contract, tailored to the respective consumer.

[Figure: Model-centric data flow]

Each stage adds information without destroying the previous one: the original schema in the registry remains untouched; views are ephemerally computed projections.

JSON Schema as the system language

All entities in the system — sensor readings, road segments, observations — are modelled as JSON Schema 2020-12 documents. Every schema carries a CORE URN as its $id:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0",
  "title": "GeoPoint",
  "type": "object",
  "required": ["lat", "lon"],
  "properties": {
    "lat": { "type": "number" },
    "lon": { "type": "number" },
    "elevation": { "type": "number" }
  }
}

The URN is not an opaque UUID but a readable, stable identity from which the artifact type, logical identity, version and display name are derived mechanically — see URN Format.
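As a rough illustration of "derived mechanically", the sketch below splits a CORE URN into its parts. The segment layout is an assumption inferred from the example URNs on this page; the authoritative grammar is on the URN Format page. `CoreUrn` and `parse_core_urn` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreUrn:
    artifact_type: str  # e.g. "datastructure"
    logical_id: str     # the URN without its version segment
    version: str        # e.g. "1.0.0"
    display_name: str   # e.g. "GeoPoint"


def parse_core_urn(urn: str) -> CoreUrn:
    # Assumed layout (inferred from the examples, not from the URN Format page):
    # urn:core:platform:<platform>:<artifact-type>:<namespace>:<name>:<version>
    parts = urn.split(":")
    if len(parts) != 8 or parts[:2] != ["urn", "core"]:
        raise ValueError(f"not a CORE URN in the assumed layout: {urn!r}")
    return CoreUrn(
        artifact_type=parts[4],
        logical_id=":".join(parts[:7]),
        version=parts[7],
        display_name=parts[6],
    )


geo = parse_core_urn("urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0")
print(geo.artifact_type, geo.display_name, geo.version)
```

Under these assumptions, the logical identity is simply the URN with the trailing version segment removed.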

Composition via $ref

Schemas reference other schemas through their CORE URN as the $ref value:

{
  "$id": "urn:core:platform:civitas:datastructure:common:TrafficObservation:1.0.0",
  "title": "TrafficObservation",
  "properties": {
    "location": { "$ref": "urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0" },
    "timestamp": { "type": "string", "format": "date-time" },
    "count": { "type": "integer" }
  }
}

This is deliberately not a JSON-Schema-internal reference (#/$defs/…) but an external registry reference. The dependency is explicit, globally addressable and independent of the particular document: TrafficObservation can be stored in any context — the reference to GeoPoint stays stable.
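Because every external dependency is a plain URN string under `$ref`, the dependencies of a schema can be collected with a simple tree walk. The following is a minimal sketch, not the backend's implementation; `extract_ref_urns` is a hypothetical helper, and treating every `urn:core:` value under a `$ref` key as a registry reference is an assumption.

```python
from typing import Any


def extract_ref_urns(node: Any) -> set[str]:
    """Collect every CORE URN that appears as a $ref value anywhere in the schema."""
    found: set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str) and value.startswith("urn:core:"):
                found.add(value)
            else:
                found |= extract_ref_urns(value)
    elif isinstance(node, list):
        for item in node:
            found |= extract_ref_urns(item)
    return found


traffic_observation = {
    "$id": "urn:core:platform:civitas:datastructure:common:TrafficObservation:1.0.0",
    "title": "TrafficObservation",
    "properties": {
        "location": {"$ref": "urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0"},
        "timestamp": {"type": "string", "format": "date-time"},
        "count": {"type": "integer"},
    },
}
print(extract_ref_urns(traffic_observation))
```

Document-internal references such as `#/$defs/…` are ignored by this walk, since they do not start with `urn:core:`.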

The import flow

[Figure: Schema import flow]

POST /api/v1/datastructures runs four steps:

  1. Validate — networknt validates the document against the JSON Schema 2020-12 meta-schema (structure). x-core-ref is a non-validating annotation here; a registry-aware step then checks that every concrete x-core-ref foreign-key target exists in the registry (ReferenceExistenceValidator, diagnostic unresolved-core-ref) — skipped when no registry is configured.
  2. Normalise — an existing CORE URN $id is respected; otherwise the backend generates one from title. All $ref URNs are extracted.
  3. Store in the registry — a transactional write: upsert the artifact, insert a new artifact_version, and store the $ref URNs as artifact_reference edges (the full graph, including cycles). A content hash keeps re-imports of identical content idempotent.
  4. Update the dependency graph — forward and reverse edges in memory.

The PostgreSQL registry is the primary persistence layer, not just a cache. The dependency graph is reconstructed at backend startup from the schema content ($ref URNs) and the stored artifact_reference edges.
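The idempotency in step 3 can be pictured with a small in-memory stand-in for the registry. This is only a sketch: the real registry is PostgreSQL, and the hash algorithm (SHA-256) and the canonicalisation (sorted keys, compact separators) are assumptions, as are the names `content_hash` and `InMemoryRegistry`.

```python
import hashlib
import json


def content_hash(schema: dict) -> str:
    # Assumption: hash a canonical serialisation so that key order does not matter.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemoryRegistry:
    """Hypothetical stand-in for the artifact / artifact_version tables."""

    def __init__(self) -> None:
        self.versions: dict[str, list[tuple[str, dict]]] = {}

    def import_schema(self, schema: dict) -> bool:
        """Store the schema; return False if identical content is already the latest entry."""
        urn = schema["$id"]
        digest = content_hash(schema)
        history = self.versions.setdefault(urn, [])
        if history and history[-1][0] == digest:
            return False  # re-import of identical content: nothing to write
        history.append((digest, schema))
        return True


registry = InMemoryRegistry()
geo_point = {
    "$id": "urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0",
    "type": "object",
}
print(registry.import_schema(geo_point))  # first import writes a version
print(registry.import_schema(geo_point))  # identical re-import is a no-op
```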

Recommendation

Declare the CORE URN as $id in the source document (like the examples in schemas/examples/). The document is then fully self-describing and independent of the import context.

Validation on two levels

Level A — schema syntax (on import): is this JSON a valid JSON Schema? Fails on invalid type, missing required, wrong format.

Level B — reference consistency: do the artifact's references point at things that actually exist? DataSet validation checks the manifest's structure against the bundled CORE DataSet schema:

POST /api/v1/datasets/validate (dry run, always 200 with {valid, diagnostics})
PUT /api/v1/datasets?id=… (same check on save, 400 on errors)

x-core-ref is a typed foreign-key annotation — it marks a string field as holding the CORE URN of another artifact:

{
  "stehtAn": {
    "type": "string",
    "x-core-ref": {
      "type": "urn:core:platform:civitas:datastructure:common:Strasse:1.0.0"
    }
  }
}

x-core-ref is a non-validating annotation, not a validating JSON Schema keyword. Instead, on DataStructure import or update, a registry-aware step checks that every concrete x-core-ref target exists in the registry (ReferenceExistenceValidator, diagnostic unresolved-core-ref). The check is skipped for urn:core:type:<Kind> category markers and when no registry is configured. The full semantics are in the CORE-IR Reference.
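The existence check can be sketched as follows. This is not the actual ReferenceExistenceValidator; the function names and the shape of the diagnostic dictionaries are assumptions, and the registry is reduced to a set of known URNs (or `None` when no registry is configured).

```python
from typing import Any, Optional

TYPE_MARKER_PREFIX = "urn:core:type:"  # category markers are not checked


def collect_core_refs(node: Any) -> list[str]:
    """Return the "type" target of every x-core-ref annotation in the schema."""
    targets: list[str] = []
    if isinstance(node, dict):
        annotation = node.get("x-core-ref")
        if isinstance(annotation, dict) and isinstance(annotation.get("type"), str):
            targets.append(annotation["type"])
        for value in node.values():
            targets.extend(collect_core_refs(value))
    elif isinstance(node, list):
        for item in node:
            targets.extend(collect_core_refs(item))
    return targets


def unresolved_core_refs(schema: dict, registry_urns: Optional[set]) -> list[dict]:
    if registry_urns is None:
        return []  # no registry configured: the check is skipped
    return [
        {"code": "unresolved-core-ref", "target": target}  # assumed diagnostic shape
        for target in collect_core_refs(schema)
        if not target.startswith(TYPE_MARKER_PREFIX) and target not in registry_urns
    ]


schema = {
    "properties": {
        "stehtAn": {
            "type": "string",
            "x-core-ref": {"type": "urn:core:platform:civitas:datastructure:common:Strasse:1.0.0"},
        }
    }
}
print(unresolved_core_refs(schema, registry_urns=set()))
```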

The dependency graph

In parallel to persistence, the dependency graph keeps all dependencies in memory, bidirectionally and version-precise. Its nodes are versioned URNs, and the queries below report concrete versioned IDs (a logical or :latest query resolves to the current version):

| Query | Description | HTTP endpoint |
| --- | --- | --- |
| Dependencies | What does this schema need directly? | GET /api/v1/datastructures/dependencies?id=… |
| Dependents | Who directly depends on this schema? | GET /api/v1/datastructures/dependents?id=… |
| Transitive dependencies | All recursive dependencies (BFS forward) | GET /api/v1/datastructures/transitive-dependencies?id=… |
| Impact | All schemas transitively affected by a change (BFS backward) | GET /api/v1/datastructures/impact?id=… |

Delete protection: DELETE /api/v1/datastructures?id=… fails with HTTP 409 if dependents still exist. The response contains the list of blocking schemas.
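The four queries and the delete check map naturally onto two adjacency maps and a breadth-first search. The sketch below illustrates that idea under stated assumptions; `DependencyGraph` and its method names are hypothetical, and the short IDs `A`, `B`, `C` stand in for versioned URNs.

```python
from collections import defaultdict, deque


class DependencyGraph:
    def __init__(self) -> None:
        self.forward = defaultdict(set)  # schema -> schemas it references
        self.reverse = defaultdict(set)  # schema -> schemas that reference it

    def add(self, urn: str, ref_urns: set) -> None:
        for ref in ref_urns:
            self.forward[urn].add(ref)
            self.reverse[ref].add(urn)

    def _bfs(self, edges, start: str) -> set:
        seen, queue = set(), deque([start])
        while queue:
            for nxt in edges.get(queue.popleft(), ()):
                if nxt != start and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def transitive_dependencies(self, urn: str) -> set:
        return self._bfs(self.forward, urn)  # BFS forward

    def impact(self, urn: str) -> set:
        return self._bfs(self.reverse, urn)  # BFS backward

    def can_delete(self, urn: str) -> bool:
        # Mirrors the 409 rule: deletion is blocked while direct dependents exist.
        return not self.reverse.get(urn)


graph = DependencyGraph()
graph.add("A", {"B"})
graph.add("B", {"C"})
print(graph.transitive_dependencies("A"), graph.impact("C"), graph.can_delete("B"))
```

Tracking visited nodes keeps the search finite when the stored graph contains cycles.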

The two views

Two qualitatively different views can be generated from the same stored schema, both on demand and without modifying the original.

a. Dereferenced view

Goal: a single, standalone document without external dependencies.

GET /api/v1/datastructures/views/dereferenced?id=…

All CORE URN $ref values are recursively replaced with the referenced schema:

{
  "title": "TrafficObservation",
  "properties": {
    "location": {
      "type": "object",
      "properties": {
        "lat": { "type": "number" },
        "lon": { "type": "number" }
      }
    },
    "timestamp": { "type": "string", "format": "date-time" },
    "count": { "type": "integer" }
  }
}

Cycle protection: if a URN reappears during inlining, the $ref node is kept.

Benefit: can be validated without a registry. Suitable for code generation, Swagger/OpenAPI, flow translators, external systems without registry access.
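Recursive inlining with cycle protection might look roughly like the sketch below. The registry is modelled as a plain dictionary from URN to schema, the function names are hypothetical, and two details are assumptions rather than documented behaviour: the inlined copy drops `$id` and `$schema`, and sibling keywords next to a `$ref` are ignored.

```python
import copy
from typing import Any


def dereference(node: Any, registry: dict, in_progress: frozenset) -> Any:
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref in registry:
            if ref in in_progress:
                return copy.deepcopy(node)  # cycle: keep the $ref node as-is
            inlined = dereference(registry[ref], registry, in_progress | {ref})
            inlined.pop("$id", None)      # assumption: inlined copies lose their identity
            inlined.pop("$schema", None)
            return inlined
        return {key: dereference(value, registry, in_progress) for key, value in node.items()}
    if isinstance(node, list):
        return [dereference(item, registry, in_progress) for item in node]
    return node


def dereferenced_view(urn: str, registry: dict) -> dict:
    # The root URN counts as "in progress", so a reference back to it is kept.
    return dereference(registry[urn], registry, frozenset({urn}))


GEO = "urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0"
OBS = "urn:core:platform:civitas:datastructure:common:TrafficObservation:1.0.0"
registry = {
    GEO: {"$id": GEO, "type": "object", "properties": {"lat": {"type": "number"}, "lon": {"type": "number"}}},
    OBS: {"$id": OBS, "title": "TrafficObservation", "properties": {"location": {"$ref": GEO}}},
}
print(dereferenced_view(OBS, registry)["properties"]["location"])
```

The stored schemas in `registry` are never mutated; the view is built from fresh dictionaries.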

b. Bundled view

Goal: bundle all transitive dependencies into a single document — as JSON Schema embedded resources under $defs. Each embedded dependency keeps its $id; CORE URN $refs stay absolute and resolve against these $ids.

GET /api/v1/datastructures/views/bundled?id=…&depth=…

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "urn:core:platform:civitas:datastructure:common:TrafficObservation:1.0.0",
  "properties": {
    "location": { "$ref": "urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0" }
  },
  "$defs": {
    "GeoPoint": {
      "$id": "urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0",
      "type": "object",
      "properties": { "lat": { "type": "number" }, "lon": { "type": "number" } }
    }
  }
}

Difference from dereferenced: the types remain named, identity-bearing references and are not inlined. The document is more compact, correct for schemas with types used multiple times, and stays self-contained even for dep→dep references (chains, diamonds, cycles).

Benefit: portable, valid JSON Schema 2020-12. Can be handed to external tools, IDEs and validators without registry access. Preserves type identity.
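A bundling pass can be sketched as a breadth-first walk that copies each reachable schema into `$defs` while leaving every `$ref` untouched. This is an illustration under assumptions, not the backend's code: the registry is a dictionary from URN to schema, the `$defs` key is taken from the schema's `title` (falling back to the URN), and the `depth` parameter is not modelled.

```python
import copy
from collections import deque
from typing import Any, Iterator


def _ref_urns(node: Any) -> Iterator[str]:
    """Yield every string found under a $ref key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from _ref_urns(value)
    elif isinstance(node, list):
        for item in node:
            yield from _ref_urns(item)


def bundled_view(urn: str, registry: dict) -> dict:
    root = copy.deepcopy(registry[urn])
    defs = root.setdefault("$defs", {})
    seen, queue = {urn}, deque([urn])
    while queue:
        current = queue.popleft()
        for ref in _ref_urns(registry[current]):
            if ref in registry and ref not in seen:
                seen.add(ref)
                queue.append(ref)
                # Each embedded resource keeps its $id, so absolute $refs resolve against it.
                defs[registry[ref].get("title", ref)] = copy.deepcopy(registry[ref])
    return root


GEO = "urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0"
OBS = "urn:core:platform:civitas:datastructure:common:TrafficObservation:1.0.0"
registry = {
    GEO: {"$id": GEO, "title": "GeoPoint", "type": "object"},
    OBS: {"$id": OBS, "title": "TrafficObservation", "properties": {"location": {"$ref": GEO}}},
}
print(sorted(bundled_view(OBS, registry)["$defs"]))
```

Because visited URNs are tracked, a type that is referenced several times is embedded once, and cyclic references terminate.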

The resolve endpoint

Both views as well as metadata can be queried in a single call (JSON:API include / OData $expand style):

GET /api/v1/datastructures/resolve
?ids=urn:…:TrafficObservation:1.0.0,urn:…:GeoPoint:1.0.0
&include=schema,bundled,dereferenced,dependencies,transitive

| include value | Returns |
| --- | --- |
| schema | Raw JSON Schema from the registry (default) |
| dereferenced | All $refs recursively resolved inline |
| bundled | Dependencies as embedded resources in $defs |
| dependencies | Direct dependencies (1 level) |
| transitive | All transitive dependencies (BFS) |

Fields that are not requested are omitted — the client only pays for what it needs. See the resolve endpoint for details.
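A client can assemble such a request with the standard library. The sketch below only builds the URL and sends nothing; the base URL is a placeholder, `resolve_url` is a hypothetical helper, and leaving `:` and `,` unescaped in the query string is an assumption about what the server accepts.

```python
from urllib.parse import urlencode


def resolve_url(base_url: str, ids: list, include: list) -> str:
    # Both parameters are comma-separated lists, as in the request shown above.
    query = urlencode({"ids": ",".join(ids), "include": ",".join(include)}, safe=":,")
    return f"{base_url}/api/v1/datastructures/resolve?{query}"


url = resolve_url(
    "http://localhost:8080",  # hypothetical host and port
    ids=[
        "urn:core:platform:civitas:datastructure:common:TrafficObservation:1.0.0",
        "urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0",
    ],
    include=["schema", "bundled"],
)
print(url)
```

Requesting only `schema` and `bundled` here means the other fields are left out of the response.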

Dependencies between artifacts

Not only schemas depend on each other — all artifact types have relationships:

[Figure: Artifact relationships]

The DataSet manifest is the bracketing document: it references all artifacts by their URNs but stores no content inline. On GET /api/v1/datasets?id=… the backend returns the manifest with all sub-resource URNs resolved from the registry.

Self-describing artifacts: $schema in all types

All stored artifacts carry a $schema field that declares their type — consistent with the JSON Schema convention. This makes every exported JSON file readable without context:

| Artifact type | $schema | Identity |
| --- | --- | --- |
| DataStructure | https://json-schema.org/draft/2020-12/schema | $id |
| DataSet | https://civitasconnect.digital/core-dataset/v1 | id |
| Mapping | https://civitasconnect.digital/core/mapping/v1 | id |
| Pipeline | https://civitasconnect.digital/core/pipeline/v1 | id |
| DataSource | https://civitasconnect.digital/core/datasource/v1 | id |
| DataSink | https://civitasconnect.digital/core/datasink/v1 | id |
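With this convention, a consumer can classify an exported file from its $schema value alone. The lookup below is a minimal sketch built from the table; `identify_artifact` is a hypothetical helper, and the Mapping identifier in the usage example is made up for illustration.

```python
# $schema value -> (artifact type, name of the identity field), taken from the table above.
SCHEMA_TO_TYPE = {
    "https://json-schema.org/draft/2020-12/schema": ("DataStructure", "$id"),
    "https://civitasconnect.digital/core-dataset/v1": ("DataSet", "id"),
    "https://civitasconnect.digital/core/mapping/v1": ("Mapping", "id"),
    "https://civitasconnect.digital/core/pipeline/v1": ("Pipeline", "id"),
    "https://civitasconnect.digital/core/datasource/v1": ("DataSource", "id"),
    "https://civitasconnect.digital/core/datasink/v1": ("DataSink", "id"),
}


def identify_artifact(document: dict) -> tuple:
    """Return (artifact type, identity) for an exported artifact document."""
    schema_uri = document.get("$schema")
    if schema_uri not in SCHEMA_TO_TYPE:
        raise ValueError(f"unknown or missing $schema: {schema_uri!r}")
    artifact_type, identity_field = SCHEMA_TO_TYPE[schema_uri]
    return artifact_type, document.get(identity_field)


geo_point = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0",
}
example_mapping = {
    "$schema": "https://civitasconnect.digital/core/mapping/v1",
    "id": "example-mapping-id",  # hypothetical identifier, for illustration only
}
print(identify_artifact(geo_point))
print(identify_artifact(example_mapping))
```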

The meta-schemas themselves are served by the API:

GET /api/v1/core-ir/meta-schemas → list of all available types
GET /api/v1/core-ir/meta-schemas/{type} → JSON Schema for the type

They are also published as browsable HTML — see CORE-IR Schema Documentation in the API section of this site (also linked in the footer).