Model-Centric Data Flow
Model Management treats DataStructures as living artifacts rather than static files. Every schema has a stable global identity, is versionable and referenceable, and can be projected into multiple consumable representations without modifying the original model.
The core principle: the schema is the contract — between producers (DataSources), transformations (Mappings) and consumers (DataSinks). All views are different windows onto the same contract, tailored to the respective consumer.
Each stage adds information without destroying the previous one: the original schema in the registry remains untouched, and views are projections computed on demand.
JSON Schema as the system language
All entities in the system — sensor readings, road segments, observations — are
modelled as JSON Schema 2020-12 documents. Every schema carries a CORE URN
as its $id:
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0",
  "title": "GeoPoint",
  "type": "object",
  "required": ["lat", "lon"],
  "properties": {
    "lat": { "type": "number" },
    "lon": { "type": "number" },
    "elevation": { "type": "number" }
  }
}
```
The URN is not an opaque UUID but a readable, stable identity from which the artifact type, logical identity, version and display name are derived mechanically — see URN Format.
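As an illustration of that mechanical derivation, here is a minimal Python sketch. The segment positions are an assumption inferred only from the GeoPoint example (artifact type in the fifth segment, display name and version as the last two); the authoritative rules are in URN Format. `CoreUrn` and `parse_core_urn` are hypothetical names, not part of the backend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreUrn:
    """Parts of a versioned CORE URN (hypothetical helper, not the backend API)."""
    artifact_type: str  # e.g. "datastructure"
    logical_id: str     # the URN without its trailing version
    version: str        # e.g. "1.0.0"
    display_name: str   # e.g. "GeoPoint"


def parse_core_urn(urn: str) -> CoreUrn:
    # Assumed layout, inferred from the example only:
    # urn:core:platform:<org>:<type>:<namespace>:<name>:<version>
    parts = urn.split(":")
    if len(parts) < 8 or parts[:3] != ["urn", "core", "platform"]:
        raise ValueError(f"not a versioned CORE URN: {urn}")
    return CoreUrn(
        artifact_type=parts[4],
        logical_id=":".join(parts[:-1]),
        version=parts[-1],
        display_name=parts[-2],
    )


urn = parse_core_urn("urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0")
print(urn.artifact_type, urn.display_name, urn.version)  # datastructure GeoPoint 1.0.0
```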
Composition via $ref
Schemas reference other schemas through their CORE URN as the $ref value:
```json
{
  "$id": "urn:core:platform:civitas:datastructure:common:TrafficObservation:1.0.0",
  "title": "TrafficObservation",
  "properties": {
    "location": { "$ref": "urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0" },
    "timestamp": { "type": "string", "format": "date-time" },
    "count": { "type": "integer" }
  }
}
```
This is deliberately not a JSON-Schema-internal reference (#/$defs/…) but
an external registry reference. The dependency is explicit, globally addressable
and independent of the particular document: TrafficObservation can be stored
in any context — the reference to GeoPoint stays stable.
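Telling the two kinds of reference apart is a prefix check. A minimal sketch of the extraction, assuming CORE URN refs can occur at any nesting depth and that only values starting with `urn:core:` count as registry edges; `collect_core_refs` is a hypothetical helper, and the `#/$defs/Helper` ref in the usage is invented for contrast.

```python
from typing import Any


def collect_core_refs(node: Any) -> set[str]:
    """Recursively collect every $ref value that is a CORE URN (registry edge)."""
    found: set[str] = set()
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("urn:core:"):
            found.add(ref)  # external registry reference
        for value in node.values():
            found |= collect_core_refs(value)
    elif isinstance(node, list):
        for item in node:
            found |= collect_core_refs(item)
    return found


traffic = {
    "$id": "urn:core:platform:civitas:datastructure:common:TrafficObservation:1.0.0",
    "properties": {
        "location": {"$ref": "urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0"},
        "helper": {"$ref": "#/$defs/Helper"},  # document-internal: not an edge
    },
}
print(collect_core_refs(traffic))
# {'urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0'}
```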
The import flow
POST /api/v1/datastructures runs four steps:
- Validate: networknt validates the document against the JSON Schema 2020-12 meta-schema (structure). x-core-ref is a non-validating annotation here; a registry-aware step then checks that every concrete x-core-ref foreign-key target exists in the registry (ReferenceExistenceValidator, diagnostic unresolved-core-ref). This step is skipped when no registry is configured.
- Normalise: an existing CORE URN $id is respected; otherwise the backend generates one from title. All $ref URNs are extracted.
- Store in the registry: a transactional write that upserts the artifact, inserts a new artifact_version and stores the $ref URNs as artifact_reference edges (the full graph, including cycles). A content hash keeps re-imports of identical content idempotent.
- Update the dependency graph: forward and reverse edges are updated in memory.
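The idempotency rule of the store step can be illustrated with an in-memory stand-in for the registry tables. This is a sketch under assumptions: the hash is SHA-256 over canonical JSON (sorted keys, compact separators), and `Registry`, `store` and the dict-based "tables" are hypothetical. The real backend performs the write as one PostgreSQL transaction.

```python
import hashlib
import json
from typing import Any


def content_hash(schema: dict[str, Any]) -> str:
    # Canonical JSON so that key order and whitespace do not change the hash.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Registry:
    """Hypothetical in-memory stand-in for the artifact tables."""

    def __init__(self) -> None:
        self.artifacts: dict[str, dict[str, Any]] = {}  # $id -> current content
        self.versions: dict[str, list[str]] = {}        # $id -> content hashes, oldest first
        self.references: set[tuple[str, str]] = set()   # (source URN, target URN) edges

    def store(self, schema: dict[str, Any], refs: set[str]) -> bool:
        """Write a new version; return False if the content is unchanged."""
        urn = schema["$id"]
        digest = content_hash(schema)
        history = self.versions.setdefault(urn, [])
        if history and history[-1] == digest:
            return False  # identical re-import: idempotent no-op
        self.artifacts[urn] = schema
        history.append(digest)
        # Replace this artifact's outgoing edges; cycles are stored as-is.
        self.references = {e for e in self.references if e[0] != urn}
        self.references |= {(urn, target) for target in refs}
        return True


registry = Registry()
geo = {"$id": "urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0", "type": "object"}
print(registry.store(geo, set()))  # True: first import
print(registry.store(geo, set()))  # False: identical re-import
```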
The PostgreSQL registry is the primary persistence layer, not just a cache.
The dependency graph is reconstructed at backend startup from the schema
content ($ref URNs) and the stored artifact_reference edges.
Declare the CORE URN as $id in the source document (like the examples in
schemas/examples/). The document is then fully self-describing and independent
of the import context.
Validation on two levels
Level A — schema syntax (on import): is this JSON a valid JSON Schema?
Import fails, for example, on an invalid type, a missing required entry or a wrong format.
Level B — reference consistency: do the artifact's references point at things that actually exist? DataSet validation checks the manifest's structure against the bundled CORE DataSet schema:
POST /api/v1/datasets/validate (dry run, always 200 with {valid, diagnostics})
PUT /api/v1/datasets?id=… (same check on save, 400 on errors)
x-core-ref is a typed foreign-key annotation — it marks a string field as
holding the CORE URN of another artifact:
```json
{
  "stehtAn": {
    "type": "string",
    "x-core-ref": {
      "type": "urn:core:platform:civitas:datastructure:common:Strasse:1.0.0"
    }
  }
}
```
It is not a validating JSON Schema keyword; it is a non-validating annotation.
Validation happens in a separate, registry-aware step instead: on DataStructure import or update, it checks that every
concrete x-core-ref target exists in the registry
(ReferenceExistenceValidator, diagnostic unresolved-core-ref). Category markers of the form urn:core:type:<Kind>
are skipped, and so is the whole check when no registry is configured. The full semantics are in the
CORE-IR Reference.
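A minimal sketch of such an existence check, assuming the registry can be queried as a set of known URNs and that `None` stands for no-registry mode. Only the diagnostic code unresolved-core-ref and the urn:core:type: skip rule come from the text; `unresolved_core_refs`, the diagnostic dict shape and the JSON-Pointer-like path (without escaping) are hypothetical.

```python
from typing import Any, Optional


def unresolved_core_refs(
    schema: Any, known_urns: Optional[set[str]], path: str = "#"
) -> list[dict[str, str]]:
    """Return one diagnostic per concrete x-core-ref target missing from the registry."""
    if known_urns is None:
        return []  # no registry configured: the check is skipped
    diagnostics: list[dict[str, str]] = []
    if isinstance(schema, dict):
        ref = schema.get("x-core-ref")
        target = ref.get("type") if isinstance(ref, dict) else None
        if (
            isinstance(target, str)
            and not target.startswith("urn:core:type:")  # category marker, not a concrete target
            and target not in known_urns
        ):
            diagnostics.append({"code": "unresolved-core-ref", "path": path, "target": target})
        for key, value in schema.items():
            diagnostics += unresolved_core_refs(value, known_urns, f"{path}/{key}")
    elif isinstance(schema, list):
        for index, item in enumerate(schema):
            diagnostics += unresolved_core_refs(item, known_urns, f"{path}/{index}")
    return diagnostics


field = {"properties": {"stehtAn": {
    "type": "string",
    "x-core-ref": {"type": "urn:core:platform:civitas:datastructure:common:Strasse:1.0.0"},
}}}
print(unresolved_core_refs(field, known_urns=set()))
# [{'code': 'unresolved-core-ref', 'path': '#/properties/stehtAn', 'target': 'urn:core:platform:civitas:datastructure:common:Strasse:1.0.0'}]
```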
The dependency graph
Alongside persistence, the dependency graph keeps all dependencies in
memory, bidirectionally and version-precisely: its nodes are versioned URNs,
and the queries below report concrete versioned IDs (a logical or :latest query
resolves to the current version):
| Query | Description | HTTP endpoint |
|---|---|---|
| Dependencies | What does this schema need directly? | GET /api/v1/datastructures/dependencies?id=… |
| Dependents | Who directly depends on this schema? | GET /api/v1/datastructures/dependents?id=… |
| Transitive dependencies | All recursive dependencies (BFS forward) | GET /api/v1/datastructures/transitive-dependencies?id=… |
| Impact | All schemas transitively affected by a change (BFS backward) | GET /api/v1/datastructures/impact?id=… |
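The four queries reduce to two adjacency maps and a breadth-first search in either direction. A minimal sketch, assuming versioned URNs as node keys; `DependencyGraph` and its methods are hypothetical, the :latest resolution is omitted, and at startup such a graph could be filled by calling `add_edge` once per stored artifact_reference row. The `Report` schema in the usage is invented.

```python
from collections import defaultdict, deque


class DependencyGraph:
    """Forward (needs) and reverse (needed-by) edges between versioned URNs."""

    def __init__(self) -> None:
        self.forward: dict[str, set[str]] = defaultdict(set)
        self.reverse: dict[str, set[str]] = defaultdict(set)

    def add_edge(self, source: str, target: str) -> None:
        # source references target via $ref
        self.forward[source].add(target)
        self.reverse[target].add(source)

    def dependencies(self, urn: str) -> set[str]:
        return set(self.forward.get(urn, ()))

    def dependents(self, urn: str) -> set[str]:
        return set(self.reverse.get(urn, ()))

    @staticmethod
    def _bfs(start: str, edges: dict[str, set[str]]) -> set[str]:
        seen: set[str] = set()
        queue = deque(edges.get(start, ()))
        while queue:
            node = queue.popleft()
            if node in seen:
                continue  # visited already; this also terminates on cycles
            seen.add(node)
            queue.extend(edges.get(node, ()))
        return seen - {start}  # the start node is excluded even if it lies on a cycle

    def transitive_dependencies(self, urn: str) -> set[str]:
        return self._bfs(urn, self.forward)  # BFS forward

    def impact(self, urn: str) -> set[str]:
        return self._bfs(urn, self.reverse)  # BFS backward


P = "urn:core:platform:civitas:datastructure:common:"
g = DependencyGraph()
g.add_edge(P + "TrafficObservation:1.0.0", P + "GeoPoint:1.0.0")
g.add_edge(P + "Report:1.0.0", P + "TrafficObservation:1.0.0")
print(sorted(u.split(":")[-2] for u in g.impact(P + "GeoPoint:1.0.0")))
# ['Report', 'TrafficObservation']
```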
Delete protection: DELETE /api/v1/datastructures?id=… fails with HTTP 409
if dependents still exist. The response contains the list of blocking schemas.
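The delete check only needs the reverse edges. A sketch, assuming they are available as a map from URN to direct dependents; HTTP 409 comes from the text above, while `delete_status`, the `blocking` field name and the 204 success status are assumptions.

```python
def delete_status(urn: str, reverse_edges: dict[str, set[str]]) -> tuple[int, dict]:
    """Map a DELETE request to (HTTP status, body) using direct dependents only."""
    blockers = sorted(reverse_edges.get(urn, set()))
    if blockers:
        # Still referenced: refuse and list the blocking schemas.
        return 409, {"blocking": blockers}  # field name is an assumption
    return 204, {}  # success status is an assumption; the text only specifies 409


P = "urn:core:platform:civitas:datastructure:common:"
dependents = {P + "GeoPoint:1.0.0": {P + "TrafficObservation:1.0.0"}}
print(delete_status(P + "GeoPoint:1.0.0", dependents)[0])  # 409
```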
The two views
Two qualitatively different views can be generated from the same stored schema, both on demand and without modifying the original schema.
a. Dereferenced view
Goal: a single, standalone document without external dependencies.
GET /api/v1/datastructures/views/dereferenced?id=…
All CORE URN $ref values are recursively replaced with the referenced schema:
```json
{
  "title": "TrafficObservation",
  "properties": {
    "location": {
      "type": "object",
      "properties": {
        "lat": { "type": "number" },
        "lon": { "type": "number" }
      }
    },
    "timestamp": { "type": "string", "format": "date-time" },
    "count": { "type": "integer" }
  }
}
```
Cycle protection: if a URN reappears during inlining, the $ref node is kept.
Benefit: can be validated without a registry. Suitable for code generation, Swagger/OpenAPI, flow translators, external systems without registry access.
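A minimal sketch of the inlining, assuming the registry is a plain dict from URN to schema. Cycle protection tracks the URNs currently being inlined on the path, so a type used in two branches is still inlined twice while a true cycle keeps its $ref node. Dropping $schema and $id from inlined schemas, ignoring keywords next to $ref and leaving unknown URNs untouched are assumptions of this sketch; `dereference` and `dereferenced_view` are hypothetical names.

```python
from typing import Any

# Assumption: identity keywords of an inlined schema are not repeated inline.
DROPPED_KEYS = {"$schema", "$id"}


def dereference(node: Any, registry: dict[str, dict[str, Any]], stack: tuple[str, ...]) -> Any:
    """Recursively replace CORE URN $refs with the referenced schema."""
    if isinstance(node, list):
        return [dereference(item, registry, stack) for item in node]
    if not isinstance(node, dict):
        return node
    ref = node.get("$ref")
    if isinstance(ref, str) and ref.startswith("urn:core:"):
        if ref in stack or ref not in registry:
            # Cycle (URN already on the current path) or unknown URN: keep the $ref node.
            return dict(node)
        # Keywords next to $ref are ignored in this sketch.
        target = {k: v for k, v in registry[ref].items() if k not in DROPPED_KEYS}
        return dereference(target, registry, stack + (ref,))
    return {key: dereference(value, registry, stack) for key, value in node.items()}


def dereferenced_view(urn: str, registry: dict[str, dict[str, Any]]) -> dict[str, Any]:
    # Seed the path with the root URN so a self-reference is also treated as a cycle.
    return dereference(registry[urn], registry, (urn,))


P = "urn:core:platform:civitas:datastructure:common:"
registry = {
    P + "GeoPoint:1.0.0": {
        "$id": P + "GeoPoint:1.0.0",
        "type": "object",
        "properties": {"lat": {"type": "number"}, "lon": {"type": "number"}},
    },
    P + "TrafficObservation:1.0.0": {
        "$id": P + "TrafficObservation:1.0.0",
        "title": "TrafficObservation",
        "properties": {"location": {"$ref": P + "GeoPoint:1.0.0"}},
    },
}
view = dereferenced_view(P + "TrafficObservation:1.0.0", registry)
print(view["properties"]["location"]["type"])  # object
```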
b. Bundled view
Goal: bundle all transitive dependencies into a single document — as JSON
Schema embedded resources under $defs. Each embedded dependency keeps its
$id; CORE URN $refs stay absolute and resolve against these $ids.
GET /api/v1/datastructures/views/bundled?id=…&depth=…
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "urn:core:platform:civitas:datastructure:common:TrafficObservation:1.0.0",
  "properties": {
    "location": { "$ref": "urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0" }
  },
  "$defs": {
    "GeoPoint": {
      "$id": "urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0",
      "type": "object",
      "properties": { "lat": { "type": "number" }, "lon": { "type": "number" } }
    }
  }
}
```
Difference from the dereferenced view: the types remain named, identity-bearing references instead of being inlined. The document is more compact, handles types that are used multiple times correctly (each is embedded once), and stays self-contained even for dep→dep references (chains, diamonds, cycles).
Benefit: portable, valid JSON Schema 2020-12. Can be handed to external tools, IDEs and validators without registry access. Preserves type identity.
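A minimal sketch of the bundling, assuming the same dict-based registry and keying $defs by the display name taken from the URN, as in the example. Name collisions (the same display name in two namespaces or versions) are not handled, and the depth parameter is left out because its exact semantics are not described here; `core_refs` and `bundled_view` are hypothetical.

```python
from collections import deque
from typing import Any


def core_refs(node: Any) -> set[str]:
    """All CORE URN $ref values anywhere inside a schema."""
    found: set[str] = set()
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("urn:core:"):
            found.add(ref)
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return found
    for child in children:
        found |= core_refs(child)
    return found


def bundled_view(urn: str, registry: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Embed all transitive dependencies under $defs; $refs stay absolute URNs."""
    root = dict(registry[urn])  # shallow copy: the stored schema is not modified
    defs: dict[str, Any] = {}
    seen = {urn}
    queue = deque(sorted(core_refs(root)))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # diamonds and cycles: each dependency is embedded once
        seen.add(dep)
        schema = dict(registry[dep])  # keeps its own $id, so URN $refs resolve to it
        defs[dep.split(":")[-2]] = schema  # assumed key: display name from the URN
        queue.extend(sorted(core_refs(schema)))  # follow dep -> dep references too
    if defs:
        root["$defs"] = {**root.get("$defs", {}), **defs}
    return root


P = "urn:core:platform:civitas:datastructure:common:"
registry = {
    P + "GeoPoint:1.0.0": {"$id": P + "GeoPoint:1.0.0", "type": "object"},
    P + "TrafficObservation:1.0.0": {
        "$id": P + "TrafficObservation:1.0.0",
        "properties": {"location": {"$ref": P + "GeoPoint:1.0.0"}},
    },
}
bundle = bundled_view(P + "TrafficObservation:1.0.0", registry)
print(list(bundle["$defs"]))  # ['GeoPoint']
```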
The resolve endpoint
Both views as well as metadata can be queried in a single call
(JSON:API include / OData $expand style):
GET /api/v1/datastructures/resolve
?ids=urn:…:TrafficObservation:1.0.0,urn:…:GeoPoint:1.0.0
&include=schema,bundled,dereferenced,dependencies,transitive
| include value | Returns |
|---|---|
| schema | Raw JSON Schema from the registry (default) |
| dereferenced | All $refs recursively resolved inline |
| bundled | Dependencies as embedded resources in $defs |
| dependencies | Direct dependencies (1 level) |
| transitive | All transitive dependencies (BFS) |
Fields that are not requested are omitted — the client only pays for what it needs. See the resolve endpoint for details.
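From a client's perspective the call is a plain GET with two comma-separated parameters. A standard-library sketch; the host is a placeholder, `resolve_url` is a hypothetical helper, and percent-encoding the colons and commas (as `urlencode` does) assumes the server decodes query parameters in the usual way.

```python
from urllib.parse import urlencode

BASE_URL = "https://core.example.org"  # placeholder host, not a real deployment
ALLOWED_INCLUDES = {"schema", "bundled", "dereferenced", "dependencies", "transitive"}


def resolve_url(ids: list[str], include: list[str]) -> str:
    """Build a resolve request URL, rejecting include values not listed in the table."""
    unknown = set(include) - ALLOWED_INCLUDES
    if unknown:
        raise ValueError(f"unsupported include values: {sorted(unknown)}")
    query = urlencode({"ids": ",".join(ids), "include": ",".join(include)})
    return f"{BASE_URL}/api/v1/datastructures/resolve?{query}"


print(resolve_url(
    ["urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0"],
    ["schema", "dependencies"],
))
```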
Dependencies between artifacts
Dependencies are not limited to schemas; all artifact types have relationships with each other.
The DataSet manifest is the umbrella document: it references all artifacts by
their URNs but stores no content inline. On GET /api/v1/datasets?id=… the
backend returns the manifest with all sub-resource URNs resolved from the registry.
Self-describing artifacts: $schema in all types
All stored artifacts carry a $schema field that declares their type —
consistent with the JSON Schema convention. This makes every exported JSON file
readable without context:
| Artifact type | $schema | Identity |
|---|---|---|
| DataStructure | https://json-schema.org/draft/2020-12/schema | $id |
| DataSet | https://civitasconnect.digital/core-dataset/v1 | id |
| Mapping | https://civitasconnect.digital/core/mapping/v1 | id |
| Pipeline | https://civitasconnect.digital/core/pipeline/v1 | id |
| DataSource | https://civitasconnect.digital/core/datasource/v1 | id |
| DataSink | https://civitasconnect.digital/core/datasink/v1 | id |
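Because each $schema value is fixed per type, a consumer can classify an exported file without further context. A sketch of that lookup; the mapping is copied from the table above, while `identify` and its return shape are hypothetical.

```python
import json
from typing import Any

# $schema value -> (artifact type, identity field), copied from the table above
ARTIFACT_TYPES: dict[str, tuple[str, str]] = {
    "https://json-schema.org/draft/2020-12/schema": ("DataStructure", "$id"),
    "https://civitasconnect.digital/core-dataset/v1": ("DataSet", "id"),
    "https://civitasconnect.digital/core/mapping/v1": ("Mapping", "id"),
    "https://civitasconnect.digital/core/pipeline/v1": ("Pipeline", "id"),
    "https://civitasconnect.digital/core/datasource/v1": ("DataSource", "id"),
    "https://civitasconnect.digital/core/datasink/v1": ("DataSink", "id"),
}


def identify(document: str) -> tuple[str, Any]:
    """Return (artifact type, identity) for an exported JSON artifact."""
    data = json.loads(document)
    try:
        kind, id_field = ARTIFACT_TYPES[data["$schema"]]
    except KeyError as exc:
        raise ValueError("missing or unknown $schema") from exc
    return kind, data.get(id_field)


print(identify('{"$schema": "https://json-schema.org/draft/2020-12/schema", '
               '"$id": "urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0"}'))
# ('DataStructure', 'urn:core:platform:civitas:datastructure:common:GeoPoint:1.0.0')
```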
The meta-schemas themselves are served by the API:
GET /api/v1/core-ir/meta-schemas → list of all available types
GET /api/v1/core-ir/meta-schemas/{type} → JSON Schema for the type
They are also published as browsable HTML — see CORE-IR Schema Documentation in the API section of this site (also linked in the footer).