Version: V2-Next

CORE-IR Syntax and Semantics

This document describes the CORE Intermediate Representation (CORE-IR), i.e. the JSON-based exchange format between Model Management, UI, registry and downstream generators.

The normative technical basis is the set of JSON Schemas in the backend (browsable in the CORE-IR Schema Documentation linked in the API section):

  • dataset.schema.json
  • artifact-envelope.schema.json
  • mapping.schema.json
  • pipeline.schema.json
  • datasource.schema.json
  • datasink.schema.json
  • core-schema-extensions.schema.json

core-ir.schema.json is the bundled view of these individual schemas. The individual schemas remain the maintained source of truth; the large schema is composed from them.

Basic model

CORE-IR consists of reusable artifacts:

| Artifact | Content | Identity |
| --- | --- | --- |
| DataSet | Composition manifest for artifacts that belong together | id |
| DataStructure | JSON Schema or XSD-based structure definition | $id |
| Mapping | Field mapping between two DataStructures | id |
| Pipeline | Graph of sources, transformations and sinks | id |
| DataSource | Inbound connection and payload structure | id |
| DataSink | Outbound connection and payload structure | id |
| ArtifactEnvelope | Transport/import envelope for artifact metadata and content | artifactId + firstVersion.version |

DataStructures are JSON Schema documents and therefore use the JSON Schema identity $id. The CORE domain artifacts use id.

All artifact IDs are versioned CORE URNs:

urn:core:{scope}:{owner}:{artifactType}:{namespace}:{name}:{version}

Examples:

urn:core:platform:civitas:dataset:common:air-quality:1.0.0
urn:core:platform:civitas:datastructure:common:SensorReading:1.0.0
urn:core:platform:civitas:mapping:common:sensor-to-observation:1.0.0
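
To make the segment layout concrete, here is a minimal Python sketch that splits a versioned CORE URN into its six segments. The names CoreUrn and parse_core_urn are hypothetical and not part of CORE; the sketch also assumes that no segment contains a colon, which the format above implies but does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreUrn:
    """The six variable segments of a versioned CORE URN (hypothetical type)."""
    scope: str
    owner: str
    artifact_type: str
    namespace: str
    name: str
    version: str

    def __str__(self) -> str:
        # Re-assemble urn:core:{scope}:{owner}:{artifactType}:{namespace}:{name}:{version}
        return ":".join(("urn", "core", self.scope, self.owner,
                         self.artifact_type, self.namespace, self.name, self.version))


def parse_core_urn(urn: str) -> CoreUrn:
    """Split a versioned CORE URN into its segments (hypothetical helper)."""
    parts = urn.split(":")
    # Assumption: segments never contain ':' themselves, so a versioned URN
    # always has exactly 8 colon-separated parts, none of them empty.
    if len(parts) != 8 or parts[:2] != ["urn", "core"] or not all(parts[2:]):
        raise ValueError(f"not a versioned CORE URN: {urn!r}")
    return CoreUrn(*parts[2:])
```

A short form such as "SensorReading" fails the length check and raises ValueError, which mirrors the rule that references must be full URNs.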

$schema, id and $id

$schema describes the schema/artifact syntax of the document. It is not the domain identity of the artifact.

{
  "$schema": "https://civitasconnect.digital/core/mapping/v1",
  "id": "urn:core:platform:civitas:mapping:common:sensor-to-observation:1.0.0"
}

Rules:

  • CORE domain artifacts use id.
  • DataStructures use $id.
  • Every exported artifact contains $schema.
  • References point to the domain identity, i.e. to id or $id, not to $schema.
  • Short forms such as "SensorReading" are not valid references.

Artifact envelope

The artifact envelope is the uniform transport form for artifacts when metadata is needed in addition to the actual content. It separates the logical identity (artifactId), the artifact type, and the initial content (firstVersion.content) when creating an artifact.

CORE uses no separate top-level $id in the envelope. The unique versioned artifact URN is derived from artifactId and firstVersion.version, so the envelope does not duplicate identity.

{
  "$schema": "https://civitasconnect.digital/core/artifact-envelope/v1",
  "artifactId": "urn:core:platform:civitas:datastructure:common:AirQualityEnvelope",
  "artifactType": "XSD",
  "groupId": "datastructures",
  "title": "AirQualityEnvelope",
  "firstVersion": {
    "version": "1.0.0",
    "content": {
      "contentType": "application/xml",
      "content": "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">...</xs:schema>"
    }
  }
}

| Field | Meaning |
| --- | --- |
| $schema | Schema of the envelope: https://civitasconnect.digital/core/artifact-envelope/v1 |
| artifactId | The logical, version-free CORE URN that identifies the artifact. DataStructures always use the datastructure type; the format is not in the URN, so a :xsd: URN is rejected. |
| artifactType | Content format of the embedded content (JSON_SCHEMA or XSD). It is an import hint that selects the stored representation, not the URN's artifact-type segment. |
| groupId | Global artifact group, e.g. datastructures |
| firstVersion.version | Version of the initial artifact content |
| firstVersion.content.contentType | Media type of the content |
| firstVersion.content.content | The actual artifact content. JSON Schema as an object, XSD as an XML string. |

The versioned CORE URN of the artifact is {artifactId}:{firstVersion.version}.

For JSON Schema, the content itself may also carry $id. In that case this $id must semantically match the versioned envelope URN derived above. For XSD there is no JSON $id in the content; there, artifactId + firstVersion.version is the authoritative CORE identity.
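
The derivation and the $id consistency rule can be sketched in Python as follows. Both function names are hypothetical, and "semantically match" is approximated by exact string equality, which is an assumption; the backend may compare more leniently.

```python
def versioned_urn(envelope: dict) -> str:
    """Derive the versioned CORE URN as {artifactId}:{firstVersion.version}."""
    return f'{envelope["artifactId"]}:{envelope["firstVersion"]["version"]}'


def check_embedded_id(envelope: dict) -> None:
    """Raise if a JSON Schema payload carries a $id that disagrees with the envelope."""
    content = envelope["firstVersion"]["content"]["content"]
    # XSD content is an XML string without a JSON $id, so there is nothing to compare;
    # artifactId + firstVersion.version is the authoritative identity in that case.
    if envelope.get("artifactType") != "JSON_SCHEMA" or not isinstance(content, dict):
        return
    embedded = content.get("$id")
    # Assumption: "semantically match" is treated as exact string equality here.
    if embedded is not None and embedded != versioned_urn(envelope):
        raise ValueError(
            f"$id {embedded!r} does not match envelope URN {versioned_urn(envelope)!r}"
        )
```

For the XSD envelope shown above, versioned_urn yields the artifactId followed by :1.0.0, and check_embedded_id returns without comparing anything.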

DataSet

A DataSet is the enclosing document that groups the artifacts of one coherent integration. It can appear as a stored manifest or as a resolved bundle.

Manifest form:

{
  "$schema": "https://civitasconnect.digital/core-dataset/v1",
  "id": "urn:core:platform:civitas:dataset:common:air-quality:1.0.0",
  "title": "Air Quality",
  "dataStructureRefs": [
    "urn:core:platform:civitas:datastructure:common:SensorReading:1.0.0"
  ],
  "mappingRefs": [
    "urn:core:platform:civitas:mapping:common:sensor-to-observation:1.0.0"
  ],
  "pipelineRefs": [
    "urn:core:platform:civitas:pipeline:common:air-quality-ingest:1.0.0"
  ],
  "dataSourceRefs": [
    "urn:core:platform:civitas:datasource:common:aq-mqtt-source:1.0.0"
  ],
  "dataSinkRefs": [
    "urn:core:platform:civitas:datasink:common:aq-http-sink:1.0.0"
  ]
}

Resolved bundle form:

{
  "$schema": "https://civitasconnect.digital/core-dataset/v1",
  "id": "urn:core:platform:civitas:dataset:common:air-quality:1.0.0",
  "dataStructures": {
    "SensorReading": {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "$id": "urn:core:platform:civitas:datastructure:common:SensorReading:1.0.0",
      "type": "object"
    }
  },
  "mappings": [],
  "pipelines": [],
  "dataSources": [],
  "dataSinks": []
}

The API may return a DataSet in resolved form. Internally, or with active global persistence, the manifest with *Refs is sufficient. DataSet validation works on the resolved bundle form, because only there can local cross-member references be checked.
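
As an illustration of what a local cross-member check could look like, the following Python sketch collects the identities embedded in a resolved bundle and reports every URN that a member references but the bundle does not contain. The function name unresolved_members is hypothetical, and the sketch assumes that every referenced artifact is expected to be a bundle member; the actual DataSet validation may apply different rules.

```python
def unresolved_members(bundle: dict) -> list[str]:
    """List URNs that bundle members reference but the bundle does not embed."""
    # dataStructures is keyed by a local name; the identity is the embedded $id.
    known = {ds["$id"] for ds in bundle.get("dataStructures", {}).values()}
    for key in ("mappings", "pipelines", "dataSources", "dataSinks"):
        known.update(member["id"] for member in bundle.get(key, []))

    referenced: list[str] = []
    for mapping in bundle.get("mappings", []):
        referenced += [mapping["source"], mapping["target"]]
    for key in ("dataSources", "dataSinks"):
        referenced += [member["dataStructure"] for member in bundle.get(key, [])]
    for pipeline in bundle.get("pipelines", []):
        for node in pipeline.get("nodes", []):
            referenced += [node[field]
                           for field in ("sourceRef", "mappingRef", "sinkRef")
                           if field in node]

    # Assumption: every referenced URN should be embedded in the same bundle.
    return sorted({urn for urn in referenced if urn not in known})
```

A manifest cannot be checked this way, because it only lists *Refs and carries none of the members' own reference fields.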

DataStructure

A DataStructure is a JSON Schema or a schema derived from XSD. The domain identity is in $id.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "urn:core:platform:civitas:datastructure:common:SensorReading:1.0.0",
  "title": "SensorReading",
  "type": "object",
  "properties": {
    "sensorId": { "type": "string" },
    "temperature": { "type": "number" }
  },
  "required": ["sensorId"]
}

If a DataStructure version is authored as XSD, its JSON Schema projection points back at the DataStructure's own URN via x-xsd-source — the version that carries the source XSD representation.

Mapping

A Mapping describes a declarative mapping between two DataStructures. source and target are global DataStructure URNs.

{
  "$schema": "https://civitasconnect.digital/core/mapping/v1",
  "id": "urn:core:platform:civitas:mapping:common:sensor-to-observation:1.0.0",
  "source": "urn:core:platform:civitas:datastructure:common:SensorReading:1.0.0",
  "target": "urn:core:platform:civitas:datastructure:common:Observation:1.0.0",
  "fields": {
    "$.result": { "op": "copy", "input": "$.temperature" },
    "$.source": { "op": "const", "value": "mqtt" },
    "$.name": {
      "op": "concat",
      "inputs": ["$.sensorId", "$.metric"],
      "separator": ":"
    }
  }
}

Field operations:

| Operation | Required fields | Semantics |
| --- | --- | --- |
| copy | input | Copies a value from the source object |
| concat | inputs | Concatenation of multiple source values |
| const | value | Writes a constant value |

As a short form, a field value may be a string. Semantically this is a copy mapping with the string as the source path.
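
The three operations and the string short form can be illustrated with a small Python interpreter. This is a sketch, not the generator's implementation: apply_mapping, read_path and write_path are hypothetical names, paths are assumed to be simple dotted "$.a.b" expressions, and concat is assumed to stringify its inputs with str() and to default to an empty separator.

```python
from typing import Any


def read_path(record: dict, path: str) -> Any:
    """Resolve a '$.a.b' style path against nested dicts; missing keys yield None."""
    current: Any = record
    for key in path.removeprefix("$.").split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def write_path(record: dict, path: str, value: Any) -> None:
    """Write a value at a '$.a.b' style path, creating intermediate objects."""
    keys = path.removeprefix("$.").split(".")
    for key in keys[:-1]:
        record = record.setdefault(key, {})
    record[keys[-1]] = value


def apply_mapping(mapping: dict, source: dict) -> dict:
    """Build a target record by evaluating every entry of mapping['fields']."""
    target: dict = {}
    for target_path, rule in mapping["fields"].items():
        if isinstance(rule, str):
            # Short form: a bare string is a copy from that source path.
            rule = {"op": "copy", "input": rule}
        op = rule["op"]
        if op == "copy":
            value = read_path(source, rule["input"])
        elif op == "const":
            value = rule["value"]
        elif op == "concat":
            # Assumption: inputs are stringified and joined with the optional separator.
            parts = [str(read_path(source, p)) for p in rule["inputs"]]
            value = rule.get("separator", "").join(parts)
        else:
            raise ValueError(f"unsupported operation: {op!r}")
        write_path(target, target_path, value)
    return target
```

Applied to the example mapping above and a reading with sensorId "s1", metric "temp" and temperature 21.5, the sketch produces result 21.5, source "mqtt" and name "s1:temp".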

DataSource and DataSink

DataSources and DataSinks describe technical connections and the DataStructure of the respective payload.

{
  "$schema": "https://civitasconnect.digital/core/datasource/v1",
  "id": "urn:core:platform:civitas:datasource:common:aq-mqtt-source:1.0.0",
  "connectionType": "mqtt",
  "brokerUrl": "tcp://mqtt.example.test:1883",
  "topic": "city/air-quality/+/reading",
  "dataStructure": "urn:core:platform:civitas:datastructure:common:SensorReading:1.0.0"
}

{
  "$schema": "https://civitasconnect.digital/core/datasink/v1",
  "id": "urn:core:platform:civitas:datasink:common:aq-http-sink:1.0.0",
  "connectionType": "http",
  "url": "https://data.example.test/observations",
  "method": "POST",
  "dataStructure": "urn:core:platform:civitas:datastructure:common:Observation:1.0.0"
}

dataStructure is a foreign key (x-core-ref): its value is the CORE URN ($id) of the DataStructure that describes the payload — a reference, not an embedded schema.

Pipeline

A Pipeline is a directed graph of nodes and edges.

{
  "$schema": "https://civitasconnect.digital/core/pipeline/v1",
  "id": "urn:core:platform:civitas:pipeline:common:air-quality-ingest:1.0.0",
  "nodes": [
    { "id": "start", "kind": "start", "x-ui-position": { "x": 80, "y": 180 } },
    {
      "id": "source",
      "kind": "source",
      "sourceRef": "urn:core:platform:civitas:datasource:common:aq-mqtt-source:1.0.0",
      "x-ui-position": { "x": 240, "y": 180 }
    },
    {
      "id": "mapping",
      "kind": "mapping",
      "mappingRef": "urn:core:platform:civitas:mapping:common:sensor-to-observation:1.0.0",
      "x-ui-position": { "x": 420, "y": 180 }
    },
    {
      "id": "sink",
      "kind": "sink",
      "sinkRef": "urn:core:platform:civitas:datasink:common:aq-http-sink:1.0.0",
      "x-ui-position": { "x": 600, "y": 180 }
    },
    { "id": "end", "kind": "end", "x-ui-position": { "x": 760, "y": 180 } }
  ],
  "edges": [
    { "id": "e1", "source": "start", "target": "source", "kind": "control" },
    { "id": "e2", "source": "source", "target": "mapping", "kind": "data" },
    { "id": "e3", "source": "mapping", "target": "sink", "kind": "data" },
    { "id": "e4", "source": "sink", "target": "end", "kind": "control" }
  ]
}

Node kinds:

| kind | Reference fields | Semantics |
| --- | --- | --- |
| start | none | Start point |
| end | none | End point |
| source | sourceRef | Reads from a DataSource |
| filter | expression | Filters records |
| enrich | lookupSourceRef, lookupKey | Enriches via a lookup source |
| mapping | mappingRef | Applies a mapping |
| sink | sinkRef | Writes to a DataSink |
| split | none | Splits a flow into multiple records |

Pipeline edges reference node IDs within the same pipeline. These node IDs are local graph IDs, not CORE URNs.
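
A structural check along these lines might look like the following Python sketch. The name pipeline_problems is hypothetical, and treating the "Reference fields" column as required per node kind is an assumption; the normative rules live in pipeline.schema.json.

```python
# Fields each node kind is expected to carry, taken from the node kinds table.
# Assumption: the listed reference fields are mandatory for that kind.
REQUIRED_FIELDS = {
    "start": (),
    "end": (),
    "split": (),
    "source": ("sourceRef",),
    "filter": ("expression",),
    "enrich": ("lookupSourceRef", "lookupKey"),
    "mapping": ("mappingRef",),
    "sink": ("sinkRef",),
}


def pipeline_problems(pipeline: dict) -> list[str]:
    """Return human-readable problems found in a pipeline graph (empty if none)."""
    problems: list[str] = []
    node_ids: set[str] = set()
    for node in pipeline.get("nodes", []):
        node_ids.add(node["id"])
        kind = node.get("kind")
        if kind not in REQUIRED_FIELDS:
            problems.append(f"node {node['id']}: unknown kind {kind!r}")
            continue
        for field in REQUIRED_FIELDS[kind]:
            if field not in node:
                problems.append(f"node {node['id']}: missing {field}")
    # Edges use local graph IDs, so they are checked against this pipeline's nodes only.
    for edge in pipeline.get("edges", []):
        for end in ("source", "target"):
            if edge[end] not in node_ids:
                problems.append(f"edge {edge['id']}: unknown {end} node {edge[end]!r}")
    return problems
```

The example pipeline above passes this check with an empty result, since every edge endpoint names a node declared in the same document.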

References between artifacts

CORE artifacts reference each other by CORE URN — a foreign key. The field is a plain string holding the target's URN; the target is not embedded. The x-core-ref annotation marks such a field and names the artifact type the URN points at.

This is the counterpart to a JSON Schema $ref, which embeds the referenced schema (the value at that position is an object shaped like the target):

  • $ref (composition): the value is (an instance of) the target.
  • x-core-ref (association / foreign key): the value names the target.

References are always global URNs; existence is checked by registry-aware services (see Validation). A reference may pin a concrete version (…:Foo:1.0.0) or use the latest token (…:Foo:latest) — see URN Format; it is stored verbatim and resolved to a concrete version on read.
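
Resolution on read can be pictured with the Python sketch below. The function resolve_reference and the in-memory registry shape (version-free URN mapped to a list of versions) are hypothetical, and the sketch assumes that latest means the highest version when purely numeric dotted versions are compared segment by segment; the URN Format page is authoritative for the real semantics.

```python
def resolve_reference(ref: str, registry: dict[str, list[str]]) -> str:
    """Resolve a stored reference to a concrete versioned URN at read time."""
    base, _, version = ref.rpartition(":")
    if version != "latest":
        # Pinned references are returned exactly as stored.
        return ref
    versions = registry.get(base)
    if not versions:
        raise LookupError(f"no versions registered for {base}")
    # Assumption: versions are purely numeric and dotted, and "latest" is the
    # highest one when compared as integer tuples (so 1.10.0 > 1.2.0).
    newest = max(versions, key=lambda v: tuple(int(part) for part in v.split(".")))
    return f"{base}:{newest}"
```

Because the stored value stays verbatim, a latest reference can resolve to a different concrete version on a later read once a newer version has been registered.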

Examples of x-core-ref fields: DataSet.dataStructureRefs[] / mappingRefs[] / …, Mapping.source / Mapping.target, DataSource.dataStructure, Pipeline.nodes[].sourceRef / mappingRef / sinkRef.

Example: a foreign key between DataStructures

A Baum (tree) references the Strasse (street) it stands on — by reference, not by embedding. stehtAn is a plain string field annotated as a foreign key:

// DataStructure: Strasse
{
  "$id": "urn:core:platform:civitas:datastructure:common:Strasse:1.0.0",
  "title": "Strasse",
  "type": "object",
  "properties": {
    "id": { "type": "string" },
    "name": { "type": "string" }
  }
}

// DataStructure: Baum ("stehtAn" is a foreign key to the Strasse DataStructure)
{
  "$id": "urn:core:platform:civitas:datastructure:common:Baum:1.0.0",
  "title": "Baum",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "stehtAn": {
      "type": "string",
      "x-core-ref": { "type": "urn:core:platform:civitas:datastructure:common:Strasse:1.0.0" }
    }
  }
}

The type names the concrete target DataStructure (Strasse), so the foreign key is strongly typed: x-core-ref references the Strasse DataStructure (artifact), and on import Model Management checks that this DataStructure exists in the registry (see Validation). In instance data the stehtAn field then holds an application-level key that identifies a Strasse record — the street is pointed at, not copied in:

// instance of Strasse
{
  "$schema": "urn:core:platform:civitas:datastructure:common:Strasse:1.0.0",
  "id": "42",
  "name": "BerlinerAllee"
}

// instance of Baum; stehtAn references the Strasse above
{
  "$schema": "urn:core:platform:civitas:datastructure:common:Baum:1.0.0",
  "name": "Fichte",
  "stehtAn": "42"
}

Note the two levels: the schema-level x-core-ref.type is a CORE artifact URN (the Strasse DataStructure — this is what the registry existence check resolves), while the instance value ("42") is the application's own join key into Strasse records and is not a CORE URN. Model Management validates the schema-level reference, not instance join keys.

Had stehtAn used $ref instead, a Baum instance would have to embed the whole Strasse object inline rather than reference it.

x-core-ref

x-core-ref is a CORE-specific JSON Schema annotation. It marks a string field as a foreign key: the value is the CORE URN of another artifact. JSON Schema alone cannot express this, so the keyword records the target type.

The only key is type — the CORE URN of the referenced artifact. It is either a concrete artifact URN (a strongly-typed foreign key, e.g. to a specific DataStructure — its existence is checked against the registry) or an urn:core:type:<Kind> category URN (used by the generic CORE-IR artifact meta-schemas, where the concrete target is not fixed):

{ "type": "urn:core:platform:civitas:datastructure:common:Strasse:1.0.0" }
{ "type": "urn:core:type:DataStructure" }
{ "type": "urn:core:type:Mapping" }
{ "type": "urn:core:type:Pipeline" }
{ "type": "urn:core:type:DataSource" }
{ "type": "urn:core:type:DataSink" }

There is no scope (every reference is a global URN) and no collection / key field. Not allowed (examples of rejected shapes):

{ "type": "#/$defs/DataStructure" }
{ "type": "DataStructure" }
{ "type": "urn:core:type:DataStructure", "scope": "global" }
{ "type": "urn:core:type:DataStructure", "collection": "dataStructures" }
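
The accepted and rejected shapes can be summarised in a short Python predicate. The name is_valid_core_ref is hypothetical, and requiring the urn:core: prefix is an assumption that is stricter than the generic ^urn: pattern; core-schema-extensions.schema.json defines the actual shape.

```python
def is_valid_core_ref(annotation: object) -> bool:
    """Accept only {"type": "<CORE URN>"}; extra keys or non-URN values are rejected."""
    # The annotation must be an object whose single key is "type"
    # (no scope, no collection, no key field).
    if not isinstance(annotation, dict) or set(annotation) != {"type"}:
        return False
    target = annotation["type"]
    # Assumption: both concrete artifact URNs and urn:core:type:<Kind>
    # category URNs start with "urn:core:".
    return isinstance(target, str) and target.startswith("urn:core:")
```

Under this sketch the six allowed examples above return True and the four rejected shapes return False.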

Other x-* attributes

All allowed CORE extensions are declared in core-schema-extensions.schema.json. New x-* attributes must first be defined there as a shape.

x-ui-position

x-ui-position stores canvas coordinates for pipeline nodes.

{
  "id": "mapping",
  "kind": "mapping",
  "mappingRef": "urn:core:platform:civitas:mapping:common:sensor-to-observation:1.0.0",
  "x-ui-position": { "x": 420, "y": 180 }
}

Semantics: a pure UI annotation with no effect on mapping, validation or generator logic. It replaces a domain-level position field; the two must not coexist.

x-xsd-source

x-xsd-source points from a JSON Schema projection back at the DataStructure URN whose version stores the source XSD representation it was converted from.

{
  "$id": "urn:core:platform:civitas:datastructure:common:AirQualityJson:1.0.0",
  "x-xsd-source": "urn:core:platform:civitas:datastructure:common:AirQuality:1.0.0",
  "type": "object"
}

Semantics: documents the origin of the JSON Schema view and links the XSD and the JSON Schema projection at the domain level. It is not a substitute for $id.

Validation

x-core-ref is a typed annotation, not a JSON-Schema-level assertion:

  1. Structural validation — JSON Schema checks required fields, types, URN patterns (pattern: "^urn:") and allowed node/operation types. x-core-ref itself is a non-validating annotation here (it records the target type).
  2. Referential validation (registry-aware) — whether a referenced URN resolves to an existing artifact is the registry's concern, not pure JSON Schema. On DataStructure import/update Model Management checks every concrete x-core-ref target against the registry (ReferenceExistenceValidator, diagnostic code unresolved-core-ref); an unknown target is rejected. Targets created by the same request and urn:core:type:<Kind> category markers are not checked, and the check is skipped when no registry is configured.
  • Backend: x-core-ref is registered as an annotation keyword (no in-document validator); existence is enforced by ReferenceExistenceValidator.
  • Governance: a test verifies that all x-* attributes are declared in the extension schema.
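
The referential step can be sketched in Python as follows. This is not the code of ReferenceExistenceValidator; the function names, the registry modelled as a plain set of known URNs, and the created_in_request parameter are all hypothetical stand-ins for the behaviour described above.

```python
from typing import Iterator, Optional


def iter_core_refs(schema: object) -> Iterator[str]:
    """Yield every x-core-ref target found anywhere in a JSON Schema document."""
    if isinstance(schema, dict):
        annotation = schema.get("x-core-ref")
        if isinstance(annotation, dict) and isinstance(annotation.get("type"), str):
            yield annotation["type"]
        for value in schema.values():
            yield from iter_core_refs(value)
    elif isinstance(schema, list):
        for item in schema:
            yield from iter_core_refs(item)


def unresolved_core_refs(
    schema: dict,
    registry: Optional[set],
    created_in_request: frozenset = frozenset(),
) -> list[str]:
    """Return concrete x-core-ref targets that the registry does not know."""
    if registry is None:
        # No registry configured: the existence check is skipped entirely.
        return []
    missing = []
    for target in iter_core_refs(schema):
        if target.startswith("urn:core:type:"):
            continue  # category markers name a kind, not a concrete artifact
        if target in created_in_request or target in registry:
            continue  # created by the same request, or already registered
        missing.append(target)
    return missing
```

Each URN returned here would correspond to one unresolved-core-ref diagnostic; an empty list means the import can proceed as far as references are concerned.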

Design rules

  • Use CORE URNs as reference values, not plain names.
  • x-core-ref (foreign key, by URN) and $ref (embedding) are distinct — pick reference vs. composition deliberately.
  • Use $id for DataStructures, id for CORE artifacts.
  • Use x-ui-position for canvas positions, not position.
  • Introduce new x-* attributes only if they are described and tested in the extension schema.
  • Keep DataSet manifests lean: refs instead of duplicated content, provided a global store is available.

Terms

| Term | Meaning |
| --- | --- |
| Artifact | Persistable CORE unit such as Mapping, Pipeline or DataStructure |
| DataSet manifest | Lean DataSet with *Refs to global artifacts |
| Resolved DataSet | DataSet with embedded dataStructures, mappings, pipelines, dataSources, dataSinks |
| x-core-ref | Foreign key: the field holds the CORE URN of another artifact |
| $ref | Embedding: the referenced schema is composed in (the value is the target) |
| type (in x-core-ref) | CORE URN of the referenced artifact: a concrete artifact URN (strongly-typed FK) or an urn:core:type:... category URN |