
Persist & transform data

Who is this guide for?

Role: Data Architect, Data Steward

Goal: You want to build pipelines, transform your data, and prepare it for consumption.

Required Permissions: create Dataset, release Dataset

What you will achieve

After completing this guide, you will have:

  • Created a Dataset
  • Defined a Pipeline
  • Managed access permissions
  • Set the Dataset status to mark it as ready

Before you start

  • Verify that you can create Datasets. If this option is unavailable, contact your Tenant Admin to request access.

Understand your data flow

Before building a pipeline, make sure you understand how your data flows through the system. In CIVITAS/CORE, data is processed through pipelines.

This means:

  • data is loaded from a Data source
  • transformed step by step
  • and prepared for use in a Dataset
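This load → transform → provide flow can be sketched as three stages. The following Python sketch is purely illustrative: the function and field names are hypothetical, not CIVITAS/CORE APIs.

```python
# Minimal sketch of the load -> transform -> provide flow.
# All function and field names are illustrative, not CIVITAS/CORE APIs.

def load(source):
    """Stage 1: read raw records from a Data source."""
    return list(source)

def transform(records):
    """Stage 2: reshape each record step by step."""
    return [{"name": r["device_name"].strip().title()} for r in records]

def provide(records):
    """Stage 3: expose the result for use in a Dataset."""
    return {"records": records}

raw = [{"device_name": "  smart meter 01 "}]
dataset = provide(transform(load(raw)))
# dataset == {"records": [{"name": "Smart Meter 01"}]}
```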
Example use case: Integrate Smart Meter energy usage data

In this guide, we demonstrate how to persist and transform data using this example use case. It shows how master data and measurement data are processed in pipelines and stored as Observations in a SensorThings API backend.

→ You will find additional context from the Smart Meter energy use case in the highlighted boxes throughout this guide.

Requirements

Two types of data are processed:

  • Master data → used to create SensorThings entities such as Things and Datastreams
  • Measurement data → continuous smart meter values received via MQTT

Pipelines are used to:

  • load data from Data sources
  • transform and map the data
  • provide the data for further use

Make sure you have:

  • available Data sources for master data and measurement data

→ Before measurement data can be stored, the required SensorThings entities must exist. In this guide, you will build pipelines to process both data flows.

Deep dive: technical documentation for this use case

<< link needs to be updated (use case Repo)

Step-by-step guide

Step 1: Create a Dataset

  1. Go to Datasets

Screenshot: list of Datasets + expanded sidebar of nav

  2. Click Create Dataset
  3. Enter a Name and hit Create and continue
  4. Add Base Information in the 1st section and hit Save
Example use case: Integrate Smart Meter energy usage data

Create a Dataset to combine and process smart meter data from different Data sources.

Example name:

  • Smart Meter energy usage

Step 2: Define a pipeline

  1. Click Load and provide data in the 2nd section Data flow

Screenshot: dataset detail view, section Data flow with "Load and provide data" button

A pipeline is an automated sequence of steps used to ingest, transform, or provide data within the Platform.

The pipeline editor is a canvas-based interface that allows you to work freely. You can add, configure, and connect nodes in any order.

  2. Enter a name for the pipeline at the top of the canvas

Screenshot: empty pipeline editor with name selection

This name helps you identify and manage multiple pipelines within a Dataset.

  3. Add nodes to the canvas

Screenshot: pipeline with flow start + CRON node + Data source + mapping node + storage node + Flow end | all unconnected

Learn how pipelines work

In CIVITAS/CORE, every pipeline has a defined structure:

  • a Flow start → marks the beginning of the pipeline
  • a Flow end → marks the end of the pipeline

Between these nodes, you can define your data flow using different nodes.

Pipelines are flexible:

  • they can start with a Data source or be triggered by an event (e.g. schedule or API request)
  • they can include one or more transformation nodes (e.g. Mapping)
  • they can end by storing data or returning a response

What matters:

  • all nodes must be correctly connected
  • the pipeline must pass validation

Example pipelines:

  • Load and store data: Flow start → Data source → Mapping → Storage → Flow end
  • Process and return data: Flow start → API request → Storage → Mapping → API response → Flow end
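The two structural rules above (all nodes connected, pipeline passes validation) can be sketched as a simple connectivity check. This is illustrative Python, not the platform's actual validator.

```python
# Illustrative connectivity check for a linear pipeline, modeled as an
# ordered list of nodes plus a set of edges. Not the platform's actual
# validation logic, only a sketch of the rules above.

def validate(nodes, edges):
    """True if the chain starts at Flow start, ends at Flow end,
    and every consecutive pair of nodes is connected by an edge."""
    if not nodes or nodes[0] != "Flow start" or nodes[-1] != "Flow end":
        return False
    return all((a, b) in edges for a, b in zip(nodes, nodes[1:]))

nodes = ["Flow start", "Data source", "Mapping", "Storage", "Flow end"]
edges = set(zip(nodes, nodes[1:]))   # fully connected chain

ok = validate(nodes, edges)                                  # passes
broken = validate(nodes, edges - {("Mapping", "Storage")})   # fails: missing link
```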
  4. Click on a node to configure it based on your use case

Screenshot: node configuration panel

  5. Connect all nodes to complete the pipeline

Screenshot: all nodes connected

  6. Click Validate to ensure that all nodes are correctly configured and connected

  7. Fix any issues if needed, then hit Save and return to the detail view of the Dataset

Example use case: Integrate Smart Meter energy usage data

This use case requires two pipelines within the same Dataset:

  • a master data pipeline
  • a measurement data pipeline

Pipeline 1:

Start with the master data pipeline and follow the steps above. Use the configuration below as a reference.

Example pipeline name: Smart Meter Master Data

Pipeline structure: Flow start → CRON → Data source → Mapping → Storage → Flow end

CRON node: Defines when the pipeline runs. For example every 30 seconds:

*/30 * * * * *
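Note that this is a six-field cron expression: unlike classic five-field cron, the leading field is seconds (assuming, as the example suggests, that the CRON node supports seconds granularity). A quick way to read it, sketched in Python:

```python
# Break the six-field cron expression above into labeled fields.
expr = "*/30 * * * * *"
labels = ["second", "minute", "hour", "day of month", "month", "day of week"]
fields = dict(zip(labels, expr.split()))
# fields["second"] == "*/30": fire at seconds 0 and 30, i.e. every 30 seconds
```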

Data source node: Assign the PostgreSQL Data source created earlier.

Mapping node: Transforms master data into SensorThings entities.

root = {
  "things": [
    {
      "name": json("device_name"),
      "description": json("device_description"),
      "properties": {
        "reference": json("device_ext_id")
      },
      "Locations": [
        {
          "name": json("device_name"),
          "description": json("device_description"),
          "encodingType": "application/geo+json",
          "location": json("device_location").parse_json()
        }
      ],
      "Datastreams": json("definitions").parse_json()
    }
  ]
}

Storage node: Stores the transformed data in the SensorThings API backend (e.g., FROST Server).
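To illustrate what the mapping node produces, here is a Python sketch applying the same reshaping to one hypothetical master-data row. The field names are taken from the mapping above; the sample values are invented, and this is not how the platform executes the mapping.

```python
import json

# One hypothetical master-data row, using the field names from the
# mapping node above. Sample values are invented for illustration.
row = {
    "device_name": "Meter 01",
    "device_description": "Smart meter, basement",
    "device_ext_id": "SM-0001",
    "device_location": '{"type": "Point", "coordinates": [9.18, 48.78]}',
    "definitions": '[{"name": "Energy", "unitOfMeasurement": {"symbol": "kWh"}}]',
}

# Same shape as the mapping: one Thing with a Location, plus
# Datastreams parsed from JSON strings embedded in the row.
root = {
    "things": [
        {
            "name": row["device_name"],
            "description": row["device_description"],
            "properties": {"reference": row["device_ext_id"]},
            "Locations": [
                {
                    "name": row["device_name"],
                    "description": row["device_description"],
                    "encodingType": "application/geo+json",
                    "location": json.loads(row["device_location"]),
                }
            ],
            "Datastreams": json.loads(row["definitions"]),
        }
    ]
}
```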

Pipeline 2:

Now, open a new pipeline tab and define the measurement data pipeline. Follow the same steps as above and use the configuration below as a reference.

Example pipeline name: Smart Meter Measurement

Pipeline structure: Flow start → Data source → Mapping → Storage → Flow end

Data source node: Assign the MQTT Data source created earlier.

Mapping node: Transforms incoming messages into Observations.

let gateway = this.gatewayId

root.observations = this.measurements.map_each(m -> m.measuredValues.map_each(mv -> {
  "phenomenonTime": m.measurementTimestamp,
  "result": mv.value.number().catch(mv.value),
  "resultTime": m.measurementTimestamp,
  "parameters": {
    "reference": $gateway,
    "name": mv.obisCode
  }
})).flatten()

Storage node: Stores the transformed Observations in the SensorThings API backend (e.g., FROST Server).
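The nested map over measurements and measured values, followed by flatten, can be mirrored in Python to show the resulting Observation shape. Field names come from the mapping above; the sample MQTT message is hypothetical.

```python
# Hypothetical MQTT message, using the field names from the mapping node.
msg = {
    "gatewayId": "GW-01",
    "measurements": [
        {
            "measurementTimestamp": "2024-05-01T12:00:00Z",
            "measuredValues": [
                {"obisCode": "1-0:1.8.0", "value": "42.5"},
                {"obisCode": "1-0:2.8.0", "value": "n/a"},  # non-numeric value
            ],
        }
    ],
}

def to_number(v):
    """Mirror of value.number().catch(value): numeric when possible,
    otherwise the original value is kept."""
    try:
        return float(v)
    except ValueError:
        return v

# The nested comprehension mirrors map_each(...).map_each(...) plus flatten():
# one Observation per measured value, across all measurements.
observations = [
    {
        "phenomenonTime": m["measurementTimestamp"],
        "result": to_number(mv["value"]),
        "resultTime": m["measurementTimestamp"],
        "parameters": {"reference": msg["gatewayId"], "name": mv["obisCode"]},
    }
    for m in msg["measurements"]
    for mv in m["measuredValues"]
]
```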

Step 3: Manage access permissions

  1. Click Edit Groups and Roles in the 3rd section Access Management
  2. Add a Group

Screenshot: Add Group Modal

  3. Assign a Role to the Group

Screenshot: one Group with two Roles, one Group with one Role

  4. Repeat these steps to add more Groups and Roles
  5. Hit Save and return to the detail view of the Dataset

Step 4: Mark the Dataset as ready

→ Setting the status to Ready signals that the creation phase of the Dataset is complete. The Dataset is now prepared for review.

  1. Change the Status to Ready
  2. Contact a Data Owner or Data Gatekeeper to review and release it.

You can:

  • share a direct link to the Dataset
  • or provide the name of the Dataset

Outcome

The Dataset is Ready and can now be reviewed and released.

Summary

You have successfully:

  • Created a Dataset
  • Defined a Pipeline
  • Configured access permissions
  • Marked the Dataset as Ready

Next

You have completed the creation phase of your Dataset. In the next guide, you will:

  • review a Dataset
  • release it for use
  • make it available to others

→ Continue with Release data