Observability

What to monitor

  • historian file recording
  • Prometheus metrics export
  • allowlist-based variable recording
  • web-path exposure of observability endpoints

Observability covers operational evidence over time: historian files, metrics, signal allowlists, retention, and access policy.

Good first checks

  • confirm the endpoint exposing metrics is reachable
  • confirm only the expected variables are recorded or exposed
  • confirm historian retention/output paths are explicit

Success means the site can answer which signals are recorded, where the data is retained, who may read it, and how the evidence helps diagnose a runtime issue after the fact.

Worked tutorial

This tutorial enables runtime observability and verifies both persisted historian samples and Prometheus metrics export.

Why this tutorial exists

Many teams enable runtime logic and I/O first, then postpone observability until late commissioning. That increases startup risk because the trend, alert, and metrics paths go unvalidated under real runtime behavior until it is costly to fix them.

What you will learn

  • how to enable [runtime.observability] safely
  • how to verify historian file output (history/historian.jsonl)
  • how to verify Prometheus endpoint export (/metrics)
  • how to scope recorded variables with allowlist mode

Prerequisites

  • complete Tutorial 13 first
  • one shell for runtime, one shell for verification commands

Step 1: Prepare an isolated project copy

Why: observability tuning should not alter your baseline template project.

rm -rf /tmp/trust-observability
cp -R /tmp/trust-tutorial-13 /tmp/trust-observability
cd /tmp/trust-observability

Step 2: Enable web + observability in runtime.toml

Why: the Prometheus export is served through a web route, and the historian needs an explicit recording policy.

Set/update these sections:

[runtime.web]
enabled = true
listen = "127.0.0.1:18084"
auth = "local"
tls = false

[runtime.observability]
enabled = true
sample_interval_ms = 1000
mode = "allowlist"
include = ["StartCmd", "RunLamp"]
history_path = "history/historian.jsonl"
max_entries = 20000
prometheus_enabled = true
prometheus_path = "/metrics"

Why these defaults:

  • allowlist avoids recording every symbol by accident.
  • explicit include makes retained telemetry intentional.
  • local bind (127.0.0.1) keeps first-run exposure minimal.
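
The local-bind point can be checked mechanically before a config is rolled out. The following is a minimal sketch, not part of trust-runtime: the helper name `is_loopback_listen` is hypothetical, and it assumes the `listen` value is a `host:port` string like the one shown above.

```python
import ipaddress


def is_loopback_listen(listen: str) -> bool:
    """Return True if a "host:port" listen string binds to a loopback address."""
    host, _, _port = listen.rpartition(":")
    host = host.strip("[]")  # tolerate bracketed IPv6 such as "[::1]:18084"
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal; treat only the conventional name as loopback.
        return host == "localhost"
```

A value such as "0.0.0.0:18084" returns False, which is a useful prompt to review auth and TLS settings before exposing the endpoint.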

Step 3: Build and validate

Why: mode = "allowlist" requires a non-empty include, and validation catches that class of mistakes before launch.

trust-runtime build --project . --sources src
trust-runtime validate --project .
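
For a pre-commit or review check outside the runtime, the allowlist rule can be sketched in a few lines. This is an illustration of the rule stated above, not the validator's actual implementation; the function names are hypothetical, and `tomllib` requires Python 3.11 or newer.

```python
def allowlist_problems(observability: dict) -> list[str]:
    """Return human-readable problems with an observability table (empty if none)."""
    problems = []
    if observability.get("mode") == "allowlist":
        # An allowlist with nothing in it would record no variables at all.
        if not observability.get("include"):
            problems.append('mode = "allowlist" requires a non-empty include')
    return problems


def load_observability(path: str) -> dict:
    """Read the [runtime.observability] table from a TOML file."""
    import tomllib  # standard library in Python 3.11+

    with open(path, "rb") as f:
        return tomllib.load(f).get("runtime", {}).get("observability", {})
```

Calling `allowlist_problems(load_observability("runtime.toml"))` from the project directory should return an empty list for the configuration in Step 2. `trust-runtime validate` remains the authoritative check.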

Step 4: Start runtime

Why: runtime startup confirms historian path setup and web binding.

trust-runtime run --project .

Leave this terminal running.

Step 5: Generate runtime activity

Why: historian and metrics should reflect real signal changes, not idle state.

Use the runtime panel or Web UI to toggle mapped inputs (for example %IX0.0) for at least a few cycles so that the StartCmd and RunLamp values change.

Step 6: Verify historian file output

Why: persistent telemetry is the basis for post-event diagnostics.

In another terminal:

ls -l history/historian.jsonl
tail -n 10 history/historian.jsonl

Expected result:

  • file exists and grows over time,
  • lines are JSON objects with timestamped samples,
  • recorded variables match your allowlist scope.
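
The first two expectations can also be checked with a small script. The sketch below is a hedged helper, not a trust-runtime tool: the historian line schema is not specified in this tutorial, so it only counts well-formed JSON objects and reports the top-level keys it sees, leaving the comparison against your allowlist to you.

```python
import json


def summarize_jsonl(lines):
    """Count valid JSON-object lines and collect the top-level keys they use."""
    valid, invalid, keys = 0, 0, set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            invalid += 1
            continue
        if isinstance(record, dict):
            valid += 1
            keys.update(record)
        else:
            invalid += 1  # valid JSON, but not an object
    return {"valid": valid, "invalid": invalid, "keys": sorted(keys)}


# Example use against the tutorial's file:
# with open("history/historian.jsonl", encoding="utf-8") as f:
#     print(summarize_jsonl(f))
```

Running it twice a few seconds apart and comparing the "valid" counts is one way to confirm the file grows. A single invalid line at the end may only mean the runtime was mid-write when the file was read.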

Step 7: Verify Prometheus endpoint

Why: this confirms the metrics scraping contract before CI or monitoring integration.

curl -s http://127.0.0.1:18084/metrics | head -n 40

Expected result:

  • endpoint responds in the Prometheus text exposition format,
  • runtime metrics are present,
  • historian counters are present when observability is enabled.
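
To see at a glance which metrics are exported, the response can be reduced to a set of metric names. This is a minimal sketch that assumes the endpoint returns standard Prometheus text exposition lines (`name{labels} value`); the function names are hypothetical, and the exact metric names exported by the runtime are not listed in this tutorial, so none are assumed here.

```python
import urllib.request


def metric_names(exposition_text: str) -> set[str]:
    """Extract metric names from Prometheus text exposition output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and "# HELP" / "# TYPE" comment lines
        # A sample line is: name{labels} value [timestamp]; labels are optional.
        parts = line.split("{", 1)[0].split()
        if parts:
            names.add(parts[0])
    return names


def fetch_metrics(url: str = "http://127.0.0.1:18084/metrics") -> str:
    """Fetch the endpoint body; the default URL matches this tutorial's config."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

`sorted(metric_names(fetch_metrics()))` gives a stable list that can be compared between runs or stored alongside commissioning notes.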

Step 8: Harden for production

Why: observability paths are part of security and storage posture.

  • change web listen/auth/TLS policy to production requirements,
  • set retention limits appropriate for device storage,
  • keep allowlist focused on operationally relevant symbols,
  • define archiving/rotation policy for historian file handling.
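
Retention limits are easier to set with a rough estimate in hand. The sketch below is back-of-the-envelope arithmetic under two assumptions that this tutorial does not confirm: that the historian writes one entry per sample interval, and that max_entries caps the number of retained entries. The average line size is a hypothetical input you would measure from your own historian file.

```python
def history_window_hours(max_entries: int, sample_interval_ms: int) -> float:
    """Hours of history retained if one entry is written per sample interval."""
    return max_entries * sample_interval_ms / 1000 / 3600


def history_size_mib(max_entries: int, avg_line_bytes: int) -> float:
    """Approximate file size in MiB at the entry cap for a given average line size."""
    return max_entries * avg_line_bytes / (1024 * 1024)
```

With the Step 2 values (max_entries = 20000, sample_interval_ms = 1000), `history_window_hours(20000, 1000)` is about 5.6 hours under those assumptions, which may be too short for diagnosing an overnight fault.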

Common mistakes

  • enabling mode = "allowlist" with an empty include
  • exposing /metrics broadly before network policy is ready
  • recording too many variables and exhausting storage budget
  • assuming observability works without testing non-idle signal changes

Completion checklist

  • [ ] observability enabled with explicit recording scope
  • [ ] historian file verified with live sample updates
  • [ ] /metrics endpoint verified locally
  • [ ] production storage/network hardening decisions captured