Observability¶
What to monitor¶
- historian file recording
- Prometheus metrics export
- allowlist-based variable recording
- web-path exposure of observability endpoints
Together these provide operational evidence over time: historian files, metrics, signal allowlists, retention, and access policy.
Good first checks¶
- confirm the endpoint exposing metrics is reachable
- confirm only the expected variables are recorded or exposed
- confirm historian retention/output paths are explicit
Success means the site can answer which signals are recorded, where the data is retained, who may read it, and how the evidence helps diagnose a runtime issue after the fact.
Worked tutorial¶
This tutorial enables runtime observability and verifies both persisted historian samples and Prometheus metrics export.
Why this tutorial exists¶
Many teams enable runtime logic and I/O first, then postpone observability until late commissioning. Postponing it increases startup risk because the trend, alert, and metrics paths go unvalidated under real runtime behavior.
What you will learn¶
- how to enable [runtime.observability] safely
- how to verify historian file output (history/historian.jsonl)
- how to verify Prometheus endpoint export (/metrics)
- how to scope recorded variables with allowlist mode
Prerequisites¶
- complete Tutorial 13 first
- one shell for runtime, one shell for verification commands
Step 1: Prepare isolated project copy¶
Why: observability tuning should not alter your baseline template project.
rm -rf /tmp/trust-observability
cp -R /tmp/trust-tutorial-13 /tmp/trust-observability
cd /tmp/trust-observability
Step 2: Enable web + observability in runtime.toml¶
Why: Prometheus export is served via a web route, and the historian needs an explicit recording policy.
Set/update these sections:
[runtime.web]
enabled = true
listen = "127.0.0.1:18084"
auth = "local"
tls = false
[runtime.observability]
enabled = true
sample_interval_ms = 1000
mode = "allowlist"
include = ["StartCmd", "RunLamp"]
history_path = "history/historian.jsonl"
max_entries = 20000
prometheus_enabled = true
prometheus_path = "/metrics"
Why these defaults:
- allowlist avoids recording every symbol by accident.
- explicit include makes retained telemetry intentional.
- local bind (127.0.0.1) keeps first-run exposure minimal.
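The local-bind default can be checked mechanically before a config is committed. The following is a minimal sketch, not part of trust-runtime: the helper name `is_loopback_bind` is hypothetical, and it only assumes the `listen` value has the `host:port` shape shown in Step 2.

```python
import ipaddress


def is_loopback_bind(listen: str) -> bool:
    """Return True if a "host:port" listen string binds a loopback IP address.

    Hypothetical helper for illustration; trust-runtime is not known to ship it.
    """
    host, _, _port = listen.rpartition(":")
    host = host.strip("[]")  # allow bracketed IPv6 such as "[::1]:18084"
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Hostnames (for example "localhost") are not resolved here,
        # so they are conservatively reported as not loopback.
        return False


print(is_loopback_bind("127.0.0.1:18084"))  # True
print(is_loopback_bind("0.0.0.0:18084"))    # False
```

A check like this could run in review or CI to flag a first-run config that listens on all interfaces.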
Step 3: Build and validate¶
Why: mode = "allowlist" requires a non-empty include, and validation catches
that class of mistakes before launch.
trust-runtime build --project . --sources src
trust-runtime validate --project .
Step 4: Start runtime¶
Why: runtime startup confirms historian path setup and web binding.
trust-runtime run --project .
Leave this terminal running.
Step 5: Generate runtime activity¶
Why: historian and metrics should reflect real signal changes, not idle state.
Use the runtime panel or Web UI to toggle mapped inputs (for example %IX0.0) for at
least a few cycles so that the StartCmd and RunLamp values change.
Step 6: Verify historian file output¶
Why: persistent telemetry is the basis for post-event diagnostics.
In another terminal:
ls -l history/historian.jsonl
tail -n 10 history/historian.jsonl
Expected result:
- file exists and grows over time,
- lines are JSON objects with timestamped samples,
- recorded variables match your allowlist scope.
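The expected results above can be checked with a short script instead of by eye. This is a sketch only: the historian record schema is not documented here, so the field names in the sample lines (`ts`, `name`, `value`) and the `name_key` parameter are hypothetical. Inspect a real line from `tail` first and adjust the key accordingly.

```python
import json


def summarize_jsonl(lines, name_key=None, allowed=None):
    """Count JSON-object lines, invalid lines, and names outside the allowlist."""
    ok, bad, unexpected = 0, 0, set()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            bad += 1
            continue
        if not isinstance(record, dict):
            bad += 1  # valid JSON, but not an object
            continue
        ok += 1
        if name_key and allowed is not None and name_key in record:
            name = str(record[name_key])
            if name not in allowed:
                unexpected.add(name)
    return ok, bad, unexpected


# Hypothetical sample lines; real historian records may use other fields.
sample = [
    '{"ts": 1, "name": "StartCmd", "value": true}',
    '{"ts": 2, "name": "Other", "value": 0}',
    "not json",
]
print(summarize_jsonl(sample, name_key="name", allowed={"StartCmd", "RunLamp"}))
# (2, 1, {'Other'})
```

Against the real file, the same function can be fed `open("history/historian.jsonl")`. A non-empty third element would suggest the recording scope is wider than the allowlist.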
Step 7: Verify Prometheus endpoint¶
Why: this confirms the metrics scraping contract before CI or monitoring integration.
curl -s http://127.0.0.1:18084/metrics | head -n 40
Expected result:
- endpoint responds with text exposition format,
- runtime metrics are present,
- historian counters are present when observability is enabled.
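To turn the expected results into a repeatable check, the metric names can be extracted from the response body. The sketch below assumes only the standard Prometheus text exposition layout (comment lines start with `#`, sample lines start with the metric name). The metric names in the sample are hypothetical placeholders, not names trust-runtime is known to export.

```python
def metric_names(exposition: str) -> set[str]:
    """Collect metric names from Prometheus text exposition output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comment lines
        # A sample line looks like: name{labels} value [timestamp]
        head = line.split("{", 1)[0].split()
        if head:
            names.add(head[0])
    return names


# Hypothetical payload for illustration only.
SAMPLE = """\
# HELP demo_cycles_total Example counter.
# TYPE demo_cycles_total counter
demo_cycles_total 42
demo_samples_total{source="historian"} 7
"""

print(sorted(metric_names(SAMPLE)))
# ['demo_cycles_total', 'demo_samples_total']
```

Feeding it the saved output of the curl command above gives a set that a CI job could compare against the names the monitoring stack expects.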
Step 8: Harden for production¶
Why: observability paths are part of security and storage posture.
- change web listen/auth/TLS policy to production requirements,
- set retention limits appropriate for device storage,
- keep allowlist focused on operationally relevant symbols,
- define archiving/rotation policy for historian file handling.
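Retention limits are easier to choose with a rough estimate of the time window they cover. The arithmetic below is a sketch under two assumptions that this page does not confirm: that max_entries bounds the number of retained historian records, and that each sample tick writes a fixed number of records (`entries_per_tick`, a hypothetical parameter).

```python
def retention_window_hours(
    max_entries: int, sample_interval_ms: int, entries_per_tick: int = 1
) -> float:
    """Estimate how many hours of history fit within max_entries."""
    ticks = max_entries / entries_per_tick
    return ticks * sample_interval_ms / 1000 / 3600


# Values from Step 2: max_entries = 20000, sample_interval_ms = 1000.
print(round(retention_window_hours(20000, 1000), 2))     # 5.56
# If each tick wrote one record per allowlisted variable (two here):
print(round(retention_window_hours(20000, 1000, 2), 2))  # 2.78
```

Under these assumptions the tutorial defaults cover only a few hours, so a production site would likely pair a larger limit or longer interval with the archiving policy listed above.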
Common mistakes¶
- enabling mode = "allowlist" with empty include
- exposing /metrics broadly before network policy is ready
- recording too many variables and exhausting storage budget
- assuming observability works without testing non-idle signal changes
Completion checklist¶
- [ ] observability enabled with explicit recording scope
- [ ] historian file verified with live sample updates
- [ ] /metrics endpoint verified locally
- [ ] production storage/network hardening decisions captured