Quality of Service (QoS) Integration Guide

SDK source (GitHub): https://github.com/tangle-network/blueprint/tree/v2/crates/qos

This guide explains how to integrate the Blueprint SDK Quality of Service (QoS) system for observability, monitoring, and dashboards. QoS combines heartbeats, metrics, logs, and Grafana dashboards into a single service that you can run alongside any Blueprint.

QoS Summary

The Blueprint QoS system provides a complete observability stack:

  • Heartbeat Service: submits periodic liveness signals to the status registry
  • Metrics Collection: exports system and job metrics via a Prometheus-compatible endpoint
  • Logging: streams logs to Loki (optional)
  • Dashboards: builds Grafana dashboards (optional)
  • Server Management: can run Grafana/Loki/Prometheus containers for you

What QoS Exposes

Whenever metrics are enabled, QoS exposes a Prometheus-compatible metrics endpoint. Grafana and Loki are optional; QoS can manage them for you or connect to instances you already run.

| Component | Default Endpoint | Notes |
| --- | --- | --- |
| Prometheus metrics | http://<host>:9090/metrics | Includes /health plus Prometheus v1 API routes like /api/v1/query. |
| Grafana UI | http://<host>:3000 | Only when configured or managed by QoS. |
| Loki push API | http://<host>:3100/loki/api/v1/push | Only when configured or managed by QoS. |
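
To check what the metrics endpoint returns, it can help to pull a single value out of a scraped response. The sketch below assumes the /metrics body uses the standard Prometheus text exposition format; the function name find_sample and the metric names in the usage note are hypothetical and not taken from the SDK.

```rust
/// Returns the value of the first sample whose metric name equals `name`
/// in a Prometheus text-exposition body. Labels are skipped, not matched.
/// Simplification: a `}` inside a quoted label value is not handled.
fn find_sample(body: &str, name: &str) -> Option<f64> {
    for line in body.lines() {
        let line = line.trim();
        // Skip blank lines and `# HELP` / `# TYPE` comment lines.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // The metric name ends at the label block `{` or at the first whitespace.
        let name_end = line
            .find(|c: char| c == '{' || c.is_whitespace())
            .unwrap_or(line.len());
        if &line[..name_end] != name {
            continue;
        }
        // Drop the optional `{...}` label block, then read the value token.
        let rest = &line[name_end..];
        let rest = match rest.strip_prefix('{') {
            Some(after_brace) => after_brace.split_once('}')?.1,
            None => rest,
        };
        return rest.split_whitespace().next()?.parse().ok();
    }
    None
}
```

For example, given a body containing the line `demo_jobs_total{job_id="1"} 42`, calling find_sample(body, "demo_jobs_total") yields Some(42.0). A trailing timestamp after the value is ignored.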

Integrating QoS with BlueprintRunner

If you use BlueprintRunner, it wires the HTTP RPC endpoint, keystore URI, and status registry address into QoS for you:

let qos_config = blueprint_qos::default_qos_config();
let heartbeat_consumer = Arc::new(MyHeartbeatConsumer::new());
 
BlueprintRunner::builder(TangleEvmConfig::default(), env)
    .router(router)
    .qos_service(qos_config, Some(heartbeat_consumer))
    .run()
    .await?;

Note: BlueprintRunner::qos_service enables manage_servers(true) internally. If you want to avoid managed containers, pass a config with grafana_server: None and loki_server: None.

HeartbeatConsumer and Keystore Requirements

Heartbeats require a keystore with an ECDSA key. Set BLUEPRINT_KEYSTORE_URI or pass --keystore-path so QoS can find the key and sign heartbeats.

cargo tangle key --algo ecdsa --keystore ./keystore --name operator
export BLUEPRINT_KEYSTORE_URI="$(pwd)/keystore"
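
Since heartbeats depend on the keystore, a startup check can fail fast when the variable is missing or points at the wrong place. This is a minimal sketch that treats the value as a plain directory path, as in the export above; resolve_keystore is a hypothetical helper, not an SDK function.

```rust
use std::path::{Path, PathBuf};

/// Validates a keystore location before QoS starts.
/// ASSUMPTION: the value is a plain filesystem path to a directory.
fn resolve_keystore(uri: Option<&str>) -> Result<PathBuf, String> {
    // Treat an unset or whitespace-only value as missing.
    let raw = uri
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .ok_or_else(|| "BLUEPRINT_KEYSTORE_URI is not set".to_string())?;

    let path = Path::new(raw);
    if !path.is_dir() {
        return Err(format!(
            "keystore path {} is not a directory",
            path.display()
        ));
    }
    Ok(path.to_path_buf())
}
```

A caller could pass std::env::var("BLUEPRINT_KEYSTORE_URI").ok().as_deref() and abort startup on Err. The check only confirms that the directory exists; it does not verify that an ECDSA key is present inside it.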

Implement the heartbeat consumer using the current trait signature:

use blueprint_qos::heartbeat::{HeartbeatConsumer, HeartbeatStatus};
use blueprint_qos::error::Result as QoSResult;
use std::future::Future;
use std::pin::Pin;
 
#[derive(Clone)]
struct MyHeartbeatConsumer;
 
impl HeartbeatConsumer for MyHeartbeatConsumer {
    fn send_heartbeat(
        &self,
        _status: &HeartbeatStatus,
    ) -> Pin<Box<dyn Future<Output = QoSResult<()>> + Send>> {
        Box::pin(async move { Ok(()) })
    }
}
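
The signature above returns a boxed future with no borrow of self, so a consumer that keeps state has to move an owned handle into the future. The sketch below shows one way to do that with a shared counter. To stay compilable without the SDK it declares local stand-ins for HeartbeatConsumer, HeartbeatStatus, and the result type; those stand-ins, CountingConsumer, and block_on are assumptions for illustration only.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Local stand-ins so the sketch compiles on its own. In a Blueprint, import
// the real items from blueprint_qos instead of declaring these.
struct HeartbeatStatus; // placeholder: the real type's fields are not modeled
type QoSResult<T> = Result<T, String>; // placeholder error type

trait HeartbeatConsumer {
    fn send_heartbeat(
        &self,
        status: &HeartbeatStatus,
    ) -> Pin<Box<dyn Future<Output = QoSResult<()>> + Send>>;
}

/// Counts heartbeats; clones share the same counter.
#[derive(Clone, Default)]
struct CountingConsumer {
    sent: Arc<AtomicU64>,
}

impl HeartbeatConsumer for CountingConsumer {
    fn send_heartbeat(
        &self,
        _status: &HeartbeatStatus,
    ) -> Pin<Box<dyn Future<Output = QoSResult<()>> + Send>> {
        // Clone the handle so the future owns its data instead of borrowing `self`.
        let sent = Arc::clone(&self.sent);
        Box::pin(async move {
            sent.fetch_add(1, Ordering::Relaxed);
            Ok(())
        })
    }
}

/// Tiny busy-polling executor for the demo only; use your async runtime in real code.
fn block_on<F: Future>(fut: F) -> F::Output {
    struct NoopWake;
    impl Wake for NoopWake {
        fn wake(self: Arc<Self>) {}
    }
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Driving two heartbeats through block_on, one via the original consumer and one via a clone, leaves the shared counter at 2. The same cloning pattern applies to any state the consumer needs inside the future, such as an HTTP client or a channel sender.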

Configuration Options

Default Configuration

let qos_config = blueprint_qos::default_qos_config();

This enables metrics, Loki logging, and Grafana integration. Whether containers start depends on manage_servers (BlueprintRunner forces it on; see note above).

Bring Your Own Observability Stack

Point QoS at your existing Grafana/Loki/Prometheus stack by overriding the configs and keeping manage_servers off:

let qos_config = QoSConfig {
    metrics: Some(MetricsConfig {
        prometheus_server: Some(PrometheusServerConfig {
            host: "0.0.0.0".into(),
            port: 9090,
            use_docker: false,
            ..Default::default()
        }),
        ..Default::default()
    }),
    grafana: Some(GrafanaConfig {
        url: "http://grafana.internal:3000".into(),
        api_key: Some(std::env::var("GRAFANA_API_KEY")?),
        prometheus_datasource_url: Some("http://prometheus.internal:9090".into()),
        ..Default::default()
    }),
    loki: Some(LokiConfig {
        url: "http://loki.internal:3100/loki/api/v1/push".into(),
        ..Default::default()
    }),
    manage_servers: false,
    ..blueprint_qos::default_qos_config()
};

Managed Observability Stack

QoS can spin up Grafana, Loki, and Prometheus containers for you. Make sure Docker is available.

let qos_config = QoSConfig {
    manage_servers: true,
    grafana_server: Some(GrafanaServerConfig {
        admin_user: "admin".into(),
        admin_password: "change-me".into(),
        allow_anonymous: false,
        data_dir: "/var/lib/grafana".into(),
        ..Default::default()
    }),
    loki_server: Some(LokiServerConfig {
        data_dir: "/var/lib/loki".into(),
        config_path: Some("./loki-config.yaml".into()),
        ..Default::default()
    }),
    prometheus_server: Some(PrometheusServerConfig {
        host: "0.0.0.0".into(),
        port: 9090,
        use_docker: true,
        config_path: Some("./prometheus.yml".into()),
        data_path: Some("./prometheus-data".into()),
        ..Default::default()
    }),
    docker_network: Some("blueprint-observability".into()),
    docker_bind_ip: Some("0.0.0.0".into()),
    ..blueprint_qos::default_qos_config()
};

Builder Pattern

Use the builder when you want explicit wiring for heartbeats or custom datasources:

let qos_service = QoSServiceBuilder::new()
    .with_heartbeat_config(HeartbeatConfig {
        service_id,
        blueprint_id,
        interval_secs: 60,
        jitter_percent: 10,
        max_missed_heartbeats: 3,
        status_registry_address,
    })
    .with_heartbeat_consumer(Arc::new(consumer))
    .with_http_rpc_endpoint(env.http_rpc_endpoint.to_string())
    .with_keystore_uri(env.keystore_uri.clone())
    .with_status_registry_address(status_registry_address)
    .with_metrics_config(MetricsConfig::default())
    .with_grafana_config(GrafanaConfig::default())
    .with_loki_config(LokiConfig::default())
    .with_prometheus_server_config(PrometheusServerConfig::default())
    .manage_servers(true)
    .build()
    .await?;
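
The heartbeat values in the builder call interact, so it can be useful to estimate the timing they imply. The sketch below rests on two unconfirmed assumptions: that jitter_percent is a symmetric plus-or-minus percentage of interval_secs, and that an operator is considered offline after max_missed_heartbeats worst-case intervals pass in silence. The SDK may compute both differently, and the integer types used here are guesses.

```rust
/// Inclusive (min, max) delay in seconds between heartbeats.
/// ASSUMPTION: jitter is a symmetric +/- percentage of the interval.
fn jitter_bounds(interval_secs: u64, jitter_percent: u8) -> (u64, u64) {
    // Clamp to 100% so the lower bound cannot underflow.
    let pct = u64::from(jitter_percent.min(100));
    let spread = interval_secs.saturating_mul(pct) / 100;
    (interval_secs - spread, interval_secs.saturating_add(spread))
}

/// Seconds of silence after which a watcher might flag the operator.
/// ASSUMPTION: each missed heartbeat is allowed one worst-case interval.
fn offline_after_secs(interval_secs: u64, jitter_percent: u8, max_missed: u32) -> u64 {
    let (_, max_delay) = jitter_bounds(interval_secs, jitter_percent);
    max_delay.saturating_mul(u64::from(max_missed))
}
```

Under these assumptions, the values shown above (60 seconds, 10 percent, 3 missed) give delays between 54 and 66 seconds and an offline threshold of 198 seconds.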

Recording Metrics and Events

Track job execution and errors in your handlers:

// After the job runs: record its execution time.
if let Some(qos) = &ctx.qos_service {
    qos.record_job_execution(
        JOB_ID,
        start_time.elapsed().as_secs_f64(),
        ctx.service_id,
        ctx.blueprint_id,
    );
}

// When the job fails: record the error under a short label.
if let Some(qos) = &ctx.qos_service {
    qos.record_job_error(JOB_ID, "complex_operation_failure");
}
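
Repeating the timing and recording calls in every handler gets noisy, so one option is to wrap a job in a helper that times it and records the outcome. The sketch below uses a hypothetical JobRecorder trait as a stand-in for the two QoS calls above; the trait, MemoryRecorder, and run_recorded are illustrative names, and the u64 job ID type is an assumption.

```rust
use std::cell::RefCell;
use std::time::Instant;

/// Hypothetical stand-in for the two QoS recording calls; not an SDK trait.
trait JobRecorder {
    fn record_job_execution(&self, job_id: u64, seconds: f64);
    fn record_job_error(&self, job_id: u64, kind: &str);
}

/// In-memory recorder for demos and tests.
#[derive(Default)]
struct MemoryRecorder {
    events: RefCell<Vec<String>>,
}

impl JobRecorder for MemoryRecorder {
    fn record_job_execution(&self, job_id: u64, seconds: f64) {
        self.events
            .borrow_mut()
            .push(format!("ok job={job_id} secs={seconds:.3}"));
    }
    fn record_job_error(&self, job_id: u64, kind: &str) {
        self.events
            .borrow_mut()
            .push(format!("error job={job_id} kind={kind}"));
    }
}

/// Runs `job`, then records its duration on success or `error_kind` on failure.
/// With no recorder (QoS disabled), the job still runs and nothing is recorded.
fn run_recorded<T, E, F>(
    recorder: Option<&dyn JobRecorder>,
    job_id: u64,
    error_kind: &str,
    job: F,
) -> Result<T, E>
where
    F: FnOnce() -> Result<T, E>,
{
    let start = Instant::now();
    let result = job();
    if let Some(rec) = recorder {
        match &result {
            Ok(_) => rec.record_job_execution(job_id, start.elapsed().as_secs_f64()),
            Err(_) => rec.record_job_error(job_id, error_kind),
        }
    }
    result
}
```

In a real handler, the recorder would be a thin adapter that forwards to the QoS service and supplies the service and blueprint IDs. Whether to record the duration of failed runs as well is a design choice; this sketch records one event per run.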

Creating Grafana Dashboards

let mut qos_service = qos_service;
qos_service.create_dashboard("My Blueprint").await?;

The default dashboard template lives at crates/qos/config/grafana_dashboard.json in the SDK.

Accessing Metrics in Code

You can query the metrics provider directly (for custom metrics or status checks):

use blueprint_qos::metrics::types::MetricsProvider;
 
if let Some(qos) = &ctx.qos_service {
    if let Some(provider) = qos.provider() {
        let system_metrics = provider.get_system_metrics().await;
        let _cpu = system_metrics.cpu_usage;
        provider
            .add_custom_metric("custom.label".into(), "value".into())
            .await;
    }
}

Best Practices

✅ DO:

  • Initialize QoS early in your Blueprint startup sequence.
  • Use BlueprintRunner::qos_service(...) to auto-wire RPC + keystore + status registry.
  • Keep Prometheus reachable (bind to 0.0.0.0 if scraped externally).
  • Replace default Grafana credentials when using managed servers.

❌ DON’T:

  • Enable heartbeats without setting BLUEPRINT_KEYSTORE_URI.
  • Expose managed Grafana publicly without auth.
  • Ignore QoS startup errors; they usually indicate misconfigured ports or credentials.
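
Because startup errors often trace back to ports that are already taken, a preflight check before starting QoS can make the cause obvious. The sketch below tries to bind each port and reports the ones that fail; port_is_free and busy_ports are hypothetical helpers, and the port numbers in the usage note are the defaults listed earlier in this guide.

```rust
use std::net::TcpListener;

/// Returns true if a TCP listener can currently be bound on `ip:port`.
/// The probe listener is dropped immediately, which releases the port.
fn port_is_free(ip: &str, port: u16) -> bool {
    TcpListener::bind((ip, port)).is_ok()
}

/// Returns the (label, port) pairs from `wanted` that cannot be bound on `ip`.
fn busy_ports(ip: &str, wanted: &[(&'static str, u16)]) -> Vec<(&'static str, u16)> {
    wanted
        .iter()
        .copied()
        .filter(|&(_, port)| !port_is_free(ip, port))
        .collect()
}
```

A call such as busy_ports("0.0.0.0", &[("prometheus", 9090), ("grafana", 3000), ("loki", 3100)]) returns an empty list when all three defaults are available. The check is advisory only: another process can still claim a port between the probe and the actual startup, and ports published by Docker containers may need to be checked on the Docker host.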

QoS Components Reference

| Component | Primary Struct | Config | Purpose |
| --- | --- | --- | --- |
| Unified Service | QoSService | QoSConfig | Main entry point for QoS integration |
| Heartbeat | HeartbeatService | HeartbeatConfig | Liveness signals to the status registry |
| Metrics | MetricsService | MetricsConfig | System + job metrics and Prometheus export |
| Logging | N/A | LokiConfig | Log aggregation via Loki |
| Dashboards | GrafanaClient | GrafanaConfig | Dashboards and datasources |
| Server Management | ServerManager | Server configs | Manages Docker containers for the stack |