Quality of Service (QoS) Integration Guide
SDK source (GitHub): https://github.com/tangle-network/blueprint/tree/v2/crates/qos
This guide explains how to integrate the Blueprint SDK Quality of Service (QoS) system for observability, monitoring, and dashboards. QoS combines heartbeats, metrics, logs, and Grafana dashboards into a single service that you can run alongside any Blueprint.
QoS Summary
The Blueprint QoS system provides a complete observability stack:
- Heartbeat Service: submits periodic liveness signals to the status registry
- Metrics Collection: exports system and job metrics via a Prometheus-compatible endpoint
- Logging: streams logs to Loki (optional)
- Dashboards: builds Grafana dashboards (optional)
- Server Management: can run Grafana/Loki/Prometheus containers for you
What QoS Exposes
When metrics are enabled, QoS exposes a Prometheus-compatible metrics endpoint. Grafana and Loki are optional: QoS can manage them for you, or you can connect to existing instances.
| Component | Default Endpoint | Notes |
|---|---|---|
| Prometheus metrics | http://<host>:9090/metrics | Includes /health plus Prometheus v1 API routes like /api/v1/query. |
| Grafana UI | http://<host>:3000 | Only when configured or managed by QoS. |
| Loki push API | http://<host>:3100/loki/api/v1/push | Only when configured or managed by QoS. |
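To confirm the metrics endpoint is serving data, you can fetch /metrics (for example with curl) and look for a sample in the response. The following is a minimal sketch, not part of the SDK: `first_sample` is a hypothetical helper, and it assumes the response body uses the standard Prometheus text exposition format (`name{labels} value [timestamp]`, with `#` comment lines).

```rust
/// Returns the value of the first sample named `metric` in a Prometheus
/// text-format body, ignoring labels and the optional timestamp.
/// Hypothetical helper for illustration; not an SDK function.
fn first_sample(body: &str, metric: &str) -> Option<f64> {
    for line in body.lines() {
        let line = line.trim();
        // Skip blank lines and `# HELP` / `# TYPE` comment lines.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // The metric name ends at the label set or at the first whitespace.
        let Some(name_end) = line.find(|c: char| c == '{' || c.is_whitespace()) else {
            continue;
        };
        let (name, rest) = line.split_at(name_end);
        if name != metric {
            continue;
        }
        // Drop the `{...}` label set if present, then read the value token.
        let rest = match rest.rfind('}') {
            Some(close) => &rest[close + 1..],
            None => rest,
        };
        if let Some(value) = rest
            .split_whitespace()
            .next()
            .and_then(|token| token.parse::<f64>().ok())
        {
            return Some(value);
        }
    }
    None
}
```

For example, `first_sample(&body, "some_metric_name")` returns `None` when the metric is absent, which makes it easy to fail a smoke test if an expected series is missing. The metric name here is a placeholder; use the names your Blueprint actually exports.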
Integrating QoS with BlueprintRunner
If you use BlueprintRunner, it wires the HTTP RPC endpoint, keystore URI, and status registry address into QoS for you:
let qos_config = blueprint_qos::default_qos_config();
// MyHeartbeatConsumer is the unit struct defined in the next section.
let heartbeat_consumer = Arc::new(MyHeartbeatConsumer);
BlueprintRunner::builder(TangleEvmConfig::default(), env)
    .router(router)
    .qos_service(qos_config, Some(heartbeat_consumer))
    .run()
    .await?;

Note: BlueprintRunner::qos_service enables manage_servers(true) internally. To avoid managed containers, pass a config with grafana_server: None and loki_server: None.
HeartbeatConsumer and Keystore Requirements
Heartbeats require a keystore with an ECDSA key. Use BLUEPRINT_KEYSTORE_URI or --keystore-path so QoS can sign heartbeats.
cargo tangle key --algo ecdsa --keystore ./keystore --name operator
export BLUEPRINT_KEYSTORE_URI="$(pwd)/keystore"

Implement the heartbeat consumer using the current trait signature:
use blueprint_qos::heartbeat::{HeartbeatConsumer, HeartbeatStatus};
use blueprint_qos::error::Result as QoSResult;
use std::future::Future;
use std::pin::Pin;

#[derive(Clone)]
struct MyHeartbeatConsumer;

impl HeartbeatConsumer for MyHeartbeatConsumer {
    fn send_heartbeat(
        &self,
        _status: &HeartbeatStatus,
    ) -> Pin<Box<dyn Future<Output = QoSResult<()>> + Send>> {
        // No-op consumer: ignores the status and reports success.
        Box::pin(async move { Ok(()) })
    }
}

Configuration Options
Default Configuration
let qos_config = blueprint_qos::default_qos_config();

This enables metrics, Loki logging, and Grafana integration. Whether containers start depends on manage_servers (BlueprintRunner forces it on; see the note under Integrating QoS with BlueprintRunner).
Bring Your Own Observability Stack
Point QoS at your existing Grafana/Loki/Prometheus stack by overriding the configs and keeping manage_servers off:
let qos_config = QoSConfig {
    metrics: Some(MetricsConfig {
        prometheus_server: Some(PrometheusServerConfig {
            host: "0.0.0.0".into(),
            port: 9090,
            use_docker: false,
            ..Default::default()
        }),
        ..Default::default()
    }),
    grafana: Some(GrafanaConfig {
        url: "http://grafana.internal:3000".into(),
        api_key: Some(std::env::var("GRAFANA_API_KEY")?),
        prometheus_datasource_url: Some("http://prometheus.internal:9090".into()),
        ..Default::default()
    }),
    loki: Some(LokiConfig {
        url: "http://loki.internal:3100/loki/api/v1/push".into(),
        ..Default::default()
    }),
    manage_servers: false,
    ..blueprint_qos::default_qos_config()
};

Managed Observability Stack
QoS can spin up Grafana, Loki, and Prometheus containers for you. Make sure Docker is available.
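Because managed mode depends on Docker, it can help to check for it before starting QoS so that a missing daemon produces a clear message. The following is a minimal sketch using only the standard library; `command_succeeds` and `docker_available` are hypothetical helpers, not SDK functions, and the sketch assumes that `docker info` exits with a non-zero status when the daemon cannot be reached.

```rust
use std::process::{Command, Stdio};

/// Returns true when `program` can be launched with `args` and exits with status 0.
/// Returns false if the program is missing, cannot be started, or exits non-zero.
fn command_succeeds(program: &str, args: &[&str]) -> bool {
    Command::new(program)
        .args(args)
        .stdout(Stdio::null()) // discard output; only the exit status matters
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

/// Preflight check: is a Docker CLI on PATH, and does it report a reachable daemon?
/// Assumption: `docker info` fails when only the CLI is installed or the daemon is down.
fn docker_available() -> bool {
    command_succeeds("docker", &["info"])
}
```

A caller might run `docker_available()` before enabling manage_servers and fall back to an external stack, or exit with an explanatory error, when it returns false.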
let qos_config = QoSConfig {
    manage_servers: true,
    grafana_server: Some(GrafanaServerConfig {
        admin_user: "admin".into(),
        admin_password: "change-me".into(),
        allow_anonymous: false,
        data_dir: "/var/lib/grafana".into(),
        ..Default::default()
    }),
    loki_server: Some(LokiServerConfig {
        data_dir: "/var/lib/loki".into(),
        config_path: Some("./loki-config.yaml".into()),
        ..Default::default()
    }),
    prometheus_server: Some(PrometheusServerConfig {
        host: "0.0.0.0".into(),
        port: 9090,
        use_docker: true,
        config_path: Some("./prometheus.yml".into()),
        data_path: Some("./prometheus-data".into()),
        ..Default::default()
    }),
    docker_network: Some("blueprint-observability".into()),
    docker_bind_ip: Some("0.0.0.0".into()),
    ..blueprint_qos::default_qos_config()
};

Builder Pattern
Use the builder when you want explicit wiring for heartbeats or custom datasources:
let qos_service = QoSServiceBuilder::new()
    .with_heartbeat_config(HeartbeatConfig {
        service_id,
        blueprint_id,
        interval_secs: 60,
        jitter_percent: 10,
        max_missed_heartbeats: 3,
        status_registry_address,
    })
    .with_heartbeat_consumer(Arc::new(consumer))
    .with_http_rpc_endpoint(env.http_rpc_endpoint.to_string())
    .with_keystore_uri(env.keystore_uri.clone())
    .with_status_registry_address(status_registry_address)
    .with_metrics_config(MetricsConfig::default())
    .with_grafana_config(GrafanaConfig::default())
    .with_loki_config(LokiConfig::default())
    .with_prometheus_server_config(PrometheusServerConfig::default())
    .manage_servers(true)
    .build()
    .await?;

Recording Metrics and Events
Track job execution and errors in your handlers:
if let Some(qos) = &ctx.qos_service {
    qos.record_job_execution(
        JOB_ID,
        start_time.elapsed().as_secs_f64(),
        ctx.service_id,
        ctx.blueprint_id,
    );
}

if let Some(qos) = &ctx.qos_service {
    qos.record_job_error(JOB_ID, "complex_operation_failure");
}

Creating Grafana Dashboards
let mut qos_service = qos_service;
qos_service.create_dashboard("My Blueprint").await?;

The default dashboard template lives at crates/qos/config/grafana_dashboard.json in the SDK.
Accessing Metrics in Code
You can query the metrics provider directly (for custom metrics or status checks):
use blueprint_qos::metrics::types::MetricsProvider;

if let Some(qos) = &ctx.qos_service {
    if let Some(provider) = qos.provider() {
        let system_metrics = provider.get_system_metrics().await;
        let _cpu = system_metrics.cpu_usage;
        provider
            .add_custom_metric("custom.label".into(), "value".into())
            .await;
    }
}

Best Practices
✅ DO:
- Initialize QoS early in your Blueprint startup sequence.
- Use BlueprintRunner::qos_service(...) to auto-wire RPC + keystore + status registry.
- Keep Prometheus reachable (bind to 0.0.0.0 if scraped externally).
- Replace default Grafana credentials when using managed servers.
❌ DON’T:
- Don’t enable heartbeats without setting BLUEPRINT_KEYSTORE_URI.
- Don’t expose managed Grafana publicly without auth.
- Don’t ignore QoS startup errors; they usually indicate misconfigured ports or credentials.
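The advice about binding to 0.0.0.0 can be checked at startup. The following is a minimal sketch using the standard library's IP address parsing; `reachable_from_other_hosts` is a hypothetical helper, not an SDK function. It only inspects the bind address itself, so firewalls, container networking, and port mappings are outside what it can tell you.

```rust
use std::net::{AddrParseError, IpAddr};

/// Reports whether a listener bound to `host` could accept connections from
/// other machines, judged only by the address: loopback addresses (127.0.0.1, ::1)
/// cannot, while 0.0.0.0, ::, and concrete interface addresses can.
/// Returns an error if `host` is a hostname rather than an IP literal.
fn reachable_from_other_hosts(host: &str) -> Result<bool, AddrParseError> {
    let ip: IpAddr = host.parse()?;
    Ok(!ip.is_loopback())
}
```

A Blueprint that expects an external Prometheus scraper could log a warning when this returns `Ok(false)` for the configured metrics host.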
QoS Components Reference
| Component | Primary Struct | Config | Purpose |
|---|---|---|---|
| Unified Service | QoSService | QoSConfig | Main entry point for QoS integration |
| Heartbeat | HeartbeatService | HeartbeatConfig | Liveness signals to the status registry |
| Metrics | MetricsService | MetricsConfig | System + job metrics and Prometheus export |
| Logging | N/A | LokiConfig | Log aggregation via Loki |
| Dashboards | GrafanaClient | GrafanaConfig | Dashboards and datasources |
| Server Management | ServerManager | Server configs | Manages Docker containers for the stack |
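The HeartbeatConfig fields used in the builder example (interval_secs, jitter_percent, max_missed_heartbeats) interact in a way that is easier to see with numbers. The following is a minimal sketch under stated assumptions, not the SDK's implementation: it assumes jitter is applied symmetrically as plus or minus jitter_percent of the interval, that a service counts as stale once max_missed_heartbeats whole intervals have passed without a heartbeat, and that the field types are u64, u8, and u32. All function names are hypothetical.

```rust
use std::time::Duration;

/// Inclusive (min, max) delay before the next heartbeat, assuming jitter is
/// applied symmetrically as a percentage of the base interval.
fn jitter_bounds(interval_secs: u64, jitter_percent: u8) -> (Duration, Duration) {
    let base_ms = interval_secs.saturating_mul(1000);
    // Clamp to 100% so the lower bound can never underflow.
    let percent = u64::from(jitter_percent.min(100));
    let spread_ms = base_ms.saturating_mul(percent) / 100;
    (
        Duration::from_millis(base_ms - spread_ms),
        Duration::from_millis(base_ms.saturating_add(spread_ms)),
    )
}

/// Number of whole intervals that have elapsed since the last heartbeat.
fn missed_heartbeats(since_last: Duration, interval_secs: u64) -> u64 {
    if interval_secs == 0 {
        return 0; // avoid division by zero for a disabled or invalid interval
    }
    since_last.as_secs() / interval_secs
}

/// True once at least `max_missed` whole intervals have passed without a heartbeat.
fn is_stale(since_last: Duration, interval_secs: u64, max_missed: u32) -> bool {
    missed_heartbeats(since_last, interval_secs) >= u64::from(max_missed)
}
```

With the values from the builder example (interval_secs: 60, jitter_percent: 10), `jitter_bounds` yields a window of 54 to 66 seconds, and under these assumptions `is_stale` becomes true after 180 seconds of silence when max_missed_heartbeats is 3.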