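Designing a Monitoring System

Question

How would you design a system in Rust that collects monitoring metrics and exposes them for scraping?

Answer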

Architecture Design
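A common architecture is pull-based: the service keeps metric values in memory, renders them as text on a `/metrics` endpoint, and Prometheus scrapes that endpoint periodically. To show what that text looks like (the Prometheus text exposition format that the exporter's `handle.render()` returns in the implementation that follows), here is a std-only sketch. `MiniRegistry` is a hypothetical name; it handles counters only and skips label-value escaping, so treat it as an illustration rather than a real exporter.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Hypothetical minimal registry (counters only):
/// metric name -> (sorted label pairs -> cumulative value).
/// BTreeMap keeps the rendered output in a deterministic order.
#[derive(Default)]
pub struct MiniRegistry {
    counters: BTreeMap<String, BTreeMap<Vec<(String, String)>, u64>>,
}

impl MiniRegistry {
    /// Add `by` to the series identified by `name` plus `labels`.
    pub fn increment(&mut self, name: &str, labels: &[(&str, &str)], by: u64) {
        let mut key: Vec<(String, String)> = labels
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect();
        // Sort so that label order alone never creates a distinct series.
        key.sort();
        *self
            .counters
            .entry(name.to_string())
            .or_default()
            .entry(key)
            .or_insert(0) += by;
    }

    /// Render every counter in the Prometheus text exposition format.
    /// Label values are written as-is; real exporters escape `\`, `"` and newlines.
    pub fn render(&self) -> String {
        let mut out = String::new();
        for (name, series) in &self.counters {
            writeln!(out, "# TYPE {} counter", name).unwrap();
            for (labels, value) in series {
                if labels.is_empty() {
                    writeln!(out, "{} {}", name, value).unwrap();
                } else {
                    let pairs: Vec<String> = labels
                        .iter()
                        .map(|(k, v)| format!("{}=\"{}\"", k, v))
                        .collect();
                    writeln!(out, "{}{{{}}} {}", name, pairs.join(","), value).unwrap();
                }
            }
        }
        out
    }
}
```

Two increments with the same labels in a different order land in one series, so the rendered line reads `http_requests_total{method="GET",path="/api"} 3`. Each scrape only sees the current cumulative values; Prometheus stores the time series and computes rates itself.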

Implementation based on the metrics crate

```rust
use axum::{routing::get, Router};
use metrics::{counter, gauge, histogram};
use metrics_exporter_prometheus::PrometheusBuilder;

async fn setup_metrics() -> Router {
    // Install the Prometheus exporter as the global metrics recorder.
    // This fails if a global recorder is already installed, so call it once.
    let handle = PrometheusBuilder::new()
        .install_recorder()
        .expect("failed to install recorder");

    // Expose GET /metrics; each scrape renders the current values as text.
    Router::new().route(
        "/metrics",
        get(move || {
            let handle = handle.clone();
            async move { handle.render() }
        }),
    )
}

// Recording metrics in business code
async fn handle_request() {
    // Counter: total number of requests
    counter!("http_requests_total", "method" => "GET", "path" => "/api").increment(1);

    // Histogram: request latency in seconds
    let start = std::time::Instant::now();
    // ... business logic ...
    let duration = start.elapsed();
    histogram!("http_request_duration_seconds").record(duration.as_secs_f64());

    // Gauge: current number of connections (42 is only a placeholder;
    // in practice, increment on connect and decrement on disconnect)
    gauge!("active_connections").set(42.0);
}
```
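Setting a Gauge to a fixed value such as `42.0` only makes sense if something else computed that value. For "current connections", a common pattern is an RAII guard that adds 1 when a connection starts and subtracts 1 when the guard is dropped, so early returns and unwinding panics still decrement it. The sketch below uses a plain `AtomicI64` so it runs with only the standard library; `ConnectionGuard` is a hypothetical name. With the `metrics` crate, the equivalent calls would be `gauge!("active_connections").increment(1.0)` in `new` and `.decrement(1.0)` in `drop`.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

/// Hypothetical guard: +1 on creation, -1 when dropped.
pub struct ConnectionGuard {
    gauge: Arc<AtomicI64>,
}

impl ConnectionGuard {
    pub fn new(gauge: Arc<AtomicI64>) -> Self {
        // Relaxed is enough: we only need the counter itself to be consistent.
        gauge.fetch_add(1, Ordering::Relaxed);
        ConnectionGuard { gauge }
    }
}

impl Drop for ConnectionGuard {
    fn drop(&mut self) {
        // Runs on every exit path of the owning scope, including unwinding.
        self.gauge.fetch_sub(1, Ordering::Relaxed);
    }
}
```

Holding one guard per connection for the connection's lifetime keeps the gauge equal to the number of live connections without any manual bookkeeping.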

Collecting HTTP metrics automatically with middleware

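```rust
use axum::{
    extract::{MatchedPath, Request},
    middleware::Next,
    response::Response,
};
use metrics::{counter, histogram};
use std::time::Instant;

// Attach with: router.route_layer(axum::middleware::from_fn(metrics_middleware))
async fn metrics_middleware(req: Request, next: Next) -> Response {
    let method = req.method().to_string();
    // Prefer the route template (e.g. "/users/:id") over the raw path
    // (e.g. "/users/42") so the number of label values stays bounded.
    let path = match req.extensions().get::<MatchedPath>() {
        Some(matched) => matched.as_str().to_owned(),
        None => req.uri().path().to_owned(),
    };
    let start = Instant::now();

    let response = next.run(req).await;

    let status = response.status().as_u16().to_string();
    let duration = start.elapsed().as_secs_f64();

    counter!("http_requests_total",
        "method" => method.clone(),
        "path" => path.clone(),
        "status" => status
    ).increment(1);

    histogram!("http_request_duration_seconds",
        "method" => method,
        "path" => path
    ).record(duration);

    response
}
```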
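Every distinct label combination becomes its own time series, so recording raw request paths such as `/users/1`, `/users/2`, ... can grow the series count without bound. When no route template is available, one mitigation is to collapse ID-like segments before using the path as a label. The helper below is a hypothetical sketch that assumes IDs are purely numeric; UUIDs or slugs would need their own rules.

```rust
/// Hypothetical helper: replace purely numeric path segments with ":id".
pub fn normalize_path(path: &str) -> String {
    path.split('/')
        .map(|seg| {
            // Treat a non-empty all-digit segment as an ID.
            let is_id = !seg.is_empty() && seg.chars().all(|c| c.is_ascii_digit());
            if is_id { ":id" } else { seg }
        })
        .collect::<Vec<_>>()
        .join("/")
}
```

For example, `/users/42/orders/7` becomes `/users/:id/orders/:id`, while `/api` is left unchanged.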

The four golden signals

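| Signal | Metric type | Meaning |
|---|---|---|
| Rate | Counter | Requests per second/minute (QPS/RPM) |
| Errors | Counter | Share of 4xx/5xx responses |
| Duration | Histogram | Latency percentiles P50/P95/P99 |
| Saturation | Gauge | Connection count, queue length |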
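Rate and error ratio are not stored directly. They are derived at query time from two or more scrapes of a monotonically increasing Counter. The std-only sketch below shows that arithmetic. `Sample`, `rate` and `error_ratio` are hypothetical names. The reset handling is simplified: if the value dropped, we assume the process restarted and the counter started again from 0. PromQL's `rate()` also extrapolates to the edges of the query window, which this sketch does not do.

```rust
/// One scrape of a counter: timestamp in seconds and cumulative value.
#[derive(Clone, Copy)]
pub struct Sample {
    pub t: f64,
    pub value: f64,
}

/// Per-second increase between two scrapes of a monotonic counter.
pub fn rate(prev: Sample, curr: Sample) -> f64 {
    let dt = curr.t - prev.t;
    if dt <= 0.0 {
        return 0.0;
    }
    // A drop means the counter was reset (e.g. restart), so count from 0.
    let increase = if curr.value >= prev.value {
        curr.value - prev.value
    } else {
        curr.value
    };
    increase / dt
}

/// Error ratio = rate(error counter) / rate(total request counter).
pub fn error_ratio(errors: (Sample, Sample), total: (Sample, Sample)) -> f64 {
    let total_rate = rate(total.0, total.1);
    if total_rate == 0.0 {
        0.0
    } else {
        rate(errors.0, errors.1) / total_rate
    }
}
```

For instance, if the total counter goes from 100 to 300 over 10 s (20 req/s) and the 5xx counter goes from 2 to 12 over the same window (1 req/s), the error ratio is 0.05, i.e. 5%.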

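Common interview questions

Q1: Counter vs Gauge vs Histogram?

Answer

  • Counter: monotonically increasing (it only resets to 0 when the process restarts), e.g. total requests, total errors
  • Gauge: can go up or down, e.g. current connections, memory usage
  • Histogram: records the distribution of observed values in buckets, e.g. P50/P95/P99 of request latency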
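To make the Histogram bullet concrete: a Prometheus-style histogram does not keep raw samples. It counts observations per bucket and keeps a running sum and count. Percentiles are then estimated from the bucket counts at query time (PromQL's `histogram_quantile` interpolates linearly inside a bucket). The sketch below mimics that with the standard library only; `BucketHistogram` is a hypothetical name and the bucket bounds in the example are chosen for illustration. Per the `metrics-exporter-prometheus` documentation, histograms are exported as summaries (client-side quantiles) unless buckets are configured, for example with `PrometheusBuilder::set_buckets`; check the version you use.

```rust
/// Hypothetical bucketed histogram in the style of a Prometheus histogram.
pub struct BucketHistogram {
    bounds: Vec<f64>, // sorted finite upper bounds ("le"); a +Inf bucket is implicit
    counts: Vec<u64>, // per-bucket counts, len = bounds.len() + 1 (last = +Inf)
    sum: f64,         // exported as <name>_sum
    count: u64,       // exported as <name>_count
}

impl BucketHistogram {
    pub fn new(bounds: Vec<f64>) -> Self {
        let n = bounds.len();
        BucketHistogram { bounds, counts: vec![0; n + 1], sum: 0.0, count: 0 }
    }

    /// Record one observation in the first bucket whose upper bound is >= v.
    pub fn observe(&mut self, v: f64) {
        let idx = self.bounds.iter().position(|&b| v <= b).unwrap_or(self.bounds.len());
        self.counts[idx] += 1;
        self.sum += v;
        self.count += 1;
    }

    /// Mean of all observations, i.e. sum / count.
    pub fn mean(&self) -> Option<f64> {
        if self.count == 0 { None } else { Some(self.sum / self.count as f64) }
    }

    /// Estimate the q-quantile by linear interpolation inside the bucket that
    /// contains the target rank, assuming the lowest bucket starts at 0.
    pub fn quantile(&self, q: f64) -> Option<f64> {
        if self.count == 0 || !(0.0..=1.0).contains(&q) {
            return None;
        }
        let rank = q * self.count as f64;
        let mut cumulative = 0u64;
        for (i, &c) in self.counts.iter().enumerate() {
            let below = cumulative;
            cumulative += c;
            if c > 0 && (cumulative as f64) >= rank {
                if i == self.bounds.len() {
                    // Rank falls in the +Inf bucket: report the largest finite bound.
                    return self.bounds.last().copied();
                }
                let lower = if i == 0 { 0.0 } else { self.bounds[i - 1] };
                let upper = self.bounds[i];
                let frac = (rank - below as f64) / c as f64;
                return Some(lower + (upper - lower) * frac);
            }
        }
        None
    }
}
```

With bounds 0.1/0.5/1.0 and observations 0.05, 0.2, 0.3, 0.4, 0.8, the estimated P50 is 0.3 and P99 is 0.975. The estimate depends only on which bucket the rank falls into, not on the actual values inside it, so bucket bounds are usually placed around the latency thresholds you care about.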
