Designing a Monitoring System
Question
How would you design a system in Rust that collects and exposes monitoring metrics?
Answer
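Architecture
Business code records metrics through the `metrics` crate's macros (`counter!`, `gauge!`, `histogram!`). A global recorder installed at startup (here, the one from `metrics_exporter_prometheus`) aggregates the values in memory. A `/metrics` HTTP endpoint then renders them in the Prometheus text format so that a Prometheus server can scrape them.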
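This split between instrumentation and export is what lets business code call a macro without knowing anything about Prometheus. The std-only toy below illustrates the facade/recorder pattern under simplifying assumptions. `Recorder`, `set_recorder`, `increment_counter` and `InMemoryRecorder` are hypothetical names. The real `metrics` crate's `Recorder` trait is richer: it handles labeled keys, gauges, histograms and metric descriptions.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical backend interface that an exporter would implement
/// (a much-simplified stand-in for the `metrics` crate's `Recorder`).
trait Recorder: Send + Sync {
    fn increment_counter(&self, name: &str, value: u64);
}

/// The single global recorder; instrumentation code never names the concrete type.
static RECORDER: OnceLock<&'static dyn Recorder> = OnceLock::new();

/// Install the global recorder once, typically at the start of `main`.
fn set_recorder(recorder: &'static dyn Recorder) -> Result<(), &'static str> {
    RECORDER.set(recorder).map_err(|_| "recorder already installed")
}

/// Facade called from business code; a no-op until a recorder is installed.
fn increment_counter(name: &str, value: u64) {
    if let Some(recorder) = RECORDER.get() {
        recorder.increment_counter(name, value);
    }
}

/// Toy in-memory backend standing in for a real exporter.
#[derive(Default)]
struct InMemoryRecorder {
    counters: Mutex<HashMap<String, u64>>,
}

impl InMemoryRecorder {
    fn get(&self, name: &str) -> u64 {
        self.counters.lock().unwrap().get(name).copied().unwrap_or(0)
    }
}

impl Recorder for InMemoryRecorder {
    fn increment_counter(&self, name: &str, value: u64) {
        *self.counters.lock().unwrap().entry(name.to_string()).or_insert(0) += value;
    }
}

fn main() {
    increment_counter("http_requests_total", 1); // dropped: no recorder installed yet
    // Leak the recorder so it lives for 'static, as a global recorder must.
    let recorder: &'static InMemoryRecorder = Box::leak(Box::new(InMemoryRecorder::default()));
    set_recorder(recorder).unwrap();
    increment_counter("http_requests_total", 1);
    increment_counter("http_requests_total", 1);
    println!("{}", recorder.get("http_requests_total")); // prints 2
}
```

Calls made before a recorder is installed are silently dropped. For that reason, the recorder should be installed as early as possible in `main`.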
Implementation based on the `metrics` crate
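```rust
use axum::{routing::get, Router};
use metrics::{counter, gauge, histogram};
use metrics_exporter_prometheus::PrometheusBuilder;

fn setup_metrics() -> Router {
    // Install the Prometheus exporter as the global `metrics` recorder.
    // `install_recorder` does not start an HTTP listener; /metrics is exposed below.
    let handle = PrometheusBuilder::new()
        // Without explicit buckets this exporter renders histograms as summaries;
        // setting buckets produces real Prometheus histograms.
        .set_buckets(&[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0])
        .expect("invalid histogram buckets")
        .install_recorder()
        .expect("failed to install recorder");

    // Serve the current metrics in Prometheus text format.
    Router::new().route(
        "/metrics",
        get(move || {
            let handle = handle.clone();
            async move { handle.render() }
        }),
    )
}

// Recording metrics in business code
async fn handle_request() {
    // Counter: total number of requests
    counter!("http_requests_total", "method" => "GET", "path" => "/api").increment(1);

    // Gauge: requests currently in flight (raised on entry, lowered on exit)
    gauge!("active_connections").increment(1.0);

    // Histogram: request latency in seconds
    let start = std::time::Instant::now();
    // ... business logic ...
    histogram!("http_request_duration_seconds").record(start.elapsed().as_secs_f64());

    gauge!("active_connections").decrement(1.0);
}
```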
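A gauge such as `active_connections` is usually raised when work starts and lowered when the work ends. The next sketch ties the decrement to `Drop`, so early returns and `?` cannot leave the count too high. It is std-only and assumes an `AtomicI64` stands in for the gauge; `InFlightGuard` and `handle_request` are hypothetical names. With the `metrics` crate, the same guard would call `gauge!(...).increment(1.0)` in its constructor and `.decrement(1.0)` in `drop`.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical RAII guard: increments an in-flight gauge on creation and
/// decrements it when dropped, on every exit path.
struct InFlightGuard<'a> {
    gauge: &'a AtomicI64,
}

impl<'a> InFlightGuard<'a> {
    fn new(gauge: &'a AtomicI64) -> Self {
        gauge.fetch_add(1, Ordering::Relaxed);
        InFlightGuard { gauge }
    }
}

impl Drop for InFlightGuard<'_> {
    fn drop(&mut self) {
        self.gauge.fetch_sub(1, Ordering::Relaxed);
    }
}

/// Stand-in for a request handler that may bail out early.
fn handle_request(in_flight: &AtomicI64, fail: bool) -> Result<(), String> {
    let _guard = InFlightGuard::new(in_flight);
    if fail {
        return Err("upstream error".to_string()); // the guard still decrements here
    }
    Ok(())
}

fn main() {
    let in_flight = AtomicI64::new(0);
    let _ = handle_request(&in_flight, false);
    let _ = handle_request(&in_flight, true);
    println!("in flight: {}", in_flight.load(Ordering::Relaxed)); // prints 0
}
```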
Collecting HTTP metrics automatically with middleware
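```rust
use axum::{
    extract::{MatchedPath, Request},
    middleware::Next,
    response::Response,
};
use metrics::{counter, histogram};
use std::time::Instant;

// Register with `router.route_layer(axum::middleware::from_fn(metrics_middleware))`
// so it only runs for requests that matched a route.
async fn metrics_middleware(req: Request, next: Next) -> Response {
    let method = req.method().to_string();
    // Label with the matched route template, not the raw URI path:
    // IDs embedded in URLs would otherwise create one time series per value.
    let path = req
        .extensions()
        .get::<MatchedPath>()
        .map(|p| p.as_str().to_owned())
        .unwrap_or_else(|| "unmatched".to_owned());
    let start = Instant::now();

    let response = next.run(req).await;

    let status = response.status().as_u16().to_string();
    let duration = start.elapsed().as_secs_f64();

    counter!("http_requests_total",
        "method" => method.clone(),
        "path" => path.clone(),
        "status" => status
    ).increment(1);
    histogram!("http_request_duration_seconds",
        "method" => method,
        "path" => path
    ).record(duration);

    response
}
```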
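Each distinct combination of label values becomes its own time series, which is why the `path` label should hold a route template rather than the raw URI. The std-only sketch below uses a hypothetical `RequestCounter` type: a counter family keyed by `(method, path, status)`, rendered roughly the way a Prometheus exporter writes the text exposition format. Label-value escaping and `# HELP` lines are omitted.

```rust
use std::collections::BTreeMap;

/// Hypothetical counter family keyed by (method, path, status).
/// BTreeMap keeps the rendered output in a deterministic order.
#[derive(Default)]
struct RequestCounter {
    series: BTreeMap<(String, String, String), u64>,
}

impl RequestCounter {
    fn increment(&mut self, method: &str, path: &str, status: u16) {
        *self
            .series
            .entry((method.to_string(), path.to_string(), status.to_string()))
            .or_insert(0) += 1;
    }

    /// One `# TYPE` line, then one sample line per label combination.
    fn render(&self) -> String {
        let mut out = String::from("# TYPE http_requests_total counter\n");
        for ((method, path, status), value) in &self.series {
            out.push_str(&format!(
                "http_requests_total{{method=\"{method}\",path=\"{path}\",status=\"{status}\"}} {value}\n"
            ));
        }
        out
    }
}

fn main() {
    let mut requests = RequestCounter::default();
    requests.increment("GET", "/users/:id", 200);
    requests.increment("GET", "/users/:id", 200);
    requests.increment("POST", "/users", 500);
    print!("{}", requests.render());
}
```

Here three requests produce two series. With a raw path such as `/users/42`, every user ID would add a new series.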
The four golden signals
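| Signal | Metric type | What it measures |
|---|---|---|
| Traffic (Rate) | Counter | Requests per second or minute (QPS/RPM), derived from the counter's rate of increase |
| Errors | Counter | Share of requests that fail (4xx/5xx responses) |
| Latency (Duration) | Histogram | Latency percentiles such as P50/P95/P99 |
| Saturation | Gauge | Connection count, queue length |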
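For a histogram, Prometheus does not store P95 directly. It stores cumulative bucket counters (`_bucket{le="..."}`), and `histogram_quantile` estimates a percentile by finding the bucket that contains the target rank and interpolating linearly inside it. Below is a simplified std-only sketch of that estimate. It assumes buckets are passed as `(upper_bound, cumulative_count)` pairs ending with `+Inf`. `estimate_quantile` is a hypothetical name, and Prometheus itself handles more edge cases.

```rust
/// Estimate the q-quantile (0.0..=1.0) from cumulative histogram buckets.
/// `buckets`: (upper_bound, cumulative_count) sorted by bound, ending with +Inf.
fn estimate_quantile(q: f64, buckets: &[(f64, u64)]) -> Option<f64> {
    if !(0.0..=1.0).contains(&q) {
        return None;
    }
    let total = buckets.last()?.1;
    if total == 0 {
        return None;
    }
    let rank = q * total as f64; // how many observations lie at or below the answer
    let mut prev_bound = 0.0;
    let mut prev_count = 0u64;
    for &(bound, count) in buckets {
        if count as f64 >= rank {
            if bound.is_infinite() {
                // Cannot interpolate into +Inf: report the highest finite bound.
                return Some(prev_bound);
            }
            let in_bucket = (count - prev_count) as f64;
            if in_bucket == 0.0 {
                return Some(prev_bound);
            }
            // Assume observations are spread evenly inside the bucket.
            let fraction = (rank - prev_count as f64) / in_bucket;
            return Some(prev_bound + (bound - prev_bound) * fraction);
        }
        prev_bound = bound;
        prev_count = count;
    }
    None
}

fn main() {
    // Hypothetical latency buckets in seconds, 100 requests in total.
    let buckets: [(f64, u64); 5] =
        [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 100), (f64::INFINITY, 100)];
    for q in [0.5, 0.95, 0.99] {
        println!("P{:.0}: {:?}", q * 100.0, estimate_quantile(q, &buckets));
    }
}
```

The estimate is only as precise as the bucket layout. Bucket boundaries should therefore bracket the latency targets you care about.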
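Common interview questions
Q1: Counter vs Gauge vs Histogram?
Answer:
- Counter: only goes up (it resets to zero only when the process restarts), e.g. total requests or total errors. It is usually queried as a rate, such as PromQL `rate()`.
- Gauge: can go up or down, e.g. current connection count or memory usage.
- Histogram: counts observations into buckets and tracks their sum and count, e.g. request latency. Percentiles such as P50/P95/P99 are estimated from the buckets.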
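The semantic difference between the three types is easiest to see in a toy implementation. This std-only sketch uses hypothetical `Counter`, `Gauge` and `Histogram` types, unrelated to the handle types of the same names in the `metrics` crate:
- The counter exposes only `increment`.
- The gauge stores an `f64` as raw bits, so it can be set or adjusted atomically.
- The histogram keeps per-bucket counts plus a sum and a count.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Counter: monotonically increasing; there is no way to decrease it.
#[derive(Default)]
struct Counter(AtomicU64);

impl Counter {
    fn increment(&self, n: u64) { self.0.fetch_add(n, Ordering::Relaxed); }
    fn get(&self) -> u64 { self.0.load(Ordering::Relaxed) }
}

/// Gauge: an f64 stored as raw bits, so it can be set, raised or lowered.
#[derive(Default)]
struct Gauge(AtomicU64); // 0 bits == 0.0

impl Gauge {
    fn set(&self, v: f64) { self.0.store(v.to_bits(), Ordering::Relaxed); }
    fn add(&self, delta: f64) {
        // Compare-and-swap loop, since std has no atomic f64 addition.
        let _ = self.0.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |bits| {
            Some((f64::from_bits(bits) + delta).to_bits())
        });
    }
    fn get(&self) -> f64 { f64::from_bits(self.0.load(Ordering::Relaxed)) }
}

/// Histogram: per-bucket counts plus running sum and count, the data a
/// Prometheus histogram exposes as `_bucket{le=...}`, `_sum` and `_count`.
struct Histogram {
    bounds: Vec<f64>, // ascending upper bounds; the +Inf bucket is implicit
    state: Mutex<HistogramState>,
}

struct HistogramState {
    counts: Vec<u64>, // non-cumulative count per finite bucket
    sum: f64,
    count: u64,
}

impl Histogram {
    fn new(bounds: Vec<f64>) -> Self {
        let counts = vec![0; bounds.len()];
        Histogram { bounds, state: Mutex::new(HistogramState { counts, sum: 0.0, count: 0 }) }
    }

    fn record(&self, value: f64) {
        let mut s = self.state.lock().unwrap();
        // First bucket whose upper bound is >= value; larger values land only in +Inf.
        if let Some(i) = self.bounds.iter().position(|&b| value <= b) {
            s.counts[i] += 1;
        }
        s.sum += value;
        s.count += 1;
    }

    fn sum(&self) -> f64 { self.state.lock().unwrap().sum }

    /// Cumulative (le, count) pairs ending with +Inf, as Prometheus exposes them.
    fn cumulative(&self) -> Vec<(f64, u64)> {
        let s = self.state.lock().unwrap();
        let mut running: u64 = 0;
        let mut out: Vec<(f64, u64)> = self
            .bounds
            .iter()
            .zip(&s.counts)
            .map(|(&bound, &c)| {
                running += c;
                (bound, running)
            })
            .collect();
        out.push((f64::INFINITY, s.count));
        out
    }
}

fn main() {
    let requests = Counter::default();
    let connections = Gauge::default();
    let latency = Histogram::new(vec![0.1, 0.5, 1.0]);
    requests.increment(1);
    connections.add(1.0);
    latency.record(0.3);
    connections.add(-1.0);
    println!("{} {} {:?}", requests.get(), connections.get(), latency.cumulative());
}
```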