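HTTP Clients
Question
Which HTTP client libraries are available in Python, and how do they differ?
Answer
requests (sync)
The most popular HTTP client, known for its simple, easy-to-use API: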
import requests

# GET request
resp = requests.get("https://api.example.com/users", params={"page": 1}, timeout=10)
resp.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
data = resp.json()

# POST request
resp = requests.post(
    "https://api.example.com/users",
    json={"name": "Alice"},
    headers={"Authorization": "Bearer token"},
    timeout=10,
)

# Reuse connections with a Session (recommended)
with requests.Session() as s:
    s.headers.update({"Authorization": "Bearer token"})
    s.get("https://api.example.com/users", timeout=10)
    s.get("https://api.example.com/posts", timeout=10)
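To make the `params` argument in the GET example more concrete, here is a minimal standard-library sketch of the query-string encoding it performs conceptually. `build_url` is a hypothetical helper introduced for illustration, not part of requests, and the `tag` parameter is an assumed example value.

```python
from urllib.parse import urlencode


def build_url(base: str, params: dict) -> str:
    """Hypothetical helper: append `params` to `base` as a query string."""
    # doseq=True expands list values into repeated keys (tag=a&tag=b)
    query = urlencode(params, doseq=True)
    return f"{base}?{query}" if query else base


url = build_url("https://api.example.com/users", {"page": 1, "tag": ["a", "b"]})
print(url)
```

In real code, passing `params=` to the client is preferable to building URLs by hand, since the library handles the escaping for you.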
httpx (sync + async)
httpx is a modern alternative to requests that supports both sync and async usage, as well as HTTP/2:
import asyncio
import httpx

# Synchronous use (the API closely mirrors requests)
resp = httpx.get("https://api.example.com/users")

async def main() -> None:
    # Asynchronous use
    async with httpx.AsyncClient(base_url="https://api.example.com") as client:
        resp = await client.get("/users")
        data = resp.json()

    # HTTP/2 support (requires: pip install "httpx[http2]")
    async with httpx.AsyncClient(http2=True) as client:
        resp = await client.get("https://api.example.com/users")

asyncio.run(main())
aiohttp (async)
A high-performance asynchronous HTTP client/server library:
import asyncio
import aiohttp

async def main() -> None:
    async with aiohttp.ClientSession() as session:
        async with session.get("https://api.example.com/users") as resp:
            data = await resp.json()

asyncio.run(main())
Comparison
| Feature | requests | httpx | aiohttp |
|---|---|---|---|
| Sync | ✅ | ✅ | ❌ |
| Async | ❌ | ✅ | ✅ |
| HTTP/2 | ❌ | ✅ | ❌ |
| Connection pooling | Session | Client | ClientSession |
| Streaming downloads | ✅ | ✅ | ✅ |
| API friendliness | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
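Common Interview Questions
Q1: How do you choose between requests and httpx?
Answer:
- requests: synchronous projects, scripts, simple crawlers
- httpx: projects that need async or HTTP/2, or that want a single API for both sync and async code
- aiohttp: purely asynchronous projects, high-concurrency crawlers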
Q2: How do you make concurrent requests?
Answer:
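import asyncio
import httpx

async def fetch_all(urls: list[str]) -> list[dict]:
    # Share one client so every request reuses the same connection pool
    async with httpx.AsyncClient() as client:
        tasks = [client.get(url) for url in urls]
        # gather runs the requests concurrently and preserves input order
        responses = await asyncio.gather(*tasks)
        return [r.json() for r in responses]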
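Launching every request at once can overwhelm a server when the URL list is long. The sketch below shows one common way to cap the number of in-flight requests with `asyncio.Semaphore` and to collect per-request failures instead of aborting on the first one. It uses only the standard library: `fake_fetch` is a hypothetical stand-in for `client.get(url)`, and `fetch_all_bounded` and the `limit` parameter are names assumed here for illustration.

```python
import asyncio


async def fake_fetch(url: str) -> dict:
    """Hypothetical stand-in for `client.get(url)`; no network involved."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    if "bad" in url:
        raise ValueError(f"simulated failure for {url}")
    return {"url": url}


async def fetch_all_bounded(urls: list[str], limit: int = 2) -> list:
    """Fetch all URLs with at most `limit` requests in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url: str):
        async with semaphore:  # waits here while `limit` requests are active
            return await fake_fetch(url)

    # return_exceptions=True places each exception in the result list
    # instead of raising on the first failure; order matches `urls`.
    return await asyncio.gather(*(guarded(u) for u in urls), return_exceptions=True)


results = asyncio.run(fetch_all_bounded(["/a", "/bad", "/c"]))
print(results)
```

With a real client, `fake_fetch` would be replaced by a call on a shared `httpx.AsyncClient`, and the caller would check each result with `isinstance(result, Exception)` before using it.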
Q3: How do you handle timeouts and retries?
Answer:
import httpx
from httpx import Timeout
from tenacity import retry, stop_after_attempt, wait_exponential

# httpx timeout configuration, in seconds per phase
timeout = Timeout(connect=5.0, read=10.0, write=5.0, pool=5.0)

# Retry with tenacity: up to 3 attempts with exponential backoff
@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
async def fetch_with_retry(url: str) -> dict:
    # Pass the timeout so each attempt is actually bounded by it
    async with httpx.AsyncClient(timeout=timeout) as client:
        resp = await client.get(url)
        resp.raise_for_status()
        return resp.json()
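To show what retrying with exponential backoff does under the hood, here is a hand-rolled, standard-library sketch. It illustrates the general idea only and does not claim to reproduce tenacity's exact wait schedule. `backoff_delays`, `retry_async`, and `flaky` are hypothetical names, and the tiny delays are chosen so the example finishes quickly.

```python
import asyncio


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 10.0) -> list[float]:
    """Delays between attempts: base * 2**n, capped at `cap`."""
    return [min(cap, base * 2**n) for n in range(attempts - 1)]


async def retry_async(func, attempts: int = 3, base: float = 1.0, cap: float = 10.0):
    """Call `func()` until it succeeds or `attempts` is exhausted."""
    delays = backoff_delays(attempts, base, cap)
    for attempt in range(attempts):
        try:
            return await func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the last error
            await asyncio.sleep(delays[attempt])


state = {"calls": 0}


async def flaky() -> str:
    """Hypothetical operation that fails twice, then succeeds."""
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


result = asyncio.run(retry_async(flaky, attempts=3, base=0.01, cap=0.05))
print(result, state["calls"])
```

Catching every `Exception` is a simplification. In practice it is usually safer to retry only transient failures such as timeouts, connection errors, or 5xx responses, and to let client errors like 404 fail immediately.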