设计聊天机器人
问题
如何用 Python 设计一个基于 LLM 的聊天机器人?如何管理对话上下文?
答案
架构
基础对话系统
chatbot/core.py
from openai import OpenAI
from dataclasses import dataclass, field
@dataclass
class Message:
role: str # "system" | "user" | "assistant"
content: str
class ChatBot:
def __init__(self, system_prompt: str, max_history: int = 20):
self.client = OpenAI()
self.max_history = max_history
self.history: list[Message] = [Message(role="system", content=system_prompt)]
def chat(self, user_input: str) -> str:
self.history.append(Message(role="user", content=user_input))
# 截断历史(保留 system prompt + 最近 N 条)
self._truncate_history()
response = self.client.chat.completions.create(
model="gpt-4o",
messages=[{"role": m.role, "content": m.content} for m in self.history],
temperature=0.7,
)
reply = response.choices[0].message.content
self.history.append(Message(role="assistant", content=reply))
return reply
def _truncate_history(self):
if len(self.history) > self.max_history + 1:
# 保留 system prompt + 最近 N 条
self.history = [self.history[0]] + self.history[-(self.max_history):]
Function Calling(工具调用)
chatbot/tools.py
import json
TOOLS = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "获取指定城市的天气",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string", "description": "城市名"},
},
"required": ["city"],
},
},
},
]
def get_weather(city: str) -> str:
# 调用天气 API
return f"{city}:晴,25°C"
TOOL_MAP = {"get_weather": get_weather}
class ToolChatBot(ChatBot):
def chat(self, user_input: str) -> str:
self.history.append(Message(role="user", content=user_input))
response = self.client.chat.completions.create(
model="gpt-4o",
messages=[{"role": m.role, "content": m.content} for m in self.history],
tools=TOOLS,
)
msg = response.choices[0].message
# 模型决定调用工具
if msg.tool_calls:
for call in msg.tool_calls:
fn = TOOL_MAP[call.function.name]
args = json.loads(call.function.arguments)
result = fn(**args)
# 工具结果回传给模型
self.history.append(Message(role="tool", content=result))
# 模型根据工具结果生成最终回复
final = self.client.chat.completions.create(
model="gpt-4o",
messages=[{"role": m.role, "content": m.content} for m in self.history],
)
reply = final.choices[0].message.content
else:
reply = msg.content
self.history.append(Message(role="assistant", content=reply))
return reply
RAG 增强
chatbot/rag.py
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
class RAGChatBot(ChatBot):
def __init__(self, system_prompt: str, knowledge_dir: str):
super().__init__(system_prompt)
self.vectorstore = Chroma(
embedding_function=OpenAIEmbeddings(),
persist_directory=knowledge_dir,
)
def chat(self, user_input: str) -> str:
# 检索相关文档
docs = self.vectorstore.similarity_search(user_input, k=3)
context = "\n".join(doc.page_content for doc in docs)
# 将检索结果注入 prompt
augmented_input = (
f"参考以下资料回答用户问题:\n\n{context}\n\n用户问题:{user_input}"
)
return super().chat(augmented_input)
流式输出
chatbot/stream.py
async def stream_chat(history: list[dict], user_input: str):
"""FastAPI SSE 流式聊天"""
history.append({"role": "user", "content": user_input})
stream = client.chat.completions.create(
model="gpt-4o",
messages=history,
stream=True,
)
full_reply = ""
for chunk in stream:
delta = chunk.choices[0].delta.content or ""
full_reply += delta
yield f"data: {json.dumps({'content': delta})}\n\n"
history.append({"role": "assistant", "content": full_reply})
常见面试问题
Q1: 如何解决上下文过长导致的 Token 超限?
答案:
- 滑动窗口:只保留最近 N 轮
- 摘要压缩:每隔几轮让 LLM 总结之前的对话
- 检索式上下文:将历史存入向量库,按相关性检索
- Token 计数:用 tiktoken 统计,超限时主动截断
Q2: 如何防止 Prompt 注入?
答案:
- 输入过滤:检测并移除可疑指令
- System Prompt 加固:明确拒绝指令修改请求
- 输入输出审核:用审核 API 检查敏感内容
- 权限隔离:工具调用的权限最小化
Q3: 多轮对话中如何追踪用户意图?
答案:
- Slot Filling:提取关键槽位(城市、日期),跨轮累积
- 对话状态机:显式管理对话阶段
- LLM 原生:依赖上下文和 Tool Calling 自动推理