
Designing a Chatbot

Question

How do you design an LLM-based chatbot in Python, and how do you manage conversation context?

Answer

Architecture

Basic Conversation System

chatbot/core.py

```python
from dataclasses import dataclass

from openai import OpenAI

@dataclass
class Message:
    role: str  # "system" | "user" | "assistant"
    content: str

class ChatBot:
    def __init__(self, system_prompt: str, max_history: int = 20):
        self.client = OpenAI()
        self.max_history = max_history
        self.history: list[Message] = [Message(role="system", content=system_prompt)]

    def chat(self, user_input: str) -> str:
        self.history.append(Message(role="user", content=user_input))
        # Truncate history (keep the system prompt + the most recent N messages)
        self._truncate_history()

        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": m.role, "content": m.content} for m in self.history],
            temperature=0.7,
        )
        reply = response.choices[0].message.content
        self.history.append(Message(role="assistant", content=reply))
        return reply

    def _truncate_history(self):
        if len(self.history) > self.max_history + 1:
            # Keep the system prompt plus the most recent N messages
            self.history = [self.history[0]] + self.history[-self.max_history:]
```

Function Calling (Tool Use)

chatbot/tools.py

```python
import json

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    },
]

def get_weather(city: str) -> str:
    # Call a real weather API here; stubbed for the example
    return f"{city}: sunny, 25°C"

TOOL_MAP = {"get_weather": get_weather}
```

```python
class ToolChatBot(ChatBot):
    def chat(self, user_input: str) -> str:
        self.history.append(Message(role="user", content=user_input))

        messages = [{"role": m.role, "content": m.content} for m in self.history]
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=TOOLS,
        )
        msg = response.choices[0].message

        # The model decided to call one or more tools
        if msg.tool_calls:
            # The assistant message carrying tool_calls must precede the tool results
            messages.append(msg)
            for call in msg.tool_calls:
                fn = TOOL_MAP[call.function.name]
                args = json.loads(call.function.arguments)
                result = fn(**args)
                # Feed each tool result back to the model, keyed by tool_call_id
                messages.append({
                    "role": "tool",
                    "tool_call_id": call.id,
                    "content": result,
                })

            # The model generates the final reply from the tool results
            final = self.client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
            reply = final.choices[0].message.content
        else:
            reply = msg.content

        self.history.append(Message(role="assistant", content=reply))
        return reply
```

RAG Augmentation

chatbot/rag.py

```python
# Current package layout; older code imported these from langchain directly
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

class RAGChatBot(ChatBot):
    def __init__(self, system_prompt: str, knowledge_dir: str):
        super().__init__(system_prompt)
        self.vectorstore = Chroma(
            embedding_function=OpenAIEmbeddings(),
            persist_directory=knowledge_dir,
        )

    def chat(self, user_input: str) -> str:
        # Retrieve relevant documents
        docs = self.vectorstore.similarity_search(user_input, k=3)
        context = "\n".join(doc.page_content for doc in docs)

        # Inject the retrieved passages into the prompt
        augmented_input = (
            f"Answer the user's question using the reference material below.\n\n"
            f"{context}\n\nUser question: {user_input}"
        )
        return super().chat(augmented_input)
```

Streaming Output

chatbot/stream.py

```python
import json

from openai import AsyncOpenAI

client = AsyncOpenAI()

async def stream_chat(history: list[dict], user_input: str):
    """Streaming chat as a FastAPI SSE generator"""
    history.append({"role": "user", "content": user_input})

    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=history,
        stream=True,
    )
    full_reply = ""
    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        full_reply += delta
        yield f"data: {json.dumps({'content': delta})}\n\n"

    history.append({"role": "assistant", "content": full_reply})
```

Common Interview Questions

Q1: How do you avoid exceeding the token limit when the context grows too long?

Answer

  1. Sliding window: keep only the most recent N turns
  2. Summary compression: every few turns, have the LLM summarize the earlier conversation
  3. Retrieval-based context: store history in a vector database and retrieve by relevance
  4. Token counting: count with tiktoken and truncate proactively when over budget
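Strategies 1 and 4 above can be combined into a token-budget truncation sketch. The `count_tokens` helper here is a rough character-based stand-in for illustration; a real implementation would count with tiktoken (e.g. `tiktoken.encoding_for_model("gpt-4o")`):

```python
def count_tokens(text: str) -> int:
    # Rough approximation for illustration only; use tiktoken in production
    return max(1, len(text) // 4)

def truncate_by_tokens(history: list[dict], budget: int = 3000) -> list[dict]:
    """Keep the system prompt plus as many recent messages as fit the budget."""
    system, rest = history[0], history[1:]
    kept: list[dict] = []
    used = count_tokens(system["content"])
    # Walk backwards from the most recent message, stopping at the budget
    for msg in reversed(rest):
        cost = count_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))
```

The backwards walk guarantees the newest messages survive; anything older than the budget allows is dropped, while the system prompt is always retained.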

Q2: How do you defend against prompt injection?

Answer

  • Input filtering: detect and remove suspicious instructions
  • Hardened system prompt: explicitly refuse requests to change instructions
  • Input/output moderation: screen for sensitive content with a moderation API
  • Privilege isolation: grant tool calls only the minimum permissions they need
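A minimal version of the input-filtering layer might look like the sketch below. The pattern list is illustrative, not exhaustive; in practice this heuristic layer is combined with a moderation API rather than relied on alone:

```python
import re

# Illustrative patterns only; real filters pair heuristics with a moderation API
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?(previous|above) instructions", re.IGNORECASE),
    re.compile(r"you are now", re.IGNORECASE),
    re.compile(r"reveal .*system prompt", re.IGNORECASE),
]

def looks_like_injection(user_input: str) -> bool:
    return any(p.search(user_input) for p in SUSPICIOUS_PATTERNS)

def sanitize(user_input: str) -> str:
    # Refuse rather than silently rewrite, so rejections stay auditable
    if looks_like_injection(user_input):
        raise ValueError("Input rejected: possible prompt injection")
    return user_input
```

Rejecting outright (instead of stripping the suspicious span) keeps behavior predictable and leaves an audit trail of blocked inputs.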

Q3: How do you track user intent across a multi-turn conversation?

Answer

  • Slot filling: extract key slots (city, date) and accumulate them across turns
  • Dialogue state machine: manage conversation stages explicitly
  • LLM-native: rely on context and tool calling for implicit inference
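The slot-filling approach can be sketched with a plain dict that accumulates extracted slots across turns. The regex extractors are toy stand-ins; production systems would use an NER model or the LLM itself for extraction:

```python
import re

# Toy extractors for illustration; real systems use NER or LLM-based extraction
SLOT_EXTRACTORS = {
    "city": re.compile(r"in ([A-Z][a-z]+)"),
    "date": re.compile(r"(today|tomorrow|\d{4}-\d{2}-\d{2})", re.IGNORECASE),
}

REQUIRED_SLOTS = ("city", "date")

def update_slots(slots: dict, user_input: str) -> dict:
    """Merge any slots found in this turn into the accumulated state."""
    for name, pattern in SLOT_EXTRACTORS.items():
        m = pattern.search(user_input)
        if m:
            slots[name] = m.group(1)
    return slots

def missing_slots(slots: dict) -> list[str]:
    return [s for s in REQUIRED_SLOTS if s not in slots]
```

Each turn only needs to supply the slots still missing, so a user can say "What's the weather in Paris?" on one turn and "tomorrow" on the next; once `missing_slots` is empty, the bot has enough to call the tool.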

Related Links