
Designing a Chatbot

Question

How do you design an LLM-based chatbot in Python, and how do you manage conversation context?

Answer

Architecture

Basic Conversation System

chatbot/core.py

```python
from dataclasses import dataclass

from openai import OpenAI

@dataclass
class Message:
    role: str  # "system" | "user" | "assistant"
    content: str

class ChatBot:
    def __init__(self, system_prompt: str, max_history: int = 20):
        self.client = OpenAI()
        self.max_history = max_history
        self.history: list[Message] = [Message(role="system", content=system_prompt)]

    def chat(self, user_input: str) -> str:
        self.history.append(Message(role="user", content=user_input))
        # Truncate history (keep the system prompt + the most recent N messages)
        self._truncate_history()

        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": m.role, "content": m.content} for m in self.history],
            temperature=0.7,
        )
        reply = response.choices[0].message.content
        self.history.append(Message(role="assistant", content=reply))
        return reply

    def _truncate_history(self):
        if len(self.history) > self.max_history + 1:
            # Keep the system prompt plus the most recent N messages
            self.history = [self.history[0]] + self.history[-self.max_history:]
```

Function Calling (Tool Use)

chatbot/tools.py

```python
import json

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    },
]

def get_weather(city: str) -> str:
    # Call a real weather API here; stubbed for the example
    return f"{city}: sunny, 25°C"

TOOL_MAP = {"get_weather": get_weather}
```

```python
class ToolChatBot(ChatBot):
    def chat(self, user_input: str) -> str:
        self.history.append(Message(role="user", content=user_input))

        messages = [{"role": m.role, "content": m.content} for m in self.history]
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=TOOLS,
        )
        msg = response.choices[0].message

        # The model decided to call one or more tools
        if msg.tool_calls:
            # The assistant message carrying tool_calls must precede the tool results
            messages.append(msg)
            for call in msg.tool_calls:
                fn = TOOL_MAP[call.function.name]
                args = json.loads(call.function.arguments)
                result = fn(**args)
                # Feed each tool result back to the model, keyed by tool_call_id
                messages.append({
                    "role": "tool",
                    "tool_call_id": call.id,
                    "content": result,
                })

            # The model generates the final reply from the tool results
            final = self.client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
            reply = final.choices[0].message.content
        else:
            reply = msg.content

        self.history.append(Message(role="assistant", content=reply))
        return reply
```

RAG Augmentation

chatbot/rag.py

```python
# Current package layout; older code imported these from langchain directly
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

class RAGChatBot(ChatBot):
    def __init__(self, system_prompt: str, knowledge_dir: str):
        super().__init__(system_prompt)
        self.vectorstore = Chroma(
            embedding_function=OpenAIEmbeddings(),
            persist_directory=knowledge_dir,
        )

    def chat(self, user_input: str) -> str:
        # Retrieve relevant documents
        docs = self.vectorstore.similarity_search(user_input, k=3)
        context = "\n".join(doc.page_content for doc in docs)

        # Inject the retrieved passages into the prompt
        augmented_input = (
            f"Answer the user's question using the reference material below.\n\n"
            f"{context}\n\nUser question: {user_input}"
        )
        return super().chat(augmented_input)
```

Streaming Output

chatbot/stream.py

```python
import json

from openai import AsyncOpenAI

client = AsyncOpenAI()

async def stream_chat(history: list[dict], user_input: str):
    """Streaming chat as a FastAPI SSE generator"""
    history.append({"role": "user", "content": user_input})

    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=history,
        stream=True,
    )
    full_reply = ""
    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        full_reply += delta
        yield f"data: {json.dumps({'content': delta})}\n\n"

    history.append({"role": "assistant", "content": full_reply})
```

Common Interview Questions

Q1: How do you avoid exceeding the token limit when the context grows too long?

Answer

  1. Sliding window: keep only the most recent N turns
  2. Summary compression: every few turns, have the LLM summarize the earlier conversation
  3. Retrieval-based context: store history in a vector database and retrieve by relevance
  4. Token counting: count with tiktoken and truncate proactively when over budget
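Strategies 1 and 4 above can be combined into a token-budget truncation sketch. The `count_tokens` helper here is a rough character-based stand-in for illustration; a real implementation would count with tiktoken (e.g. `tiktoken.encoding_for_model("gpt-4o")`):

```python
def count_tokens(text: str) -> int:
    # Rough approximation for illustration only; use tiktoken in production
    return max(1, len(text) // 4)

def truncate_by_tokens(history: list[dict], budget: int = 3000) -> list[dict]:
    """Keep the system prompt plus as many recent messages as fit the budget."""
    system, rest = history[0], history[1:]
    kept: list[dict] = []
    used = count_tokens(system["content"])
    # Walk backwards from the most recent message, stopping at the budget
    for msg in reversed(rest):
        cost = count_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))
```

The backwards walk guarantees the newest messages survive; anything older than the budget allows is dropped, while the system prompt is always retained.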

Q2: How do you defend against prompt injection?

Answer

  • Input filtering: detect and remove suspicious instructions
  • Hardened system prompt: explicitly refuse requests to change instructions
  • Input/output moderation: screen for sensitive content with a moderation API
  • Privilege isolation: grant tool calls only the minimum permissions they need
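A minimal version of the input-filtering layer might look like the sketch below. The pattern list is illustrative, not exhaustive; in practice this heuristic layer is combined with a moderation API rather than relied on alone:

```python
import re

# Illustrative patterns only; real filters pair heuristics with a moderation API
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?(previous|above) instructions", re.IGNORECASE),
    re.compile(r"you are now", re.IGNORECASE),
    re.compile(r"reveal .*system prompt", re.IGNORECASE),
]

def looks_like_injection(user_input: str) -> bool:
    return any(p.search(user_input) for p in SUSPICIOUS_PATTERNS)

def sanitize(user_input: str) -> str:
    # Refuse rather than silently rewrite, so rejections stay auditable
    if looks_like_injection(user_input):
        raise ValueError("Input rejected: possible prompt injection")
    return user_input
```

Rejecting outright (instead of stripping the suspicious span) keeps behavior predictable and leaves an audit trail of blocked inputs.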

Q3: How do you track user intent across a multi-turn conversation?

Answer

  • Slot filling: extract key slots (city, date) and accumulate them across turns
  • Dialogue state machine: manage conversation stages explicitly
  • LLM-native: rely on context and tool calling for implicit inference
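The slot-filling approach can be sketched with a plain dict that accumulates extracted slots across turns. The regex extractors are toy stand-ins; production systems would use an NER model or the LLM itself for extraction:

```python
import re

# Toy extractors for illustration; real systems use NER or LLM-based extraction
SLOT_EXTRACTORS = {
    "city": re.compile(r"in ([A-Z][a-z]+)"),
    "date": re.compile(r"(today|tomorrow|\d{4}-\d{2}-\d{2})", re.IGNORECASE),
}

REQUIRED_SLOTS = ("city", "date")

def update_slots(slots: dict, user_input: str) -> dict:
    """Merge any slots found in this turn into the accumulated state."""
    for name, pattern in SLOT_EXTRACTORS.items():
        m = pattern.search(user_input)
        if m:
            slots[name] = m.group(1)
    return slots

def missing_slots(slots: dict) -> list[str]:
    return [s for s in REQUIRED_SLOTS if s not in slots]
```

Each turn only needs to supply the slots still missing, so a user can say "What's the weather in Paris?" on one turn and "tomorrow" on the next; once `missing_slots` is empty, the bot has enough to call the tool.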

Related Links