
The Hugging Face Ecosystem

Question

What are the core features of the Hugging Face Transformers library, and how do you run inference and fine-tuning with it?

Answer

Pipeline (quick inference)

from transformers import pipeline

# Text classification
classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
result = classifier("I love this movie!")
# [{'label': 'POSITIVE', 'score': 0.9998}]

# Text generation
generator = pipeline("text-generation", model="gpt2")
output = generator("Python is", max_length=50)

# Question answering
qa = pipeline("question-answering")
result = qa(question="What is Python?", context="Python is a programming language.")

Tokenizer + Model (fine-grained control)

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=3)

# Encode
inputs = tokenizer("Python 是最好的编程语言", return_tensors="pt", padding=True, truncation=True)

# Inference
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
predicted = torch.argmax(probs, dim=-1)
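
The softmax/argmax post-processing above can be illustrated without torch; a plain-Python sketch with hypothetical 3-class logits:

```python
import math

def softmax(logits):
    # subtract the max for numerical stability, then normalize the exponentials
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 0.5, -1.0]            # hypothetical model output for one input
probs = softmax(logits)              # sums to 1.0
predicted = probs.index(max(probs))  # index of the most likely class: 0
```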

Fine-tuning

from transformers import Trainer, TrainingArguments
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("imdb")

# Preprocess
def tokenize(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=512)

tokenized = dataset.map(tokenize, batched=True)

# Training configuration
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    fp16=True,  # mixed precision
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)

trainer.train()
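
Rough training-length arithmetic for the configuration above (IMDB's train split has 25,000 examples; assuming a single GPU and no gradient accumulation):

```python
import math

num_examples = 25_000   # IMDB train split size
per_device_batch = 16   # per_device_train_batch_size above
epochs = 3              # num_train_epochs above

steps_per_epoch = math.ceil(num_examples / per_device_batch)  # 1563
total_steps = steps_per_epoch * epochs                        # 4689
```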

The Datasets Library

from datasets import load_dataset, Dataset

# Load a dataset from the Hugging Face Hub
ds = load_dataset("squad", split="train[:1000]")

# Create from local data (df is an existing pandas DataFrame)
ds = Dataset.from_pandas(df)
ds = Dataset.from_csv("data.csv")

# Processing (map results are cached; num_proc enables multiprocessing)
ds = ds.map(preprocess, batched=True, num_proc=4)
ds = ds.filter(lambda x: len(x["text"]) > 100)
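
A minimal plain-Python illustration of the batched-map contract (no datasets dependency; `add_length` is a hypothetical preprocessing function):

```python
def add_length(batch):
    # With batched=True, map passes a dict of column -> list of values
    # and expects a dict of equally long (possibly new) columns back.
    return {"length": [len(t) for t in batch["text"]]}

batch = {"text": ["short", "a much longer example"]}
out = add_length(batch)  # {'length': [5, 21]}
```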

Common Interview Questions

Q1: What is the Hugging Face Hub?

Answer

The Hugging Face Hub is a hosting platform for models and datasets (think "GitHub for ML"):

  • Models: 100,000+ pretrained models
  • Datasets: tens of thousands of public datasets
  • Spaces: hosted ML demos (Gradio/Streamlit)

Q2: How do you quantize a model to reduce GPU memory?

Answer

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization (as used in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

4-bit quantization cuts the memory footprint of a 7B model from 28 GB (fp32) to roughly 4 GB.
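
The figure above follows from simple bytes-per-parameter arithmetic (ignoring quantization constants and activation memory):

```python
params = 7_000_000_000  # 7B parameters

fp32_gb = params * 4 / 1e9    # 28.0 GB at 4 bytes/param
fp16_gb = params * 2 / 1e9    # 14.0 GB at 2 bytes/param
int4_gb = params * 0.5 / 1e9  #  3.5 GB at 4 bits/param
```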

Q3: How does LoRA fine-tuning work?

Answer

LoRA (Low-Rank Adaptation) freezes the original weights and trains only a low-rank decomposition of the update:

  • The original weight $W \in \mathbb{R}^{d \times k}$ stays frozen
  • An update $\Delta W = BA$ is added, where $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$, and $r \ll d$
  • Only $A$ and $B$ are trained, cutting the trainable parameter count drastically

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.06
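
The savings can be checked with back-of-the-envelope arithmetic for a single 4096×4096 projection (the size of Llama-2-7B's q_proj/v_proj) at rank r=16:

```python
d, k, r = 4096, 4096, 16   # projection dimensions and LoRA rank

full_params = d * k        # 16,777,216 trainable params if W were unfrozen
lora_params = r * (d + k)  # 131,072: B is d x r, A is r x k
ratio = lora_params / full_params  # 0.0078125, i.e. under 1% per adapted matrix
```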
