LoRA学习
LoRA概念
-
论文:[2106.09685] LoRA: Low-Rank Adaptation of Large Language Models
-
LoRA: Low Rank Adaptation,低秩适应,一种高效的微调技术,在原有LLM基础上额外增加少量可训练的参数,而非对整个LLM进行训练。训练速度更快,适合资源有限场景。
-
lora_rank: 简写成r,低秩,一般很小,r <= 32。将原有矩阵参数,转换成$W = W_0 + \Delta W$,其中$\Delta W=A \times B$。举例说明权重
[2048, 1024] + [2048, 32] x [32, 1024]
,其中32就是lora_rank
,W的参数量2M,$\Delta W$的参数量96K。也就是LoRA只用训练96K即可。 -
lora_alpha: 是LoRA的缩放系数,起到调整更新 $\Delta W$ 幅度的作用,这时$\Delta{W} = \frac{lora_alpha}{r}A\times B$
-
前向推导公式:$ h = W_0x + \Delta Wx = W_0x + \frac{lora_alpha}{r}BAx $
LoRALinear
来源:Fine-Tuning Llama2 with LoRA — torchtune 0.4 documentation
从以下简化的源码可以看出LoRA的计算逻辑:
import torch
from torch import nn
class LoRALinear(nn.Module):
def __init__(self,in_dim: int,out_dim: int,rank: int,alpha: float,dropout: float):
# These are the weights from the original pretrained model
self.linear = nn.Linear(in_dim, out_dim, bias=False)
# These are the new LoRA params. In general rank << in_dim, out_dim
self.lora_a = nn.Linear(in_dim, rank, bias=False)
self.lora_b = nn.Linear(rank, out_dim, bias=False)
# Rank and alpha are commonly-tuned hyperparameters
self.rank = rank
self.alpha = alpha
# Most implementations also include some dropout
self.dropout = nn.Dropout(p=dropout)
# The original params are frozen, and only LoRA params are trainable.
self.linear.weight.requires_grad = False
self.lora_a.weight.requires_grad = True
self.lora_b.weight.requires_grad = True
def forward(self, x: torch.Tensor) -> torch.Tensor:
# This would be the output of the original model
frozen_out = self.linear(x)
# lora_a projects inputs down to the much smaller self.rank,
# then lora_b projects back up to the output dimension
lora_out = self.lora_b(self.lora_a(self.dropout(x)))
# Finally, scale by the alpha parameter (normalized by rank)
# and add to the original model's outputs
return frozen_out + (self.alpha / self.rank) * lora_out
查看原始模型,与Lora模型的不同:
# Print the first layer's self-attention in the usual Llama2 model
>>> print(base_model.layers[0].attn)
MultiHeadAttention(
(q_proj): Linear(in_features=4096, out_features=4096, bias=False)
(k_proj): Linear(in_features=4096, out_features=4096, bias=False)
(v_proj): Linear(in_features=4096, out_features=4096, bias=False)
(output_proj): Linear(in_features=4096, out_features=4096, bias=False)
(pos_embeddings): RotaryPositionalEmbeddings()
)
# Print the same for Llama2 with LoRA weights
>>> print(lora_model.layers[0].attn)
MultiHeadAttention(
(q_proj): LoRALinear(
(dropout): Dropout(p=0.0, inplace=False)
(lora_a): Linear(in_features=4096, out_features=8, bias=False)
(lora_b): Linear(in_features=8, out_features=4096, bias=False)
)
(k_proj): Linear(in_features=4096, out_features=4096, bias=False)
(v_proj): LoRALinear(
(dropout): Dropout(p=0.0, inplace=False)
(lora_a): Linear(in_features=4096, out_features=8, bias=False)
(lora_b): Linear(in_features=8, out_features=4096, bias=False)
)
(output_proj): Linear(in_features=4096, out_features=4096, bias=False)
(pos_embeddings): RotaryPositionalEmbeddings()
)
PEFT
Parameter Efficient Fine Tuning, 参数高效微调技术,包含LoRA、IA3、LoKr、Prefix tuning等等多种微调方法
安装:pip3 install peft
配置文件
配置文件为config.json
文件,内容参考如下:
{
"base_model_name_or_path": "facebook/opt-350m", #base model to apply LoRA to
"bias": "none",
"fan_in_fan_out": false,
"inference_mode": true,
"init_lora_weights": true,
"layers_pattern": null,
"layers_to_transform": null,
"lora_alpha": 32,
"lora_dropout": 0.05,
"modules_to_save": null,
"peft_type": "LORA", #PEFT method type
"r": 16,
"revision": null,
"target_modules": [
"q_proj", #model modules to apply LoRA to (query and value projection layers)
"v_proj"
],
"task_type": "CAUSAL_LM" #type of task to train model on
}
配置中比较关键的参数说明如下:
base_model_name_or_path
: 表示基础模型路径peft_type
: 表示微调算法类型lora_alpha
: 表示缩放系数r
: 表示lora_rank
target_modules
: 表示目标模块
模型训练
第一步,加载模型, 并根据lora配置转换成PeftModel, 如下:
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
from peft import get_peft_model
config = LoraConfig(
task_type=TaskType.CAUSAL_LM,
target_modules=["q_proj", "v_proj"],
inference_mode=False, # 训练模式
r=8, # Lora 秩
lora_alpha=32, # Lora alaph,具体作用参见 Lora 原理
lora_dropout=0.1# Dropout 比例
)
lora_model = get_peft_model(model, lora_config)
lora_model.print_trainable_parameters()
"trainable params: 1,572,864 || all params: 332,769,280 || trainable%: 0.472659014678278"
第二步, 训练PeftModel
args = TrainingArguments(
output_dir="facebook/opt-350m-lora",
per_device_train_batch_size=4,
gradient_accumulation_steps=4,
logging_steps=10,
num_train_epochs=3,
save_steps=10, # 为了快速演示,这里设置10,建议你设置成100
learning_rate=1e-4,
save_on_each_node=True,
gradient_checkpointing=True
)
trainer = Trainer(
model=model,
args=args,
train_dataset=tokenized_id,
data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
)
trainer.train()
第三步, 保存权重,如下:
lora_model.save_pretrained("your-name/opt-350m-lora")
模型推理
方法一:
from peft import PeftModel, PeftConfig
config = PeftConfig.from_pretrained("ybelkada/opt-350m-lora")
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
lora_model = PeftModel.from_pretrained(model, "ybelkada/opt-350m-lora")
方法二:
from peft import AutoPeftModelForCausalLM
lora_model = AutoPeftModelForCausalLM.from_pretrained("ybelkada/opt-350m-lora")