LoRA学习

April 17, 2024 10 分钟阅读

LoRA概念
LoRALinear
PEFT

LoRA概念

论文：[2106.09685] LoRA: Low-Rank Adaptation of Large Language Models
LoRA: Low Rank Adaptation，低秩适应，一种高效的微调技术，在原有LLM基础上额外增加少量可训练的参数，而非对整个LLM进行训练。训练速度更快，适合资源有限场景。
lora_rank: 简写成r，低秩，一般很小，r <= 32。将原有矩阵参数，转换成$W = W_0 + \Delta W$，其中$\Delta W=A \times B$。举例说明权重[2048, 1024] + [2048, 32] x [32, 1024]，其中32就是lora_rank，W的参数量2M，$\Delta W$的参数量96K。也就是LoRA只用训练96K即可。
lora_alpha: 是LoRA的缩放系数，起到调整更新 $\Delta W$ 幅度的作用，这时$\Delta{W} = \frac{lora_alpha}{r}A\times B$
前向推导公式：$ h = W_0x + \Delta Wx = W_0x + \frac{lora_alpha}{r}BAx $

LoRALinear

来源：Fine-Tuning Llama2 with LoRA — torchtune 0.4 documentation

从以下简化的源码可以看出LoRA的计算逻辑：

import torch
from torch import nn

class LoRALinear(nn.Module):
  def __init__(self,in_dim: int,out_dim: int,rank: int,alpha: float,dropout: float):
    # These are the weights from the original pretrained model
    self.linear = nn.Linear(in_dim, out_dim, bias=False)
    # These are the new LoRA params. In general rank << in_dim, out_dim
    self.lora_a = nn.Linear(in_dim, rank, bias=False)
    self.lora_b = nn.Linear(rank, out_dim, bias=False)

    # Rank and alpha are commonly-tuned hyperparameters
    self.rank = rank
    self.alpha = alpha

    # Most implementations also include some dropout
    self.dropout = nn.Dropout(p=dropout)

    # The original params are frozen, and only LoRA params are trainable.
    self.linear.weight.requires_grad = False
    self.lora_a.weight.requires_grad = True
    self.lora_b.weight.requires_grad = True

  def forward(self, x: torch.Tensor) -> torch.Tensor:
    # This would be the output of the original model
    frozen_out = self.linear(x)

    # lora_a projects inputs down to the much smaller self.rank,
    # then lora_b projects back up to the output dimension
    lora_out = self.lora_b(self.lora_a(self.dropout(x)))

    # Finally, scale by the alpha parameter (normalized by rank)
    # and add to the original model's outputs
    return frozen_out + (self.alpha / self.rank) * lora_out

查看原始模型，与Lora模型的不同：

# Print the first layer's self-attention in the usual Llama2 model
>>> print(base_model.layers[0].attn)
MultiHeadAttention(
  (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
  (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
  (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
  (output_proj): Linear(in_features=4096, out_features=4096, bias=False)
  (pos_embeddings): RotaryPositionalEmbeddings()
)

# Print the same for Llama2 with LoRA weights
>>> print(lora_model.layers[0].attn)
MultiHeadAttention(
  (q_proj): LoRALinear(
    (dropout): Dropout(p=0.0, inplace=False)
    (lora_a): Linear(in_features=4096, out_features=8, bias=False)
    (lora_b): Linear(in_features=8, out_features=4096, bias=False)
  )
  (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
  (v_proj): LoRALinear(
    (dropout): Dropout(p=0.0, inplace=False)
    (lora_a): Linear(in_features=4096, out_features=8, bias=False)
    (lora_b): Linear(in_features=8, out_features=4096, bias=False)
  )
  (output_proj): Linear(in_features=4096, out_features=4096, bias=False)
  (pos_embeddings): RotaryPositionalEmbeddings()
)

PEFT

来源：Huggingface PEFT

Parameter Efficient Fine Tuning, 参数高效微调技术，包含LoRA、IA3、LoKr、Prefix tuning等等多种微调方法

安装：pip3 install peft

配置文件

配置文件为config.json文件，内容参考如下：

{
  "base_model_name_or_path": "facebook/opt-350m", #base model to apply LoRA to
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layers_pattern": null,
  "layers_to_transform": null,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "modules_to_save": null,
  "peft_type": "LORA", #PEFT method type
  "r": 16,
  "revision": null,
  "target_modules": [
    "q_proj", #model modules to apply LoRA to (query and value projection layers)
    "v_proj"
  ],
  "task_type": "CAUSAL_LM" #type of task to train model on
}

配置中比较关键的参数说明如下：

base_model_name_or_path: 表示基础模型路径
peft_type: 表示微调算法类型
lora_alpha: 表示缩放系数
r: 表示lora_rank
target_modules: 表示目标模块

模型训练

第一步，加载模型, 并根据lora配置转换成PeftModel, 如下:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

from peft import get_peft_model

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM, 
    target_modules=["q_proj", "v_proj"],
    inference_mode=False, # 训练模式
    r=8, # Lora 秩
    lora_alpha=32, # Lora alaph，具体作用参见 Lora 原理
    lora_dropout=0.1# Dropout 比例
)

lora_model = get_peft_model(model, lora_config)
lora_model.print_trainable_parameters()
"trainable params: 1,572,864 || all params: 332,769,280 || trainable%: 0.472659014678278"

第二步, 训练PeftModel

args = TrainingArguments(
    output_dir="facebook/opt-350m-lora",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    logging_steps=10,
    num_train_epochs=3,
    save_steps=10, # 为了快速演示，这里设置10，建议你设置成100
    learning_rate=1e-4,
    save_on_each_node=True,
    gradient_checkpointing=True
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_id,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
)
trainer.train()

第三步, 保存权重，如下:

lora_model.save_pretrained("your-name/opt-350m-lora")

模型推理

方法一：

from peft import PeftModel, PeftConfig

config = PeftConfig.from_pretrained("ybelkada/opt-350m-lora")
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
lora_model = PeftModel.from_pretrained(model, "ybelkada/opt-350m-lora")

方法二：

from peft import AutoPeftModelForCausalLM

lora_model = AutoPeftModelForCausalLM.from_pretrained("ybelkada/opt-350m-lora")

HarmonyHu