Qwen3.5 Analysis

February 21, 2026 (less than 1 minute read)

RMSNorm
Linear Attention

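Source code: qwen3.5

Qwen3.5 includes a ViT for image processing, so it can be compared directly with Qwen3-VL:

RMSNorm differs slightly
LinearAttention and FullAttention layers are interleaved at a 3:1 ratio
Both LinearAttention and FullAttention add a gating step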
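
As a rough illustration of the 3:1 mix, the sketch below builds a per-layer type list. The helper name `build_layer_types`, the `full_attention_interval` parameter, and the choice to place the FullAttention layer at every fourth position are assumptions for illustration; the text above only states the ratio.

```python
def build_layer_types(num_layers, full_attention_interval=4):
    """Return one attention type per layer for a 3:1 linear/full hybrid stack.

    Assumption: every `full_attention_interval`-th layer is FullAttention
    and the remaining layers are LinearAttention.
    """
    return [
        "full_attention" if (i + 1) % full_attention_interval == 0 else "linear_attention"
        for i in range(num_layers)
    ]


layer_types = build_layer_types(8)
for index, layer_type in enumerate(layer_types):
    print(index, layer_type)
```

With 8 layers this gives 6 LinearAttention layers and 2 FullAttention layers, which matches the stated ratio.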

RMSNorm

RMSNorm uses the zero-centered variant; see https://github.com/huggingface/transformers/pull/29402.

That is, 1 is added to the weight before it scales the normalized input.
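
A minimal sketch of the "weight plus 1" idea in plain Python, operating on a single vector. The function name `zero_centered_rms_norm`, the `eps` value, and the zero-initialized weight are assumptions for illustration, not taken from the qwen3.5 source.

```python
import math


def zero_centered_rms_norm(x, weight, eps=1e-6):
    """Normalize x by its root mean square, then scale by (1 + weight)."""
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    # Zero-centered: the stored weight is an offset around 0,
    # so the effective scale is (1 + weight).
    return [v * inv_rms * (1.0 + w) for v, w in zip(x, weight)]


x = [1.0, -2.0, 3.0, -4.0]
zero_weight = [0.0] * len(x)  # assumed init: effective scale of exactly 1
y = zero_centered_rms_norm(x, zero_weight)
y_scaled = zero_centered_rms_norm(x, [1.0] * len(x))  # effective scale of 2
```

With a weight of all zeros the layer reduces to plain RMS normalization, which is the practical difference from a conventional RMSNorm whose weight would be stored near 1.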

Linear Attention

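Cache management

Let B denote batch_size and S denote seq_len.

The FullAttention cache consists of key_cache and value_cache, both of shape [B, num_head, S, head_dim]. With the qwen3.5 configuration this is [B, 2, S, 256].

The LinearAttention cache consists of conv_states and recurrent_states, of shape [B, d_inner, d_conv] and [B, d_inner, d_state] respectively. With the qwen3.5 configuration these are [B, 12288, 4] and [B, 64, 128, 128]; the recurrent state is four-dimensional here, which reads as one 128×128 state matrix per head across 64 heads. Neither shape depends on S.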
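
To make the difference concrete, the following sketch counts cache elements per layer from the shapes quoted above. The helper names are hypothetical, and the count ignores dtype and any padding or preallocation a real cache might use.

```python
def full_attention_cache_elems(batch, seq_len, num_heads=2, head_dim=256):
    """Elements in key_cache + value_cache for one FullAttention layer."""
    return 2 * batch * num_heads * seq_len * head_dim


def linear_attention_cache_elems(batch, conv_shape=(12288, 4), recurrent_shape=(64, 128, 128)):
    """Elements in conv_states + recurrent_states for one LinearAttention layer."""
    conv = conv_shape[0] * conv_shape[1]
    recurrent = recurrent_shape[0] * recurrent_shape[1] * recurrent_shape[2]
    return batch * (conv + recurrent)


for seq_len in (1024, 4096, 32768):
    full = full_attention_cache_elems(1, seq_len)
    linear = linear_attention_cache_elems(1)
    print(f"S={seq_len}: full={full}, linear={linear}")
```

Under these shapes the LinearAttention cache stays at a fixed size per layer, while the FullAttention cache grows linearly with S and overtakes it a little past S = 1024.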

Prefill stage

chunk_gated_delta_rule is used here; details are to be added.

Decode stage

recurrent_gated_delta_rule is used here. It updates recurrent_state as follows:

\[\begin{aligned} S_t &= \alpha_t S_{t-1} + \beta_t k_t \left(v_t - \alpha_t S_{t-1}^{T} k_t\right)^{T} \\ o_t &= S_t^{T} q_t \end{aligned}\]

Here S is recurrent_state, α is g, β is beta, k is key, v is value, and q is query.
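
The update can be sketched for a single head and a single token in plain Python. This is an illustrative sketch, not the qwen3.5 implementation: the function name `recurrent_gated_delta_step` is hypothetical, `alpha` is passed in directly as a decay factor (if g is stored in log space, it would first need to be exponentiated, which is an assumption here), and the decay is applied to the state before the correction term is computed.

```python
def recurrent_gated_delta_step(S, q, k, v, alpha, beta):
    """One decode step of a gated delta rule for a single head.

    S: d_k x d_v state as a list of rows; q, k: length d_k; v: length d_v.
    Returns (new_state, output).
    """
    d_k, d_v = len(k), len(v)
    # Decay the previous state: alpha * S_{t-1}
    S = [[alpha * S[i][j] for j in range(d_v)] for i in range(d_k)]
    # What the decayed state currently recalls for this key: S^T k
    recalled = [sum(S[i][j] * k[i] for i in range(d_k)) for j in range(d_v)]
    # Delta rule: write only the beta-scaled gap between v and the recall
    delta = [beta * (v[j] - recalled[j]) for j in range(d_v)]
    S = [[S[i][j] + k[i] * delta[j] for j in range(d_v)] for i in range(d_k)]
    # Read out with the query: o = S^T q
    o = [sum(S[i][j] * q[i] for i in range(d_k)) for j in range(d_v)]
    return S, o


# Toy example: d_k = d_v = 2, unit-norm key, query equal to the key.
state = [[0.0, 0.0], [0.0, 0.0]]
key = [1.0, 0.0]
state, out1 = recurrent_gated_delta_step(state, q=key, k=key, v=[3.0, 5.0], alpha=1.0, beta=1.0)
state, out2 = recurrent_gated_delta_step(state, q=key, k=key, v=[1.0, 1.0], alpha=0.5, beta=1.0)
```

In the toy example, writing a second value under the same key with beta = 1 replaces the stored value instead of accumulating on top of it, which is the point of the delta-style correction term.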

HarmonyHu
