update readme

2026-02-11 17:55:59 +08:00
parent cfc0614191
commit bc9ae6a58a
1 changed files with 9 additions and 43 deletions
--- a/README.md
+++ b/README.md
@@ -4,57 +4,23 @@

 ## Version History

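-### v0.0.6.2 — 2026-02-11 · Llama4 model support
+**v0.0.6.2** — 2026-02-11 · Llama4 model support, including sigmoid-routing MoE, QK Norm, and alternating dense/MoE layers; because of an MLU370 (capability=3) limitation, MoE now runs in dense mode to stay compatible with graph capture (⚠️ higher compute cost; DeepSeek V2/V3 are unaffected)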

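- Add the `Llama4ForCausalLM` model implementation: composite config handling, sigmoid-routing MoE, QK Normalization, and alternating dense/MoE layers
- Add MLU hijack adaptations: SparseMoeMlp MoE replacement and an embedding dtype fix
- Handle extraction of `architectures` from the `text_config` nested inside `Llama4Config`
- ⚠️ MoE dense mode (affects all MoE models): the original `forward_experts_nofused` contains operations that are incompatible with graph capture, so it now runs in dense mode, which restores compatibility at a higher compute cost. DeepSeek V2/V3 are unaffected (they have a separate MLU MoE hijack)
+**v0.0.6.1** — 2026-02-11 · DeepSeek V3 MTP speculative decoding: adds an MTP draft model that reuses DeepseekV2DecoderLayer, and automatically detects and enables MTP speculative decoding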

-### v0.0.6.1 — 2026-02-11 · DeepSeek V3 MTP speculative decoding
+**v0.0.6** — 2026-02-11 · DeepSeek V3 model support: reuses the V2 implementation, adds `noaux_tc` routing, and fixes the MLA unpaged cache operator

- Add `deepseek_mtp.py`, which implements the MTP draft model by reusing DeepseekV2DecoderLayer, adapted with EAGLE as the template
- `SpeculativeConfig` automatically detects `num_nextn_predict_layers` and rewrites the draft config
- The target worker returns hidden states for MTP (target only; the draft worker is unaffected)
- Extend three model_type checks in the MLU config to support `deepseek_mtp`, matching the MLA cache format
+**v0.0.5** — 2026-02-10 · Qwen3MoE model support; fixes the FusedMoE `forward_mlu` signature bug

-### v0.0.6 — 2026-02-11 · DeepSeek V3 model support
+**v0.0.4.1** — 2026-02-10 · Gemma3 rope compatibility fixes; adapts to the MLU rotary_emb interface

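- Register `DeepseekV3ForCausalLM` (reusing the V2 implementation) and extend the MLU MLA config check to support `deepseek_v3`
- Implement the `noaux_tc` routing method (`e_score_correction_bias`)
- Skip loading MTP layer weights
- Fix the MLA unpaged cache path, which used the wrong paged cache operator (both prefill and decode now use `reshape_linear_cache`)
+**v0.0.4** — 2026-02-10 · Gemma3 model support, including QK Norm, per-layer rope, and sliding window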

-### v0.0.5 — 2026-02-10 · Qwen3MoE model support
+**v0.0.3.1** — 2026-02-06 · CNNL Tensor overflow fix: guards against KV cache element counts exceeding the int32 limit

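- Add the `Qwen3MoeForCausalLM` model implementation (QK Normalization, ReplicatedLinear shared_expert_gate)
- Fix a pre-existing bug where the FusedMoE `forward_mlu` signature was missing the `layer` parameter (affects all MoE models on MLU)
+**v0.0.3** — 2026-02-06 · Generic Transformers backend; supports loading custom HF models via `auto_map`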

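-### v0.0.4.1 — 2026-02-10 · Gemma3 rope compatibility fixes
-
- Fix the missing `rope_theta` attribute on `Gemma3TextConfig` in newer transformers versions by reading it from the `rope_parameters` dict for compatibility
- Fix the unhashable-argument error in the `get_rope` cache caused by a nested `rope_scaling` dict
- Adapt to the MLU `forward_mlu` interface: merge q/k into a single tensor, call rotary_emb, then split them again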
-
-### v0.0.4 — 2026-02-10 · Gemma3 model support
-
- Add the `Gemma3ForCausalLM` model implementation (QK Normalization, per-layer rope configuration, layer_types sliding window)
- Fix a crash in `patch_rope_scaling_dict` when rope_scaling lacks the `rope_type` key
-
-### v0.0.3.1 — 2026-02-06 · CNNL Tensor overflow fix
-
- Fix KV cache element counts exceeding the int32 limit when very small models are deployed on devices with large memory
- Add guards in both mlu_worker and cache_engine
-
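-### v0.0.3 — 2026-02-06 · Generic Transformers backend
-
- Support loading arbitrary custom HuggingFace models via `auto_map`
- Add registry fallback logic, Linear return-value handling, RMSNorm dimension restoration, and related changes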
-
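-### v0.0.2 — 2026-02-04 · Qwen3 model support
-
- Adapt to the QK Normalization architecture
- Fix rope_scaling and tokenizer compatibility issues, and resolve view failures caused by non-contiguous tensors
+**v0.0.2** — 2026-02-04 · Qwen3 model support: QK Norm adaptation and rope/tokenizer compatibility fixes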

 ---