[Doc][Misc] Correcting the document and uploading the model deployment template (#8287)

### What this PR does / why we need it?
Corrects the documentation and adds the model deployment template.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

---------

Signed-off-by: herizhen <1270637059@qq.com>
Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com>
Author: herizhen
Date: 2026-04-15 16:03:11 +08:00 (committed by GitHub)
Parent: 147b589f62
Commit: 95726d20eb
31 changed files with 536 additions and 308 deletions


@@ -1,4 +1,4 @@
-# Fine-Grained Tensor Parallelism (Finegrained TP)
+# Fine-Grained Tensor Parallelism (Fine-grained TP)
## Overview
@@ -8,12 +8,12 @@ This capability supports heterogeneous parallelism strategies within a single mo
---
-## Benefits of Finegrained TP
+## Benefits of Fine-grained TP
Fine-Grained Tensor Parallelism delivers two primary performance advantages through targeted weight sharding:
- **Reduced Per-Device Memory Footprint**:
-Fine-grained TP shards large weight matrices(e.g., LM Head, o_proj)across devices, lowering peak memory usage and enabling larger batches or deployment on memory-limited hardware—without quantization.
+Fine-grained TP shards large weight matrices (e.g., LM Head, o_proj) across devices, lowering peak memory usage and enabling larger batches or deployment on memory-limited hardware—without quantization.
- **Faster Memory Access in GEMMs**:
In decode-heavy workloads, GEMM performance is often memory-bound. Weight sharding reduces per-device weight fetch volume, cutting DRAM traffic and improving bandwidth efficiency—especially for latency-sensitive layers like LM Head and o_proj.
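
To make the memory and bandwidth argument above concrete, here is a minimal PyTorch sketch of column-parallel sharding of an LM Head weight across TP ranks. All names and sizes (`hidden_size`, `vocab_size`, `tp_size`, `rank`) are illustrative assumptions; this is a sketch of the idea, not the project's actual implementation.

```python
import torch

hidden_size, vocab_size = 4096, 32_000
tp_size = 4   # hypothetical fine-grained TP degree for the LM Head
rank = 0      # pretend this process is rank 0 of the TP group

# Full LM Head weight: without TP, every device would hold all of it.
full_weight = torch.randn(vocab_size, hidden_size)

# Column-parallel sharding: each rank keeps vocab_size / tp_size output rows,
# cutting both resident weight memory and per-GEMM weight reads by tp_size.
local_weight = torch.chunk(full_weight, tp_size, dim=0)[rank]
print(local_weight.shape)  # torch.Size([8000, 4096])

# Each rank computes logits only for its vocab slice; a collective such as
# an all-gather would reassemble the full logits afterwards.
hidden = torch.randn(1, hidden_size)
partial_logits = hidden @ local_weight.t()  # (1, vocab_size // tp_size)
```

In conventional Megatron-style TP, o_proj would instead be sharded along its input dimension (row-parallel) and followed by an all-reduce, but the per-device memory and weight-traffic savings are the same.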
@@ -53,7 +53,7 @@ The Fine-Grained TP size for any component must:
---
-## How to Use Finegrained TP
+## How to Use Fine-grained TP
### Configuration Format