### What this PR does / why we need it?
This PR makes the following modifications:
1. Delete `user_guide/feature_guide/quantization-llm-compressor.md`
and merge its content into `user_guide/feature_guide/quantization.md`.
2. Update the content of `user_guide/feature_guide/quantization.md`.
3. Add a guide, `developer_guide/feature_guide/quantization.md`, on
adapting quantization algorithms and quantized models.
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: 7157596103
---------
Signed-off-by: IncSec <1790766300@qq.com>
Signed-off-by: InSec <1790766300@qq.com>
24 lines
348 B
Markdown
# Feature Guide

This section provides a detailed usage guide of vLLM Ascend features.

:::{toctree}
:caption: Feature Guide
:maxdepth: 1
graph_mode
quantization
sleep_mode
structured_output
lora
eplb_swift_balancer
netloader
dynamic_batch
kv_pool
external_dp
large_scale_ep
ucm_deployment
Fine_grained_TP
speculative_decoding
context_parallel
:::