# Feature Support

| Feature | Supported | Note |
|---------|-----------|------|
| Chunked Prefill | ✗ | Planned for 2025 Q1 |
| Automatic Prefix Caching | ✅ | Performance improvements planned for 2025 Q2 |
| LoRA | ✗ | Planned for 2025 Q1 |
| Prompt adapter | ✗ | Planned for 2025 Q1 |
| Speculative decoding | ✗ | Planned for 2025 Q1 |
| Pooling | ✅ | |
| Enc-dec | ✗ | Planned for 2025 Q2 |
| Multi Modality | ✅ (LLaVA/Qwen2-vl/Qwen2-audio/internVL) | Support for more models planned for 2025 Q1 |
| LogProbs | ✅ | |
| Prompt logProbs | ✅ | |
| Async output | ✅ | |
| Multi step scheduler | ✗ | Planned for 2025 Q1 |
| Best of | ✅ | |
| Beam search | ✅ | |
| Guided Decoding | ✅ | See [issue #177](https://github.com/vllm-project/vllm-ascend/issues/177) for more details |
| Tensor Parallel | ✅ | Only the "mp" backend is supported for now |
| Pipeline Parallel | ✅ | Only the "mp" backend is supported for now |
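
The "mp" restriction in the last two rows can be enforced before an engine is created. The following is a minimal sketch, not part of vllm-ascend: the helper `build_parallel_kwargs` and the constant `SUPPORTED_BACKENDS` are hypothetical names introduced for illustration. It assumes that "mp" refers to vLLM's `distributed_executor_backend` engine argument and that `tensor_parallel_size` is passed alongside it.

```python
from typing import Any, Dict

# Assumption: "mp" in the table is the value of vLLM's
# `distributed_executor_backend` engine argument.
SUPPORTED_BACKENDS = {"mp"}


def build_parallel_kwargs(tensor_parallel_size: int = 1,
                          backend: str = "mp") -> Dict[str, Any]:
    """Hypothetical helper: validate parallel settings against the table."""
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"backend {backend!r} is not listed as supported; "
            f"expected one of {sorted(SUPPORTED_BACKENDS)}"
        )
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be at least 1")
    return {
        "tensor_parallel_size": tensor_parallel_size,
        "distributed_executor_backend": backend,
    }


if __name__ == "__main__":
    # Print the keyword arguments that would be handed to the engine.
    print(build_parallel_kwargs(tensor_parallel_size=2))
```

With vLLM installed, the returned dictionary could be forwarded as `LLM(model="<your-model>", **kwargs)`, where the model path is a placeholder. Rejecting other backends up front makes a misconfiguration fail at setup time instead of at engine start.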