[Core] Support pooling (#229)

This PR adds pooling support to vllm-ascend.

Tested with `bge-base-en-v1.5` via `encode`:
```
from vllm import LLM

# Sample prompts.
prompts = [
  "Hello, my name is",
  "The president of the United States is",
  "The capital of France is",
  "The future of AI is",
]
# Create an LLM.
model = LLM(model="./bge-base-en-v1.5", enforce_eager=True)
# Generate embedding. The output is a list of EmbeddingRequestOutputs.
outputs = model.encode(prompts)
# Print the outputs.
for output in outputs:
    print(output.outputs.embedding)  # list of 768 floats
```
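
A quick shape check can be run directly after the snippet above (a minimal sketch; 768 is the BERT-base hidden size that `bge-base-en-v1.5` uses):
```
# One embedding per prompt, each with the model's hidden size.
assert len(outputs) == len(prompts)
for output in outputs:
    assert len(output.outputs.embedding) == 768
```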

Also tested via `embed`:
```
from vllm import LLM

llm = LLM(model="./bge-base-en-v1.5", task="embed")
(output,) = llm.embed("Hello, my name is")

embeds = output.outputs.embedding
print(f"Embeddings: {embeds!r} (size={len(embeds)})")
```
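
As a rough semantic check on top of the `embed` snippet (a minimal sketch with no extra dependencies; the two paraphrased sentences are arbitrary and chosen only for illustration), cosine similarity between two embeddings can be computed by hand. Given the accuracy issue noted below, the absolute values should not be trusted yet:
```
import math

def cosine(a, b):
    # Plain-Python cosine similarity (avoids a numpy dependency).
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Paraphrased sentences should score noticeably higher than unrelated ones.
(o1,) = llm.embed("The capital of France is Paris.")
(o2,) = llm.embed("Paris is the capital city of France.")
print(f"cosine similarity: {cosine(o1.outputs.embedding, o2.outputs.embedding):.4f}")
```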

Related: https://github.com/vllm-project/vllm-ascend/issues/200

## Known issue
The accuracy is not correct yet, since this feature relies on `enc-dec`
support. That will be addressed in a follow-up PR by @MengqingCao.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>


```diff
@@ -7,7 +7,7 @@
 | LoRA | ✗ | Plan in 2025 Q1 |
 | Prompt adapter | ✗ | Plan in 2025 Q1 |
 | Speculative decoding | ✗ | Plan in 2025 Q1 |
-| Pooling | ✗ | Plan in 2025 Q2 |
+| Pooling | ✅ | |
 | Enc-dec | ✗ | Plan in 2025 Q2 |
 | Multi Modality | ✅ (LLaVA/Qwen2-vl/Qwen2-audio/internVL)| Add more model support in 2025 Q1 |
 | LogProbs | ✅ ||
```