[ModelRunner] Support embedding inputs (#916)
### What this PR does / why we need it?
- Adds support for passing prompt_embeds to LLM.generate as
```bash
llm.generate({"prompt_embeds": input_embeds}, sampling_params)
```
or
```bash
llm.generate(
[{"prompt_embeds": input_embeds} for input_embeds in inputs_embeds], sampling_params
)
```
- Add `prompt_embeds` to examples
### How was this patch tested?
CI passed with new added/existing test.
and I have test with the example script in this pr, and the output seems
looks good:
```bash
[Single Inference Output]
------------------------------
The capital of France is Paris. Paris is the largest city in France and is
------------------------------
Adding requests: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 3966.87it/s]
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 3.99it/s, est. speed input: 177.08 toks/s, output: 63.91 toks/s]
[Batch Inference Outputs]
------------------------------
Q1: Please tell me about the capital of France.
A1: The capital of France is Paris. It is located in the northern part of the
Q2: When is the day longest during the year?
A2: The day is longest during the year at the summer solstice. This typically occurs
Q3: Where is bigger, the moon or the sun?
A3: The sun is significantly bigger than the moon.
The sun has a diameter of
------------------------------
```
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
This commit is contained in:
4
.github/workflows/vllm_ascend_test.yaml
vendored
4
.github/workflows/vllm_ascend_test.yaml
vendored
@@ -136,12 +136,14 @@ jobs:
|
||||
pytest -sv tests/singlecard/test_camem.py
|
||||
# test_ascend_config.py should be ran separately because it will regenerate the global config many times.
|
||||
pytest -sv tests/singlecard/test_ascend_config.py
|
||||
pytest -sv tests/singlecard/test_prompt_embedding.py
|
||||
pytest -sv tests/singlecard/ \
|
||||
--ignore=tests/singlecard/test_offline_inference.py \
|
||||
--ignore=tests/singlecard/test_scheduler.py \
|
||||
--ignore=tests/singlecard/test_guided_decoding.py \
|
||||
--ignore=tests/singlecard/test_camem.py \
|
||||
--ignore=tests/singlecard/test_ascend_config.py
|
||||
--ignore=tests/singlecard/test_ascend_config.py \
|
||||
--ignore=tests/singlecard/test_prompt_embedding.py
|
||||
else
|
||||
pytest -sv tests/multicard/test_ilama_lora_tp2.py
|
||||
# Fixme: run VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py will raise error.
|
||||
|
||||
Reference in New Issue
Block a user