Commit Graph

88 Commits

Author SHA1 Message Date
whx
feb6bdb12e [Platform][Model Runner] Add hash of request_ids; Change blocksize back to 128. (#293)
This PR changes the initial value of blocksize back to 128 and adds a hash
of the request ID list in the model runner, which is needed to implement
a sampling param cache in the sampler.
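
To illustrate the idea, here is a minimal sketch of how a hash over the request ID list could act as a cache key, so that sampling tensors are rebuilt only when the batch changes. The names `request_ids_hash` and `SamplingParamCache` are hypothetical and are not taken from the vllm-ascend code.

```python
import hashlib


def request_ids_hash(request_ids):
    """Return a stable digest for an ordered list of request IDs."""
    h = hashlib.sha256()
    for rid in request_ids:
        data = rid.encode("utf-8")
        # Length-prefix each ID so ["ab", "c"] and ["a", "bc"] hash differently.
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.hexdigest()


class SamplingParamCache:
    """Hypothetical cache: rebuild the value only when the batch changes."""

    def __init__(self):
        self._key = None
        self._value = None
        self.rebuilds = 0

    def get(self, request_ids, build):
        key = request_ids_hash(request_ids)
        if key != self._key:
            # The batch composition changed, so rebuild and remember the key.
            self._key = key
            self._value = build(request_ids)
            self.rebuilds += 1
        return self._value
```

Calling `get` twice with the same request ID list returns the cached object without invoking `build` again.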

Signed-off-by: hw_whx <wanghexiang7@huawei.com>
Co-authored-by: hw_whx <wanghexiang7@huawei.com>
2025-03-11 18:50:28 +08:00
Yikun Jiang
007aeaa48b [Doc] Change distributed_executor_backend to mp (#287)
### What this PR does / why we need it?
Fix `ValueError: Unrecognized distributed executor backend tp. Supported
values are 'ray', 'mp' 'uni', 'external_launcher' or custom ExecutorBase
subclass.`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested on my local node

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-10 11:27:26 +08:00
Yikun Jiang
38334f5daa [Docs] Re-arch on doc and make QwQ doc work (#271)
### What this PR does / why we need it?
Re-arch the tutorials: move single NPU / multi NPU / multi node to the index.
- Unify the docker run command
- Use a dropdown to hide the build-from-source installation doc
- Re-arch tutorials to include Qwen/QwQ/DeepSeek
- Make the QwQ doc work

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI test



Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-10 09:27:48 +08:00
Yikun Jiang
18bb8d1f52 Adapt vLLM requirements changes to fix main CI (#279)
### What this PR does / why we need it?
Adapt vLLM requirements changes:
206e2577fa (diff-01ec17406c969585ed075609a2bbf2f2f4fe3e3def36946694abe6d4eb60a6f2)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-09 16:07:45 +08:00
Yikun Jiang
268da28961 Pin modelscope<1.23.0 on vLLM v0.7.3 (#272)
### What this PR does / why we need it?
Pin modelscope<1.23.0 on vLLM v0.7.3 to resolve:
https://github.com/vllm-project/vllm/pull/13807

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-09 15:59:42 +08:00
Yikun Jiang
be58d5f3d8 Bump torch_npu version to dev20250308.3 (#276)
### What this PR does / why we need it?
Bump torch_npu version to dev20250308.3 to fix a performance regression
in the multi-stream case:
e04c580d07


### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-09 15:59:15 +08:00
Mengqing Cao
91f7d8115d [CI/Build] Bump torch_npu to dev20250307.3 (#265)
Update the torch-npu version to fix torch npu exponential_ accuracy.
With this update, the precision issue when setting `temperature > 0` is
fixed.

---------

Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-03-07 20:34:07 +08:00
zouyida2002
faf8cd89cb register qwen2_vl to rewrite qwen2_vl forward (#241)
Add qwen2-vl Ascend implementation.

---------
Signed-off-by: zouyida <zouyida@huawei.com>
2025-03-07 15:41:47 +08:00
Yikun Jiang
35cb7b5234 [CI] Add dispatch job to leverage dynamic devices (#251)
### What this PR does / why we need it?
Add a dispatch job that assigns CI jobs to devices dynamically. It has
two stages, described below.

The dispatch job spends roughly `10s * parallel number + 30s` of extra
time waiting for other jobs to launch their containers and release the lock.

- **Stage 1: Acquire lock**
Add a dispatch job that uses `lockfile` to acquire a lock and then gets
a device number dynamically.
- **Stage 2.1: Launch container with dynamic device**
Pass the device number via the job output and start the container job
with that device.
- **Stage 2.2: Release lock**
Once the job has started, release the lock.
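
The acquire / pick / release flow above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's `fcntl.flock` rather than the `lockfile` tool the PR relies on, it is Unix-only, and the helpers `dispatch_lock` and `pick_free_device` are hypothetical names, not part of the CI scripts.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def dispatch_lock(path):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # Stage 1: acquire lock (blocks until free)
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)  # Stage 2.2: release lock
        os.close(fd)


def pick_free_device(all_devices, busy_devices):
    """Return the first device that is not in use, or None if all are busy."""
    for dev in all_devices:
        if dev not in busy_devices:
            return dev
    return None
```

In this sketch, the container launch (Stage 2.1) would happen inside the `with dispatch_lock(...)` block, so that two dispatch jobs cannot pick the same device before either container has started.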

On the backend, we use multiple paths to set up multiple self-hosted
runners as a load balancer:
```
$ pwd
/home/action
$ ll | grep actions
drwx------   6 action action 4096 Mar  7 08:55 actions-runner-01
drwx------   6 action action 4096 Mar  7 08:55 actions-runner-02
drwx------   6 action action 4096 Mar  7 08:55 actions-runner-03
drwx------   6 action action 4096 Mar  7 08:56 actions-runner-04
drwx------   4 action action 4096 Jan 24 22:08 actions-runner-05
drwx------   4 action action 4096 Jan 24 22:08 actions-runner-06
```

```
adduser -G docker action
su action
pip3 install docker prettytable
sudo yum install procmail
```

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
- CI passed
- E2E test manually, triggered 3 jobs in parallel:
- [1st
job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711345757/job/38348309297)
dispatch to /dev/davinci2.
- [2nd
job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711348739/job/38348316250)
dispatch to /dev/davinci3
- [3rd
job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711351493/job/38348324551)
dispatch to /dev/davinci4

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-07 09:47:13 +08:00
Angazenn
3217f0d10f [Feature] Modify description and api for ascend quantization (#243)
### What this PR does / why we need it?
1. It adds more description for classes in quant_config.py
2. It renames AscendQKVQuantAttentionMethod to AscendKVCacheMethod to
align with vLLM naming style.
3. It modifies the process when AscendLinearMethod or
AscendKVCacheMethod calls create_weights.


### Does this PR introduce _any_ user-facing change?
Yes. When creating weights, AscendLinearMethod now uses the get_weight,
get_pertensor_param and get_perchannel_param APIs from the linear quant
implementation, while AscendKVCacheMethod passes the layer into the
linear quant implementation.

### How was this patch tested?
By performing offline inference

---------

Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
2025-03-06 15:17:25 +08:00
Yikun Jiang
cff08f9df8 [Doc] Add initial FAQs (#247)
### What this PR does / why we need it?
Add initial FAQs

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-06 10:42:42 +08:00
HongtaoYang
dcd0005058 [Fix] Remove npu_group_topk before CANN version update (#242)
Remove npu_group_topk before CANN version update.

Signed-off-by: SidaoY <1024863041@qq.com>
2025-03-06 09:02:46 +08:00
whx
0d3463400a [Performance] Change the shape of kv_cache to avoid view of k_cache and v_cache. (#204)
This PR changes the shape of the kv cache to avoid taking views of
k_cache and v_cache.
It also caches the metadata of k_cache and v_cache to avoid duplicate
slice operations, which improves performance.

Signed-off-by: hw_whx <wanghexiang7@huawei.com>
2025-03-05 10:51:07 +08:00
Shanshan Shen
562fa673e5 [Bugfix] Exclude collect_env.py from CODESPELL check in format.sh (#240)
### What this PR does / why we need it?
Exclude `collect_env.py` from `CODESPELL` check in `format.sh`,
otherwise it will get the error shown below:

```bash
vLLM yapf: Done
vLLM mypy:
Running mypy on vllm_ascend
Success: no issues found in 18 source files
Running mypy on examples
Success: no issues found in 3 source files
Running mypy on tests
Success: no issues found in 3 source files
vLLM mypy: Done
collect_env.py:410: CANN ==> CAN
```

### Does this PR introduce _any_ user-facing change?
no.

### How was this patch tested?
no.

Signed-off-by: Shanshan Shen <467638484@qq.com>
2025-03-04 17:14:00 +08:00
Shanshan Shen
503f5045ff [ModelRunner] Remove redundant profile_run() in model runner (#224)
### What this PR does / why we need it?
Remove redundant `profile_run()` in model runner.

### Does this PR introduce _any_ user-facing change?
no.

### How was this patch tested?
no.

---------

Signed-off-by: Shanshan Shen <467638484@qq.com>
2025-03-04 16:58:33 +08:00
wangxiyuan
ae49bfd13a [Core] Support pooling (#229)
This PR added pooling support for vllm-ascend

Tested with `bge-base-en-v1.5` by encode:
```
from vllm import LLM

# Sample prompts.
prompts = [
  "Hello, my name is",
  "The president of the United States is",
  "The capital of France is",
  "The future of AI is",
]
# Create an LLM.
model = LLM(model="./bge-base-en-v1.5", enforce_eager=True)
# Generate embedding. The output is a list of EmbeddingRequestOutputs.
outputs = model.encode(prompts)
# Print the outputs.
for output in outputs:
    print(output.outputs.embedding)  # list of floats (the embedding vector)
```

Tested by embedding:
```
from vllm import LLM

llm = LLM(model="./bge-base-en-v1.5", task="embed")
(output,) = llm.embed("Hello, my name is")

embeds = output.outputs.embedding
print(f"Embeddings: {embeds!r} (size={len(embeds)})")
```

Related: https://github.com/vllm-project/vllm-ascend/issues/200

## Known issue
The accuracy is not correct since this feature relies on `enc-dec`
support. It'll be done in a following PR by @MengqingCao

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-03-04 15:59:34 +08:00
Shanshan Shen
8fda31cafe [Doc] Update Feature Support doc (#234)
### What this PR does / why we need it?
Update Feature Support doc.

### Does this PR introduce _any_ user-facing change?
no.

### How was this patch tested?
no.

---------

Signed-off-by: Shanshan Shen <467638484@qq.com>
2025-03-04 14:18:32 +08:00
Shanshan Shen
b9f0e25c16 [Misc] Add collect_env.py scripts for bug reporting (#175)
### What this PR does / why we need it?
Add the `collect_env.py` script from vLLM and remove the `nvidia`, `gpu`,
and `cuda` related code, so that users of vllm-ascend can collect their
env info when reporting bugs.

### Does this PR introduce _any_ user-facing change?
no.

### How was this patch tested?
Running `python collect_env.py` works


Signed-off-by: Shanshan Shen <467638484@qq.com>
2025-03-04 14:14:37 +08:00
Yikun Jiang
839dac8d60 Install wget to fix image build (#231)
### What this PR does / why we need it?

Install `wget` to fix image build

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-04 09:01:23 +08:00
Mengqing Cao
b64ee7d346 [Dist] Set device as rank (#202)
### What this PR does / why we need it?
The rank returned by `torch.distributed.get_rank(device_group)` is the
local rank, but rank (or rank in process group (PG)) is expected.
Thus we switch to `torch.npu.current_device()` to set the device.

```python
    # difference between `local_rank` and `rank_in_group`:
    # if we have a group of size 4 across two nodes:
    # Process | Node | Rank | Local Rank | Rank in Group
    #   0     |   0  |  0   |     0      |       0
    #   1     |   0  |  1   |     1      |       1
    #   2     |   1  |  2   |     0      |       2
    #   3     |   1  |  3   |     1      |       3
```
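
The table above can be reproduced with a small sketch. It assumes a single process group that spans all processes, with the same number of processes on every node; `rank_table` is a hypothetical helper used only for illustration.

```python
def rank_table(num_nodes, procs_per_node):
    """Build one row per process for a group spanning all nodes."""
    rows = []
    for rank in range(num_nodes * procs_per_node):
        # The node index and the local rank fall out of one integer division.
        node, local_rank = divmod(rank, procs_per_node)
        rows.append({
            "process": rank,
            "node": node,
            "rank": rank,
            "local_rank": local_rank,
            # With a single group over all processes, this equals the global rank.
            "rank_in_group": rank,
        })
    return rows


for row in rank_table(num_nodes=2, procs_per_node=2):
    print(row)
```

The local rank wraps back to 0 on the second node, while the rank in the group keeps counting, which is the distinction the PR description draws.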

Tested by @wwfu109 with
`vllm/tests/distributed/test_customops::test_multi_process_tensor_parallel_pipeline_parallel`

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-03-03 09:23:13 +08:00
Yikun Jiang
ebe14f20cf Recover vllm-ascend dev image (#209)
### What this PR does / why we need it?
Recover vllm-ascend dev image

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-03 09:08:41 +08:00
Yikun Jiang
6e358c4bef Add Document Branch Policy (#217)
### What this PR does / why we need it?
Add Document Branch Policy

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Related: https://github.com/vllm-project/vllm-ascend/issues/214

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-03 09:07:39 +08:00
Yikun Jiang
46740958f2 Add ray to docker image (#197)
### What this PR does / why we need it?
Add ray to docker image to make `ray` work

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-02-28 15:23:18 +08:00
dependabot[bot]
81dfaae88b Bump docker/setup-buildx-action from 2 to 3 (#191)
Bumps
[docker/setup-buildx-action](https://github.com/docker/setup-buildx-action)
from 2 to 3.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-28 09:06:46 +08:00
dependabot[bot]
a710a7563a Bump docker/setup-qemu-action from 2 to 3 (#192)
Bumps
[docker/setup-qemu-action](https://github.com/docker/setup-qemu-action)
from 2 to 3.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-28 09:06:13 +08:00
dependabot[bot]
a5564ed5d8 Bump actions/setup-python from 5.3.0 to 5.4.0 (#193)
Bumps [actions/setup-python](https://github.com/actions/setup-python)
from 5.3.0 to 5.4.0.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-27 20:05:15 +08:00
whx
14bca9911a [CI] Fix unsolved bugs caused by pta api change. (#190)
This PR fixes some unsolved bugs caused by the pta api change.

Signed-off-by: hw_whx <wanghexiang7@huawei.com>
Co-authored-by: hw_whx <wanghexiang7@huawei.com>
2025-02-27 19:52:28 +08:00
Yuanhao Ji
6aed83335c [CI] Add dependabot support and labeler workflow (#162)
Add dependabot support and labeler workflow

---------

Signed-off-by: Yuanhao Ji <jiyuanhao@apache.org>
2025-02-27 19:46:31 +08:00
Mengqing Cao
03dc5c01fd [Doc] update multinode doc (#181)
Update multinode doc
fix #167 #168

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-27 19:29:49 +08:00
HongtaoYang
1715230867 [CI] Upgrade to newest pta.(MLA and FusedMoE) (#189)
Upgrade to newest pta.(MLA and FusedMoE)

---------

Signed-off-by: SidaoY <1024863041@qq.com>
2025-02-27 18:50:52 +08:00
Li Wang
c131e43e7d [Worker]Lazy import torch_npu (#184)
### What this PR does / why we need it?
To avoid unnecessary delays, we only import torch_npu when profiling is
enabled.
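
A minimal sketch of the lazy-import pattern, assuming a worker that takes a profiling flag. The `Worker` class and its parameters are hypothetical, and the standard-library `cProfile` module stands in for `torch_npu` so the example runs anywhere.

```python
import importlib


class Worker:
    """Hypothetical worker that imports its profiler module only on demand."""

    def __init__(self, profiling_enabled=False, profiler_module="cProfile"):
        self.profiler = None
        if profiling_enabled:
            # The import cost is paid only on this code path.
            # "cProfile" is a stand-in; the PR defers importing torch_npu.
            self.profiler = importlib.import_module(profiler_module)


fast_worker = Worker()                           # no profiler import happens
profiled_worker = Worker(profiling_enabled=True)
print(profiled_worker.profiler.__name__)
```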

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-02-27 16:52:11 +08:00
wangxiyuan
6042c210bc [CI] upgrade to newest pta (#187)
Upgrade to newest torch-npu

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
2025-02-27 16:40:23 +08:00
Mengqing Cao
fd18ae6494 [MOE] fix #176 (#179)
Fix #176
We need to set `topk_group` and `num_expert_group` to `0` if they are
`None`
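
A sketch of the normalization described above, assuming the two values arrive as optional integers. The function name `normalize_group_args` is hypothetical and is not the name used in the fix.

```python
def normalize_group_args(topk_group=None, num_expert_group=None):
    """Replace missing group settings with 0 before passing them on."""
    # Use an explicit `is None` check so only a missing value is replaced.
    topk_group = 0 if topk_group is None else topk_group
    num_expert_group = 0 if num_expert_group is None else num_expert_group
    return topk_group, num_expert_group
```

For example, `normalize_group_args(4, None)` returns `(4, 0)`.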

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-27 14:21:08 +08:00
Shanshan Shen
ee43179767 [ModelRunner] Fix cuda hard code in model runner (#155)
### What this PR does / why we need it?
1. Fix cuda hard code in model runner.
2. Fix tutorials doc rendering error.

### Does this PR introduce _any_ user-facing change?
no.

### How was this patch tested?
no.

Signed-off-by: Shanshan Shen <467638484@qq.com>
2025-02-27 14:16:46 +08:00
zouyida2002
94cd66bba7 [CI][UT]enable multimodal ut (#158)
enable multimodal ut

---------

Signed-off-by: zouyida <zouyida@huawei.com>
2025-02-27 14:14:43 +08:00
Mengqing Cao
94483775e1 [CI] fix hf_token (#180)
Fix the bug introduced by #173

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-26 17:29:31 +08:00
Mengqing Cao
1c238b930d [worker] remove unused assertion (#161)
### What this PR does / why we need it?
Remove unused assertion in `NPUWorker`, as this has been moved to
`Executor` in vLLM:

aabeb2688f/vllm/executor/uniproc_executor.py (L43)

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with existing test.

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-26 16:11:36 +08:00
Mengqing Cao
78530c0667 [CI/Build] add HF_TOKEN for model downloading (#173)
### What this PR does / why we need it?
Add `HF_TOKEN` for downloading models that require access rights from
huggingface hub. This will fix the CI error in #123 and #76

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-26 15:35:03 +08:00
Mengqing Cao
7776f2e6a4 [ModelRunner] remove padding for vlm inputs (#150)
### What this PR does / why we need it?
Remove padding for vlm inputs.
We no longer need to pad inputs, and this padding breaks the input
preparation of VLMs.

### Does this PR introduce _any_ user-facing change?
N/A

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-26 10:26:39 +08:00
Mengqing Cao
79fbb20b4d [ModelRunner] remove unused args (follow vllm changes) (#159)
### What this PR does / why we need it?
The arg list of `Attention.forward()` is changed by
https://github.com/vllm-project/vllm/pull/13555.
The unused args `kv_caches` and `attn_metadata` are removed.

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with existing test.

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-25 17:51:09 +08:00
wangxiyuan
51ae37b22a [Doc] update readme (#147)
Fix doc issue in README

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-02-25 11:00:58 +08:00
Mengqing Cao
3a7882208f [CI] enable test if pytest.ini changes (#151)
enable test if pytest.ini changes

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-24 16:47:05 +08:00
Yaphets24
d0b3cb4fa7 modify:Eliminate redundant operations in the code to improve performance (#137)
### What this PR does / why we need it?
Eliminate redundant operations in the code to improve performance

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed
---------

Signed-off-by: Yaphets24 <d_mym0618@163.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
2025-02-22 17:43:42 +08:00
Chenguang Li
202b39a38c Ray Worker Ops Optimization (#136)
### What this PR does / why we need it?
In the case where `backend = ray`, only the main process completes the
`forward_oot` call, while the other worker processes call
`forward_native`. (This bug should also exist when `backend = mp`.)

### Does this PR introduce _any_ user-facing change?
no.

### How was this patch tested?
**Environment:**

CANN: 8.0.0
PyTorch: 2.5.1
Torch: 2.5.1rc1
python: 3.10
vllm: branch main 
vllm-ascend: branch main 
The current implementation avoids the Ray Worker initialization issue,
as addressed in the
[PR](https://github.com/vllm-project/vllm-ascend/pull/92). Then, during
the `forward_oot` call, logging will be performed.

**Script:**

```bash
python examples/offline_distributed_inference_npu.py
```

**Result:**
```bash
(NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
(NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
(NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
(NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
(NPURayWorkerWrapper pid=3984223) forward_oot run. #############################################
forward_oot run. #############################################
forward_oot run. #############################################
Processed prompts: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:07<00:00,  1.96s/it, est. speed input: 2.80 toks/s, output: 51.00 toks/s]
Prompt: 'Hello, my name is', Generated text: ' Alex and I am a 16 year old male. I have been diagnosed with a rare genetic disorder called X-linked recessive. I have been told that I will not be able to have children. I have been told that I will not be able to have children because of the X-linked recessive disorder. I have been told that I will not be able to have children because of the X-linked recessive disorder. I have been told that I will not be able to have children because of'
Prompt: 'The president of the United States is', Generated text: ' Statesman. He is the leader of the country. He is the one who makes the decisions. He is the one who makes the laws. He is the one who makes the rules. He is the one who makes the country strong. He is the one who makes the country happy. He is the one who makes the country safe. He is the one who makes the country free. He is the one who makes the country beautiful. He is the one who makes the country great. He is'
Prompt: 'The capital of France is', Generated text: ' the city of Paris. It is the largest city in France and the second largest city in Europe. It is located in the center of the country, in the south of the country. It is situated on the banks of the Seine River, which flows through the city. The city is surrounded by the Alps and the Pyrenees mountains. The city is also surrounded by the Mediterranean Sea. The city is known for its beautiful architecture, its museums, its parks, and its food. Paris is'
Prompt: 'The future of AI is', Generated text: ' following the path of the internet, and the internet is following the path of the web. The web is a network of interconnected web pages, and the internet is a network of interconnected computers. The web is a network of interconnected computers, and the internet is a network of interconnected computers. The web is a network of interconnected computers, and the internet is a network of interconnected computers. The web is a network of interconnected computers, and the internet is a network of interconnected computers. The web is a network'
```

---------

Signed-off-by: Chenguang Li <757486878@qq.com>
2025-02-21 22:45:15 +08:00
whx
386817b4d1 [Model Runner][Performance] Cache the judgment result of is_encoder_decoder to decrease framework overhead (#138)
In Model Runner, is_encoder_decoder is extracted from model_config to
determine whether vllm is running an enc-dec model. Obtaining this
status requires a long call stack, so the CPU overhead is high. This
PR caches the status in __init__ of ModelInputForNPUBuilder.
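
The caching idea can be sketched as below. `ModelConfig` and `InputBuilder` are hypothetical stand-ins rather than the real vLLM classes, and the `lookups` counter only simulates the cost of the long call stack. The approach is valid only if the flag does not change after construction.

```python
class ModelConfig:
    """Hypothetical config whose property lookup is expensive."""

    def __init__(self, is_enc_dec):
        self._is_enc_dec = is_enc_dec
        self.lookups = 0

    @property
    def is_encoder_decoder(self):
        self.lookups += 1  # stands in for the long call stack
        return self._is_enc_dec


class InputBuilder:
    """Hypothetical builder that reads the flag once, in __init__."""

    def __init__(self, model_config):
        # Cache the result so the per-step code never touches model_config.
        self.is_encoder_decoder = model_config.is_encoder_decoder

    def build(self, n_steps):
        # Each step reads a plain attribute instead of re-resolving the flag.
        return [self.is_encoder_decoder for _ in range(n_steps)]
```

After `InputBuilder(cfg).build(100)`, `cfg.lookups` is 1, because the property was read only during construction.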

Signed-off-by: hw_whx <wanghexiang7@huawei.com>
Co-authored-by: hw_whx <wanghexiang7@huawei.com>
2025-02-21 22:43:11 +08:00
Yikun Jiang
d21b3be685 Mark v0.7.1 as unmaintained and v0.7.3 as maintained (#139)
### What this PR does / why we need it?
Mark v0.7.1 as unmaintained and v0.7.3 as maintained:
vLLM released the v0.7.3 version:
https://github.com/vllm-project/vllm/releases/tag/v0.7.3 which includes
several commits:
- https://github.com/vllm-project/vllm/pull/12874
- https://github.com/vllm-project/vllm/pull/12432
- https://github.com/vllm-project/vllm/pull/13208

We should bump the versions to v0.7.3.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-02-21 22:41:44 +08:00
Yikun Jiang
72a43a61d8 [Docs] Add issue template (#113)
### What this PR does / why we need it?
Add issue templates.

Most of the templates in this PR are from vllm-project/vllm:
https://github.com/vllm-project/vllm/tree/main/.github/ISSUE_TEMPLATE

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Test on my local repo by setting default branch to ISSUE_TEMPLATE:
https://github.com/Yikun/vllm-ascend/issues

https://github.com/Yikun/vllm-ascend/issues/new/choose

Closes: https://github.com/vllm-project/vllm-ascend/issues/48

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-02-21 17:20:21 +08:00
Mengqing Cao
dd425d68f8 [Platform] add dispatch key (#17)
### What this PR does / why we need it?
Add a dispatch key for NPU so that the log is printed correctly.

Now
```
executor_base.py:110] # CPU blocks: 220478, # CPU blocks: 21845
```

After this PR
```
executor_base.py:110] # NPU blocks: 220478, # CPU blocks: 21845
```

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed and log printed as above

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-21 17:10:30 +08:00
wangxiyuan
5f465010de [Core] Cherry pick from 0.7.1 to keep the main code newest (#127)
Cherry-pick from 0.7.1 to keep the main code up to date

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-02-21 17:07:37 +08:00
Mengqing Cao
36991b2052 [CI] enable CI on all branch (#124)
Enable CI on all branches.
Install torch-npu-2.5.1.dev20250218 so that we can enable CI on all
branches and prepare for merging 0.7.1-dev into main.

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-21 16:16:48 +08:00