Commit Graph

36 Commits

Author SHA1 Message Date
Mengqing Cao
96fa7ff63b [DP][V1] Fix rank set in DP scenario & Bump torch-npu version to 2.5.1.post1.dev20250528 (#1235)
### What this PR does / why we need it?
1. Fix rank set in DP scenario. The new poc version of torch-npu support
setting `ASCEND_RT_VISIBLE_DEVICES` dynamically, thus we could use the
rank set in `DPEngineCoreProc` directly instead of calculating local
rank across dp by hand in the patched `_init_data_parallel`

Closes: https://github.com/vllm-project/vllm-ascend/issues/1170

2. Bump torch-npu version to 2.5.1.post1.dev20250528

Closes: https://github.com/vllm-project/vllm-ascend/pull/1242
Closes: https://github.com/vllm-project/vllm-ascend/issues/1232


### How was this patch tested?
CI passed with new added test.

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
Co-authored-by: Icey <1790571317@qq.com>
2025-06-16 23:09:53 +08:00
wangxiyuan
4f5964420e [CI] Upgrade vllm to 0.9.1 (#1165)
1. upgrade vllm to 0.9.1. 0.9.0 is not supported for main branch now.
keep doc to 0.9.0 until we release the first 0.9.1 release.
2. disable V0 test for PR
3. move actionlint check to lint job

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-06-11 16:33:11 +08:00
wangxiyuan
95414bae70 [CI] Run e2e after pre check pass (#1132)
Make sure the lint test passed before start the e2e test to save compute
resource.

Updated the patch doc to make sure the CI works as expect.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-06-10 17:18:09 +08:00
Shanshan Shen
d2f87ed9cc [Patch] Remove spec_decode.metrics patch (#1016)
### What this PR does / why we need it?
Remove `spec_decode.metrics` patch as this has been resolved in
https://github.com/vllm-project/vllm/pull/16983 (include in vllm
`v0.9.0`).

Returns a CUDA event recording when the copy is complete **--after
modified-->** Returns a device event (NPU Event for vllm-ascend)
recording when the copy is complete.

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-06-09 15:05:11 +08:00
yiz-liu
6003afa6d2 [BugFix] Fix data parallel (#940)
### What this PR does / why we need it?
With this PR, we can migrate to the native `data_parallel.py` in vllm
examples and remove the version in vllm-ascend.

At present, `ASCEND_RT_VISIBLE_DEVICES` introduces considerable
difficulties; therefore, we must employ a temporary workaround and
manually specify the device.

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
2025-06-09 14:08:18 +08:00
linfeng-yuan
b80a484864 Fix typo of VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE (#1112)
### What this PR does / why we need it?
Fix typo of VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

CI passed

Signed-off-by: linfeng-yuan <1102311262@qq.com>
2025-06-07 19:45:33 +08:00
Yikun Jiang
8d00775fce [SpecDecode][CI] Set default values to fix spec decode and fix multicard CI (#1109)
### What this PR does / why we need it?
- Set default values to fix spec decode
- To avoid oom, we need to run the test in a single process

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- CI passed, espcecially multicards CI
- For spec decode test, long term CI passed

Closes: https://github.com/vllm-project/vllm-ascend/pull/1105

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>
Co-authored-by: mengwei805 <mengwei25@huawei.com>
2025-06-07 11:23:30 +08:00
sherie
908a851a77 optimize the funtion of computing topk and topp in sampler. (#970)
### What this PR does / why we need it?
Optimize the performance of calculation logic in sampler and deepseekv2.

### Does this PR introduce _any_ user-facing change?
Added VLLM_ENABLE_TOPK_OPTIMZE config in sampler

### How was this patch tested?
pytest test_sampler.py

Signed-off-by: wangxiaoxin (A) <wangxiaoxin7@huawei.com>
Co-authored-by: wangxiaoxin (A) <wangxiaoxin7@huawei.com>
Co-authored-by: ZhengWG <zwg0606@gmail.com>
2025-06-05 16:42:18 +08:00
NINGBENZHE
6ec64a3f96 [bugfix] some bugs maybe fail to run (#896)
### What this PR does / why we need it?
Solve the bug that the graph mode is the same as p and d, and some other
bugs.
### Does this PR introduce _any_ user-facing change?
Wouldn't be
### How was this patch tested?
Follow the end-to-end test

Signed-off-by: ningbenzhe1 <ningbenzhe@huawei.com>
2025-06-03 11:07:33 +08:00
wangxiyuan
f6e5decc10 [CI] upgrade to vllm 0.9.0 (#959)
Upgrade to vllm 0.9.0.
0.8.5 will not be supported any more.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-05-28 21:18:41 +08:00
jiangpeng
df58fb80ee Spec decode support for V1 Engine (#874)
<!--  Thanks for sending a pull request!

BEFORE SUBMITTING, PLEASE READ
https://docs.vllm.ai/en/latest/contributing/overview.html

-->
### What this PR does / why we need it?
<!--
- Please clarify what changes you are proposing. The purpose of this
section is to outline the changes and how this PR fixes the issue.
If possible, please consider writing useful notes for better and faster
reviews in your PR.

- Please clarify why the changes are needed. For instance, the use case
and bug description.

- Fixes #
-->
Make spec decode support for V1 Engine
- Currently, Ascend does not support the triton kernel. PyTorch is used
to rewrite the `rejection_sampler.py` triton kernel. However, PyTorch is
not as good as Triton. Therefore, ascend c is used to implement the
function in the future.
- Currently, spec decode supports only the ngram algorithm. The eagle
algorithm needs to be further adapted.
### Does this PR introduce _any_ user-facing change?
<!--
Note that it means *any* user-facing change including all aspects such
as API, interface or other behavior changes.
Documentation-only updates are not considered user-facing changes.
-->
Not change user facing.

### How was this patch tested?
<!--
CI passed with new added/existing test.
If it was tested in a way different from regular unit tests, please
clarify how you tested step by step, ideally copy and paste-able, so
that other reviewers can test and check, and descendants can verify in
the future.
If tests were not added, please describe why they were not added and/or
why it was difficult to add.
-->
test by `tests/singlecard/spec_decode/e2e/test_v1_spec_decode.py` and
`tests/sample/test_rejection_sampler.py`, test base function of
rejection sampler and e2e function of spec decode.

Signed-off-by: ponix-j <657511300@qq.com>
2025-05-23 14:25:46 +08:00
Angazenn
1e67089bc9 [BugFix]add all2all when dp_size > 1 && downgrade npu_dequant_swiglu_quant (#819)
<!--  Thanks for sending a pull request!

BEFORE SUBMITTING, PLEASE READ
https://docs.vllm.ai/en/latest/contributing/overview.html

-->
### What this PR does / why we need it?
1. This PR introduces native `all_to_all` communication operator to fix
`allgather` bugs when dp_size > 1. Besides, it adds a naive
implementation of force-load-balance when doing profile runs.
2. The operator `npu_dequant_swiglu_quant` only supports input
hidden_states with dtype `torch.int32`. This tensor occupies space of
`global_bs * seq_len * topk * hidden_size`, which might be very large as
`ep_size` grows. Therefore we need to disable this operator and use
original `swiglu` && `quantize`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By performing offline inference:

![image](https://github.com/user-attachments/assets/e003d5dc-0753-41ae-9303-e87f73ac6828)

---------

Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
2025-05-15 09:19:55 +08:00
wangxiyuan
857f489cbf [CI] Patch torch.library.infer_schema for torch 2.5 backward compatibility (#837)
Patch torch.library.infer_schema for torch 2.5 backward compatibility

- Introduced a new module `patch_utils` under
`vllm_ascend/patch/worker/patch_common/`.
- Added a function `ascend_direct_register_custom_op` to handle custom
operator registration with backward compatibility for PyTorch < 2.7
(such as torch 2.5.1).
- Implemented type conversion logic for annotations to ensure
compatibility across different PyTorch versions.
- Registered the function `ascend_direct_register_custom_op` to
`utils.direct_register_custom_op`.

- Updated `__init__.py` to include `patch_utils` as the first import.
- Ensured `patch_utils` is available for use in other patch files and
skipped isort checks for `patch_utils` import.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-05-14 09:20:55 +08:00
yiz-liu
5305a2ccf9 [Bugfix] Tweak distributed process group initialization and add dummy… (#816)
fix batch execution method to enable DP in V1

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
2025-05-12 17:31:29 +08:00
Yikun Jiang
afe1767c17 [Core] Cleanup triton patch which has been fixed in vllm (#764)
### What this PR does / why we need it?
- Revert "Re-patch TritonPlaceholder on main to make CI happy (#753)"
because upstream main CI already merged:
https://github.com/vllm-project/vllm/pull/17446
- Keep 0.8.5.post1 compatible

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-05-06 18:52:15 +08:00
Yikun Jiang
d7e1110c8e Re-patch TritonPlaceholder on main to make CI happy (#753)
### What this PR does / why we need it?
Re-patch TritonPlaceholder on main to make CI happy
- Add triton patch back until
https://github.com/vllm-project/vllm/pull/17446 resolved
- Move patch_main before patch_common to resolve minicpm triton import
issue
- Add `0.8.5` and `0.8.5.post1` to make patch work on 0.8.5 all versions

Related:
- https://github.com/vllm-project/vllm-ascend/pull/704
- https://github.com/vllm-project/vllm-ascend/pull/690

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
All CI passed include main

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-05-05 23:22:24 +08:00
wangxiyuan
f8350569e6 [CI] upgrade vllm to 0.8.5 (#715)
1. Upgrade vllm to 0.8.5
2. Drop 0.8.4 support
3. Keep doc to 0.8.4rc2 until we release 0.8.5

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-30 09:15:50 +08:00
wangxiyuan
95e7aa4736 [Platform] format platform to make it more clear (#610)
Platform should only contain the function that based from vllm. This PR
move the unrelated function to the right place to make platform more
clear.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-30 09:03:10 +08:00
wangxiyuan
b917361ca5 [MISC] Clean up torch_npu (#688)
torch_npu 2.5.1 support autoload now. This patch does:
1. remove useless torch_npu import
2. replace `torch_npu.npu` to `torch.npu`.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-29 18:03:38 +08:00
wemaster
54c0e63df7 [MTP] follow custom deepseek modeling changes to support graph mode (#636)
<!--  Thanks for sending a pull request!

BEFORE SUBMITTING, PLEASE READ
https://docs.vllm.ai/en/latest/contributing/overview.html

-->
### What this PR does / why we need it?

As custom deepseek modeling do some changes to support graph mode in
https://github.com/vllm-project/vllm-ascend/pull/585, so i follow it to
change custom deepseek_mtp modeling.

And some modifications for k>1 were not carried over by the
https://github.com/vllm-project/vllm-ascend/pull/429, now i add it.

In order to better take care of the MTP feature in the vllm-ascend
repository, I added cases related to graph mode(torchair), but i skip it
since torchair can not correctly clean up memory in vllmrunner.

Also i add some case for MTP quantization weights, but test weight is
not ready, so i skip it and i will open it when test quant weights is
ready.

https://github.com/vllm-project/vllm-ascend/pull/648 did not completely
fix the sample
change(https://github.com/vllm-project/vllm-ascend/issues/660) issue, I
added the relevant changes.

### Does this PR introduce _any_ user-facing change?
now, u can use following method to use mtp in deepseek v3/r1 float or
quant weights with eager mode.
```python
llm = LLM(
    model="wemaster/deepseek_mtp_main_random_bf16",
    tensor_parallel_size=2,
    speculative_config={
        "num_speculative_tokens": 1,
    },
    enforce_eager=True,
    trust_remote_code=True,
    disable_log_stats=False,
    gpu_memory_utilization=0.8,
    max_model_len=64,
)
```

or use mtp in deepseek v3/r1 float or quant weights with graph
mode(torchair)
```python
llm = LLM(
    model="wemaster/deepseek_mtp_main_random_bf16",
    tensor_parallel_size=2,
    speculative_config={
        "num_speculative_tokens": 1,
    },
    trust_remote_code=True,
    additional_config={
        'enable_graph_mode': True,
    },
    disable_log_stats=False,
    gpu_memory_utilization=0.8,
    max_model_len=64,
)
```

add notes:
1. now, we support k>1, so u can set num_speculative_tokens > 1 if there
is sufficient redundant computing power;
2. MTP is not supported in V1, we will support it when vLLM does it in
https://github.com/vllm-project/vllm/issues/13500.
3. if u run MTP failed by `segmentation fault`, u can follow v0.7.3
patch https://github.com/vllm-project/vllm-ascend/pull/236 file
`vllm_ascend/patch/patch_metrics.py` method
`__npu_async_metrics_collector_init__`

### How was this patch tested?
local tested passed and test by CI

Signed-off-by: mengwei805 <mengwei25@huawei.com>
2025-04-28 21:18:53 +08:00
Mengqing Cao
be9e3e8545 [Bugfix] Fix triton placeholder patch period (#704)
Fix triton placeholder patch period

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-04-28 18:52:03 +08:00
Mengqing Cao
ba3d8aae94 [Model][MiniCPM] support MiniCPM (#645)
### What this PR does / why we need it?
This pr support minicpm in branch main. see
https://github.com/vllm-project/vllm-ascend/pull/164


### How was this patch tested?
test locally with minicpm

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-04-27 11:27:24 +08:00
Pleaplusone
e74331a1ed Add dp initialize patch with hccl backend (#626)
<!--  Thanks for sending a pull request!

BEFORE SUBMITTING, PLEASE READ
https://docs.vllm.ai/en/latest/contributing/overview.html

-->
### What this PR does / why we need it?
<!--
- Please clarify what changes you are proposing. The purpose of this
section is to outline the changes and how this PR fixes the issue.
If possible, please consider writing useful notes for better and faster
reviews in your PR.

- Please clarify why the changes are needed. For instance, the use case
and bug description.

- Fixes #
-->
Add dp stateless process group initialization path with hccl backend as
vllm-ascend patch.
### Does this PR introduce _any_ user-facing change?
<!--
Note that it means *any* user-facing change including all aspects such
as API, interface or other behavior changes.
Documentation-only updates are not considered user-facing changes.
-->

### How was this patch tested?
<!--
CI passed with new added/existing test.
If it was tested in a way different from regular unit tests, please
clarify how you tested step by step, ideally copy and paste-able, so
that other reviewers can test and check, and descendants can verify in
the future.
If tests were not added, please describe why they were not added and/or
why it was difficult to add.
-->

---------

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
2025-04-23 15:47:51 +08:00
Shanshan Shen
4a0ce3660e [Misc] Remove some parts of metrics patch (#603)
### What this PR does / why we need it?
Remove some parts of metrics patch, since the `cuda` hard code has been
fixed by https://github.com/vllm-project/vllm/pull/14411.

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-04-22 18:45:21 +08:00
wangxiyuan
538a69c145 [Patch] format patch module to make it more clear (#601)
Format patch module to make it more clear. 
Add the patch doc description, the new patch must follow this guide.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-22 14:13:00 +08:00
Pleaplusone
d12a057df8 Add note for deepseek related docs and remove unnecessary comments (#590)
### What this PR does / why we need it?
Add notes for deepseek's patch and remove some of the unnecessary
comments

---------

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
2025-04-22 09:59:09 +08:00
wemaster
0ae9ee0f8a [BUGFIX] main-sd-bugfix && [UT] add mtp UT (#593)
### What this PR does / why we need it?
The pr will fix some bug about spec decode / MTP
The pr add a mtp e2e UT `test_mtp_correctness.py`

**vllm_ascend/attention/attention.py**
1. add support `self.attn_mask_cache` only has 1 element to cover scene
in which both spec docode and chunked prefill are enabled.

**vllm_ascend/distributed/parallel_state.py**
1. remove 2 assert because spec decode worker would use init_worker
twice

**vllm_ascend/models/deepseek_mtp.py**
1. remove unused params;
2. add support w8a8 in `CustomDeepSeekMTP`

**vllm_ascend/quantization/quant_config.py**
1. use `AscendUnquantizedFusedMoEMethod` instead of
`UnquantizedFusedMoEMethod`

**other**
1. replace `from vllm.logger import init_logger` to `from vllm.logger
import logger` all of the vllm-ascend project



### Does this PR introduce _any_ user-facing change?


### How was this patch tested?

Signed-off-by: mengwei805 <mengwei25@huawei.com>
2025-04-21 19:25:51 +08:00
Shuqiao Li
5442b463fd add doc for patch_config (#574)
### What this PR does / why we need it?
add doc for patch_config
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
No code changed.

Signed-off-by: Shuqiao Li <celestialli@outlook.com>
2025-04-21 10:33:38 +08:00
Pleaplusone
1a1f9a6d89 port deepseekv2 and mtp to main branch (#429)
### What this PR does / why we need it?
This PR ports all the deepseek graph mode code and mtp code from v0.7.3
to the main branch
---------

Signed-off-by: SidaoY <1024863041@qq.com>
Signed-off-by: linfeng-yuan <1102311262@qq.com>
Signed-off-by: Yizhou Liu <liuyizhou5@h-partners.com>
Signed-off-by: mengwei805 <mengwei25@huawei.com>
Signed-off-by: libaokui <libaokui@huawei.com>
Signed-off-by: q00832892 <qiaoyang19@huawei.com>
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Co-authored-by: SidaoY <1024863041@qq.com>
Co-authored-by: linfeng-yuan <1102311262@qq.com>
Co-authored-by: Yizhou Liu <liuyizhou5@h-partners.com>
Co-authored-by: mengwei805 <mengwei25@huawei.com>
Co-authored-by: libaokui <libaokui@huawei.com>
2025-04-19 17:38:18 +08:00
Shuqiao Li
84563fc65d Add sleep mode feature for Ascend NPU (#513)
### What this PR does / why we need it?
This PR adds sleep mode feature for vllm-ascend, when sleeps, we do
mainly two things:

- offload model weights
- discard kv cache

RLHF tools(such as https://github.com/volcengine/verl and
https://github.com/OpenRLHF/OpenRLHF) have a strong need of sleep mode
to accelerate the training process.

This PR may solve #375 and #320 .

### Does this PR introduce _any_ user-facing change?
No existing user interfaces changed.
Users will have two new methods(`sleep()` and `wake_up()`) to use.

### How was this patch tested?
This PR is tested with Qwen/Qwen2.5-0.5B-Instruct.

At first, we have free NPU memory M1.

After `llm = LLM("Qwen/Qwen2.5-0.5B-Instruct", enable_sleep_mode=True)`
executed, we have free NPU memory M2. M2 < M1.

Then we call `llm.sleep(level=1)`, we have free NPU memory M3.

We have M3 > M2, M3 is very close to M1.

Plus, we have the same output tokens before sleep and after wake up,
with the config of `SamplingParams(temperature=0, max_tokens=10)` and
with the same input tokens of course.


This PR is utilizing the CMake procedure of #371 , thanks a lot.

Signed-off-by: Shuqiao Li <celestialli@outlook.com>
2025-04-18 13:11:39 +08:00
wangxiyuan
42c7fbb10e [Misc] Fix import error and address nits to make CI happy (#563)
1. Add `vllm_version_is` function to check vllm version.
2. `ensure_kv_transfer_initialized` and `get_kv_transfer_group ` have
been moved to other place in vllm main branch via
3408e47159
, this patch fix the import error.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-18 12:23:32 +08:00
Mengqing Cao
6ee7f5cf71 [SpecDecode] Add spec decode support (#500)
### What this PR does / why we need it?
Backport: https://github.com/vllm-project/vllm-ascend/pull/252
This support speculative decoding in Ascend, including speculating with
a draft model、by matching n-grams in the prompt、using MLP speculators
and using EAGLE based draft models.

Backport: https://github.com/vllm-project/vllm-ascend/pull/423
spec decode MultiStepWorker support TP1DraftModelRunner fully, support
run the draft_model_runner with multi-step prepare on the NPU directly
and support draft_model_runner use MLA.

1. before this pr, `MultiStepWorker` would not step into the branch
using NPU prepare, but only into the branch using CPU prepare (`line 52`
of `vllm_ascend/patch/patch_multi_step_worker.py`). Although this has
`no effect` on the `correct operation` of speculative decoding and the
performance of the two branches is basically the same as of the current
version, I support entering this branch in this PR. In general, there
are two main changes in `patch_multi_step_worker.py`: first, the
`is_cuda_like()` check is removed and the `TP1DraftModelRunner`
rewritten in vllm_ascend is used; second, the
`supports_gpu_multi_step()` function is made to return true on NPU
devices when outer Multi_step_worker could work correct.

3. before this pr, `TP1DraftModelRunner` only supports Attention on NPU,
but not MLA. The relevant adaptation is in
`vllm_ascend/worker/draft_model_runner.py`. Although I don’t know why
the `input_positions` of `model_input.attn_metadata` in vllm-ascend
needs to be added in `execute_model`, it is done in `model_runner.py`,
so I also made corresponding changes. Otherwise, when atten_backend is
MLA, it will prompt that input_positions cannot be found.

4. I commented out two lines in `draft_model_runner.py` in `line118` to
support the scenario of K>1.
  ```
  # lora_mapping=model_input.lora_mapping,
  # lora_requests=model_input.lora_requests,
  ```
I added comments. In the future, when vllm-ascend supports lora feature,
the changes here can be restored.

TODO:
- [ ] revert the patch when the related issues are addressed in vllm

### How was this patch tested?
CI passed with new added test.
- e2e test for medusa proposer:
tests/singlecard/spec_decode/e2e/test_medusa_correctness.py
- e2e test for mlp proposer:
tests/singlecard/spec_decode/e2e/test_mlp_correctness.py
- e2e test for n-gram proposer:
tests/singlecard/spec_decode/e2e/test_ngram_correctness.py

Tests for patched files:
- tests/singlecard/spec_decode/test_dynamic_spec_decode.py
- tests/singlecard/spec_decode/test_multi_step_worker.py
- tests/singlecard/spec_decode/test_ngram_worker.py
- tests/singlecard/spec_decode/test_spec_decode_worker.py

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Co-authored-by: mengwei805 <mengwei25@huawei.com>
2025-04-17 20:16:32 +08:00
wangxiyuan
bbe7ccd366 [MISC] Add patch module (#526)
This PR added patch module for vllm
1. platform patch: the patch will be registered when load the platform
2. worker patch: the patch will be registered when worker is started.

The detail is:
1. patch_common: patch for main and 0.8.4 version
4. patch_main: patch for main verison
5. patch_0_8_4: patch for 0.8.4 version
2025-04-16 09:28:58 +08:00
Mengqing Cao
4544e99d88 [dist] revert communicator patch (#66)
### What this PR does / why we need it?
Revert communicator patch as
https://github.com/vllm-project/vllm/pull/13208 has been merged.

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
test locally by
https://github.com/vllm-project/vllm-ascend/pull/30#issuecomment-2650251266

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-17 11:42:33 +08:00
Mengqing Cao
b88443b6c6 [dist] fix communicator patch (#58)
### What this PR does / why we need it?
fix communicator patch so parallel could work.
see #52

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-14 10:45:49 +08:00
wangxiyuan
f762ee89cc [Communicator] Add monkey patch (#30)
Some PR for plugin support is not merged by vllm yet. This PR add monkey
patch to vllm-ascend to make vllm-ascend work with vllm directly.

This patch code should be removed once the related function is supported
by vllm originally.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-02-11 19:15:35 +08:00