Icey
dd56e9306b
[3/N][Refactor][Qwen3-Next] Refactor model structure and fix bug from vllm #25400 ( #3142 )
...
### What this PR does / why we need it?
Refactor the model structure in qwen3_next.py to reduce the line count.
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
```
from vllm import LLM, SamplingParams


def main():
    prompts = [
        "The future of AI is",
    ]
    # Create a sampling params object.
    sampling_params = SamplingParams(max_tokens=100, temperature=0.6, top_k=40, top_p=0.95)
    # Create an LLM.
    llm = LLM(
        model="Qwen/Qwen3-Next-80B-A3B-Instruct",
        tensor_parallel_size=4,
        enforce_eager=True,
        trust_remote_code=True,
        max_model_len=256,
        gpu_memory_utilization=0.7,
        block_size=64,
    )
    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()
```
- vLLM version: v0.10.2
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.0
---------
Signed-off-by: Icey <1790571317@qq.com>
2025-09-28 21:14:36 +08:00
Mengqing Cao
2d885869c5
[KVCache][Bugfix] Fix kv cache initialization error of attention layer ( #3113 )
...
### What this PR does / why we need it?
Fixes #3096
1. Fix the kv cache initialization error of the attention layer. Some
models have layer names like `attn.attn` instead of `self_attn`, but the
initialization of kv cache tensors only checked for `self_attn` and
`attn.attn`, leading to the error `AssertionError: Some layers are
not correctly initialized`.
2. Set a default value for the input arg `sampling_metadata` in
`compute_logits` for the modeling files in vllm-ascend, fixing the
error `Qwen3NextForCausalLM.compute_logits() missing 1 required
positional argument: 'sampling_metadata'`.
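The layer-name check behind fix 1 can be sketched as follows (a minimal illustration; the helper, pattern list, and layer names are hypothetical, not the actual vllm-ascend code):

```
# Toy sketch of matching attention layers by name substring. If the
# pattern list is too narrow, some attention layers never get a kv cache
# tensor and the "Some layers are not correctly initialized" check fires.
def init_kv_caches(layer_names, patterns=("self_attn", "attn.attn")):
    kv_caches = {name: "tensor" for name in layer_names
                 if any(p in name for p in patterns)}
    assert len(kv_caches) == len(layer_names), \
        "Some layers are not correctly initialized"
    return kv_caches


layers = ["model.layers.0.self_attn", "model.layers.1.attn.attn"]
caches = init_kv_caches(layers)  # both layers matched, no assertion
```

With a pattern list that only knows `self_attn`, the `attn.attn` layer would be skipped and the assertion would trigger, which is the failure mode this PR addresses.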
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
Tested locally with internlm.
- vLLM version: v0.10.2
- vLLM main:
5aeb925452
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
2025-09-24 11:32:34 +08:00
Icey
e7618d9414
[2/N][Refactor][Qwen3-Next] Remove redundant methods and patched methods in Qwen3NextGatedDeltaNet ( #3082 )
...
### What this PR does / why we need it?
Remove redundant methods and patched methods in Qwen3NextGatedDeltaNet,
involving causal_conv1d_fn, causal_conv1d_update_npu, fused_gdn_gating,
fused_recurrent_gated_delta_rule, torch_chunk_gated_delta_rule, and
RMSNormGated.
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
```
from vllm import LLM, SamplingParams


def main():
    prompts = [
        "The future of AI is",
    ]
    # Create a sampling params object.
    sampling_params = SamplingParams(max_tokens=100, temperature=0.6, top_k=40, top_p=0.95)
    # Create an LLM.
    llm = LLM(
        model="Qwen/Qwen3-Next-80B-A3B-Instruct",
        tensor_parallel_size=4,
        enforce_eager=True,
        trust_remote_code=True,
        max_model_len=256,
        gpu_memory_utilization=0.7,
        block_size=64,
    )
    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()
```
CI passed with new added/existing test.
- vLLM version: v0.10.2
- vLLM main:
5aeb925452
---------
Signed-off-by: Icey <1790571317@qq.com>
2025-09-24 11:25:42 +08:00
Li Wang
02f89d166f
[CI] Update vllm version to 20250922(5aeb925) ( #3091 )
...
### What this PR does / why we need it?
This PR bumps the vllm commit hash to
5aeb925452
and fixes the following issues:
1. https://github.com/vllm-project/vllm/pull/25345 has removed v0
metadata.
2. https://github.com/vllm-project/vllm/pull/25332
3. https://github.com/vllm-project/vllm/pull/25334
4. https://github.com/vllm-project/vllm/pull/23558 ; note that this vllm
commit updates the model register logic to check that every registered
model lives under the `vllm.model_executor.models` path, which breaks our
custom registration of the deepseek_v3 model (it doesn't exist in the
vllm model path). So I moved the deepseek_v3 model registration into
deepseek_v2 as a temporary solution.
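The registration constraint described in item 4 can be sketched with a toy registry (the registry and helper here are illustrative, not vllm's actual ModelRegistry API):

```
# Toy sketch: the updated registration logic rejects any model whose
# module does not resolve under vllm.model_executor.models.
REGISTRY = {}


def register_model(arch, module_path):
    if not module_path.startswith("vllm.model_executor.models"):
        raise ValueError(f"{arch} is not under vllm.model_executor.models")
    REGISTRY[arch] = module_path


# deepseek_v3 has no module of its own in the vllm model path, so it is
# temporarily registered via the deepseek_v2 module instead:
register_model("DeepseekV3ForCausalLM",
               "vllm.model_executor.models.deepseek_v2")
```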
### How was this patch tested?
- vLLM version: v0.10.2
- vLLM main:
9607d5eb44
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
2025-09-22 22:18:13 +08:00
Icey
14b39d3c70
[1/N][Refactor][Qwen3-Next] Remove redundant Qwen3NextSparseMoeBlock and Qwen3NextAttention ( #3019 )
...
### What this PR does / why we need it?
Remove the redundant Qwen3NextSparseMoeBlock and Qwen3NextAttention.
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
```
from vllm import LLM, SamplingParams


def main():
    prompts = [
        "The future of AI is",
    ]
    # Create a sampling params object.
    sampling_params = SamplingParams(max_tokens=100, temperature=0.6, top_k=40, top_p=0.95)
    # Create an LLM.
    llm = LLM(
        # model="/root/.cache/modelscope/hub/models/Qwen/Qwen3-30B-A3B",
        model="Qwen/Qwen3-Next-80B-A3B-Instruct",
        tensor_parallel_size=4,
        enforce_eager=True,
        trust_remote_code=True,
        max_model_len=256,
        gpu_memory_utilization=0.7,
        block_size=64,
    )
    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()
```
- vLLM version: v0.10.2
- vLLM main:
9d1c50a5ac
---------
Signed-off-by: Icey <1790571317@qq.com>
2025-09-22 11:24:08 +08:00
22dimensions
0942d9aaab
[3/N][Refactor][Quantization]remove packed_modules_mapping from models ( #3021 )
...
### What this PR does / why we need it?
Some custom models in vllm-ascend define packed_modules_mapping, which
prevents keeping the same model classes as the vllm community. So move
these custom packed_modules_mapping entries into the quant utils.py.
After this PR, some custom models can be removed.
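For context, a packed-modules mapping pairs each fused weight with the per-projection weights it packs; hoisting it into a shared quant utility could look roughly like this (the function name and per-model dispatch are assumptions, not the actual utils.py):

```
# Shared table in a quant utility instead of per-model class attributes,
# so the model classes can stay identical to upstream vllm.
PACKED_MODULES_MAPPING = {
    "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    "gate_up_proj": ["gate_proj", "up_proj"],
}


def get_packed_modules_mapping(model_type):
    # A real implementation might dispatch per model type; this sketch
    # returns one common mapping for all models.
    return PACKED_MODULES_MAPPING
```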
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
tested by CI
- vLLM version: v0.10.2
- vLLM main:
5089fd749c
Signed-off-by: 22dimensions <waitingwind@foxmail.com>
2025-09-19 20:50:14 +08:00
wangxiyuan
c556038ef0
[New model] Qwen3-next support ( #2917 )
...
### What this PR does / why we need it?
Add Qwen3-next support.
### Does this PR introduce _any_ user-facing change?
Yes, users can now use Qwen3-Next.
Related doc: https://github.com/vllm-project/vllm-ascend/pull/2916 ; the
tutorial will be available
[here](https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_npu_qwen3_next.html)
### How was this patch tested?
Doc CI passed
Related: https://github.com/vllm-project/vllm-ascend/issues/2884
Co-Authored-By: Angazenn <supperccell@163.com>
Co-Authored-By: zzzzwwjj <1183291235@qq.com>
Co-Authored-By: MengqingCao <cmq0113@163.com>
Co-Authored-By: linfeng-yuan <1102311262@qq.com>
Co-Authored-By: hust17yixuan <303660421@qq.com>
Co-Authored-By: SunnyLee219 <3294305115@qq.com>
Co-Authored-By: maoxx241 <maoxx241@umn.edu>
- vLLM version: v0.10.2
- vLLM main:
b834b4cbf1
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Angazenn <supperccell@163.com>
Signed-off-by: Your Name <you@example.com>
Signed-off-by: zzzzwwjj <1183291235@qq.com>
Signed-off-by: linfeng-yuan <1102311262@qq.com>
Signed-off-by: hust17yixuan <303660421@qq.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: Angazenn <supperccell@163.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: zzzzwwjj <1183291235@qq.com>
Co-authored-by: linfeng-yuan <1102311262@qq.com>
Co-authored-by: hust17yixuan <303660421@qq.com>
2025-09-16 01:17:42 +08:00