Icey
dd56e9306b
[3/N][Refactor][Qwen3-Next] Refactor model structure and fix bug by vllm #25400 (#3142)
### What this PR does / why we need it?
Refactor the model structure in qwen3_next.py to reduce the line count.
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
```
from vllm import LLM, SamplingParams


def main():
    prompts = [
        "The future of AI is",
    ]
    # Create a sampling params object.
    sampling_params = SamplingParams(max_tokens=100,
                                     temperature=0.6,
                                     top_k=40,
                                     top_p=0.95)
    # Create an LLM.
    llm = LLM(
        model="Qwen/Qwen3-Next-80B-A3B-Instruct",
        tensor_parallel_size=4,
        enforce_eager=True,
        trust_remote_code=True,
        max_model_len=256,
        gpu_memory_utilization=0.7,
        block_size=64,
    )
    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()
```
- vLLM version: v0.10.2
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.0
---------
Signed-off-by: Icey <1790571317@qq.com>
2025-09-28 21:14:36 +08:00