[Doc] Update the modelslim website from gitee to gitcode. (#3615)

### What this PR does / why we need it?

Because the ModelSlim code repository has migrated from gitee to
gitcode, all relevant links in the repository have been updated.

[migration
notice](https://gitee.com/ascend/msit/tree/master/.%E6%9C%AC%E9%A1%B9%E7%9B%AE%E5%B7%B2%E7%BB%8F%E6%AD%A3%E5%BC%8F%E8%BF%81%E7%A7%BB%E8%87%B3%20Gitcode%20%E5%B9%B3%E5%8F%B0)
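The bulk of the change is a mechanical URL substitution. A minimal sketch of the rewrite applied across the docs (the helper name is an assumption for illustration, not part of this PR; note the org casing also changes, `ascend` → `Ascend`):

```python
import re

# Map old Gitee URLs to the new GitCode location. A plain host swap is
# not enough: the organization name changes casing ("ascend" -> "Ascend").
OLD = re.compile(r"https://gitee\.com/ascend/msit")
NEW = "https://gitcode.com/Ascend/msit"

def migrate_links(text: str) -> str:
    """Rewrite ModelSlim repository links from gitee to gitcode."""
    return OLD.sub(NEW, text)

doc = "Install [ModelSlim](https://gitee.com/ascend/msit/blob/master/msmodelslim/README.md)"
print(migrate_links(doc))
```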

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: Crazyang <im.crazyang@gmail.com>
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Co-authored-by: weichen <calvin_zhu0210@outlook.com>
Commit f06a6cad1b (parent 292e213dd2)
Author: Crazyang
Date: 2025-10-23 15:38:16 +08:00
Committed by: GitHub
4 changed files with 12 additions and 12 deletions

@@ -6,13 +6,13 @@ Since 0.9.0rc2 version, quantization feature is experimentally supported in vLLM
## Install modelslim
-To quantize a model, users should install [ModelSlim](https://gitee.com/ascend/msit/blob/master/msmodelslim/README.md) which is the Ascend compression and acceleration tool. It is an affinity-based compression tool designed for acceleration, using compression as its core technology and built upon the Ascend platform.
+To quantize a model, users should install [ModelSlim](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/README.md) which is the Ascend compression and acceleration tool. It is an affinity-based compression tool designed for acceleration, using compression as its core technology and built upon the Ascend platform.
Install modelslim:
```bash
# The branch(br_release_MindStudio_8.1.RC2_TR5_20260624) has been verified
-git clone -b br_release_MindStudio_8.1.RC2_TR5_20260624 https://gitee.com/ascend/msit
+git clone -b br_release_MindStudio_8.1.RC2_TR5_20260624 https://gitcode.com/Ascend/msit
cd msit/msmodelslim
```
@@ -29,8 +29,8 @@ This conversion process will require a larger CPU memory, please ensure that the
:::
### Adaptations and changes
-1. Ascend does not support the `flash_attn` library. To run the model, you need to follow the [guide](https://gitee.com/ascend/msit/blob/master/msmodelslim/example/DeepSeek/README.md#deepseek-v3r1) and comment out certain parts of the code in `modeling_deepseek.py` located in the weights folder.
-2. The current version of transformers does not support loading weights in FP8 quantization format. you need to follow the [guide](https://gitee.com/ascend/msit/blob/master/msmodelslim/example/DeepSeek/README.md#deepseek-v3r1) and delete the quantization related fields from `config.json` in the weights folder
+1. Ascend does not support the `flash_attn` library. To run the model, you need to follow the [guide](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/DeepSeek/README.md#deepseek-v3r1) and comment out certain parts of the code in `modeling_deepseek.py` located in the weights folder.
+2. The current version of transformers does not support loading weights in FP8 quantization format. you need to follow the [guide](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/DeepSeek/README.md#deepseek-v3r1) and delete the quantization related fields from `config.json` in the weights folder
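Step 2 above amounts to stripping the FP8 quantization metadata from the checkpoint's `config.json`. A hedged sketch of that edit (the `quantization_config` field name is an assumption; the linked guide is the authoritative list of fields to remove):

```python
import json

def strip_quantization_fields(config: dict) -> dict:
    """Remove quantization-related entries so transformers can load the
    weights without FP8 support. The field name below is illustrative;
    consult the msmodelslim DeepSeek guide for the authoritative list."""
    cleaned = dict(config)
    cleaned.pop("quantization_config", None)  # assumed field name
    return cleaned

# Minimal stand-in for a real DeepSeek config.json
cfg = {"model_type": "deepseek_v3", "quantization_config": {"fmt": "fp8"}}
print(json.dumps(strip_quantization_fields(cfg)))
```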
### Generate the w8a8 weights