---
frameworks:
- Pytorch
license: Apache License 2.0
tasks:
- text-generation

#model-type:
## e.g. gpt, phi, llama, chatglm, baichuan
#- gpt

#domain:
## e.g. nlp, cv, audio, multi-modal
#- nlp

#language:
## List of language codes: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
#- cn

#metrics:
## e.g. CIDEr, BLEU, ROUGE
#- CIDEr

#tags:
## Custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
#- pretrained

#tools:
## e.g. vllm, fastchat, llamacpp, AdaSeq
#- vllm
language:
  - zh
  - en
base_model:
  - OpenBMB/MiniCPM4-8B
base_model_relation: quantized
---


### Introduction

1. This model was converted to GGUF from `https://www.modelscope.cn/models/OpenBMB/MiniCPM4-8B`.
2. The open-source license follows that of `MiniCPM4-8B`.

### Model Download

#### SDK Download

```bash
# Install ModelScope
pip install modelscope
```
```python
# Download the model with the SDK
from modelscope import snapshot_download
model_dir = snapshot_download('xiaowangge/minicpm4-8b-gguf')
```
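
After the download finishes, the GGUF files have to be located inside `model_dir` before they can be passed to `llama-cli`. The following is a minimal sketch, assuming the snapshot contains one or more `*.gguf` files somewhere under the returned directory. The helper name `find_gguf_files` is hypothetical and not part of the ModelScope SDK.

```python
from pathlib import Path


def find_gguf_files(model_dir):
    """Return all *.gguf files under model_dir, sorted by path."""
    root = Path(model_dir)
    # rglob searches recursively, in case the files sit in a subfolder
    return sorted(p for p in root.rglob("*.gguf") if p.is_file())


# Example usage (model_dir is the value returned by snapshot_download):
# for path in find_gguf_files(model_dir):
#     print(path, f"{path.stat().st_size / 1024**3:.2f} GiB")
```

Printing the file sizes is a quick way to check that a large download was not truncated.
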
#### Git Download

```bash
# Clone the model repository with Git
git clone https://www.modelscope.cn/xiaowangge/minicpm4-8b-gguf.git
```

### Quick Start

> Build `llama.cpp` from source or download a prebuilt release, then run inference with `llama-cli`.

#### Building llama-cli from Source

```bash
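# Clone the llama.cpp source (default branch)
git clone https://github.com/ggml-org/llama.cpp
# Enter the source directory
cd llama.cpp
# Configure the build: enable CUDA acceleration, disable CURL
cmake -B build -DGGML_CUDA=ON -DLLAMA_CURL=OFF
# Build in parallel with 10 jobs
cmake --build build --config Release -j 10
# Verify the build by printing the help text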
build/bin/llama-cli -h
```

#### Inference with llama-cli

```bash
# GPU-accelerated inference: -ngl 128 offloads up to 128 layers to the GPU
build/bin/llama-cli -m ./minicpm4-8b-fp16.gguf -c 1024 -ngl 128 -n 512 -p "Introduce yourself"
```
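
The same invocation can be driven from a script. The following is a minimal sketch that assembles the argument list used above (`-m`, `-c`, `-ngl`, `-n`, `-p`) and runs it with `subprocess`. The function names `build_llama_cli_command` and `run_llama_cli` and their defaults are illustrative assumptions, and the binary path assumes the source build from the previous section.

```python
import subprocess


def build_llama_cli_command(model_path, prompt, ctx_size=1024,
                            gpu_layers=128, max_tokens=512,
                            binary="build/bin/llama-cli"):
    """Assemble the llama-cli argument list shown above."""
    return [
        binary,
        "-m", str(model_path),     # GGUF model file
        "-c", str(ctx_size),       # context size in tokens
        "-ngl", str(gpu_layers),   # number of layers to offload to the GPU
        "-n", str(max_tokens),     # maximum number of tokens to generate
        "-p", prompt,              # prompt text
    ]


def run_llama_cli(model_path, prompt, **kwargs):
    """Run llama-cli; raises CalledProcessError on a non-zero exit code."""
    cmd = build_llama_cli_command(model_path, prompt, **kwargs)
    subprocess.run(cmd, check=True)


# Example usage:
# run_llama_cli("./minicpm4-8b-fp16.gguf", "Introduce yourself")
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains spaces or quotes. Setting `gpu_layers=0` keeps all layers on the CPU.
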

#### Inference with ollama

> To skip compiling and configuring `llama.cpp`, use `ollama` (version >= 0.9.2) for quick inference.

```bash
ollama run xiaowangge/minicpm4
```
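
Once the model has been pulled, it can also be queried programmatically. The following is a minimal sketch, assuming a local Ollama server is listening on its default address `http://localhost:11434` and exposing the `/api/generate` endpoint. The helper names `build_generate_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="xiaowangge/minicpm4"):
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="xiaowangge/minicpm4", timeout=300):
    """Send the prompt and return the generated text."""
    req = build_generate_request(prompt, model)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Example usage (requires a running Ollama server):
# print(generate("Introduce yourself"))
```

With `"stream": False` the server replies with a single JSON object rather than a sequence of chunks, which keeps the client code short.
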