Files

ModelHub XC 76bef74fec Initialize project; model provided by the ModelHub XC community

Model: OpenBMB/BitCPM-CANN-1B-unquantized
Source: Original Platform

2026-07-18 09:15:05 +08:00

ds_config_z2.json
ds_config.json
gpu_pretrain_loss.png
gpu_pretrain.csv
gpu_sft_loss.png
gpu_sft.csv
npu_pretrain_loss.png
npu_pretrain.csv
npu_sft_loss.png
npu_sft.csv
README.md
requirements.txt
run_sft.sh
run.sh
train_sft.py
train.py

README.md

BitCPM Training Example

This project provides scripts for continued pretraining (CPT) and supervised fine-tuning (SFT) of BitCPM-CANN-1B-unquantized.

File Description

CPT and SFT each have a pair of scripts (a training script and a launch script), and both share the DeepSpeed configuration files:

File	Description
`train.py`	Continued-pretraining script based on HuggingFace Trainer + DeepSpeed
`run.sh`	Launch script for CPT with hyperparameter configuration
`train_sft.py`	Supervised fine-tuning script based on HuggingFace Trainer + DeepSpeed
`run_sft.sh`	Launch script for SFT with hyperparameter configuration
`ds_config.json`	DeepSpeed ZeRO-3 configuration (with CPU offload)
`ds_config_z2.json`	DeepSpeed ZeRO-2 configuration (used by default)
`requirements.txt`	Python dependency list
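To confirm which ZeRO stage and offload settings a config file actually enables, the standard DeepSpeed keys (`zero_optimization.stage`, `offload_optimizer.device`, `offload_param.device`) can be read back. The contents of the two JSON files are not reproduced in this README, so this is only a sketch that reports whatever they contain; `summarize_zero_config` and `load_and_summarize` are hypothetical helper names. Going by the table, ds_config.json would be expected to report stage 3 with a `cpu` offload device and ds_config_z2.json stage 2.

```python
import json
import os


def summarize_zero_config(cfg):
    """Extract the ZeRO stage and offload devices from a parsed DeepSpeed config dict."""
    zero = cfg.get("zero_optimization", {})
    return {
        # DeepSpeed treats a missing stage as 0 (ZeRO disabled).
        "stage": zero.get("stage", 0),
        # A missing offload section means that state stays on the accelerator.
        "offload_optimizer": zero.get("offload_optimizer", {}).get("device", "none"),
        "offload_param": zero.get("offload_param", {}).get("device", "none"),
    }


def load_and_summarize(path):
    """Read a DeepSpeed JSON config from disk and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize_zero_config(json.load(f))


if __name__ == "__main__":
    # Run from the project root; files that are not present are skipped.
    for name in ("ds_config.json", "ds_config_z2.json"):
        if os.path.exists(name):
            print(name, load_and_summarize(name))
```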

Environment Setup

Docker Image

Use the following Huawei NPU image:

swr.cn-south-1.myhuaweicloud.com/ascendhub/mindspeed-llm:openeuler22.03-mindspeed-llm-2.3.0-a3-arm

Other Huawei NPU images may also work but have not been fully tested. In GPU environments, skip the Docker image and install the dependencies from requirements.txt directly.

Install Dependencies

After entering the container, install the Python dependencies:

pip install -r requirements.txt
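To check the installation from Python without importing heavy modules, `importlib.util.find_spec` can report whether each package is resolvable. The package names below are assumptions, since requirements.txt is not reproduced here; `torch_npu` is the PyTorch adapter for Ascend NPUs and is typically only needed inside the NPU image.

```python
import importlib.util

# Assumed package names; adjust to match requirements.txt.
REQUIRED = ["torch", "transformers", "deepspeed"]
# Only relevant on Ascend NPUs.
OPTIONAL = ["torch_npu"]


def check_packages(required=REQUIRED, optional=OPTIONAL):
    """Return (missing required packages, optional packages that are available)."""
    missing = [name for name in required if importlib.util.find_spec(name) is None]
    present = [name for name in optional if importlib.util.find_spec(name) is not None]
    return missing, present


if __name__ == "__main__":
    missing, present = check_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
    print("Ascend adapter (torch_npu):", "found" if "torch_npu" in present else "not found")
```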

Continue Pretrain (CPT)

Dataset

The dataset used for testing is C4-Pro, stored in Parquet format after download.
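For CPT, raw text is usually tokenized and packed into fixed-length training sequences. This README does not say whether train.py packs data this way or which Parquet column it reads, so the following is only a sketch of the common concatenate-and-chunk approach on plain token-id lists; `pack_into_blocks`, `block_size`, and `eos_id` are hypothetical names.

```python
from itertools import chain


def pack_into_blocks(tokenized_docs, block_size, eos_id):
    """Concatenate tokenized documents, each followed by an EOS token, and cut the
    stream into blocks of exactly block_size tokens. The trailing remainder shorter
    than block_size is dropped."""
    stream = list(chain.from_iterable(doc + [eos_id] for doc in tokenized_docs))
    n_full = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_full)]
```

With documents [[1, 2, 3], [4, 5], [6]], eos_id=0, and block_size=4, the result is [[1, 2, 3, 0], [4, 5, 0, 6]]; the leftover [0] is dropped. Packing avoids padding waste, at the cost of letting one block span document boundaries.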

Usage

Modify the path configuration in run.sh:

MODEL_PATH="/path/to/BitCPM-CANN-1B-unquantized/"
DATA_PATH="/path/to/c4-pro/data/your_file.parquet"
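Before launching, a quick pre-flight check can catch a mistyped MODEL_PATH or DATA_PATH. This helper is not part of the project: `check_paths` is a hypothetical name, and it assumes the model directory is a standard Hugging Face checkpoint containing config.json and that DATA_PATH names a single .parquet file, as in the example above.

```python
from pathlib import Path


def check_paths(model_path, data_path):
    """Return a list of problems with MODEL_PATH / DATA_PATH (empty if none found)."""
    problems = []
    model_dir = Path(model_path)
    if not model_dir.is_dir():
        problems.append(f"MODEL_PATH is not a directory: {model_dir}")
    elif not (model_dir / "config.json").is_file():
        # Hugging Face model directories normally contain config.json.
        problems.append(f"no config.json in MODEL_PATH: {model_dir}")
    data_file = Path(data_path)
    if data_file.suffix != ".parquet":
        problems.append(f"DATA_PATH does not end in .parquet: {data_file}")
    elif not data_file.is_file():
        problems.append(f"DATA_PATH does not exist: {data_file}")
    return problems
```

The same check applies to the paths set in run_sft.sh.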

Then start training:

bash run.sh

Supervised Fine-Tuning (SFT)

Dataset

The dataset used for testing is UltraChat 200k, stored in Parquet format after download.
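UltraChat 200k stores each conversation in a `messages` column as a list of `{"role": ..., "content": ...}` dicts. This README does not say how train_sft.py formats or masks these conversations. A common approach, assumed here, is to compute the loss only on assistant replies by setting every other label position to -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`, which Hugging Face causal-LM models also skip. The `<|role|>` markers and `build_sft_example` are hypothetical; a real pipeline would use the model's own chat template, for example via `tokenizer.apply_chat_template` in transformers.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_sft_example(messages, tokenize):
    """Turn a list of {"role", "content"} turns into (input_ids, labels), keeping
    loss only on assistant content. `tokenize` maps a string to a list of tokens."""
    input_ids, labels = [], []
    for turn in messages:
        header = tokenize(f"<|{turn['role']}|>\n")  # hypothetical role marker
        body = tokenize(turn["content"] + "\n")
        input_ids += header + body
        # Role markers are never trained on.
        labels += [IGNORE_INDEX] * len(header)
        # Only assistant replies contribute to the loss.
        labels += body if turn["role"] == "assistant" else [IGNORE_INDEX] * len(body)
    return input_ids, labels
```

With `str.split` as a toy tokenizer, a user/assistant pair yields labels that are -100 everywhere except the assistant's words.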

Usage

Modify the path configuration in run_sft.sh:

MODEL_PATH="/path/to/BitCPM-CANN-1B-unquantized/"
DATA_PATH="/path/to/ultrachat_200k/data/your_file.parquet"

Then start training:

bash run_sft.sh

Training Results Reference

Note: BitCPM was trained on its own dataset and data mixture, so the loss is expected to keep decreasing when training continues on open-source datasets.

Below are the loss curves from smoke tests of CPT and SFT on GPU and NPU. The curves closely match across the two platforms, indicating that users can continue pretraining or fine-tune the model on either GPU or NPU:

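Figure: CPT and SFT loss curves on GPU and NPU (gpu_pretrain_loss.png, npu_pretrain_loss.png, gpu_sft_loss.png, npu_sft_loss.png)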

Training log CSV files (corresponding to the loss curves above):

CSV File	Corresponding Loss Curve
gpu_pretrain.csv	GPU CPT
npu_pretrain.csv	NPU CPT
gpu_sft.csv	GPU SFT
npu_sft.csv	NPU SFT
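To put a number on how closely the GPU and NPU runs agree, the CSV logs can be aligned by step and compared. The column names `step` and `loss` are assumptions (they match the keys Hugging Face Trainer uses in its log history, but the CSV headers are not shown here), so both are parameters; rows with an empty loss value are skipped.

```python
import csv


def read_loss_curve(f, step_col="step", loss_col="loss"):
    """Map step -> loss from an open CSV file; rows with an empty loss are skipped."""
    curve = {}
    for row in csv.DictReader(f):
        if row.get(loss_col):
            curve[int(float(row[step_col]))] = float(row[loss_col])
    return curve


def compare_curves(a, b):
    """Return (number of shared steps, mean absolute difference, max absolute difference)."""
    shared = sorted(set(a) & set(b))
    if not shared:
        return 0, float("nan"), float("nan")
    diffs = [abs(a[s] - b[s]) for s in shared]
    return len(shared), sum(diffs) / len(diffs), max(diffs)
```

For example, pass open file handles for gpu_pretrain.csv and npu_pretrain.csv to `read_loss_curve` and feed both results to `compare_curves`; the SFT pair works the same way.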

These scripts provide a ready-to-use toolkit for quantization-aware training (QAT) of BitCPM-CANN models during continued pretraining and fine-tuning, so you can quickly adapt the model to your own data and tasks while preserving its ternary quantization constraints.
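Ternary here means each weight is restricted to {-1, 0, +1} times a scale. This README does not spell out BitCPM's quantizer, so the following sketch assumes the absmean scheme popularized by BitNet b1.58: the scale is the mean absolute weight, and each weight is divided by it, rounded, and clipped to [-1, 1]. In QAT the full-precision weights are kept and updated while the forward pass uses the fake-quantized copy (gradients typically pass through the rounding via a straight-through estimator, not shown here). Plain lists keep the example dependency-free; `absmean_ternary` and `fake_quantize` are hypothetical names.

```python
def absmean_ternary(weights, eps=1e-8):
    """Quantize weights to codes in {-1, 0, +1} with a per-tensor absmean scale.
    Returns (codes, scale); the dequantized weight is code * scale."""
    scale = sum(abs(w) for w in weights) / len(weights)
    # RoundClip(w / (scale + eps), -1, 1); eps guards against an all-zero tensor.
    codes = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return codes, scale


def fake_quantize(weights):
    """Forward-pass weights as seen during QAT: ternary codes rescaled to float."""
    codes, scale = absmean_ternary(weights)
    return [c * scale for c in codes]
```

For weights [0.5, -1.0, 0.1, 2.0], the scale is 0.9 and the codes are [1, -1, 0, 1].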