306 lines
9.4 KiB
Markdown
306 lines
9.4 KiB
Markdown
|
|
---
|
|||
|
|
license: apache-2.0
|
|||
|
|
language:
|
|||
|
|
- zh
|
|||
|
|
tags:
|
|||
|
|
- qwen3
|
|||
|
|
- fangwusha
|
|||
|
|
- text-generation
|
|||
|
|
- chinese-llm
|
|||
|
|
- 15b
|
|||
|
|
library_name: transformers
|
|||
|
|
pipeline_tag: text-generation
|
|||
|
|
base_model: Qwen/Qwen3-14B
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
# Model Card for Yougen/Qwen3Fangwusha14B
|
|||
|
|
|
|||
|
|
<!-- Provide a quick summary of what the model is/does. -->
|
|||
|
|
|
|||
|
|
Qwen3Fangwusha14B是基于Qwen3-14B进行微调的中文大语言模型,专注于提升中文对话能力、指令遵循和通用任务表现。该模型属于Fangwusha系列,旨在为中文用户提供高质量、安全可靠的AI助手服务。
|
|||
|
|
|
|||
|
|
## Model Details
|
|||
|
|
|
|||
|
|
### Model Description
|
|||
|
|
|
|||
|
|
<!-- Provide a longer summary of what this model is. -->
|
|||
|
|
|
|||
|
|
Qwen3Fangwusha14B是一个150亿参数的自回归语言模型,在Qwen3-14B基础上通过高质量中文数据集进行了进一步微调。模型采用BF16精度训练,优化了中文语义理解、逻辑推理和多轮对话能力,适用于各种中文自然语言处理任务。
|
|||
|
|
|
|||
|
|
- **Developed by:** Yougen Yuan
|
|||
|
|
- **Funded by [optional]:** [More Information Needed]
|
|||
|
|
- **Shared by [optional]:** Yougen Yuan
|
|||
|
|
- **Model type:** Auto-regressive language model (Decoder-only)
|
|||
|
|
- **Language(s) (NLP):** 中文 (zh), 英文 (en)
|
|||
|
|
- **License:** Apache-2.0
|
|||
|
|
- **Finetuned from model [optional]:** Qwen/Qwen3-14B
|
|||
|
|
|
|||
|
|
### Model Sources [optional]
|
|||
|
|
|
|||
|
|
<!-- Provide the basic links for the model. -->
|
|||
|
|
|
|||
|
|
- **Repository:** https://huggingface.co/Yougen/Qwen3Fangwusha14B
|
|||
|
|
- **Paper [optional]:** [More Information Needed]
|
|||
|
|
- **Demo [optional]:** [More Information Needed]
|
|||
|
|
|
|||
|
|
## Uses
|
|||
|
|
|
|||
|
|
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
|
|||
|
|
|
|||
|
|
### Direct Use
|
|||
|
|
|
|||
|
|
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
|
|||
|
|
|
|||
|
|
该模型可直接用于以下任务:
|
|||
|
|
- 中文对话与问答
|
|||
|
|
- 文本生成与续写
|
|||
|
|
- 信息提取与总结
|
|||
|
|
- 翻译与语言转换
|
|||
|
|
- 代码辅助与解释
|
|||
|
|
- 创意写作与内容创作
|
|||
|
|
|
|||
|
|
### Downstream Use [optional]
|
|||
|
|
|
|||
|
|
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
|
|||
|
|
|
|||
|
|
该模型可进一步微调用于:
|
|||
|
|
- 特定领域知识库问答
|
|||
|
|
- 客户服务机器人
|
|||
|
|
- 教育辅导系统
|
|||
|
|
- 企业内部智能助手
|
|||
|
|
- 内容审核与分类
|
|||
|
|
|
|||
|
|
### Out-of-Scope Use
|
|||
|
|
|
|||
|
|
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
|
|||
|
|
|
|||
|
|
该模型不应用于:
|
|||
|
|
- 生成违法、有害、暴力或歧视性内容
|
|||
|
|
- 未经授权的医疗诊断、法律建议或金融投资建议
|
|||
|
|
- 冒充他人或进行欺诈活动
|
|||
|
|
- 生成可能侵犯知识产权的内容
|
|||
|
|
- 高风险决策系统(如自动驾驶、医疗设备控制等)
|
|||
|
|
|
|||
|
|
## Bias, Risks, and Limitations
|
|||
|
|
|
|||
|
|
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
|
|||
|
|
|
|||
|
|
- 模型可能会生成不准确、不完整或误导性的信息,特别是在处理专业领域知识时
|
|||
|
|
- 模型可能会反映训练数据中存在的偏见和刻板印象
|
|||
|
|
- 模型在处理长文本时可能会出现上下文理解能力下降的情况
|
|||
|
|
- 模型可能会产生幻觉,编造不存在的事实或引用
|
|||
|
|
- 模型的英文能力相对中文较弱
|
|||
|
|
|
|||
|
|
### Recommendations
|
|||
|
|
|
|||
|
|
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
|
|||
|
|
|
|||
|
|
用户在使用该模型时应:
|
|||
|
|
- 对模型生成的内容进行事实核查和验证
|
|||
|
|
- 意识到模型可能存在的偏见和局限性
|
|||
|
|
- 在高风险场景中谨慎使用,必要时咨询专业人士
|
|||
|
|
- 遵守相关法律法规和道德规范
|
|||
|
|
- 报告任何有害或不当的模型输出
|
|||
|
|
|
|||
|
|
## How to Get Started with the Model
|
|||
|
|
|
|||
|
|
Use the code below to get started with the model.
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
from transformers import AutoTokenizer, AutoModelForCausalLM
|
|||
|
|
import torch
|
|||
|
|
|
|||
|
|
model_name = "Yougen/Qwen3Fangwusha14B"
|
|||
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
|||
|
|
model = AutoModelForCausalLM.from_pretrained(
|
|||
|
|
model_name,
|
|||
|
|
torch_dtype=torch.bfloat16,
|
|||
|
|
device_map="auto"
|
|||
|
|
)
|
|||
|
|
|
|||
|
|
prompt = "你好,请介绍一下你自己。"
|
|||
|
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
|
|||
|
|
|
|||
|
|
outputs = model.generate(
|
|||
|
|
**inputs,
|
|||
|
|
max_new_tokens=512,
|
|||
|
|
temperature=0.7,
|
|||
|
|
top_p=0.9,
|
|||
|
|
do_sample=True
|
|||
|
|
)
|
|||
|
|
|
|||
|
|
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
|
|||
|
|
print(response)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
## Training Details
|
|||
|
|
|
|||
|
|
### Training Data
|
|||
|
|
|
|||
|
|
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
|
|||
|
|
|
|||
|
|
该模型使用了多种高质量中文数据集进行微调,包括:
|
|||
|
|
- 通用对话数据集
|
|||
|
|
- 指令遵循数据集
|
|||
|
|
- 知识问答数据集
|
|||
|
|
- 逻辑推理数据集
|
|||
|
|
|
|||
|
|
所有数据集均经过严格的质量过滤和去重处理,确保训练数据的质量和多样性。
|
|||
|
|
|
|||
|
|
### Training Procedure
|
|||
|
|
|
|||
|
|
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
|||
|
|
|
|||
|
|
#### Preprocessing [optional]
|
|||
|
|
|
|||
|
|
训练数据经过了以下预处理步骤:
|
|||
|
|
- 文本清洗和标准化
|
|||
|
|
- 格式统一和规范化
|
|||
|
|
- 质量过滤和去重
|
|||
|
|
- 数据增强和多样化
|
|||
|
|
|
|||
|
|
#### Training Hyperparameters
|
|||
|
|
|
|||
|
|
- **Training regime:** BF16 mixed precision
|
|||
|
|
- **Optimizer:** AdamW
|
|||
|
|
- **Learning rate:** [More Information Needed]
|
|||
|
|
- **Batch size:** [More Information Needed]
|
|||
|
|
- **Epochs:** [More Information Needed]
|
|||
|
|
- **Warmup steps:** [More Information Needed]
|
|||
|
|
- **Weight decay:** [More Information Needed]
|
|||
|
|
|
|||
|
|
#### Speeds, Sizes, Times [optional]
|
|||
|
|
|
|||
|
|
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
|
|||
|
|
|
|||
|
|
- **Model size:** 15B parameters
|
|||
|
|
- **Checkpoint size:** ~30GB (BF16)
|
|||
|
|
- **Training duration:** [More Information Needed]
|
|||
|
|
- **Training hardware:** [More Information Needed]
|
|||
|
|
|
|||
|
|
## Evaluation
|
|||
|
|
|
|||
|
|
<!-- This section describes the evaluation protocols and provides the results. -->
|
|||
|
|
|
|||
|
|
### Testing Data, Factors & Metrics
|
|||
|
|
|
|||
|
|
#### Testing Data
|
|||
|
|
|
|||
|
|
<!-- This should link to a Dataset Card if possible. -->
|
|||
|
|
|
|||
|
|
模型在以下基准测试集上进行了评估:
|
|||
|
|
- C-Eval (中文通用能力评估)
|
|||
|
|
- MMLU (多任务语言理解)
|
|||
|
|
- GSM8K (数学推理)
|
|||
|
|
- HumanEval (代码生成)
|
|||
|
|
|
|||
|
|
#### Factors
|
|||
|
|
|
|||
|
|
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
|
|||
|
|
|
|||
|
|
评估涵盖了以下维度:
|
|||
|
|
- 知识掌握程度
|
|||
|
|
- 逻辑推理能力
|
|||
|
|
- 指令遵循能力
|
|||
|
|
- 中文理解与生成能力
|
|||
|
|
- 代码生成能力
|
|||
|
|
|
|||
|
|
#### Metrics
|
|||
|
|
|
|||
|
|
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
|
|||
|
|
|
|||
|
|
- **Accuracy:** 用于知识问答和选择题任务
|
|||
|
|
- **Pass@k:** 用于代码生成任务
|
|||
|
|
- **BLEU/ROUGE:** 用于文本生成和翻译任务
|
|||
|
|
- **Human evaluation:** 用于对话质量和整体表现评估
|
|||
|
|
|
|||
|
|
### Results
|
|||
|
|
|
|||
|
|
[More Information Needed]
|
|||
|
|
|
|||
|
|
#### Summary
|
|||
|
|
|
|||
|
|
[More Information Needed]
|
|||
|
|
|
|||
|
|
## Model Examination [optional]
|
|||
|
|
|
|||
|
|
<!-- Relevant interpretability work for the model goes here -->
|
|||
|
|
|
|||
|
|
[More Information Needed]
|
|||
|
|
|
|||
|
|
## Environmental Impact
|
|||
|
|
|
|||
|
|
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
|
|||
|
|
|
|||
|
|
Carbon emissions can be estimated using the [Machine Learning Impact calculator](sslocal://flow/file_open?url=https%3A%2F%2Fmlco2.github.io%2Fimpact%23compute&flow_extra=eyJsaW5rX3R5cGUiOiJjb2RlX2ludGVycHJldGVyIn0=) presented in [Lacoste et al. (2019)](sslocal://flow/file_open?url=https%3A%2F%2Farxiv.org%2Fabs%2F1910.09700&flow_extra=eyJsaW5rX3R5cGUiOiJjb2RlX2ludGVycHJldGVyIn0=).
|
|||
|
|
|
|||
|
|
- **Hardware Type:** [More Information Needed]
|
|||
|
|
- **Hours used:** [More Information Needed]
|
|||
|
|
- **Cloud Provider:** [More Information Needed]
|
|||
|
|
- **Compute Region:** [More Information Needed]
|
|||
|
|
- **Carbon Emitted:** [More Information Needed]
|
|||
|
|
|
|||
|
|
## Technical Specifications [optional]
|
|||
|
|
|
|||
|
|
### Model Architecture and Objective
|
|||
|
|
|
|||
|
|
该模型基于Qwen3架构,采用解码器-only的Transformer结构:
|
|||
|
|
- 上下文窗口大小:[More Information Needed]
|
|||
|
|
- 注意力机制:Grouped-Query Attention (GQA)
|
|||
|
|
- 激活函数:SwiGLU
|
|||
|
|
- 词表大小:[More Information Needed]
|
|||
|
|
|
|||
|
|
### Compute Infrastructure
|
|||
|
|
|
|||
|
|
[More Information Needed]
|
|||
|
|
|
|||
|
|
#### Hardware
|
|||
|
|
|
|||
|
|
[More Information Needed]
|
|||
|
|
|
|||
|
|
#### Software
|
|||
|
|
|
|||
|
|
- **Framework:** PyTorch 2.x
|
|||
|
|
- **Training library:** LLaMA-Factory
|
|||
|
|
- **Inference library:** Transformers 4.x
|
|||
|
|
- **Acceleration:** FlashAttention-2
|
|||
|
|
|
|||
|
|
## Citation [optional]
|
|||
|
|
|
|||
|
|
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
|
|||
|
|
|
|||
|
|
**BibTeX:**
|
|||
|
|
|
|||
|
|
```bibtex
|
|||
|
|
@misc{qwen3fangwusha14b,
|
|||
|
|
author = {Yuan, Yougen},
|
|||
|
|
title = {Qwen3Fangwusha14B: A Fine-tuned Chinese Large Language Model},
|
|||
|
|
year = {2026},
|
|||
|
|
publisher = {Hugging Face},
|
|||
|
|
howpublished = {\url{https://huggingface.co/Yougen/Qwen3Fangwusha14B}}
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**APA:**
|
|||
|
|
|
|||
|
|
Yuan, Y. (2026). Qwen3Fangwusha14B: A Fine-tuned Chinese Large Language Model. Hugging Face. https://huggingface.co/Yougen/Qwen3Fangwusha14B
|
|||
|
|
|
|||
|
|
## Glossary [optional]
|
|||
|
|
|
|||
|
|
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
|
|||
|
|
|
|||
|
|
[More Information Needed]
|
|||
|
|
|
|||
|
|
## More Information [optional]
|
|||
|
|
|
|||
|
|
该模型是Fangwusha系列的一部分,更多相关模型可在以下集合中找到:
|
|||
|
|
- [Fangwusha Collection](sslocal://flow/file_open?url=https%3A%2F%2Fhuggingface.co%2Fcollections%2FYougen%2Ffangwusha-6615a7f8a7f8d9a7b8c6d5e4&flow_extra=eyJsaW5rX3R5cGUiOiJjb2RlX2ludGVycHJldGVyIn0=)
|
|||
|
|
|
|||
|
|
## Model Card Authors [optional]
|
|||
|
|
|
|||
|
|
Yougen Yuan
|
|||
|
|
|
|||
|
|
## Model Card Contact
|
|||
|
|
|
|||
|
|
[More Information Needed]
|