commit 03ad36754b4a358511706ddedaf08b708ab5c402
Author: ModelHub XC
Date:   Sun Apr 12 13:19:57 2026 +0800

    Initialize project; model provided by the ModelHub XC community

    Model: pedrodev2026/microcoder-1.5b-GGUF
    Source: Original Platform

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..622630f
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,39 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+microcoder-1.5b-GGUF-F16.gguf filter=lfs diff=lfs merge=lfs -text
+microcoder-1.5b-GGUF-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+microcoder-1.5b-GGUF-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+microcoder-1.5b-GGUF-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
diff --git a/DATASET_CREDITS.md b/DATASET_CREDITS.md
new file mode 100644
index 0000000..0d5a2d4
--- /dev/null
+++ b/DATASET_CREDITS.md
@@ -0,0 +1,37 @@
+# Credits
+
+This dataset is a combination of three existing datasets, pre-processed with **deduplication** and **token limit of 1024 tokens per example**.
+
+## Included Datasets
+
+1. **[CyberNative/Code_Vulnerability_Security_DPO](https://huggingface.co/datasets/CyberNative/Code_Vulnerability_Security_DPO)**
+   - Creator: CyberNative
+   - License: Apache 2.0
+   - Description: Code dataset focused on security vulnerabilities.
+
+2. **[Madras1/minimax-m2.5-code-distilled-14k](https://huggingface.co/datasets/Madras1/minimax-m2.5-code-distilled-14k)**
+   - Creator: Madras1
+   - License: Apache 2.0
+   - Description: Distilled code dataset emphasizing coding patterns and representations.
+
+3. **[pedrodev2026/pedro-open-distil-dataset](https://huggingface.co/datasets/pedrodev2026/pedro-open-distil-dataset)**
+   - Creator: pedrodev2026
+   - License: BSD 3-Clause
+   - Description: Custom distilled code dataset created and maintained by pedrodev2026.
+
+## Preprocessing
+
+The combined dataset was prepared by:
+
+- **Deduplicating** all examples to remove redundancy.
+- Limiting examples to **1024 tokens each**.
+
+## License
+
+The final combined dataset is licensed under **BSD 3-Clause**.
+Users must still respect the original licenses of the included datasets when redistributing or using the original unmodified datasets.
+
+- Original licenses:
+  - **[CyberNative/Code_Vulnerability_Security_DPO](https://huggingface.co/datasets/CyberNative/Code_Vulnerability_Security_DPO)**: Apache 2.0
+  - **[Madras1/minimax-m2.5-code-distilled-14k](https://huggingface.co/datasets/Madras1/minimax-m2.5-code-distilled-14k)**: Apache 2.0
+  - **[pedrodev2026/pedro-open-distil-dataset](https://huggingface.co/datasets/pedrodev2026/pedro-open-distil-dataset)**: BSD 3-Clause
\ No newline at end of file
diff --git a/MODEL_CREDITS.md b/MODEL_CREDITS.md
new file mode 100644
index 0000000..821637b
--- /dev/null
+++ b/MODEL_CREDITS.md
@@ -0,0 +1,74 @@
+# Model Credits - Microcoder-1.5B
+
+## Base Model
+
+This fine-tuned model is built upon **Qwen 2.5 Coder 1.5B Instruct**, created and maintained by [Alibaba Cloud](https://www.alibabacloud.com/).
+
+### Original Model Information
+
+- **Model Name**: Qwen 2.5 Coder 1.5B Instruct
+- **Creator**: Alibaba Cloud
+- **Repository**: [Qwen Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct)
+- **License**: Apache 2.0
+
+The Qwen 2.5 Coder series represents a significant advancement in code generation models, optimized for programming tasks and instruction following.
+
+## Model Redistribution
+
+We acknowledge **Unsloth** for their role in redistributing and optimizing the base model, making it more accessible to the community.
+
+- **Organization**: Unsloth
+- **Website**: [Unsloth.ai](https://unsloth.ai)
+
+## Fine-Tuned Model (Microcoder-1.5B)
+
+- **License**: BSD-3-Clause
+- **Status**: This fine-tuned version incorporates specialized training and optimizations
+
+## License Summary
+
+| Component | License |
+|-----------|---------|
+| Base Model (Qwen 2.5 Coder 1.5B) | Apache 2.0 |
+| Fine-tuned Model (Microcoder-1.5B) | BSD-3-Clause |
+
+## Dataset Credits
+
+For detailed information about the datasets used in the fine-tuning process, please refer to [`DATASET_CREDITS.md`](./DATASET_CREDITS.md).
+
+## Attribution
+
+When using Microcoder-1.5B, please provide appropriate attribution to:
+
+1. **Alibaba Cloud** - for the original Qwen 2.5 Coder model
+2. **Unsloth** - for model redistribution and optimization
+3. **Microcoder Contributors** - for the fine-tuning and improvements
+
+## Citation
+
+If you use this model in your research or projects, please consider citing:
+
+```bibtex
+@misc{microcoder2026,
+  title={Microcoder-1.5B: A Fine-tuned Code Generation Model},
+  author={[pedrodev2026]},
+  year={2026},
+  url={[https://huggingface.co/pedrodev2026/microcoder-1.5b]}
+}
+```
+
+And also cite the original Qwen model:
+
+```bibtex
+@article{hui2024qwen2,
+  title={Qwen2.5-Coder Technical Report},
+  author={Hui, Binyuan and Yang, Jian and Cui, Zeyu and Yang, Jiaxi and Liu, Dayiheng and Zhang, Lei and Liu, Tianyu and Zhang, Jiajun and Yu, Bowen and Dang, Kai and others},
+  journal={arXiv preprint arXiv:2409.12186},
+  year={2024}
+}
+```
+
+---
+
+**Last Updated**: 2026
+**Model Version**: 1.5B
\ No newline at end of file
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..cbad384
--- /dev/null
+++ b/README.md
@@ -0,0 +1,12 @@
+---
+license: bsd-3-clause
+datasets:
+- pedrodev2026/microcoder-dataset-1024-tokens
+base_model:
+- pedrodev2026/microcoder-1.5b
+pipeline_tag: text-generation
+tags:
+- coder
+- code
+- microcoder
+---
\ No newline at end of file
diff --git a/microcoder-1.5b-GGUF-F16.gguf b/microcoder-1.5b-GGUF-F16.gguf
new file mode 100644
index 0000000..58ee4b6
--- /dev/null
+++ b/microcoder-1.5b-GGUF-F16.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3b26471847d3610f8a2092252148d1df087a3388eae0fe0dc2a59365ad5c6370
+size 3093668832
diff --git a/microcoder-1.5b-GGUF-Q4_K_M.gguf b/microcoder-1.5b-GGUF-Q4_K_M.gguf
new file mode 100644
index 0000000..df078dc
--- /dev/null
+++ b/microcoder-1.5b-GGUF-Q4_K_M.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fa317fafcee2dd4e3ea88e72c4a0e8e57436a986fe1e4134d450853958c29dbf
+size 986047968
diff --git a/microcoder-1.5b-GGUF-Q5_K_M.gguf b/microcoder-1.5b-GGUF-Q5_K_M.gguf
new file mode 100644
index 0000000..667b298
--- /dev/null
+++ b/microcoder-1.5b-GGUF-Q5_K_M.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:34c2e703bc5d5018a81c365c3ef401e6609faf806e97c32a0d1a8b7fe27d5b8a
+size 1125049824
diff --git a/microcoder-1.5b-GGUF-Q8_0.gguf b/microcoder-1.5b-GGUF-Q8_0.gguf
new file mode 100644
index 0000000..bebeb69
--- /dev/null
+++ b/microcoder-1.5b-GGUF-Q8_0.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5ce9661f8ec3eca0f9eb8ae2481dfcb1b666f340e8fe340266fda6801e510abc
+size 1646572512
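
Review note: the four `.gguf` entries in this commit are Git LFS pointer files, each recording a SHA-256 digest and a byte size for the real weights. The following is a minimal sketch, not part of the commit, of how a downloaded file could be checked against its pointer. The names `LfsPointer`, `parse_lfs_pointer`, and `matches_pointer` are hypothetical helpers introduced for illustration, and the local file path is an assumption about where the download was saved.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class LfsPointer:
    """Digest and size recorded in a Git LFS pointer file (hypothetical helper type)."""
    oid_sha256: str
    size: int


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the 'key value' lines of a Git LFS v1 pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid type: {algo}")
    return LfsPointer(oid_sha256=digest, size=int(fields["size"]))


def matches_pointer(path: Path, pointer: LfsPointer, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and SHA-256 the pointer records."""
    # Cheap check first: a size mismatch avoids hashing a multi-gigabyte file.
    if path.stat().st_size != pointer.size:
        return False
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large model files are never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer.oid_sha256


# Pointer contents copied from the Q4_K_M entry in the diff above.
Q4_K_M_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:fa317fafcee2dd4e3ea88e72c4a0e8e57436a986fe1e4134d450853958c29dbf
size 986047968
"""

if __name__ == "__main__":
    pointer = parse_lfs_pointer(Q4_K_M_POINTER)
    # Assumed download location; adjust to wherever the file was actually saved.
    local = Path("microcoder-1.5b-GGUF-Q4_K_M.gguf")
    if local.exists():
        print("match" if matches_pointer(local, pointer) else "MISMATCH")
    else:
        print(f"expected {pointer.size} bytes, sha256 {pointer.oid_sha256}")
```

Comparing the size before hashing is a deliberate shortcut: a truncated download is rejected immediately, and the full digest is computed only when the length already agrees.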