ModelHub XC f668bf6e8c Initialize project; model provided by the ModelHub XC community

Model: hareeswar/Distilled-Qwen-1.5B-Coder
Source: Original Platform

2026-06-16 08:27:18 +08:00

.gitattributes
added_tokens.json
chat_template.jinja
config.json
generation_config.json
merges.txt
model.safetensors
README.md
special_tokens_map.json
tokenizer_config.json
tokenizer.json
vocab.json

README.md

language:
license: apache-2.0
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-Coder-1.5B-Instruct

LoRA Distillation Evaluation Report

1. Executive Summary

This report presents the final evaluation metrics of the reasoning distillation pipeline. By fine-tuning a 1.5B-parameter base model with LoRA on the Chain-of-Thought (CoT) outputs of a 7B-parameter teacher model, we raised the average pass rate from 64.5% to 79.8%, an absolute improvement of 15.3 percentage points.

2. Model Comparison

Model	Role	Average Pass Rate
Qwen2.5-Coder-7B (Teacher)	Dataset Generator	96.9%
Qwen2.5-Coder-1.5B (Base)	Baseline Coder	64.5%
Qwen2.5-Coder-1.5B (Distilled/LoRA)	Distilled Agent	79.8%
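
To make the comparison concrete, the short sketch below recomputes the headline delta from the three pass rates in the table. Only the three percentages come from the report; the variable names and the derived figures (relative gain, share of the teacher gap closed) are illustrative additions, not metrics the report itself states.

```python
# Average pass rates (percent), copied from the comparison table.
pass_rates = {
    "teacher_7b": 96.9,
    "base_1_5b": 64.5,
    "distilled_1_5b": 79.8,
}

teacher = pass_rates["teacher_7b"]
base = pass_rates["base_1_5b"]
distilled = pass_rates["distilled_1_5b"]

# Absolute gain in percentage points (the report's "+15.3%").
absolute_gain = round(distilled - base, 1)

# Relative gain, expressed as a percentage of the baseline pass rate.
relative_gain = round(100 * (distilled - base) / base, 1)

# Share of the teacher-to-base gap that distillation recovered.
gap_closed = round(100 * (distilled - base) / (teacher - base), 1)

# Percentage points still separating the distilled model from the teacher.
remaining_gap = round(teacher - distilled, 1)

print(f"absolute gain:  {absolute_gain} points")
print(f"relative gain:  {relative_gain}% of baseline")
print(f"gap closed:     {gap_closed}% of teacher-base gap")
print(f"remaining gap:  {remaining_gap} points")
```

Stating the gain in percentage points keeps it distinct from the relative gain, which is a larger number for the same result.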

3. Key Observations & Analysis

The Base Model's Weakness

Without fine-tuning, the 1.5B base model tended to rush into code generation, producing brittle algorithms that failed edge cases. It occasionally "cheated" by leaning on built-in Python functions (e.g., calling .sort(), which costs O(n log n), on problems that required an O(log n) solution), and its structural logic broke down on complex dynamic programming problems and boundary checks.
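
The complexity shortcut described above can be illustrated with a small sketch. The report does not list its benchmark problems, so the task here (find a target in an already-sorted list) and both function names are hypothetical; the point is only that a sort-based answer can return correct results while missing an O(log n) requirement.

```python
from bisect import bisect_left


def find_via_sort(values, target):
    """Shortcut answer: correct output, but not logarithmic.

    sorted() costs O(n log n) in general (and at least O(n) even on
    already-sorted input), and list.index() is a linear scan.
    """
    ordered = sorted(values)
    return ordered.index(target) if target in ordered else -1


def find_logarithmic(values, target):
    """Binary search on an already-sorted list: O(log n) comparisons."""
    i = bisect_left(values, target)
    # Boundary checks: i may equal len(values) when target exceeds every item.
    if i < len(values) and values[i] == target:
        return i
    return -1


data = [1, 3, 5, 7, 9, 11]
for probe in (7, 1, 11, 4, 0, 12):
    # Both functions agree on the result; only the cost differs.
    assert find_via_sort(data, probe) == find_logarithmic(data, probe)

# Edge case: empty input must not raise.
assert find_logarithmic([], 5) == -1
```

A test suite that checks only outputs would accept both versions, which is why complexity requirements need a separate check.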

The LoRA Model's Strength (Distilled Reasoning)

By injecting [REASONING] tokens during Supervised Fine-Tuning (SFT), the LoRA adapter successfully forced the 1.5B model to adopt a "think-before-acting" paradigm.

It achieved near-perfect scores (95%+) on complex algorithmic edge cases.
It demonstrated active problem deconstruction before writing Python code.
Overall delta: +10 problems fully solved, raising the average pass rate from the 64.5% baseline to 79.8%.
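
As a rough illustration of the "think-before-acting" training format, the sketch below assembles one SFT record in which the teacher's reasoning precedes its code. Only the [REASONING] marker is named in the report; the [ANSWER] delimiter, the prompt/completion record layout, and both function names are assumptions made for this example and may differ from the actual pipeline.

```python
REASONING_TAG = "[REASONING]"  # marker named in the report
ANSWER_TAG = "[ANSWER]"        # hypothetical delimiter, not from the report


def build_sft_example(problem: str, teacher_reasoning: str, teacher_code: str) -> dict:
    """Build one prompt/completion pair with reasoning placed before the code."""
    completion = (
        f"{REASONING_TAG}\n{teacher_reasoning.strip()}\n"
        f"{ANSWER_TAG}\n{teacher_code.strip()}"
    )
    return {"prompt": problem.strip(), "completion": completion}


def split_completion(completion: str) -> tuple[str, str]:
    """Separate a completion into (reasoning, code) at the answer delimiter."""
    reasoning_part, _, answer_part = completion.partition(ANSWER_TAG)
    reasoning = reasoning_part.replace(REASONING_TAG, "", 1).strip()
    return reasoning, answer_part.strip()


example = build_sft_example(
    problem="Return the index of target in a sorted list, or -1.",
    teacher_reasoning="The list is sorted, so binary search gives O(log n).",
    teacher_code="def find(values, target): ...",
)
reasoning, code = split_completion(example["completion"])
print(example["completion"])
```

Keeping the reasoning ahead of the code in every training completion is what would teach the student to emit its plan first; the same delimiter lets an evaluation harness strip the reasoning and run only the code.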