Files
Qwen3-4B-Reasoning-Backfill…/README.md
ModelHub XC 2e64cb8d17 初始化项目,由ModelHub XC社区提供模型
Model: joeyzero/Qwen3-4B-Reasoning-Backfill-v0.1
Source: Original Platform
2026-05-25 04:07:19 +08:00

183 lines
7.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-4B
tags:
- axolotl
- generated_from_trainer
datasets:
- joeyzero/OpenThought-144k-Backfill-0.2
- joeyzero/dolphin-r1-backfill-0.0.2
model-index:
- name: Qwen3-4B-Reasoning-Backfill-V0.1
results: []
thumbnail: "https://cdn-uploads.huggingface.co/production/uploads/6334832f86c3fdcdc7acbe8e/DcjT8q4QvTjfCJWEa8kC7.png"
---
<!DOCTYPE html>
<style>
:root { --bg:#0b0f14; --panel:#121824; --ink:#c9d1d9; --acc:#6e00ff; --cy:#00ffff; --line:#2a3242; }
html,body{background:#000;color:var(--ink);font-family:ui-sans-serif,system-ui,Segoe UI,Roboto,Helvetica,Arial,sans-serif;margin:0;padding:0}
.markdown-body{color:var(--ink);max-width:1000px;margin:36px auto;padding:36px;border-radius:12px;position:relative;overflow:hidden}
.markdown-body::after{content:'';position:absolute;inset:0;background:linear-gradient(180deg,rgba(16,20,32,0.9),rgba(10,14,22,0.95));z-index:-1}
h1,h2,h3{background:linear-gradient(45deg,#6e00ff,#00ffff);-webkit-background-clip:text;-webkit-text-fill-color:transparent;border-bottom:1px solid var(--line);padding-bottom:.3em;margin-top:1.6em}
.card{background:rgba(12,16,26,0.6);border:2px solid var(--line);border-radius:12px;padding:20px;margin:16px 0;box-shadow:0 0 12px rgba(110,0,255,.15)}
.grid{display:grid;grid-template-columns:repeat(auto-fit,minmax(260px,1fr));gap:16px}
code{background:#0f1522;padding:.2em .4em;border-radius:4px}
pre{background:#0f1522;border:1px solid var(--line);border-radius:8px;padding:14px;overflow:auto}
table{width:100%;border-collapse:collapse;margin:14px 0;background:rgba(0,0,0,0.2)}
th,td{border:1px solid var(--line);padding:10px;text-align:left}
a{color:var(--cy);text-decoration:none} a:hover{color:var(--acc)}
details>summary{cursor:pointer}
*{color-scheme:dark}
</style>
<div class="markdown-body">
<div align="center" style="margin-top:8px">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6334832f86c3fdcdc7acbe8e/DcjT8q4QvTjfCJWEa8kC7.png" alt="Model Visualization" width="500px" style="border: 3px solid #333; box-shadow: 0 0 15px rgba(66, 0, 131, 0.5);" />
<div style="height:12px"></div>
<div style="font-size:1.6em;font-weight:800;background:linear-gradient(45deg,#6e00ff,#00ffff);-webkit-background-clip:text;-webkit-text-fill-color:transparent;">
Qwen3-4B-Reasoning-Backfill-v0.1
</div>
<div style="opacity:.75">Experimental reasoning-trace backfiller fine-tuned from <a href="https://huggingface.co/Qwen/Qwen3-4B">Qwen/Qwen3-4B</a></div>
</div>
<div class="card">
<h2>Overview</h2>
This is an experimental model trained to <b>reconstruct a plausible chain of reasoning</b> connecting a user-provided <code>INSTRUCTION</code> to a fixed <code>SOLUTION</code> while preserving the original answer. It focuses entirely on the route, not rewriting the destination, producing stepwise “thinking” traces that align with the target output. The goal is to enable <b>reasoning backfill</b> for legacy or non-reasoning datasets where collecting thought processes directly is impractical, such as older chat logs or instruction corpora. These traces can help bootstrap process-supervision signals, support teacher-style models, and deepen auditability of output behavior.
<p>
I would love to try this with larger or more stylized models to see how feasible it is to produce stylized or limited-domain reasoning traces. If youre a compute partner interested in scaling this line of work to larger backfill models, lets talk.
</p>
</div>
<div class="card">
<h3>Train setup</h3>
<ul>
<li>Base: Qwen/Qwen3-4B</li>
<li>Hardware: 1× H100</li>
<li>Epochs: 4 · Cosine schedule · warmup 40</li>
<li>Optimizer: <code>adamw_bnb_8bit</code> · lr <code>2.5e-5</code></li>
</ul>
</div>
<div class="card">
<h2>Intended Uses</h2>
<ul>
<li><b>Dataset augmentation</b>: generate process-supervision style traces for older instruction pairs.</li>
<li><b>Teacher bootstrapping</b>: seed trace-rich examples to train or distill teachers.</li>
<li><b>Analysis tooling</b>: produce rationales for audit of solution adherence.</li>
</ul>
<h3>Limitations</h3>
<ul>
<li>Traces are <b>plausible reconstructions</b>, not ground truth cognition.</li>
<li>Model can over-rationalize if solution is underspecified.</li>
</ul>
</div>
<div class="card">
<h2>Prompting · ChatML</h2>
<p>The model expects the input and output to be wrapped in <code>&lt;|instruction_start|&gt;&lt;|instruction_end|&gt;</code> and <code><|solution_start><|solution_end|></code> tokens respectively. The model will should render the output in <code>&lt;|thinking_start|&gt;&lt;|thinking_end|&gt;</code> tokens to help with parsing and identification of malformed outputs.</p>
<details><summary><b>Example Prompt</b></summary>
<pre>
&lt;|im_start|&gt;system
Your role as an assistant involves thoroughly reconstructing a plausible reasoning process that leads from a user-provided INSTRUCTION to a user-provided SOLUTION. You must not alter the SOLUTION. Each step should include concrete decisions and validations, such as: interpreting the INSTRUCTION, extracting constraints, selecting an approach, justifying key choices, verifying that intermediate results remain consistent with the provided SOLUTION, refining any errors, and a final consistency check noting any residual ambiguities. Use domain-appropriate specifics and avoid filler. Do not introduce new facts that would change the SOLUTION. Now, given an INSTRUCTION and a SOLUTION, reconstruct the Thought and present the Solution per the above guidelines. Your reasoning should begin and end with '&lt;|thinking_start|&gt;' and '&lt;|thinking_end&gt;' respectively.
&lt;|im_end|&gt;
&lt;|im_start|&gt;user
&lt;|instruction_start|&gt;
What are some tips for reducing stress at work? Your response should contain at least 4 bullet points.
Use markdown bullets like "* point". Include the keyword "mindfulness" twice.
&lt;|instruction_end|&gt;
&lt;|solution_start|&gt;
&dash; Practice mindfulness during breaks ...
&dash; Prioritize tasks and set boundaries ...
&dash; Incorporate mindfulness into routine activities ...
&dash; Stay physically active ...
<|solution_end|>
&lt;|im_end|&gt;
&lt;|im_start|&gt;assistant
&lt;|thinking_start|&gt;
</pre>
</details>
</div>
<div class="card">
<h2>Recommended Sampling</h2>
<pre><code>temperature: 0.71.0
top_p: 0.9
min_p: 0.05
max_tokens: as needed for trace length
stop: ["&lt;|im_end|&gt;"]</code></pre>
<p>For tighter adherence, drop temperature toward 0.50.7.</p>
</div>
<div class="card">
<h2>Quantizations</h2>
<ul>
<li><b>GGUF</b>: coming soon</li>
</ul>
</div>
<div class="card">
<details><summary><b>Axolotl config</b></summary>
```yaml
base_model: Qwen/Qwen3-4B
hub_model_id: joeyzero/Qwen3-4B-Reasoning-Backfill-V0.1
hf_use_auth_token: true
load_in_8bit: false
load_in_4bit: false
strict: false
gradient_accumulation_steps: 2
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 2.5e-5
max_grad_norm: 1.0
bf16: auto
tf32: false
datasets:
- path: joeyzero/OpenThought-144k-Backfill-0.2
type: chat_template
field_messages: messages
- path: joeyzero/dolphin-r1-backfill-0.0.2
type: chat_template
field_messages: messages
chat_template: chatml
dataset_prepared_path: prepared_data2
output_dir: ./thinking-backfill-0.1.17
sequence_len: 1024
sample_packing: true
pad_to_sequence_len: true
xformers_attention:
flash_attention: true
warmup_steps: 40
save_steps: 0.5
weight_decay: 0.02
wandb_project: reasoning-backfill
wandb_name: reasoning-backfill-attempt-04
```
</details>
</div>
<div align="center" style="opacity:.8;margin-top:10px">
<small>Made by <a href="https://huggingface.co/joeyzero">joeyzero</a>· contributions and issues welcome.</small>
</div>
</div>