---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-4B
tags:
- axolotl
- generated_from_trainer
datasets:
- joeyzero/OpenThought-144k-Backfill-0.2
- joeyzero/dolphin-r1-backfill-0.0.2
model-index:
- name: Qwen3-4B-Reasoning-Backfill-V0.1
  results: []
thumbnail: "https://cdn-uploads.huggingface.co/production/uploads/6334832f86c3fdcdc7acbe8e/DcjT8q4QvTjfCJWEa8kC7.png"
---

<!DOCTYPE html>
<style>
:root { --bg:#0b0f14; --panel:#121824; --ink:#c9d1d9; --acc:#6e00ff; --cy:#00ffff; --line:#2a3242; }
html,body{background:#000;color:var(--ink);font-family:ui-sans-serif,system-ui,Segoe UI,Roboto,Helvetica,Arial,sans-serif;margin:0;padding:0}
.markdown-body{color:var(--ink);max-width:1000px;margin:36px auto;padding:36px;border-radius:12px;position:relative;overflow:hidden}
.markdown-body::after{content:'';position:absolute;inset:0;background:linear-gradient(180deg,rgba(16,20,32,0.9),rgba(10,14,22,0.95));z-index:-1}
h1,h2,h3{background:linear-gradient(45deg,#6e00ff,#00ffff);-webkit-background-clip:text;-webkit-text-fill-color:transparent;border-bottom:1px solid var(--line);padding-bottom:.3em;margin-top:1.6em}
.card{background:rgba(12,16,26,0.6);border:2px solid var(--line);border-radius:12px;padding:20px;margin:16px 0;box-shadow:0 0 12px rgba(110,0,255,.15)}
.grid{display:grid;grid-template-columns:repeat(auto-fit,minmax(260px,1fr));gap:16px}
code{background:#0f1522;padding:.2em .4em;border-radius:4px}
pre{background:#0f1522;border:1px solid var(--line);border-radius:8px;padding:14px;overflow:auto}
table{width:100%;border-collapse:collapse;margin:14px 0;background:rgba(0,0,0,0.2)}
th,td{border:1px solid var(--line);padding:10px;text-align:left}
a{color:var(--cy);text-decoration:none} a:hover{color:var(--acc)}
details>summary{cursor:pointer}
*{color-scheme:dark}
</style>

<div class="markdown-body">

<div align="center" style="margin-top:8px">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6334832f86c3fdcdc7acbe8e/DcjT8q4QvTjfCJWEa8kC7.png" alt="Model Visualization" width="500px" style="border: 3px solid #333; box-shadow: 0 0 15px rgba(66, 0, 131, 0.5);" />
<div style="height:12px"></div>
<div style="font-size:1.6em;font-weight:800;background:linear-gradient(45deg,#6e00ff,#00ffff);-webkit-background-clip:text;-webkit-text-fill-color:transparent;">
Qwen3-4B-Reasoning-Backfill-v0.1
</div>
<div style="opacity:.75">Experimental reasoning-trace backfiller fine-tuned from <a href="https://huggingface.co/Qwen/Qwen3-4B">Qwen/Qwen3-4B</a></div>
</div>

<div class="card">
<h2>Overview</h2>
This is an experimental model trained to <b>reconstruct a plausible chain of reasoning</b> connecting a user-provided <code>INSTRUCTION</code> to a fixed <code>SOLUTION</code> while preserving the original answer. It focuses entirely on the route rather than rewriting the destination, producing stepwise “thinking” traces that align with the target output. The goal is to enable <b>reasoning backfill</b> for legacy or non-reasoning datasets where collecting thought processes directly is impractical, such as older chat logs or instruction corpora. These traces can help bootstrap process-supervision signals, support teacher-style models, and make output behavior easier to audit.

<p>
I would love to try this with larger or more stylized models to see how feasible it is to produce stylized or limited-domain reasoning traces. If you’re a compute partner interested in scaling this line of work to larger backfill models, let’s talk.
</p>

</div>

<div class="card">
<h3>Train setup</h3>
<ul>
<li>Base: Qwen/Qwen3-4B</li>
<li>Hardware: 1× H100</li>
<li>Epochs: 4 · Cosine schedule · warmup 40</li>
<li>Optimizer: <code>adamw_bnb_8bit</code> · lr <code>2.5e-5</code></li>
</ul>
</div>

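<div class="card">
<p>To make the schedule in the list above concrete, here is a minimal sketch of a cosine learning-rate curve with 40 warmup steps and a peak of <code>2.5e-5</code>. It assumes the usual shape of linear warmup followed by a half-cosine decay to zero; the function name <code>lr_at_step</code> and the <code>total_steps</code> value are hypothetical, since the card does not state the total number of optimizer steps.</p>

```python
import math


def lr_at_step(step: int, total_steps: int,
               base_lr: float = 2.5e-5, warmup_steps: int = 40) -> float:
    """Learning rate at a given optimizer step.

    Assumed shape: linear warmup from 0 to base_lr over warmup_steps,
    then a half-cosine decay from base_lr down to 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase that has elapsed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical step count, not taken from the training run
    for s in (0, 20, 40, 520, 1000):
        print(f"step {s:4d}: lr = {lr_at_step(s, total):.3e}")
```

<p>The curve peaks exactly at the end of warmup and reaches zero on the final step, so most of the run is spent below the nominal learning rate.</p>
</div>
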
<div class="card">
<h2>Intended Uses</h2>
<ul>
<li><b>Dataset augmentation</b>: generate process-supervision style traces for older instruction pairs.</li>
<li><b>Teacher bootstrapping</b>: seed trace-rich examples to train or distill teachers.</li>
<li><b>Analysis tooling</b>: produce rationales for auditing how well a solution adheres to its instruction.</li>
</ul>

<h3>Limitations</h3>
<ul>
<li>Traces are <b>plausible reconstructions</b>, not ground-truth cognition.</li>
<li>The model can over-rationalize if the solution is underspecified.</li>
</ul>
</div>

<div class="card">
<h2>Prompting · ChatML</h2>
<p>The model expects the instruction and the solution to be wrapped in <code><|instruction_start|><|instruction_end|></code> and <code><|solution_start|><|solution_end|></code> tokens respectively. The model should render its output between <code><|thinking_start|><|thinking_end|></code> tokens, which helps with parsing and with identifying malformed outputs.</p>
<details><summary><b>Example Prompt</b></summary>
<pre>
<|im_start|>system
Your role as an assistant involves thoroughly reconstructing a plausible reasoning process that leads from a user-provided INSTRUCTION to a user-provided SOLUTION. You must not alter the SOLUTION. Each step should include concrete decisions and validations, such as: interpreting the INSTRUCTION, extracting constraints, selecting an approach, justifying key choices, verifying that intermediate results remain consistent with the provided SOLUTION, refining any errors, and a final consistency check noting any residual ambiguities. Use domain-appropriate specifics and avoid filler. Do not introduce new facts that would change the SOLUTION. Now, given an INSTRUCTION and a SOLUTION, reconstruct the Thought and present the Solution per the above guidelines. Your reasoning should begin and end with '<|thinking_start|>' and '<|thinking_end>' respectively.
<|im_end|>
<|im_start|>user
<|instruction_start|>
What are some tips for reducing stress at work? Your response should contain at least 4 bullet points.
Use markdown bullets like "* point". Include the keyword "mindfulness" twice.
<|instruction_end|>

<|solution_start|>
‐ Practice mindfulness during breaks ...
‐ Prioritize tasks and set boundaries ...
‐ Incorporate mindfulness into routine activities ...
‐ Stay physically active ...
<|solution_end|>
<|im_end|>
<|im_start|>assistant
<|thinking_start|>

</pre>
</details>

</div>

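<div class="card">
<p>The prompt layout described above can be assembled and parsed with plain string handling. The sketch below is an illustration, not the author's tooling: <code>build_prompt</code> and <code>split_trace</code> are hypothetical helper names, the whitespace mirrors the example prompt as displayed (the exact whitespace seen in training is not stated), and the "malformed" rules are assumptions about what a well-formed completion looks like.</p>

```python
from typing import Optional, Tuple

THINK_START = "<|thinking_start|>"
THINK_END = "<|thinking_end|>"


def build_prompt(system: str, instruction: str, solution: str) -> str:
    """Assemble a ChatML prompt that ends with a prefilled thinking marker."""
    user = (
        f"<|instruction_start|>\n{instruction}\n<|instruction_end|>\n\n"
        f"<|solution_start|>\n{solution}\n<|solution_end|>"
    )
    return (
        f"<|im_start|>system\n{system}\n<|im_end|>\n"
        f"<|im_start|>user\n{user}\n<|im_end|>\n"
        f"<|im_start|>assistant\n{THINK_START}\n"
    )


def split_trace(completion: str) -> Optional[Tuple[str, str]]:
    """Split a completion into (trace, remainder).

    Returns None when the completion looks malformed: the end marker is
    missing or repeated, a second start marker appears, or the trace is empty.
    """
    text = completion.strip()
    # The start marker is prefilled in the prompt, so it is optional here.
    if text.startswith(THINK_START):
        text = text[len(THINK_START):]
    if text.count(THINK_END) != 1 or THINK_START in text:
        return None
    trace, rest = text.split(THINK_END)
    trace = trace.strip()
    if not trace:
        return None
    return trace, rest.replace("<|im_end|>", "").strip()
```

<p>Pass the system prompt from the example verbatim as <code>system</code>. A <code>None</code> result from <code>split_trace</code> is a cheap signal to drop or regenerate a sample before it enters a backfilled dataset.</p>
</div>
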
<div class="card">
<h2>Recommended Sampling</h2>
<pre><code>temperature: 0.7–1.0
top_p: 0.9
min_p: 0.05
max_tokens: as needed for trace length
stop: ["<|im_end|>"]</code></pre>
<p>For tighter adherence, drop temperature toward 0.5–0.7.</p>
</div>

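<div class="card">
<p>As a rough illustration, the settings above map onto <code>transformers</code> generation arguments as sketched below. Several things here are assumptions rather than documented usage: the helper names are hypothetical, the repo id is taken from <code>hub_model_id</code> in the Axolotl config, the temperatures <code>0.8</code> and <code>0.6</code> are arbitrary picks from inside the recommended ranges, <code>max_new_tokens=1024</code> is a placeholder, and <code>min_p</code> and <code>stop_strings</code> require a reasonably recent <code>transformers</code> release.</p>

```python
def sampling_kwargs(tight: bool = False, max_new_tokens: int = 1024) -> dict:
    """Generation kwargs following the recommended sampling settings.

    tight=True lowers the temperature for closer adherence to the solution.
    """
    return {
        "do_sample": True,
        "temperature": 0.6 if tight else 0.8,  # picks within the stated ranges
        "top_p": 0.9,
        "min_p": 0.05,
        "max_new_tokens": max_new_tokens,  # placeholder; size to trace length
    }


def generate_trace(prompt: str,
                   model_id: str = "joeyzero/Qwen3-4B-Reasoning-Backfill-V0.1",
                   tight: bool = False) -> str:
    """Generate a completion for an already formatted ChatML prompt."""
    # Imported lazily so sampling_kwargs() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        stop_strings=["<|im_end|>"],
        tokenizer=tokenizer,  # needed by generate() when stop_strings is set
        **sampling_kwargs(tight),
    )
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=False)
```

<p>Special tokens are kept in the decoded text so that the thinking markers remain available for parsing.</p>
</div>
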
<div class="card">
<h2>Quantizations</h2>
<ul>
<li><b>GGUF</b>: coming soon</li>
</ul>
</div>

<div class="card">
<details><summary><b>Axolotl config</b></summary>

```yaml
base_model: Qwen/Qwen3-4B
hub_model_id: joeyzero/Qwen3-4B-Reasoning-Backfill-V0.1
hf_use_auth_token: true

load_in_8bit: false
load_in_4bit: false
strict: false

gradient_accumulation_steps: 2
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 2.5e-5
max_grad_norm: 1.0
bf16: auto
tf32: false

datasets:
  - path: joeyzero/OpenThought-144k-Backfill-0.2
    type: chat_template
    field_messages: messages
  - path: joeyzero/dolphin-r1-backfill-0.0.2
    type: chat_template
    field_messages: messages

chat_template: chatml
dataset_prepared_path: prepared_data2
output_dir: ./thinking-backfill-0.1.17

sequence_len: 1024
sample_packing: true
pad_to_sequence_len: true
xformers_attention:
flash_attention: true
warmup_steps: 40
save_steps: 0.5

weight_decay: 0.02
wandb_project: reasoning-backfill
wandb_name: reasoning-backfill-attempt-04

```

</details>
</div>
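<div class="card">
<p>For readers comparing runs, the batch settings in the config above imply a fairly small effective batch. The arithmetic below is a back-of-the-envelope sketch; the variable names are illustrative, the GPU count comes from the "1× H100" line in the train setup, and the token figure is an upper bound because packed sequences may still contain padding.</p>

```python
# Values copied from the Axolotl config and train setup.
micro_batch_size = 2
gradient_accumulation_steps = 2
num_gpus = 1
sequence_len = 1024

# Packed sequences consumed per optimizer step.
sequences_per_step = micro_batch_size * gradient_accumulation_steps * num_gpus

# Upper bound on tokens per optimizer step with sample packing enabled.
tokens_per_step = sequences_per_step * sequence_len

print(sequences_per_step)  # 4
print(tokens_per_step)     # 4096
```

</div>
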
<div align="center" style="opacity:.8;margin-top:10px">
<small>Made by <a href="https://huggingface.co/joeyzero">joeyzero</a> · contributions and issues welcome.</small>
</div>
</div>