---
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
tags:
- jacobi-forcing
- speculative-decoding
- qwen3
- text-generation
language:
- en
pipeline_tag: text-generation
---
# Qwen3-1.7B Jacobi Forcing (v2math811, AR×10)
Jacobi-Forcing fine-tune of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) trained on a mixed code + math trajectory dataset (`v2math811`). Produces output identical in quality to the base AR model while supporting **Jacobi parallel decoding for a ~1.5–1.7× wall-clock speedup**.
## Highlights
- **Lossless quality**: HumanEval pass@1 / GSM8K accuracy match base AR generation (within noise).
- **Speedup**: 1.65× on HumanEval, 1.53× on GSM8K (vs greedy AR, same model).
- **Drop-in compatible** with HuggingFace `AutoModelForCausalLM` for AR generation. Jacobi inference requires the [JacobiForcing repo](https://github.com/) (custom forward kernel).
## Training recipe
Continued from base Qwen3-1.7B with the consistency + AR loss from the [JacobiForcing](https://arxiv.org/abs/2403.00835) paper:
| Setting | Value |
| --- | --- |
| Base | `Qwen/Qwen3-1.7B` |
| Dataset | code (OpenCodeInstruct buckets 8-11) + math (OpenThought2 buckets 8-11), 26 510 trajectory samples after traj_count ≤ 3 filter |
| Strategy | progressive noise window, N=32, window=16 |
| Epochs | 1 |
| Optimizer | AdamW |
| LR | 5e-6 (cosine, warmup 0.03) |
| Batch | per-device 1 × grad-accum 4 = 4 |
| Precision | bf16 |
| `AR_LOSS_WEIGHT` | **10** (paper default; tested 20 — slightly worse Jacobi acceptance) |
| GPU | 1× A100-80GB, ~4h47m |
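The LR schedule in the table (cosine decay with 3 % linear warmup, peak 5e-6) can be sketched as follows. This is a minimal illustration of the schedule shape, not the actual training script; `total_steps` is a hypothetical argument.

```python
import math

def lr_at(step, total_steps, peak_lr=5e-6, warmup_ratio=0.03):
    """Cosine LR schedule with linear warmup, matching the recipe above."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr over the first 3% of steps.
        return peak_lr * step / max(warmup_steps, 1)
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return 0.5 * peak_lr * (1 + math.cos(math.pi * progress))

# Effective batch size: per-device 1 × grad-accum 4 = 4 samples per update.
```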
## Benchmarks (1× A100, greedy)
| Bench | AR pass@1 / acc | Jacobi pass@1 / acc | AR tok/s | Jacobi tok/s | Speedup |
| --- | ---: | ---: | ---: | ---: | ---: |
| HumanEval (n=164) | 60.4 % | **61.0 %** | 37.2 | 61.3 | **1.65×** |
| GSM8K (n=653 subset) | 72.4 % | **74.3 %** | 38.0 | 58.3 | **1.53×** |
Jacobi internals (HumanEval): tok/iter = 1.74, average accept-window 87 % of N=32.
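The speedup column follows directly from the throughput numbers in the table:

```python
# Speedup = Jacobi tok/s ÷ AR tok/s, using the benchmark table above.
humaneval = 61.3 / 37.2  # ≈ 1.65×
gsm8k = 58.3 / 38.0      # ≈ 1.53×
print(f"HumanEval {humaneval:.2f}x, GSM8K {gsm8k:.2f}x")
```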
## Usage — standard AR
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
ckpt = "zcyzcyzcy/qwen3-1.7b-jf-v2math811-ar10"
tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(
    ckpt, torch_dtype=torch.bfloat16, device_map="cuda"
)
msgs = [{"role": "user", "content": "Write a Python function is_prime(n)."}]
inp = tok.apply_chat_template(
    msgs, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
ids = tok(inp, return_tensors="pt").to("cuda")
out = model.generate(**ids, max_new_tokens=200, do_sample=False)
print(tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Usage — Jacobi parallel decoding
Jacobi inference uses a custom `jacobi_forward_greedy` registered on `Qwen3ForCausalLM`. See the [JacobiForcing repo](https://github.com/) for the full inference script, or use the snippet:
```python
from transformers import Qwen3ForCausalLM
from generate_trajectory.generation.qwen3_modeling_jacobi_forcing_greedy import (
    jacobi_forward_greedy,
)
Qwen3ForCausalLM.jacobi_forward_greedy = jacobi_forward_greedy
# ... call model.jacobi_forward_greedy(...) for prefill + generation phases.
```
The model checkpoint itself is a standard Qwen3 — no architecture changes — so any speculative-decoding framework that accepts a Qwen3 base model can drive it.
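For intuition, Jacobi decoding refines a draft block of N tokens in parallel each iteration until it reaches a fixed point, which by construction equals greedy AR output. A toy sketch with a deterministic next-token function standing in for the model forward pass (this is illustrative only, not the repo's `jacobi_forward_greedy`):

```python
def jacobi_decode(prefix, next_token, n=8, max_iters=50):
    """Toy Jacobi fixed-point decoding.

    next_token(seq) -> int stands in for a greedy LM forward pass.
    All n draft positions are refined in parallel each iteration; the
    converged block equals plain autoregressive greedy decoding.
    """
    block = [0] * n  # arbitrary initial draft
    for _ in range(max_iters):
        # One "parallel" pass: position i conditions on prefix + draft[:i].
        new_block = [next_token(prefix + block[:i]) for i in range(n)]
        if new_block == block:  # fixed point reached
            return block
        block = new_block
    return block

# Toy "model": next token is (sum of sequence + 1) mod 10.
nt = lambda seq: (sum(seq) + 1) % 10
print(jacobi_decode([1, 2], nt, n=4))  # → [4, 8, 6, 2], same as greedy AR
```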
## Citation
```bibtex
@article{kou2024cllm,
  title={CLLMs: Consistency Large Language Models},
  author={Kou, Siqi and Hu, Lanxiang and He, Zhezhi and Deng, Zhijie and Zhang, Hao},
  journal={arXiv preprint arXiv:2403.00835},
  year={2024}
}
```
## License
Apache 2.0, inherited from the base Qwen3-1.7B model.