---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-1.7B/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-1.7B
---

# CS-552 Middle West Math Model

This checkpoint is the math-specialized CS-552 model for
`cs-552-2026-middle-west/math_model`. It starts from `Qwen/Qwen3-1.7B` and keeps the model weights in
safetensors format at the repository root for vLLM compatibility.

## Intended Evaluation

The course CI evaluates this repository on the math benchmark. Prompts are
rendered with the tokenizer chat template via:

```python
tokenizer.apply_chat_template(messages, add_generation_prompt=True)
```

The template injects a math-focused system prompt when no system message is
provided and asks the model to place its final answer in `\boxed{...}`.
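
Because the final answer is expected inside `\boxed{...}`, a small parser can help when checking outputs locally. The sketch below is an illustration only: the function name `extract_boxed_answer` is hypothetical, and how the course CI actually parses answers is not described in this card. It returns the contents of the last `\boxed{...}` in a completion and tracks brace depth so that nested LaTeX such as `\frac{1}{2}` survives intact.

```python
from typing import Optional


def extract_boxed_answer(text: str) -> Optional[str]:
    # Hypothetical helper: returns the contents of the LAST \boxed{...}
    # in `text`, or None if there is no complete boxed expression.
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None

    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: everything collected so far
                # is the boxed content.
                return "".join(chars)
        chars.append(ch)

    # Ran out of text before the braces balanced (e.g. truncated output).
    return None
```

Taking the last occurrence is a design choice, made on the assumption that a thinking-mode completion may box intermediate values before the final one.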

## Generation

- Thinking mode: enabled in the chat template.
- Temperature: 0.6
- Top-p: 0.95
- Top-k: 20
- Repetition penalty: 1.0
- Max new tokens: 3584
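
To reproduce these settings outside the course CI, the values can be collected once and mapped to the argument names each runtime expects. This is a minimal sketch, not the contents of `generation_config.json`: the helper names are hypothetical, and `do_sample=True` is an assumption added because Transformers only applies temperature, top-p, and top-k when sampling is enabled.

```python
# Sampling values copied from the Generation list in this card.
SAMPLING = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "repetition_penalty": 1.0,
}
MAX_NEW_TOKENS = 3584


def hf_generate_kwargs() -> dict:
    # Keyword arguments for transformers' `model.generate(...)`.
    # do_sample=True is an assumption, not something this card states.
    return {**SAMPLING, "do_sample": True, "max_new_tokens": MAX_NEW_TOKENS}


def vllm_sampling_kwargs() -> dict:
    # Keyword arguments for `vllm.SamplingParams(...)`, which names the
    # output length cap `max_tokens` rather than `max_new_tokens`.
    return {**SAMPLING, "max_tokens": MAX_NEW_TOKENS}
```

Typical usage would be `model.generate(**inputs, **hf_generate_kwargs())` with Transformers, or `SamplingParams(**vllm_sampling_kwargs())` with vLLM.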

## Files

- `model.safetensors`: model weights
- `config.json`: model configuration
- `generation_config.json`: course sampling defaults
- `tokenizer_config.json`, `tokenizer.json`, `vocab.json`, `merges.txt`:
  tokenizer assets
- `chat_template.jinja`: math prompt and Qwen3 thinking-mode chat template
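
Before uploading or evaluating a local copy, it can be useful to confirm that every file listed above sits at the repository root. The following is a hedged sketch using only the standard library; `missing_files` is a hypothetical helper and the path passed to it is whatever directory holds your local checkout.

```python
from pathlib import Path

# File names taken from the Files list in this card.
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "generation_config.json",
    "tokenizer_config.json",
    "tokenizer.json",
    "vocab.json",
    "merges.txt",
    "chat_template.jinja",
]


def missing_files(repo_dir) -> list:
    # Return the expected file names that are not present directly
    # under `repo_dir` (subdirectories are not searched).
    root = Path(repo_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means all listed files were found at the root of the given directory.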