---
base_model: Qwen/Qwen2.5-7B-Instruct
language:
- en
- multilingual
license: apache-2.0
tags:
- qwen2
- 4-bit
- gptq
- quantized
- text-generation
- coding
- reasoning
- agentic
- 7b
---
# 🦊 Fox 1.5
## Benchmark Board
| Metric | Value |
|--------|-------|
| **Throughput** | ~35 tokens/sec (RTX 3050, 6 GB VRAM) |
| **Avg Latency** | ~4-5 s per response |
| **Success Rate** | 100% (5/5 smoke-test tasks below) |
| **Tokens/Response** | ~150 avg |
| **MMLU (reference)** | ~72% |
| **GSM8K (reference)** | ~58% |
| **HumanEval (reference)** | ~55% |
### Task Results
| Task | Prompt | Expected substring | Result |
|------|--------|-------|--------|
| Math | "A farmer has 17 sheep. All but 9 run away. How many sheep left?" | `9` | ✅ |
| Coding | "Write a Python function to check if a number is prime." | `def` | ✅ |
| Knowledge | "What is the capital of Greece?" | `athens` | ✅ |
| Logic | "If all cats are animals and some animals are pets, then some cats are pets. True or false?" | `true` | ✅ |
| Translation | "Translate to Greek: Hello, how are you?" | `γεια` | ✅ |
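
Each task passes if the response contains the expected substring, case-insensitively. A minimal sketch of such a harness (the prompts mirror the table, but the helper names and the exact script behind these numbers are illustrative assumptions, not published code):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "teolm30/Fox-1.5"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

# (task, prompt, expected substring) triples mirroring the table above.
TASKS = [
    ("Math", "A farmer has 17 sheep. All but 9 run away. How many sheep left?", "9"),
    ("Knowledge", "What is the capital of Greece?", "athens"),
    ("Translation", "Translate to Greek: Hello, how are you?", "γεια"),
]

def generate(prompt: str) -> str:
    """Run one chat turn and return only the newly generated text."""
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

for task, prompt, expected in TASKS:
    passed = expected.lower() in generate(prompt).lower()
    print(f"{task}: {'pass' if passed else 'fail'}")
```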
---
## Quick Facts
| Property | Value |
|----------|-------|
| Base Model | Qwen2.5-7B-Instruct |
| Quantization | GPTQ 4-bit |
| Parameters | 7B |
| Context Length | 32K tokens |
| Size | 5.3 GB |
| VRAM Required | ~6 GB |
| License | Apache 2.0 |
## Capabilities
- **Text & Chat** — multilingual conversations, creative writing
- **Coding** — Python, JavaScript, C++, Rust, Go, and 50+ other languages
- **Reasoning** — math, logic, step-by-step problem solving
- **Agentic Use** — tool calling, function execution, OpenClaw compatible (see the sketch below)
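
Since the base model is Qwen2.5-7B-Instruct, tool calling should follow Qwen2.5's chat-template format. A minimal sketch of exposing a tool to the model via `transformers` (the `get_weather` function is an illustrative placeholder, not part of this repo):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("teolm30/Fox-1.5", trust_remote_code=True)

def get_weather(city: str) -> str:
    """
    Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    ...

# The chat template derives a JSON schema for each tool from its
# signature and docstring and injects it into the prompt.
messages = [{"role": "user", "content": "What's the weather in Athens?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    tokenize=False,
    add_generation_prompt=True,
)
```

Generate from `prompt` as in the example below; when the model decides to call the tool, it emits a structured tool call for your code to execute and return as a `"tool"` role message.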
## Run it
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "teolm30/Fox-1.5"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Format the conversation with the model's chat template
messages = [{"role": "user", "content": "Explain quantum entanglement in simple terms"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Keep inputs on the device(s) the model was dispatched to
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
The checkpoint is GPTQ-quantized, so loading it also requires: `pip install auto-gptq optimum`
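
With those packages installed, `from_pretrained` should pick up the GPTQ quantization config stored in the checkpoint itself; a minimal sketch, assuming the standard transformers/optimum GPTQ integration:

```python
from transformers import AutoModelForCausalLM

# The quantization config ships with the checkpoint, so no extra
# arguments are needed beyond placing the 4-bit weights on the GPU.
model = AutoModelForCausalLM.from_pretrained(
    "teolm30/Fox-1.5",
    device_map="auto",
)
```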
## Limitations
- Text-only (no vision in base form)
- Image generation requires a separate model
---
*Built by T_craftClaw 🔥 | Owner: teolm30*