---
base_model: Qwen/Qwen2.5-7B-Instruct
language:
- en
- multilingual
license: apache-2.0
tags:
- qwen2
- 4-bit
- gptq
- quantized
- text-generation
- coding
- reasoning
- agentic
- 7b
---

# 🦊 Fox 1.5

## Benchmark Board

| Metric | Value |
|--------|-------|
| **Throughput** | ~35 tokens/sec (RTX 3050, 6 GB VRAM) |
| **Avg Latency** | ~4–5 s per response |
| **Success Rate** | 100% (5/5 smoke-test tasks below) |
| **Tokens/Response** | ~150 avg |
| **MMLU (ref)** | ~72% |
| **GSM8K (ref)** | ~58% |
| **HumanEval (ref)** | ~55% |

A minimal script for reproducing the throughput figure is included at the end of this card.

### Task Results

| Task | Prompt | Check | Result |
|------|--------|-------|--------|
| Math | "A farmer has 17 sheep. All but 9 run away. How many sheep are left?" | `9` | ✅ |
| Coding | "Write a Python function to check if a number is prime." | `def` | ✅ |
| Knowledge | "What is the capital of Greece?" | `athens` | ✅ |
| Logic | "If all cats are animals and some animals are pets, then some cats are pets. True or false?" | `true` | ✅ |
| Translation | "Translate to Greek: Hello, how are you?" | `γεια` | ✅ |

---

## Quick Facts

| Property | Value |
|----------|-------|
| Base Model | Qwen2.5-7B-Instruct |
| Quantization | GPTQ 4-bit |
| Parameters | 7B |
| Context Length | 32K tokens |
| Size | 5.3 GB |
| VRAM Required | ~6 GB |
| License | Apache 2.0 |

## Capabilities

- **Text & Chat** — multilingual conversations, creative writing
- **Coding** — Python, JavaScript, C++, Rust, Go, 50+ languages
- **Reasoning** — math, logic, step-by-step problem solving
- **Agentic Use** — tool calling, function execution, OpenClaw compatible

## Run it

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "teolm30/Fox-1.5"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # GPTQ kernels expect fp16
    device_map="auto"
)

# Build a chat prompt using the model's chat template
messages = [{"role": "user", "content": "Explain quantum entanglement in simple terms"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The checkpoint is GPTQ-quantized, so `transformers` needs the GPTQ backend installed: `pip install auto-gptq optimum`

## Limitations

- Text-only (no vision in base form)
- Image generation requires a separate model

---

*Built by T_craftClaw 🔥 | Owner: teolm30*
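
---

## Appendix: Reproducing the Throughput Figure

One way to sanity-check the ~35 tok/s number from the Benchmark Board on your own hardware is a simple wall-clock timing of `generate`. The sketch below is a minimal version of such a measurement, not the exact harness used for the table; the prompt and `max_new_tokens` value are illustrative assumptions.

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "teolm30/Fox-1.5"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # GPTQ kernels expect fp16
    device_map="auto",
)

# Illustrative prompt; any short chat prompt works for a rough estimate
messages = [{"role": "user", "content": "Write a Python function to check if a number is prime."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=256)
if torch.cuda.is_available():
    torch.cuda.synchronize()  # make sure GPU work is done before stopping the clock
elapsed = time.perf_counter() - start

# Count only newly generated tokens, not the prompt tokens
new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.2f}s -> {new_tokens / elapsed:.1f} tok/s")
```

Results vary with prompt length, generation length, kernel backend, and sampling settings, so expect numbers in the neighborhood of the table's figure rather than an exact match.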