diff --git a/README.md b/README.md
index 65391cb..21a3ec3 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@ license: apache-2.0
language:
- en
base_model:
-- Menlo/Jan-v1-4B
+- Qwen/Qwen3-4B-Thinking-2507
pipeline_tag: text-generation
---
# Jan-v1: Advanced Agentic Language Model
@@ -26,21 +26,21 @@ Jan-v1 leverages the newly released [Qwen3-4B-thinking](https://huggingface.co/Q
### Question Answering (SimpleQA)
For question answering, Jan-v1 shows a significant performance gain from model scaling, achieving 91.2% accuracy.
-
+
*The 91.2% SimpleQA accuracy is a notable result in factual question answering for a model of this scale, reflecting the effectiveness of our scaling and fine-tuning approach.*
-### Report Generation & Factuality
-Evaluated on a benchmark testing factual report generation from web sources, using an LLM-as-judge. The benchmark includes our proprietary `Jan Exam - Longform` and the `DeepResearchBench`.
+### Chat Benchmarks
+
+These benchmarks evaluate the model's conversational and instructional capabilities.
+
+| Benchmark | Jan-v1-4B (Ours) | Qwen3-4B-Thinking-2507 | GPT-OSS-20B (High) | GPT-OSS-20B (Low) |
+| :--- | :--- | :--- | :--- | :--- |
+| EQBench | **83.61** | 82.61 | 78.35 | 78.35 |
+| CreativeWriting | **72.08** | 65.74 | 30.23 | 26.38 |
+| IFBench | **Prompt:** 0.3537<br>**Instruction:** 0.3910 | Prompt: 0.4490<br>Instruction: **0.4806** | Prompt: 0.5646<br>Instruction: 0.6000 | Prompt: 0.5034<br>Instruction: 0.5403 |
+| ArenaHardv2 | **25.3** | - | - | - |
-| Model | Average Overall Score |
-| :--- | :--- |
-| o4-mini | 7.30 |
-| **Jan-v1-4B (Ours)** | **7.17** |
-| gpt-4.1 | 6.90 |
-| Qwen3-4B-Thinking-2507 | 6.84 |
-| 4o-mini | 6.60 |
-| Jan-nano-128k | 5.63 |
## Quick Start
### Integration with Jan App