diff --git a/README.md b/README.md
index 65391cb..21a3ec3 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@ license: apache-2.0
 language:
 - en
 base_model:
-- Menlo/Jan-v1-4B
+- Qwen/Qwen3-4B-Thinking-2507
 pipeline_tag: text-generation
 ---
 # Jan-v1: Advanced Agentic Language Model
@@ -26,21 +26,21 @@ Jan-v1 leverages the newly released [Qwen3-4B-thinking](https://huggingface.co/Q
 ### Question Answering (SimpleQA)
 For question-answering, Jan-v1 shows a significant performance gain from model scaling, achieving 91.2% accuracy.
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/65713d70f56f9538679e5a56/xuDDHjPnqzS_eziwShmBq.png)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65713d70f56f9538679e5a56/abEitIjvszFm7Z8mRHQz-.png)
 *The 91.2% SimpleQA accuracy represents a significant milestone in factual question answering for models of this scale, demonstrating the effectiveness of our scaling and fine-tuning approach.*
 
-### Report Generation & Factuality
-Evaluated on a benchmark testing factual report generation from web sources, using an LLM-as-judge. The benchmark includes our proprietary `Jan Exam - Longform` and the `DeepResearchBench`.
+### Chat Benchmarks
+
+These benchmarks evaluate the model's conversational and instructional capabilities.
+
+| Benchmark | JanV1 (Ours) | Qwen3-4B-Thinking-2507 | GPT-OSS-20B (High) | GPT-OSS-20B (Low) |
+| :--- | :--- | :--- | :--- | :--- |
+| EQBench | **83.61** | 82.61 | 78.35 | 78.35 |
+| CreativeWriting | **72.08** | 65.74 | 30.23 | 26.38 |
+| IFBench | **Prompt:** 0.3537<br>**Instruction:** 0.3910 | Prompt: 0.4490<br>Instruction: **0.4806** | Prompt: 0.5646<br>Instruction: 0.6000 | Prompt: 0.5034<br>Instruction: 0.5403 |
+| ArenaHardv2 | **25.3** | - | - | - |
 
-| Model | Average Overall Score |
-| :--- | :--- |
-| o4-mini | 7.30 |
-| **Jan-v1-4B (Ours)** | **7.17** |
-| gpt-4.1 | 6.90 |
-| Qwen3-4B-Thinking-2507 | 6.84 |
-| 4o-mini | 6.60 |
-| Jan-nano-128k | 5.63 |
 
 ## Quick Start
 ### Integration with Jan App
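
A note on the IFBench row added above: benchmarks in the IFEval family report two granularities, prompt-level accuracy (a prompt counts only if every verifiable instruction in it is satisfied) and instruction-level accuracy (the fraction of individual instructions satisfied, pooled across prompts). A minimal sketch of that scoring convention on hypothetical data, assuming IFBench follows it:

```python
# Hypothetical per-prompt results: each inner list marks whether one
# verifiable instruction in that prompt was satisfied.
results = [
    [True, True],          # prompt 1: both instructions followed
    [True, False, False],  # prompt 2: one of three followed
    [False],               # prompt 3: the single instruction missed
]

# Prompt-level accuracy: a prompt passes only if ALL its instructions pass.
prompt_acc = sum(all(r) for r in results) / len(results)

# Instruction-level accuracy: pooled over individual instructions.
flat = [ok for r in results for ok in r]
instruction_acc = sum(flat) / len(flat)

print(f"prompt: {prompt_acc:.4f}, instruction: {instruction_acc:.4f}")
```

This is consistent with the table, where every model's Prompt score sits below its corresponding Instruction score.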
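
Since the metadata change sets `pipeline_tag: text-generation` on top of base model `Qwen/Qwen3-4B-Thinking-2507`, the model should load through the standard `transformers` causal-LM path. A minimal sketch; the Hub repo ID `janhq/Jan-v1-4B` and the generation settings are illustrative assumptions, not taken from this diff:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "janhq/Jan-v1-4B"  # assumed repo ID; the diff only names the base model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Thinking-style Qwen3 checkpoints expect the chat template before generation.
messages = [{"role": "user", "content": "Who won the 2015 Nobel Prize in Literature?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```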