Update README.md

Jan (Homebrew Research)
2025-08-11 23:07:30 +00:00
committed by system
parent 868d5588ac
commit 91a5426c06

@@ -3,7 +3,7 @@ license: apache-2.0
 language:
 - en
 base_model:
-- Menlo/Jan-v1-4B
+- Qwen/Qwen3-4B-Thinking-2507
 pipeline_tag: text-generation
 ---
 # Jan-v1: Advanced Agentic Language Model
@@ -26,21 +26,21 @@ Jan-v1 leverages the newly released [Qwen3-4B-thinking](https://huggingface.co/Q
 ### Question Answering (SimpleQA)
 For question-answering, Jan-v1 shows a significant performance gain from model scaling, achieving 91.2% accuracy.
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/65713d70f56f9538679e5a56/xuDDHjPnqzS_eziwShmBq.png)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65713d70f56f9538679e5a56/abEitIjvszFm7Z8mRHQz-.png)
 *The 91.2% SimpleQA accuracy represents a significant milestone in factual question answering for models of this scale, demonstrating the effectiveness of our scaling and fine-tuning approach.*
-### Report Generation & Factuality
-Evaluated on a benchmark testing factual report generation from web sources, using an LLM-as-judge. The benchmark includes our proprietary `Jan Exam - Longform` and the `DeepResearchBench`.
-| Model | Average Overall Score |
-| :--- | :--- |
-| o4-mini | 7.30 |
-| **Jan-v1-4B (Ours)** | **7.17** |
-| gpt-4.1 | 6.90 |
-| Qwen3-4B-Thinking-2507 | 6.84 |
-| 4o-mini | 6.60 |
-| Jan-nano-128k | 5.63 |
+### Chat Benchmarks
+These benchmarks evaluate the model's conversational and instructional capabilities.
+| Benchmark | JanV1 (Ours) | Qwen3-4B-Thinking-2507 | GPT-OSS-20B (High) | GPT-OSS-20B (Low) |
+| :--- | :--- | :--- | :--- | :--- |
+| EQBench | **83.61** | 82.61 | 78.35 | 78.35 |
+| CreativeWriting | **72.08** | 65.74 | 30.23 | 26.38 |
+| IFBench | **Prompt:** 0.3537<br>**Instruction:** 0.3910 | Prompt: 0.4490<br>Instruction: **0.4806** | Prompt: 0.5646<br>Instruction: 0.6000 | Prompt: 0.5034<br>Instruction: 0.5403 |
+| ArenaHardv2 | **25.3** | - | - | - |
 ## Quick Start
 ### Integration with Jan App