Update README.md

committed by: system
parent 868d5588ac · commit 91a5426c06

README.md · 24 additions, 24 deletions
```diff
@@ -3,7 +3,7 @@ license: apache-2.0
 language:
 - en
 base_model:
-- Menlo/Jan-v1-4B
+- Qwen/Qwen3-4B-Thinking-2507
 pipeline_tag: text-generation
 ---
 # Jan-v1: Advanced Agentic Language Model
```
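For readability, this is the model card's YAML frontmatter after the hunk above is applied (reconstructed from the hunk's context and changed lines; nothing outside the hunk is shown):

```yaml
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-4B-Thinking-2507
pipeline_tag: text-generation
---
```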
```diff
@@ -26,21 +26,21 @@ Jan-v1 leverages the newly released [Qwen3-4B-thinking](https://huggingface.co/Q
 ### Question Answering (SimpleQA)
 
 For question-answering, Jan-v1 shows a significant performance gain from model scaling, achieving 91.2% accuracy.
 
 
 
 *The 91.2% SimpleQA accuracy represents a significant milestone in factual question answering for models of this scale, demonstrating the effectiveness of our scaling and fine-tuning approach.*
 
-### Report Generation & Factuality
+### Chat Benchmarks
 
-Evaluated on a benchmark testing factual report generation from web sources, using an LLM-as-judge. The benchmark includes our proprietary `Jan Exam - Longform` and the `DeepResearchBench`.
+These benchmarks evaluate the model's conversational and instructional capabilities.
 
-| Model | Average Overall Score |
-| :--- | :--- |
-| o4-mini | 7.30 |
-| **Jan-v1-4B (Ours)** | **7.17** |
-| gpt-4.1 | 6.90 |
-| Qwen3-4B-Thinking-2507 | 6.84 |
-| 4o-mini | 6.60 |
-| Jan-nano-128k | 5.63 |
+| Benchmark | JanV1 (Ours) | Qwen3-4B-Thinking-2507 | GPT-OSS-20B (High) | GPT-OSS-20B (Low) |
+| :--- | :--- | :--- | :--- | :--- |
+| EQBench | **83.61** | 82.61 | 78.35 | 78.35 |
+| CreativeWriting | **72.08** | 65.74 | 30.23 | 26.38 |
+| IFBench | **Prompt:** 0.3537<br>**Instruction:** 0.3910 | Prompt: 0.4490<br>Instruction: **0.4806** | Prompt: 0.5646<br>Instruction: 0.6000 | Prompt: 0.5034<br>Instruction: 0.5403 |
+| ArenaHardv2 | **25.3** | - | - | - |
 
 ## Quick Start
 
 ### Integration with Jan App
```
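The diff's Quick Start context mentions integration with the Jan App, which can serve models through an OpenAI-compatible local chat API. A minimal sketch of building such a request is below; the endpoint URL, port, and model id are assumptions, not taken from this diff, so check your Jan App settings for the real values.

```python
# Hypothetical sketch: preparing a request for a locally running Jan App
# server with an OpenAI-compatible chat endpoint. Host/port and model id
# are assumptions; verify them against your Jan App configuration.
import json
import urllib.request

JAN_SERVER = "http://localhost:1337/v1/chat/completions"  # assumed default


def build_chat_request(prompt: str, model: str = "Menlo/Jan-v1-4B") -> dict:
    """Build an OpenAI-style chat-completion payload for the local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }


payload = build_chat_request("Summarize the SimpleQA benchmark in one sentence.")

# Sending the request requires the Jan App server to be running:
# req = urllib.request.Request(
#     JAN_SERVER,
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

The payload follows the widely used OpenAI chat-completions shape (`model`, `messages`, sampling parameters), which is why it works against any server exposing that interface.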