A thinking-trained variant of Meet7 0.6B Experimental that re-enables Qwen3's built-in chain-of-thought reasoning at inference time.
Note: At 0.6B scale, thinking mode does not improve benchmark performance over the non-thinking Experimental model. It improves only on the base model, which was itself tested with reasoning enabled. The model still lacks the capacity to reason coherently across extended thought chains. For best results, use Meet7 Experimental without thinking mode, or Meet7 0.6B if BoolQ-style QA is your primary use case.
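If you do run the model with thinking enabled, you will usually want to separate the reasoning trace from the final answer. The sketch below assumes this variant keeps the Qwen3 convention of wrapping its reasoning in `<think>...</think>` before the answer, which this card does not confirm. `split_thinking` is a hypothetical helper name, not part of any library.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>, as in Qwen3 output.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Split decoded model output into (reasoning, answer).

    If no <think> block is present, the reasoning is returned as an
    empty string and the whole text is treated as the answer.
    """
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything after the closing tag is treated as the final answer.
    answer = text[match.end():].strip()
    return reasoning, answer


if __name__ == "__main__":
    raw = "<think>\nThe passage says the bridge opened in 1932.\n</think>\n\nyes"
    reasoning, answer = split_thinking(raw)
    print("reasoning:", reasoning)
    print("answer:", answer)
```

Scoring only the `answer` part keeps a long or incoherent thought chain from being mistaken for the model's final output.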
Benchmarks
0-shot evaluation across all four models in the Meet7 family. Scores are acc_norm.
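| Task          | Base   | Meet7  | Experimental | Exp_Thinking |
|---------------|--------|--------|--------------|--------------|
| BoolQ         | 0.3798 | 0.5554 | 0.3991       | 0.3783       |
| ARC Easy      | 0.3384 | 0.3952 | 0.3965       | 0.3662       |
| ARC Challenge | 0.2841 | 0.3285 | 0.3259       | 0.3012       |
| HellaSwag     | 0.3981 | 0.4205 | 0.4265       | 0.4261       |
| PIQA          | 0.6338 | 0.6583 | 0.6687       | 0.6540       |
| Winogrande    | 0.5225 | 0.5201 | 0.5304       | 0.5241       |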
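To see how the thinking variant compares per task, the scores can be differenced directly. This is a minimal sketch that only does arithmetic on the numbers reported in this card; the helper names `delta` and `mean_score` are illustrative and do not come from any evaluation tool.

```python
# acc_norm scores copied from the benchmark results in this card (0-shot).
SCORES = {
    "BoolQ":         {"Base": 0.3798, "Meet7": 0.5554, "Experimental": 0.3991, "Exp_Thinking": 0.3783},
    "ARC Easy":      {"Base": 0.3384, "Meet7": 0.3952, "Experimental": 0.3965, "Exp_Thinking": 0.3662},
    "ARC Challenge": {"Base": 0.2841, "Meet7": 0.3285, "Experimental": 0.3259, "Exp_Thinking": 0.3012},
    "HellaSwag":     {"Base": 0.3981, "Meet7": 0.4205, "Experimental": 0.4265, "Exp_Thinking": 0.4261},
    "PIQA":          {"Base": 0.6338, "Meet7": 0.6583, "Experimental": 0.6687, "Exp_Thinking": 0.6540},
    "Winogrande":    {"Base": 0.5225, "Meet7": 0.5201, "Experimental": 0.5304, "Exp_Thinking": 0.5241},
}


def delta(task: str, model: str, reference: str) -> float:
    """Score difference (model minus reference) on one task."""
    return round(SCORES[task][model] - SCORES[task][reference], 4)


def mean_score(model: str) -> float:
    """Unweighted mean of a model's scores across all tasks."""
    return sum(row[model] for row in SCORES.values()) / len(SCORES)


if __name__ == "__main__":
    for task in SCORES:
        vs_exp = delta(task, "Exp_Thinking", "Experimental")
        vs_base = delta(task, "Exp_Thinking", "Base")
        print(f"{task:<14} vs Experimental: {vs_exp:+.4f}   vs Base: {vs_base:+.4f}")
    for model in ("Base", "Meet7", "Experimental", "Exp_Thinking"):
        print(f"{model:<13} mean: {mean_score(model):.4f}")
```

With these numbers, Exp_Thinking scores below Experimental on all six tasks and above Base on five of the six (all except BoolQ). The unweighted mean is only a rough summary, since the tasks differ in size and chance-level accuracy.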
What these measure
BoolQ — Reading comprehension and yes/no factual grounding
ARC Easy / Challenge — Grade-school science reasoning; Challenge is the retrieval-resistant subset