Stack-3.0-Omni-Nexus/benchmarks/hellaswag.json

{
  "benchmark": "hellaswag",
  "model": "omni-nexus-alpha-q8",
  "method": "chat-api (single generate, A/B/C/D pick)",
  "accuracy": 0.5960963951404102,
  "correct": 5986,
  "total": 10042
}