Go to file

ModelHub XC 75e0d9c5fb 初始化项目，由ModelHub XC社区提供模型

Model: InferenceIllusionist/Excalibur-7b-DPO
Source: Original Platform

2026-06-19 11:47:32 +08:00

.gitattributes

初始化项目，由ModelHub XC社区提供模型

2026-06-19 11:47:32 +08:00

config.json

初始化项目，由ModelHub XC社区提供模型

2026-06-19 11:47:32 +08:00

generation_config.json

初始化项目，由ModelHub XC社区提供模型

2026-06-19 11:47:32 +08:00

model-00001-of-00003.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-06-19 11:47:32 +08:00

model-00002-of-00003.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-06-19 11:47:32 +08:00

model-00003-of-00003.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-06-19 11:47:32 +08:00

model.safetensors.index.json

初始化项目，由ModelHub XC社区提供模型

2026-06-19 11:47:32 +08:00

README.md

初始化项目，由ModelHub XC社区提供模型

2026-06-19 11:47:32 +08:00

special_tokens_map.json

初始化项目，由ModelHub XC社区提供模型

2026-06-19 11:47:32 +08:00

tokenizer_config.json

初始化项目，由ModelHub XC社区提供模型

2026-06-19 11:47:32 +08:00

tokenizer.json

初始化项目，由ModelHub XC社区提供模型

2026-06-19 11:47:32 +08:00

tokenizer.model

初始化项目，由ModelHub XC社区提供模型

2026-06-19 11:47:32 +08:00

README.md

license, library_name, tags, base_model, datasets, model-index

license

library_name

tags

base_model

datasets

model-index

apache-2.0

transformers

finetune

dpo

chatml

InferenceIllusionist/Excalibur-7b

Intel/orca_dpo_pairs

name

results

Excalibur-7b-DPO

task

dataset

metrics

source

type	name
text-generation	Text Generation

name

type

config

split

args

AI2 Reasoning Challenge (25-Shot)

ai2_arc

ARC-Challenge

test

num_few_shot
25

type	value	name
acc_norm	70.9	normalized accuracy

url	name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=InferenceIllusionist/Excalibur-7b-DPO	Open LLM Leaderboard

task

dataset

metrics

source

type	name
text-generation	Text Generation

name

type

split

args

HellaSwag (10-Shot)

hellaswag

validation

num_few_shot
10

type	value	name
acc_norm	87.93	normalized accuracy

url	name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=InferenceIllusionist/Excalibur-7b-DPO	Open LLM Leaderboard

task

dataset

metrics

source

type	name
text-generation	Text Generation

name

type

config

split

args

MMLU (5-Shot)

cais/mmlu

all

test

num_few_shot
5

type	value	name
acc	65.46	accuracy

url	name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=InferenceIllusionist/Excalibur-7b-DPO	Open LLM Leaderboard

task

dataset

metrics

source

type	name
text-generation	Text Generation

name

type

config

split

args

TruthfulQA (0-shot)

truthful_qa

multiple_choice

validation

num_few_shot
0

type	value
mc2	70.82

url	name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=InferenceIllusionist/Excalibur-7b-DPO	Open LLM Leaderboard

task

dataset

metrics

source

type	name
text-generation	Text Generation

name

type

config

split

args

Winogrande (5-shot)

winogrande

winogrande_xl

validation

num_few_shot
5

type	value	name
acc	82.48	accuracy

url	name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=InferenceIllusionist/Excalibur-7b-DPO	Open LLM Leaderboard

task

dataset

metrics

source

type	name
text-generation	Text Generation

name

type

config

split

args

GSM8k (5-shot)

gsm8k

main

test

num_few_shot
5

type	value	name
acc	65.43	accuracy

url	name
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=InferenceIllusionist/Excalibur-7b-DPO	Open LLM Leaderboard

Excalibur-7b-DPO

An initial foray into the world of fine-tuning. The goal of this release was to amplify the quality of the original model's responses, in particular for vision use cases*

Weighted (Importance Matrix) Quants available here

Static (Legacy) quants available here

Notes & Methodology

Excalibur-7b fine-tuned with Direct Preference Optimization (DPO) using Intel/orca_dpo_pairs
This is a quick experiment to determine the impact of DPO finetuning on the Excelsior-7b base model
Ran for a little over an hour on a single A100
Fine-tuning succeeded in making model conversational and more well-rounded
Benchmark scores increased in the following categories versus base Excelsior-7b:
- ARC: 69.71 -> 70.9
- HellaSwag: 87.56 -> 87.93
- TruthfulQA: 67.24 -> 70.82
- Average: 73.6 -> 73.84
Precision: bfloat16

Sample Question - Vision

*Requires additional mmproj file. You have two options for vision functionality (available inside this repo):

Select the gguf file of your choice in Koboldcpp as usual, then make sure to choose the mmproj file above in the LLaVA mmproj field of the model submenu:

Prompt Format

For best results please use ChatML for the prompt format. Alpaca may also work.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	73.84
AI2 Reasoning Challenge (25-Shot)	70.90
HellaSwag (10-Shot)	87.93
MMLU (5-Shot)	65.46
TruthfulQA (0-shot)	70.82
Winogrande (5-shot)	82.48
GSM8k (5-shot)	65.43