
ModelHub XC 215b5cdf6a Initialize project; model provided by the ModelHub XC community

Model: ed001/datascience-coder-6.7b
Source: Original Platform

2026-05-15 20:43:01 +08:00

.gitattributes
config.json
generation_config.json
model-00001-of-00002.safetensors
model-00002-of-00002.safetensors
model.safetensors.index.json
pytorch_model-00001-of-00002.bin
pytorch_model-00002-of-00002.bin
pytorch_model.bin.index.json
README.md
special_tokens_map.json
tokenizer_config.json
tokenizer.json

README.md

language, license, tags, datasets, pipeline_tag, model-index

license: cc-by-nc-sa-4.0
tags:
- code
- data science
datasets:
- ed001/ds-coder-instruct-v1
pipeline_tag: text-generation
model-index:
- name: datascience-coder-6.7b
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 34.64
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ed001/datascience-coder-6.7b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 53.83
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ed001/datascience-coder-6.7b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 37.96
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ed001/datascience-coder-6.7b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 44.82
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ed001/datascience-coder-6.7b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 55.72
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ed001/datascience-coder-6.7b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 24.94
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ed001/datascience-coder-6.7b
      name: Open LLM Leaderboard

The Data Science Coder

Data Science Coder is a family of fine-tuned models designed to help with coding for data science applications. It comes in two variants: 1.3b and 6.7b. The models are fine-tuned from the DeepSeek Coder instruct versions. Fine-tuning was performed on the ed001/ds-coder-instruct-v1 dataset, which was constructed by filtering publicly available datasets on HuggingFace.

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

def build_instruction_prompt(instruction):
    # Wrap the user instruction in this card's prompt template.
    return '''
    You are the Data Science Coder, a helpful AI assistant created by a man named Ed.
    You help people with data science coding and you answer questions about data science in a helpful manner.
    ### Instruction:
    {}
    ### Response:
    '''.format(instruction.strip()).lstrip()

# Load the tokenizer and model; .cuda() requires a CUDA-capable GPU.
tokenizer = AutoTokenizer.from_pretrained("ed001/datascience-coder-6.7b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("ed001/datascience-coder-6.7b", trust_remote_code=True).cuda()
# top_p only applies when sampling is enabled (do_sample=True in the call or in generation_config.json).
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=1024, top_p=0.95)
result = pipe(build_instruction_prompt("Perform EDA on the Iris dataset"))
# generated_text contains the prompt followed by the model's completion.
print(result[0]['generated_text'])
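
By default the text-generation pipeline returns the prompt followed by the completion in generated_text. A minimal sketch for keeping only the model's answer, assuming the "### Response:" marker from the prompt template above appears once in the prompt; extract_response is a hypothetical helper, not part of the model's repository:

```python
RESPONSE_MARKER = "### Response:"

def extract_response(generated_text: str) -> str:
    """Return the text after the first response marker.

    Falls back to the whole (stripped) text if the marker is absent.
    """
    _, marker, answer = generated_text.partition(RESPONSE_MARKER)
    return answer.strip() if marker else generated_text.strip()

# Stand-in for result[0]['generated_text']; not real model output.
sample = "### Instruction:\nLoad a CSV\n### Response:\nimport pandas as pd\n"
print(extract_response(sample))
```

Passing return_full_text=False when calling the pipeline is an alternative that should make this post-processing unnecessary.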

Training Details

lora_r: 16
lora_alpha: 8
lora_dropout: 0.05
target_modules: q, k, v, o, gate_proj, down_proj, up_proj, lm_head
weight_decay: 0
optimizer: paged_adamw_32bit
lr: 1e-4
lr_scheduler: cosine
max_seq_len: 4096
batch_size: 4
max_grad_norm: 0.5
warmup_ratio: 0.05
num_epochs: 1

The model was trained on the Python subset of the ds-coder-instruct dataset.
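
The hyperparameters listed above can be sketched as keyword arguments in the shape that peft.LoraConfig and transformers.TrainingArguments accept. This is an illustration, not the author's training script. Several details are assumptions: the "_proj" spellings for q, k, v and o (Llama-style module names), treating batch_size as a per-device value, the CAUSAL_LM task type, and the output_dir path.

```python
# Values copied from the Training Details list.
LORA_R = 16
LORA_ALPHA = 8
MAX_SEQ_LEN = 4096  # applied by the tokenizer or trainer, not by TrainingArguments

# Shape of the arguments for peft.LoraConfig.
# ASSUMPTION: q, k, v, o are written here as q_proj, k_proj, v_proj, o_proj.
lora_kwargs = {
    "r": LORA_R,
    "lora_alpha": LORA_ALPHA,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "down_proj", "up_proj", "lm_head",
    ],
    "task_type": "CAUSAL_LM",  # ASSUMPTION: not stated in the card
}

# Shape of the arguments for transformers.TrainingArguments.
training_kwargs = {
    "output_dir": "ds-coder-lora",      # hypothetical path
    "per_device_train_batch_size": 4,   # ASSUMPTION: batch_size is per device
    "learning_rate": 1e-4,
    "lr_scheduler_type": "cosine",
    "optim": "paged_adamw_32bit",
    "weight_decay": 0.0,
    "max_grad_norm": 0.5,
    "warmup_ratio": 0.05,
    "num_train_epochs": 1,
}

# Standard LoRA scales the low-rank update by lora_alpha / r.
lora_scaling = LORA_ALPHA / LORA_R
print(lora_scaling)  # 0.5
```

With peft and transformers installed, the dictionaries could be unpacked as LoraConfig(**lora_kwargs) and TrainingArguments(**training_kwargs). With lora_alpha smaller than lora_r, the adapter update is scaled down by a factor of 0.5.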

Samples

Contact

GitHub: Ea0011

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	41.99
AI2 Reasoning Challenge (25-Shot)	34.64
HellaSwag (10-Shot)	53.83
MMLU (5-Shot)	37.96
TruthfulQA (0-shot)	44.82
Winogrande (5-shot)	55.72
GSM8k (5-shot)	24.94
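
The Avg. row appears to be the arithmetic mean of the six benchmark scores. A short check, using Decimal so that the half-way value rounds the same way as in the table (the half-up rounding mode is an assumption about how the leaderboard rounds):

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": Decimal("34.64"),
    "HellaSwag (10-Shot)": Decimal("53.83"),
    "MMLU (5-Shot)": Decimal("37.96"),
    "TruthfulQA (0-shot)": Decimal("44.82"),
    "Winogrande (5-shot)": Decimal("55.72"),
    "GSM8k (5-shot)": Decimal("24.94"),
}

# Exact mean: 251.91 / 6 = 41.985
mean = sum(scores.values()) / len(scores)

# Round to two decimals, rounding halves up.
average = mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(average)  # 41.99
```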