Initialize project; model provided by the ModelHub XC community
Model: IsmaelMousa/SmolLM2-135M-Instruct-EngSaf-217K Source: Original Platform
35 .gitattributes vendored Normal file
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
147 README.md Normal file
@@ -0,0 +1,147 @@
---
library_name: transformers
tags:
- trl
- sft
license: apache-2.0
base_model: HuggingFaceTB/SmolLM2-135M-Instruct
datasets:
- IsmaelMousa/engsaf
metrics:
- accuracy
- f1
- precision
- recall
- cohen_kappa
- rmse
model-index:
- name: SmolLM2-135M-Instruct-EngSaf-217K
  results:
  - task:
      name: Text Generation
      type: text-generation
    dataset:
      name: EngSAF
      type: EngSAF
      config: EngSAF
      split: train
      args: EngSAF
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.443
    - name: F1
      type: f1
      value: 0.3602
    - name: Precision
      type: precision
      value: 0.674
    - name: Recall
      type: recall
      value: 0.4159
    - name: Cohen Kappa
      type: cohen_kappa
      value: 0.1242
    - name: RMSE
      type: rmse
      value: 1.0791
language:
- en
pipeline_tag: text-generation
---

# SmolLM2-135M-Instruct-EngSaf-217K

This model is a fine-tuned version of [HuggingFaceTB/SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) on the EngSAF dataset for Essay Grading.

- **Workflow:** GitHub repository: [https://github.com/IsmaelMousa/automatic-essay-grading](https://github.com/IsmaelMousa/automatic-essay-grading).
- **Base Model:** SmolLM2-135M-Instruct: [https://doi.org/10.48550/arXiv.2502.02737](https://doi.org/10.48550/arXiv.2502.02737).
- **Fine-tuning Dataset:** EngSAF-217K: [https://github.com/IsmaelMousa/EngSAF/217K](https://github.com/IsmaelMousa/automatic-essay-grading/blob/main/data/engsaf/clean/train/1184_entries.csv).
- **Task:** Automatic Essay Grading (Text Generation).

[Weights & Biases training report](https://api.wandb.ai/links/ismael-amjad/rav48wc1)

## Dataset

The EngSAF dataset, in its raw and unprocessed form, consists of approximately 5,800 short-answer responses collected from real-life engineering examinations administered at a reputed academic institute. These responses are spread across 119 unique questions drawn from a wide range of engineering disciplines, making the dataset both diverse and domain-specific. Each data point includes a student’s answer and an associated human-annotated score, serving as a benchmark for evaluating automated grading models.

The dataset is divided into three primary subsets: 70% is allocated for training, 16% is reserved for evaluation on unseen answers (UA), and 14% is dedicated to evaluating performance on entirely new questions (UQ). At this stage, the dataset is considered in its original state; no preprocessing, transformation, or filtering has yet been applied. All subsequent improvements and refinements to the data will be described in later sections.
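
For intuition, here is a minimal sketch of a comparable 70/16/14 partition using the `datasets` library. The dataset id comes from this card's metadata, but the split calls and seed are illustrative assumptions: the actual UA/UQ subsets were built around unseen answers and unseen questions, not a random row split.

```python
# Illustrative 70/16/14 partition with Hugging Face `datasets`.
# NOTE: the real UA/UQ subsets group by answer/question identity;
# a random row split like this is only a rough stand-in.
from datasets import load_dataset

ds = load_dataset("IsmaelMousa/engsaf", split="train")  # dataset id from the card metadata

first = ds.train_test_split(test_size=0.30, seed=42)                # 70% train vs. 30% held out
held = first["test"].train_test_split(test_size=14 / 30, seed=42)   # 16% UA vs. 14% UQ

train, unseen_answers, unseen_questions = first["train"], held["train"], held["test"]
print(len(train), len(unseen_answers), len(unseen_questions))
```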

This dataset is known as EngSAF version 1.0 and was introduced in the paper titled *"I understand why I got this grade": Automatic Short Answer Grading (ASAG) with Feedback*, authored by Aggarwal et al. and set to appear in the proceedings of AIED 2025. The dataset is released strictly for academic and research purposes; any commercial use or redistribution without explicit permission is prohibited. Researchers are also urged to avoid publicly disclosing any sensitive content that may be contained in the dataset.

For more details, the paper can be accessed at: [https://arxiv.org/abs/2407.12818](https://arxiv.org/abs/2407.12818).

## Modeling

The modeling approach for this study was designed to evaluate the performance of different large language models (LLMs) on the automated essay grading task. We selected the SmolLM2 architecture to represent a range of model sizes: 135M, 360M, and 1.7B. Each model was instruction-tuned on the EngSAF dataset at varying sizes, with hyperparameters optimized to balance computational efficiency and performance. The experiments were conducted on GPU-accelerated hardware, leveraging techniques such as gradient checkpointing, flash attention, and mixed-precision training to maximize resource utilization.
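
As a concrete illustration, the sketch below wires these techniques together with `trl`'s `SFTTrainer`. The base model and dataset ids come from this card; the batch size, learning rate, epoch count, and output path are assumed placeholder values, not the exact configuration used for this model.

```python
# Minimal fine-tuning sketch: gradient checkpointing, flash attention,
# and bf16 mixed precision, as described above. Hyperparameters are assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

train_dataset = load_dataset("IsmaelMousa/engsaf", split="train")

config = SFTConfig(
    output_dir="smollm2-135m-engsaf",                                # illustrative path
    per_device_train_batch_size=4,                                   # assumed value
    gradient_accumulation_steps=4,                                   # assumed value
    learning_rate=2e-5,                                              # assumed value
    num_train_epochs=3,                                              # assumed value
    bf16=True,                                                       # mixed-precision training
    gradient_checkpointing=True,                                     # recompute activations to save memory
    model_init_kwargs={"attn_implementation": "flash_attention_2"},  # flash attention
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M-Instruct",
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```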

## Evaluation

The evaluation methodology employed both quantitative metrics and qualitative analysis. For quantitative assessment, we computed accuracy, precision, recall, F1 score, root mean squared error (RMSE), and Cohen's kappa score (CKS) for the scoring task, and BERT-Score precision, recall, and F1 for rationale evaluation, all on a held-out test set of 100 samples. Qualitative examination of the models' outputs revealed cases where most of the models correctly identified key aspects of student answers but sometimes failed to properly align their scoring with the rubric criteria.
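
The score-level metrics can be reproduced with `scikit-learn`; a minimal sketch follows. The label arrays are hypothetical placeholders, and macro averaging is an assumption, since the card does not state the averaging mode.

```python
# Score-level metrics from the paragraph above, computed with scikit-learn (>= 1.4).
# y_true / y_pred are hypothetical integer scores, not the real test set.
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             precision_score, recall_score, root_mean_squared_error)

y_true = [2, 0, 1, 3, 2]  # hypothetical gold scores
y_pred = [2, 1, 1, 3, 0]  # hypothetical model scores

print("accuracy :", accuracy_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred, average="macro"))  # averaging mode assumed
print("precision:", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("kappa    :", cohen_kappa_score(y_true, y_pred))
print("rmse     :", root_mean_squared_error(y_true, y_pred))
```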

### Evaluation results for `score` and `rationale` outputs

| **Aspect** | **F1** | **Precision** | **Recall** | **Accuracy** | **CKS** | **RMSE** |
|:----------:|:------:|:-------------:|:----------:|:------------:|:-------:|:--------:|
| Score      | 0.3602 | 0.6740        | 0.4159     | 0.4430       | 0.1242  | 1.0791   |
| Rationale  | 0.5688 | 0.5800        | 0.5616     | --           | --      | --       |

## Usage

Below is an example of how to use the model with the Hugging Face Transformers library:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

checkpoint = "IsmaelMousa/SmolLM2-135M-Instruct-EngSaf-217K"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

assistant = pipeline("text-generation", tokenizer=tokenizer, model=model, device=device)

question = input("Question : ")
reference_answer = input("Reference Answer: ")
student_answer = input("Student Answer : ")
mark_scheme = input("Mark Scheme : ")

system_content = "You are a grading assistant. Evaluate student answers based on the mark scheme. Respond only in JSON format with keys 'score' (int) and 'rationale' (string)."

user_content = ("Provide both a score and a rationale by evaluating the student's answer strictly within the mark scheme range,"
                " grading based on how well it meets the question's requirements by comparing the student answer to the reference answer.\n"
                f"Question: {question}\n"
                f"Reference Answer: {reference_answer}\n"
                f"Student Answer: {student_answer}\n"
                f"Mark Scheme: {mark_scheme}")

messages = [{"role": "system", "content": system_content}, {"role": "user", "content": user_content}]

# Render the chat template to a prompt string; add_generation_prompt=True appends
# the assistant header so generation starts at the model's reply.
inputs = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

output = assistant(inputs, max_new_tokens=128, do_sample=False, return_full_text=False)[0]["generated_text"]

print(output)
```
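
Since the model is instructed to reply in JSON, the raw generation may contain minor formatting glitches. A small follow-up sketch, assuming the `output` string from the example above and the `json-repair` package listed under Frameworks, parses it defensively:

```python
# Optional post-processing: repair and parse the model's JSON reply.
import json

from json_repair import repair_json

graded = json.loads(repair_json(output))  # `output` comes from the usage example above
print("score    :", graded.get("score"))
print("rationale:", graded.get("rationale"))
```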

### Frameworks

- `datasets-3.6.0`
- `torch-2.7.0`
- `transformers-4.51.3`
- `trl-0.17.0`
- `scikit-learn-1.6.1`
- `bert-score-0.3.13`
- `json-repair-0.46.0`
38 config.json Normal file
@@ -0,0 +1,38 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 576,
  "initializer_range": 0.041666666666666664,
  "intermediate_size": 1536,
  "is_llama_config": true,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 9,
  "num_hidden_layers": 30,
  "num_key_value_heads": 3,
  "pad_token_id": 2,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_interleaved": false,
  "rope_scaling": null,
  "rope_theta": 100000,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers.js_config": {
    "kv_cache_dtype": {
      "fp16": "float16",
      "q4f16": "float16"
    }
  },
  "transformers_version": "4.51.3",
  "use_cache": true,
  "vocab_size": 49152
}
7 generation_config.json Normal file
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 2,
  "transformers_version": "4.51.3"
}
48901 merges.txt Normal file
File diff suppressed because it is too large
3 model.safetensors Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4beb279e9b82e4a67d0d66fb57a2aa878dbfbed2aff9dd8821656db32eaca726
size 269060552
34 special_tokens_map.json Normal file
@@ -0,0 +1,34 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>"
  ],
  "bos_token": {
    "content": "<|im_start|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
244949 tokenizer.json Normal file
File diff suppressed because it is too large
155 tokenizer_config.json Normal file
@@ -0,0 +1,155 @@
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<repo_name>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "4": {
      "content": "<reponame>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "5": {
      "content": "<file_sep>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "6": {
      "content": "<filename>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "7": {
      "content": "<gh_stars>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "8": {
      "content": "<issue_start>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "9": {
      "content": "<issue_comment>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "10": {
      "content": "<issue_closed>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "11": {
      "content": "<jupyter_start>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "12": {
      "content": "<jupyter_text>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "13": {
      "content": "<jupyter_code>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "14": {
      "content": "<jupyter_output>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "15": {
      "content": "<jupyter_script>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "16": {
      "content": "<empty_output>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>"
  ],
  "bos_token": "<|im_start|>",
  "chat_template": "{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system\nYou are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>\n' }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "extra_special_tokens": {},
  "model_max_length": 8192,
  "pad_token": "<|im_end|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>",
  "vocab_size": 49152
}
1 vocab.json Normal file
File diff suppressed because one or more lines are too long