Initialize project; model provided by the ModelHub XC community
Model: abhinav0231/Lily-1.5b-v0.3 Source: Original Platform
396
README.md
Normal file
@@ -0,0 +1,396 @@
---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- qwen2
- causal-lm
- instruction-tuned
- distillation
- sft
- lora
- qlora
- unsloth
- chatml
- reasoning

base_model: abhinav0231/Lily-1.5b-v0.1

datasets:
- abhinav0231/Sarvam-105b-Distill-100k
---

# Lily-1.5b-v0.3

Lily-1.5b-v0.3 is a distilled, instruction-tuned language model built by continuing training from `abhinav0231/Lily-1.5b-v0.1` on the `chatml` configuration of the `abhinav0231/Sarvam-105b-Distill-100k` dataset.

This version was trained as an offline supervised fine-tuning (SFT) run focused on high-quality long-form assistant responses in ChatML format, with many examples following an explicit `<think>` and `<answer>` structure.

The model was trained and merged in a single-GPU Modal workflow on an NVIDIA A100-SXM4-40GB system using BF16, QLoRA, and Unsloth.

---

# Model summary

This checkpoint starts from `abhinav0231/Lily-1.5b-v0.1` and applies a distillation-style supervised fine-tuning stage rather than training from scratch.

The base architecture loaded during training is a Qwen2-style causal language model with:

- 28 layers
- hidden size 1536
- 12 attention heads
- 2 key-value heads
- vocabulary size 151,936

The training setup targets:

- instruction following
- structured response generation
- distilled reasoning-flavored outputs

rather than pure base-model continuation pretraining.
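
The architecture figures listed in this section imply a few derived quantities that are useful when sizing KV caches or adapters. The following is a minimal sketch, assuming the usual convention that the head dimension equals the hidden size divided by the number of attention heads. The `ArchSpec` class and its field names are hypothetical and are not taken from the training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchSpec:
    """Hypothetical container for the figures listed in the model summary."""

    num_layers: int = 28
    hidden_size: int = 1536
    num_attention_heads: int = 12
    num_key_value_heads: int = 2
    vocab_size: int = 151_936

    @property
    def head_dim(self) -> int:
        # ASSUMPTION: hidden size is split evenly across the query heads.
        return self.hidden_size // self.num_attention_heads

    @property
    def queries_per_kv_head(self) -> int:
        # Grouped-query attention: several query heads share one KV head.
        return self.num_attention_heads // self.num_key_value_heads

    @property
    def kv_dim(self) -> int:
        # Output width of the key and value projections.
        return self.num_key_value_heads * self.head_dim

    def kv_cache_bytes_per_token(self, bytes_per_value: int = 2) -> int:
        # Keys and values (factor 2) for every layer; 2 bytes per value in BF16.
        return 2 * self.num_layers * self.kv_dim * bytes_per_value


spec = ArchSpec()
print(spec.head_dim, spec.queries_per_kv_head, spec.kv_dim)
print(spec.kv_cache_bytes_per_token())
```

Under these assumptions, each key-value head serves six query heads, which keeps the KV cache small relative to a full multi-head layout.
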

---

# Training objective

The goal of v0.3 was to improve the model through offline SFT distillation from a synthetic/teacher-style dataset while preserving the usability and compact size of the 1.5B-class base model.

The dataset examples are preformatted as ChatML conversations and frequently instruct the assistant to reason in a `<think>` block before producing a final `<answer>` block.

Because of that training distribution, the model may naturally produce more structured, tutor-like, stepwise outputs than the earlier checkpoint depending on the prompt style.
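
Downstream code often wants the final answer without the reasoning block. The sketch below shows one way to split a generation, assuming the blocks are closed with `</think>` and `</answer>` tags (the closing tags are an assumption; only the opening names are documented here). `split_reply` and `ParsedReply` are hypothetical helpers.

```python
import re
from typing import NamedTuple, Optional


class ParsedReply(NamedTuple):
    think: Optional[str]
    answer: str


# ASSUMPTION: blocks are delimited by matching closing tags.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def split_reply(text: str) -> ParsedReply:
    """Separate the reasoning block from the final answer, if tags are present."""
    think_match = _THINK_RE.search(text)
    answer_match = _ANSWER_RE.search(text)
    think = think_match.group(1).strip() if think_match else None
    if answer_match:
        answer = answer_match.group(1).strip()
    else:
        # No <answer> tags: drop any <think> block and keep the remaining text.
        answer = _THINK_RE.sub("", text).strip()
    return ParsedReply(think=think, answer=answer)


reply = split_reply("<think>2 + 2 equals 4.</think>\n<answer>4</answer>")
print(reply.answer)
```

The fallback branch matters because the model does not always emit the markup, so untagged text is treated as the answer.
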

---

# Base model

- **Base model:** `abhinav0231/Lily-1.5b-v0.1`
- **Final merged model repo:** `abhinav0231/Lily-1.5b-v0.3`
- **GGUF repo:** `abhinav0231/Lily-1.5b-v0.3-GGUF`

---

# Benchmarks

Under an `lm-evaluation-harness` evaluation setup, v0.3 achieved:


---

# Dataset

The main training dataset is `abhinav0231/Sarvam-105b-Distill-100k`, using the `chatml` configuration, which is stored as a single `text` column of preformatted conversations.

The final training notebook loaded:

- 91,457 training examples
- 1,908 validation examples

A separate sanity-check pass over the dataset family showed a very similar distribution, including:

- 92,040 training examples
- 1,917 validation examples
- 1,918 test examples

confirming the same overall ChatML reasoning-style format.

---

## Dataset style

The dataset uses ChatML with `<|im_start|>` and `<|im_end|>` delimiters, and the tokenizer setup includes a chat template.

Many examples use a system prompt that explicitly asks the assistant to think through the problem in a `<think>` block and then give the final response in an `<answer>` block.

This means the model was not trained on plain raw instruction-response text alone; it was trained on a formatted conversational distribution with strong structural priors.
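
To make the delimiter layout concrete, the sketch below renders a message list into ChatML by hand. It assumes the common ChatML layout (role on the first line, content after a newline, one `<|im_end|>` per turn). The exact whitespace used by this tokenizer's chat template is not shown here, and `render_chatml` and the sample system prompt are hypothetical. For real inference, `tokenizer.apply_chat_template` remains the safer route.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Render chat messages as a ChatML string (assumed common layout)."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


messages = [
    # Hypothetical system prompt in the spirit of the dataset description.
    {"role": "system", "content": "Think in a <think> block, then reply in an <answer> block."},
    {"role": "user", "content": "What is 2 + 2?"},
]
prompt = render_chatml(messages, add_generation_prompt=True)
print(prompt)
```
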

---

## Length characteristics

A 5,000-sample sanity slice of the training set had:

- mean length = 1640.72 tokens
- p50 = 1219
- p90 = 3221
- p95 = 4096.15
- p99 = 6883.35

About 5.00% of sampled training examples and 4.33% of sampled validation examples exceeded 4096 tokens.

These numbers matter because the training run used a 4096-token max sequence length, so the longest examples are subject to truncation or packing effects depending on preprocessing behavior.
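
Statistics like these can be reproduced from a list of per-example token counts. The following is a minimal sketch, assuming linear-interpolation percentiles (which would be consistent with fractional values such as 4096.15). `percentile` and `length_report` are hypothetical helpers, and the sample lengths are made up for illustration.

```python
import math
from typing import Dict, Sequence


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 100]) over pre-sorted values."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def length_report(token_lengths: Sequence[int], max_len: int = 4096) -> Dict[str, float]:
    """Summarize token lengths and the share of examples longer than max_len."""
    ordered = sorted(token_lengths)
    over = sum(1 for n in ordered if n > max_len)
    return {
        "mean": sum(ordered) / len(ordered),
        "p50": percentile(ordered, 50),
        "p90": percentile(ordered, 90),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
        "pct_over_max": 100.0 * over / len(ordered),
    }


# Made-up token counts, not taken from the dataset.
sample = [100, 200, 300, 400, 5000]
report = length_report(sample)
print(report)
```

In a real check, the lengths would come from tokenizing the `text` column with the model's tokenizer.
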

---

# Training setup

Training was run on a single NVIDIA A100-SXM4-40GB GPU in Modal, without:

- DDP
- `accelerate launch`
- multi-process orchestration

The environment used:

- Unsloth 2026.5.2
- TRL 0.22.2
- PyTorch 2.8.0+cu129
- CUDA 12.9
- Triton 3.4.0
- BF16 mixed precision

Flash Attention 2 was auto-enabled by Unsloth because the A100 supports it.

---

## Core hyperparameters

| Parameter | Value |
|---|---|
| Max sequence length | 4096 |
| Num epochs | 2 |
| Learning rate | 2e-5 |
| Warmup steps | 100 |
| Warmup ratio | 0.03 |
| Batch size | 24 |
| Gradient accumulation | 1 |
| Effective batch size | 24 |
| Seed | 42 |

---

## Optimization stack

The model was loaded with QLoRA 4-bit weights during training, while the final merged checkpoint was saved in 16-bit merged form for deployment and inference use.

The W&B config logged the optimizer as `adamw_8bit`, while the trainer config used fused AdamW (`adamw_torch_fused`) in the notebook training arguments.

Sequence packing was enabled, dataset preprocessing used multiprocessing, and periodic evaluation/checkpoint saving was configured during the run.
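
Packing concatenates several short examples into one sequence of up to 4096 tokens, which is why 91,457 loaded examples can become 33,297 packed examples in the trainer log (roughly 2.7 examples per packed sequence). The sketch below is a toy first-fit-decreasing packer that works on token lengths only. It is not the algorithm TRL or Unsloth uses, and truncating over-long examples to `max_len` is an assumption. `pack_lengths` is a hypothetical helper.

```python
from typing import List


def pack_lengths(lengths: List[int], max_len: int = 4096) -> List[List[int]]:
    """Greedy first-fit-decreasing packing of sequence lengths into bins of max_len."""
    bins: List[List[int]] = []
    free: List[int] = []  # remaining capacity of each bin
    # ASSUMPTION: examples longer than max_len are truncated before packing.
    for n in sorted((min(x, max_len) for x in lengths), reverse=True):
        for i, room in enumerate(free):
            if n <= room:
                bins[i].append(n)
                free[i] -= n
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([n])
            free.append(max_len - n)
    return bins


# Made-up lengths for illustration.
packed = pack_lengths([3000, 1000, 2000, 2000, 5000, 96])
print(len(packed), packed)
```
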

---

# LoRA / PEFT details

The fine-tuning used:

- LoRA rank = 32
- LoRA alpha = 64

Target modules:

- `q_proj`
- `k_proj`
- `v_proj`
- `o_proj`
- `gate_proj`
- `up_proj`
- `down_proj`

The run reported approximately 36.9M trainable parameters, which corresponds to around 2.34%–4.0% of total parameters depending on counting conventions.
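
The reported trainable-parameter count can be cross-checked from the rank, the target modules, and the architecture figures in the model summary. The following sketch relies on one value that this card does not state: an MLP intermediate size of 8960, taken from the stock Qwen2-1.5B configuration and treated here as an assumption. It also assumes each adapted linear layer adds two low-rank matrices, for `rank * (in_features + out_features)` parameters.

```python
from typing import Dict, Tuple

HIDDEN = 1536
NUM_LAYERS = 28
NUM_HEADS = 12
NUM_KV_HEADS = 2
RANK = 32
# ASSUMPTION: not stated in this card; value from the stock Qwen2-1.5B config.
INTERMEDIATE = 8960

HEAD_DIM = HIDDEN // NUM_HEADS
KV_DIM = NUM_KV_HEADS * HEAD_DIM

# (in_features, out_features) for each targeted projection.
SHAPES: Dict[str, Tuple[int, int]] = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    # LoRA adds A (rank x in) and B (out x rank) per adapted linear layer.
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{total:,} trainable LoRA parameters (~{total / 1e6:.1f}M)")
# prints: 36,929,536 trainable LoRA parameters (~36.9M)
```

Under these assumptions the count lands on the reported 36.9M. With alpha = 64 and rank = 32, the conventional LoRA scaling factor alpha / rank would be 2.0.
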

---

# Hardware and runtime

Training hardware:

- NVIDIA A100-SXM4-40GB
- ~42.4 GB VRAM exposed
- Compute capability 8.0
- BF16 support
- Flash Attention 2 support

The run specifically targeted A100-native BF16 and Flash Attention 2 optimizations.

Total training runtime was approximately 5 hours 14 minutes.

---

# Checkpointing and merge

Intermediate checkpoints were pushed to `abhinav0231/Lily-1.5b-distill-v3-checkpoints` during training.

The workflow included auto-resume logic from the latest Hugging Face checkpoint.

After training, the LoRA adapter was merged back into the base model in BF16/16-bit form and pushed as `abhinav0231/Lily-1.5b-v0.3`.

The notebook also included GGUF export paths for quantized deployment variants.
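
The auto-resume step amounts to picking the most recent checkpoint before restarting the trainer. The notebook's actual logic is not shown here, so the following is only a sketch of the selection step. It assumes checkpoints follow the `checkpoint-<step>` folder naming that the Hugging Face `Trainer` uses by default, and `latest_checkpoint` is a hypothetical helper.

```python
import re
from typing import Iterable, Optional

# ASSUMPTION: checkpoint folders are named checkpoint-<step>.
_CKPT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(names: Iterable[str]) -> Optional[str]:
    """Return the checkpoint-<step> name with the highest step, or None."""
    best_step, best_name = -1, None
    for name in names:
        match = _CKPT_RE.match(name)
        if match and int(match.group(1)) > best_step:
            best_step, best_name = int(match.group(1)), name
    return best_name


# Made-up folder listing for illustration.
found = latest_checkpoint(["checkpoint-500", "README.md", "checkpoint-2500", "checkpoint-1000"])
print(found)
```

Comparing the step numbers as integers avoids the lexicographic trap where `checkpoint-500` would sort after `checkpoint-2500`.
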

---

# Training logs

The trainer log reported:

- 33,297 packed training examples
- 2 epochs
- 2,776 optimization steps

Validation loss decreased from 9.100862 at step 500 to 8.973075 at step 2500.

These values should be interpreted as internal training diagnostics rather than direct end-user quality metrics.
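
The logged step count is consistent with the core hyperparameters. The arithmetic below assumes the final partial batch of each epoch is kept rather than dropped. It also computes the warmup length that the 0.03 ratio would imply. The hyperparameter table lists both 100 warmup steps and a 0.03 ratio, and which one took effect is not stated here.

```python
import math

packed_examples = 33_297  # from the trainer log
batch_size = 24
grad_accum = 1
epochs = 2
warmup_ratio = 0.03

effective_batch = batch_size * grad_accum
# ASSUMPTION: the last partial batch of each epoch still counts as a step.
steps_per_epoch = math.ceil(packed_examples / effective_batch)
total_steps = steps_per_epoch * epochs
ratio_warmup = math.ceil(total_steps * warmup_ratio)

print(effective_batch, steps_per_epoch, total_steps, ratio_warmup)
# prints: 24 1388 2776 84
```

The 2,776 total matches the trainer log, which suggests the step count was derived from the packed example count rather than the 91,457 raw examples.
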

---

# Intended use

This model is intended for:

- instruction-following chat experiments
- structured answer generation
- research on distilled reasoning-style outputs
- lightweight local or hosted inference in the 1.5B parameter class

It is especially suited to prompts where:

- a user asks for explanations or breakdowns
- the desired answer format is structured
- the prompt resembles the ChatML style used during training

---

# Prompting notes

Because the training data is ChatML-formatted, best results usually come from chat-style prompting rather than plain raw completion prompting.

The model may respond in a more verbose tutor-like style because many training prompts encouraged detailed reasoning followed by a final answer.

If a cleaner direct-answer style is preferred, using a concise system prompt and explicitly requesting short outputs can help steer generation.

---

# Limitations

This model was trained on synthetic/distilled instruction data rather than broad raw web-scale pretraining data.

As a result:

- outputs may reflect teacher-style formatting biases
- responses may become over-structured
- reasoning markup may occasionally appear in generations

The dataset sanity checks also flagged formatting irregularities in sampled rows, including repeated markers and malformed counts, so downstream behavior may inherit some formatting artifacts from the source corpus.

---

# Safety

This model is not designed for fully autonomous use in high-stakes domains such as:

- legal
- medical
- financial
- safety-critical systems

Outputs can still be:

- incorrect
- incomplete
- overconfident

Human review is recommended for consequential use cases.

---

# Usage

## Transformers

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "abhinav0231/Lily-1.5b-v0.3"

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain overfitting in simple terms."},
]

# Render the conversation with the tokenizer's ChatML chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
    )

# generate() returns a batch of sequences; decode the first (and only) one.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Suggested prompting

For best results:

- use chat-style prompts,
- keep instructions explicit,
- specify desired format,
- request concise output if you do not want long reasoning-style responses.

## Provenance

- **Base model:** `abhinav0231/Lily-1.5b-v0.1`
- **Training dataset:** `abhinav0231/Sarvam-105b-Distill-100k` (`chatml`)
- **Training framework:** Unsloth + TRL
- **Hardware:** 1x NVIDIA A100-SXM4-40GB
- **Final merged repo:** `abhinav0231/Lily-1.5b-v0.3`

## Acknowledgements

This model was trained with Unsloth, Hugging Face Transformers, TRL, PEFT/LoRA-style fine-tuning, and W&B logging in a Modal-hosted workflow.

This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)