Initialize project; model provided by the ModelHub XC community
Model: abhinav0231/Lily-1.5b-v0.1-GGUF Source: Original Platform
.gitattributes (vendored, new file)
@@ -0,0 +1,39 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
Lily-1.5b-v0.1-F16.gguf filter=lfs diff=lfs merge=lfs -text
Lily-1.5b-v0.1-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Lily-1.5b-v0.1-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
Lily-1.5b-v0.1-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Lily-1.5b-v0.1-F16.gguf (new file, LFS pointer)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4c0ce3bee9e56031dc5c61cda98d06dd733645bb0792f63f577fd393d3eea9e1
size 3093670528
Lily-1.5b-v0.1-Q4_K_M.gguf (new file, LFS pointer)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:10f9e4c0c50ebeaaca91bfd354de64791520539ce02a8646a0f1a7b2ab510514
size 986049664
Lily-1.5b-v0.1-Q5_K_M.gguf (new file, LFS pointer)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:22fd0d7e83b90bdd912d17cc5e7fdfe2c9ffba5eb3e8a15e76f0da2cfc1a5008
size 1125051520
Lily-1.5b-v0.1-Q8_0.gguf (new file, LFS pointer)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4475b161427fd51042f0799fb3ef521f754bf82ec95419706c5ea88243e206ee
size 1646574208
README.md (new file)
@@ -0,0 +1,226 @@
---
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- qwen2.5
- chain-of-thought
- reasoning
- fine-tuned
- gguf
language:
- en
pipeline_tag: text-generation
---

# Lily 1.5B — v0.1

Lily is a fine-tuned 1.5B-parameter language model built on [Qwen 2.5 1.5B Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct). It is trained to reason explicitly before answering: every response includes a visible thinking step inside `<think>` tags, followed by the final answer inside `<answer>` tags.

The model is optimized for precision and structured output. It stays direct, avoids filler phrases, and scales response depth to the complexity of the question.

---

## Model Details

| Property | Value |
|---|---|
| **Base model** | Qwen/Qwen2.5-1.5B-Instruct |
| **Parameters** | 1.5B |
| **Context length** | 4096 tokens |
| **Fine-tuning** | Supervised fine-tuning (SFT) on chain-of-thought formatted data |
| **Output format** | `<think>...</think>` reasoning + `<answer>...</answer>` final response |
| **License** | Apache 2.0 |

---

## Output Format

Every response from Lily follows this structure:

```
<think>
[Step-by-step reasoning, working through the problem before committing to an answer]
</think>
<answer>
[Final response — structured, precise, and direct]
</answer>
```

The `<think>` block is Lily's scratchpad — it plans, evaluates, and drafts before producing the answer. This makes the model's reasoning transparent and auditable.
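The fixed tag structure also makes programmatic post-processing straightforward. As an illustrative sketch (the helper name `split_response` is ours, not part of any model API), a response can be split into its reasoning and answer parts with a regex:

```python
import re


def split_response(text: str) -> tuple[str, str]:
    """Split a Lily response into (reasoning, answer).

    Falls back to treating the whole text as the answer when the
    tags are missing, which can occasionally happen with a v0.1
    checkpoint.
    """
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    reasoning = think.group(1).strip() if think else ""
    final = answer.group(1).strip() if answer else text.strip()
    return reasoning, final


reasoning, final = split_response(
    "<think>\n2 + 2 = 4\n</think>\n<answer>\n4\n</answer>"
)
print(final)  # -> 4
```

Non-greedy matching with `re.DOTALL` keeps the split robust to multi-line reasoning; the fallback returns the raw text so downstream code never sees `None`.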

---

## Quick Start

### Transformers (Python)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "abhinav0231/Lily-1.5b-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

SYSTEM_PROMPT = (
    "You are Lily, a precise and thoughtful AI assistant.\n\n"
    "Always reason step by step inside <think></think> tags, "
    "then write your final answer inside <answer></answer> tags.\n\n"
    "When answering:\n"
    "- Be thorough: cover all relevant aspects, not just the surface question\n"
    "- Be specific: use exact values, names, and examples rather than vague generalities\n"
    "- Structure long responses with markdown headers, code blocks, and lists where appropriate\n"
    "- Lead with the most important information first\n"
    "- Match the depth of your answer to the complexity of the question\n\n"
    "Tone: direct and confident. Never use filler phrases like \"Certainly!\", "
    "\"Great question!\", or \"Of course!\". Be helpful without being sycophantic."
)

def ask(question, max_new_tokens=512):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    response = tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:],
        skip_special_tokens=True,
    )
    return response

print(ask("What is the difference between a list and a tuple in Python?"))
```

---

## GGUF (llama.cpp / Ollama / LM Studio)

Quantized GGUF versions are available at [abhinav0231/Lily-1.5b-v0.1-GGUF](https://huggingface.co/abhinav0231/Lily-1.5b-v0.1-GGUF).

| Quant | Size | Use case |
|---|---|---|
| `Q4_K_M` | ~1.0 GB | Best balance of speed and quality for CPU inference |
| `Q5_K_M` | ~1.2 GB | Better quality, still fast on CPU |
| `Q8_0` | ~1.6 GB | Near-lossless, recommended if VRAM/RAM allows |
| `F16` | ~3.1 GB | Full precision, GPU only |
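As a rough sanity check on the table, effective bits per weight can be estimated from the on-disk sizes. The parameter count below is an assumption (Qwen2.5-1.5B is roughly 1.54B parameters including embeddings), and GGUF metadata overhead is ignored, so treat the results as upper bounds:

```python
# On-disk sizes taken from the LFS pointers in this repo.
SIZES_BYTES = {
    "Q4_K_M": 986_049_664,
    "Q5_K_M": 1_125_051_520,
    "Q8_0": 1_646_574_208,
    "F16": 3_093_670_528,
}
N_PARAMS = 1.54e9  # assumed; Qwen2.5-1.5B incl. embeddings


def bits_per_weight(size_bytes: int) -> float:
    """Estimate bits per weight from file size (metadata ignored)."""
    return size_bytes * 8 / N_PARAMS


for name, size in SIZES_BYTES.items():
    print(f"{name}: ~{bits_per_weight(size):.1f} bpw")
```

The F16 file comes out close to 16 bits per weight under this assumption, which is a reasonable consistency check on the assumed parameter count; the K-quants land above their nominal 4/5 bits because some tensors are kept at higher precision.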

### llama.cpp

```bash
# Download a quant
huggingface-cli download abhinav0231/Lily-1.5b-v0.1-GGUF \
  Lily-1.5b-v0.1-Q4_K_M.gguf \
  --local-dir ./

# Run the server
./llama.cpp/build/bin/llama-server \
  -m Lily-1.5b-v0.1-Q4_K_M.gguf \
  --ctx-size 4096 \
  --port 8080
```

### Ollama

```bash
# Create a Modelfile
cat > Modelfile << 'EOF'
FROM ./Lily-1.5b-v0.1-Q4_K_M.gguf
SYSTEM "You are Lily, a precise and thoughtful AI assistant.

Always reason step by step inside <think></think> tags, then write your final answer inside <answer></answer> tags.

When answering:
- Be thorough: cover all relevant aspects, not just the surface question
- Be specific: use exact values, names, and examples rather than vague generalities
- Structure long responses with markdown headers, code blocks, and lists where appropriate
- Lead with the most important information first
- Match the depth of your answer to the complexity of the question

Tone: direct and confident. Never use filler phrases like \"Certainly!\", \"Great question!\", or \"Of course!\". Be helpful without being sycophantic."
EOF

# Build and run
ollama create lily -f Modelfile
ollama run lily "Explain how transformers work"
```

---

## System Prompt

The full system prompt (reproduced in the Quick Start section above) is embedded in the model's chat template and applied automatically by `apply_chat_template`. You do **not** need to set it manually when using the Transformers pipeline; it is already the default.

The critical sentence that triggers the `<think>/<answer>` format, kept verbatim from training, is:

> *Always reason step by step inside `<think></think>` tags, then write your final answer inside `<answer></answer>` tags.*

The rest of the system prompt shapes tone and response quality and can be overridden by passing a custom `system` message.
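When overriding, keep the verbatim trigger sentence and append your own instructions after it. A minimal sketch (the helper name and variable names here are ours, for illustration only):

```python
# The verbatim sentence from training that triggers <think>/<answer> output.
FORMAT_RULE = (
    "Always reason step by step inside <think></think> tags, "
    "then write your final answer inside <answer></answer> tags."
)


def build_messages(question: str, extra_instructions: str = "") -> list:
    """Build a chat message list whose custom system prompt still
    preserves the format-triggering sentence."""
    system = FORMAT_RULE
    if extra_instructions:
        system += "\n\n" + extra_instructions
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


messages = build_messages(
    "Summarize TCP slow start.",
    "Answer in at most three sentences.",
)
```

The resulting list drops straight into `tokenizer.apply_chat_template(messages, ...)` from the Quick Start example.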

---

## Intended Use

Lily is a general-purpose assistant fine-tune. It performs well on:

- Reasoning and logic problems
- Code explanation and generation
- Structured question answering
- Step-by-step problem solving

The explicit `<think>` step makes it especially useful in applications where reasoning transparency matters — grading, debugging, tutoring, or any workflow where you need to see *why* the model gave a particular answer, not just the answer itself.

---

## Limitations

- **1.5B parameters**: Not suited for tasks requiring broad world knowledge or long multi-document context
- **v0.1**: Early release — output quality and format consistency will improve in future versions
- **English primary**: Training data is predominantly English; multilingual performance is limited
- **No tool use / function calling**: This version does not support structured tool call outputs

---

## Training

Fine-tuned from `Qwen/Qwen2.5-1.5B-Instruct` using supervised fine-tuning on a dataset of chain-of-thought formatted examples. Each training example uses the `<think>/<answer>` output structure. Training was performed on a single T4 GPU via Google Colab.

---

## Citation

If you use Lily in research or a project, please cite:

```bibtex
@misc{lily-1.5b-v0.1,
  author    = {abhinav0231},
  title     = {Lily 1.5B v0.1: A chain-of-thought fine-tune of Qwen 2.5 1.5B},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/abhinav0231/Lily-1.5b-v0.1}
}
```

---

## License

Apache 2.0 — see [LICENSE](https://www.apache.org/licenses/LICENSE-2.0).