Initialize project; model provided by the ModelHub XC community

Model: ankur1423/fine-tune-test Source: Original Platform
2026-06-21 08:56:16 +08:00
commit ddadd9e847
12 changed files with 858 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,346 @@
+---
+language:
+- en
+license: llama3
+license_link: https://llama.meta.com/llama3/license/
+library_name: transformers
+base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
+tags:
+- llama-3
+- lora
+- fine-tuned
+- solar-energy
+- text-generation
+- mlx
+- apple-silicon
+pipeline_tag: text-generation
+---
+
+# Solar FAQ — Llama-3.1-8B LoRA Fine-tune
+
+A **Llama-3.1-8B-Instruct** model fine-tuned with LoRA on a solar energy FAQ dataset
+using [MLX-LM](https://github.com/ml-explore/mlx-examples/tree/main/llms) on Apple Silicon.
+
+| | |
+|---|---|
+| Base model | `meta-llama/Meta-Llama-3.1-8B-Instruct` |
+| Format | float16 safetensors (safe — no pickle) |
+| Size | ~15 GB (float16) |
+| Fine-tune method | LoRA rank 8, 8 layers |
+| Domain | Solar energy FAQ |
+| Languages | English |
+
+> **Smaller version available:** [GGUF Q4_K_M (4.6 GB)](https://huggingface.co/ankur1423/fine-tune-test-gguf) runs on Windows, Linux, and macOS, including CPU-only machines with no GPU.
+
+---
+
+## Model Overview
+
+This model is a LoRA fine-tune experiment on top of Meta's Llama-3.1-8B-Instruct,
+trained on a small domain-specific solar energy FAQ dataset (68 Q&A pairs: 62 for training, 6 for validation).
+It answers questions about solar products, manufacturing processes, and company operations.
+
+Outside the training domain it falls back to standard Llama-3.1 behaviour.
+
+### What it can do
+
+- Answer questions on the solar energy topics covered by its FAQ training data
+- Explain solar manufacturing concepts (BOM, PPC, audits, etc.)
+- Provide concise, professional responses to domain-specific queries
+- Multi-turn conversation with context retention
+
+### What it cannot do
+
+- Act as a general-purpose assistant (use base Llama-3.1 for that)
+- Understand images, audio, or video
+- Answer real-time or internet-dependent queries
+
+---
+
+## Getting Started
+
+### Installation
+
+```bash
+# NVIDIA GPU or CPU (bitsandbytes is only needed for the 4-bit examples):
+pip install transformers torch accelerate bitsandbytes
+
+# Apple Silicon (recommended — faster with MLX):
+pip install mlx-lm
+```
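+
+Which of these paths to take can also be decided in code. The helper below is a hypothetical sketch (not part of this repository or of either library): it returns `"mlx"` on an Apple Silicon Mac with `mlx-lm` installed, `"cuda"` when PyTorch can see an NVIDIA GPU, and `"cpu"` otherwise.
+
+```python
+import importlib.util
+import platform
+
+
+def pick_backend() -> str:
+    """Hypothetical helper: pick an inference path for the current machine."""
+    # MLX targets Apple Silicon, which reports itself as Darwin/arm64.
+    apple_silicon = platform.system() == "Darwin" and platform.machine() == "arm64"
+    if apple_silicon and importlib.util.find_spec("mlx_lm") is not None:
+        return "mlx"
+    # Import torch only if it is installed, then check for a usable CUDA device.
+    if importlib.util.find_spec("torch") is not None:
+        import torch
+
+        if torch.cuda.is_available():
+            return "cuda"
+    return "cpu"
+
+
+if __name__ == "__main__":
+    print(pick_backend())
+```
+
+On `"cpu"`, the GGUF build linked above is usually more practical than loading float16 weights with transformers.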
+
+### Quick Inference (transformers)
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+
+model_id = "ankur1423/fine-tune-test"
+
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.float16,
+    device_map="auto",        # auto: GPU if available, else CPU
+)
+
+def ask(question: str) -> str:
+    messages = [
+        {"role": "system",
+          "content": "You are a knowledgeable assistant for a solar energy company. "
+                     "Answer questions accurately about solar products, manufacturing, and company operations."},
+        {"role": "user", "content": question},
+    ]
+    prompt = tokenizer.apply_chat_template(
+        messages,
+        tokenize=False,
+        add_generation_prompt=True,
+    )
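+    # The chat template already starts with <|begin_of_text|>; skip a second BOS.
+    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)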
+    with torch.no_grad():
+        output = model.generate(
+            **inputs,
+            max_new_tokens=512,
+            temperature=0.1,
+            top_p=0.9,
+            do_sample=True,
+            pad_token_id=tokenizer.eos_token_id,
+        )
+    new_tokens = output[0][inputs["input_ids"].shape[1]:]
+    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
+
+print(ask("What is a BOM?"))
+print(ask("What is PPC in solar manufacturing?"))
+print(ask("Why are internal audits important?"))
+```
+
+### 4-bit Quantized Inference (saves ~10 GB of memory)
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+import torch
+
+model_id = "ankur1423/fine-tune-test"
+
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_compute_dtype=torch.float16,
+    bnb_4bit_use_double_quant=True,
+    bnb_4bit_quant_type="nf4",
+)
+
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    quantization_config=bnb_config,
+    device_map="auto",
+)
+
+# Same ask() function as above — uses ~5 GB VRAM instead of 15 GB
+```
+
+### Apple Silicon — MLX (fastest on Mac)
+
+```python
+from mlx_lm import load, generate
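+from mlx_lm.sample_utils import make_sampler
+
+# Loads the float16 weights (~15 GB). For the ~5 GB 4-bit setup listed under Platform Support,
+# convert once with `mlx_lm.convert --hf-path ankur1423/fine-tune-test -q`
+# and pass the output directory (mlx_model/ by default) to load() instead.
+model, tokenizer = load("ankur1423/fine-tune-test")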
+
+SYSTEM = "You are a knowledgeable assistant for a solar energy company."
+
+def ask(question: str) -> str:
+    prompt = (
+        "<|begin_of_text|>"
+        "<|start_header_id|>system<|end_header_id|>\n\n"
+        + SYSTEM + "<|eot_id|>"
+        "<|start_header_id|>user<|end_header_id|>\n\n"
+        + question + "<|eot_id|>"
+        "<|start_header_id|>assistant<|end_header_id|>\n\n"
+    )
+    return generate(
+        model, tokenizer,
+        prompt=prompt,
+        max_tokens=512,
+        sampler=make_sampler(temp=0.1, top_p=0.9),
+    )
+
+print(ask("What is a BOM?"))
+```
+
+### Multi-turn Chat (transformers)
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+import torch
+
+model_id = "ankur1423/fine-tune-test"
+SYSTEM = "You are a knowledgeable assistant for a solar energy company."
+
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    quantization_config=BitsAndBytesConfig(
+        load_in_4bit=True,
+        bnb_4bit_compute_dtype=torch.float16,
+        bnb_4bit_quant_type="nf4",
+    ),
+    device_map="auto",
+)
+
+history = [{"role": "system", "content": SYSTEM}]
+
+while True:
+    user = input("You: ").strip()
+    if not user or user.lower() in {"exit", "quit"}:
+        break
+    history.append({"role": "user", "content": user})
+
+    prompt = tokenizer.apply_chat_template(
+        history, tokenize=False, add_generation_prompt=True
+    )
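+    # The template already includes <|begin_of_text|>; don't add a second BOS.
+    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)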
+    with torch.no_grad():
+        out = model.generate(
+            **inputs, max_new_tokens=512,
+            temperature=0.1, top_p=0.9, do_sample=True,
+            pad_token_id=tokenizer.eos_token_id,
+        )
+    response = tokenizer.decode(
+        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
+    ).strip()
+    print(f"Assistant: {response}\n")
+    history.append({"role": "assistant", "content": response})
+```
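+
+The `history` list above grows with every turn, while the fine-tune used a 1024-token maximum sequence length. One way to keep prompts bounded is to drop the oldest exchanges and keep the system message. A minimal sketch: `trim_history` is a hypothetical helper, `count_tokens` is any callable you supply (with transformers, for example `lambda s: len(tokenizer.encode(s, add_special_tokens=False))`), and only message contents are counted, not the chat-template header tokens, so leave some headroom in `max_tokens`.
+
+```python
+from typing import Callable, Dict, List
+
+Message = Dict[str, str]
+
+
+def trim_history(
+    history: List[Message],
+    max_tokens: int,
+    count_tokens: Callable[[str], int],
+) -> List[Message]:
+    """Drop the oldest user/assistant exchanges until the contents fit.
+
+    Assumes history[0] is the system message. The most recent message
+    (or exchange) is always kept, even if it alone exceeds the budget.
+    """
+    system, turns = history[:1], history[1:]
+
+    def total(msgs: List[Message]) -> int:
+        return sum(count_tokens(m["content"]) for m in msgs)
+
+    # Remove one exchange (user + assistant) at a time, oldest first.
+    while len(turns) > 2 and total(system + turns) > max_tokens:
+        turns = turns[2:]
+    return system + turns
+
+
+if __name__ == "__main__":
+    def words(s: str) -> int:  # crude stand-in for a real tokenizer
+        return len(s.split())
+
+    demo = [
+        {"role": "system", "content": "You answer solar questions."},
+        {"role": "user", "content": "What is a BOM?"},
+        {"role": "assistant", "content": "A bill of materials."},
+        {"role": "user", "content": "And PPC?"},
+    ]
+    print([m["content"] for m in trim_history(demo, max_tokens=8, count_tokens=words)])
+```
+
+In the loop, the call would go right after the user message is appended, e.g. `history = trim_history(history, 900, count_tokens)`, where 900 is only an illustrative budget.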
+
+---
+
+## Platform Support
+
+| Platform | Method | RAM / VRAM | Speed |
+|----------|--------|-----------|-------|
+| Mac M1/M2/M3/M4 | MLX (4-bit) | 5 GB | Fast |
+| NVIDIA GPU (Linux/Windows) | transformers 4-bit | 5–6 GB VRAM | Fast |
+| Google Colab T4 | transformers 4-bit | ~6 GB VRAM | Fast |
+| Kaggle P100 | transformers 4-bit | ~6 GB VRAM | Fast |
+| CPU — any OS | transformers float16 | 16 GB RAM | Slow |
+| **Any platform (recommended)** | **[GGUF 4.6 GB](https://huggingface.co/ankur1423/fine-tune-test-gguf)** | **6 GB RAM** | **Fast/OK** |
+
+> **Tip:** For CPU or low-VRAM machines, use the [GGUF version](https://huggingface.co/ankur1423/fine-tune-test-gguf): 4.6 GB, no GPU needed, at the cost of some (usually small) quality loss from 4-bit quantization.
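+
+The memory figures above follow from simple arithmetic: weight memory is roughly parameter count × bytes per parameter. A rough sketch, assuming ~8.03 billion parameters for Llama-3.1-8B; real usage is higher because of the KV cache, activations, quantization metadata, and any layers left in 16-bit.
+
+```python
+GIB = 1024 ** 3  # bytes per GiB
+
+
+def weight_gib(n_params: float, bits_per_param: float) -> float:
+    """Approximate weight storage in GiB (ignores KV cache and activations)."""
+    return n_params * bits_per_param / 8 / GIB
+
+
+N_PARAMS = 8.03e9  # approximate parameter count of Llama-3.1-8B
+
+if __name__ == "__main__":
+    print(f"float16: {weight_gib(N_PARAMS, 16):.1f} GiB")
+    print(f"4-bit:   {weight_gib(N_PARAMS, 4):.1f} GiB")
+```
+
+This gives about 15 GiB for float16, in line with the ~15 GB download, and about 3.7 GiB of 4-bit weights, which the overheads above push toward the ~5 GB quoted in the table.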
+
+---
+
+## Recommended Generation Parameters
+
+| Parameter | Value | Notes |
+|-----------|-------|-------|
+| `temperature` | 0.1 | Low → factual, consistent |
+| `top_p` | 0.9 | Nucleus sampling |
+| `max_new_tokens` | 256–512 | FAQ answers are concise |
+| `do_sample` | True | Must be `True` for `temperature` and `top_p` to take effect |
+
+Raise `temperature` to 0.5–0.7 for more varied / creative responses.
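+
+To see why a low `temperature` gives consistent answers, here is a pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering over a toy set of logits. It illustrates the general technique only; it is not the exact code path in `transformers` or MLX-LM, which work on tensors and may order their filters differently.
+
+```python
+import math
+import random
+from typing import List, Optional, Sequence
+
+
+def softmax(logits: Sequence[float], temperature: float) -> List[float]:
+    """Divide logits by the temperature, then normalize (max-subtracted for stability)."""
+    scaled = [x / temperature for x in logits]
+    peak = max(scaled)
+    exps = [math.exp(x - peak) for x in scaled]
+    total = sum(exps)
+    return [e / total for e in exps]
+
+
+def nucleus(probs: Sequence[float], top_p: float) -> List[int]:
+    """Smallest set of token ids, most likely first, whose total probability reaches top_p."""
+    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
+    kept: List[int] = []
+    mass = 0.0
+    for i in order:
+        kept.append(i)
+        mass += probs[i]
+        if mass >= top_p:
+            break
+    return kept
+
+
+def sample(logits: Sequence[float], temperature: float = 0.1, top_p: float = 0.9,
+           rng: Optional[random.Random] = None) -> int:
+    """Draw one token id from the temperature-scaled, top-p-filtered distribution."""
+    rng = rng if rng is not None else random.Random()
+    probs = softmax(logits, temperature)
+    kept = nucleus(probs, top_p)
+    # random.choices normalizes the weights of the kept tokens itself.
+    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
+
+
+if __name__ == "__main__":
+    toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
+    print(nucleus(softmax(toy_logits, 0.1), 0.9))
+    print(nucleus(softmax(toy_logits, 1.0), 0.9))
+```
+
+With `temperature=0.1` the top token holds almost all of the probability mass, so the nucleus is just `[0]` and sampling is effectively greedy; at `temperature=1.0` the nucleus grows to `[0, 1, 2]`.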
+
+---
+
+## Prompt Format
+
+This model uses the **Llama-3 chat template** with `<|eot_id|>` as the stop token.
+
+`tokenizer.apply_chat_template()` handles formatting automatically.
+
+Raw format:
+```
+<|begin_of_text|><|start_header_id|>system<|end_header_id|>
+
+You are a knowledgeable assistant for a solar energy company.<|eot_id|><|start_header_id|>user<|end_header_id|>
+
+What is a BOM?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+
+```
+
+Stop token: `<|eot_id|>`
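+
+`apply_chat_template()` remains the reliable way to produce this string. For illustration only, the sketch below rebuilds the raw Llama-3 format by hand (single- or multi-turn) and trims a decoded completion at the stop token. `build_prompt` and `cut_at_stop` are hypothetical helpers, and the official Llama-3.1 template can add extra text (for example date lines in the system block), so the result is not guaranteed to match the tokenizer's template byte for byte.
+
+```python
+from typing import Dict, List
+
+BOS = "<|begin_of_text|>"
+EOT = "<|eot_id|>"  # end-of-turn marker, also the stop token
+
+
+def header(role: str) -> str:
+    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"
+
+
+def build_prompt(messages: List[Dict[str, str]]) -> str:
+    """Render messages in the raw Llama-3 format and open an assistant turn."""
+    parts = [BOS]
+    for m in messages:
+        parts.append(header(m["role"]) + m["content"] + EOT)
+    parts.append(header("assistant"))  # the model continues from here
+    return "".join(parts)
+
+
+def cut_at_stop(text: str, stop: str = EOT) -> str:
+    """Keep only the text generated before the first stop token."""
+    return text.split(stop, 1)[0].strip()
+
+
+if __name__ == "__main__":
+    prompt = build_prompt([
+        {"role": "system", "content": "You are a knowledgeable assistant for a solar energy company."},
+        {"role": "user", "content": "What is a BOM?"},
+    ])
+    print(repr(prompt))
+    print(cut_at_stop("A bill of materials lists every component.<|eot_id|><|start_header_id|>"))
+```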
+
+---
+
+## Training Details
+
+### Fine-tuning Process
+
+The model was fine-tuned using **LoRA (Low-Rank Adaptation)**: only a small set of adapter
+weights is trained while the base model weights stay frozen. This keeps compute and memory
+requirements low enough to train on a 16 GB MacBook in about 20 minutes.
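+
+As a sketch of the idea (following the original LoRA formulation, not MLX-LM's internal code): a frozen weight matrix `W` is left untouched and the adapter adds a low-rank update, so an adapted layer computes `W·x + scale · B·(A·x)`, where `A` is r × d_in, `B` is d_out × r, and `scale` is often written α/r. Only `A` and `B` are trained, so each adapted matrix gains `r · (d_in + d_out)` parameters. The count below assumes LoRA on `q_proj` and `v_proj` in 8 transformer blocks with Llama-3.1-8B's dimensions (hidden size 4096; 8 key/value heads × 128 = 1024); this card does not state which projections were adapted, so treat the number as illustrative.
+
+```python
+from typing import List
+
+Matrix = List[List[float]]
+
+
+def matvec(m: Matrix, x: List[float]) -> List[float]:
+    return [sum(w * xi for w, xi in zip(row, x)) for row in m]
+
+
+def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: List[float], scale: float) -> List[float]:
+    """y = W x + scale * B (A x); W is frozen, only A and B are trained."""
+    base = matvec(w, x)
+    update = matvec(b, matvec(a, x))
+    return [y + scale * u for y, u in zip(base, update)]
+
+
+def lora_params(d_in: int, d_out: int, rank: int) -> int:
+    """Trainable parameters added to one d_in -> d_out linear layer."""
+    return rank * (d_in + d_out)
+
+
+# Assumed targets: q_proj (4096 -> 4096) and v_proj (4096 -> 1024) in 8 blocks.
+RANK, BLOCKS = 8, 8
+per_block = lora_params(4096, 4096, RANK) + lora_params(4096, 1024, RANK)
+total = BLOCKS * per_block
+
+if __name__ == "__main__":
+    # Toy 2x2 layer with a rank-1 adapter. LoRA initializes B to zero,
+    # so the adapted layer starts out identical to the frozen layer.
+    w = [[1.0, 2.0], [3.0, 4.0]]
+    a = [[0.5, -0.5]]
+    b_zero = [[0.0], [0.0]]
+    print(lora_forward(w, a, b_zero, [1.0, 1.0], scale=2.0))  # same as W x
+    print(f"{total:,} trainable LoRA parameters (~{total / 8.03e9:.4%} of 8.03B)")
+```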
+
+| | |
+|---|---|
+| Base model | `meta-llama/Meta-Llama-3.1-8B-Instruct` |
+| Fine-tuning method | LoRA |
+| LoRA rank | 8 |
+| LoRA layers | 8 (attention layers) |
+| Dataset size | 62 train + 6 validation (68 total Q&A pairs) |
+| Iterations | 300 |
+| Learning rate | 1e-4 (cosine decay → 1e-5) |
+| Warmup steps | 30 |
+| Batch size | 2 |
+| Max sequence length | 1024 tokens |
+| Framework | [MLX-LM](https://github.com/ml-explore/mlx-examples) |
+| Training hardware | MacBook M4 16 GB unified memory |
+| Training time | ~20 minutes |
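+
+The schedule and training budget in this table can be checked with a few lines of arithmetic. The sketch assumes a linear warmup from 0 to the peak rate over the first 30 steps, followed by cosine decay to 1e-5 at step 300; MLX-LM's own scheduler may join the two phases differently. It also shows that 300 iterations at batch size 2 amount to roughly ten passes over the 62 training examples.
+
+```python
+import math
+
+PEAK_LR, FINAL_LR = 1e-4, 1e-5
+WARMUP, TOTAL_STEPS = 30, 300
+BATCH_SIZE, TRAIN_EXAMPLES = 2, 62
+
+
+def lr_at(step: int) -> float:
+    """Assumed schedule: linear warmup, then cosine decay from PEAK_LR to FINAL_LR."""
+    if step < WARMUP:
+        return PEAK_LR * step / WARMUP
+    progress = min(1.0, (step - WARMUP) / (TOTAL_STEPS - WARMUP))
+    return FINAL_LR + 0.5 * (PEAK_LR - FINAL_LR) * (1 + math.cos(math.pi * progress))
+
+
+# Each training example is seen about this many times over the whole run.
+epochs = TOTAL_STEPS * BATCH_SIZE / TRAIN_EXAMPLES
+
+if __name__ == "__main__":
+    for step in (0, 15, 30, 165, 300):
+        print(f"step {step:3d}: lr = {lr_at(step):.2e}")
+    print(f"~{epochs:.1f} epochs")
+```
+
+About 9.7 epochs over 62 examples means every Q&A pair was repeated many times during training.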
+
+### Dataset
+
+The training dataset consists of ~68 solar energy FAQ Q&A pairs covering topics such as:
+- Bill of Materials (BOM) and procurement
+- Production Planning & Control (PPC)
+- Solar panel manufacturing processes
+- Quality control and internal audits
+- Company operations and workflows
+
+Data format — Llama-3 chat template, one Q&A pair per record:
+```json
+{"text": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n[system]<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n[question]<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n[answer]<|eot_id|>"}
+```
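+
+A sketch of turning raw Q&A pairs into records of this shape and splitting them into training and validation files. It assumes the `train.jsonl` / `valid.jsonl` layout that MLX-LM's LoRA trainer reads from a data directory; `make_record` and `write_splits` are hypothetical helpers, and the pairs in the demo are placeholders, not the actual dataset.
+
+```python
+import json
+import random
+import tempfile
+from pathlib import Path
+from typing import Dict, List, Tuple
+
+SYSTEM = "You are a knowledgeable assistant for a solar energy company."
+
+
+def make_record(question: str, answer: str, system: str = SYSTEM) -> Dict[str, str]:
+    """One Q&A pair rendered in the Llama-3 chat format shown above."""
+    def turn(role: str, content: str) -> str:
+        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"
+
+    text = ("<|begin_of_text|>" + turn("system", system)
+            + turn("user", question) + turn("assistant", answer))
+    return {"text": text}
+
+
+def write_splits(pairs: List[Tuple[str, str]], out_dir: Path,
+                 n_valid: int, seed: int = 0) -> Tuple[int, int]:
+    """Shuffle, hold out n_valid pairs, and write train.jsonl / valid.jsonl."""
+    shuffled = pairs[:]
+    random.Random(seed).shuffle(shuffled)
+    splits = {"valid": shuffled[:n_valid], "train": shuffled[n_valid:]}
+    out_dir.mkdir(parents=True, exist_ok=True)
+    for name, rows in splits.items():
+        with open(out_dir / f"{name}.jsonl", "w", encoding="utf-8") as f:
+            for q, a in rows:
+                f.write(json.dumps(make_record(q, a)) + "\n")
+    return len(splits["train"]), len(splits["valid"])
+
+
+if __name__ == "__main__":
+    placeholder_pairs = [(f"Question {i}?", f"Answer {i}.") for i in range(68)]
+    with tempfile.TemporaryDirectory() as tmp:
+        print(write_splits(placeholder_pairs, Path(tmp), n_valid=6))
+```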
+
+---
+
+## Ethics and Safety
+
+- Model is domain-specific and not a general-purpose assistant
+- Answers are based on training data — verify critical information independently
+- Not intended for medical, legal, or financial advice
+- Solar energy domain only — out-of-domain queries fall back to base Llama-3 behaviour
+- Inherits all safety characteristics of the base `meta-llama/Meta-Llama-3.1-8B-Instruct` model
+
+---
+
+## Usage and Limitations
+
+### Intended Use
+
+- Solar energy company FAQ chatbot
+- Internal knowledge base assistant
+- Learning / research on domain-specific LoRA fine-tuning with MLX
+
+### Out-of-Scope Use
+
+- General-purpose assistant (use base Llama-3.1 instead)
+- Medical, legal, or financial advice
+- Real-time data retrieval (model has no internet access)
+- Languages other than English
+
+### Known Limitations
+
+- Small dataset (~68 pairs) — may not generalize to all solar topics
+- English only
+- Float16 weights need ~15 GB of disk and ~16 GB of VRAM or RAM; 4-bit loading needs ~5–6 GB of VRAM
+- Apple Silicon only for MLX inference (use transformers on other platforms)
+
+---
+
+## License
+
+This model is derived from Meta Llama 3.1, which is licensed under the
+[Meta Llama 3 Community License](https://llama.meta.com/llama3/license/).
+Use is subject to Meta's acceptable use policy.