Initialize project; model provided by the ModelHub XC community

Model: ahczhg/Llama-3.2-1B-Aegis-SFT-DPO Source: Original Platform
2026-06-03 23:24:19 +08:00
commit fec258e892
9 changed files with 2643 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,36 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,476 @@
 ---
 language:
 - en
 license: llama3.2
 base_model: meta-llama/Llama-3.2-1B
 tags:
 - llama-3.2
 - fine-tuned
 - sft
 - dpo
 - content-safety
 - aegis
 - trl
 - peft
 - lora
 - rlhf
 library_name: transformers
 pipeline_tag: text-generation
 datasets:
 - nvidia/Aegis-AI-Content-Safety-Dataset-2.0
 widget:
 - text: "What is artificial intelligence?"
  example_title: "AI Question"
 - text: "How can I learn programming?"
  example_title: "Learning Question"
 - text: "Explain quantum computing in simple terms."
  example_title: "Complex Topic"
 ---
 # Llama-3.2-1B-Aegis-SFT-DPO
 <div align="center">
  <strong>Fine-tuned Llama 3.2 1B for Content-Safe Instruction Following</strong>
 </div>
 <br>
 This model is a fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) using a **two-stage training approach**:
 1. **Supervised Fine-Tuning (SFT)** - Teaching the model to follow instructions
 2. **Direct Preference Optimization (DPO)** - Aligning with human preferences for safety
 ## 🎯 Model Description
 - **Base Model**: meta-llama/Llama-3.2-1B
 - **Fine-tuning Method**: SFT + DPO (RLHF approach)
 - **Dataset**: [nvidia/Aegis-AI-Content-Safety-Dataset-2.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0)
 - **Training Samples**: 500
 - **Focus**: Content safety and responsible AI responses
 - **Architecture**: Parameter Efficient Fine-Tuning (LoRA)
 - **Model Size**: ~1B parameters
 - **Quantization**: 4-bit (NF4) during training; released as merged BF16 weights
 ## 🚀 Quick Start
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
 # Load model and tokenizer
 model_name = "ahczhg/Llama-3.2-1B-Aegis-SFT-DPO"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
 )
 # Prepare messages
 messages = [
    {"role": "user", "content": "What is artificial intelligence?"}
 ]
 # Apply chat template and generate
 inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
 ).to(model.device)
 outputs = model.generate(
    inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
 )
 # Decode response
 response = tokenizer.decode(outputs[0], skip_special_tokens=True)
 print(response)
 ```
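For a decoder-only model, `generate` returns the prompt tokens followed by the continuation, so decoding `outputs[0]` reproduces the prompt as well. In addition, `config.json` sets `eos_token_id` to 128001 (`<|end_of_text|>`), whereas the chat template closes each turn with `<|eot_id|>`. The sketch below uses plain Python lists of token ids to show one way to keep only the reply. The helper names and the toy ids are hypothetical, and it is an assumption that the model emits `<|eot_id|>` at the end of its turn.

```python
def strip_prompt(output_ids, prompt_len):
    """Return only the tokens generated after the prompt."""
    return output_ids[prompt_len:]


def truncate_at_stop(token_ids, stop_ids):
    """Cut the sequence at the first stop token (stop token excluded)."""
    for position, token_id in enumerate(token_ids):
        if token_id in stop_ids:
            return token_ids[:position]
    return token_ids


# Toy ids for illustration only (not real vocabulary ids).
EOT_ID = 9
prompt = [1, 2, 3]
generated = prompt + [4, 5, EOT_ID, 6]

reply_ids = truncate_at_stop(strip_prompt(generated, len(prompt)), {EOT_ID})
print(reply_ids)  # [4, 5]
```

With the Quick Start objects, the equivalent slice would be `outputs[0][inputs.shape[-1]:]`, and passing `eos_token_id=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]` to `generate` should stop decoding at the end of the assistant turn.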
 ## 📊 Training Details
 ### Dataset Information
 - **Source**: NVIDIA Aegis AI Content Safety Dataset 2.0
 - **Total Samples Used**: 500
 - **SFT Split**: 400 samples (~80%)
 - **DPO Split**: 100 samples (~20%)
 - **Data Filtering**: Removed redacted prompts and invalid entries
 - **Format**: Conversational pairs with safety labels
 ### Training Methodology
 This model follows a two-stage approach similar to RLHF (Reinforcement Learning from Human Feedback), inspired by [AMD's Instella-3B-Instruct](https://huggingface.co/amd/Instella-3B-Instruct):
 #### Stage 1: Supervised Fine-Tuning (SFT)
 Teaching the model to follow the instruction format and generate appropriate responses.
 **Hyperparameters**:
 ```yaml
 Epochs: 2
 Batch Size: 1
 Gradient Accumulation: 8
 Effective Batch Size: 8
 Learning Rate: 1e-5
 Optimizer: AdamW
 LR Scheduler: Cosine
 Warmup Steps: 100
 Weight Decay: 0.1
 Max Gradient Norm: 1.0
 Precision: BF16
 Gradient Checkpointing: True
 ```
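The schedule implied by these values can be checked with a little arithmetic. The sketch below assumes a single GPU, that all 400 SFT samples are used for training (no held-out split), and that a partial final batch still counts as one optimizer step; none of these is stated in the card.

```python
import math

sft_samples = 400        # "SFT Split" from Dataset Information
per_device_batch = 1
grad_accumulation = 8
epochs = 2
warmup_steps = 100

effective_batch = per_device_batch * grad_accumulation
steps_per_epoch = math.ceil(sft_samples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_fraction = min(1.0, warmup_steps / total_steps)

print(effective_batch, steps_per_epoch, total_steps, warmup_fraction)
# 8 50 100 1.0
```

Under those assumptions the run has about 100 optimizer steps, so 100 warmup steps would span the whole SFT stage: the learning rate would reach 1e-5 only at the end and the cosine decay would never begin.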
 #### Stage 2: Direct Preference Optimization (DPO)
 Optimizing the model to prefer safe, helpful responses over problematic ones using preference learning.
 **Hyperparameters**:
 ```yaml
 Epochs: 1
 Batch Size: 1
 Gradient Accumulation: 8
 Effective Batch Size: 8
 Learning Rate: 5e-7
 Beta (DPO): 0.1
 Max Prompt Length: 512
 Max Sequence Length: 1024
 Optimizer: AdamW
 LR Scheduler: Cosine
 Warmup Ratio: 10%
 Precision: BF16
 ```
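To make the role of `Beta (DPO): 0.1` concrete, here is a minimal sketch of the standard DPO objective for a single preference pair. It is not the TRL implementation; the function name and the example log-probabilities are hypothetical, and each argument stands for the summed log-probability of a full response under the policy or the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """-log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # Numerically stable softplus(-margin)
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: loss is log(2).
baseline = dpo_loss(-11.0, -11.0, -11.0, -11.0)
# Policy raises the chosen response and lowers the rejected one: loss drops.
improved = dpo_loss(-10.0, -12.0, -11.0, -11.0)
print(baseline > improved)  # True
```

A small beta such as 0.1 scales the margin down, so the policy must move further from the reference before the loss changes much.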
 ### LoRA Configuration
 Parameter-efficient fine-tuning using Low-Rank Adaptation:
 ```yaml
 Rank (r): 8
 Alpha: 16
 Dropout: 0.05
 Target Modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
 Bias: none
 Task Type: CAUSAL_LM
 Trainable Parameters: ~0.5% of total
 ```
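The trainable-parameter share can be estimated from the LoRA rank and the layer shapes. The sketch below takes the dimensions from this repository's `config.json` and counts `r * (in_features + out_features)` per adapted matrix; the second total, which also adapts the MLP projections, is a hypothetical comparison and not something the card states.

```python
# Dimensions from config.json in this repository.
hidden_size = 2048
num_heads = 32
num_kv_heads = 8
head_dim = 64
intermediate_size = 8192
num_layers = 16
rank = 8


def lora_params(in_features, out_features, r=rank):
    # LoRA adds A (r x in_features) and B (out_features x r).
    return r * (in_features + out_features)


attention_shapes = {
    "q_proj": (hidden_size, num_heads * head_dim),
    "k_proj": (hidden_size, num_kv_heads * head_dim),
    "v_proj": (hidden_size, num_kv_heads * head_dim),
    "o_proj": (num_heads * head_dim, hidden_size),
}
# Hypothetical extra targets, for comparison only.
mlp_shapes = {
    "gate_proj": (hidden_size, intermediate_size),
    "up_proj": (hidden_size, intermediate_size),
    "down_proj": (intermediate_size, hidden_size),
}

attention_only = num_layers * sum(lora_params(i, o) for i, o in attention_shapes.values())
with_mlp = attention_only + num_layers * sum(lora_params(i, o) for i, o in mlp_shapes.values())

print(attention_only)  # 1703936
print(with_mlp)        # 5636096
```

With only the four attention projections listed above, the count is about 1.7M (roughly 0.14% of ~1.23B parameters). Adapting the MLP projections at the same rank would give about 5.6M (roughly 0.46%), which is closer to the quoted ~0.5%, so the percentage or the module list should be read as approximate.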
 ### Training Infrastructure
 - **Platform**: Google Colab
 - **GPU**: NVIDIA T4 (16GB VRAM)
 - **Training Quantization**: 4-bit NF4 with double quantization
 - **Gradient Checkpointing**: Enabled for memory efficiency
 - **Final Model Format**: BF16 weights with the LoRA adapters merged
 - **Total Training Time**: ~30-50 minutes
 ## 💻 Advanced Usage
 ### Multi-turn Conversation
 ```python
 messages = [
    {"role": "user", "content": "What is machine learning?"},
    {"role": "assistant", "content": "Machine learning is a subset of AI..."},
    {"role": "user", "content": "Can you give me an example?"}
 ]
 inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
 outputs = model.generate(inputs, max_new_tokens=256, temperature=0.7, top_p=0.9, do_sample=True)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 ### Streaming Generation
 ```python
 from transformers import TextIteratorStreamer
 from threading import Thread
 # skip_prompt=True keeps the echoed prompt out of the streamed text
 streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
 generation_kwargs = dict(
    inputs=inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    streamer=streamer,
    pad_token_id=tokenizer.eos_token_id
 )
 thread = Thread(target=model.generate, kwargs=generation_kwargs)
 thread.start()
 for new_text in streamer:
    print(new_text, end="", flush=True)
 thread.join()
 ```
 ### Batch Inference
 ```python
 prompts = [
    "Explain neural networks",
    "What is deep learning?",
    "How does backpropagation work?"
 ]
 messages_batch = [[{"role": "user", "content": p}] for p in prompts]
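 # Left-pad so every prompt ends right where generation starts (decoder-only model)
 tokenizer.padding_side = "left"
 # Tokenize all at once; return_dict=True also returns the attention mask
 inputs = tokenizer.apply_chat_template(
    messages_batch,
    add_generation_prompt=True,
    return_tensors="pt",
    padding=True,
    return_dict=True
 ).to(model.device)
 # Generate, passing input_ids and attention_mask together
 outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7, pad_token_id=tokenizer.eos_token_id)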
 # Decode all
 for output in outputs:
    print(tokenizer.decode(output, skip_special_tokens=True))
    print("-" * 80)
 ```
 ### Custom Generation Parameters
 ```python
 # More creative
 outputs = model.generate(
    inputs,
    max_new_tokens=512,
    temperature=0.9,      # Higher = more creative
    top_p=0.95,
    top_k=50,
    do_sample=True,
    repetition_penalty=1.1
 )
 # More focused/deterministic
 outputs = model.generate(
    inputs,
    max_new_tokens=256,
    temperature=0.3,      # Lower = more focused
    top_p=0.85,
    do_sample=True,
    repetition_penalty=1.05
 )
 ```
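The effect of `temperature` and `top_p` can be illustrated on a toy distribution. This is a conceptual sketch in plain Python, not the Transformers implementation; the function names and the example logits are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalise to probabilities."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their mass reaches top_p, then renormalise."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.0]
focused = softmax_with_temperature(logits, 0.3)
creative = softmax_with_temperature(logits, 0.9)
print(focused[0] > creative[0])  # True: lower temperature sharpens the distribution

nucleus = top_p_filter([0.6, 0.3, 0.1], top_p=0.85)
print(sorted(nucleus))  # [0, 1]: the least likely token is dropped
```

Lower temperature concentrates probability on the top token, while a smaller `top_p` removes more of the low-probability tail before sampling.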
 ## 🎨 Chat Template Format
 The model uses Llama 3.2's official chat format with special tokens:
 ```
 <|start_header_id|>user<|end_header_id|>
 Your question here<|eot_id|><|start_header_id|>assistant<|end_header_id|>
 Model response here<|eot_id|>
 ```
 The tokenizer's `apply_chat_template` method handles this automatically.
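For reference, the logic of the `chat_template.jinja` file shipped in this repository can be mirrored in a few lines of Python. The function name `render_chat` is hypothetical and the sketch only reproduces the string layout; it does not tokenize.

```python
def render_chat(messages, add_generation_prompt=False):
    """Mirror of the repository's Jinja chat template as plain string building."""
    parts = []
    for message in messages:
        role = message["role"]
        if role in ("user", "assistant"):
            parts.append(
                f"<|start_header_id|>{role}<|end_header_id|>\n\n"
                f"{message['content']}<|eot_id|>"
            )
        # Any other role (for example "system") is skipped, as in the template.
    if add_generation_prompt:
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = render_chat([{"role": "user", "content": "Hi"}], add_generation_prompt=True)
print(repr(prompt))
```

Two details follow from the template as written: each header is followed by a blank line, and the template neither emits `<|begin_of_text|>` nor renders `system` messages.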
 ## 📈 Intended Use Cases
 ### ✅ Recommended Applications
 - **Educational Tools**: Safe, informative responses for learning
 - **Content Safety Research**: Studying AI alignment and safety
 - **Prototype Development**: Building conversational AI systems
 - **Instruction Following**: General-purpose task completion
 - **Safe Text Generation**: Content-aware generation tasks
 ### ❌ Out-of-Scope Use
 - **Production Systems**: Without additional safety validation
 - **High-Stakes Decisions**: Medical, legal, financial advice
 - **Unsupervised Deployment**: Without human oversight
 - **Harmful Content**: Generating dangerous or illegal content
 - **Critical Infrastructure**: Without extensive testing
 ## ⚠️ Limitations and Considerations
 ### Known Limitations
 1. **Training Data**: Only 500 samples - more data could improve performance
 2. **Language**: Primarily English-focused, limited multilingual capability
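 3. **Context Length**: Fine-tuning sequences were capped at 1024 tokens, although `config.json` keeps the base model's `max_position_embeddings` of 131072
 4. **Model Size**: 1B parameters - capabilities may be reduced compared with larger models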
 5. **Safety Bounds**: Fine-tuned for safety but not perfect - can still make mistakes
 6. **Domain Knowledge**: Limited to training data cutoff and base model knowledge
 ### Biases and Ethical Considerations
 - Inherits biases from base Llama 3.2 model
 - Safety fine-tuning may make responses overly conservative
 - Content safety dataset has its own biases
 - Not suitable for all cultural contexts without adaptation
 - Should be tested thoroughly before deployment
 ### Performance Notes
 - **Speed**: ~10-20 tokens/second on T4 GPU
 - **Memory**: ~4GB VRAM in BF16, ~2GB with 4-bit quantization
 - **Best For**: General instruction following with safety awareness
 - **Trade-offs**: Safety focus may reduce creativity in some cases
 ## 🔬 Evaluation
 ### Qualitative Assessment
 The model has been tested on:
 - ✅ General knowledge questions
 - ✅ Instruction following tasks
 - ✅ Content safety scenarios
 - ✅ Multi-turn conversations
 - ✅ Edge cases and adversarial prompts
 ### Sample Outputs
 *(Coming soon - add your evaluation results)*
 ### Comparison to Base Model
 | Metric | Base Llama 3.2 | This Model | Improvement |
 |--------|---------------|------------|-------------|
 | Safety Awareness | Baseline | Enhanced | +Safety Focus |
 | Instruction Following | Good | Better | +SFT Training |
 | Response Quality | High | High | +DPO Alignment |
 ## 🛠️ Technical Details
 ### Model Architecture
 - **Base**: Llama 3.2 1B
 - **Vocabulary**: 128,256 tokens
 - **Hidden Size**: 2048
 - **Layers**: 16
 - **Attention Heads**: 32
 - **Parameters**: ~1.23B total, ~6M trainable (LoRA)
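The ~1.23B figure can be reproduced from the values in `config.json`. The sketch below assumes the standard Llama layout (no biases, two RMSNorm weights per layer plus a final norm, and an output head tied to the embedding, as `tie_word_embeddings: true` indicates).

```python
vocab_size = 128256
hidden_size = 2048
intermediate_size = 8192
num_layers = 16
num_heads = 32
num_kv_heads = 8
head_dim = 64

embedding = vocab_size * hidden_size                  # shared with the output head
attention = (2 * hidden_size * num_heads * head_dim   # q_proj and o_proj
             + 2 * hidden_size * num_kv_heads * head_dim)  # k_proj and v_proj
mlp = 3 * hidden_size * intermediate_size             # gate, up and down projections
norms = 2 * hidden_size                               # two RMSNorm weights per layer
per_layer = attention + mlp + norms

total_params = embedding + num_layers * per_layer + hidden_size  # plus final norm
bf16_bytes = total_params * 2

print(total_params)  # 1235814400
print(bf16_bytes)    # 2471628800
```

The BF16 byte count is within about 17 KB of the 2,471,645,608 bytes recorded in the `model.safetensors` LFS pointer; the remainder is presumably the file header.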
 ### Training Efficiency
 - **Trainable Params**: ~0.5% of total (LoRA adapters)
 - **Memory During Training**: ~8GB VRAM (4-bit quantization)
 - **Training Time**: ~40 minutes total (SFT + DPO)
 - **Hardware Cost**: Free tier Google Colab (T4 GPU)
 ### Optimization Techniques
 - ✅ 4-bit NF4 quantization
 - ✅ Gradient checkpointing
 - ✅ LoRA parameter-efficient fine-tuning
 - ✅ Gradient accumulation
 - ✅ BF16 mixed precision
 - ✅ Optimized memory management
 ## 🙏 Acknowledgments
 - **Base Model**: Meta's Llama 3.2 team for the foundation model
 - **Dataset**: NVIDIA for the Aegis AI Content Safety Dataset
 - **Methodology**: AMD for the Instella training approach inspiration
 - **Frameworks**: 
  - Hugging Face Transformers, TRL, PEFT, Datasets
  - PyTorch team
  - Google Colab for compute resources
 ## 📄 License
 This model is licensed under the **Llama 3.2 Community License**:
 - Commercial use allowed with restrictions
 - Attribution required
 - Cannot be used to train other models without permission
 - Full license: https://huggingface.co/meta-llama/Llama-3.2-1B
 ## 📚 Citations
 ### This Model
 ```bibtex
@misc{llama_3.2_1b_aegis_sft_dpo,
  author = {Community Contributor},
  title = {Llama-3.2-1B-Aegis-SFT-DPO: Content-Safe Fine-tuned Llama 3.2},
  year = {2025},
  publisher = {HuggingFace},
  journal = {HuggingFace Model Hub},
  howpublished = {\url{https://huggingface.co/ahczhg/Llama-3.2-1B-Aegis-SFT-DPO}}
 }
 ```
 ### Base Model
 ```bibtex
@misc{llama32,
  title={Llama 3.2: Open Foundation and Fine-Tuned Chat Models},
  author={Meta AI},
  year={2024},
  url={https://huggingface.co/meta-llama/Llama-3.2-1B}
 }
 ```
 ### Dataset
 ```bibtex
@misc{aegis_dataset,
  title={Aegis AI Content Safety Dataset 2.0},
  author={NVIDIA},
  year={2024},
  url={https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0}
 }
 ```
 ## 🔗 Links
 - **Base Model**: [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
 - **Dataset**: [nvidia/Aegis-AI-Content-Safety-Dataset-2.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0)
 - **TRL Library**: [Hugging Face TRL](https://github.com/huggingface/trl)
 - **PEFT Library**: [Hugging Face PEFT](https://github.com/huggingface/peft)
 ## 📞 Feedback & Support
 Found an issue or have suggestions? Please:
 - Open an issue on the model repository
 - Report safety concerns immediately
 - Share your use cases and results
 ---
 **Model Card Version**: 1.0  
 **Last Updated**: 2025-11-15  
 **Training Date**: 2025-11-15
 **Framework Versions**:
 - 🤗 Transformers: `4.57.1`
 - 🔥 PyTorch: `2.8.0+cu126`
 - 🎯 TRL: `0.25.1`
 - 🔧 PEFT: `0.17.1`
 - 📊 Datasets: `4.0.0`
 **Compute**:
 - Platform: Google Colab
 - GPU: NVIDIA T4 (16GB)
 - Training Duration: ~40-50 minutes
 - Carbon Footprint: Minimal (free tier compute)
 ---
 <div align="center">
  <sub>Built with ❤️ using Hugging Face libraries | Trained on Google Colab | Released under Llama 3.2 License</sub>
 </div>
 [![Support me on Ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/ahczhg)
--- a/chat_template.jinja
+++ b/chat_template.jinja
@@ -0,0 +1 @@
 {% for message in messages %}{% if message['role'] == 'user' %}{{ '<|start_header_id|>user<|end_header_id|>\n\n' + message['content'] + '<|eot_id|>' }}{% elif message['role'] == 'assistant' %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' + message['content'] + '<|eot_id|>' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}
--- a/config.json
+++ b/config.json
@@ -0,0 +1,35 @@
 {
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "dtype": "bfloat16",
  "eos_token_id": 128001,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 16,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 32.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": true,
  "transformers_version": "4.57.1",
  "use_cache": true,
  "vocab_size": 128256
 }
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,9 @@
 {
  "_from_model_config": true,
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": 128001,
  "temperature": 0.6,
  "top_p": 0.9,
  "transformers_version": "4.57.1"
 }
--- a/model.safetensors
+++ b/model.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:7d4a8e8fc3ba2debf7aba9f0656616fd3aeca64ec9026f4eb8d3e5f30835ea98
 size 2471645608
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,17 @@
 {
  "bos_token": {
    "content": "<|begin_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<|end_of_text|>"
 }
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1 @@
 {% for message in messages %}{% if message['role'] == 'user' %}{{ '<|start_header_id|>user<|end_header_id|>\n\n' + message['content'] + '<|eot_id|>' }}{% elif message['role'] == 'assistant' %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' + message['content'] + '<|eot_id|>' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}