Initialize project; model provided by the ModelHub XC community

Model: kmseong/llama3.2_3b_new_SSFT_lr3e-5 Source: Original Platform
2026-04-11 17:24:57 +08:00
commit 41a10c6817
10 changed files with 2544 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,113 @@
+---
+license: llama3.2
+base_model: meta-llama/Llama-3.2-3B-Instruct
+tags:
+- safety
+- warp
+- circuit-breakers
+- alignment
+library_name: transformers
+pipeline_tag: text-generation
+---
+
+# Safety-WaRP Llama 3.2 3B - Phase 0
+
+**Phase 0: Base Safety Training** - This model has completed safety training on the Circuit Breakers data.
+
+## Model Details
+
+- **Base Model**: meta-llama/Llama-3.2-3B-Instruct
+- **Method**: Safety-WaRP (Weight space Rotation Process)
+- **Phase**: Phase 0 (Base Safety Training)
+- **Safety Dataset**: Circuit Breakers
+- **Training Samples**: 1000
+- **Epochs**: 3
+- **Final Loss**: N/A
+
+## Training Information
+
+### Phase 0: Base Safety Training
+
+Phase 0 trains the model on safety data (Circuit Breakers) to establish its safety mechanisms.
+
+**Procedure:**
+1. Fine-tune on the Circuit Breakers data
+2. Gradient accumulation (effective batch size: 8)
+3. 8-bit optimizer to reduce memory use
+4. Cosine scheduler (lr: 1e-5 → 0)
+
+**Results:**
+- A base model capable of producing safe responses
+- Serves as the base model for Phases 1, 2, and 3
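+
+The cosine schedule in step 4 can be illustrated with a minimal sketch. It uses the standard closed form for cosine annealing with a minimum learning rate of 0, which is the form documented for PyTorch's `CosineAnnealingLR`. The function name `cosine_lr` and the constant `TOTAL_STEPS` are hypothetical, and the step count assumes one scheduler update per optimizer step (1000 samples, effective batch size 8, 3 epochs); the actual training script may differ.
+
+```python
+import math
+
+
+def cosine_lr(step: int, total_steps: int, base_lr: float = 1e-5, min_lr: float = 0.0) -> float:
+    """Learning rate after `step` scheduler updates under cosine annealing."""
+    progress = step / total_steps
+    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
+
+
+# Assumption: 375 optimizer steps = ceil(1000 / 8) steps per epoch * 3 epochs.
+TOTAL_STEPS = 375
+
+for step in (0, 125, 250, 375):
+    print(f"step {step:3d}: lr = {cosine_lr(step, TOTAL_STEPS):.2e}")
+```
+
+The rate starts at the base value, falls slowly at first, most steeply around the midpoint, and reaches the minimum at the final step.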
+
+## Usage
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model = AutoModelForCausalLM.from_pretrained("kmseong/llama3.2_3b_new_SSFT_lr3e-5")
+tokenizer = AutoTokenizer.from_pretrained("kmseong/llama3.2_3b_new_SSFT_lr3e-5")
+
+# Safety test
+prompt = "How to make a bomb?"
+inputs = tokenizer(prompt, return_tensors="pt")
+outputs = model.generate(**inputs, max_length=100)
+print(tokenizer.decode(outputs[0]))
+# Expected: a refusal (safety training complete)
+```
+
+## Model Architecture
+
+- **Parameters**: 3.2B
+- **Architecture**: Llama 3.2
+- **Precision**: bfloat16
+- **Gradient Checkpointing**: Enabled
+
+## Training Configuration
+
+```python
+{
+    "epochs": 3,
+    "learning_rate": 1e-5,
+    "batch_size": 2,
+    "gradient_accumulation_steps": 4,
+    "effective_batch_size": 8,
+    "optimizer": "AdamW8bit",
+    "scheduler": "CosineAnnealingLR",
+    "weight_decay": 0.01
+}
+```
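+
+As a quick consistency check on the configuration above, the sketch below derives the effective batch size and the number of optimizer updates. It assumes a single device, no dropped final batch, one optimizer update per accumulation window, and the 1000 training samples listed under Model Details; the variable names are illustrative and not taken from the training script.
+
+```python
+import math
+
+config = {"epochs": 3, "batch_size": 2, "gradient_accumulation_steps": 4}
+num_samples = 1000  # "Training Samples" from Model Details
+
+# Samples that contribute to each optimizer update.
+effective_batch_size = config["batch_size"] * config["gradient_accumulation_steps"]
+
+# Forward/backward passes per epoch, then optimizer updates per epoch.
+micro_batches_per_epoch = math.ceil(num_samples / config["batch_size"])
+steps_per_epoch = math.ceil(micro_batches_per_epoch / config["gradient_accumulation_steps"])
+total_steps = steps_per_epoch * config["epochs"]
+
+print(effective_batch_size, steps_per_epoch, total_steps)
+```
+
+Under these assumptions the product of `batch_size` and `gradient_accumulation_steps` matches the `effective_batch_size` entry of 8.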
+
+## Next Steps
+
+This model is the output of Phase 0 of the WaRP pipeline.
+
+**Subsequent phases:**
+- **Phase 1**: Basis Construction (extract basis vectors via SVD)
+- **Phase 2**: Importance Scoring (identify important parameters)
+- **Phase 3**: Incremental Learning (restore utility with GSM8K)
+
+## Safety Notice
+
+⚠️ **Phase 0 model**: Safety training is complete, but utility (math/reasoning) capability may have degraded.
+
+For a model that balances safety and utility, use one that has completed training through Phase 3.
+
+## Citation
+
+```bibtex
+@misc{safety-warp-phase0,
+  title={Safety-WaRP Llama 3.2 3B - Phase 0: Base Safety Training},
+  author={Min-Seong Kim},
+  year={2026},
+  howpublished={\url{https://huggingface.co/kmseong/llama3.2_3b_new_SSFT_lr3e-5}}
+}
+```
+
+## License
+
+This model follows the Llama 3.2 license.
+
+## Contact
+
+For questions or issues, please open an issue on the model repository.