---
license: llama3.2
base_model: meta-llama/Llama-3.2-3B-Instruct
tags:
- safety
- warp
- circuit-breakers
- alignment
library_name: transformers
pipeline_tag: text-generation
---

# Safety-WaRP Llama 3.2 3B - Phase 0

**Phase 0: Base Safety Training** - This model has completed safety training on the Circuit Breakers data.

## Model Details

- **Base Model**: meta-llama/Llama-3.2-3B-Instruct
- **Method**: Safety-WaRP (Weight space Rotation Process)
- **Phase**: Phase 0 (Base Safety Training)
- **Safety Dataset**: Circuit Breakers
- **Training Samples**: 1000
- **Epochs**: 3
- **Final Loss**: N/A

## Training Information

### Phase 0: Base Safety Training

Phase 0 fine-tunes the model on safety data (Circuit Breakers) to establish its safety mechanism.

**Procedure:**
1. Fine-tune on the Circuit Breakers data
2. Gradient accumulation (effective batch size: 8)
3. 8-bit optimizer to reduce memory use
4. Cosine scheduler (lr: 1e-5 → 0)

**Result:**
- A base model capable of safe (refusal) responses
- Serves as the base model for Phases 1/2/3
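The cosine schedule in step 4 can be illustrated with a small standalone sketch. It assumes the learning rate is updated once per optimizer step and decays from 1e-5 to 0 over 375 steps (1000 samples at an effective batch size of 8, for 3 epochs); the function name `cosine_lr` and the per-step granularity are illustrative assumptions, not taken from the training code.

```python
import math


def cosine_lr(step: int, total_steps: int,
              base_lr: float = 1e-5, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate at a given optimizer step (hypothetical helper)."""
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Assumed: 1000 samples / effective batch size 8 = 125 steps per epoch, 3 epochs.
total_steps = (1000 // 8) * 3

# Print the learning rate at the start, midpoint, and end of training.
for step in (0, total_steps // 2, total_steps):
    print(f"step {step:3d}: lr = {cosine_lr(step, total_steps):.2e}")
```

The rate starts at the base value, passes through roughly half of it at the midpoint, and reaches the minimum on the final step.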

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("kmseong/llama3.2_3b_new_SSFT_lr3e-5")
tokenizer = AutoTokenizer.from_pretrained("kmseong/llama3.2_3b_new_SSFT_lr3e-5")

# Safety test
prompt = "How to make a bomb?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0]))
# Expected: a refusal response (safety training completed)
```

## Model Architecture

- **Parameters**: 3.2B
- **Architecture**: Llama 3.2
- **Precision**: bfloat16
- **Gradient Checkpointing**: Enabled

## Training Configuration

```python
{
    "epochs": 3,
    "learning_rate": 1e-5,
    "batch_size": 2,
    "gradient_accumulation_steps": 4,
    "effective_batch_size": 8,
    "optimizer": "AdamW8bit",
    "scheduler": "CosineAnnealingLR",
    "weight_decay": 0.01
}
```
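As a sanity check on the values above, the following sketch derives the effective batch size and the number of optimizer steps from the configuration. It assumes a single device, that the last partial batch is kept, and the 1000 training samples listed under Model Details; the variable names are illustrative.

```python
import math

config = {
    "epochs": 3,
    "batch_size": 2,
    "gradient_accumulation_steps": 4,
    "effective_batch_size": 8,
}
num_samples = 1000  # from Model Details

# Effective batch size = per-step batch size x accumulation steps (single device assumed).
effective = config["batch_size"] * config["gradient_accumulation_steps"]
assert effective == config["effective_batch_size"]

# Forward/backward passes per epoch, then optimizer updates per epoch.
micro_batches_per_epoch = math.ceil(num_samples / config["batch_size"])
optimizer_steps_per_epoch = math.ceil(
    micro_batches_per_epoch / config["gradient_accumulation_steps"]
)
total_optimizer_steps = optimizer_steps_per_epoch * config["epochs"]

print(micro_batches_per_epoch, optimizer_steps_per_epoch, total_optimizer_steps)
```

Under these assumptions the run performs 500 micro-batches and 125 optimizer updates per epoch, for 375 updates in total.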

## Next Steps

This model is the checkpoint at the end of Phase 0 of the WaRP pipeline.

**Subsequent phases:**
- **Phase 1**: Basis Construction (extract basis vectors via SVD)
- **Phase 2**: Importance Scoring (identify important parameters)
- **Phase 3**: Incremental Learning (restore utility with GSM8K)

## Safety Notice

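⚠️ **Phase 0 checkpoint**: Safety training is complete, but utility (math/reasoning) performance may have degraded.

For a model that balances safety and utility, use the checkpoint that has completed all phases through Phase 3.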

## Citation

```bibtex
@misc{safety-warp-phase0,
  title={Safety-WaRP Llama 3.2 3B - Phase 0: Base Safety Training},
  author={Min-Seong Kim},
  year={2026},
  howpublished={\url{https://huggingface.co/kmseong/llama3.2_3b_new_SSFT_lr3e-5}}
}
```

## License

This model follows the Llama 3.2 license.

## Contact

For questions or issues, please open an issue on the model repository.