Initialize project; model provided by the ModelHub XC community
Model: Aurore-Reveil/Koto-Small-7B-IT Source: Original Platform
---
license: mit
language:
- en
base_model:
- allura-org/Koto-Small-7B-PT
library_name: transformers
tags:
- writing
- creative-writing
- roleplay
---

# Koto Small 7B (Instruct-Tuned)



Koto-Small-7B-IT is an instruct-tuned version of [Koto-Small-7B-PT](https://huggingface.co/allura-org/Koto-Small-7B-PT), which was trained from MiMo-7B-Base on almost a billion tokens of creative-writing data. This model is intended for roleplay and instruct use cases.

## Usage

### Chat template

The model was trained with ChatML formatting; a typical input looks like this:

```
<|im_start|>system
system prompt<|im_end|>
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
```
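
Tokenizers shipped with ChatML-formatted models usually expose `apply_chat_template`, but the format is simple enough to assemble by hand. A dependency-free sketch (the helper name `to_chatml` is our own, not part of any library):

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {role, content} dicts into the ChatML format above."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete
        parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "system prompt"},
    {"role": "user", "content": "Hi there!"},
])
print(prompt)
```

The trailing open `<|im_start|>assistant` turn is what cues the model to respond.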

## Samplers

We found that a temperature of 1.25 and a min_p of 0.05 worked best, but YMMV!
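
For reference, min_p keeps only tokens whose probability is at least `min_p` times the top token's probability, applied after temperature scaling. A minimal sketch of both steps in plain Python (a generic illustration, not the sampler of any particular inference engine):

```python
import math

def apply_temperature(logits, temperature=1.25):
    """Softmax over logits scaled by 1/temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def min_p_filter(probs, min_p=0.05):
    """Zero out tokens below min_p * max(probs), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

probs = apply_temperature([2.0, 1.0, -3.0], temperature=1.25)
filtered = min_p_filter(probs, min_p=0.05)  # the low-probability tail token is dropped
```

Higher temperatures flatten the distribution, which is why pairing 1.25 with a min_p floor helps keep degenerate tail tokens out.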

## Datasets

```yaml
datasets:
  - path: Delta-Vector/Hydrus-General-Reasoning
  - path: Delta-Vector/Hydrus-IF-Mix-Ai2
  - path: Delta-Vector/Hydrus-Army-Inst
  - path: Delta-Vector/Hydrus-AM-thinking-Science
  - path: Delta-Vector/Hydrus-AM-Thinking-Code-Filtered
  - path: Delta-Vector/Hydrus-AM-Thinking-IF-No-Think
  - path: Delta-Vector/Hydrus-Tulu-SFT-Mix-V2
  - path: Delta-Vector/Hydrus-System-Chat-2.0
  - path: Delta-Vector/Orion-Praxis-Co-Writer
  - path: Delta-Vector/Orion-Co-Writer-51K
  - path: Delta-Vector/Orion-Creative_Writing-Complexity
  - path: Delta-Vector/Orion-vanilla-backrooms-claude-sharegpt
  - path: Delta-Vector/Hydrus-AM-Thinking-Multi-Turn
  - path: PocketDoc/Dans-Failuremaxx-Adventure
  - path: PocketDoc/Dans-Logicmaxx-SAT-AP
  - path: PocketDoc/Dans-MemoryCore-CoreCurriculum-Small
  - path: PocketDoc/Dans-Taskmaxx-DataPrepper
  - path: PocketDoc/Dans-Prosemaxx-Instructwriter-Long
  - path: PocketDoc/Dans-Prosemaxx-InstructWriter-ZeroShot-2
  - path: PocketDoc/Dans-Prosemaxx-InstructWriter-ZeroShot-3
  - path: PocketDoc/Dans-Prosemaxx-InstructWriter-Continue-2
  - path: PocketDoc/Dans-Systemmaxx
```

## Acknowledgements

- Thank you very much to [Delta-Vector](https://huggingface.co/Delta-Vector)/[Mango](https://x.com/MangoSweet78) for providing the compute used to train this model.
- Fizz for the pretrain.
- PocketDoc/Anthracite for da cool datasets.
- Hensen chat.
- Thank you to the illustrator of WataNare for drawing the art used in the model card!
- Thanks to Curse for testing and ideas.
- Thanks to Toasty for some data and ideas.
- Thanks to everyone else in allura!

ilya <3

## Call for Help

If you would like to help build on this model (RP SFT, further annealing on higher-quality data, etc.), please join [the allura discord](https://discord.gg/PPBMhF2vgC) or [the matrix](https://matrix.to/#/#allura:allura.moe)! <3

## Technical Appendix

<details>

### Training Notes

As before, the model was trained over the course of 12 hours for 2 epochs on an 8xA100 DGX node, using the AdEMAMix optimizer and a REX LR scheduler. Aggressive grad clipping was used for regularization, with NO weight decay (because it sucks).

### [WandB](https://wandb.ai/new-eden/Koto-Small/runs/fgln5fjh?nw=nwuserdeltavector)


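A `max_grad_norm` of 0.0001 (see the config) means essentially every update gets rescaled down to a fixed global norm, which is what makes it act like a regularizer. A minimal sketch of global-norm clipping (a generic illustration, not Axolotl's implementation):

```python
import math

def clip_grad_norm(grads, max_norm=0.0001):
    """Rescale gradients so their global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads

# The global norm of [3, 4] is 5, far above 0.0001, so the whole
# gradient vector is rescaled down to norm 0.0001.
clipped = clip_grad_norm([3.0, 4.0])
```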

### Axolotl Config

```yaml
# =============================================================================
# Model + Saving
# =============================================================================
base_model: allura-forge/Koto-Small-7b-rc1
output_dir: ./koto-sft
saves_per_epoch: 2
deepcompile: true

# =============================================================================
# DATASET CONFIGURATION
# =============================================================================
datasets:
  - path: /home/Ubuntu/Mango/pretok/test-koto-sft-7b-rc-1.parquet
    ds_type: parquet
    type:

shuffle_merged_datasets: true
dataset_prepared_path: ./dataset_prepared
train_on_inputs: false

# =============================================================================
# EVALUATION SETTINGS
# =============================================================================
#evals_per_epoch: 4
#eval_table_size:
#eval_max_new_tokens: 128
#eval_sample_packing: false
val_set_size: 0.0

# =============================================================================
# MEMORY OPTIMIZATION
# =============================================================================
plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: false
cut_cross_entropy: true
sample_packing: true
pad_to_sequence_len: true
gradient_checkpointing: true
flash_attention: true

# =============================================================================
# MULTI-GPU TRAINING
# =============================================================================
deepspeed: ./deepspeed_configs/zero2.json

# =============================================================================
# LOGGING & MONITORING
# =============================================================================
wandb_project: Koto-Small
wandb_entity:
wandb_watch:
wandb_name: sft
wandb_log_model:
logging_steps: 1
debug: false

# =============================================================================
# TRAINING PARAMETERS
# =============================================================================
micro_batch_size: 6
gradient_accumulation_steps: 2
num_epochs: 2
sequence_len: 16000
optimizer: paged_ademamix_8bit
lr_scheduler: rex
learning_rate: 8e-6
warmup_ratio: 0.1
max_grad_norm: 0.0001
weight_decay: 0.0

# =============================================================================
# ADDITIONAL SETTINGS
# =============================================================================
local_rank:
group_by_length: false
early_stopping_patience:
save_safetensors: true
bf16: auto
special_tokens:
```

</details>
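The batch-size settings in the config combine with the 8-GPU node from the training notes; a quick sketch of the resulting effective batch size and warmup length (`total_steps` here is hypothetical, as the real count depends on the dataset size):

```python
micro_batch_size = 6
gradient_accumulation_steps = 2
num_gpus = 8  # 8xA100 DGX node, per the training notes

# Sequences (each packed up to sequence_len = 16000 tokens) per optimizer step
effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 96

# warmup_ratio: 0.1 means the LR ramps up over the first 10% of steps
warmup_ratio = 0.1
total_steps = 1000  # hypothetical, for illustration only
warmup_steps = int(total_steps * warmup_ratio)
print(warmup_steps)  # 100
```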