初始化项目，由ModelHub XC社区提供模型

Model: allura-org/remnant-qwen3-8b Source: Original Platform
2026-06-01 14:09:56 +08:00
commit 32469acf9a
16 changed files with 152430 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,123 @@
+---
+library_name: transformers
+license: apache-2.0
+base_model: Qwen/Qwen3-8B-Base
+tags:
+- roleplay
+- conversational
+- axolotl
+- qwen
+new_version: Aurore-Reveil/Koto-Small-7B-IT
+---
+
+# Remnant Qwen3 8b (series 1)
+
+[English](./README.md) | [简体中文](./README-cn.md)
+
+*There's a wisp of dust in the air. It feels like its from a bygone era, but you don't know where from. It lands on your tongue. It tastes nice.*
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/634262af8d8089ebaefd410e/_ovgodU331FO4YAqFGCnk.png)
+
+Remnant is a series of finetuned LLMs focused on SFW and NSFW roleplaying and conversation.
+
+## Quants
+GGUF:
+- Todo!
+
+EXL3:
+- Todo!
+
+EXL2:
+- Todo!
+
+MISC:
+- Todo!
+
+## Recommended Settings
+Chat template: ChatML. Apparently Llama 3 format works too, though? Ymmv :3  
+Samplers:
+- `0.8` temp
+- `0.1` min_p
+- `0.5` presence penalty
+
+## Credits
+Humongous thanks to Allura, ilya <3  
+Big thanks to the developers of Axolotl (whose training framework I used), Tongyi Qianwen/Qwen/Alibaba (whose model I used), Prime Intellect (whose GPUs I used), and my bank (whose debit card I used)
+
+## Misc
+
+[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+<details><summary>See axolotl config</summary>
+
+axolotl version: `0.10.0.dev0`
+```yaml
+# === Model Configuration ===
+base_model: Qwen/Qwen3-8B-Base
+load_in_8bit: false
+load_in_4bit: false
+
+# === Training Setup ===
+num_epochs: 2
+micro_batch_size: 32
+gradient_accumulation_steps: 1
+sequence_len: 8192
+sample_packing: true
+pad_to_sequence_len: true
+
+# === Hyperparameter Configuration ===
+optimizer: apollo_adamw_layerwise
+# Apollo-mini configuration:
+optim_args: "proj=random,rank=1,scale=128.0,scale_type=tensor,update_proj_gap=200"
+# Regular Apollo configuration:
+# optim_args: 
+optim_target_modules: all_linear
+learning_rate: 2e-5
+lr_scheduler: rex
+weight_decay: 0.01
+warmup_ratio: 0
+
+# === Data Configuration ===
+datasets:
+  - path: allura-org/inkmix-v3.0
+    type: chat_template
+    split: train
+    field_messages: conversations
+    message_field_role: from
+    message_field_content: value
+
+dataset_prepared_path: last_run_prepared
+chat_template: chatml
+
+# === Plugins ===
+plugins:
+  - axolotl.integrations.liger.LigerPlugin
+  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
+
+# === Hardware Optimization ===
+gradient_checkpointing: unsloth
+gradient_checkpointing_kwargs:
+  use_reentrant: false
+liger_rope: true
+liger_rms_norm: true
+liger_glu_activation: true
+cut_cross_entropy: true
+
+# === Wandb Tracking ===
+wandb_project: qwen3-8b-inkmix-v3
+
+# === Checkpointing ===
+saves_per_epoch: 2
+save_total_limit: 3
+
+# === Advanced Settings ===
+output_dir: /ephemeral/ckpts
+bf16: auto
+flash_attention: true
+train_on_inputs: false
+group_by_length: false
+logging_steps: 1
+trust_remote_code: true
+
+```
+
+</details>