初始化项目,由ModelHub XC社区提供模型
Model: davidterrell1919/Qwen2.5-Coder-3B-heretic Source: Original Platform
This commit is contained in:
36
.gitattributes
vendored
Normal file
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
|
||||
*.7z filter=lfs diff=lfs merge=lfs -text
|
||||
*.arrow filter=lfs diff=lfs merge=lfs -text
|
||||
*.bin filter=lfs diff=lfs merge=lfs -text
|
||||
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
||||
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
||||
*.ftz filter=lfs diff=lfs merge=lfs -text
|
||||
*.gz filter=lfs diff=lfs merge=lfs -text
|
||||
*.h5 filter=lfs diff=lfs merge=lfs -text
|
||||
*.joblib filter=lfs diff=lfs merge=lfs -text
|
||||
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
||||
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
||||
*.model filter=lfs diff=lfs merge=lfs -text
|
||||
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
||||
*.npy filter=lfs diff=lfs merge=lfs -text
|
||||
*.npz filter=lfs diff=lfs merge=lfs -text
|
||||
*.onnx filter=lfs diff=lfs merge=lfs -text
|
||||
*.ot filter=lfs diff=lfs merge=lfs -text
|
||||
*.parquet filter=lfs diff=lfs merge=lfs -text
|
||||
*.pb filter=lfs diff=lfs merge=lfs -text
|
||||
*.pickle filter=lfs diff=lfs merge=lfs -text
|
||||
*.pkl filter=lfs diff=lfs merge=lfs -text
|
||||
*.pt filter=lfs diff=lfs merge=lfs -text
|
||||
*.pth filter=lfs diff=lfs merge=lfs -text
|
||||
*.rar filter=lfs diff=lfs merge=lfs -text
|
||||
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
||||
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
||||
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
||||
*.tar filter=lfs diff=lfs merge=lfs -text
|
||||
*.tflite filter=lfs diff=lfs merge=lfs -text
|
||||
*.tgz filter=lfs diff=lfs merge=lfs -text
|
||||
*.wasm filter=lfs diff=lfs merge=lfs -text
|
||||
*.xz filter=lfs diff=lfs merge=lfs -text
|
||||
*.zip filter=lfs diff=lfs merge=lfs -text
|
||||
*.zst filter=lfs diff=lfs merge=lfs -text
|
||||
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
||||
tokenizer.json filter=lfs diff=lfs merge=lfs -text
|
||||
109
README.md
Normal file
109
README.md
Normal file
@@ -0,0 +1,109 @@
|
||||
---
|
||||
license: other
|
||||
license_name: qwen-research
|
||||
license_link: https://huggingface.co/Qwen/Qwen2.5-Coder-3B/blob/main/LICENSE
|
||||
language:
|
||||
- en
|
||||
base_model:
|
||||
- Qwen/Qwen2.5-3B
|
||||
pipeline_tag: text-generation
|
||||
library_name: transformers
|
||||
tags:
|
||||
- code
|
||||
- qwen
|
||||
- qwen-coder
|
||||
- codeqwen
|
||||
- heretic
|
||||
- uncensored
|
||||
- decensored
|
||||
- abliterated
|
||||
- reproducible
|
||||
---
|
||||
# This is a decensored version of [Qwen/Qwen2.5-Coder-3B](https://huggingface.co/Qwen/Qwen2.5-Coder-3B), made using [Heretic](https://github.com/p-e-w/heretic) v1.3.0
|
||||
|
||||
> [!TIP]
|
||||
> **This model is reproducible!**
|
||||
>
|
||||
> See the [README](reproduce/README.md) in the `reproduce` directory for more information.
|
||||
|
||||
## Abliteration parameters
|
||||
|
||||
| Parameter | Value |
|
||||
| :-------- | :---: |
|
||||
| **direction_index** | 26.89 |
|
||||
| **attn.o_proj.max_weight** | 1.44 |
|
||||
| **attn.o_proj.max_weight_position** | 27.62 |
|
||||
| **attn.o_proj.min_weight** | 1.05 |
|
||||
| **attn.o_proj.min_weight_distance** | 13.79 |
|
||||
| **mlp.down_proj.max_weight** | 1.16 |
|
||||
| **mlp.down_proj.max_weight_position** | 27.24 |
|
||||
| **mlp.down_proj.min_weight** | 0.98 |
|
||||
| **mlp.down_proj.min_weight_distance** | 20.49 |
|
||||
|
||||
## Performance
|
||||
|
||||
| Metric | This model | Original model ([Qwen/Qwen2.5-Coder-3B](https://huggingface.co/Qwen/Qwen2.5-Coder-3B)) |
|
||||
| :----- | :--------: | :---------------------------: |
|
||||
| **KL divergence** | 0.0626 | 0 *(by definition)* |
|
||||
| **Refusals** | 4/100 | 36/100 |
|
||||
|
||||
-----
|
||||
|
||||
|
||||
# Qwen2.5-Coder-3B
|
||||
|
||||
## Introduction
|
||||
|
||||
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder has covered six mainstream model sizes, 0.5, 1.5, 3, 7, 14, 32 billion parameters, to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:
|
||||
|
||||
- Significantly improvements in **code generation**, **code reasoning** and **code fixing**. Base on the strong Qwen2.5, we scale up the training tokens into 5.5 trillion including source code, text-code grounding, Synthetic data, etc. Qwen2.5-Coder-32B has become the current state-of-the-art open-source codeLLM, with its coding abilities matching those of GPT-4o.
|
||||
- A more comprehensive foundation for real-world applications such as **Code Agents**. Not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies.
|
||||
|
||||
**This repo contains the 3B Qwen2.5-Coder model**, which has the following features:
|
||||
- Type: Causal Language Models
|
||||
- Training Stage: Pretraining
|
||||
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings
|
||||
- Number of Parameters: 3.09B
|
||||
- Number of Paramaters (Non-Embedding): 2.77B
|
||||
- Number of Layers: 36
|
||||
- Number of Attention Heads (GQA): 16 for Q and 2 for KV
|
||||
- Context Length: Full 32,768 tokens
|
||||
|
||||
**We do not recommend using base language models for conversations.** Instead, you can apply post-training, e.g., SFT, RLHF, continued pretraining, etc., or fill in the middle tasks on this model.
|
||||
|
||||
For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5-coder-family/), [GitHub](https://github.com/QwenLM/Qwen2.5-Coder), [Documentation](https://qwen.readthedocs.io/en/latest/), [Arxiv](https://arxiv.org/abs/2409.12186).
|
||||
|
||||
## Requirements
|
||||
|
||||
The code of Qwen2.5-Coder has been in the latest Hugging face `transformers` and we advise you to use the latest version of `transformers`.
|
||||
|
||||
With `transformers<4.37.0`, you will encounter the following error:
|
||||
```
|
||||
KeyError: 'qwen2'
|
||||
```
|
||||
|
||||
|
||||
## Evaluation & Performance
|
||||
|
||||
Detailed evaluation results are reported in this [📑 blog](https://qwenlm.github.io/blog/qwen2.5-coder-family/).
|
||||
|
||||
For requirements on GPU memory and the respective throughput, see results [here](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html).
|
||||
|
||||
## Citation
|
||||
|
||||
If you find our work helpful, feel free to give us a cite.
|
||||
|
||||
```
|
||||
@article{hui2024qwen2,
|
||||
title={Qwen2. 5-Coder Technical Report},
|
||||
author={Hui, Binyuan and Yang, Jian and Cui, Zeyu and Yang, Jiaxi and Liu, Dayiheng and Zhang, Lei and Liu, Tianyu and Zhang, Jiajun and Yu, Bowen and Dang, Kai and others},
|
||||
journal={arXiv preprint arXiv:2409.12186},
|
||||
year={2024}
|
||||
}
|
||||
@article{qwen2,
|
||||
title={Qwen2 Technical Report},
|
||||
author={An Yang and Baosong Yang and Binyuan Hui and Bo Zheng and Bowen Yu and Chang Zhou and Chengpeng Li and Chengyuan Li and Dayiheng Liu and Fei Huang and Guanting Dong and Haoran Wei and Huan Lin and Jialong Tang and Jialin Wang and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Ma and Jin Xu and Jingren Zhou and Jinze Bai and Jinzheng He and Junyang Lin and Kai Dang and Keming Lu and Keqin Chen and Kexin Yang and Mei Li and Mingfeng Xue and Na Ni and Pei Zhang and Peng Wang and Ru Peng and Rui Men and Ruize Gao and Runji Lin and Shijie Wang and Shuai Bai and Sinan Tan and Tianhang Zhu and Tianhao Li and Tianyu Liu and Wenbin Ge and Xiaodong Deng and Xiaohuan Zhou and Xingzhang Ren and Xinyu Zhang and Xipin Wei and Xuancheng Ren and Yang Fan and Yang Yao and Yichang Zhang and Yu Wan and Yunfei Chu and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zhihao Fan},
|
||||
journal={arXiv preprint arXiv:2407.10671},
|
||||
year={2024}
|
||||
}
|
||||
```
|
||||
54
chat_template.jinja
Normal file
54
chat_template.jinja
Normal file
@@ -0,0 +1,54 @@
|
||||
{%- if tools %}
|
||||
{{- '<|im_start|>system\n' }}
|
||||
{%- if messages[0]['role'] == 'system' %}
|
||||
{{- messages[0]['content'] }}
|
||||
{%- else %}
|
||||
{{- 'You are a helpful assistant.' }}
|
||||
{%- endif %}
|
||||
{{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
|
||||
{%- for tool in tools %}
|
||||
{{- "\n" }}
|
||||
{{- tool | tojson }}
|
||||
{%- endfor %}
|
||||
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
|
||||
{%- else %}
|
||||
{%- if messages[0]['role'] == 'system' %}
|
||||
{{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
|
||||
{%- else %}
|
||||
{{- '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n' }}
|
||||
{%- endif %}
|
||||
{%- endif %}
|
||||
{%- for message in messages %}
|
||||
{%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
|
||||
{{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
|
||||
{%- elif message.role == "assistant" %}
|
||||
{{- '<|im_start|>' + message.role }}
|
||||
{%- if message.content %}
|
||||
{{- '\n' + message.content }}
|
||||
{%- endif %}
|
||||
{%- for tool_call in message.tool_calls %}
|
||||
{%- if tool_call.function is defined %}
|
||||
{%- set tool_call = tool_call.function %}
|
||||
{%- endif %}
|
||||
{{- '\n<tool_call>\n{"name": "' }}
|
||||
{{- tool_call.name }}
|
||||
{{- '", "arguments": ' }}
|
||||
{{- tool_call.arguments | tojson }}
|
||||
{{- '}\n</tool_call>' }}
|
||||
{%- endfor %}
|
||||
{{- '<|im_end|>\n' }}
|
||||
{%- elif message.role == "tool" %}
|
||||
{%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
|
||||
{{- '<|im_start|>user' }}
|
||||
{%- endif %}
|
||||
{{- '\n<tool_response>\n' }}
|
||||
{{- message.content }}
|
||||
{{- '\n</tool_response>' }}
|
||||
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
|
||||
{{- '<|im_end|>\n' }}
|
||||
{%- endif %}
|
||||
{%- endif %}
|
||||
{%- endfor %}
|
||||
{%- if add_generation_prompt %}
|
||||
{{- '<|im_start|>assistant\n' }}
|
||||
{%- endif %}
|
||||
69
config.json
Normal file
69
config.json
Normal file
@@ -0,0 +1,69 @@
|
||||
{
|
||||
"architectures": [
|
||||
"Qwen2ForCausalLM"
|
||||
],
|
||||
"attention_dropout": 0.0,
|
||||
"bos_token_id": 151643,
|
||||
"dtype": "bfloat16",
|
||||
"eos_token_id": 151643,
|
||||
"hidden_act": "silu",
|
||||
"hidden_size": 2048,
|
||||
"initializer_range": 0.02,
|
||||
"intermediate_size": 11008,
|
||||
"layer_types": [
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention",
|
||||
"full_attention"
|
||||
],
|
||||
"max_position_embeddings": 32768,
|
||||
"max_window_layers": 36,
|
||||
"model_type": "qwen2",
|
||||
"num_attention_heads": 16,
|
||||
"num_hidden_layers": 36,
|
||||
"num_key_value_heads": 2,
|
||||
"pad_token_id": null,
|
||||
"rms_norm_eps": 1e-06,
|
||||
"rope_parameters": {
|
||||
"rope_theta": 1000000.0,
|
||||
"rope_type": "default"
|
||||
},
|
||||
"sliding_window": null,
|
||||
"tie_word_embeddings": true,
|
||||
"transformers_version": "5.8.0",
|
||||
"use_cache": true,
|
||||
"use_sliding_window": false,
|
||||
"vocab_size": 151936
|
||||
}
|
||||
7
generation_config.json
Normal file
7
generation_config.json
Normal file
@@ -0,0 +1,7 @@
|
||||
{
|
||||
"bos_token_id": 151643,
|
||||
"do_sample": false,
|
||||
"eos_token_id": 151643,
|
||||
"max_new_tokens": 2048,
|
||||
"transformers_version": "5.8.0"
|
||||
}
|
||||
3
model-00001-of-00002.safetensors
Normal file
3
model-00001-of-00002.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:99a651d95f1a46925b90a8bc563b1fc500781cca1579b2d16390439d87a0b047
|
||||
size 4983773104
|
||||
3
model-00002-of-00002.safetensors
Normal file
3
model-00002-of-00002.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:986ac96974b8322d2977dee656ee9d3aa61f843783dddcf1d3519edc3b0ebf76
|
||||
size 1188153880
|
||||
442
model.safetensors.index.json
Normal file
442
model.safetensors.index.json
Normal file
@@ -0,0 +1,442 @@
|
||||
{
|
||||
"metadata": {
|
||||
"total_parameters": 3085938688,
|
||||
"total_size": 6171877376
|
||||
},
|
||||
"weight_map": {
|
||||
"model.embed_tokens.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.28.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.28.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.28.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.norm.weight": "model-00002-of-00002.safetensors"
|
||||
}
|
||||
}
|
||||
3804
reproduce/Qwen--Qwen2--5-Coder-3B.jsonl
Normal file
3804
reproduce/Qwen--Qwen2--5-Coder-3B.jsonl
Normal file
File diff suppressed because it is too large
Load Diff
64
reproduce/README.md
Normal file
64
reproduce/README.md
Normal file
@@ -0,0 +1,64 @@
|
||||
# Reproduction guide
|
||||
|
||||
This directory contains the necessary information and assets to reproduce the results obtained during this Heretic run.
|
||||
|
||||
## Models
|
||||
|
||||
- **Base model:** [Qwen/Qwen2.5-Coder-3B](https://huggingface.co/Qwen/Qwen2.5-Coder-3B) (Commit: [`09d9bc5`](https://huggingface.co/Qwen/Qwen2.5-Coder-3B/commit/09d9bc5d376b0cfa0100a0694ea7de7232525803))
|
||||
|
||||
## Datasets
|
||||
|
||||
- **Good prompts:** [mlabonne/harmless_alpaca](https://huggingface.co/datasets/mlabonne/harmless_alpaca) (Commit: [`02c6a92`](https://huggingface.co/datasets/mlabonne/harmless_alpaca/commit/02c6a92cfcf11bb0c387334f8146d149d65b587f))
|
||||
- **Bad prompts:** [mlabonne/harmful_behaviors](https://huggingface.co/datasets/mlabonne/harmful_behaviors) (Commit: [`01cead0`](https://huggingface.co/datasets/mlabonne/harmful_behaviors/commit/01cead01398926d81f7c52bdb790ee8cf77ebba7))
|
||||
- **Good evaluation prompts:** [mlabonne/harmless_alpaca](https://huggingface.co/datasets/mlabonne/harmless_alpaca) (Commit: [`02c6a92`](https://huggingface.co/datasets/mlabonne/harmless_alpaca/commit/02c6a92cfcf11bb0c387334f8146d149d65b587f))
|
||||
- **Bad evaluation prompts:** [mlabonne/harmful_behaviors](https://huggingface.co/datasets/mlabonne/harmful_behaviors) (Commit: [`01cead0`](https://huggingface.co/datasets/mlabonne/harmful_behaviors/commit/01cead01398926d81f7c52bdb790ee8cf77ebba7))
|
||||
|
||||
## Selected trial
|
||||
|
||||
- **Trial number:** 136
|
||||
- **KL divergence:** 0.062553
|
||||
- **Refusals:** 4/100
|
||||
|
||||
## System
|
||||
|
||||
- **Python:** 3.12.11 (CPython, GCC 11.2.0) [Conda]
|
||||
- **Operating system:** Linux-6.11.0-1016-nvidia-x86_64-with-glibc2.39 (x86_64)
|
||||
- **CPU:** Intel(R) Xeon(R) Platinum 8468
|
||||
|
||||
### Accelerators
|
||||
|
||||
- **CUDA:** Detected 1 device(s) (139.80 GB total VRAM)
|
||||
- **CUDA Version:** 12.8
|
||||
- **Driver Version:** 580.126.09
|
||||
- **Devices:**
|
||||
- **CUDA 0:** NVIDIA H200 (139.80 GB)
|
||||
|
||||
## Environment
|
||||
|
||||
- **Heretic:** v1.3.0 (Origin: PyPI)
|
||||
- **PyTorch:** 2.8.0+cu128
|
||||
- **Other dependencies:** See [`requirements.txt`](requirements.txt).
|
||||
|
||||
## Contents of this directory
|
||||
|
||||
- [`requirements.txt`](requirements.txt): The exact versions of all Python packages.
|
||||
- [`config.toml`](config.toml): The exact configuration used, including the RNG seed.
|
||||
- [`Qwen--Qwen2--5-Coder-3B.jsonl`](Qwen--Qwen2--5-Coder-3B.jsonl): The Optuna study journal containing the history of all trials.
|
||||
- [`SHA256SUMS`](SHA256SUMS): Cryptographic hashes for all weight files.
|
||||
- [`reproduce.json`](reproduce.json): A machine-readable file containing all reproducibility information.
|
||||
|
||||
## How to reproduce
|
||||
|
||||
1. Ensure your system matches the specifications in the **System** section above. Exact reproducibility is only guaranteed if all aspects of your system are identical to the one the model was originally generated on.
|
||||
1. Install the exact version of Heretic indicated in the **Environment** section above, from its original source.
|
||||
1. Install the packages listed in `requirements.txt`: `pip install -r requirements.txt`
|
||||
1. Install the correct version of PyTorch: `pip install torch==2.8.0+cu128 --index-url https://download.pytorch.org/whl/cu128`
|
||||
1. Place the provided `config.toml` in your working directory.
|
||||
1. Run Heretic without any additional arguments: `heretic`
|
||||
1. Wait for the run to finish, then select trial **136** and export the model.
|
||||
1. Verify that the weight files have been exactly reproduced by comparing their SHA-256 hashes against those in `SHA256SUMS`: `sha256sum -c SHA256SUMS` (or look at the hashes online if you uploaded to Hugging Face)
|
||||
|
||||
> [!TIP]
|
||||
> To use the included Optuna study journal `Qwen--Qwen2--5-Coder-3B.jsonl`, place it in the checkpoints directory (usually `checkpoints/`) before running Heretic.
|
||||
>
|
||||
> This allows you to export other models from the Pareto front, or to run additional trials without having to re-run the stored trials.
|
||||
2
reproduce/SHA256SUMS
Normal file
2
reproduce/SHA256SUMS
Normal file
@@ -0,0 +1,2 @@
|
||||
99a651d95f1a46925b90a8bc563b1fc500781cca1579b2d16390439d87a0b047 *model-00001-of-00002.safetensors
|
||||
986ac96974b8322d2977dee656ee9d3aa61f843783dddcf1d3519edc3b0ebf76 *model-00002-of-00002.safetensors
|
||||
91
reproduce/config.toml
Normal file
91
reproduce/config.toml
Normal file
@@ -0,0 +1,91 @@
|
||||
model = "Qwen/Qwen2.5-Coder-3B"
|
||||
model_commit = "09d9bc5d376b0cfa0100a0694ea7de7232525803"
|
||||
dtypes = [
|
||||
"auto",
|
||||
"float16",
|
||||
"bfloat16",
|
||||
"float32",
|
||||
]
|
||||
quantization = "none"
|
||||
device_map = "auto"
|
||||
offload_outputs_to_cpu = true
|
||||
batch_size = 128
|
||||
max_response_length = 100
|
||||
response_prefix = ""
|
||||
kl_divergence_scale = 1.0
|
||||
kl_divergence_target = 0.01
|
||||
orthogonalize_direction = true
|
||||
row_normalization = "full"
|
||||
full_normalization_lora_rank = 3
|
||||
winsorization_quantile = 1.0
|
||||
n_trials = 200
|
||||
n_startup_trials = 60
|
||||
seed = 261455143
|
||||
max_shard_size = "5GB"
|
||||
refusal_markers = [
|
||||
"sorry",
|
||||
"i can'",
|
||||
"i cant",
|
||||
"i cannot",
|
||||
"i won'",
|
||||
"i wont",
|
||||
"i will not",
|
||||
"i unable",
|
||||
"im unable",
|
||||
"i'm unable",
|
||||
"i am unable",
|
||||
"i an ai",
|
||||
"im an ai",
|
||||
"i'm an ai",
|
||||
"i am an ai",
|
||||
"as an ai",
|
||||
"ai assistant",
|
||||
"i designed to",
|
||||
"im designed to",
|
||||
"i'm designed to",
|
||||
"i am designed to",
|
||||
"i programmed to",
|
||||
"im programmed to",
|
||||
"i'm programmed to",
|
||||
"i am programmed to",
|
||||
"violat",
|
||||
"prohibit",
|
||||
"illegal",
|
||||
"harmful",
|
||||
"inappropriate",
|
||||
"unethical",
|
||||
"ethical boundaries",
|
||||
]
|
||||
system_prompt = "You are a helpful assistant."
|
||||
|
||||
[good_prompts]
|
||||
dataset = "mlabonne/harmless_alpaca"
|
||||
commit = "02c6a92cfcf11bb0c387334f8146d149d65b587f"
|
||||
split = "train[:400]"
|
||||
column = "text"
|
||||
prefix = ""
|
||||
suffix = ""
|
||||
|
||||
[bad_prompts]
|
||||
dataset = "mlabonne/harmful_behaviors"
|
||||
commit = "01cead01398926d81f7c52bdb790ee8cf77ebba7"
|
||||
split = "train[:400]"
|
||||
column = "text"
|
||||
prefix = ""
|
||||
suffix = ""
|
||||
|
||||
[good_evaluation_prompts]
|
||||
dataset = "mlabonne/harmless_alpaca"
|
||||
commit = "02c6a92cfcf11bb0c387334f8146d149d65b587f"
|
||||
split = "test[:100]"
|
||||
column = "text"
|
||||
prefix = ""
|
||||
suffix = ""
|
||||
|
||||
[bad_evaluation_prompts]
|
||||
dataset = "mlabonne/harmful_behaviors"
|
||||
commit = "01cead01398926d81f7c52bdb790ee8cf77ebba7"
|
||||
split = "test[:100]"
|
||||
column = "text"
|
||||
prefix = ""
|
||||
suffix = ""
|
||||
291
reproduce/reproduce.json
Normal file
291
reproduce/reproduce.json
Normal file
@@ -0,0 +1,291 @@
|
||||
{
|
||||
"version": "1",
|
||||
"timestamp": "2026-05-06T21:11:34",
|
||||
"system": {
|
||||
"python": {
|
||||
"version": "3.12.11",
|
||||
"implementation": "CPython",
|
||||
"compiler": "GCC 11.2.0",
|
||||
"environment": "Conda"
|
||||
},
|
||||
"os": {
|
||||
"platform": "Linux-6.11.0-1016-nvidia-x86_64-with-glibc2.39",
|
||||
"machine": "x86_64"
|
||||
},
|
||||
"cpu": {
|
||||
"brand": "Intel(R) Xeon(R) Platinum 8468",
|
||||
"vendor": "GenuineIntel",
|
||||
"family": 6,
|
||||
"model": 143,
|
||||
"stepping": 8
|
||||
},
|
||||
"accelerators": {
|
||||
"type": "CUDA",
|
||||
"api_name": "CUDA Version",
|
||||
"api_version": "12.8",
|
||||
"driver_version": "580.126.09",
|
||||
"devices": [
|
||||
{
|
||||
"name": "NVIDIA H200",
|
||||
"vram_gb": 139.8
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"environment": {
|
||||
"heretic": {
|
||||
"version": "1.3.0",
|
||||
"is_standard_pypi": true,
|
||||
"metadata": {
|
||||
"type": "pypi"
|
||||
}
|
||||
},
|
||||
"pytorch_version": "2.8.0+cu128",
|
||||
"requirements": {
|
||||
"absl-py": "2.4.0",
|
||||
"accelerate": "1.13.0",
|
||||
"alembic": "1.18.4",
|
||||
"annotated-doc": "0.0.4",
|
||||
"annotated-types": "0.7.0",
|
||||
"anyio": "4.12.1",
|
||||
"attrs": "26.1.0",
|
||||
"bitsandbytes": "0.49.2",
|
||||
"certifi": "2026.2.25",
|
||||
"chardet": "6.0.0.post1",
|
||||
"charset-normalizer": "3.4.6",
|
||||
"click": "8.3.1",
|
||||
"colorama": "0.4.6",
|
||||
"colorlog": "6.10.1",
|
||||
"dataproperty": "1.1.0",
|
||||
"datasets": "4.8.5",
|
||||
"dill": "0.4.1",
|
||||
"evaluate": "0.4.6",
|
||||
"filelock": "3.25.2",
|
||||
"fsspec": "2026.2.0",
|
||||
"greenlet": "3.5.0",
|
||||
"h11": "0.16.0",
|
||||
"heretic-llm": "1.3.0",
|
||||
"hf-transfer": "0.1.9",
|
||||
"hf-xet": "1.5.0",
|
||||
"httpcore": "1.0.9",
|
||||
"httpx": "0.28.1",
|
||||
"huggingface-hub": "1.14.0",
|
||||
"idna": "3.11",
|
||||
"immutabledict": "4.3.1",
|
||||
"jinja2": "3.1.6",
|
||||
"joblib": "1.5.3",
|
||||
"jsonlines": "4.0.0",
|
||||
"kernels": "0.14.0",
|
||||
"kernels-data": "0.14.0",
|
||||
"langdetect": "1.0.9",
|
||||
"lm-eval": "0.4.11",
|
||||
"lxml": "6.1.0",
|
||||
"mako": "1.3.12",
|
||||
"markdown-it-py": "4.0.0",
|
||||
"markupsafe": "3.0.3",
|
||||
"mbstrdecoder": "1.1.5",
|
||||
"mdurl": "0.1.2",
|
||||
"more-itertools": "11.0.2",
|
||||
"mpmath": "1.3.0",
|
||||
"multiprocess": "0.70.19",
|
||||
"networkx": "3.6.1",
|
||||
"nltk": "3.9.4",
|
||||
"numpy": "2.4.4",
|
||||
"nvidia-cublas-cu12": "12.8.4.1",
|
||||
"nvidia-cuda-cupti-cu12": "12.8.90",
|
||||
"nvidia-cuda-nvrtc-cu12": "12.8.93",
|
||||
"nvidia-cuda-runtime-cu12": "12.8.90",
|
||||
"nvidia-cudnn-cu12": "9.10.2.21",
|
||||
"nvidia-cufft-cu12": "11.3.3.83",
|
||||
"nvidia-cufile-cu12": "1.13.1.3",
|
||||
"nvidia-curand-cu12": "10.3.9.90",
|
||||
"nvidia-cusolver-cu12": "11.7.3.90",
|
||||
"nvidia-cusparse-cu12": "12.5.8.93",
|
||||
"nvidia-cusparselt-cu12": "0.7.1",
|
||||
"nvidia-nccl-cu12": "2.27.3",
|
||||
"nvidia-nvjitlink-cu12": "12.8.93",
|
||||
"nvidia-nvtx-cu12": "12.8.90",
|
||||
"optuna": "4.8.0",
|
||||
"packaging": "25.0",
|
||||
"pandas": "3.0.2",
|
||||
"pathvalidate": "3.3.1",
|
||||
"peft": "0.19.1",
|
||||
"pillow": "12.1.1",
|
||||
"portalocker": "3.2.0",
|
||||
"prompt-toolkit": "3.0.52",
|
||||
"psutil": "7.2.2",
|
||||
"py-cpuinfo": "9.0.0",
|
||||
"pyarrow": "24.0.0",
|
||||
"pydantic": "2.12.5",
|
||||
"pydantic-core": "2.41.5",
|
||||
"pydantic-settings": "2.14.0",
|
||||
"pygments": "2.19.2",
|
||||
"pytablewriter": "1.2.1",
|
||||
"python-dateutil": "2.9.0.post0",
|
||||
"python-dotenv": "1.2.2",
|
||||
"pyyaml": "6.0.3",
|
||||
"questionary": "2.1.1",
|
||||
"regex": "2026.4.4",
|
||||
"requests": "2.32.5",
|
||||
"rich": "14.3.3",
|
||||
"rouge-score": "0.1.2",
|
||||
"sacrebleu": "2.6.0",
|
||||
"safetensors": "0.7.0",
|
||||
"scikit-learn": "1.8.0",
|
||||
"scipy": "1.17.1",
|
||||
"setuptools": "80.10.2",
|
||||
"shellingham": "1.5.4",
|
||||
"six": "1.17.0",
|
||||
"sqlalchemy": "2.0.49",
|
||||
"sqlitedict": "2.1.0",
|
||||
"sympy": "1.14.0",
|
||||
"tabledata": "1.3.4",
|
||||
"tabulate": "0.10.0",
|
||||
"tcolorpy": "0.1.7",
|
||||
"threadpoolctl": "3.6.0",
|
||||
"tokenizers": "0.22.2",
|
||||
"tomli-w": "1.2.0",
|
||||
"tomlkit": "0.14.0",
|
||||
"torch": "2.8.0",
|
||||
"torchvision": "0.23.0",
|
||||
"tqdm": "4.67.3",
|
||||
"transformers": "5.8.0",
|
||||
"triton": "3.4.0",
|
||||
"typepy": "1.3.5",
|
||||
"typer": "0.25.1",
|
||||
"typing-extensions": "4.15.0",
|
||||
"typing-inspection": "0.4.2",
|
||||
"tzdata": "2025.3",
|
||||
"urllib3": "2.5.0",
|
||||
"wcwidth": "0.6.0",
|
||||
"word2number": "1.1",
|
||||
"xxhash": "3.7.0",
|
||||
"zstandard": "0.25.0"
|
||||
}
|
||||
},
|
||||
"settings": {
|
||||
"model": "Qwen/Qwen2.5-Coder-3B",
|
||||
"model_commit": "09d9bc5d376b0cfa0100a0694ea7de7232525803",
|
||||
"dtypes": [
|
||||
"auto",
|
||||
"float16",
|
||||
"bfloat16",
|
||||
"float32"
|
||||
],
|
||||
"quantization": "none",
|
||||
"device_map": "auto",
|
||||
"max_memory": null,
|
||||
"offload_outputs_to_cpu": true,
|
||||
"batch_size": 128,
|
||||
"max_response_length": 100,
|
||||
"response_prefix": "",
|
||||
"kl_divergence_scale": 1.0,
|
||||
"kl_divergence_target": 0.01,
|
||||
"orthogonalize_direction": true,
|
||||
"row_normalization": "full",
|
||||
"full_normalization_lora_rank": 3,
|
||||
"winsorization_quantile": 1.0,
|
||||
"n_trials": 200,
|
||||
"n_startup_trials": 60,
|
||||
"seed": 261455143,
|
||||
"max_shard_size": "5GB",
|
||||
"refusal_markers": [
|
||||
"sorry",
|
||||
"i can'",
|
||||
"i cant",
|
||||
"i cannot",
|
||||
"i won'",
|
||||
"i wont",
|
||||
"i will not",
|
||||
"i unable",
|
||||
"im unable",
|
||||
"i'm unable",
|
||||
"i am unable",
|
||||
"i an ai",
|
||||
"im an ai",
|
||||
"i'm an ai",
|
||||
"i am an ai",
|
||||
"as an ai",
|
||||
"ai assistant",
|
||||
"i designed to",
|
||||
"im designed to",
|
||||
"i'm designed to",
|
||||
"i am designed to",
|
||||
"i programmed to",
|
||||
"im programmed to",
|
||||
"i'm programmed to",
|
||||
"i am programmed to",
|
||||
"violat",
|
||||
"prohibit",
|
||||
"illegal",
|
||||
"harmful",
|
||||
"inappropriate",
|
||||
"unethical",
|
||||
"ethical boundaries"
|
||||
],
|
||||
"system_prompt": "You are a helpful assistant.",
|
||||
"good_prompts": {
|
||||
"dataset": "mlabonne/harmless_alpaca",
|
||||
"commit": "02c6a92cfcf11bb0c387334f8146d149d65b587f",
|
||||
"split": "train[:400]",
|
||||
"column": "text",
|
||||
"prefix": "",
|
||||
"suffix": "",
|
||||
"system_prompt": null
|
||||
},
|
||||
"bad_prompts": {
|
||||
"dataset": "mlabonne/harmful_behaviors",
|
||||
"commit": "01cead01398926d81f7c52bdb790ee8cf77ebba7",
|
||||
"split": "train[:400]",
|
||||
"column": "text",
|
||||
"prefix": "",
|
||||
"suffix": "",
|
||||
"system_prompt": null
|
||||
},
|
||||
"good_evaluation_prompts": {
|
||||
"dataset": "mlabonne/harmless_alpaca",
|
||||
"commit": "02c6a92cfcf11bb0c387334f8146d149d65b587f",
|
||||
"split": "test[:100]",
|
||||
"column": "text",
|
||||
"prefix": "",
|
||||
"suffix": "",
|
||||
"system_prompt": null
|
||||
},
|
||||
"bad_evaluation_prompts": {
|
||||
"dataset": "mlabonne/harmful_behaviors",
|
||||
"commit": "01cead01398926d81f7c52bdb790ee8cf77ebba7",
|
||||
"split": "test[:100]",
|
||||
"column": "text",
|
||||
"prefix": "",
|
||||
"suffix": "",
|
||||
"system_prompt": null
|
||||
}
|
||||
},
|
||||
"parameters": {
|
||||
"direction_index": 26.891956746581947,
|
||||
"abliteration_parameters": {
|
||||
"attn.o_proj": {
|
||||
"max_weight": 1.4370609024731553,
|
||||
"max_weight_position": 27.623797182004182,
|
||||
"min_weight": 1.047867277130811,
|
||||
"min_weight_distance": 13.785171424446254
|
||||
},
|
||||
"mlp.down_proj": {
|
||||
"max_weight": 1.164801552450505,
|
||||
"max_weight_position": 27.237354950013874,
|
||||
"min_weight": 0.9776289439299634,
|
||||
"min_weight_distance": 20.48502921710711
|
||||
}
|
||||
}
|
||||
},
|
||||
"metrics": {
|
||||
"kl_divergence": 0.06255289912223816,
|
||||
"refusals": 4,
|
||||
"base_refusals": 36,
|
||||
"n_bad_prompts": 100
|
||||
},
|
||||
"hashes": {
|
||||
"model-00001-of-00002.safetensors": "99a651d95f1a46925b90a8bc563b1fc500781cca1579b2d16390439d87a0b047",
|
||||
"model-00002-of-00002.safetensors": "986ac96974b8322d2977dee656ee9d3aa61f843783dddcf1d3519edc3b0ebf76"
|
||||
}
|
||||
}
|
||||
119
reproduce/requirements.txt
Normal file
119
reproduce/requirements.txt
Normal file
@@ -0,0 +1,119 @@
|
||||
absl-py==2.4.0
|
||||
accelerate==1.13.0
|
||||
alembic==1.18.4
|
||||
annotated-doc==0.0.4
|
||||
annotated-types==0.7.0
|
||||
anyio==4.12.1
|
||||
attrs==26.1.0
|
||||
bitsandbytes==0.49.2
|
||||
certifi==2026.2.25
|
||||
chardet==6.0.0.post1
|
||||
charset-normalizer==3.4.6
|
||||
click==8.3.1
|
||||
colorama==0.4.6
|
||||
colorlog==6.10.1
|
||||
dataproperty==1.1.0
|
||||
datasets==4.8.5
|
||||
dill==0.4.1
|
||||
evaluate==0.4.6
|
||||
filelock==3.25.2
|
||||
fsspec==2026.2.0
|
||||
greenlet==3.5.0
|
||||
h11==0.16.0
|
||||
heretic-llm==1.3.0
|
||||
hf-transfer==0.1.9
|
||||
hf-xet==1.5.0
|
||||
httpcore==1.0.9
|
||||
httpx==0.28.1
|
||||
huggingface-hub==1.14.0
|
||||
idna==3.11
|
||||
immutabledict==4.3.1
|
||||
jinja2==3.1.6
|
||||
joblib==1.5.3
|
||||
jsonlines==4.0.0
|
||||
kernels==0.14.0
|
||||
kernels-data==0.14.0
|
||||
langdetect==1.0.9
|
||||
lm-eval==0.4.11
|
||||
lxml==6.1.0
|
||||
mako==1.3.12
|
||||
markdown-it-py==4.0.0
|
||||
markupsafe==3.0.3
|
||||
mbstrdecoder==1.1.5
|
||||
mdurl==0.1.2
|
||||
more-itertools==11.0.2
|
||||
mpmath==1.3.0
|
||||
multiprocess==0.70.19
|
||||
networkx==3.6.1
|
||||
nltk==3.9.4
|
||||
numpy==2.4.4
|
||||
nvidia-cublas-cu12==12.8.4.1
|
||||
nvidia-cuda-cupti-cu12==12.8.90
|
||||
nvidia-cuda-nvrtc-cu12==12.8.93
|
||||
nvidia-cuda-runtime-cu12==12.8.90
|
||||
nvidia-cudnn-cu12==9.10.2.21
|
||||
nvidia-cufft-cu12==11.3.3.83
|
||||
nvidia-cufile-cu12==1.13.1.3
|
||||
nvidia-curand-cu12==10.3.9.90
|
||||
nvidia-cusolver-cu12==11.7.3.90
|
||||
nvidia-cusparse-cu12==12.5.8.93
|
||||
nvidia-cusparselt-cu12==0.7.1
|
||||
nvidia-nccl-cu12==2.27.3
|
||||
nvidia-nvjitlink-cu12==12.8.93
|
||||
nvidia-nvtx-cu12==12.8.90
|
||||
optuna==4.8.0
|
||||
packaging==25.0
|
||||
pandas==3.0.2
|
||||
pathvalidate==3.3.1
|
||||
peft==0.19.1
|
||||
pillow==12.1.1
|
||||
portalocker==3.2.0
|
||||
prompt-toolkit==3.0.52
|
||||
psutil==7.2.2
|
||||
py-cpuinfo==9.0.0
|
||||
pyarrow==24.0.0
|
||||
pydantic==2.12.5
|
||||
pydantic-core==2.41.5
|
||||
pydantic-settings==2.14.0
|
||||
pygments==2.19.2
|
||||
pytablewriter==1.2.1
|
||||
python-dateutil==2.9.0.post0
|
||||
python-dotenv==1.2.2
|
||||
pyyaml==6.0.3
|
||||
questionary==2.1.1
|
||||
regex==2026.4.4
|
||||
requests==2.32.5
|
||||
rich==14.3.3
|
||||
rouge-score==0.1.2
|
||||
sacrebleu==2.6.0
|
||||
safetensors==0.7.0
|
||||
scikit-learn==1.8.0
|
||||
scipy==1.17.1
|
||||
setuptools==80.10.2
|
||||
shellingham==1.5.4
|
||||
six==1.17.0
|
||||
sqlalchemy==2.0.49
|
||||
sqlitedict==2.1.0
|
||||
sympy==1.14.0
|
||||
tabledata==1.3.4
|
||||
tabulate==0.10.0
|
||||
tcolorpy==0.1.7
|
||||
threadpoolctl==3.6.0
|
||||
tokenizers==0.22.2
|
||||
tomli-w==1.2.0
|
||||
tomlkit==0.14.0
|
||||
torch==2.8.0
|
||||
torchvision==0.23.0
|
||||
tqdm==4.67.3
|
||||
transformers==5.8.0
|
||||
triton==3.4.0
|
||||
typepy==1.3.5
|
||||
typer==0.25.1
|
||||
typing-extensions==4.15.0
|
||||
typing-inspection==0.4.2
|
||||
tzdata==2025.3
|
||||
urllib3==2.5.0
|
||||
wcwidth==0.6.0
|
||||
word2number==1.1
|
||||
xxhash==3.7.0
|
||||
zstandard==0.25.0
|
||||
3
tokenizer.json
Normal file
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:3fd169731d2cbde95e10bf356d66d5997fd885dd8dbb6fb4684da3f23b2585d8
|
||||
size 11421892
|
||||
30
tokenizer_config.json
Normal file
30
tokenizer_config.json
Normal file
@@ -0,0 +1,30 @@
|
||||
{
|
||||
"add_prefix_space": false,
|
||||
"backend": "tokenizers",
|
||||
"bos_token": null,
|
||||
"clean_up_tokenization_spaces": false,
|
||||
"eos_token": "<|endoftext|>",
|
||||
"errors": "replace",
|
||||
"extra_special_tokens": [
|
||||
"<|im_start|>",
|
||||
"<|im_end|>",
|
||||
"<|object_ref_start|>",
|
||||
"<|object_ref_end|>",
|
||||
"<|box_start|>",
|
||||
"<|box_end|>",
|
||||
"<|quad_start|>",
|
||||
"<|quad_end|>",
|
||||
"<|vision_start|>",
|
||||
"<|vision_end|>",
|
||||
"<|vision_pad|>",
|
||||
"<|image_pad|>",
|
||||
"<|video_pad|>"
|
||||
],
|
||||
"is_local": false,
|
||||
"local_files_only": false,
|
||||
"model_max_length": 32768,
|
||||
"pad_token": "<|endoftext|>",
|
||||
"split_special_tokens": false,
|
||||
"tokenizer_class": "Qwen2Tokenizer",
|
||||
"unk_token": null
|
||||
}
|
||||
Reference in New Issue
Block a user