Initialize project; model provided by the ModelHub XC community

Model: IQuestLab/IQuest-Coder-V1-14B-Thinking
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-29 13:20:53 +08:00
commit da749a0995
21 changed files with 614768 additions and 0 deletions

38
.gitattributes vendored Normal file

@@ -0,0 +1,38 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
papers/iquest-coder-v1-logo.png filter=lfs diff=lfs merge=lfs -text
papers/results.png filter=lfs diff=lfs merge=lfs -text
papers/results-20260302.png filter=lfs diff=lfs merge=lfs -text

26
LICENSE Normal file

@@ -0,0 +1,26 @@
Modified MIT License
Software Copyright© 2025 IQuest Research
Our only modification is that, if the Software (or any derivative works
thereof) is used for any of your commercial products or services, you shall
prominently display "IQuest Coder" on the user interface of such product or
service.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

281
README.md Normal file

@@ -0,0 +1,281 @@
---
license: other
license_name: iquestcoder
license_link: >-
https://huggingface.co/IQuestLab/IQuest-Coder-V1-14B-Thinking/blob/main/LICENSE
language:
- en
library_name: transformers
---
![Evaluation Results](./papers/iquest-coder-v1-logo.png)
<p align="center">
📘 <a href="https://iquestlab.github.io">Blog (2026-01-01)</a >
&nbsp;&nbsp;
📘 <a href="https://iquestlab.github.io/release-1.0-2603/index.html">Blog (2026-03-02)</a >
&nbsp;&nbsp;
📄 <a href="https://github.com/IQuestLab/IQuest-Coder-V1/blob/main/papers/IQuest_Coder_Technical_Report.pdf">Technical Report</a >
</p >
# IQuest-Coder-V1 Model Family Update
🚀🚀🚀 [IQuest-Coder-V1 Model Family Update](https://iquestlab.github.io/release-1.0-2603/index.html): Released the 7B and 14B model families plus 40B-Thinking and 40B-Loop-Thinking, specially optimized for tool use, CLI agents (such as `Claude Code` and `OpenCode`), and HTML/SVG generation. All models support 128K context and are now on Hugging Face!
## 7B Models
| Model | Link |
|-------|------|
| IQuest-Coder-V1-7B-Base-Stage1 | [🤗 Hugging Face](https://huggingface.co/IQuestLab/IQuest-Coder-V1-7B-Base-Stage1) |
| IQuest-Coder-V1-7B-Base | [🤗 Hugging Face](https://huggingface.co/IQuestLab/IQuest-Coder-V1-7B-Base) |
| IQuest-Coder-V1-7B-Instruct | [🤗 Hugging Face](https://huggingface.co/IQuestLab/IQuest-Coder-V1-7B-Instruct) |
| IQuest-Coder-V1-7B-Thinking | [🤗 Hugging Face](https://huggingface.co/IQuestLab/IQuest-Coder-V1-7B-Thinking) |
## 14B Models
| Model | Link |
|-------|------|
| IQuest-Coder-V1-14B-Base-Stage1 | [🤗 Hugging Face](https://huggingface.co/IQuestLab/IQuest-Coder-V1-14B-Base-Stage1) |
| IQuest-Coder-V1-14B-Base | [🤗 Hugging Face](https://huggingface.co/IQuestLab/IQuest-Coder-V1-14B-Base) |
| IQuest-Coder-V1-14B-Instruct | [🤗 Hugging Face](https://huggingface.co/IQuestLab/IQuest-Coder-V1-14B-Instruct) |
| IQuest-Coder-V1-14B-Thinking | [🤗 Hugging Face](https://huggingface.co/IQuestLab/IQuest-Coder-V1-14B-Thinking) |
## 40B Models
| Model | Link |
|-------|------|
| IQuest-Coder-V1-40B-Base-Stage1 | [🤗 Hugging Face](https://huggingface.co/IQuestLab/IQuest-Coder-V1-40B-Base-Stage1) |
| IQuest-Coder-V1-40B-Base | [🤗 Hugging Face](https://huggingface.co/IQuestLab/IQuest-Coder-V1-40B-Base) |
| IQuest-Coder-V1-40B-Instruct | [🤗 Hugging Face](https://huggingface.co/IQuestLab/IQuest-Coder-V1-40B-Instruct) |
| IQuest-Coder-V1-40B-Loop-Instruct | [🤗 Hugging Face](https://huggingface.co/IQuestLab/IQuest-Coder-V1-40B-Loop-Instruct) |
| IQuest-Coder-V1-40B-Thinking | [🤗 Hugging Face](https://huggingface.co/IQuestLab/IQuest-Coder-V1-40B-Thinking) |
| IQuest-Coder-V1-40B-Loop-Thinking | [🤗 Hugging Face](https://huggingface.co/IQuestLab/IQuest-Coder-V1-40B-Loop-Thinking) |
## Sampling Parameters
- IQuest-Coder-V1-Instruct: we suggest Temperature=0.6, TopP=0.85, TopK=20.
- IQuest-Coder-V1-Thinking: we suggest Temperature=1.0, TopP=0.95, TopK=20.
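
As a rough illustration, the suggested values map onto the sampling keyword arguments of `generate` in Transformers. The helper name `sampling_kwargs` and the variant keys are hypothetical; only the numeric values come from the recommendations above.

```python
# Hypothetical helper: maps the suggested sampling values onto keyword
# arguments accepted by `model.generate(...)` in Hugging Face Transformers.
SUGGESTED_SAMPLING = {
    "instruct": {"temperature": 0.6, "top_p": 0.85, "top_k": 20},
    "thinking": {"temperature": 1.0, "top_p": 0.95, "top_k": 20},
}


def sampling_kwargs(variant: str) -> dict:
    """Return generate() keyword arguments for 'instruct' or 'thinking'."""
    if variant not in SUGGESTED_SAMPLING:
        raise ValueError(f"unknown variant: {variant!r}")
    # do_sample=True is needed for temperature/top_p/top_k to take effect.
    return {"do_sample": True, **SUGGESTED_SAMPLING[variant]}


print(sampling_kwargs("thinking"))
```

The result can be unpacked into a generation call, for example `model.generate(**model_inputs, max_new_tokens=8192, **sampling_kwargs("thinking"))`.
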
## IQuest-Coder-V1 Highlights
IQuest-Coder-V1 is a new family of code large language models (LLMs) designed to advance autonomous software engineering and code intelligence. Built on the innovative code-flow multi-stage training paradigm, IQuest-Coder-V1 captures the dynamic evolution of software logic, delivering state-of-the-art performance across critical dimensions:
- **Performance**: Achieves leading results on SWE-Bench Verified (76.2%), BigCodeBench (49.9%), LiveCodeBench v6 (81.1%), and other major coding benchmarks, surpassing competitive models across agentic software engineering, competitive programming, and complex tool use.
- **Code-Flow Training Paradigm**: Moving beyond static code representations, our models learn from repository evolution patterns, commit transitions, and dynamic code transformations to understand real-world software development processes.
- **Dual Specialization Paths**: Bifurcated post-training delivers two specialized variants—Thinking models (utilizing reasoning-driven RL for complex problem-solving) and Instruct models (optimized for general coding assistance and instruction-following).
- **Efficient Architecture**: The IQuest-Coder-V1-Loop variant introduces a recurrent mechanism that optimizes the trade-off between model capacity and deployment footprint. The 7B and 14B models adopt shallow architectures for faster inference speed.
- **Native Long Context**: All models natively support up to 128K tokens without requiring additional scaling techniques.
- **CLI Agent Integration**: Demonstrates initial deployment capabilities on the Claude Code and OpenCode platforms, with the ability to integrate into CLI-based agent workflows.
- **HTML and SVG Generation**: Features preliminary support for HTML and SVG code generation.
- **Architectural Chain-of-Thought via Recurrent Depth**: 40B-Loop-Thinking is an experimental research prototype that explores how structural and procedural chains of thought can be combined within a single system. Structural chains of thought come from loop-based computation in the dual-iteration LoopCoder architecture; procedural chains of thought come from explicit reasoning trajectories trained via reinforcement learning. Unlike standard reasoning models that rely solely on token-level chain-of-thought expansion, Loop-Thinking introduces implicit multi-step computation at the architectural level through a looped Transformer design, in which the second iteration refines the hidden states produced by the first iteration using a global-local attention gating mechanism. The result is a nested reasoning mechanism: the loop structure supports iterative representation refinement, while the reasoning-oriented training injects explicit problem decomposition behavior. This model is not intended to achieve state-of-the-art benchmark performance; its purpose is to validate the complementary roles of loop-based computation and reasoning-oriented training in shaping reasoning structures, and to provide experimental evidence for future model design.
## Model Overview
The IQuest-Coder-V1 series includes models ranging from 7B to 40B parameters, with both standard and Loop variants:
| Model | Parameters | Layers | Hidden Size | Attention Heads (Q/KV) | Context Length |
|-------|------------|--------|-------------|------------------------|----------------|
| IQuest-Coder-V1-7B-Instruct | 7B | 14 | 5120 | 40/8 | 128K |
| IQuest-Coder-V1-7B-Thinking | 7B | 14 | 5120 | 40/8 | 128K |
| IQuest-Coder-V1-14B-Instruct | 14B | 28 | 5120 | 40/8 | 128K |
| IQuest-Coder-V1-14B-Thinking | 14B | 28 | 5120 | 40/8 | 128K |
| IQuest-Coder-V1-40B-Instruct | 40B | 80 | 5120 | 40/8 | 128K |
| IQuest-Coder-V1-40B-Thinking | 40B | 80 | 5120 | 40/8 | 128K |
| IQuest-Coder-V1-40B-Loop-Instruct | 40B | 80 (2 iterations) | 5120 | 40/8 | 128K |
| IQuest-Coder-V1-40B-Loop-Thinking | 40B | 80 (2 iterations) | 5120 | 40/8 | 128K |
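
The parameter count of the 14B row can be cross-checked from the layer dimensions. The sketch below takes `intermediate_size`, `head_dim`, and the untied embeddings from the `config.json` shipped in this repository; the bias-free, Llama-style layer layout (q/k/v/o projections, a three-matrix MLP, two RMSNorm vectors per layer, one final norm) is an assumption of the sketch.

```python
# Dimensions for IQuest-Coder-V1-14B (values from the table and config.json).
vocab_size = 76_800
hidden_size = 5_120
intermediate_size = 27_648
num_layers = 28
num_q_heads, num_kv_heads, head_dim = 40, 8, 128

embed = vocab_size * hidden_size      # input embedding matrix
lm_head = vocab_size * hidden_size    # separate output projection (untied)

q_proj = hidden_size * num_q_heads * head_dim
kv_proj = 2 * hidden_size * num_kv_heads * head_dim   # k_proj + v_proj
o_proj = num_q_heads * head_dim * hidden_size
mlp = 3 * hidden_size * intermediate_size             # gate_proj, up_proj, down_proj
norms = 2 * hidden_size                               # two RMSNorm weights per layer

per_layer = q_proj + kv_proj + o_proj + mlp + norms
total = embed + lm_head + num_layers * per_layer + hidden_size  # + final norm

print(f"{total:,} parameters")  # 14,439,183,360 parameters
```

At 2 bytes per parameter (bfloat16) this corresponds to 28,878,366,720 bytes of weights.
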
**Architecture Features:**
- Grouped Query Attention (GQA) for efficient inference
- Native 128K context length support
- Vocabulary size: 76,800 tokens
- Loop variants use recurrent transformer design with shared parameters across two iterations
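For more details, please refer to our [Technical Report](https://github.com/IQuestLab/IQuest-Coder-V1/blob/main/papers/IQuest_Coder_Technical_Report.pdf) and [GitHub repository](https://github.com/IQuestLab/IQuest-Coder-V1).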
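
To see why Grouped Query Attention matters at 128K context, the following back-of-the-envelope sketch estimates the key/value cache for a single sequence. It assumes the 14B layout (28 layers, 8 KV heads), a head dimension of 128 as in `config.json`, and 2 bytes per cached value (bfloat16); `kv_cache_bytes` is a hypothetical helper, and real serving stacks add their own overhead.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # The factor 2 covers the separate key and value tensors in every layer.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


GIB = 1024 ** 3
seq_len = 128 * 1024  # 131,072 tokens

gqa = kv_cache_bytes(28, 8, 128, seq_len)   # 8 KV heads (GQA)
mha = kv_cache_bytes(28, 40, 128, seq_len)  # hypothetical: one KV head per query head

print(gqa / GIB, mha / GIB)  # 14.0 70.0
```

Under these assumptions, sharing each KV head across five query heads cuts the cache for a full-length sequence by a factor of five.
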
## Quickstart
IQuest-Coder-V1 uses custom modeling code via Hugging Face's auto_map feature. We recommend using transformers>=4.52.4.
### Basic Usage with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "IQuestLab/IQuest-Coder-V1-40B-Instruct"

# Load the tokenizer and model.
# trust_remote_code=True allows Transformers to load the custom modeling
# code that the repository references through auto_map.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

# Prepare the input
prompt = "Write a Python function to calculate the Fibonacci sequence using dynamic programming."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the response
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=8192,
)

# Drop the prompt tokens so that only the newly generated text is decoded
generated_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
response = tokenizer.decode(generated_ids, skip_special_tokens=True)
print(response)
```
### Using Thinking Models
For complex reasoning tasks, use the Thinking variant:
```python
model_name = "IQuestLab/IQuest-Coder-V1-40B-Thinking"
# The Thinking model includes explicit reasoning traces
# Use similar code as above, but expect longer, more detailed responses
# with step-by-step problem decomposition
```
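
When post-processing raw output from a Thinking model, it can help to separate the reasoning trace from the final answer. The sketch below assumes the trace is wrapped in `<think>...</think>` tags, the convention handled by the `qwen3` reasoning parser in vLLM; this README does not state the exact delimiters, so check the chat template before relying on it. `split_reasoning` is a hypothetical helper, not part of the released code.

```python
def split_reasoning(text: str, close_tag: str = "</think>", open_tag: str = "<think>"):
    """Split raw model output into (reasoning, answer).

    Assumes the reasoning trace ends with `close_tag`; if the tag is absent,
    the whole text is treated as the answer.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        return "", text.strip()
    # The opening tag may already be part of the prompt template, so it is optional.
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()


raw = "<think>Use a bottom-up table.</think>\ndef fib(n): ..."
reasoning, answer = split_reasoning(raw)
print(reasoning)  # Use a bottom-up table.
print(answer)     # def fib(n): ...
```

If the tags are registered as special tokens, decoding with `skip_special_tokens=True` may remove them before this helper sees the text.
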
### Deployment with vLLM
For production deployment, you can use vLLM to create an OpenAI-compatible API endpoint. Please refer to the [vLLM PR](https://github.com/vllm-project/vllm/pull/31575/files) for implementation details.
```bash
vllm serve IQuestLab/IQuest-Coder-V1-40B-Instruct --tensor-parallel-size 8
```
For Thinking models with reasoning support:
```bash
vllm serve IQuestLab/IQuest-Coder-V1-40B-Thinking --reasoning-parser qwen3 --tensor-parallel-size 8
```
When using tools, `IQuest-Coder-V1-40B-Instruct` and `IQuest-Coder-V1-40B-Loop-Instruct` should use `--tool-parser qwen3`, while `IQuest-Coder-V1-7B-Instruct`, `IQuest-Coder-V1-7B-Thinking`, `IQuest-Coder-V1-14B-Instruct`, `IQuest-Coder-V1-14B-Thinking`, `IQuest-Coder-V1-40B-Thinking` and `IQuest-Coder-V1-40B-Loop-Thinking` should use `--tool-parser qwen3_coder`.
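
Once a server is running, it can be queried through the OpenAI-compatible chat completions route. The sketch below uses only the Python standard library and assumes vLLM's default address `http://localhost:8000`; the helper names are hypothetical, and `top_k` is sent as a vLLM-specific extra sampling field rather than a standard OpenAI parameter.

```python
import json
import urllib.request


def build_chat_request(model, prompt, temperature=1.0, top_p=0.95, top_k=20,
                       max_tokens=8192):
    """Build the JSON body for POST /v1/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,  # vLLM extension, not part of the OpenAI schema
        "max_tokens": max_tokens,
    }


def send_chat_request(body, base_url="http://localhost:8000"):
    """Send the request to a running server and return the parsed response."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_chat_request(
    "IQuestLab/IQuest-Coder-V1-40B-Thinking",
    "Write a Python function that reverses a linked list.",
)
print(json.dumps(body, indent=2))
# With a server running: reply = send_chat_request(body)
```

The defaults follow the sampling values suggested for Thinking models; for Instruct models, pass `temperature=0.6` and `top_p=0.85` instead.
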
### CLI-Like Agents and Tools Usage
CLI-like agent capabilities are available for the following models: `IQuest-Coder-V1-7B-Instruct`, `IQuest-Coder-V1-7B-Thinking`, `IQuest-Coder-V1-14B-Instruct`, `IQuest-Coder-V1-14B-Thinking`, `IQuest-Coder-V1-40B-Thinking` and `IQuest-Coder-V1-40B-Loop-Thinking`.
**Step 1:** Deploy the model with vLLM and set the tool parser (**Attention: do not set a reasoning parser for Instruct models, otherwise it will cause unexpected errors**):
```bash
vllm serve IQuestLab/IQuest-Coder-V1-7B-Instruct --tool-parser qwen3_coder
```
or
```bash
vllm serve IQuestLab/IQuest-Coder-V1-7B-Thinking --tool-parser qwen3_coder --reasoning-parser qwen3
```
**Step 2:** Connect Claude Code to the deployed endpoint:
```bash
export ANTHROPIC_BASE_URL="http://iquestcoder.link"
export ANTHROPIC_AUTH_TOKEN="sk-iquestcoder"
claude --model IQuestCoder-V1-7B-Instruct
```
## Evaluation Results
![Evaluation Results](./papers/results-20260302.png)
![Evaluation Results](./papers/results.png)
### Benchmark Parameters
| Benchmark | Temperature | Top_p |
| :--- | :--- | :--- |
| **Evalplus-HumanEval** | 0.0 | - |
| **Evalplus-MBPP** | 0.0 | - |
| **BigCodeBench** | 0.0 | - |
| **FullStackBench** | 0.0 | - |
| **CruxEval** | 0.0 | - |
| **LiveCodeBench** | 0.6 | 0.95 |
| **Aider-Polyglot** | 0.95 | 0.85 |
| **Mercury** | 0.2 | 0.85 |
| **Bird** | 0.2 | 0.95 |
| **Spider** | 0.2 | 0.95 |
| **Terminal-Bench** | 0.0 | - |
| **Terminal-Bench (2.0)** | 0.7 | 1.0 |
| **SWE-Verified** | 0.0 | - |
| **BFCL V3** | 0.01 | 0.85 |
| **Mind2Web** | 0.0 | - |
### SWE-Bench Verified Evaluation
We provide the evaluation framework and trajectory data for reproducing our SWE-Bench Verified results in `IQuest-Coder-Eval/SWE-Verified/`.
The evaluation framework is based on [R2E-Gym](https://github.com/R2E-Gym/R2E-Gym). To reproduce the evaluation:
```bash
cd IQuest-Coder-Eval/SWE-Verified/R2E-Gym
# Install dependencies
pip install -e .
# Run evaluation
bash benchmark/bench/loopcoder/loopcoder.sh
```
The trajectory file `./IQuest-Coder-Eval/SWE-Verified/traj.zip` contains the complete agent trajectories for our SWE-Bench Verified evaluation.
## Limitations
- **Research Prototype**: The current models are designed for research purposes. Real-world user experience may differ from state-of-the-art commercial models, with weaker instruction-following capabilities in certain scenarios.
- **Long-Context Management**: Due to parameter size constraints, performance on long-horizon tasks and multi-turn tool invocations is limited, particularly in scenarios requiring sustained context management and complex agentic workflows.
- **Reasoning vs. Efficiency Trade-off**: Thinking models provide superior reasoning but generate longer responses; Instruct models are more efficient for straightforward tasks.
- **Code Execution**: Models generate code but do not execute it; always validate outputs in sandboxed environments.
- **Domain Specificity**: While trained on diverse codebases, performance may vary on highly specialized or proprietary frameworks.
- **Factuality**: Models may generate plausible but incorrect code; verify critical implementations thoroughly.
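
As a first step toward validating generated code, it can be run in a separate process with a timeout. The sketch below is a minimal illustration using only the standard library; `run_generated_code` is a hypothetical helper, and process separation alone is not a security sandbox, so untrusted code still belongs in a container or VM.

```python
import subprocess
import sys


def run_generated_code(code: str, timeout_s: float = 10.0):
    """Run code in a separate Python process and capture its output.

    This provides process separation and a timeout only; it is NOT a
    security sandbox.
    """
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode (ignores env vars and user site)
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired on overrun
    )
    return result.returncode, result.stdout, result.stderr


returncode, stdout, stderr = run_generated_code("print(sum(range(10)))")
print(returncode, stdout.strip())  # 0 45
```
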
## Citation
If you find our work helpful, please cite:
```bibtex
@article{iquest-coder-v1-2025,
  title={IQuest-Coder-V1 Technical Report},
  author={IQuest Coder Team},
  url={https://github.com/IQuestLab/IQuest-Coder-V1/blob/main/papers/IQuest_Coder_Technical_Report.pdf},
  year={2025}
}
@article{codescaling,
  title={Scaling Laws for Code: Every Programming Language Matters},
  author={Yang, Jian and Guo, Shawn and Jing, Lin and Zhang, Wei and Liu, Aishan and Hao, Chuan and Li, Zhoujun and Zhao, Wayne Xin and Liu, Xianglong and Lv, Weifeng and others},
  journal={arXiv preprint arXiv:2512.13472},
  year={2025}
}
@article{close_the_loop,
  title={Close the Loop: Synthesizing Infinite Tool-Use Data via Multi-Agent Role-Playing},
  author={Yuwen Li and Wei Zhang and Zelong Huang and Mason Yang and Jiajun Wu and Shawn Guo and Huahao Hu and Lingyi Sun and Jian Yang and Mingjie Tang and Bryan Dai},
  journal={arXiv preprint arXiv:2512.23611},
  year={2025}
}
@article{loopcoder,
  title={LoopCoder: Scaling Code Intelligence via Looped Language Models},
  author={Jian Yang and Wei Zhang and Shawn Guo and Yizhi Li and Lin Jing and Zhengmao Ye and Shark Liu and Yuyang Song and Jiajun Wu and Che Liu and T. Zheng and Siwei Wu and L. Liao and X. Ma and Chuan Hao and Ran Tao and Yan Xing and Jianzhou Wang and Mingjie Tang and Aishan Liu and Zhoujun Li and Xianglong Liu and Weifeng Lv and Bryan Dai},
  year={2025}
}
@article{swe_compress,
  title={Context as a Tool: Context Management for Long-Horizon SWE-Agents},
  author={hukai Liu and Jian Yang and Bo Jiang and Yizhi Li and Jinyang Guo and Xianglong Liu and Bryan Dai},
  journal={arXiv preprint arXiv:2512.22087},
  year={2025}
}
```

41
config.json Normal file

@@ -0,0 +1,41 @@
{
"architectures": [
"IQuestCoderForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": [2, 75864, 75869],
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 27648,
"max_position_embeddings": 131072,
"mlp_bias": false,
"model_type": "iquestcoder",
"num_attention_heads": 40,
"num_hidden_layers": 28,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.55.4",
"use_cache": true,
"vocab_size": 76800,
"clip_qkv": null,
"use_sliding_window": false,
"sliding_window": null,
"max_window_layers": 0,
"auto_map": {
"AutoConfig": "configuration_iquestcoder.IQuestCoderConfig",
"AutoModel": "modeling_iquestcoder.IQuestCoderModel",
"AutoModelForCausalLM": "modeling_iquestcoder.IQuestCoderForCausalLM",
"AutoModelForSequenceClassification": "modeling_iquestcoder.IQuestCoderForSequenceClassification",
"AutoModelForTokenClassification": "modeling_iquestcoder.IQuestCoderForTokenClassification",
"AutoModelForQuestionAnswering": "modeling_iquestcoder.IQuestCoderForQuestionAnswering"
}
}


@@ -0,0 +1,182 @@
"""IQuestCoder model configuration."""

from transformers.configuration_utils import PretrainedConfig
from transformers.utils import logging


logger = logging.get_logger(__name__)


class IQuestCoderConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`IQuestCoderModel`]. It is used to instantiate
    an IQuestCoder model according to the specified arguments, defining the model architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.

    Args:
        vocab_size (`int`, *optional*, defaults to 76800):
            Vocabulary size of the IQuestCoder model. Defines the number of different tokens that can be represented
            by the `input_ids` passed when calling [`IQuestCoderModel`].
        hidden_size (`int`, *optional*, defaults to 5120):
            Dimension of the hidden representations.
        intermediate_size (`int`, *optional*, defaults to 27648):
            Dimension of the MLP representations.
        num_hidden_layers (`int`, *optional*, defaults to 80):
            Number of hidden layers in the Transformer decoder.
        num_attention_heads (`int`, *optional*, defaults to 40):
            Number of attention heads for each attention layer in the Transformer decoder.
        num_key_value_heads (`int`, *optional*, defaults to 8):
            This is the number of key_value heads that should be used to implement Grouped Query Attention (GQA).
            If `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA).
            If `num_key_value_heads=1`, the model will use Multi Query Attention (MQA).
        head_dim (`int`, *optional*, defaults to 128):
            The dimension of each attention head. If not specified, defaults to `hidden_size // num_attention_heads`.
        hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
            The non-linear activation function (function or string) in the decoder.
        max_position_embeddings (`int`, *optional*, defaults to 16384):
            The maximum sequence length that this model might ever be used with.
        initializer_range (`float`, *optional*, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        rms_norm_eps (`float`, *optional*, defaults to 1e-05):
            The epsilon used by the rms normalization layers.
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether or not the model should return the last key/values attentions (not used by all models).
        pad_token_id (`int`, *optional*):
            Padding token id.
        bos_token_id (`int`, *optional*, defaults to 1):
            Beginning of stream token id.
        eos_token_id (`int`, *optional*, defaults to 2):
            End of stream token id.
        tie_word_embeddings (`bool`, *optional*, defaults to `False`):
            Whether to tie weight embeddings.
        rope_theta (`float`, *optional*, defaults to 500000.0):
            The base period of the RoPE embeddings.
        rope_scaling (`Dict`, *optional*):
            Dictionary containing the scaling configuration for the RoPE embeddings. Supports various RoPE scaling
            types including "linear", "dynamic", "yarn", "longrope", etc.
        attention_bias (`bool`, *optional*, defaults to `False`):
            Whether to use a bias in the query, key, value and output projection layers during self-attention.
        attention_dropout (`float`, *optional*, defaults to 0.0):
            The dropout ratio for the attention probabilities.
        mlp_bias (`bool`, *optional*, defaults to `False`):
            Whether to use a bias in up_proj, down_proj and gate_proj layers in the MLP layers.
        clip_qkv (`float`, *optional*):
            If set, clip the query, key, and value tensors to this value. Borrowed from OLMo for training stability.
        use_sliding_window (`bool`, *optional*, defaults to `False`):
            Whether to use sliding window attention. Borrowed from Qwen2.
        sliding_window (`int`, *optional*):
            The sliding window size. Only effective when `use_sliding_window=True`.
        max_window_layers (`int`, *optional*, defaults to 0):
            The number of layers that don't use sliding window attention. Borrowed from Qwen2.

    Example:
    ```python
    >>> from configuration_iquestcoder import IQuestCoderConfig
    >>> from modeling_iquestcoder import IQuestCoderModel

    >>> # Initializing a IQuestCoder configuration
    >>> configuration = IQuestCoderConfig()

    >>> # Initializing a model from the configuration
    >>> model = IQuestCoderModel(configuration)

    >>> # Accessing the model configuration
    >>> configuration = model.config
    ```
    """

    model_type = "iquestcoder"
    keys_to_ignore_at_inference = ["past_key_values"]

    def __init__(
        self,
        vocab_size=76800,
        hidden_size=5120,
        intermediate_size=27648,
        num_hidden_layers=80,
        num_attention_heads=40,
        num_key_value_heads=8,
        head_dim=128,
        hidden_act="silu",
        max_position_embeddings=16384,
        initializer_range=0.02,
        rms_norm_eps=1e-5,
        use_cache=True,
        pad_token_id=None,
        bos_token_id=1,
        eos_token_id=2,
        tie_word_embeddings=False,
        rope_theta=500000.0,
        rope_scaling=None,
        attention_bias=False,
        attention_dropout=0.0,
        mlp_bias=False,
        # IQuestCoder specific (borrowed from OLMo)
        clip_qkv=None,
        # IQuestCoder specific (borrowed from Qwen2)
        use_sliding_window=False,
        sliding_window=None,
        max_window_layers=0,
        **kwargs,
    ):
        self.vocab_size = vocab_size
        self.max_position_embeddings = max_position_embeddings
        self.hidden_size = hidden_size
        self.intermediate_size = intermediate_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.num_key_value_heads = num_key_value_heads
        self.head_dim = head_dim
        self.hidden_act = hidden_act
        self.initializer_range = initializer_range
        self.rms_norm_eps = rms_norm_eps
        self.use_cache = use_cache
        self.rope_theta = rope_theta
        self.rope_scaling = rope_scaling
        self.attention_bias = attention_bias
        self.attention_dropout = attention_dropout
        self.mlp_bias = mlp_bias

        # IQuestCoder specific
        self.clip_qkv = clip_qkv
        self.use_sliding_window = use_sliding_window
        self.sliding_window = sliding_window
        self.max_window_layers = max_window_layers

        # Validate rope_scaling configuration
        self._rope_scaling_validation()

        super().__init__(
            pad_token_id=pad_token_id,
            bos_token_id=bos_token_id,
            eos_token_id=eos_token_id,
            tie_word_embeddings=tie_word_embeddings,
            **kwargs,
        )

    def _rope_scaling_validation(self):
        """Validate the `rope_scaling` configuration."""
        if self.rope_scaling is None:
            return

        if not isinstance(self.rope_scaling, dict) or len(self.rope_scaling) < 1:
            raise ValueError(
                "`rope_scaling` must be a dictionary with a minimum of one field, `type` or `rope_type`."
            )

        rope_scaling_type = self.rope_scaling.get("type", None) or self.rope_scaling.get("rope_type", None)
        if rope_scaling_type is None:
            raise ValueError(
                "`rope_scaling` must have a `type` or `rope_type` field."
            )

        valid_rope_types = ["linear", "dynamic", "yarn", "longrope", "llama3"]
        if rope_scaling_type not in valid_rope_types:
            raise ValueError(
                f"`rope_scaling`'s type field must be one of {valid_rope_types}, got {rope_scaling_type}"
            )


__all__ = ["IQuestCoderConfig"]

6
generation_config.json Normal file

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": [2, 75864, 75869],
"transformers_version": "4.55.4"
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2cca863fca7fd097d47526209d663bafc7a4c1c5d9d2e7419567d58c05bf34ba
size 4813050528


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:db13525598d929840160929fd02b83f2bdd0fd21d14f705890c6cc7beb3262f5
size 4875986024


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2027ab306830d3773b3868d719838c290f063913e15ab3f98d02e2e59b358763
size 4875986056


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:85e39c93a1ed93848367f97de25317870c2e2302effd418adf31577d91bd32c7
size 4875986072


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f8b1a1588b05018245ffb707931f9db2ddb89eb71bdbcc2b8250912c5b61c5ae
size 4875986072


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0d813c67f5206643d8091581e8a587c48fa65de82ebd7718085b2dfdae0af1c6
size 4561401704


@@ -0,0 +1,263 @@
{
"metadata": {
"total_parameters": 14439183360,
"total_size": 28878366720
},
"weight_map": {
"lm_head.weight": "model-00006-of-00006.safetensors",
"model.embed_tokens.weight": "model-00001-of-00006.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.11.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.12.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.13.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.14.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.15.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.16.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.17.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.18.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.19.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.20.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.21.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.22.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.23.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.24.input_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.25.input_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.26.input_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.27.input_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.norm.weight": "model-00006-of-00006.safetensors"
}
}
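
The mapping above assigns each tensor name to one of six shard files, and some layers are split across two shards (for example, layer 14's attention projections are in shard 3 while its MLP and layer norms are in shard 4). The following is a minimal sketch of how such an index can be inspected. It assumes the standard Hugging Face sharded-checkpoint layout, where the entries sit under a top-level `weight_map` key; the helper names `group_by_shard` and `shards_for_layer` are hypothetical and are not part of this repository.

```python
import json
from collections import defaultdict

# Small excerpt of the entries listed above; the "weight_map" wrapper key is an
# assumption based on the usual Hugging Face sharded-checkpoint index layout.
INDEX_JSON = """
{
  "weight_map": {
    "model.layers.14.input_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
    "model.norm.weight": "model-00006-of-00006.safetensors"
  }
}
"""


def group_by_shard(weight_map):
    """Invert the tensor -> shard mapping into shard -> sorted tensor names."""
    shards = defaultdict(list)
    for name, shard in weight_map.items():
        shards[shard].append(name)
    return {shard: sorted(names) for shard, names in shards.items()}


def shards_for_layer(weight_map, layer):
    """Return the sorted shard files that hold any tensor of the given layer."""
    prefix = f"model.layers.{layer}."
    return sorted({shard for name, shard in weight_map.items() if name.startswith(prefix)})


weight_map = json.loads(INDEX_JSON)["weight_map"]
by_shard = group_by_shard(weight_map)
layer14 = shards_for_layer(weight_map, 14)
```

Grouping by shard makes it possible to open each shard file once instead of once per tensor, and the per-layer lookup shows which files are needed to load a single layer.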

1063
modeling_iquestcoder.py Normal file

File diff suppressed because it is too large


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2fad84bb195ec8628191705c8c3f21ec21dc6dee27b72884e5342b3ffa0a0c0f
size 122764


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ca6bc03abac8e7633a0fe0755350bc5dac448b705303f33408efde5fb03bc146
size 1238195

3
papers/results.png Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b4b14ddc9c3fdfa2eed779e393a4ed67754255863b5d7f62641a4a04fcf2d462
size 453412

552
tokenization_iquestcoder.py Normal file

@@ -0,0 +1,552 @@
"""Tokenization classes for IQuestCoder."""

import os
from shutil import copyfile
from typing import Any, Dict, List, Optional, Tuple, Union

import sentencepiece as spm
from transformers.tokenization_utils import AddedToken, PreTrainedTokenizer
from transformers.utils import logging

logger = logging.get_logger(__name__)

VOCAB_FILES_NAMES = {"vocab_file": "tokenizer.model"}

PRETRAINED_VOCAB_FILES_MAP = {
    "vocab_file": {},
    "tokenizer_file": {},
}

PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {}


class IQuestCoderTokenizer(PreTrainedTokenizer):
    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
    model_input_names = ["input_ids", "attention_mask"]

    def __init__(
        self,
        vocab_file,
        unk_token="<unk>",
        bos_token="<s>",
        eos_token="</s>",
        pad_token=None,
        sp_model_kwargs: Optional[Dict[str, Any]] = None,
        add_bos_token=True,
        add_eos_token=False,
        clean_up_tokenization_spaces=False,
        add_prefix_space=False,
        legacy=None,
        use_default_system_prompt=False,
        chat_template=None,
        **kwargs,
    ):
        self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs

        bos_token = AddedToken(bos_token, lstrip=False, rstrip=False) if isinstance(bos_token, str) else bos_token
        eos_token = AddedToken(eos_token, lstrip=False, rstrip=False) if isinstance(eos_token, str) else eos_token
        unk_token = AddedToken(unk_token, lstrip=False, rstrip=False) if isinstance(unk_token, str) else unk_token
        pad_token = AddedToken(pad_token, lstrip=False, rstrip=False) if isinstance(pad_token, str) else pad_token

        # Legacy behavior handling
        if legacy is None:
            logger.warning_once(
                f"You are using the default legacy behaviour of the {self.__class__.__name__}. This is"
                " expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you."
                " If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it"
                " means, and thoroughly read the reason why this was added as explained in"
                " https://github.com/huggingface/transformers/pull/24565"
            )
            legacy = True

        self.legacy = legacy
        self.vocab_file = vocab_file
        self.add_bos_token = add_bos_token
        self.add_eos_token = add_eos_token
        self.add_prefix_space = add_prefix_space
        self.use_default_system_prompt = use_default_system_prompt

        self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
        self.sp_model.Load(vocab_file)

        super().__init__(
            bos_token=bos_token,
            eos_token=eos_token,
            unk_token=unk_token,
            pad_token=pad_token,
            add_bos_token=add_bos_token,
            add_eos_token=add_eos_token,
            sp_model_kwargs=self.sp_model_kwargs,
            clean_up_tokenization_spaces=clean_up_tokenization_spaces,
            add_prefix_space=add_prefix_space,
            legacy=legacy,
            use_default_system_prompt=use_default_system_prompt,
            chat_template=chat_template,
            **kwargs,
        )

    def __getstate__(self):
        state = self.__dict__.copy()
        state["sp_model"] = None
        return state

    def __setstate__(self, d):
        self.__dict__ = d
        self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
        self.sp_model.Load(self.vocab_file)

    @property
    def vocab_size(self) -> int:
        """Returns the vocabulary size."""
        return self.sp_model.get_piece_size()

    def get_vocab(self) -> Dict[str, int]:
        """Returns the vocabulary as a dictionary of token to index."""
        vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
        vocab.update(self.added_tokens_encoder)
        return vocab

    def _tokenize(self, text: str) -> List[str]:
        """
        Tokenize a string.

        Args:
            text (`str`): The text to tokenize.

        Returns:
            `List[str]`: The list of tokens.
        """
        if self.add_prefix_space:
            text = " " + text

        if self.legacy:
            return self.sp_model.encode(text, out_type=str)

        # Non-legacy behavior: currently the same SentencePiece call as the legacy path
        return self.sp_model.encode(text, out_type=str)

    def _convert_token_to_id(self, token: str) -> int:
        """Converts a token (str) to an id using the vocab."""
        return self.sp_model.piece_to_id(token)

    def _convert_id_to_token(self, index: int) -> str:
        """Converts an index (integer) to a token (str) using the vocab."""
        token = self.sp_model.IdToPiece(index)
        return token

    def convert_tokens_to_string(self, tokens: List[str]) -> str:
        """
        Converts a sequence of tokens (strings) to a single string.

        This method handles special tokens separately to ensure they are not
        decoded using the SentencePiece model.

        Args:
            tokens (`List[str]`): The list of tokens to convert.

        Returns:
            `str`: The decoded string.
        """
        current_sub_tokens = []
        out_string = ""
        prev_is_special = False
        for i, token in enumerate(tokens):
            # make sure that special tokens are not decoded using sentencepiece model
            if token in self.all_special_tokens:
                if not prev_is_special and i != 0:
                    out_string += " "
                out_string += self.sp_model.decode(current_sub_tokens) + token
                prev_is_special = True
                current_sub_tokens = []
            else:
                current_sub_tokens.append(token)
                prev_is_special = False
        out_string += self.sp_model.decode(current_sub_tokens)
        return out_string

    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
        """
        Save the vocabulary and special tokens file to a directory.

        Args:
            save_directory (`str`):
                The directory in which to save the vocabulary.
            filename_prefix (`str`, *optional*):
                An optional prefix to add to the names of the saved files.

        Returns:
            `Tuple(str)`: Paths to the files saved.
        """
        if not os.path.isdir(save_directory):
            logger.error(f"Vocabulary path ({save_directory}) should be a directory")
            return
        out_vocab_file = os.path.join(
            save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
        )

        # Copy the original SentencePiece file when it exists; otherwise serialize the in-memory model.
        if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file) and os.path.isfile(self.vocab_file):
            copyfile(self.vocab_file, out_vocab_file)
        elif not os.path.isfile(self.vocab_file):
            with open(out_vocab_file, "wb") as fi:
                content_spiece_model = self.sp_model.serialized_model_proto()
                fi.write(content_spiece_model)

        return (out_vocab_file,)

    def build_inputs_with_special_tokens(
        self,
        token_ids_0: List[int],
        token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
        """
        Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating
        and adding special tokens.

        An IQuestCoder sequence has the following format:

        - single sequence: `<s> X </s>` (if add_eos_token is True) or `<s> X` (default)
        - pair of sequences: `<s> A </s> <s> B </s>` (if add_eos_token is True) or `<s> A <s> B` (default)

        Args:
            token_ids_0 (`List[int]`):
                List of IDs to which the special tokens will be added.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: List of input IDs with the appropriate special tokens.
        """
        bos_token_id = [self.bos_token_id] if self.add_bos_token else []
        eos_token_id = [self.eos_token_id] if self.add_eos_token else []

        output = bos_token_id + token_ids_0 + eos_token_id

        if token_ids_1 is not None:
            output = output + bos_token_id + token_ids_1 + eos_token_id

        return output

    def get_special_tokens_mask(
        self,
        token_ids_0: List[int],
        token_ids_1: Optional[List[int]] = None,
        already_has_special_tokens: bool = False
    ) -> List[int]:
        """
        Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
        special tokens using the tokenizer `prepare_for_model` method.

        Args:
            token_ids_0 (`List[int]`):
                List of IDs.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.
            already_has_special_tokens (`bool`, *optional*, defaults to `False`):
                Whether or not the token list is already formatted with special tokens for the model.

        Returns:
            `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
        """
        if already_has_special_tokens:
            return super().get_special_tokens_mask(
                token_ids_0=token_ids_0, token_ids_1=token_ids_1, already_has_special_tokens=True
            )

        bos_token_id = [1] if self.add_bos_token else []
        eos_token_id = [1] if self.add_eos_token else []

        if token_ids_1 is None:
            return bos_token_id + ([0] * len(token_ids_0)) + eos_token_id
        return (
            bos_token_id
            + ([0] * len(token_ids_0))
            + eos_token_id
            + bos_token_id
            + ([0] * len(token_ids_1))
            + eos_token_id
        )

    def create_token_type_ids_from_sequences(
        self,
        token_ids_0: List[int],
        token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
        """
        Create a mask from the two sequences passed to be used in a sequence-pair classification task.

        An IQuestCoder sequence pair mask has the following format:

        ```
        0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
        | first sequence    | second sequence |
        ```

        If `token_ids_1` is `None`, this method only returns the first portion of the mask (0s).

        Args:
            token_ids_0 (`List[int]`):
                List of IDs.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: List of token type IDs according to the given sequence(s).
        """
        bos_token_id = [self.bos_token_id] if self.add_bos_token else []
        eos_token_id = [self.eos_token_id] if self.add_eos_token else []

        output = [0] * len(bos_token_id + token_ids_0 + eos_token_id)

        if token_ids_1 is not None:
            output += [1] * len(bos_token_id + token_ids_1 + eos_token_id)

        return output

    @property
    def default_chat_template(self) -> str:
        """
        Returns the default chat template for IQuestCoder.

        This template formats conversations with system, user, and assistant roles.
        """
        return DEFAULT_CHAT_TEMPLATE

    def apply_chat_template(
        self,
        conversation: Union[List[Dict[str, str]], "Conversation"],
        chat_template: Optional[str] = None,
        add_generation_prompt: bool = False,
        tokenize: bool = True,
        padding: bool = False,
        truncation: bool = False,
        max_length: Optional[int] = None,
        return_tensors: Optional[str] = None,
        return_dict: bool = False,
        **tokenizer_kwargs,
    ):
        """
        Apply a chat template to format a conversation.

        Args:
            conversation (`List[Dict[str, str]]` or `Conversation`):
                A list of dicts with "role" and "content" keys, representing the conversation history.
            chat_template (`str`, *optional*):
                A Jinja template to use for formatting. If not provided, the tokenizer's default will be used.
            add_generation_prompt (`bool`, *optional*, defaults to `False`):
                Whether to add a generation prompt at the end for the assistant to continue.
            tokenize (`bool`, *optional*, defaults to `True`):
                Whether to tokenize the output. If `False`, returns a string.
            padding (`bool`, *optional*, defaults to `False`):
                Whether to pad sequences.
            truncation (`bool`, *optional*, defaults to `False`):
                Whether to truncate sequences.
            max_length (`int`, *optional*):
                Maximum length of the output.
            return_tensors (`str`, *optional*):
                The type of tensors to return ("pt", "tf", "np", or None).
            return_dict (`bool`, *optional*, defaults to `False`):
                Whether to return a dictionary with additional information.
            **tokenizer_kwargs:
                Additional keyword arguments passed to the tokenizer.

        Returns:
            `Union[str, List[int], BatchEncoding]`: The formatted (and optionally tokenized) conversation.

        Example:
            ```python
            >>> tokenizer = IQuestCoderTokenizer.from_pretrained("path/to/model")
            >>> conversation = [
            ...     {"role": "system", "content": "You are a helpful assistant."},
            ...     {"role": "user", "content": "Hello!"},
            ...     {"role": "assistant", "content": "Hi there! How can I help you today?"},
            ...     {"role": "user", "content": "What's the weather like?"},
            ... ]
            >>> tokenizer.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
            '<|system|>\\nYou are a helpful assistant.\\n</|system|><|user|>\\nHello!\\n</|user|>...'
            ```
        """
        # Use parent class implementation with our template
        return super().apply_chat_template(
            conversation,
            chat_template=chat_template,
            add_generation_prompt=add_generation_prompt,
            tokenize=tokenize,
            padding=padding,
            truncation=truncation,
            max_length=max_length,
            return_tensors=return_tensors,
            return_dict=return_dict,
            **tokenizer_kwargs,
        )


# Try to import and create Fast tokenizer version
try:
    from transformers import PreTrainedTokenizerFast
    from tokenizers import Tokenizer, decoders, models, normalizers, pre_tokenizers, processors

    class IQuestCoderTokenizerFast(PreTrainedTokenizerFast):
        """
        Construct a "fast" IQuestCoder tokenizer (backed by HuggingFace's *tokenizers* library).

        This is a fast implementation of [`IQuestCoderTokenizer`] using the 🤗 Tokenizers library.

        Args:
            vocab_file (`str`, *optional*):
                Path to the vocabulary file (SentencePiece model).
            tokenizer_file (`str`, *optional*):
                Path to a tokenizer JSON file.
            unk_token (`str`, *optional*, defaults to `"<unk>"`):
                The unknown token.
            bos_token (`str`, *optional*, defaults to `"<s>"`):
                The beginning of sequence token.
            eos_token (`str`, *optional*, defaults to `"</s>"`):
                The end of sequence token.
            pad_token (`str`, *optional*):
                The token used for padding.
            add_bos_token (`bool`, *optional*, defaults to `True`):
                Whether to add a BOS token at the start of sequences.
            add_eos_token (`bool`, *optional*, defaults to `False`):
                Whether to add an EOS token at the end of sequences.
            add_prefix_space (`bool`, *optional*, defaults to `False`):
                Whether to add an initial space to the input.
            use_default_system_prompt (`bool`, *optional*, defaults to `False`):
                Whether to use the default system prompt.
            chat_template (`str`, *optional*):
                A Jinja template for formatting conversations.

        Example:
            ```python
            >>> from tokenization_iquestcoder import IQuestCoderTokenizerFast
            >>> tokenizer = IQuestCoderTokenizerFast.from_pretrained("path/to/model")
            >>> tokenizer.encode("Hello, world!")
            [1, 15043, 29892, 3186, 29991]
            ```
        """

        vocab_files_names = VOCAB_FILES_NAMES
        pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
        max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
        model_input_names = ["input_ids", "attention_mask"]
        slow_tokenizer_class = IQuestCoderTokenizer

        def __init__(
            self,
            vocab_file=None,
            tokenizer_file=None,
            unk_token="<unk>",
            bos_token="<s>",
            eos_token="</s>",
            pad_token=None,
            add_bos_token=True,
            add_eos_token=False,
            add_prefix_space=False,
            use_default_system_prompt=False,
            chat_template=None,
            **kwargs,
        ):
            self.add_bos_token = add_bos_token
            self.add_eos_token = add_eos_token
            self.add_prefix_space = add_prefix_space
            self.use_default_system_prompt = use_default_system_prompt

            if chat_template is None:
                chat_template = DEFAULT_CHAT_TEMPLATE

            super().__init__(
                vocab_file=vocab_file,
                tokenizer_file=tokenizer_file,
                unk_token=unk_token,
                bos_token=bos_token,
                eos_token=eos_token,
                pad_token=pad_token,
                add_bos_token=add_bos_token,
                add_eos_token=add_eos_token,
                add_prefix_space=add_prefix_space,
                use_default_system_prompt=use_default_system_prompt,
                chat_template=chat_template,
                **kwargs,
            )

        @property
        def can_save_slow_tokenizer(self) -> bool:
            return os.path.isfile(self.vocab_file) if self.vocab_file else False

        @property
        def default_chat_template(self) -> str:
            """Returns the default chat template."""
            return DEFAULT_CHAT_TEMPLATE

        def build_inputs_with_special_tokens(
            self,
            token_ids_0: List[int],
            token_ids_1: Optional[List[int]] = None
        ) -> List[int]:
            """Build model inputs with special tokens."""
            bos_token_id = [self.bos_token_id] if self.add_bos_token else []
            eos_token_id = [self.eos_token_id] if self.add_eos_token else []

            output = bos_token_id + token_ids_0 + eos_token_id

            if token_ids_1 is not None:
                output = output + bos_token_id + token_ids_1 + eos_token_id

            return output

        def get_special_tokens_mask(
            self,
            token_ids_0: List[int],
            token_ids_1: Optional[List[int]] = None,
            already_has_special_tokens: bool = False
        ) -> List[int]:
            """Retrieve special tokens mask."""
            if already_has_special_tokens:
                return super().get_special_tokens_mask(
                    token_ids_0=token_ids_0, token_ids_1=token_ids_1, already_has_special_tokens=True
                )

            bos_token_id = [1] if self.add_bos_token else []
            eos_token_id = [1] if self.add_eos_token else []

            if token_ids_1 is None:
                return bos_token_id + ([0] * len(token_ids_0)) + eos_token_id
            return (
                bos_token_id
                + ([0] * len(token_ids_0))
                + eos_token_id
                + bos_token_id
                + ([0] * len(token_ids_1))
                + eos_token_id
            )

        def create_token_type_ids_from_sequences(
            self,
            token_ids_0: List[int],
            token_ids_1: Optional[List[int]] = None
        ) -> List[int]:
            """Create token type IDs from sequences."""
            bos_token_id = [self.bos_token_id] if self.add_bos_token else []
            eos_token_id = [self.eos_token_id] if self.add_eos_token else []

            output = [0] * len(bos_token_id + token_ids_0 + eos_token_id)

            if token_ids_1 is not None:
                output += [1] * len(bos_token_id + token_ids_1 + eos_token_id)

            return output

except ImportError:
    # tokenizers library not available, Fast tokenizer not supported
    IQuestCoderTokenizerFast = None
    logger.info(
        "The `tokenizers` library is not installed. "
        "IQuestCoderTokenizerFast will not be available. "
        "Install it with `pip install tokenizers`."
    )
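
The special-token layout documented in `build_inputs_with_special_tokens`, `get_special_tokens_mask`, and `create_token_type_ids_from_sequences` can be checked without loading a SentencePiece model. The sketch below restates the same list arithmetic as standalone functions. The ids `BOS_ID = 1` and `EOS_ID = 2` and the function names are assumptions chosen for illustration; the real ids come from `tokenizer.model`.

```python
from typing import List, Optional

# Hypothetical special-token ids, used only for this illustration.
BOS_ID = 1
EOS_ID = 2


def build_inputs(ids_0: List[int], ids_1: Optional[List[int]] = None,
                 add_bos: bool = True, add_eos: bool = False) -> List[int]:
    """Wrap one or two sequences as `<s> A [</s>] [<s> B [</s>]]`."""
    bos = [BOS_ID] if add_bos else []
    eos = [EOS_ID] if add_eos else []
    out = bos + ids_0 + eos
    if ids_1 is not None:
        out = out + bos + ids_1 + eos
    return out


def special_tokens_mask(ids_0: List[int], ids_1: Optional[List[int]] = None,
                        add_bos: bool = True, add_eos: bool = False) -> List[int]:
    """Return 1 at positions that hold BOS/EOS and 0 at ordinary token positions."""
    bos = [1] if add_bos else []
    eos = [1] if add_eos else []
    mask = bos + [0] * len(ids_0) + eos
    if ids_1 is not None:
        mask = mask + bos + [0] * len(ids_1) + eos
    return mask


def token_type_ids(ids_0: List[int], ids_1: Optional[List[int]] = None,
                   add_bos: bool = True, add_eos: bool = False) -> List[int]:
    """Return 0 for every position of the first segment and 1 for the second."""
    extra = int(add_bos) + int(add_eos)
    types = [0] * (len(ids_0) + extra)
    if ids_1 is not None:
        types += [1] * (len(ids_1) + extra)
    return types


single = build_inputs([10, 11])                      # default: BOS only
pair = build_inputs([10, 11], [20], add_eos=True)    # BOS and EOS around each segment
mask = special_tokens_mask([10, 11], [20], add_eos=True)
types = token_type_ids([10, 11], [20], add_eos=True)
```

All three outputs have the same length for the same arguments, which is what lets the mask and the type ids line up position by position with the input ids.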

612046
tokenizer.json Normal file

File diff suppressed because it is too large

3
tokenizer.model Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7d3be68e090a927f31e0e378d7599b15c206dd47e4a73933775a746cc9c1cd91
size 1345108
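
The three-line entries above are Git LFS pointer files: the repository stores only a spec version, a SHA-256 object id, and a byte size, while the binary itself lives in LFS storage. As a rough sketch, a pointer can be parsed and a downloaded blob checked against it as follows. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of this repository.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the `key value` lines of a Git LFS pointer into a small dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, info):
    """Check that raw bytes have the size and SHA-256 digest named by the pointer."""
    if info["hash_algo"] != "sha256" or len(data) != info["size"]:
        return False
    return hashlib.sha256(data).hexdigest() == info["digest"]


# Pointer text copied from the tokenizer.model entry above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:7d3be68e090a927f31e0e378d7599b15c206dd47e4a73933775a746cc9c1cd91
size 1345108
"""

info = parse_lfs_pointer(POINTER)
```

Comparing the size before hashing rejects truncated downloads cheaply, and the digest comparison then confirms the content itself.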

240
tokenizer_config.json Normal file

File diff suppressed because one or more lines are too long