Initialize the project; model provided by the ModelHub XC community
Model: NOSIBLE/prediction-v1.1-base Source: Original Platform
37 .gitattributes vendored Normal file
@@ -0,0 +1,37 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
296 README.md Normal file
@@ -0,0 +1,296 @@
---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-0.6B/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-0.6B-Base
tags:
- finance
- nlp
language:
- en
datasets:
- NOSIBLE/prediction
---

<p align="center">
  <img src="https://github.com/NosibleAI/nosible-py/blob/main/docs/_static/readme.png?raw=true"/>
</p>

# Prediction v1.1 Base

## Changelog
- **v1.0.0:** Initial version

**prediction-v1.1-base** is a next-generation classification model that determines whether a short text snippet contains a prediction. For example:

Contains a prediction:
> Google will increase the development of data centers in 2026 by 25%.

Does not contain a prediction:
> Google increased the development of data centers in 2025 by 25%.

This model is fine-tuned from [**Qwen3-0.6B**](https://huggingface.co/Qwen/Qwen3-0.6B). Our training corpus consists of **100k** real-world search results sourced from the [**Nosible Search Feeds**](https://www.nosible.com/) product, giving the model broad exposure to real, noisy, and highly varied financial text as it appears on the web. You can find the open-source dataset [here](https://huggingface.co/datasets/NOSIBLE/prediction).

### Why this model matters

**1. Accurate detection of predictive statements**
Specialized training allows it to reliably distinguish predictive intent from descriptive or historical text, even when the cues are subtle.

**2. Robust on real-world financial and web data**
Because it is trained on noisy, naturally occurring search-feed content, it handles messy, unstructured inputs without heavy preprocessing.

**3. Scalable prediction mining**
Its lightweight architecture enables fast, large-scale extraction of forecasts, predictions, and estimates across massive text corpora.

### Performance overview

**prediction-v1.1-base** consistently outperforms larger state-of-the-art LLMs while being deployable at a much lower cost. We computed the validation-set accuracy on a sample of 1,000 points from our validation set.

<p align="center">
  <img src="https://huggingface.co/NOSIBLE/prediction-v1.1-base/resolve/main/plots/accuracy.png"/>
</p>

<p align="center">
  <img src="https://huggingface.co/NOSIBLE/prediction-v1.1-base/resolve/main/plots/results.png"/>
</p>

Cost per 1M tokens for the LLMs was calculated as a weighted average of input and output token costs using a 10:1 ratio (10× input cost + 1× output cost, divided by 11), based on pricing from OpenRouter. This reflects the input-to-output token ratio of the prompt we used to label our dataset.

For the NOSIBLE model, we conservatively used the cost of Qwen-8B on OpenRouter with a 100:1 ratio, since the model produces a single output token when used as described in this guide. Even so, our model is still the cheapest option.
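The blended-cost formula above is easy to reproduce. Below is a minimal sketch; the prices are illustrative placeholders, not actual OpenRouter quotes.

```python
def blended_cost_per_1m(input_cost: float, output_cost: float,
                        input_ratio: int = 10, output_ratio: int = 1) -> float:
    """Weighted-average cost per 1M tokens for a given input:output token ratio."""
    total_weight = input_ratio + output_ratio
    return (input_ratio * input_cost + output_ratio * output_cost) / total_weight

# Hypothetical prices (USD per 1M tokens), for illustration only.
print(blended_cost_per_1m(0.50, 1.50))          # 10:1 ratio: (10*0.50 + 1*1.50) / 11
print(blended_cost_per_1m(0.50, 1.50, 100, 1))  # 100:1 ratio used for the NOSIBLE model
```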
## Class token mapping

Because this is a classification model built on [**Qwen3-0.6B**](https://huggingface.co/Qwen/Qwen3-0.6B), we mapped the `prediction` and `not-prediction` classes onto tokens. This is the mapping we chose:

```python
{
    "prediction": "prediction",      # Contains a prediction.
    "not-prediction": "_prediction"  # Does not contain a prediction.
}
```
## Strict Usage Requirements

> [!CAUTION]
> 1. **Disable Thinking:** You **must** set `enable_thinking=False` (or disable reasoning tokens).
> 2. **Exact System Prompt:** You **must** use the specific system prompt: `"Classify whether it contains a prediction or does not contain a prediction."`
> 3. **Constrain Output:** You **must** restrict generation to the valid labels (`["prediction", "_prediction"]`) using **grammars**, **regex**, or **guided decoding**.
>    * **SGLang:** Use `regex="(prediction|_prediction)"` in the API call.
>    * **vLLM:** Use `guided_choice=["prediction", "_prediction"]` in the API call.
>    * **llama.cpp / GGUF:** Apply a GBNF grammar or regex to force selection from the list.
>    * **OpenAI / Structured Outputs:** Use `response_format` or JSON Schema enforcement where supported.
>
> Deviating from these requirements will **severely** impact performance and reliability.
## Quickstart

Since this model was trained as a causal LM with specific chat templates, you must use the `apply_chat_template` method with the same system prompt used during training.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NOSIBLE/prediction-v1.1-base"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)

# Define the input text
text = "The company is expected to record a profit margin of more than 15% next quarter."

# 1. Structure the prompt exactly as used in training
messages = [
    {"role": "system", "content": "Classify whether it contains a prediction or does not contain a prediction."},
    {"role": "user", "content": text},
]

# 2. Apply chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # Must be set to False.
)

inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# 3. Generate the response (label)
# We limit max_new_tokens because only a single-token response is expected
outputs = model.generate(**inputs, max_new_tokens=1)

# The output echoes the prompt, so decode only the newly generated token(s)
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
# Expected Output: prediction
```
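Instead of generating text, you can also read the class probabilities directly from the next-token logits, restricted to the two label tokens. The helper below is a framework-agnostic sketch (it assumes, as in training, that each label encodes to a single token); with the Quickstart objects you would pass it `model(**inputs).logits[0, -1]` and the token ids of `"prediction"` and `"_prediction"`.

```python
import math

def label_probabilities(next_token_logits, label_token_ids):
    """Softmax restricted to the label tokens: the probability the model
    assigns to each class, ignoring every other vocabulary token."""
    logits = [float(next_token_logits[i]) for i in label_token_ids]
    peak = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Rough usage with the Quickstart objects (variable names assumed from above):
#   logits = model(**inputs).logits[0, -1]
#   ids = [tokenizer.encode(t, add_special_tokens=False)[0]
#          for t in ("prediction", "_prediction")]
#   p_prediction, p_not_prediction = label_probabilities(logits, ids)
```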
## Deployment

For production deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint.

The model is based on Qwen3-0.6B and as such can be deployed anywhere Qwen3-0.6B can. However, we recommend deploying with [**SGLang**](https://github.com/sgl-project/sglang).

**SGLang:**

```shell
python -m sglang.launch_server --model-path NOSIBLE/prediction-v1.1-base --reasoning-parser qwen3
```

Here is an example API call using an OpenAI-compatible server to extract the probability of each label.

```python
import math
from openai import OpenAI

# Initialize the client pointing to your SGLang server
client = OpenAI(
    base_url="http://localhost:8000/v1",  # Replace with your endpoint URL if remote
    api_key="EMPTY"
)

model_id = "NOSIBLE/prediction-v1.1-base"

# Input text to classify
text = "The company is expected to record a profit margin of more than 15% next quarter."

# Define the classification labels
labels = ["prediction", "_prediction"]

# Prepare the conversation
messages = [
    {"role": "system", "content": "Classify whether it contains a prediction or does not contain a prediction."},
    {"role": "user", "content": text},
]

# Make the API call
chat_completion = client.chat.completions.create(
    model=model_id,
    messages=messages,
    temperature=0,
    max_tokens=1,
    stream=False,
    logprobs=True,  # Enable log probabilities to calculate confidence
    top_logprobs=len(labels),  # Ensure we capture logprobs for our choices
    extra_body={
        "chat_template_kwargs": {"enable_thinking": False},  # Must be set to False.
        "regex": "(prediction|_prediction)",
    },
)

# Extract the response content
response_label = chat_completion.choices[0].message.content

# Extract the logprobs for the generated token to calculate confidence
first_token_logprobs = chat_completion.choices[0].logprobs.content[0].top_logprobs

print("--- Classification Results ---")
print(f"Input: {text}")
print(f"Predicted Label: {response_label}\n")

print("--- Label Confidence ---")
for lp in first_token_logprobs:
    # Convert log probability to a percentage
    probability = math.exp(lp.logprob)
    print(f"Token: '{lp.token}' | Probability: {probability:.2%}")
```

#### Expected Output

```text
--- Classification Results ---
Input: The company is expected to record a profit margin of more than 15% next quarter.
Predicted Label: prediction

--- Label Confidence ---
Token: 'prediction' | Probability: 99.98%
Token: '_prediction' | Probability: 0.02%
```

The regex restricts the output to one of **prediction** or **_prediction**, although substring matches may still occur.
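If you serve the model with vLLM instead, the equivalent constraint is `guided_choice` rather than SGLang's `regex` field. The sketch below only builds the request arguments (client setup, `messages`, and `model_id` as in the SGLang example); the commented-out call at the end shows where they would be used.

```python
# vLLM variant: constrain the single output token via guided_choice.
labels = ["prediction", "_prediction"]

request_kwargs = dict(
    temperature=0,
    max_tokens=1,
    logprobs=True,
    top_logprobs=len(labels),
    extra_body={
        "chat_template_kwargs": {"enable_thinking": False},  # Must be set to False.
        "guided_choice": labels,  # vLLM guided decoding over the two labels
    },
)

# chat = client.chat.completions.create(model=model_id, messages=messages, **request_kwargs)
```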
**Local Use:**
Applications such as Ollama, LM Studio, MLX-LM, llama.cpp, and KTransformers also support Qwen3 architectures.

**Legal Notice:** This model is a modification of the Qwen3-0.6B model. In compliance with the Apache 2.0 license, we retain all original copyright notices and provide this modification under the same license terms.

## Training Details

### Training Procedure

The model was fine-tuned using the Hugging Face `Trainer` with `bf16` precision.

#### Preprocessing

- **System Prompt:** Added to every example.
- **Masking:** The user prompt and system instructions were masked (labels set to -100) so the model only computes loss on the assistant's response (the label).
- **Max Length:** 2048 tokens.
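The masking step can be sketched in a few lines (a simplified illustration, not the exact training code): positions belonging to the system and user prompt get the ignore index -100, so cross-entropy loss is computed only on the label tokens.

```python
IGNORE_INDEX = -100  # positions with this value are skipped by the loss

def mask_labels(input_ids: list[int], prompt_len: int) -> list[int]:
    """Copy input_ids, replacing the first prompt_len positions with -100."""
    return [IGNORE_INDEX] * prompt_len + input_ids[prompt_len:]

# e.g. a 6-token sequence whose first 5 tokens are the system + user prompt
print(mask_labels([11, 22, 33, 44, 55, 99], prompt_len=5))
# -> [-100, -100, -100, -100, -100, 99]
```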
#### Training Hyperparameters

| Hyperparameter | Value |
| :--- | :--- |
| Learning Rate | 2e-5 |
| Scheduler | Cosine (warmup ratio 0.03) |
| Batch Size | 64 |
| Epochs | 2 |
| Optimizer | AdamW Torch Fused |
| Precision | bfloat16 |
| NEFTune Noise Alpha | 5 |
| Weight Decay | 0.1 |
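These hyperparameters map onto Hugging Face `TrainingArguments` fields roughly as follows. This is a sketch, not the actual training script: the batch-size layout (single device vs. gradient accumulation) and `output_dir` are assumptions.

```python
# Rough TrainingArguments equivalent of the table above (assumptions noted).
training_kwargs = dict(
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    per_device_train_batch_size=64,  # assumed layout; the card only gives a total of 64
    num_train_epochs=2,
    optim="adamw_torch_fused",
    bf16=True,
    neftune_noise_alpha=5,
    weight_decay=0.1,
)

# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="out", **training_kwargs)
```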
## Limitations

While this model is optimized for efficiency and specific financial tasks, users should be aware of the following limitations:

* **Parameter Size (0.6B):** As a small language model, it lacks the deep reasoning capabilities of larger models (e.g., 7B or 70B parameters). It is designed for fast, specific classification tasks and may struggle with highly nuanced or ambiguous text that requires extensive world knowledge.
* **Context Window:** The model was trained with a maximum sequence length of **2048 tokens**. To analyze long financial documents (such as 10-K filings or earnings-call transcripts), the text must be chunked or truncated before processing.
* **Domain Specificity:** The model is primarily fine-tuned on **financial contexts**. It is not suitable for detecting predictions in other contexts (e.g., social media posts).
* **Language Support:** The model is trained primarily on **English** financial data. Its performance on non-English text or multilingual financial reports is not guaranteed.
* **Factuality:** This model analyzes the *predictive nature* of the text provided; it does not verify the factual accuracy of financial figures, dates, or claims within that text.
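For the context-window limitation above, one simple approach is to split long documents on a token budget and classify each chunk separately. The helper below is a sketch; the 2048-token budget comes from the card, while the overlap size is an assumption.

```python
def chunk_token_ids(token_ids: list[int], max_len: int = 2048,
                    overlap: int = 128) -> list[list[int]]:
    """Split a token-id sequence into windows of at most max_len tokens,
    with a small overlap so predictions spanning a boundary are not lost."""
    if max_len <= overlap:
        raise ValueError("max_len must exceed overlap")
    step = max_len - overlap
    return [token_ids[i:i + max_len]
            for i in range(0, max(len(token_ids) - overlap, 1), step)]

# A 5,000-token document yields three overlapping windows:
print(len(chunk_token_ids(list(range(5000)), max_len=2048, overlap=128)))
# -> 3
```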
We utilized the following papers and datasets during the research and development of this model:

**Papers:**
* **Qwen3:** *Qwen3 Technical Report*. [arXiv:2505.09388](https://arxiv.org/abs/2505.09388)
* **Qwen3Guard:** *Qwen3Guard Technical Report*. [arXiv:2510.14276](https://arxiv.org/abs/2510.14276)

**Datasets:**
* [**NOSIBLE Prediction**](https://huggingface.co/datasets/NOSIBLE/prediction): Nosible Ltd.

## Disclaimer

* **Not Financial Advice:** The outputs of this model **should not be interpreted as financial advice, investment recommendations, or an endorsement** of any financial instrument or asset.
* **Limitations:** This model may not accurately classify predictions in highly complex, nuanced, or evolving financial contexts, including new market trends, highly specialized jargon, or sarcasm. Users are solely responsible for all decisions made based on its output.
* **Risk:** Financial markets are inherently volatile and risky. **Never make investment decisions based solely on the output of an AI model.** Always consult a qualified financial professional.
* **Task-specific:** This model only identifies whether a statement is a prediction; it does not predict outcomes or evaluate their likelihood.

## Team & Credits

This model was developed and maintained by the following team:

* [**Matthew Dicks**](https://www.linkedin.com/in/matthewdicks98/)
* [**Simon van Dyk**](https://www.linkedin.com/in/simon-van-dyk/)
* [**Gareth Warburton**](https://www.linkedin.com/in/garethwarburton/)
* [**Stuart Reid**](https://www.linkedin.com/in/stuartgordonreid/)

## Citation

If you use this model, please cite it as follows:

```bibtex
@misc{nosible2025prediction,
  author       = {NOSIBLE},
  title        = {Prediction v1.1 Base},
  year         = {2025},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Repository},
  howpublished = {https://huggingface.co/NOSIBLE/prediction-v1.1-base}
}
```
28 added_tokens.json Normal file
@@ -0,0 +1,28 @@
{
  "</think>": 151668,
  "</tool_call>": 151658,
  "</tool_response>": 151666,
  "<think>": 151667,
  "<tool_call>": 151657,
  "<tool_response>": 151665,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
89 chat_template.jinja Normal file
@@ -0,0 +1,89 @@
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if message.content is string %}
{%- set content = message.content %}
{%- else %}
{%- set content = '' %}
{%- endif %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index %}
{%- if loop.last or (not loop.last and reasoning_content) %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- if enable_thinking is defined and enable_thinking is false %}
{{- '<think>\n\n</think>\n\n' }}
{%- endif %}
{%- endif %}
60 config.json Normal file
@@ -0,0 +1,60 @@
{
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "dtype": "bfloat16",
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_types": [
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention"
  ],
  "max_position_embeddings": 40960,
  "max_window_layers": 28,
  "model_type": "qwen3",
  "num_attention_heads": 16,
  "num_hidden_layers": 28,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "transformers_version": "4.57.3",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}
13 generation_config.json Normal file
@@ -0,0 +1,13 @@
{
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.6,
  "top_k": 20,
  "top_p": 0.95,
  "transformers_version": "4.57.3"
}
151388 merges.txt Normal file
File diff suppressed because it is too large
3 model.safetensors Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2bd4424d4a0b12293d4a03b792273e2b7eb8844e8c02eb291059c28847117b7f
size 1192135096
3 plots/accuracy.png Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:84dc5d7f8e6ccac06aedb2aa3efba842189372a1325f1a3ab7d77c7e185bdf87
size 153910
3 plots/results.png Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:627758cb36ad47356c0349eee3c7aa4568614cf34461651dc1a683f39bbe79bc
size 156825
31 special_tokens_map.json Normal file
@@ -0,0 +1,31 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
3 tokenizer.json Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
size 11422654
239 tokenizer_config.json Normal file
@@ -0,0 +1,239 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151665": {
      "content": "<tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151666": {
      "content": "</tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151667": {
      "content": "<think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151668": {
      "content": "</think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": {},
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
}
1 vocab.json Normal file
File diff suppressed because one or more lines are too long