Initialize project; model provided by the ModelHub XC community
Model: WithinUsAI/Llama-3.2-OctoThinker-iNano-1B Source: Original Platform
.gitattributes (vendored, new file, 36 lines)
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
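Each pattern above is a glob that routes matching files through the Git LFS filter instead of storing them directly in git history. As an illustrative sketch of which file names a pattern set captures (Python's `fnmatch` approximates, but does not exactly replicate, gitattributes glob semantics; directory patterns like `saved_model/**/*` need real gitattributes matching):

```python
from fnmatch import fnmatch

# A few of the simple patterns from the .gitattributes above.
lfs_patterns = ["*.safetensors", "*.pt", "*tfevents*", "tokenizer.json"]

def goes_through_lfs(filename: str) -> bool:
    """Approximate check: does any of the LFS glob patterns match this name?"""
    return any(fnmatch(filename, pattern) for pattern in lfs_patterns)

goes_through_lfs("model.safetensors")  # returns True
goes_through_lfs("config.json")        # returns False
```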
README.md (new file, 120 lines)
@@ -0,0 +1,120 @@
---
license: other
language:
- en
pipeline_tag: text-generation
tags:
- llama
- llama-3.2
- merge
- slerp
- reasoning
- instruct
- chat
- coding
- 1b
- gss1147
base_model:
- NeuraLakeAi/iSA-02-Nano-Llama-3.2-1B
- OctoThinker/OctoThinker-1B-Hybrid-Base
- meta-llama/Llama-3.2-1B-Instruct
library_name: transformers
---

# Llama-3.2-OctoThinker-iNano-1B

**Llama-3.2-OctoThinker-iNano-1B** is a compact 1B-parameter merged language model built from three Llama 3.2-based components:

- `NeuraLakeAi/iSA-02-Nano-Llama-3.2-1B`
- `OctoThinker/OctoThinker-1B-Hybrid-Base`
- `meta-llama/Llama-3.2-1B-Instruct`

This model was merged using the **SLERP** merge method, combining instruction-following behavior, reasoning-oriented characteristics, and hybrid assistant-style generation into a single lightweight checkpoint.

## Model Summary

This checkpoint is designed to provide a balanced small-model experience across:

- instruction following
- reasoning-style responses
- conversational usability
- lightweight local inference
- general-purpose text generation
- basic coding assistance

The goal of this merge is to blend the structured usability of an instruct model with the reasoning and hybrid-response traits of the other source checkpoints.

## Merge Details

### Source Models

- **NeuraLakeAi/iSA-02-Nano-Llama-3.2-1B**
- **OctoThinker/OctoThinker-1B-Hybrid-Base**
- **meta-llama/Llama-3.2-1B-Instruct**

### Merge Method

This model was created using **SLERP**.

**SLERP** (Spherical Linear Interpolation) is a model merging method that interpolates between checkpoints along the arc between their weight vectors, which preserves directional relationships in weight space better than simple linear averaging. It is often used to combine useful traits from multiple models while reducing some of the rough edges of naive blends.
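The idea can be sketched for a single pair of flattened weight tensors (a minimal illustration on toy vectors, not mergekit's actual per-tensor implementation, which adds per-layer schedules and degenerate-case handling):

```python
import numpy as np

def slerp(t: float, a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two flattened weight vectors."""
    a_n = a / (np.linalg.norm(a) + eps)
    b_n = b / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_n, b_n), -1.0, 1.0)
    theta = np.arccos(dot)          # angle between the two weight directions
    if theta < eps:                 # nearly parallel: fall back to plain lerp
        return (1 - t) * a + t * b
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * a + (np.sin(t * theta) / s) * b

# t=0.55 matches the interpolation factor in this repo's mergekit_config.yml.
merged = slerp(0.55, np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

Unlike linear averaging, interpolating along the arc keeps the result's magnitude close to the endpoints: merging two unit vectors yields another unit vector.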
## Intended Use

This model is intended for:

- assistant-style chat
- general text generation
- lightweight reasoning tasks
- brainstorming
- summarization
- simple coding help
- prompt experimentation
- local low-resource inference

## Out-of-Scope Use

This model is not intended for:

- medical advice
- legal advice
- financial decision-making
- autonomous high-risk use
- safety-critical production systems without extensive evaluation

## Strength Profile

This merged checkpoint is aimed at users who want a compact model that blends:

- instruction tuning
- reasoning flavor
- hybrid assistant behavior
- fast inference in constrained environments

Because it is a 1B model, it is best suited for short-to-medium tasks rather than very deep long-context reasoning.

## Limitations

Like other small language models, this model may:

- hallucinate facts
- produce inconsistent reasoning
- struggle with harder multi-step coding tasks
- lose reliability on long prompts
- behave differently depending on prompt formatting
- reflect bias inherited from source models and training data

It should be tested carefully before use in any workflow requiring strong factual reliability or safety guarantees.

## Prompting Tips

For best results:

- use clear, direct prompts
- request concise or step-by-step answers explicitly
- keep tasks focused
- avoid overly ambiguous instructions

### Example Prompt

```text
You are a compact reasoning assistant. Answer clearly and step by step.

Explain recursion in Python and provide a simple example.
```
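A prompt like the example above is typically wrapped in the Llama 3 chat format before generation. A minimal sketch, assuming the standard Llama 3 header and turn tokens (in practice, prefer the tokenizer's own `apply_chat_template` over hand-built strings, since this merge's tokenizer config may differ in details such as the end-of-turn token):

```python
def format_llama3_chat(system: str, user: str) -> str:
    """Build a Llama 3-style chat prompt by hand (illustrative only)."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = format_llama3_chat(
    "You are a compact reasoning assistant. Answer clearly and step by step.",
    "Explain recursion in Python and provide a simple example.",
)
```

The trailing assistant header leaves the model positioned to generate its reply.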
config.json (new file, 38 lines)
@@ -0,0 +1,38 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "dtype": "float16",
  "eos_token_id": 128001,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "is_llama_config": true,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 16,
  "num_key_value_heads": 8,
  "pad_token_id": null,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_interleaved": false,
  "rope_parameters": {
    "factor": 32.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_theta": 500000.0,
    "rope_type": "llama3"
  },
  "tie_word_embeddings": true,
  "transformers_version": "5.3.0",
  "use_cache": true,
  "vocab_size": 128256
}
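The shape hyperparameters in this config determine the parameter count. A back-of-envelope sketch, assuming the standard Llama layout (no biases, grouped-query attention, tied input/output embeddings as `tie_word_embeddings: true` indicates; the checkpoint on disk may additionally store untied or auxiliary tensors, so file size need not equal 2 bytes per estimated parameter):

```python
# Estimate parameter count from the config values above (Llama-style, no biases).
vocab_size, hidden, inter = 128256, 2048, 8192
layers, heads, kv_heads, head_dim = 16, 32, 8, 64

embed = vocab_size * hidden                 # token embeddings (LM head is tied)
attn = hidden * (heads * head_dim)          # q_proj
attn += 2 * hidden * (kv_heads * head_dim)  # k_proj + v_proj (grouped-query attention)
attn += (heads * head_dim) * hidden         # o_proj
mlp = 3 * hidden * inter                    # gate_proj, up_proj, down_proj
norms = 2 * hidden                          # two RMSNorm weights per layer

total = embed + layers * (attn + mlp + norms) + hidden  # + final norm
print(f"{total / 1e9:.2f}B parameters")  # prints "1.24B parameters"
```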
mergekit_config.yml (new file, 9 lines)
@@ -0,0 +1,9 @@
merge_method: slerp
base_model: X:\Amalgamation AI Universal GUI\_prepared_models\OctoThinker-OctoThinker-1B-Hybrid-Base_5b07aeb080
models:
  - model: C:\Users\GSS1147\Desktop\Safetensors (Models)\meta-llamaLlama--3.2-1B (MODELS)\NeuraLakeAi-iSA-02-Nano-Llama-3.2-1B
    parameters:
      weight: 1.0
dtype: float16
parameters:
  t: 0.55
model.safetensors (new file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f05221450b8b64b645d90e809f0c426401a540e54e108498539b3030385a9c9c
size 2996982200
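The three lines above are a Git LFS pointer file (spec v1): the repository stores this small stub while the roughly 3.0 GB of weights live in LFS storage, addressed by the SHA-256 oid. A minimal parser sketch for the key-value format:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a git-lfs pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:f05221450b8b64b645d90e809f0c426401a540e54e108498539b3030385a9c9c
size 2996982200"""

info = parse_lfs_pointer(pointer)
# info["oid"] is the SHA-256 of the real file; info["size"] is its byte count.
```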
special_tokens_map.json (new file, 16 lines)
@@ -0,0 +1,16 @@
{
  "bos_token": {
    "content": "<|begin_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json (new file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6b9e4e7fb171f92fd137b777cc2714bf87d11576700a1dcd7a399e7bbe39537b
size 17209920
tokenizer_config.json (new file, 2062 lines)
File diff suppressed because it is too large