Initialize project; model provided by the ModelHub XC community

Model: Heralax/Mistrilitary-7b
Source: Original Platform
ModelHub XC
2026-06-02 00:18:25 +08:00
commit 97409a24f1
12 changed files with 268270 additions and 0 deletions

2
.gitattributes vendored Normal file

@@ -0,0 +1,2 @@
*.gguf filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text

69
README.md Normal file

@@ -0,0 +1,69 @@
---
library_name: transformers
license: apache-2.0
base_model: Heralax/army-pretrain-1
tags:
- generated_from_trainer
model-index:
- name: us-army-finetune-1
results: []
---
Was torn between calling it MiLLM and Mistrillitary. *Sigh* naming is one of the two great problems in computer science...
This is a domain-expert finetune based on the US Army field manuals (the ones that are published and available for civvies like me). It's focused on factual question answering only, but seems to be able to answer slightly deeper questions in a pinch.
## Model Quirks
- I had to focus on the army field manuals because the armed forces publish a truly massive amount of text.
- No generalist assistant data was included, which means this is very very very focused on QA, and may be inflexible.
- Experimental change: data was mostly generated by a smaller model, Mistral NeMo. Quality seems unaffected, costs are much lower. Had problems with the open-ended questions not being in the right format.
- Low temperature recommended. Screenshots use 0.
- ChatML
- No special tokens added.
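A minimal sketch of what a ChatML prompt looks like for this model, mirroring the `chat_template` shipped in `tokenizer_config.json`. The helper name `to_chatml` and the sample question are hypothetical, made up for illustration; in practice the tokenizer's own chat template would normally do this formatting.

```python
def to_chatml(messages, add_generation_prompt=False):
    """Format a list of {"role", "content"} dicts as a ChatML prompt string.

    Hypothetical helper: it follows the same layout as the repo's chat_template,
    i.e. <|im_start|>role\\ncontent<|im_end|>\\n for every message.
    """
    parts = []
    for message in messages:
        parts.append(
            "<|im_start|>" + message["role"] + "\n"
            + message["content"] + "<|im_end|>" + "\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Sample question is an assumption, not taken from the training data.
prompt = to_chatml(
    [{"role": "user", "content": "What is a field manual?"}],
    add_generation_prompt=True,
)
print(prompt)
```

Since the model is QA-focused, a single user turn followed by the open assistant turn is probably the safest shape to send.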
Examples:
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64825ebceb4befee377cf8ac/KakWvjSMwSHkISPGoB0RH.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64825ebceb4befee377cf8ac/7rlJxcjGECqFuEFmYC3aV.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64825ebceb4befee377cf8ac/mzxk9Qa9cveFx7PArnAmB.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64825ebceb4befee377cf8ac/2KtpGhqReVPj4Wh3fles5.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64825ebceb4befee377cf8ac/Pz70D922utg5ZZCqYiGpT.png)
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 5
- gradient_accumulation_steps: 6
- total_train_batch_size: 60
- total_eval_batch_size: 5
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 48
- num_epochs: 6
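The total batch sizes listed above follow from the per-device values. A quick sketch of the arithmetic, assuming the usual convention that the effective train batch is per-device batch × devices × gradient accumulation steps:

```python
# Values copied from the hyperparameter list above.
train_batch_size = 2             # per device
eval_batch_size = 1              # per device
num_devices = 5
gradient_accumulation_steps = 6

# Effective batch per optimizer step (assumed convention, see lead-in).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```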
### Training results
It answers questions alright.
### Framework versions
- Transformers 4.45.0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.20.0

3
added_tokens.json Normal file

@@ -0,0 +1,3 @@
{
"<|end_of_text|>": 32000
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f7deb6d836e1bcad4d8582b9c7acd14c0ed40fe553aa370f81a12e7efffa4974
size 14484749216

27
config.json Normal file

@@ -0,0 +1,27 @@
{
"_name_or_path": "Heralax/army-pretrain-1",
"architectures": [
"MistralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.45.0",
"use_cache": false,
"vocab_size": 32001
}
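
As a sanity check on the config above, here is a rough parameter count derived from its fields. This is a sketch that assumes the standard Mistral layer layout (bias-free q/k/v/o projections, a gated MLP with three matrices, two RMSNorm weights per layer, one final norm, and an untied output head); it is not read from the checkpoint itself.

```python
# Fields copied from config.json.
hidden_size = 4096
intermediate_size = 14336
num_hidden_layers = 32
num_key_value_heads = 8
head_dim = 128
vocab_size = 32001

kv_dim = num_key_value_heads * head_dim            # grouped-query K/V width

attention = 2 * hidden_size * hidden_size          # q_proj and o_proj
attention += 2 * hidden_size * kv_dim              # k_proj and v_proj
mlp = 3 * hidden_size * intermediate_size          # gate, up and down projections
norms = 2 * hidden_size                            # two RMSNorm weights per layer
per_layer = attention + mlp + norms

embeddings = vocab_size * hidden_size              # input embedding table
lm_head = vocab_size * hidden_size                 # separate: tie_word_embeddings is false
final_norm = hidden_size

total_params = num_hidden_layers * per_layer + embeddings + lm_head + final_norm
bf16_bytes = total_params * 2                      # bfloat16 stores 2 bytes per weight

print(f"{total_params:,} parameters, about {bf16_bytes:,} bytes in bfloat16")
```

Under these assumptions the byte estimate lands within a few tens of kilobytes of the `pytorch_model.bin` size recorded in this commit, which suggests the checkpoint is the full bfloat16 model.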

7
generation_config.json Normal file

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"do_sample": true,
"eos_token_id": 2,
"transformers_version": "4.45.0"
}

3
ggml-model-Q8_0.gguf Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a55e3780edf63b1a5c8a922b67336f7bbd417c02f3c7ec998677bc080cf88832
size 7695867296

3
pytorch_model.bin Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b829aa20aade9424fa731063076ac1d0261433af53b3c9df52aad3d233556d67
size 14483521198

30
special_tokens_map.json Normal file
View File

@@ -0,0 +1,30 @@
{
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

268072
tokenizer.json Normal file

File diff suppressed because it is too large.

BIN
tokenizer.model Normal file

Binary file not shown.

51
tokenizer_config.json Normal file

@@ -0,0 +1,51 @@
{
"add_bos_token": true,
"add_eos_token": false,
"add_prefix_space": true,
"added_tokens_decoder": {
"0": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"32000": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"bos_token": "<s>",
"chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
"clean_up_tokenization_spaces": false,
"eos_token": "</s>",
"legacy": true,
"model_max_length": 1000000000000000019884624838656,
"pad_token": "<|end_of_text|>",
"sp_model_kwargs": {},
"spaces_between_special_tokens": false,
"tokenizer_class": "LlamaTokenizer",
"unk_token": "<unk>",
"use_default_system_prompt": false
}