Qwen1.5-0.5B-Chat_mistral/README.md at main


ModelHub XC d3f51d3d94 Initialize project; model provided by the ModelHub XC community

Model: Minami-su/Qwen1.5-0.5B-Chat_mistral
Source: Original Platform

2026-06-01 01:21:38 +08:00



special

1. Before using this model, you need to modify modeling_mistral.py in the transformers library.

2. Open the file in an editor, for example: vim /root/anaconda3/envs/train/lib/python3.9/site-packages/transformers/models/mistral/modeling_mistral.py

3. Find the MistralAttention class.

4. In the q, k, v, and o projections, change bias=False to bias=config.attention_bias.

Before: bias=False. After: bias=config.attention_bias.
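The manual edit in the steps above can also be scripted. The following is a minimal sketch, not the author's tooling: `patch_attention_bias` is a hypothetical helper, and it assumes the projection layers appear in the file as single lines of the form `self.q_proj = nn.Linear(..., bias=False)`. The sample text is illustrative only; the real file may be formatted differently depending on the installed transformers version.

```python
import re

# Projection layers named in step 4 (q, k, v, o).
PROJECTIONS = ("q_proj", "k_proj", "v_proj", "o_proj")


def patch_attention_bias(source: str) -> str:
    """Replace bias=False with bias=config.attention_bias on q/k/v/o projection lines only."""
    patched = []
    for line in source.splitlines(keepends=True):
        # Assumption: each projection is defined on one line as `self.<name> = nn.Linear(...)`.
        if any(f"self.{name} = nn.Linear(" in line for name in PROJECTIONS):
            line = re.sub(r"bias\s*=\s*False", "bias=config.attention_bias", line)
        patched.append(line)
    return "".join(patched)


# Illustrative input only, not copied from any specific transformers release.
sample = (
    "self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=False)\n"
    "self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)\n"
    "self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)\n"
)
patched_sample = patch_attention_bias(sample)
print(patched_sample)
```

To apply it to a real installation, read the file contents, pass them through the function, and write the result back after making a backup. The path in step 2 is specific to the author's environment; the `__file__` attribute of the imported `transformers.models.mistral.modeling_mistral` module gives the location on your machine.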

Differences between qwen2 mistral and qwen2 llamafy

Compared to qwen2 llamafy, qwen2 mistral can use sliding window attention. It is also faster than qwen2 llamafy and handles context length better.
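For readers unfamiliar with sliding window attention, the sketch below shows the general idea as a boolean mask: each token attends only to itself and a fixed number of preceding tokens. This is a generic illustration, not the transformers implementation. The function name is hypothetical, and the convention that the window counts the current token is an assumption.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    return [
        # Causal (j <= i) and within the last `window` positions, current token included.
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


mask = sliding_window_causal_mask(seq_len=5, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# x....
# xx...
# xxx..
# .xxx.
# ..xxx
```

With plain causal attention the lower triangle would be fully filled, so the per-token cost grows with sequence length; with a window it is capped at the window size.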

Usage:


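from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the tokenizer and model; device_map="auto" places the weights on the available device(s).
tokenizer = AutoTokenizer.from_pretrained("Minami-su/Qwen1.5-0.5B-Chat_mistral")
model = AutoModelForCausalLM.from_pretrained("Minami-su/Qwen1.5-0.5B-Chat_mistral", torch_dtype="auto", device_map="auto")
# Stream generated text to stdout, hiding the prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

messages = [
    {"role": "user", "content": "Who are you?"}
]
# Apply the chat template and tokenize the conversation into input IDs.
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
# Move the inputs to the device the model was loaded on (instead of hard-coding "cuda").
inputs = inputs.to(model.device)
generate_ids = model.generate(inputs, max_length=2048, streamer=streamer)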

Test

Loaded in 4-bit:

hf-causal (pretrained=Qwen1.5-0.5B-Chat), limit: None, provide_description: False, num_fewshot: 0, batch_size: 32
|    Task     |Version| Metric |Value |   |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge|      0|acc     |0.2389|±  |0.0125|
|             |       |acc_norm|0.2688|±  |0.0130|
|truthfulqa_mc|      1|mc1     |0.2534|±  |0.0152|
|             |       |mc2     |0.4322|±  |0.0151|
|winogrande   |      0|acc     |0.5564|±  |0.0140|

Loaded in 4-bit:

hf-causal (pretrained=Qwen1.5-0.5B-Chat_mistral), limit: None, provide_description: False, num_fewshot: 0, batch_size: 32
|    Task     |Version| Metric |Value |   |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge|      0|acc     |0.2398|±  |0.0125|
|             |       |acc_norm|0.2705|±  |0.0130|
|truthfulqa_mc|      1|mc1     |0.2534|±  |0.0152|
|             |       |mc2     |0.4322|±  |0.0151|
|winogrande   |      0|acc     |0.5549|±  |0.0140|
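One way to read the two tables side by side is to compute the per-metric difference and compare it with the reported standard error. The sketch below does this with the values copied from the tables above; the dictionary layout and variable names are illustrative choices, not part of the evaluation harness.

```python
# (task, metric): (Qwen1.5-0.5B-Chat, Qwen1.5-0.5B-Chat_mistral, stderr), copied from the tables.
results = {
    ("arc_challenge", "acc"): (0.2389, 0.2398, 0.0125),
    ("arc_challenge", "acc_norm"): (0.2688, 0.2705, 0.0130),
    ("truthfulqa_mc", "mc1"): (0.2534, 0.2534, 0.0152),
    ("truthfulqa_mc", "mc2"): (0.4322, 0.4322, 0.0151),
    ("winogrande", "acc"): (0.5564, 0.5549, 0.0140),
}

# Difference (mistral minus original), rounded to the precision of the tables.
diffs = {key: round(new - old, 4) for key, (old, new, _) in results.items()}
# True when the absolute difference is no larger than one reported standard error.
within_stderr = {key: abs(diffs[key]) <= results[key][2] for key in results}

for (task, metric), diff in diffs.items():
    print(f"{task:14s} {metric:9s} diff={diff:+.4f} within_stderr={within_stderr[(task, metric)]}")
```

On these numbers every difference is smaller than one standard error, so the two runs score essentially the same on these tasks.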
