---
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
datasets:
- ericholam/codeforces-sft-dataset-beta
- TeichAI/claude-4.5-opus-high-reasoning-250x
language:
- en
tags:
- code
- reasoning
- competitive-programming
- sft
pipeline_tag: text-generation
library_name: transformers
---

# Qwen3-1.7B-Sushi-Coder

A fine-tuned Qwen3-1.7B model optimized for code generation and competitive programming.

## Model Details

- **Base Model**: Qwen/Qwen3-1.7B
- **Fine-tuning Method**: SFT with LoRA (adapters merged into the base weights)
- **Training Steps**: 1000
- **Context Length**: 2048 tokens

## Training

This model was fine-tuned using:

- LoRA (r=8, alpha=16) applied to the attention and MLP layers
- Liger Kernel for memory efficiency
- Sequence packing with FlashAttention-2
- Cosine learning rate schedule (2e-5 peak)
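The cosine schedule in the last bullet can be sketched in a few lines. The 2e-5 peak and 1000-step horizon come from the training details above; any warmup phase is omitted here for simplicity:

```python
import math

PEAK_LR = 2e-5       # peak learning rate (from the training config)
TOTAL_STEPS = 1000   # total training steps (from the training config)

def cosine_lr(step: int) -> float:
    """Cosine-decayed learning rate: starts at PEAK_LR, decays to ~0."""
    progress = min(step, TOTAL_STEPS) / TOTAL_STEPS
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(0))    # 2e-05 (peak at the start)
print(cosine_lr(500))  # ~1e-05 (half the peak at the midpoint)
```

The rate falls smoothly from the peak to near zero at the final step, which is the standard shape produced by cosine schedulers in common training frameworks.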

## Datasets

- `ericholam/codeforces-sft-dataset-beta`
- `TeichAI/claude-4.5-opus-high-reasoning-250x`

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "bigatuna/Qwen3-1.7B-Sushi-Coder",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("bigatuna/Qwen3-1.7B-Sushi-Coder")

messages = [
    {"role": "user", "content": "Write a Python function to solve the two-sum problem."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  # sampling must be enabled for temperature/top_p/top_k to apply
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
# Decode only the newly generated tokens, skipping the echoed prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

## Sampling Parameters

For best results with Qwen3 models:

- **Temperature**: 0.6-0.7
- **Top-p**: 0.95
- **Top-k**: 20
- Do not use greedy decoding (`do_sample=False` / temperature 0), which can cause endless repetitions

## License

Apache 2.0
