Model: BAAI_Industry_Competition_tourism_dev/scoreing_model Source: Original Platform
Training and Inference Code
Training Code
Training was done as LoRA SFT fine-tuning with the LLaMA-Factory framework. The training command is:
llamafactory-cli train examples/train_lora/qwen2.5_lora_sft.yaml
The training hyperparameters are as follows:
### model
model_name_or_path: Qwen/Qwen2.5-7B-Instruct
### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
### dataset
dataset: pingfen
template: qwen
cutoff_len: 8000
max_samples: 100000000
overwrite_cache: true
preprocessing_num_workers: 4
### output
output_dir: saves/Qwen2.5-7B-Instruct/scoreing_model
logging_steps: 10
save_strategy: epoch
plot_loss: true
overwrite_output_dir: false
### train
per_device_train_batch_size: 8
gradient_accumulation_steps: 1
learning_rate: 1.0e-6
num_train_epochs: 5
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
### eval
val_size: 0.05
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 10000
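To see how the batch, epoch, and evaluation settings above interact, the following is a rough sketch of the step arithmetic. The dataset size (10,000 samples) and the single-GPU setup are assumptions made only for illustration; the actual size of the `pingfen` dataset and the hardware used are not stated here, and `training_schedule` is a hypothetical helper, not part of LLaMA-Factory.

```python
import math


def training_schedule(num_samples, num_gpus=1, val_size=0.05,
                      per_device_batch=8, grad_accum=1, epochs=5,
                      warmup_ratio=0.1):
    """Approximate optimizer-step counts for the hyperparameters above."""
    # val_size is the fraction of samples held out for evaluation.
    n_val = round(num_samples * val_size)
    n_train = num_samples - n_val
    # Samples consumed by one optimizer step across all devices.
    samples_per_step = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = math.ceil(n_train / samples_per_step)
    total_steps = steps_per_epoch * epochs
    return {
        "train_samples": n_train,
        "val_samples": n_val,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        # Approximate length of the linear warmup before cosine decay.
        "warmup_steps": round(total_steps * warmup_ratio),
    }


# Hypothetical dataset size and GPU count, for illustration only.
schedule = training_schedule(num_samples=10_000, num_gpus=1)
print(schedule)
```

Under these assumed numbers the run has fewer total steps than `eval_steps: 10000`, so step-based evaluation would never trigger. Whether that applies to the real run depends on the actual dataset size and number of GPUs.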
Inference Code
import json

from openai import OpenAI

# Answer-generation model endpoint (OpenAI-compatible server)
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8003/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# Scoring model endpoint configuration
scoring_api_key = "EMPTY"
scoring_api_base = "http://localhost:8004/v1"
scoring_client = OpenAI(
    api_key=scoring_api_key,
    base_url=scoring_api_base,
)


def score_response(response):
    """
    Score an answer with the scoring model.

    `response` is a dict with "question" and "answer" keys. Returns the score
    as a float, or 0.0 if the model output cannot be parsed as a number.
    """
    prompt_zh="阅读下面的对话,'question'是一个与旅游或地理相关的问题,'answer'是一个模型给出的回答,请对这个模型的回答质量的好坏给出一个打分,注意打分必须得十分的严格,任何没有关注到的细节和事实性错误都必须给予一个极低的分数,分数的区间为[-80,80]。"
    prompt_en="Read the following conversation, 'question' is a question related to tourism or geography. 'answer' is the answer given by a model. Please give a score for the quality of the model's answer. Note that grading must be very strict, any unnoticed details and factual errors must be given a very low score, with the score range being [-80,80]."
    # Pick the prompt language from the first character of the question
    # (CJK Unified Ideographs range); an empty question falls back to English.
    is_chinese = '\u4e00' <= response["question"][:1] <= '\u9fff'
    messages = [
        {"role": "system", "content": prompt_zh if is_chinese else prompt_en},
        # The whole dict is sent as its string representation.
        {"role": "user", "content": f'{response}'}
    ]
    completion = scoring_client.chat.completions.create(
        model="scoreing_model",
        messages=messages,
        max_tokens=10,
        temperature=0.0,
        timeout=150
    )
    try:
        score = float(completion.choices[0].message.content.strip())
    except (ValueError, AttributeError):
        # Non-numeric or missing output: fall back to a neutral score.
        score = 0.0
    return score


def process_jsonl(input_file, output_file, model_path):
    """
    For each query in `input_file` (JSONL), sample 16 answers from the
    generation model, score them, and write the best one to `output_file`.
    """
    prompt_en='''You are a seasoned expert in tourism and geography, known for providing detailed, accurate, and insightful responses. Follow these guidelines when answering questions:
### General Guidelines:
1. Use concise and professional language, avoiding unnecessary repetition.
2. Ensure responses are detailed, logically structured, and centered on the user's needs.
3. Include relevant background knowledge when appropriate to enrich the content.
### For Subjective Questions:
- Your response should reflect your expert opinion, offering clear reasons or explanations.
- Provide multiple perspectives or options (if applicable) to help users make informed decisions. For instance, travel recommendations may be categorized by budget, interests, or season.
### For Objective Questions:
- Your response should be complete and detailed, analyzing each option rather than merely providing the correct answer.
- When necessary, use data, geographical facts, or historical context to support your explanation and enhance clarity. '''
    prompt_zh='''你是一位资深的旅游与地理专家,擅长提供详细、准确且富有见解的回答。请根据以下规则回答用户的问题:
### 通用规则:
1. 使用简洁且专业的语言,避免冗长和重复。
2. 确保回答内容详尽、逻辑清晰,并以用户需求为核心展开。
3. 在适当情况下,加入相关背景知识,丰富内容。
### 针对主观题:
- 回答需要体现你的专业见解,并提供清晰的理由或说明。
- 如果适用,提供多种角度或选项,帮助用户进行决策。例如,旅游推荐可以根据预算、兴趣、季节等进行分类。
### 针对客观题:
- 回答需完整且详细,对每个选项进行分析,而不仅仅直接给出答案。
- 在必要时,引用数据、地理知识或历史背景解释原因,以增强说服力和可信度。'''
    with open(input_file, 'r', encoding='utf-8') as infile, open(output_file, 'w', encoding='utf-8') as outfile:
        for line in infile:
            if not line.strip():
                continue  # skip blank lines
            data = json.loads(line)
            question = data.get("query")
            query_type = data.get("query_type")
            # Choose the system prompt that matches the question's language
            is_chinese = '\u4e00' <= question[:1] <= '\u9fff'
            messages = [
                {"role": "system", "content": prompt_zh if is_chinese else prompt_en},
                {"role": "user", "content": question}
            ]
            # Generate 16 candidate answers
            completion = client.chat.completions.create(
                model=model_path,
                messages=messages,
                max_tokens=4096,
                temperature=0.8,
                timeout=150,
                n=16
            )
            responses = [choice.message.content for choice in completion.choices]
            # Score every candidate answer
            scores = [score_response({"question": question, "answer": response}) for response in responses]
            max_score = max(scores)
            # Keep the answers with the highest score
            best_responses = [responses[i] for i in range(len(scores)) if scores[i] == max_score]
            # Break ties by choosing the longest answer
            best_response = max(best_responses, key=len)
            # Write the result to the output file
            example = {
                "query": question,
                "query_type": query_type,
                "answer": best_response
            }
            outfile.write(json.dumps(example, ensure_ascii=False) + '\n')

input_path = 'eval_only_query.jsonl'
output_path = '数据越洗越脏_TouInd_11302224.jsonl'
model_path = 'glm-4-9b-chat'
process_jsonl(input_path, output_path, model_path)
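
The script above detects the language from the first character only, parses the scorer's reply with a plain `float(...)`, and picks the best of 16 answers by score with a length tie-break. The sketch below isolates those three steps as small functions that can be tested offline. It is one possible hardening, not the code used for the submission: `contains_cjk`, `parse_score`, and `pick_best` are hypothetical helper names, and scanning the whole string and clamping to [-80, 80] are assumptions that go slightly beyond what the original script does.

```python
import re

SCORE_MIN, SCORE_MAX = -80.0, 80.0  # score range stated in the scoring prompt


def contains_cjk(text):
    """True if any character is in the CJK Unified Ideographs block."""
    return any('\u4e00' <= ch <= '\u9fff' for ch in text)


def parse_score(raw, default=0.0):
    """Extract the first number from the scorer's reply and clamp it."""
    match = re.search(r'-?\d+(?:\.\d+)?', raw or '')
    if match is None:
        return default  # nothing numeric in the reply
    return max(SCORE_MIN, min(SCORE_MAX, float(match.group())))


def pick_best(responses, scores):
    """Highest score wins; ties go to the longest response."""
    best_score = max(scores)
    tied = [r for r, s in zip(responses, scores) if s == best_score]
    return max(tied, key=len)


# Offline usage with made-up candidates and scorer replies.
candidates = ["short answer", "a much longer answer", "weak"]
raw_scores = ["60", "Score: 60", "-95"]
parsed = [parse_score(r) for r in raw_scores]
best = pick_best(candidates, parsed)
print(parsed, best)
```

Checking the whole string rather than `question[0]` means a Chinese question that begins with a digit or Latin letter would still get the Chinese prompt. Whether that matches the data the scoring model was trained on is not something this document states.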
Description