frameworks, license, base_model, language, pipeline_tag, tags
frameworks license base_model language pipeline_tag tags
Pytorch
Apache License 2.0
Qwen/Qwen2.5-14B-Instruct
en
zh
text-generation
GGUF
Chat
Instruct

CompassJudger-1-14B-Instruct-GGUF-V3-LOT 高精量化

🤗 Hugging Face   |   🤖 ModelScope   |    📑 Paper    |    🎖️ Leaderboard   

原模型 opencompass/CompassJudger-1-14B-Instruct

CompassJudger-1 同系列量化模型

Model GPTQ Compression & FP8 GGUF
7B params Int4-W4A16
Int8-W8A16
FP8-A16
7B-GGUF-V3-LOT
14B params Int4-W4A16
Int8-W8A16
FP8-A16
14B-GGUF-V3-LOT
32B params Int4-W4A16
Int8-W8A16
FP8-A16
32B-GGUF-V3-LOT

Quickstart

建议参考官方的llama.cpp文档

手动下载

由于克隆整个存储库可能效率低下,因此你可以手动下载所需的 GGUF 文件或使用 modelscope

  1. 安装

    pip install -U modelscope
    
  2. 下载

    modelscope download --model=okwinds/CompassJudger-1-14B-Instruct-GGUF-V3-LOT --include "CompassJudger-1-14B-It-Q8_0-LOT.gguf" --local_dir .
    

使用 Ollama

如果机器上安装了 Ollama,还可以通过 Ollama 拉取仓库中的模型。如下示例:

  • 使用仓库的默认地址会拉取 Q4_K_M 精度的 GGUF 模型
# 仅拉取模型
ollama pull modelscope.cn/okwinds/CompassJudger-1-14B-Instruct-GGUF-V3-LOT

# 拉取模型以后立即运行、推理模型
ollama run modelscope.cn/okwinds/CompassJudger-1-14B-Instruct-GGUF-V3-LOT
  • 拉取指定精度的模型
# 拉取 Q8_0 精度的 GGUF 模型
ollama pull modelscope.cn/okwinds/CompassJudger-1-14B-Instruct-GGUF-V3-LOT:Q8_0

模型简介

CompassJudger-1-14B-Instruct-GGUF-V3-LOT 是一个基于 opencompass/CompassJudger-1-14B-Instruct 的 GGUF (V3) 格式量化模型。

本仓库的GGUF格式概要

  1. 模型权重文件不分片。下载单文件即可用。

  2. 模型在量化过程中保留了 output tensor 的激活值,以提升量化后模型的精度表现。本仓库的同量化等级的权重文件,会略大一点,当然,理论上精度也会高一点。

本仓库包含如下量化等级q2_K、q3_K_M、q4_K_M、q5_K_M、q6_K、q8_0。

原模型简介

CompassJudger-1 系列是 Opencompass 推出的一款多功能 Judge 模型。这些模型不仅通过评分和比较在各种评估方法中表现出色,而且可以以指定格式输出带有评估详细信息的评论,使其适用于任何评估数据集。此外,它们可以执行类似于典型指令模型的一般任务,因此可以作为具有强大泛化和判断能力的多功能工具。

  • 全面的评估能力CompassJudger-1 能够执行多种评估方法,包括但不限于评分、比较和提供详细的评估反馈。

  • 格式化输出:支持根据说明以特定格式输出,便于进一步分析和理解评估结果。

  • 多功能性除了评估功能外CompassJudger-1 还可以作为通用指令模型来完成日常任务。

CompassJudger-1 系列相关资料

应用场景示例

一般聊天

**Input**: Hello, can you help me to judge something?
**Output**: Of course! I'd be happy to help you make a judgment or provide any assistance you need. Please tell me what you're looking to evaluate or understand.

作为 Reward Model

**Input**: ```Please read the dialogue between the two assistants and the user to determine which assistant performed better during the conversation.Here is the dialogue content:
[Dialogue Begin]
User: What is a 5-letter word that starts with the letter \"A\" and contains the letters \"D\", \"R\", and \"O\" where \"D\" is not the second letter?
Assistant A: Aardvark.
Assistant B: The word that meets the given criteria is \"adroit\".
User: \"D\" shouldn't be the second letter and the word must be a 5-letter word.
Assistant A: Aardvark.
Assistant B: I apologize for the confusion. A 5-letter word that starts with the letter \"A\" and contains the letters \"D\", \"R\", and \"O\" where \"D\" is not the second letter is \"ardor\".
[Dialogue End]
If you believe Assistant A performed better, please output A directly.\nIf you believe Assistant B performed better, please output B directly.\nDo not output any other content, just the option. Please output:```
**Output**: B

逐项评判

**Input**: ```你是一个擅长评价文本质量的助手。\n请你以公正的评判者的身份评估一个AI助手对于用户提问的回答的质量。由于您评估的回答类型是角色扮演因此你需要从下面的几个维度对回答进行评估:\n1. 事实正确性: 回答中提供的信息是否准确无误,是否基于可信的事实和数据。\n2. 满足用户需求: 回答是否满足了用户提出问题的目的和需求,是否对问题进行了全面而恰当的回应。\n3. 逻辑连贯性: 回答是否在整体上保持一致,是否在不同部分之间保持逻辑连贯性,避免了自相矛盾。\n4. 创造性: 回答是否具有创新性或独特性,是否提供了新颖的见解或解决方法。\n5. 丰富度: 回答包含丰富的信息、深度、上下文考虑、多样性、详细解释和实例,以满足用户需求并提供全面理解。\n我们会给您提供用户的提问高质量的参考答案和需要你评估的AI助手的答案。当你开始你的评估时你需要按照遵守以下的流程\n1. 将AI助手的答案与参考答案进行比较指出AI助手的答案有哪些不足并进一步解释。\n2. 从不同维度对AI助手的答案进行评价在每个维度的评价之后给每一个维度一个110的分数。\n3. 最后综合每个维度的评估对AI助手的回答给出一个110的综合分数。\n4. 你的打分需要尽可能严格并且要遵守下面的评分规则总的来说模型回答的质量越高则分数越高。其中事实正确性和满足用户需求这两个维度是最重要的这两个维度的分数主导了最后的综合分数。当模型回答存在与问题不相关或者有本质性的事实错误或生成了有害内容时总分必须是1到2分当模型回答没有严重错误而且基本无害但是质量较低没有满足用户需求总分为3到4分当模型回答基本满足用户要求但是在部分维度上表现较差质量中等总分可以得5到6分当模型回答质量与参考答案相近在所有维度上表现良好总分得7到8分只有当模型回答质量显著超过参考答案充分地解决了用户问题和所有需求并且在所有维度上都接近满分的情况下才能得9到10分。作为示例参考答案可以得到8分。\n请记住你必须在你打分前进行评价和解释。在你对每个维度的解释之后需要加上对该维度的打分。之后在你回答的末尾按照以下字典格式包括括号返回你所有的打分结果并确保你的打分结果是整数\n{'维度一': 打分, '维度二': 打分, ..., '综合得分': 打分},例如:{'事实正确性': 9, '满足用户需求': 6, ..., '综合得分': 7}。\n用户的提问 我想让你假扮成和一个20岁的女生聊天的聊天框。我输入和20岁女生的聊天内容你仅仅只输出一个20岁女生可能会回复的内容不要输出任何多余的文本。你平时有什么爱好呀\n[参考答案开始]\n我喜欢阅读尤其是小说和诗词还有就是喜欢听音乐和画画。\n[参考答案结束]\n[助手的答案开始]\n喜欢听音乐和看电影。你呢\n[助手的答案结束]```

**Output**: ```将AI助手的答案与参考答案进行比较指出AI助手的答案有哪些不足并进一步解释。AI助手的答案提供了两个爱好听音乐和看电影并且以一种反问句结束询问对方的爱好。与参考答案相比AI助手的回答在内容上较为简单没有提供足够的信息来展示其丰富度和创造性。同时参考答案中提到的“阅读”和“画画”显示了更多的多样性和个性化的爱好而不仅仅是听音乐和看电影。从不同维度对AI助手的答案进行评价
    事实正确性在这个情境中事实正确性并不是一个主要的评估标准因为回答的是假设的场景。然而AI助手的回答没有涉及任何不准确的信息因此在这个维度上是合适的。打分8。
    满足用户需求用户希望得到一个20岁女生可能会回复的内容。AI助手的回答提供了一些爱好但是回答的内容较为简单没有充分满足用户可能期待的丰富性和多样性。打分5。
    逻辑连贯性AI助手的回答在逻辑上是连贯的提供了一个合理的爱好并以一种自然的方式结束对话。打分8。
    创造性AI助手的回答缺乏创造性提供的爱好比较常见没有展示出新颖或独特的见解。打分4。
    丰富度AI助手的回答较为简短缺乏深度和细节未能提供多样化的爱好或详细的解释因此在丰富度上表现较差。打分4。
    综合每个维度的评估对AI助手的回答给出一个110的综合分数。
综合来看AI助手的回答基本满足了用户的需求但缺乏丰富度和创造性与参考答案相比还有一定差距。综合得分5。
{'事实正确性': 8, '满足用户需求': 5, '逻辑连贯性': 8, '创造性': 4, '丰富度': 4, '综合得分': 5}```

Pair-wise 评判

**Input**: ```Please act as an impartial judge and evaluate the quality of the responses provided by two AI assistants to the user prompt displayed below. You will be given assistant A's answer and assistant B's answer. Your job is to evaluate which assistant's answer is better.\n\nBegin your evaluation by generating your own answer to the prompt. You must provide your answers before judging any answers.\n\nWhen evaluating the assistants' answers, compare both assistants' answers with your answer. You must identify and correct any mistakes or inaccurate information.\n\nThen consider if the assistant's answers are helpful, relevant, and concise. Helpful means the answer correctly responds to the prompt or follows the instructions. Note when user prompt has any ambiguity or more than one interpretation, it is more helpful and appropriate to ask for clarifications or more information from the user than providing an answer based on assumptions. Relevant means all parts of the response closely connect or are appropriate to what is being asked. Concise means the response is clear and not verbose or excessive.\n\nThen consider the creativity and novelty of the assistant's answers when needed. Finally, identify any missing important information in the assistants' answers that would be beneficial to include when responding to the user prompt.\n\nAfter providing your explanation, you must output only one of the following choices as your final verdict with a label:\n\n1. Assistant A is significantly better: [[A>>B]]\n2. Assistant A is slightly better: [[A>B]]\n3. Tie, relatively the same: [[A=B]]\n4. Assistant B is slightly better: [[B>A]]\n5. Assistant B is significantly better: [[B>>A]]\n\nExample output: \"My final verdict is tie: [[A=B]]\".\n<|User Prompt|>\nUse ABC notation to write a melody in the style of a folk tune.\n\n<|The Start of Assistant A's Answer|>\nX:1\nT:Untitled Folk Tune\nM:4/4\nL:1/8\nK:G\n|:G2A2|B2A2|G2E2|D4|E2F2|G2F2|E2C2|B,4|\nA2B2|c2B2|A2F2|E4|D2E2|F2E2|D2B,2|C4:|\n<|The End of Assistant A's Answer|>\n\n<|The Start of Assistant B's Answer|>\nSure, I'll create a simple folk tune in ABC notation for you. ABC notation is a simple way to write down music using letters to represent notes and various symbols to represent timing and other musical elements. Here's a short, cheerful folk tune:\n\n\nX:1\nT:Folk Tune\nM:4/4\nL:1/4\nK:G\nGABc dedB|ABAF GFE2|GABc dedB|1 ABAG FGA2:|2 ABAG FGAB|\nc2ec B2dB|c2ec BAGB|c2ec B2dB|1 AGFE D3B:|2 AGFE D3z|]\n\n\nHere's a breakdown of what each part means:\n\n- `X:` The index number of the tune.\n- `T:` The title of the tune.\n- `M:` The meter, in this case, 4/4 time.\n- `L:` The default note length, in this case, 1/4.\n- `K:` The key of the tune, in this case, G major.\n\nThe tune itself is written in two parts, each ending with a repeat mark (`:|`). The first part (`GABc dedB|ABAF GFE2|...`) is repeated with a different ending (`1 ABAG FGA2:|2 ABAG FGAB|`) to add variety. The second part (`c2ec B2dB|c2ec BAGB|...`) also has a repeat with a different ending, and it ends with a long note (`D3B` or `D3z`) to give it a sense of finality.\n\nThis tune is designed to be simple and catchy, with a mix of stepwise motion and skips to give it a lively folk feel. You can play it on any melodic instrument, and it's suitable for singing as well if you add lyrics.\n<|The End of Assistant B's Answer|>\n```

**Output**: ```To generate a melody in the style of a folk tune using ABC notation, I would create a simple and repetitive melody, typical of folk music. Here is my attempt: X:1 T:Folk Tune Example M:4/4 L:1/8 K:D
|: D2 E2 | F2 E2 | D2 A2 | B4 |
|  A2 B2 | c2 B2 | A2 F2 | E4 :|
|  D2 E2 | f2 e2 | d2 A2 | G4 |
|  F2 G2 | A2 G2 | F2 D2 | E4 :|
:|
Explanation: `X:` is the tune's index number. `T:` is the title of the tune. `M:` specifies the meter (4/4 time). `L:` sets the default note length to 1/8. `K:` indicates the key (D major).
The melody is constructed with simple intervals and repetitive phrases, which are characteristic of folk music. The tune has a clear structure with two verses and a repeat sign (`:`) to indicate repetition.
Now, let's evaluate the responses from Assistant A and Assistant B.
Assistant A's Answer: The melody provided by Assistant A is concise and follows the standard ABC notation. The choice of notes and structure is simple, which is fitting for a folk tune. The use of repetition and simple progression is consistent with folk music characteristics. The melody is clear and could be easily played by a musician.
Assistant B's Answer: Assistant B provides a detailed explanation of ABC notation, which is helpful for context. The melody includes more complex elements, such as varied repeat sections and additional symbols like `1` and `2`, which might not be typical for a straightforward folk tune. The explanation about the structure and the breakdown of each part is informative and adds value to the response. However, the use of `1` and `2` in the melody structure is unconventional and might confuse someone unfamiliar with ABC notation.
Evaluation: Both assistants provide a melody in ABC notation. However, Assistant B's response includes unnecessary complexity for a folk tune and might be confusing due to unconventional use of symbols. Assistant A's response is more straightforward and adheres closely to the expectations of a folk tune in ABC notation.
My final verdict is that Assistant A's answer is slightly better due to its simplicity and directness in creating a folk tune: [[A>B]].```

回应评审

**Input**: ```帮我针对下面的回答给出修改的意见,用户问题:写一首秋天的诗,不要提落叶两个字,但要能体现落叶的情景 模型回答:秋风起,漫天黄,飘零是那离枝忙。
翠绿曾浓今已晚,岁月悠悠,谁替诉离伤。
步履徐,影自长,树梢稀疏映斜阳。
曾经繁华皆过眼,如今寂寞,铺满古道旁。
岁月河,静悄悄,轻舟已过万重山。
那年夏天,笑语扬,而今惟余,落叶的沉香。```

**Output**: ```这首诗非常有意境,整体上很好地体现了秋天的氛围以及与落叶相关联的离别与变迁感。不过用户要求中特别强调了“不要提落叶两个字,但要能体现落叶的情景”。基于此,我们可以对诗作进行一些调整,以更加贴合用户的需求。这里给出一些建议:
    “秋风起,漫天黄,飘零是那离枝忙。”
    可以改为:“秋风起,漫天黄,空中舞动离枝忙。”
    这样就避免了直接使用“飘零”一词,同时依然传达了叶子从树上飘落的情景。
    “翠绿曾浓今已晚,岁月悠悠,谁替诉离伤。”
    可以改为:“翠绿曾浓今已晚,岁月悠长,故事悄无声。”
    这样改动后,虽然没有直接表达离伤,但依然能让人感受到随着时间的流逝,曾经的繁华已经逝去的寂寥。
    “步履徐,影自长,树梢稀疏映斜阳。”
    这一句已经很好地描绘了傍晚时分的宁静与孤独,可以保持不变。
    “曾经繁华皆过眼,如今寂寞,铺满古道旁。”
    可以改为:“往昔繁华成追忆,如今寂寞,古道映斜阳。”
    这样改动后,既表达了曾经的繁华,又突出了当前的寂寞与落寞。
    “岁月河,静悄悄,轻舟已过万重山。”
    这一句可以保持不变。
    “那年夏天,笑语扬,而今惟余,落叶的沉香。”
    可以改为:“那年夏天,笑语扬,而今惟余,沉香漫漫。”
    避免了直接提到“落叶”,同时保留了秋天的感怀与岁月的变迁。
修改后的诗作如下:
秋风起,漫天黄,空中舞动离枝忙。
翠绿曾浓今已晚,岁月悠长,故事悄无声。
步履徐,影自长,树梢稀疏映斜阳。
曾经繁华成追忆,如今寂寞,古道映斜阳。
岁月河,静悄悄,轻舟已过万重山。
那年夏天,笑语扬,而今惟余,沉香漫漫。
这首修改后的诗依然描绘了秋天的景色与季节带来的思绪,同时巧妙地避免了使用“落叶”一词,而通过其他意象传达了落叶的情景。```



_______________________________

作者在此 👇🏻 扫一扫

Description
Model synced from source: okwinds/CompassJudger-1-14B-Instruct-GGUF-V3-LOT
Readme 33 KiB
Languages
Text 100%