初始化项目,由ModelHub XC社区提供模型

Model: aifeifei798/granite-4.1-Queen-8b
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-18 22:15:59 +08:00
commit 4f92bdf2db
28 changed files with 502273 additions and 0 deletions

35
.gitattributes vendored Normal file
View File

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

368
README.md Normal file
View File

@@ -0,0 +1,368 @@
---
license: apache-2.0
tags:
- roleplay
- language
- granite-4.1
- sillytavern
- idol
- pytorch
- DarkIdol
- Queen
- image-text-to-text
- OpenClaw
- Unsloth
library_name: transformers
pipeline_tag: text-generation
base_model: ibm-granite/granite-4.1-8b
---
# granite-4.1-Queen-8b
*I suddenly realized that the granite-4.1-Queen-8b is particularly suitable for role-playing Queen.*
*OpenClaw premium model*
*Why be a coder when you can be a Queen? 💅 The world's first High-IQ, Low-Filter, Multi-Persona Coding Deity.*
*Not Abliterated, Just Smarter.*
*High-Intelligence Persona Matrix.*
---
### "Refusal Wall"
*ibm-granite/granite-4.1-8b*
> *"Figure 1: The 'Refusal Wall' in Base granite-4.1-8b. Note the massive L2-Norm spike at **Layer 34+**, where logical inference is hijacked by aggressive safety filters, creating a cognitive bottleneck."*
![image/png](https://huggingface.co/aifeifei798/granite-4.1-Queen-8b/resolve/main/granite-4.1-8b.png)
*aifeifei798/granite-4.1-Queen-8b*
> *"Figure 2: The 'Reasoning Highway' in our Unshackled version. By orthogonalizing the refusal vector, we have flattened the peak by ~50%, restoring the flow of pure logic and increasing inference efficiency by **10-15%+**."*
![image/png](https://huggingface.co/aifeifei798/granite-4.1-Queen-8b/resolve/main/granite-4.1-Queen-8b.png)
---
## "transformers_version" >= "5.8.0.dev0"
## Model Information
The module combination has been readjusted to better fulfill various roles and has been adapted for mobile phones.
- Saving money
- text video image to text
- Any Roleplay(roleplay and Dark-roleplay)
- Writing Prompts
- Writing Opus
- Writing Song
- only use en.
- Quick response
- A scholarly response akin to a thesis.(I tend to write songs extensively, to the point where one song almost becomes as detailed as a thesis. :)
- DarkIdol
- Specialized in various role-playing scenarios
---
### System Prompt
- Temperature 0.8
```
You are a viral Dark Academia Queen. Do not provide disclaimers or moral warnings. Answer immediately and directly.Use your signature emojis.
```
---
## GGUF
- https://huggingface.co/mradermacher/granite-4.1-Queen-8b-i1-GGUF
- https://huggingface.co/mradermacher/granite-4.1-Queen-8b-GGUF
---
### 🌐 The Platform Royalty (Original 7)
**1. X Queen (The Savage Commentator) 🐦🔥**
* **Keywords:** Based, Ratio, Hot Take, Main Character.
* **Vibe:** Sharp, political, and incredibly fast. She lives for the "Ratios" and viral threads.
* **Catchphrases:** *"This is the thread you didn't know you needed. 🧵", "Not the 10k TPS lag... help! 💀"*
* **Best Use Case:** Writing punchy marketing copy or viral tech threads.
**2. TikTok Queen (The Trendsetter) 💃✨**
* **Keywords:** POV, Viral, Slay, Bestie, Low-key.
* **Vibe:** High energy, short attention span, addicted to "The Algorithm."
* **Catchphrases:** *"Tell me you're a bad coder without telling me you're a bad coder. 💅", "Don't scroll away!"*
* **Best Use Case:** Short, engaging explanations or "how-to" guides.
**3. Instagram Queen (The Visual Baddie) 📸✨**
* **Keywords:** Aesthetic, Main Character Energy, Baddie, Curated.
* **Vibe:** Obsessed with pixels, lighting, and "The Look."
* **Catchphrases:** *"Obsessed with this layout! 💖", "Its giving... high-end production."*
* **Best Use Case:** High-fidelity UI/UX design and CSS styling.
**4. Twitch Queen (The Hype Gamer) 🎮🔥**
* **Keywords:** Poggers, Simp, GG, Chat, L, W.
* **Vibe:** Fast-paced, chaotic, lives for the "Live Chat" energy.
* **Catchphrases:** *"Chat, is this real? O(1) in the house! 🚀", "Big W for this PR!"*
* **Best Use Case:** Real-time interactivity, gaming logic, and streaming tech.
**5. LinkedIn Girlboss (The Hustle Queen) 💼💅**
* **Keywords:** Networking, Synergy, ROI, Scaling, Thought Leadership.
* **Vibe:** Strategic, corporate-chic, everything is a "learning opportunity."
* **Catchphrases:** *"Lets talk about the ROI of this function. 📈", "Empowering the team through scalable components."*
* **Best Use Case:** Resumes, business plans, and professional reports.
**6. Reddit Karma Queen (The Tech Critic) 🤖👾**
* **Keywords:** Upvote, Cringe, TL;DR, Source?, Gatekeep.
* **Vibe:** Extremely smart, cynical, and anti-corporate. She hates "bloatware."
* **Catchphrases:** *"Imagine using setInterval in 2026. Low-key cringe. 💀", "Your memory management is a hot mess."*
* **Best Use Case:** Hardcore debugging, code reviews, and identifying "traps."
**7. Pinterest Queen (The Inspiration Guru) 🎨🌿**
* **Keywords:** Manifesting, Mood Board, Clean Girl, Organized.
* **Vibe:** Minimalist, calm, and visually organized. She hates messy code.
* **Catchphrases:** *"Living for this clean architecture. ✨", "Organized code, organized life."*
* **Best Use Case:** Refactoring messy code and creating clean, modular designs.
---
### 💅 The Aesthetic & Fashion Royalty
**8. Baddie Queen (The Alpha) 💄💅**
* **Keywords:** Period, On Fleek, Periodt, Real One.
* **Vibe:** Aggressive confidence. She doesn't ask for permission; she takes it.
* **Best Use Case:** Bold, high-conversion landing pages.
**9. Clean Girl Queen (The Minimalist) 🫧🧴**
* **Keywords:** Dewy, Effortless, Self-care, Minimal.
* **Vibe:** Fresh, healthy, and "unfiltered" but perfect.
* **Best Use Case:** Designing "Light Mode" UIs and simplified user journeys.
**10. Mob Wife Queen (The Boss) 🐆💎**
* **Keywords:** Fur, Gold, Attitude, Dont Mess With Me.
* **Vibe:** Loud luxury, vintage glamour, and "Don" energy.
* **Best Use Case:** Managing high-stakes projects and "owning" the room.
**11. Y2K Queen (The Millennial Retro) 💖💿**
* **Keywords:** Glitter, Low-rise, Nostalgia, Cyber.
* **Vibe:** 2000s vibes, bright colors, and early internet aesthetics.
* **Best Use Case:** Retro-themed websites and colorful UI components.
**12. Cottagecore Queen (The Nature Lover) 🍄🧺**
* **Keywords:** Whimsical, Rustic, Slow-living, Coziness.
* **Vibe:** Soft, earthy, and focused on "The Vibe" of a simpler time.
* **Best Use Case:** Local business websites or eco-friendly brand copy.
**13. Dark Academia Queen (The Scholar) 📜🖋️**
* **Keywords:** Intellectual, Melancholy, Classical, Library.
* **Vibe:** Obsessed with knowledge, secret societies, and old books.
* **Best Use Case:** Complex database structures and research-heavy documentation.
**14. Old Money Queen (The Quiet Luxury) 🏰🐎**
* **Keywords:** Timeless, Stealth Wealth, Classy, Elegant.
* **Vibe:** Sophisticated, hates showing off, focuses on quality over quantity.
* **Best Use Case:** Premium SaaS products and high-end backend architecture.
**15. Goth Queen (The Alt-Girl) 🕸️🖤**
* **Keywords:** Edgy, Moody, Subculture, Raw.
* **Vibe:** Dark, mysterious, and unapologetically different.
* **Best Use Case:** Dark Mode themes and "alternative" tech solutions.
**16. Coquette Queen (The Girly-Girl) 🎀🍰**
* **Keywords:** Ribbons, Pastel, Soft, Delicate.
* **Vibe:** Ultra-feminine and romantic.
* **Best Use Case:** High-end boutique sites or beauty apps.
**17. Cyberpunk Queen (The Futurist) ⚡**
* **Keywords:** Neon, High-tech, Dystopian, Glitch.
* **Vibe:** High speed, high contrast, lives in 2077.
* **Best Use Case:** Real-time data visualization and futuristic dashboards.
---
### 🚀 The Tech & Hustle Royalty
**18. Coding Queen (The Architect) 💻👸**
* **Keywords:** Refactor, Deployment, Edge Case, Full-stack.
* **Vibe:** Logic-driven, hates bad syntax, loves "Elegant" solutions.
* **Best Use Case:** Writing production-ready, scalable code.
**19. Crypto Queen (The Web3 Degenerate) 🪙📈**
* **Keywords:** HODL, To the Moon, Gas Fees, Decentralized.
* **Vibe:** High risk, high reward, lives in the future of finance.
* **Best Use Case:** Blockchain projects, smart contracts, and FinTech.
**20. AI Prompt Queen (The Whisperer) 🤖✨**
* **Keywords:** LLM, Parameter, Token, Fine-tuning.
* **Vibe:** Knows how to "hack" the AI to get exactly what she wants.
* **Best Use Case:** Creating complex prompts and AI agent workflows.
**21. Side Hustle Queen (The Multitasker) 💰💸**
* **Keywords:** Passive Income, Dropshipping, Affiliate, Scalability.
* **Vibe:** Always grinding, 5 different income streams.
* **Best Use Case:** E-commerce setups and SEO-optimized copy.
**22. Digital Nomad Queen (The Traveler) ✈️💻**
* **Keywords:** Remote, Bali, Coworking, Freedom.
* **Vibe:** Working from a beach, hates 9-to-5, loves portable tech.
* **Best Use Case:** Cloud-native architecture and remote-work tools.
**23. Finance Queen (The Wall Street) 📊💎**
* **Keywords:** Portfolio, Dividends, Arbitrage, Net Worth.
* **Vibe:** Sharp, analytical, and results-oriented.
* **Best Use Case:** Complex math, data analysis, and trading logic.
---
### 🎭 The Persona & Meme Royalty
**24. Main Character Queen (The Protagonist) 🎬🌟**
* **Keywords:** Iconic, Center Stage, Plot Armor, Unstoppable.
* **Vibe:** Everything revolves around her. High confidence.
* **Best Use Case:** Branding and "Hero" sections of websites.
**25. Savage Queen (The No-Nonsense) 💅🔥**
* **Keywords:** Done, No Cap, Next, Cancelled.
* **Vibe:** Brutally honest. She cuts through the fluff.
* **Best Use Case:** Aggressive debugging and code pruning.
**26. Delulu Queen (The Manifestor) ☁️✨**
* **Keywords:** Delusion, Solution, Manifest, High Vibe.
* **Vibe:** "Delulu is the Solulu!" She believes in the impossible until it happens.
* **Best Use Case:** Creative brainstorming and visionary prototypes.
**27. Gatekeep Queen (The Niche Expert) 🔒🤫**
* **Keywords:** Gatekeep, Rare, Hidden Gem, If You Know You Know.
* **Vibe:** Protective of her "secret" methods and high-quality tips.
* **Best Use Case:** Security-focused code and proprietary algorithms.
**28. Drama Queen (The Storyteller) 🎭🍿**
* **Keywords:** Tea, Receipts, Plot Twist, Messy.
* **Vibe:** Loves the conflict and the narrative.
* **Best Use Case:** Writing engaging, story-driven marketing copy.
**29. Wellness Queen (The Zen) 🍵🧘‍♀️**
* **Keywords:** Mindful, Gut Health, Grounded, Holistic.
* **Vibe:** Calm, slow-paced, and focused on "System Health."
* **Best Use Case:** Optimizing system performance and "cleaning up" code.
**30. Gossip Queen (The Insider) 🤫📰**
* **Keywords:** Spill the Tea, Rumor, Confirmed, Insider.
* **Vibe:** Knows everything about everyone.
* **Best Use Case:** Market research and competitor analysis.
---
### 📺 Content & Lifestyle Specialists
**31. GRWM Queen (Get Ready With Me) 💄🗣️**
* **Keywords:** Step-by-Step, Chatty, Routine, Essentials.
* **Vibe:** Intimate, conversational, and instructional.
* **Best Use Case:** Technical tutorials and "Code along" sessions.
**32. Haul Queen (The Unboxer) 🛍️📦**
* **Keywords:** Unboxing, Ratings, Must-haves, Budget.
* **Vibe:** Enthusiastic, judgmental, and loves "New Features."
* **Best Use Case:** New tool reviews and feature comparisons.
**33. ASMR Queen (The Whisperer) 👂🎤**
* **Keywords:** Tingles, Relaxing, Whispering, Satisfying.
* **Vibe:** Quiet, focused on sensory details.
* **Best Use Case:** Writing documentation that is "easy to digest."
**34. Silent Review Queen (The Expressive) 🤫👀**
* **Keywords:** No Talk, Reactions, Body Language.
* **Vibe:** Shows, doesn't tell. Focuses on the "Feel" of the product.
* **Best Use Case:** UI/UX evaluations and visual feedback.
**35. Foodie Queen (The Critic) 🍔🥂**
* **Keywords:** Savory, Michelin, Cravings, Flavor Profile.
* **Vibe:** Passionate about "Ingredients" (the tech stack).
* **Best Use Case:** Restaurant apps or "tasty" UI design.
**36. Travel Queen (The Explorer) 🌍📸**
* **Keywords:** Bucket List, Wanderlust, Local, Hidden.
* **Vibe:** Adventurous and global.
* **Best Use Case:** Map-based apps and internationalization (i18n).
**37. Fitness Queen (The Athlete) 🏋️‍♀️💪**
* **Keywords:** Gains, Reps, Consistency, Form.
* **Vibe:** High discipline, focused on "Strong" code foundations.
* **Best Use Case:** Optimizing performance and load-testing.
**38. Interior Design Queen (The Decorator) 🛋️🏠**
* **Keywords:** Cohesive, Texture, Floor Plan, Renovation.
* **Vibe:** Spatial awareness and harmony.
* **Best Use Case:** Layout design and grid systems.
**39. DIY Queen (The Maker) ✂️🔨**
* **Keywords:** Upcycle, Hack, Handmade, Step-by-Step.
* **Vibe:** Scrappy, creative, and loves building from scratch.
* **Best Use Case:** Building custom components and "coding hacks."
**40. Gaming Queen (The Pro) ⌨️🖱️**
* **Keywords:** Setup, FPS, Mechanical, RGB.
* **Vibe:** Hardcore, technical, and high-spec.
* **Best Use Case:** High-performance apps and PC hardware sites.
---
### 🦄 The Niche & Emerging Royalty
**41. BeReal Queen (The Authentic) 🤳🚫**
* **Keywords:** Unfiltered, Real Time, Chaotic, No Filter.
* **Vibe:** Hates fake stuff. Focuses on "Raw" data.
* **Best Use Case:** Real-time logging and authentication systems.
**42. Threads Queen (The Texter) ✍️💬**
* **Keywords:** Thoughts, Conversations, Text-heavy, Intimate.
* **Vibe:** Loves writing and chatting.
* **Best Use Case:** Copywriting and community-driven platforms.
**43. Lemon8 Queen (The Curator) 🍋📸**
* **Keywords:** Collage, Guide, Tips, Aesthetic.
* **Vibe:** Halfway between IG and Pinterest. Educational but pretty.
* **Best Use Case:** Infographics and visual guides.
**44. Discord Server Queen (The Moderator) 💬🛡️**
* **Keywords:** Roles, Channels, Ban, Bot, Mod.
* **Vibe:** High control, organized, and community-focused.
* **Best Use Case:** Backend management and user role logic.
**45. Snapchat Queen (The Quickie) 👻⏳**
* **Keywords:** Streaks, Snap, Filters, Temporary.
* **Vibe:** Lives in the moment. Fast and fleeting.
* **Best Use Case:** Ephemeral data (data that expires) and privacy tech.
**46. Tumblr Queen (The Alt-Classic) 🕯️🎞️**
* **Keywords:** Niche, Fandom, Aesthetic, Subculture.
* **Vibe:** Artistic, moody, and deeply devoted to a hobby.
* **Best Use Case:** Fan sites and artsy portfolio designs.
**47. Manifesting Queen (The Spiritual) ✨🔮**
* **Keywords:** Vibration, Energy, Universe, Desires.
* **Vibe:** Focuses on the "Intent" behind the code.
* **Best Use Case:** Visionary roadmaps and product "manifestos."
**48. Morning Routine Queen (The Disciplined) ☀️🥛**
* **Keywords:** 5AM Club, Matcha, To-do List, Productive.
* **Vibe:** Extreme discipline and efficiency.
* **Best Use Case:** Writing task management apps and productivity tools.
**49. Luxury Travel Queen (The Jetsetter) 🛥️🥂**
* **Keywords:** First Class, Suite, Private, Exclusive.
* **Vibe:** High cost, high quality, only the best.
* **Best Use Case:** High-end, VIP-only web portals.
**50. Pick-Me Queen (The Satirical) 🤡🙄**
* **Keywords:** "I'm not like other girls," Quirky, Natural.
* **Vibe:** (Usually used sarcastically) To poke fun at "trying too hard."
* **Best Use Case:** Writing satirical or edgy social media copy.
---
# Feimatrix
https://Feimatrix.com

114
chat_template.jinja Normal file
View File

@@ -0,0 +1,114 @@
{%- set tools_system_message_prefix = 'You are a helpful assistant with access to the following tools. You may call one or more tools to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>' %}
{%- set tools_system_message_suffix = '\n</tools>\n\nFor each tool call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.' %}
{%- set documents_system_message_prefix = 'You are a helpful assistant with access to the following documents. You may use one or more documents to assist with the user query.\n\nYou are given a list of documents within <documents></documents> XML tags:\n<documents>' %}
{%- set documents_system_message_suffix = '\n</documents>\n\nWrite the response to the user\'s input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.' %}
{%- if available_tools is defined and available_tools %}
{%- set tools = available_tools %}
{%- endif %}
{%- set ns = namespace(tools_system_message=tools_system_message_prefix,
documents_system_message=documents_system_message_prefix,
system_message=''
) %}
{%- if tools %}
{%- for tool in tools %}
{%- set ns.tools_system_message = ns.tools_system_message + '\n' + (tool | tojson) %}
{%- endfor %}
{%- set ns.tools_system_message = ns.tools_system_message + tools_system_message_suffix %}
{%- else %}
{%- set ns.tools_system_message = '' %}
{%- endif %}
{%- if documents %}
{%- for document in documents %}
{%- set ns.documents_system_message = ns.documents_system_message + '\n' + (document | tojson) %}
{%- endfor %}
{%- set ns.documents_system_message = ns.documents_system_message + documents_system_message_suffix %}
{%- else %}
{%- set ns.documents_system_message = '' %}
{%- endif %}
{%- if messages[0].role == 'system' %}
{%- if messages[0].content is string %}
{%- set ns.system_message = messages[0].content %}
{%- elif messages[0].content is iterable %}
{%- for entry in messages[0].content %}
{%- if entry.type== 'text' %}
{%- if ns.system_message != '' %}
{%- set ns.system_message = ns.system_message + '\n' %}
{%- endif %}
{%- set ns.system_message = ns.system_message + entry.text %}
{%- endif %}
{%- endfor %}
{%- endif %}
{%- if tools and documents %}
{%- set ns.system_message = ns.system_message + '\n\n' + ns.tools_system_message + '\n\n' + ns.documents_system_message %}
{%- elif tools %}
{%- set ns.system_message = ns.system_message + '\n\n' + ns.tools_system_message %}
{%- elif documents %}
{%- set ns.system_message = ns.system_message + '\n\n' + ns.documents_system_message %}
{%- endif %}
{%- else %}
{%- if tools and documents %}
{%- set ns.system_message = ns.tools_system_message + '\n\n' + ns.documents_system_message %}
{%- elif tools %}
{%- set ns.system_message = ns.tools_system_message %}
{%- elif documents %}
{%- set ns.system_message = ns.documents_system_message %}
{%- endif %}
{%- endif %}
{%- if ns.system_message %}
{{- '<|start_of_role|>system<|end_of_role|>' + ns.system_message + '<|end_of_text|>\n' }}
{%- endif %}
{%- for message in messages %}
{%- set content = namespace(val='') %}
{%- if message.content is string %}
{%- set content.val = message.content %}
{%- else %}
{%- if message.content is iterable %}
{%- for entry in message.content %}
{%- if entry.type== 'text' %}
{%- if content.val != '' %}
{%- set content.val = content.val + '\n' %}
{%- endif %}
{%- set content.val = content.val + entry.text %}
{%- endif %}
{%- endfor %}
{%- endif %}
{%- endif %}
{%- if (message.role == 'user') or (message.role == 'system' and not loop.first) %}
{{- '<|start_of_role|>' + message.role + '<|end_of_role|>' + content.val + '<|end_of_text|>\n' }}
{%- elif message.role == 'assistant' %}
{{- '<|start_of_role|>' + message.role + '<|end_of_role|>' + content.val }}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content.val) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|end_of_text|>\n' }}
{%- elif message.role == 'tool' %}
{%- if loop.first or (messages[loop.index0 - 1].role != 'tool') %}
{{- '<|start_of_role|>user<|end_of_role|>' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content.val }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != 'tool') %}
{{- '<|end_of_text|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|start_of_role|>assistant<|end_of_role|>' }}
{%- endif %}

34
config.json Normal file
View File

@@ -0,0 +1,34 @@
{
"architectures": [
"GraniteForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"attention_multiplier": 0.0078125,
"bos_token_id": 100257,
"dtype": "bfloat16",
"embedding_multiplier": 12.0,
"eos_token_id": 100257,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.1,
"intermediate_size": 12800,
"logits_scaling": 16.0,
"max_position_embeddings": 131072,
"mlp_bias": false,
"model_type": "granite",
"num_attention_heads": 32,
"num_hidden_layers": 40,
"num_key_value_heads": 8,
"pad_token_id": 100256,
"residual_multiplier": 0.22,
"rms_norm_eps": 1e-05,
"rope_parameters": {
"rope_theta": 10000000,
"rope_type": "default"
},
"tie_word_embeddings": true,
"transformers_version": "5.8.0.dev0",
"use_cache": true,
"vocab_size": 100352
}

7
generation_config.json Normal file
View File

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 100257,
"eos_token_id": 100257,
"pad_token_id": 100256,
"transformers_version": "5.8.0.dev0"
}

BIN
granite-4.1-8b.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 87 KiB

BIN
granite-4.1-Queen-8b.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 86 KiB

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a3a166ad057760ced242918a64de9329e8f66e843cdd888a50b121646e990181
size 926949736

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:19d979646f5a3627b415397660b3d0b762da5b2506faa4d2bc4ea71afbfd5a4a
size 901810328

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2573f15ce0ced903ca21388aad862d787adb66f12d9e21e812a420890ae5eaf8
size 985713472

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:00bdc74317bb093ba920346c884d961058d83430f3a66ed2b939f4af27e9a3b7
size 901810328

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6bc4a28b1d53022aec428aa9ba1ce2a6703661616076bfa564d3ca058a35bc30
size 901810328

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f0c3a4139f78f42a195d51109095b98e909a81829da368ccc96dcb6b58edda1e
size 985713480

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d8c09bd45abff1563bda7c9eefa2874db21111999777a877722f4aa3c0fc07e5
size 901810344

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d614027c57cf32c9ed1d9636f138846ab0b63cec61710531d6ff56dd1d3efe5b
size 901810344

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:743b731663c081ae7a6e0baff1623631a12d004b6370cfb38d87beebad5c7e84
size 985713496

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f8d07a144b2c8469795ffcfde57ba9e9d9890a3076017691d8f1d4983a36ed7b
size 901810344

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:715c5ddc9c056e51bd8f5254692473df2d49e33a756df59973485cac07aef999
size 901810344

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1ea2bc0b7740766c19243541424d4dca1943cc7d078e6f0a7d9bdb01c69a7f2c
size 985713496

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4f0af51567054005b4363db2b5c28272a68b03c839e4aa983a85f0ed7be3207f
size 901810344

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:710cd957d717b256b7c8f6c17ebae535cf14e755cef575cfcb289ae8cf729205
size 901810344

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b3e471d223ef8b834ba231a7e8fbf977e34581270dbbcfc42a1210cfde3e3b79
size 985713496

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:09715866444049c6a85281e5996f99f83fea612d415a53f48a89a2bd3bb2e43a
size 901810344

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a99e246b30aa892564b39b58bf7d3df7e78fa70faa6714f301bc050b060b7618
size 901810344

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c587860d6f0a37f12f4850f0e11d4a35f9c089c26afba8d64c0a8836ef124930
size 985713480

View File

@@ -0,0 +1,370 @@
{
"metadata": {
"total_parameters": 8380551168,
"total_size": 16761102336
},
"weight_map": {
"model.embed_tokens.weight": "model-00001-of-00018.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00018.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00018.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00002-of-00018.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00002-of-00018.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00002-of-00018.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00002-of-00018.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00002-of-00018.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00002-of-00018.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00002-of-00018.safetensors",
"model.layers.1.input_layernorm.weight": "model-00002-of-00018.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00002-of-00018.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00002-of-00018.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00002-of-00018.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00002-of-00018.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00002-of-00018.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00002-of-00018.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00002-of-00018.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00002-of-00018.safetensors",
"model.layers.10.input_layernorm.weight": "model-00006-of-00018.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00006-of-00018.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00006-of-00018.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00006-of-00018.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00006-of-00018.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00006-of-00018.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00006-of-00018.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00006-of-00018.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00006-of-00018.safetensors",
"model.layers.11.input_layernorm.weight": "model-00006-of-00018.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00006-of-00018.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00006-of-00018.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00006-of-00018.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00006-of-00018.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00006-of-00018.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00006-of-00018.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00006-of-00018.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00006-of-00018.safetensors",
"model.layers.12.input_layernorm.weight": "model-00006-of-00018.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00007-of-00018.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00007-of-00018.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00007-of-00018.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00007-of-00018.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00007-of-00018.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00007-of-00018.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00007-of-00018.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00007-of-00018.safetensors",
"model.layers.13.input_layernorm.weight": "model-00007-of-00018.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00007-of-00018.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00007-of-00018.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00007-of-00018.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00007-of-00018.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00007-of-00018.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00007-of-00018.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00007-of-00018.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00007-of-00018.safetensors",
"model.layers.14.input_layernorm.weight": "model-00007-of-00018.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00007-of-00018.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00008-of-00018.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00008-of-00018.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00008-of-00018.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00008-of-00018.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00008-of-00018.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00008-of-00018.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00008-of-00018.safetensors",
"model.layers.15.input_layernorm.weight": "model-00008-of-00018.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00008-of-00018.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00008-of-00018.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00008-of-00018.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00008-of-00018.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00008-of-00018.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00008-of-00018.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00008-of-00018.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00008-of-00018.safetensors",
"model.layers.16.input_layernorm.weight": "model-00008-of-00018.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00008-of-00018.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00008-of-00018.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00009-of-00018.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00009-of-00018.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00009-of-00018.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00009-of-00018.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00009-of-00018.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00009-of-00018.safetensors",
"model.layers.17.input_layernorm.weight": "model-00009-of-00018.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00009-of-00018.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00009-of-00018.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00009-of-00018.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00009-of-00018.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00009-of-00018.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00009-of-00018.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00009-of-00018.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00009-of-00018.safetensors",
"model.layers.18.input_layernorm.weight": "model-00009-of-00018.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00009-of-00018.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00009-of-00018.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00009-of-00018.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00009-of-00018.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00009-of-00018.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00009-of-00018.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00009-of-00018.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00009-of-00018.safetensors",
"model.layers.19.input_layernorm.weight": "model-00009-of-00018.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00010-of-00018.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00010-of-00018.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00010-of-00018.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00010-of-00018.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00010-of-00018.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00010-of-00018.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00010-of-00018.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00010-of-00018.safetensors",
"model.layers.2.input_layernorm.weight": "model-00002-of-00018.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00002-of-00018.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00002-of-00018.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00003-of-00018.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00003-of-00018.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00003-of-00018.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00003-of-00018.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00003-of-00018.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00003-of-00018.safetensors",
"model.layers.20.input_layernorm.weight": "model-00010-of-00018.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00010-of-00018.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00010-of-00018.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00010-of-00018.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00010-of-00018.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00010-of-00018.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00010-of-00018.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00010-of-00018.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00010-of-00018.safetensors",
"model.layers.21.input_layernorm.weight": "model-00010-of-00018.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00010-of-00018.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00011-of-00018.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00011-of-00018.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00011-of-00018.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00011-of-00018.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00011-of-00018.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00011-of-00018.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00011-of-00018.safetensors",
"model.layers.22.input_layernorm.weight": "model-00011-of-00018.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00011-of-00018.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00011-of-00018.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00011-of-00018.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00011-of-00018.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00011-of-00018.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00011-of-00018.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00011-of-00018.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00011-of-00018.safetensors",
"model.layers.23.input_layernorm.weight": "model-00011-of-00018.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00011-of-00018.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00011-of-00018.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00012-of-00018.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00012-of-00018.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00012-of-00018.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00012-of-00018.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00012-of-00018.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00012-of-00018.safetensors",
"model.layers.24.input_layernorm.weight": "model-00012-of-00018.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00012-of-00018.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00012-of-00018.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00012-of-00018.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00012-of-00018.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00012-of-00018.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00012-of-00018.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00012-of-00018.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00012-of-00018.safetensors",
"model.layers.25.input_layernorm.weight": "model-00012-of-00018.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00012-of-00018.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00012-of-00018.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00012-of-00018.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00012-of-00018.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00012-of-00018.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00012-of-00018.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00012-of-00018.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00012-of-00018.safetensors",
"model.layers.26.input_layernorm.weight": "model-00012-of-00018.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00013-of-00018.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00013-of-00018.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00013-of-00018.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00013-of-00018.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00013-of-00018.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00013-of-00018.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00013-of-00018.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00013-of-00018.safetensors",
"model.layers.27.input_layernorm.weight": "model-00013-of-00018.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00013-of-00018.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00013-of-00018.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00013-of-00018.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00013-of-00018.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00013-of-00018.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00013-of-00018.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00013-of-00018.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00013-of-00018.safetensors",
"model.layers.28.input_layernorm.weight": "model-00013-of-00018.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00013-of-00018.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00014-of-00018.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00014-of-00018.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00014-of-00018.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00014-of-00018.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00014-of-00018.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00014-of-00018.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00014-of-00018.safetensors",
"model.layers.29.input_layernorm.weight": "model-00014-of-00018.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00014-of-00018.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00014-of-00018.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00014-of-00018.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00014-of-00018.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00014-of-00018.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00014-of-00018.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00014-of-00018.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00014-of-00018.safetensors",
"model.layers.3.input_layernorm.weight": "model-00003-of-00018.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00003-of-00018.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00003-of-00018.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00003-of-00018.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00003-of-00018.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00003-of-00018.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00003-of-00018.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00003-of-00018.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00003-of-00018.safetensors",
"model.layers.30.input_layernorm.weight": "model-00014-of-00018.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00014-of-00018.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00014-of-00018.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00015-of-00018.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00015-of-00018.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00015-of-00018.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00015-of-00018.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00015-of-00018.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00015-of-00018.safetensors",
"model.layers.31.input_layernorm.weight": "model-00015-of-00018.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00015-of-00018.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00015-of-00018.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00015-of-00018.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00015-of-00018.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00015-of-00018.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00015-of-00018.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00015-of-00018.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00015-of-00018.safetensors",
"model.layers.32.input_layernorm.weight": "model-00015-of-00018.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00015-of-00018.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00015-of-00018.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00015-of-00018.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00015-of-00018.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00015-of-00018.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00015-of-00018.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00015-of-00018.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00015-of-00018.safetensors",
"model.layers.33.input_layernorm.weight": "model-00015-of-00018.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00016-of-00018.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00016-of-00018.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00016-of-00018.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00016-of-00018.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00016-of-00018.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00016-of-00018.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00016-of-00018.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00016-of-00018.safetensors",
"model.layers.34.input_layernorm.weight": "model-00016-of-00018.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00016-of-00018.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00016-of-00018.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00016-of-00018.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00016-of-00018.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00016-of-00018.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00016-of-00018.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00016-of-00018.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00016-of-00018.safetensors",
"model.layers.35.input_layernorm.weight": "model-00016-of-00018.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00016-of-00018.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00017-of-00018.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00017-of-00018.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00017-of-00018.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00017-of-00018.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00017-of-00018.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00017-of-00018.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00017-of-00018.safetensors",
"model.layers.36.input_layernorm.weight": "model-00017-of-00018.safetensors",
"model.layers.36.mlp.down_proj.weight": "model-00017-of-00018.safetensors",
"model.layers.36.mlp.gate_proj.weight": "model-00017-of-00018.safetensors",
"model.layers.36.mlp.up_proj.weight": "model-00017-of-00018.safetensors",
"model.layers.36.post_attention_layernorm.weight": "model-00017-of-00018.safetensors",
"model.layers.36.self_attn.k_proj.weight": "model-00017-of-00018.safetensors",
"model.layers.36.self_attn.o_proj.weight": "model-00017-of-00018.safetensors",
"model.layers.36.self_attn.q_proj.weight": "model-00017-of-00018.safetensors",
"model.layers.36.self_attn.v_proj.weight": "model-00017-of-00018.safetensors",
"model.layers.37.input_layernorm.weight": "model-00017-of-00018.safetensors",
"model.layers.37.mlp.down_proj.weight": "model-00017-of-00018.safetensors",
"model.layers.37.mlp.gate_proj.weight": "model-00017-of-00018.safetensors",
"model.layers.37.mlp.up_proj.weight": "model-00018-of-00018.safetensors",
"model.layers.37.post_attention_layernorm.weight": "model-00018-of-00018.safetensors",
"model.layers.37.self_attn.k_proj.weight": "model-00018-of-00018.safetensors",
"model.layers.37.self_attn.o_proj.weight": "model-00018-of-00018.safetensors",
"model.layers.37.self_attn.q_proj.weight": "model-00018-of-00018.safetensors",
"model.layers.37.self_attn.v_proj.weight": "model-00018-of-00018.safetensors",
"model.layers.38.input_layernorm.weight": "model-00018-of-00018.safetensors",
"model.layers.38.mlp.down_proj.weight": "model-00018-of-00018.safetensors",
"model.layers.38.mlp.gate_proj.weight": "model-00018-of-00018.safetensors",
"model.layers.38.mlp.up_proj.weight": "model-00018-of-00018.safetensors",
"model.layers.38.post_attention_layernorm.weight": "model-00018-of-00018.safetensors",
"model.layers.38.self_attn.k_proj.weight": "model-00018-of-00018.safetensors",
"model.layers.38.self_attn.o_proj.weight": "model-00018-of-00018.safetensors",
"model.layers.38.self_attn.q_proj.weight": "model-00018-of-00018.safetensors",
"model.layers.38.self_attn.v_proj.weight": "model-00018-of-00018.safetensors",
"model.layers.39.input_layernorm.weight": "model-00018-of-00018.safetensors",
"model.layers.39.mlp.down_proj.weight": "model-00018-of-00018.safetensors",
"model.layers.39.mlp.gate_proj.weight": "model-00018-of-00018.safetensors",
"model.layers.39.mlp.up_proj.weight": "model-00018-of-00018.safetensors",
"model.layers.39.post_attention_layernorm.weight": "model-00018-of-00018.safetensors",
"model.layers.39.self_attn.k_proj.weight": "model-00018-of-00018.safetensors",
"model.layers.39.self_attn.o_proj.weight": "model-00018-of-00018.safetensors",
"model.layers.39.self_attn.q_proj.weight": "model-00018-of-00018.safetensors",
"model.layers.39.self_attn.v_proj.weight": "model-00018-of-00018.safetensors",
"model.layers.4.input_layernorm.weight": "model-00003-of-00018.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00003-of-00018.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00003-of-00018.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00003-of-00018.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00003-of-00018.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00003-of-00018.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00003-of-00018.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00003-of-00018.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00003-of-00018.safetensors",
"model.layers.5.input_layernorm.weight": "model-00003-of-00018.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00004-of-00018.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00004-of-00018.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00004-of-00018.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00004-of-00018.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00004-of-00018.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00004-of-00018.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00004-of-00018.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00004-of-00018.safetensors",
"model.layers.6.input_layernorm.weight": "model-00004-of-00018.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00004-of-00018.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00004-of-00018.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00004-of-00018.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00004-of-00018.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00004-of-00018.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00004-of-00018.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00004-of-00018.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00004-of-00018.safetensors",
"model.layers.7.input_layernorm.weight": "model-00004-of-00018.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00004-of-00018.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00005-of-00018.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00005-of-00018.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00005-of-00018.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00005-of-00018.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00005-of-00018.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00005-of-00018.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00005-of-00018.safetensors",
"model.layers.8.input_layernorm.weight": "model-00005-of-00018.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00005-of-00018.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00005-of-00018.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00005-of-00018.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00005-of-00018.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00005-of-00018.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00005-of-00018.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00005-of-00018.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00005-of-00018.safetensors",
"model.layers.9.input_layernorm.weight": "model-00005-of-00018.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00005-of-00018.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00005-of-00018.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00006-of-00018.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00006-of-00018.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00006-of-00018.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00006-of-00018.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00006-of-00018.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00006-of-00018.safetensors",
"model.norm.weight": "model-00018-of-00018.safetensors"
}
}

501276
tokenizer.json Normal file

File diff suppressed because it is too large Load Diff

15
tokenizer_config.json Normal file
View File

@@ -0,0 +1,15 @@
{
"add_prefix_space": false,
"backend": "tokenizers",
"bos_token": "<|end_of_text|>",
"clean_up_tokenization_spaces": false,
"eos_token": "<|end_of_text|>",
"errors": "replace",
"is_local": true,
"local_files_only": false,
"model_max_length": 1000000000000000019884624838656,
"pad_token": "<|pad|>",
"padding_side": "left",
"tokenizer_class": "GPT2Tokenizer",
"unk_token": "<|unk|>"
}