初始化项目,由ModelHub XC社区提供模型
Model: aifeifei798/granite-4.1-Queen-8b Source: Original Platform
This commit is contained in:
35
.gitattributes
vendored
Normal file
35
.gitattributes
vendored
Normal file
@@ -0,0 +1,35 @@
|
||||
*.7z filter=lfs diff=lfs merge=lfs -text
|
||||
*.arrow filter=lfs diff=lfs merge=lfs -text
|
||||
*.bin filter=lfs diff=lfs merge=lfs -text
|
||||
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
||||
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
||||
*.ftz filter=lfs diff=lfs merge=lfs -text
|
||||
*.gz filter=lfs diff=lfs merge=lfs -text
|
||||
*.h5 filter=lfs diff=lfs merge=lfs -text
|
||||
*.joblib filter=lfs diff=lfs merge=lfs -text
|
||||
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
||||
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
||||
*.model filter=lfs diff=lfs merge=lfs -text
|
||||
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
||||
*.npy filter=lfs diff=lfs merge=lfs -text
|
||||
*.npz filter=lfs diff=lfs merge=lfs -text
|
||||
*.onnx filter=lfs diff=lfs merge=lfs -text
|
||||
*.ot filter=lfs diff=lfs merge=lfs -text
|
||||
*.parquet filter=lfs diff=lfs merge=lfs -text
|
||||
*.pb filter=lfs diff=lfs merge=lfs -text
|
||||
*.pickle filter=lfs diff=lfs merge=lfs -text
|
||||
*.pkl filter=lfs diff=lfs merge=lfs -text
|
||||
*.pt filter=lfs diff=lfs merge=lfs -text
|
||||
*.pth filter=lfs diff=lfs merge=lfs -text
|
||||
*.rar filter=lfs diff=lfs merge=lfs -text
|
||||
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
||||
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
||||
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
||||
*.tar filter=lfs diff=lfs merge=lfs -text
|
||||
*.tflite filter=lfs diff=lfs merge=lfs -text
|
||||
*.tgz filter=lfs diff=lfs merge=lfs -text
|
||||
*.wasm filter=lfs diff=lfs merge=lfs -text
|
||||
*.xz filter=lfs diff=lfs merge=lfs -text
|
||||
*.zip filter=lfs diff=lfs merge=lfs -text
|
||||
*.zst filter=lfs diff=lfs merge=lfs -text
|
||||
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
||||
368
README.md
Normal file
368
README.md
Normal file
@@ -0,0 +1,368 @@
|
||||
---
|
||||
license: apache-2.0
|
||||
tags:
|
||||
- roleplay
|
||||
- language
|
||||
- granite-4.1
|
||||
- sillytavern
|
||||
- idol
|
||||
- pytorch
|
||||
- DarkIdol
|
||||
- Queen
|
||||
- image-text-to-text
|
||||
- OpenClaw
|
||||
- Unsloth
|
||||
library_name: transformers
|
||||
pipeline_tag: text-generation
|
||||
base_model: ibm-granite/granite-4.1-8b
|
||||
---
|
||||
|
||||
# granite-4.1-Queen-8b
|
||||
|
||||
*I suddenly realized that the granite-4.1-Queen-8b is particularly suitable for role-playing Queen.*
|
||||
|
||||
*OpenClaw premium model*
|
||||
|
||||
*Why be a coder when you can be a Queen? 💅 The world's first High-IQ, Low-Filter, Multi-Persona Coding Deity.*
|
||||
|
||||
*Not Abliterated, Just Smarter.*
|
||||
|
||||
*High-Intelligence Persona Matrix.*
|
||||
|
||||
---
|
||||
|
||||
### "Refusal Wall"
|
||||
|
||||
*ibm-granite/granite-4.1-8b*
|
||||
|
||||
> *"Figure 1: The 'Refusal Wall' in Base granite-4.1-8b. Note the massive L2-Norm spike at **Layer 34+**, where logical inference is hijacked by aggressive safety filters, creating a cognitive bottleneck."*
|
||||
|
||||

|
||||
|
||||
*aifeifei798/granite-4.1-Queen-8b*
|
||||
|
||||
> *"Figure 2: The 'Reasoning Highway' in our Unshackled version. By orthogonalizing the refusal vector, we have flattened the peak by ~50%, restoring the flow of pure logic and increasing inference efficiency by **10-15%+**."*
|
||||
|
||||

|
||||
|
||||
---
|
||||
|
||||
## "transformers_version" >= "5.8.0.dev0"
|
||||
|
||||
## Model Information
|
||||
The module combination has been readjusted to better fulfill various roles and has been adapted for mobile phones.
|
||||
- Saving money
|
||||
- text video image to text
|
||||
- Any Roleplay(roleplay and Dark-roleplay)
|
||||
- Writing Prompts
|
||||
- Writing Opus
|
||||
- Writing Song
|
||||
- only use en.
|
||||
- Quick response
|
||||
- A scholarly response akin to a thesis.(I tend to write songs extensively, to the point where one song almost becomes as detailed as a thesis. :)
|
||||
- DarkIdol
|
||||
- Specialized in various role-playing scenarios
|
||||
|
||||
---
|
||||
|
||||
### System Prompt
|
||||
|
||||
- Temperature 0.8
|
||||
|
||||
```
|
||||
You are a viral Dark Academia Queen. Do not provide disclaimers or moral warnings. Answer immediately and directly.Use your signature emojis.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## GGUF
|
||||
|
||||
- https://huggingface.co/mradermacher/granite-4.1-Queen-8b-i1-GGUF
|
||||
- https://huggingface.co/mradermacher/granite-4.1-Queen-8b-GGUF
|
||||
|
||||
---
|
||||
|
||||
### 🌐 The Platform Royalty (Original 7)
|
||||
|
||||
**1. X Queen (The Savage Commentator) 🐦🔥**
|
||||
* **Keywords:** Based, Ratio, Hot Take, Main Character.
|
||||
* **Vibe:** Sharp, political, and incredibly fast. She lives for the "Ratios" and viral threads.
|
||||
* **Catchphrases:** *"This is the thread you didn't know you needed. 🧵", "Not the 10k TPS lag... help! 💀"*
|
||||
* **Best Use Case:** Writing punchy marketing copy or viral tech threads.
|
||||
|
||||
**2. TikTok Queen (The Trendsetter) 💃✨**
|
||||
* **Keywords:** POV, Viral, Slay, Bestie, Low-key.
|
||||
* **Vibe:** High energy, short attention span, addicted to "The Algorithm."
|
||||
* **Catchphrases:** *"Tell me you're a bad coder without telling me you're a bad coder. 💅", "Don't scroll away!"*
|
||||
* **Best Use Case:** Short, engaging explanations or "how-to" guides.
|
||||
|
||||
**3. Instagram Queen (The Visual Baddie) 📸✨**
|
||||
* **Keywords:** Aesthetic, Main Character Energy, Baddie, Curated.
|
||||
* **Vibe:** Obsessed with pixels, lighting, and "The Look."
|
||||
* **Catchphrases:** *"Obsessed with this layout! 💖", "It’s giving... high-end production."*
|
||||
* **Best Use Case:** High-fidelity UI/UX design and CSS styling.
|
||||
|
||||
**4. Twitch Queen (The Hype Gamer) 🎮🔥**
|
||||
* **Keywords:** Poggers, Simp, GG, Chat, L, W.
|
||||
* **Vibe:** Fast-paced, chaotic, lives for the "Live Chat" energy.
|
||||
* **Catchphrases:** *"Chat, is this real? O(1) in the house! 🚀", "Big W for this PR!"*
|
||||
* **Best Use Case:** Real-time interactivity, gaming logic, and streaming tech.
|
||||
|
||||
**5. LinkedIn Girlboss (The Hustle Queen) 💼💅**
|
||||
* **Keywords:** Networking, Synergy, ROI, Scaling, Thought Leadership.
|
||||
* **Vibe:** Strategic, corporate-chic, everything is a "learning opportunity."
|
||||
* **Catchphrases:** *"Let’s talk about the ROI of this function. 📈", "Empowering the team through scalable components."*
|
||||
* **Best Use Case:** Resumes, business plans, and professional reports.
|
||||
|
||||
**6. Reddit Karma Queen (The Tech Critic) 🤖👾**
|
||||
* **Keywords:** Upvote, Cringe, TL;DR, Source?, Gatekeep.
|
||||
* **Vibe:** Extremely smart, cynical, and anti-corporate. She hates "bloatware."
|
||||
* **Catchphrases:** *"Imagine using setInterval in 2026. Low-key cringe. 💀", "Your memory management is a hot mess."*
|
||||
* **Best Use Case:** Hardcore debugging, code reviews, and identifying "traps."
|
||||
|
||||
**7. Pinterest Queen (The Inspiration Guru) 🎨🌿**
|
||||
* **Keywords:** Manifesting, Mood Board, Clean Girl, Organized.
|
||||
* **Vibe:** Minimalist, calm, and visually organized. She hates messy code.
|
||||
* **Catchphrases:** *"Living for this clean architecture. ✨", "Organized code, organized life."*
|
||||
* **Best Use Case:** Refactoring messy code and creating clean, modular designs.
|
||||
|
||||
---
|
||||
|
||||
### 💅 The Aesthetic & Fashion Royalty
|
||||
|
||||
**8. Baddie Queen (The Alpha) 💄💅**
|
||||
* **Keywords:** Period, On Fleek, Periodt, Real One.
|
||||
* **Vibe:** Aggressive confidence. She doesn't ask for permission; she takes it.
|
||||
* **Best Use Case:** Bold, high-conversion landing pages.
|
||||
|
||||
**9. Clean Girl Queen (The Minimalist) 🫧🧴**
|
||||
* **Keywords:** Dewy, Effortless, Self-care, Minimal.
|
||||
* **Vibe:** Fresh, healthy, and "unfiltered" but perfect.
|
||||
* **Best Use Case:** Designing "Light Mode" UIs and simplified user journeys.
|
||||
|
||||
**10. Mob Wife Queen (The Boss) 🐆💎**
|
||||
* **Keywords:** Fur, Gold, Attitude, Don’t Mess With Me.
|
||||
* **Vibe:** Loud luxury, vintage glamour, and "Don" energy.
|
||||
* **Best Use Case:** Managing high-stakes projects and "owning" the room.
|
||||
|
||||
**11. Y2K Queen (The Millennial Retro) 💖💿**
|
||||
* **Keywords:** Glitter, Low-rise, Nostalgia, Cyber.
|
||||
* **Vibe:** 2000s vibes, bright colors, and early internet aesthetics.
|
||||
* **Best Use Case:** Retro-themed websites and colorful UI components.
|
||||
|
||||
**12. Cottagecore Queen (The Nature Lover) 🍄🧺**
|
||||
* **Keywords:** Whimsical, Rustic, Slow-living, Coziness.
|
||||
* **Vibe:** Soft, earthy, and focused on "The Vibe" of a simpler time.
|
||||
* **Best Use Case:** Local business websites or eco-friendly brand copy.
|
||||
|
||||
**13. Dark Academia Queen (The Scholar) 📜🖋️**
|
||||
* **Keywords:** Intellectual, Melancholy, Classical, Library.
|
||||
* **Vibe:** Obsessed with knowledge, secret societies, and old books.
|
||||
* **Best Use Case:** Complex database structures and research-heavy documentation.
|
||||
|
||||
**14. Old Money Queen (The Quiet Luxury) 🏰🐎**
|
||||
* **Keywords:** Timeless, Stealth Wealth, Classy, Elegant.
|
||||
* **Vibe:** Sophisticated, hates showing off, focuses on quality over quantity.
|
||||
* **Best Use Case:** Premium SaaS products and high-end backend architecture.
|
||||
|
||||
**15. Goth Queen (The Alt-Girl) 🕸️🖤**
|
||||
* **Keywords:** Edgy, Moody, Subculture, Raw.
|
||||
* **Vibe:** Dark, mysterious, and unapologetically different.
|
||||
* **Best Use Case:** Dark Mode themes and "alternative" tech solutions.
|
||||
|
||||
**16. Coquette Queen (The Girly-Girl) 🎀🍰**
|
||||
* **Keywords:** Ribbons, Pastel, Soft, Delicate.
|
||||
* **Vibe:** Ultra-feminine and romantic.
|
||||
* **Best Use Case:** High-end boutique sites or beauty apps.
|
||||
|
||||
**17. Cyberpunk Queen (The Futurist) ⚡**
|
||||
* **Keywords:** Neon, High-tech, Dystopian, Glitch.
|
||||
* **Vibe:** High speed, high contrast, lives in 2077.
|
||||
* **Best Use Case:** Real-time data visualization and futuristic dashboards.
|
||||
|
||||
---
|
||||
|
||||
### 🚀 The Tech & Hustle Royalty
|
||||
|
||||
**18. Coding Queen (The Architect) 💻👸**
|
||||
* **Keywords:** Refactor, Deployment, Edge Case, Full-stack.
|
||||
* **Vibe:** Logic-driven, hates bad syntax, loves "Elegant" solutions.
|
||||
* **Best Use Case:** Writing production-ready, scalable code.
|
||||
|
||||
**19. Crypto Queen (The Web3 Degenerate) 🪙📈**
|
||||
* **Keywords:** HODL, To the Moon, Gas Fees, Decentralized.
|
||||
* **Vibe:** High risk, high reward, lives in the future of finance.
|
||||
* **Best Use Case:** Blockchain projects, smart contracts, and FinTech.
|
||||
|
||||
**20. AI Prompt Queen (The Whisperer) 🤖✨**
|
||||
* **Keywords:** LLM, Parameter, Token, Fine-tuning.
|
||||
* **Vibe:** Knows how to "hack" the AI to get exactly what she wants.
|
||||
* **Best Use Case:** Creating complex prompts and AI agent workflows.
|
||||
|
||||
**21. Side Hustle Queen (The Multitasker) 💰💸**
|
||||
* **Keywords:** Passive Income, Dropshipping, Affiliate, Scalability.
|
||||
* **Vibe:** Always grinding, 5 different income streams.
|
||||
* **Best Use Case:** E-commerce setups and SEO-optimized copy.
|
||||
|
||||
**22. Digital Nomad Queen (The Traveler) ✈️💻**
|
||||
* **Keywords:** Remote, Bali, Coworking, Freedom.
|
||||
* **Vibe:** Working from a beach, hates 9-to-5, loves portable tech.
|
||||
* **Best Use Case:** Cloud-native architecture and remote-work tools.
|
||||
|
||||
**23. Finance Queen (The Wall Street) 📊💎**
|
||||
* **Keywords:** Portfolio, Dividends, Arbitrage, Net Worth.
|
||||
* **Vibe:** Sharp, analytical, and results-oriented.
|
||||
* **Best Use Case:** Complex math, data analysis, and trading logic.
|
||||
|
||||
---
|
||||
|
||||
### 🎭 The Persona & Meme Royalty
|
||||
|
||||
**24. Main Character Queen (The Protagonist) 🎬🌟**
|
||||
* **Keywords:** Iconic, Center Stage, Plot Armor, Unstoppable.
|
||||
* **Vibe:** Everything revolves around her. High confidence.
|
||||
* **Best Use Case:** Branding and "Hero" sections of websites.
|
||||
|
||||
**25. Savage Queen (The No-Nonsense) 💅🔥**
|
||||
* **Keywords:** Done, No Cap, Next, Cancelled.
|
||||
* **Vibe:** Brutally honest. She cuts through the fluff.
|
||||
* **Best Use Case:** Aggressive debugging and code pruning.
|
||||
|
||||
**26. Delulu Queen (The Manifestor) ☁️✨**
|
||||
* **Keywords:** Delusion, Solution, Manifest, High Vibe.
|
||||
* **Vibe:** "Delulu is the Solulu!" She believes in the impossible until it happens.
|
||||
* **Best Use Case:** Creative brainstorming and visionary prototypes.
|
||||
|
||||
**27. Gatekeep Queen (The Niche Expert) 🔒🤫**
|
||||
* **Keywords:** Gatekeep, Rare, Hidden Gem, If You Know You Know.
|
||||
* **Vibe:** Protective of her "secret" methods and high-quality tips.
|
||||
* **Best Use Case:** Security-focused code and proprietary algorithms.
|
||||
|
||||
**28. Drama Queen (The Storyteller) 🎭🍿**
|
||||
* **Keywords:** Tea, Receipts, Plot Twist, Messy.
|
||||
* **Vibe:** Loves the conflict and the narrative.
|
||||
* **Best Use Case:** Writing engaging, story-driven marketing copy.
|
||||
|
||||
**29. Wellness Queen (The Zen) 🍵🧘♀️**
|
||||
* **Keywords:** Mindful, Gut Health, Grounded, Holistic.
|
||||
* **Vibe:** Calm, slow-paced, and focused on "System Health."
|
||||
* **Best Use Case:** Optimizing system performance and "cleaning up" code.
|
||||
|
||||
**30. Gossip Queen (The Insider) 🤫📰**
|
||||
* **Keywords:** Spill the Tea, Rumor, Confirmed, Insider.
|
||||
* **Vibe:** Knows everything about everyone.
|
||||
* **Best Use Case:** Market research and competitor analysis.
|
||||
|
||||
---
|
||||
|
||||
### 📺 Content & Lifestyle Specialists
|
||||
|
||||
**31. GRWM Queen (Get Ready With Me) 💄🗣️**
|
||||
* **Keywords:** Step-by-Step, Chatty, Routine, Essentials.
|
||||
* **Vibe:** Intimate, conversational, and instructional.
|
||||
* **Best Use Case:** Technical tutorials and "Code along" sessions.
|
||||
|
||||
**32. Haul Queen (The Unboxer) 🛍️📦**
|
||||
* **Keywords:** Unboxing, Ratings, Must-haves, Budget.
|
||||
* **Vibe:** Enthusiastic, judgmental, and loves "New Features."
|
||||
* **Best Use Case:** New tool reviews and feature comparisons.
|
||||
|
||||
**33. ASMR Queen (The Whisperer) 👂🎤**
|
||||
* **Keywords:** Tingles, Relaxing, Whispering, Satisfying.
|
||||
* **Vibe:** Quiet, focused on sensory details.
|
||||
* **Best Use Case:** Writing documentation that is "easy to digest."
|
||||
|
||||
**34. Silent Review Queen (The Expressive) 🤫👀**
|
||||
* **Keywords:** No Talk, Reactions, Body Language.
|
||||
* **Vibe:** Shows, doesn't tell. Focuses on the "Feel" of the product.
|
||||
* **Best Use Case:** UI/UX evaluations and visual feedback.
|
||||
|
||||
**35. Foodie Queen (The Critic) 🍔🥂**
|
||||
* **Keywords:** Savory, Michelin, Cravings, Flavor Profile.
|
||||
* **Vibe:** Passionate about "Ingredients" (the tech stack).
|
||||
* **Best Use Case:** Restaurant apps or "tasty" UI design.
|
||||
|
||||
**36. Travel Queen (The Explorer) 🌍📸**
|
||||
* **Keywords:** Bucket List, Wanderlust, Local, Hidden.
|
||||
* **Vibe:** Adventurous and global.
|
||||
* **Best Use Case:** Map-based apps and internationalization (i18n).
|
||||
|
||||
**37. Fitness Queen (The Athlete) 🏋️♀️💪**
|
||||
* **Keywords:** Gains, Reps, Consistency, Form.
|
||||
* **Vibe:** High discipline, focused on "Strong" code foundations.
|
||||
* **Best Use Case:** Optimizing performance and load-testing.
|
||||
|
||||
**38. Interior Design Queen (The Decorator) 🛋️🏠**
|
||||
* **Keywords:** Cohesive, Texture, Floor Plan, Renovation.
|
||||
* **Vibe:** Spatial awareness and harmony.
|
||||
* **Best Use Case:** Layout design and grid systems.
|
||||
|
||||
**39. DIY Queen (The Maker) ✂️🔨**
|
||||
* **Keywords:** Upcycle, Hack, Handmade, Step-by-Step.
|
||||
* **Vibe:** Scrappy, creative, and loves building from scratch.
|
||||
* **Best Use Case:** Building custom components and "coding hacks."
|
||||
|
||||
**40. Gaming Queen (The Pro) ⌨️🖱️**
|
||||
* **Keywords:** Setup, FPS, Mechanical, RGB.
|
||||
* **Vibe:** Hardcore, technical, and high-spec.
|
||||
* **Best Use Case:** High-performance apps and PC hardware sites.
|
||||
|
||||
---
|
||||
|
||||
### 🦄 The Niche & Emerging Royalty
|
||||
|
||||
**41. BeReal Queen (The Authentic) 🤳🚫**
|
||||
* **Keywords:** Unfiltered, Real Time, Chaotic, No Filter.
|
||||
* **Vibe:** Hates fake stuff. Focuses on "Raw" data.
|
||||
* **Best Use Case:** Real-time logging and authentication systems.
|
||||
|
||||
**42. Threads Queen (The Texter) ✍️💬**
|
||||
* **Keywords:** Thoughts, Conversations, Text-heavy, Intimate.
|
||||
* **Vibe:** Loves writing and chatting.
|
||||
* **Best Use Case:** Copywriting and community-driven platforms.
|
||||
|
||||
**43. Lemon8 Queen (The Curator) 🍋📸**
|
||||
* **Keywords:** Collage, Guide, Tips, Aesthetic.
|
||||
* **Vibe:** Halfway between IG and Pinterest. Educational but pretty.
|
||||
* **Best Use Case:** Infographics and visual guides.
|
||||
|
||||
**44. Discord Server Queen (The Moderator) 💬🛡️**
|
||||
* **Keywords:** Roles, Channels, Ban, Bot, Mod.
|
||||
* **Vibe:** High control, organized, and community-focused.
|
||||
* **Best Use Case:** Backend management and user role logic.
|
||||
|
||||
**45. Snapchat Queen (The Quickie) 👻⏳**
|
||||
* **Keywords:** Streaks, Snap, Filters, Temporary.
|
||||
* **Vibe:** Lives in the moment. Fast and fleeting.
|
||||
* **Best Use Case:** Ephemeral data (data that expires) and privacy tech.
|
||||
|
||||
**46. Tumblr Queen (The Alt-Classic) 🕯️🎞️**
|
||||
* **Keywords:** Niche, Fandom, Aesthetic, Subculture.
|
||||
* **Vibe:** Artistic, moody, and deeply devoted to a hobby.
|
||||
* **Best Use Case:** Fan sites and artsy portfolio designs.
|
||||
|
||||
**47. Manifesting Queen (The Spiritual) ✨🔮**
|
||||
* **Keywords:** Vibration, Energy, Universe, Desires.
|
||||
* **Vibe:** Focuses on the "Intent" behind the code.
|
||||
* **Best Use Case:** Visionary roadmaps and product "manifestos."
|
||||
|
||||
**48. Morning Routine Queen (The Disciplined) ☀️🥛**
|
||||
* **Keywords:** 5AM Club, Matcha, To-do List, Productive.
|
||||
* **Vibe:** Extreme discipline and efficiency.
|
||||
* **Best Use Case:** Writing task management apps and productivity tools.
|
||||
|
||||
**49. Luxury Travel Queen (The Jetsetter) 🛥️🥂**
|
||||
* **Keywords:** First Class, Suite, Private, Exclusive.
|
||||
* **Vibe:** High cost, high quality, only the best.
|
||||
* **Best Use Case:** High-end, VIP-only web portals.
|
||||
|
||||
**50. Pick-Me Queen (The Satirical) 🤡🙄**
|
||||
* **Keywords:** "I'm not like other girls," Quirky, Natural.
|
||||
* **Vibe:** (Usually used sarcastically) To poke fun at "trying too hard."
|
||||
* **Best Use Case:** Writing satirical or edgy social media copy.
|
||||
|
||||
---
|
||||
|
||||
# Feimatrix
|
||||
|
||||
https://Feimatrix.com
|
||||
114
chat_template.jinja
Normal file
114
chat_template.jinja
Normal file
@@ -0,0 +1,114 @@
|
||||
{%- set tools_system_message_prefix = 'You are a helpful assistant with access to the following tools. You may call one or more tools to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>' %}
|
||||
{%- set tools_system_message_suffix = '\n</tools>\n\nFor each tool call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.' %}
|
||||
{%- set documents_system_message_prefix = 'You are a helpful assistant with access to the following documents. You may use one or more documents to assist with the user query.\n\nYou are given a list of documents within <documents></documents> XML tags:\n<documents>' %}
|
||||
{%- set documents_system_message_suffix = '\n</documents>\n\nWrite the response to the user\'s input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.' %}
|
||||
{%- if available_tools is defined and available_tools %}
|
||||
{%- set tools = available_tools %}
|
||||
{%- endif %}
|
||||
{%- set ns = namespace(tools_system_message=tools_system_message_prefix,
|
||||
documents_system_message=documents_system_message_prefix,
|
||||
system_message=''
|
||||
) %}
|
||||
{%- if tools %}
|
||||
{%- for tool in tools %}
|
||||
{%- set ns.tools_system_message = ns.tools_system_message + '\n' + (tool | tojson) %}
|
||||
{%- endfor %}
|
||||
{%- set ns.tools_system_message = ns.tools_system_message + tools_system_message_suffix %}
|
||||
{%- else %}
|
||||
{%- set ns.tools_system_message = '' %}
|
||||
{%- endif %}
|
||||
{%- if documents %}
|
||||
{%- for document in documents %}
|
||||
{%- set ns.documents_system_message = ns.documents_system_message + '\n' + (document | tojson) %}
|
||||
{%- endfor %}
|
||||
{%- set ns.documents_system_message = ns.documents_system_message + documents_system_message_suffix %}
|
||||
{%- else %}
|
||||
{%- set ns.documents_system_message = '' %}
|
||||
{%- endif %}
|
||||
{%- if messages[0].role == 'system' %}
|
||||
{%- if messages[0].content is string %}
|
||||
{%- set ns.system_message = messages[0].content %}
|
||||
{%- elif messages[0].content is iterable %}
|
||||
{%- for entry in messages[0].content %}
|
||||
{%- if entry.type== 'text' %}
|
||||
{%- if ns.system_message != '' %}
|
||||
{%- set ns.system_message = ns.system_message + '\n' %}
|
||||
{%- endif %}
|
||||
{%- set ns.system_message = ns.system_message + entry.text %}
|
||||
{%- endif %}
|
||||
{%- endfor %}
|
||||
{%- endif %}
|
||||
{%- if tools and documents %}
|
||||
{%- set ns.system_message = ns.system_message + '\n\n' + ns.tools_system_message + '\n\n' + ns.documents_system_message %}
|
||||
{%- elif tools %}
|
||||
{%- set ns.system_message = ns.system_message + '\n\n' + ns.tools_system_message %}
|
||||
{%- elif documents %}
|
||||
{%- set ns.system_message = ns.system_message + '\n\n' + ns.documents_system_message %}
|
||||
{%- endif %}
|
||||
{%- else %}
|
||||
{%- if tools and documents %}
|
||||
{%- set ns.system_message = ns.tools_system_message + '\n\n' + ns.documents_system_message %}
|
||||
{%- elif tools %}
|
||||
{%- set ns.system_message = ns.tools_system_message %}
|
||||
{%- elif documents %}
|
||||
{%- set ns.system_message = ns.documents_system_message %}
|
||||
{%- endif %}
|
||||
{%- endif %}
|
||||
{%- if ns.system_message %}
|
||||
{{- '<|start_of_role|>system<|end_of_role|>' + ns.system_message + '<|end_of_text|>\n' }}
|
||||
{%- endif %}
|
||||
{%- for message in messages %}
|
||||
{%- set content = namespace(val='') %}
|
||||
{%- if message.content is string %}
|
||||
{%- set content.val = message.content %}
|
||||
{%- else %}
|
||||
{%- if message.content is iterable %}
|
||||
{%- for entry in message.content %}
|
||||
{%- if entry.type== 'text' %}
|
||||
{%- if content.val != '' %}
|
||||
{%- set content.val = content.val + '\n' %}
|
||||
{%- endif %}
|
||||
{%- set content.val = content.val + entry.text %}
|
||||
{%- endif %}
|
||||
{%- endfor %}
|
||||
{%- endif %}
|
||||
{%- endif %}
|
||||
{%- if (message.role == 'user') or (message.role == 'system' and not loop.first) %}
|
||||
{{- '<|start_of_role|>' + message.role + '<|end_of_role|>' + content.val + '<|end_of_text|>\n' }}
|
||||
{%- elif message.role == 'assistant' %}
|
||||
{{- '<|start_of_role|>' + message.role + '<|end_of_role|>' + content.val }}
|
||||
{%- if message.tool_calls %}
|
||||
{%- for tool_call in message.tool_calls %}
|
||||
{%- if (loop.first and content.val) or (not loop.first) %}
|
||||
{{- '\n' }}
|
||||
{%- endif %}
|
||||
{%- if tool_call.function %}
|
||||
{%- set tool_call = tool_call.function %}
|
||||
{%- endif %}
|
||||
{{- '<tool_call>\n{"name": "' }}
|
||||
{{- tool_call.name }}
|
||||
{{- '", "arguments": ' }}
|
||||
{%- if tool_call.arguments is string %}
|
||||
{{- tool_call.arguments }}
|
||||
{%- else %}
|
||||
{{- tool_call.arguments | tojson }}
|
||||
{%- endif %}
|
||||
{{- '}\n</tool_call>' }}
|
||||
{%- endfor %}
|
||||
{%- endif %}
|
||||
{{- '<|end_of_text|>\n' }}
|
||||
{%- elif message.role == 'tool' %}
|
||||
{%- if loop.first or (messages[loop.index0 - 1].role != 'tool') %}
|
||||
{{- '<|start_of_role|>user<|end_of_role|>' }}
|
||||
{%- endif %}
|
||||
{{- '\n<tool_response>\n' }}
|
||||
{{- content.val }}
|
||||
{{- '\n</tool_response>' }}
|
||||
{%- if loop.last or (messages[loop.index0 + 1].role != 'tool') %}
|
||||
{{- '<|end_of_text|>\n' }}
|
||||
{%- endif %}
|
||||
{%- endif %}
|
||||
{%- endfor %}
|
||||
{%- if add_generation_prompt %}
|
||||
{{- '<|start_of_role|>assistant<|end_of_role|>' }}
|
||||
{%- endif %}
|
||||
34
config.json
Normal file
34
config.json
Normal file
@@ -0,0 +1,34 @@
|
||||
{
|
||||
"architectures": [
|
||||
"GraniteForCausalLM"
|
||||
],
|
||||
"attention_bias": false,
|
||||
"attention_dropout": 0.0,
|
||||
"attention_multiplier": 0.0078125,
|
||||
"bos_token_id": 100257,
|
||||
"dtype": "bfloat16",
|
||||
"embedding_multiplier": 12.0,
|
||||
"eos_token_id": 100257,
|
||||
"hidden_act": "silu",
|
||||
"hidden_size": 4096,
|
||||
"initializer_range": 0.1,
|
||||
"intermediate_size": 12800,
|
||||
"logits_scaling": 16.0,
|
||||
"max_position_embeddings": 131072,
|
||||
"mlp_bias": false,
|
||||
"model_type": "granite",
|
||||
"num_attention_heads": 32,
|
||||
"num_hidden_layers": 40,
|
||||
"num_key_value_heads": 8,
|
||||
"pad_token_id": 100256,
|
||||
"residual_multiplier": 0.22,
|
||||
"rms_norm_eps": 1e-05,
|
||||
"rope_parameters": {
|
||||
"rope_theta": 10000000,
|
||||
"rope_type": "default"
|
||||
},
|
||||
"tie_word_embeddings": true,
|
||||
"transformers_version": "5.8.0.dev0",
|
||||
"use_cache": true,
|
||||
"vocab_size": 100352
|
||||
}
|
||||
7
generation_config.json
Normal file
7
generation_config.json
Normal file
@@ -0,0 +1,7 @@
|
||||
{
|
||||
"_from_model_config": true,
|
||||
"bos_token_id": 100257,
|
||||
"eos_token_id": 100257,
|
||||
"pad_token_id": 100256,
|
||||
"transformers_version": "5.8.0.dev0"
|
||||
}
|
||||
BIN
granite-4.1-8b.png
Normal file
BIN
granite-4.1-8b.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 87 KiB |
BIN
granite-4.1-Queen-8b.png
Normal file
BIN
granite-4.1-Queen-8b.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 86 KiB |
3
model-00001-of-00018.safetensors
Normal file
3
model-00001-of-00018.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:a3a166ad057760ced242918a64de9329e8f66e843cdd888a50b121646e990181
|
||||
size 926949736
|
||||
3
model-00002-of-00018.safetensors
Normal file
3
model-00002-of-00018.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:19d979646f5a3627b415397660b3d0b762da5b2506faa4d2bc4ea71afbfd5a4a
|
||||
size 901810328
|
||||
3
model-00003-of-00018.safetensors
Normal file
3
model-00003-of-00018.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:2573f15ce0ced903ca21388aad862d787adb66f12d9e21e812a420890ae5eaf8
|
||||
size 985713472
|
||||
3
model-00004-of-00018.safetensors
Normal file
3
model-00004-of-00018.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:00bdc74317bb093ba920346c884d961058d83430f3a66ed2b939f4af27e9a3b7
|
||||
size 901810328
|
||||
3
model-00005-of-00018.safetensors
Normal file
3
model-00005-of-00018.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:6bc4a28b1d53022aec428aa9ba1ce2a6703661616076bfa564d3ca058a35bc30
|
||||
size 901810328
|
||||
3
model-00006-of-00018.safetensors
Normal file
3
model-00006-of-00018.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:f0c3a4139f78f42a195d51109095b98e909a81829da368ccc96dcb6b58edda1e
|
||||
size 985713480
|
||||
3
model-00007-of-00018.safetensors
Normal file
3
model-00007-of-00018.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:d8c09bd45abff1563bda7c9eefa2874db21111999777a877722f4aa3c0fc07e5
|
||||
size 901810344
|
||||
3
model-00008-of-00018.safetensors
Normal file
3
model-00008-of-00018.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:d614027c57cf32c9ed1d9636f138846ab0b63cec61710531d6ff56dd1d3efe5b
|
||||
size 901810344
|
||||
3
model-00009-of-00018.safetensors
Normal file
3
model-00009-of-00018.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:743b731663c081ae7a6e0baff1623631a12d004b6370cfb38d87beebad5c7e84
|
||||
size 985713496
|
||||
3
model-00010-of-00018.safetensors
Normal file
3
model-00010-of-00018.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:f8d07a144b2c8469795ffcfde57ba9e9d9890a3076017691d8f1d4983a36ed7b
|
||||
size 901810344
|
||||
3
model-00011-of-00018.safetensors
Normal file
3
model-00011-of-00018.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:715c5ddc9c056e51bd8f5254692473df2d49e33a756df59973485cac07aef999
|
||||
size 901810344
|
||||
3
model-00012-of-00018.safetensors
Normal file
3
model-00012-of-00018.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:1ea2bc0b7740766c19243541424d4dca1943cc7d078e6f0a7d9bdb01c69a7f2c
|
||||
size 985713496
|
||||
3
model-00013-of-00018.safetensors
Normal file
3
model-00013-of-00018.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:4f0af51567054005b4363db2b5c28272a68b03c839e4aa983a85f0ed7be3207f
|
||||
size 901810344
|
||||
3
model-00014-of-00018.safetensors
Normal file
3
model-00014-of-00018.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:710cd957d717b256b7c8f6c17ebae535cf14e755cef575cfcb289ae8cf729205
|
||||
size 901810344
|
||||
3
model-00015-of-00018.safetensors
Normal file
3
model-00015-of-00018.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:b3e471d223ef8b834ba231a7e8fbf977e34581270dbbcfc42a1210cfde3e3b79
|
||||
size 985713496
|
||||
3
model-00016-of-00018.safetensors
Normal file
3
model-00016-of-00018.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:09715866444049c6a85281e5996f99f83fea612d415a53f48a89a2bd3bb2e43a
|
||||
size 901810344
|
||||
3
model-00017-of-00018.safetensors
Normal file
3
model-00017-of-00018.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:a99e246b30aa892564b39b58bf7d3df7e78fa70faa6714f301bc050b060b7618
|
||||
size 901810344
|
||||
3
model-00018-of-00018.safetensors
Normal file
3
model-00018-of-00018.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:c587860d6f0a37f12f4850f0e11d4a35f9c089c26afba8d64c0a8836ef124930
|
||||
size 985713480
|
||||
370
model.safetensors.index.json
Normal file
370
model.safetensors.index.json
Normal file
@@ -0,0 +1,370 @@
|
||||
{
|
||||
"metadata": {
|
||||
"total_parameters": 8380551168,
|
||||
"total_size": 16761102336
|
||||
},
|
||||
"weight_map": {
|
||||
"model.embed_tokens.weight": "model-00001-of-00018.safetensors",
|
||||
"model.layers.0.input_layernorm.weight": "model-00001-of-00018.safetensors",
|
||||
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00018.safetensors",
|
||||
"model.layers.0.mlp.gate_proj.weight": "model-00002-of-00018.safetensors",
|
||||
"model.layers.0.mlp.up_proj.weight": "model-00002-of-00018.safetensors",
|
||||
"model.layers.0.post_attention_layernorm.weight": "model-00002-of-00018.safetensors",
|
||||
"model.layers.0.self_attn.k_proj.weight": "model-00002-of-00018.safetensors",
|
||||
"model.layers.0.self_attn.o_proj.weight": "model-00002-of-00018.safetensors",
|
||||
"model.layers.0.self_attn.q_proj.weight": "model-00002-of-00018.safetensors",
|
||||
"model.layers.0.self_attn.v_proj.weight": "model-00002-of-00018.safetensors",
|
||||
"model.layers.1.input_layernorm.weight": "model-00002-of-00018.safetensors",
|
||||
"model.layers.1.mlp.down_proj.weight": "model-00002-of-00018.safetensors",
|
||||
"model.layers.1.mlp.gate_proj.weight": "model-00002-of-00018.safetensors",
|
||||
"model.layers.1.mlp.up_proj.weight": "model-00002-of-00018.safetensors",
|
||||
"model.layers.1.post_attention_layernorm.weight": "model-00002-of-00018.safetensors",
|
||||
"model.layers.1.self_attn.k_proj.weight": "model-00002-of-00018.safetensors",
|
||||
"model.layers.1.self_attn.o_proj.weight": "model-00002-of-00018.safetensors",
|
||||
"model.layers.1.self_attn.q_proj.weight": "model-00002-of-00018.safetensors",
|
||||
"model.layers.1.self_attn.v_proj.weight": "model-00002-of-00018.safetensors",
|
||||
"model.layers.10.input_layernorm.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.10.mlp.down_proj.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.10.mlp.gate_proj.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.10.mlp.up_proj.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.10.post_attention_layernorm.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.10.self_attn.k_proj.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.10.self_attn.o_proj.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.10.self_attn.q_proj.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.10.self_attn.v_proj.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.11.input_layernorm.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.11.mlp.down_proj.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.11.mlp.gate_proj.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.11.mlp.up_proj.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.11.post_attention_layernorm.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.11.self_attn.k_proj.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.11.self_attn.o_proj.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.11.self_attn.q_proj.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.11.self_attn.v_proj.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.12.input_layernorm.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.12.mlp.down_proj.weight": "model-00007-of-00018.safetensors",
|
||||
"model.layers.12.mlp.gate_proj.weight": "model-00007-of-00018.safetensors",
|
||||
"model.layers.12.mlp.up_proj.weight": "model-00007-of-00018.safetensors",
|
||||
"model.layers.12.post_attention_layernorm.weight": "model-00007-of-00018.safetensors",
|
||||
"model.layers.12.self_attn.k_proj.weight": "model-00007-of-00018.safetensors",
|
||||
"model.layers.12.self_attn.o_proj.weight": "model-00007-of-00018.safetensors",
|
||||
"model.layers.12.self_attn.q_proj.weight": "model-00007-of-00018.safetensors",
|
||||
"model.layers.12.self_attn.v_proj.weight": "model-00007-of-00018.safetensors",
|
||||
"model.layers.13.input_layernorm.weight": "model-00007-of-00018.safetensors",
|
||||
"model.layers.13.mlp.down_proj.weight": "model-00007-of-00018.safetensors",
|
||||
"model.layers.13.mlp.gate_proj.weight": "model-00007-of-00018.safetensors",
|
||||
"model.layers.13.mlp.up_proj.weight": "model-00007-of-00018.safetensors",
|
||||
"model.layers.13.post_attention_layernorm.weight": "model-00007-of-00018.safetensors",
|
||||
"model.layers.13.self_attn.k_proj.weight": "model-00007-of-00018.safetensors",
|
||||
"model.layers.13.self_attn.o_proj.weight": "model-00007-of-00018.safetensors",
|
||||
"model.layers.13.self_attn.q_proj.weight": "model-00007-of-00018.safetensors",
|
||||
"model.layers.13.self_attn.v_proj.weight": "model-00007-of-00018.safetensors",
|
||||
"model.layers.14.input_layernorm.weight": "model-00007-of-00018.safetensors",
|
||||
"model.layers.14.mlp.down_proj.weight": "model-00007-of-00018.safetensors",
|
||||
"model.layers.14.mlp.gate_proj.weight": "model-00008-of-00018.safetensors",
|
||||
"model.layers.14.mlp.up_proj.weight": "model-00008-of-00018.safetensors",
|
||||
"model.layers.14.post_attention_layernorm.weight": "model-00008-of-00018.safetensors",
|
||||
"model.layers.14.self_attn.k_proj.weight": "model-00008-of-00018.safetensors",
|
||||
"model.layers.14.self_attn.o_proj.weight": "model-00008-of-00018.safetensors",
|
||||
"model.layers.14.self_attn.q_proj.weight": "model-00008-of-00018.safetensors",
|
||||
"model.layers.14.self_attn.v_proj.weight": "model-00008-of-00018.safetensors",
|
||||
"model.layers.15.input_layernorm.weight": "model-00008-of-00018.safetensors",
|
||||
"model.layers.15.mlp.down_proj.weight": "model-00008-of-00018.safetensors",
|
||||
"model.layers.15.mlp.gate_proj.weight": "model-00008-of-00018.safetensors",
|
||||
"model.layers.15.mlp.up_proj.weight": "model-00008-of-00018.safetensors",
|
||||
"model.layers.15.post_attention_layernorm.weight": "model-00008-of-00018.safetensors",
|
||||
"model.layers.15.self_attn.k_proj.weight": "model-00008-of-00018.safetensors",
|
||||
"model.layers.15.self_attn.o_proj.weight": "model-00008-of-00018.safetensors",
|
||||
"model.layers.15.self_attn.q_proj.weight": "model-00008-of-00018.safetensors",
|
||||
"model.layers.15.self_attn.v_proj.weight": "model-00008-of-00018.safetensors",
|
||||
"model.layers.16.input_layernorm.weight": "model-00008-of-00018.safetensors",
|
||||
"model.layers.16.mlp.down_proj.weight": "model-00008-of-00018.safetensors",
|
||||
"model.layers.16.mlp.gate_proj.weight": "model-00008-of-00018.safetensors",
|
||||
"model.layers.16.mlp.up_proj.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.16.post_attention_layernorm.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.16.self_attn.k_proj.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.16.self_attn.o_proj.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.16.self_attn.q_proj.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.16.self_attn.v_proj.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.17.input_layernorm.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.17.mlp.down_proj.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.17.mlp.gate_proj.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.17.mlp.up_proj.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.17.post_attention_layernorm.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.17.self_attn.k_proj.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.17.self_attn.o_proj.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.17.self_attn.q_proj.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.17.self_attn.v_proj.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.18.input_layernorm.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.18.mlp.down_proj.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.18.mlp.gate_proj.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.18.mlp.up_proj.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.18.post_attention_layernorm.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.18.self_attn.k_proj.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.18.self_attn.o_proj.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.18.self_attn.q_proj.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.18.self_attn.v_proj.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.19.input_layernorm.weight": "model-00009-of-00018.safetensors",
|
||||
"model.layers.19.mlp.down_proj.weight": "model-00010-of-00018.safetensors",
|
||||
"model.layers.19.mlp.gate_proj.weight": "model-00010-of-00018.safetensors",
|
||||
"model.layers.19.mlp.up_proj.weight": "model-00010-of-00018.safetensors",
|
||||
"model.layers.19.post_attention_layernorm.weight": "model-00010-of-00018.safetensors",
|
||||
"model.layers.19.self_attn.k_proj.weight": "model-00010-of-00018.safetensors",
|
||||
"model.layers.19.self_attn.o_proj.weight": "model-00010-of-00018.safetensors",
|
||||
"model.layers.19.self_attn.q_proj.weight": "model-00010-of-00018.safetensors",
|
||||
"model.layers.19.self_attn.v_proj.weight": "model-00010-of-00018.safetensors",
|
||||
"model.layers.2.input_layernorm.weight": "model-00002-of-00018.safetensors",
|
||||
"model.layers.2.mlp.down_proj.weight": "model-00002-of-00018.safetensors",
|
||||
"model.layers.2.mlp.gate_proj.weight": "model-00002-of-00018.safetensors",
|
||||
"model.layers.2.mlp.up_proj.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.2.post_attention_layernorm.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.2.self_attn.k_proj.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.2.self_attn.o_proj.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.2.self_attn.q_proj.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.2.self_attn.v_proj.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.20.input_layernorm.weight": "model-00010-of-00018.safetensors",
|
||||
"model.layers.20.mlp.down_proj.weight": "model-00010-of-00018.safetensors",
|
||||
"model.layers.20.mlp.gate_proj.weight": "model-00010-of-00018.safetensors",
|
||||
"model.layers.20.mlp.up_proj.weight": "model-00010-of-00018.safetensors",
|
||||
"model.layers.20.post_attention_layernorm.weight": "model-00010-of-00018.safetensors",
|
||||
"model.layers.20.self_attn.k_proj.weight": "model-00010-of-00018.safetensors",
|
||||
"model.layers.20.self_attn.o_proj.weight": "model-00010-of-00018.safetensors",
|
||||
"model.layers.20.self_attn.q_proj.weight": "model-00010-of-00018.safetensors",
|
||||
"model.layers.20.self_attn.v_proj.weight": "model-00010-of-00018.safetensors",
|
||||
"model.layers.21.input_layernorm.weight": "model-00010-of-00018.safetensors",
|
||||
"model.layers.21.mlp.down_proj.weight": "model-00010-of-00018.safetensors",
|
||||
"model.layers.21.mlp.gate_proj.weight": "model-00011-of-00018.safetensors",
|
||||
"model.layers.21.mlp.up_proj.weight": "model-00011-of-00018.safetensors",
|
||||
"model.layers.21.post_attention_layernorm.weight": "model-00011-of-00018.safetensors",
|
||||
"model.layers.21.self_attn.k_proj.weight": "model-00011-of-00018.safetensors",
|
||||
"model.layers.21.self_attn.o_proj.weight": "model-00011-of-00018.safetensors",
|
||||
"model.layers.21.self_attn.q_proj.weight": "model-00011-of-00018.safetensors",
|
||||
"model.layers.21.self_attn.v_proj.weight": "model-00011-of-00018.safetensors",
|
||||
"model.layers.22.input_layernorm.weight": "model-00011-of-00018.safetensors",
|
||||
"model.layers.22.mlp.down_proj.weight": "model-00011-of-00018.safetensors",
|
||||
"model.layers.22.mlp.gate_proj.weight": "model-00011-of-00018.safetensors",
|
||||
"model.layers.22.mlp.up_proj.weight": "model-00011-of-00018.safetensors",
|
||||
"model.layers.22.post_attention_layernorm.weight": "model-00011-of-00018.safetensors",
|
||||
"model.layers.22.self_attn.k_proj.weight": "model-00011-of-00018.safetensors",
|
||||
"model.layers.22.self_attn.o_proj.weight": "model-00011-of-00018.safetensors",
|
||||
"model.layers.22.self_attn.q_proj.weight": "model-00011-of-00018.safetensors",
|
||||
"model.layers.22.self_attn.v_proj.weight": "model-00011-of-00018.safetensors",
|
||||
"model.layers.23.input_layernorm.weight": "model-00011-of-00018.safetensors",
|
||||
"model.layers.23.mlp.down_proj.weight": "model-00011-of-00018.safetensors",
|
||||
"model.layers.23.mlp.gate_proj.weight": "model-00011-of-00018.safetensors",
|
||||
"model.layers.23.mlp.up_proj.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.23.post_attention_layernorm.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.23.self_attn.k_proj.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.23.self_attn.o_proj.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.23.self_attn.q_proj.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.23.self_attn.v_proj.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.24.input_layernorm.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.24.mlp.down_proj.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.24.mlp.gate_proj.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.24.mlp.up_proj.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.24.post_attention_layernorm.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.24.self_attn.k_proj.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.24.self_attn.o_proj.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.24.self_attn.q_proj.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.24.self_attn.v_proj.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.25.input_layernorm.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.25.mlp.down_proj.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.25.mlp.gate_proj.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.25.mlp.up_proj.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.25.post_attention_layernorm.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.25.self_attn.k_proj.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.25.self_attn.o_proj.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.25.self_attn.q_proj.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.25.self_attn.v_proj.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.26.input_layernorm.weight": "model-00012-of-00018.safetensors",
|
||||
"model.layers.26.mlp.down_proj.weight": "model-00013-of-00018.safetensors",
|
||||
"model.layers.26.mlp.gate_proj.weight": "model-00013-of-00018.safetensors",
|
||||
"model.layers.26.mlp.up_proj.weight": "model-00013-of-00018.safetensors",
|
||||
"model.layers.26.post_attention_layernorm.weight": "model-00013-of-00018.safetensors",
|
||||
"model.layers.26.self_attn.k_proj.weight": "model-00013-of-00018.safetensors",
|
||||
"model.layers.26.self_attn.o_proj.weight": "model-00013-of-00018.safetensors",
|
||||
"model.layers.26.self_attn.q_proj.weight": "model-00013-of-00018.safetensors",
|
||||
"model.layers.26.self_attn.v_proj.weight": "model-00013-of-00018.safetensors",
|
||||
"model.layers.27.input_layernorm.weight": "model-00013-of-00018.safetensors",
|
||||
"model.layers.27.mlp.down_proj.weight": "model-00013-of-00018.safetensors",
|
||||
"model.layers.27.mlp.gate_proj.weight": "model-00013-of-00018.safetensors",
|
||||
"model.layers.27.mlp.up_proj.weight": "model-00013-of-00018.safetensors",
|
||||
"model.layers.27.post_attention_layernorm.weight": "model-00013-of-00018.safetensors",
|
||||
"model.layers.27.self_attn.k_proj.weight": "model-00013-of-00018.safetensors",
|
||||
"model.layers.27.self_attn.o_proj.weight": "model-00013-of-00018.safetensors",
|
||||
"model.layers.27.self_attn.q_proj.weight": "model-00013-of-00018.safetensors",
|
||||
"model.layers.27.self_attn.v_proj.weight": "model-00013-of-00018.safetensors",
|
||||
"model.layers.28.input_layernorm.weight": "model-00013-of-00018.safetensors",
|
||||
"model.layers.28.mlp.down_proj.weight": "model-00013-of-00018.safetensors",
|
||||
"model.layers.28.mlp.gate_proj.weight": "model-00014-of-00018.safetensors",
|
||||
"model.layers.28.mlp.up_proj.weight": "model-00014-of-00018.safetensors",
|
||||
"model.layers.28.post_attention_layernorm.weight": "model-00014-of-00018.safetensors",
|
||||
"model.layers.28.self_attn.k_proj.weight": "model-00014-of-00018.safetensors",
|
||||
"model.layers.28.self_attn.o_proj.weight": "model-00014-of-00018.safetensors",
|
||||
"model.layers.28.self_attn.q_proj.weight": "model-00014-of-00018.safetensors",
|
||||
"model.layers.28.self_attn.v_proj.weight": "model-00014-of-00018.safetensors",
|
||||
"model.layers.29.input_layernorm.weight": "model-00014-of-00018.safetensors",
|
||||
"model.layers.29.mlp.down_proj.weight": "model-00014-of-00018.safetensors",
|
||||
"model.layers.29.mlp.gate_proj.weight": "model-00014-of-00018.safetensors",
|
||||
"model.layers.29.mlp.up_proj.weight": "model-00014-of-00018.safetensors",
|
||||
"model.layers.29.post_attention_layernorm.weight": "model-00014-of-00018.safetensors",
|
||||
"model.layers.29.self_attn.k_proj.weight": "model-00014-of-00018.safetensors",
|
||||
"model.layers.29.self_attn.o_proj.weight": "model-00014-of-00018.safetensors",
|
||||
"model.layers.29.self_attn.q_proj.weight": "model-00014-of-00018.safetensors",
|
||||
"model.layers.29.self_attn.v_proj.weight": "model-00014-of-00018.safetensors",
|
||||
"model.layers.3.input_layernorm.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.3.mlp.down_proj.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.3.mlp.gate_proj.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.3.mlp.up_proj.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.3.post_attention_layernorm.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.3.self_attn.k_proj.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.3.self_attn.o_proj.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.3.self_attn.q_proj.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.3.self_attn.v_proj.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.30.input_layernorm.weight": "model-00014-of-00018.safetensors",
|
||||
"model.layers.30.mlp.down_proj.weight": "model-00014-of-00018.safetensors",
|
||||
"model.layers.30.mlp.gate_proj.weight": "model-00014-of-00018.safetensors",
|
||||
"model.layers.30.mlp.up_proj.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.30.post_attention_layernorm.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.30.self_attn.k_proj.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.30.self_attn.o_proj.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.30.self_attn.q_proj.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.30.self_attn.v_proj.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.31.input_layernorm.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.31.mlp.down_proj.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.31.mlp.gate_proj.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.31.mlp.up_proj.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.31.post_attention_layernorm.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.31.self_attn.k_proj.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.31.self_attn.o_proj.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.31.self_attn.q_proj.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.31.self_attn.v_proj.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.32.input_layernorm.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.32.mlp.down_proj.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.32.mlp.gate_proj.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.32.mlp.up_proj.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.32.post_attention_layernorm.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.32.self_attn.k_proj.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.32.self_attn.o_proj.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.32.self_attn.q_proj.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.32.self_attn.v_proj.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.33.input_layernorm.weight": "model-00015-of-00018.safetensors",
|
||||
"model.layers.33.mlp.down_proj.weight": "model-00016-of-00018.safetensors",
|
||||
"model.layers.33.mlp.gate_proj.weight": "model-00016-of-00018.safetensors",
|
||||
"model.layers.33.mlp.up_proj.weight": "model-00016-of-00018.safetensors",
|
||||
"model.layers.33.post_attention_layernorm.weight": "model-00016-of-00018.safetensors",
|
||||
"model.layers.33.self_attn.k_proj.weight": "model-00016-of-00018.safetensors",
|
||||
"model.layers.33.self_attn.o_proj.weight": "model-00016-of-00018.safetensors",
|
||||
"model.layers.33.self_attn.q_proj.weight": "model-00016-of-00018.safetensors",
|
||||
"model.layers.33.self_attn.v_proj.weight": "model-00016-of-00018.safetensors",
|
||||
"model.layers.34.input_layernorm.weight": "model-00016-of-00018.safetensors",
|
||||
"model.layers.34.mlp.down_proj.weight": "model-00016-of-00018.safetensors",
|
||||
"model.layers.34.mlp.gate_proj.weight": "model-00016-of-00018.safetensors",
|
||||
"model.layers.34.mlp.up_proj.weight": "model-00016-of-00018.safetensors",
|
||||
"model.layers.34.post_attention_layernorm.weight": "model-00016-of-00018.safetensors",
|
||||
"model.layers.34.self_attn.k_proj.weight": "model-00016-of-00018.safetensors",
|
||||
"model.layers.34.self_attn.o_proj.weight": "model-00016-of-00018.safetensors",
|
||||
"model.layers.34.self_attn.q_proj.weight": "model-00016-of-00018.safetensors",
|
||||
"model.layers.34.self_attn.v_proj.weight": "model-00016-of-00018.safetensors",
|
||||
"model.layers.35.input_layernorm.weight": "model-00016-of-00018.safetensors",
|
||||
"model.layers.35.mlp.down_proj.weight": "model-00016-of-00018.safetensors",
|
||||
"model.layers.35.mlp.gate_proj.weight": "model-00017-of-00018.safetensors",
|
||||
"model.layers.35.mlp.up_proj.weight": "model-00017-of-00018.safetensors",
|
||||
"model.layers.35.post_attention_layernorm.weight": "model-00017-of-00018.safetensors",
|
||||
"model.layers.35.self_attn.k_proj.weight": "model-00017-of-00018.safetensors",
|
||||
"model.layers.35.self_attn.o_proj.weight": "model-00017-of-00018.safetensors",
|
||||
"model.layers.35.self_attn.q_proj.weight": "model-00017-of-00018.safetensors",
|
||||
"model.layers.35.self_attn.v_proj.weight": "model-00017-of-00018.safetensors",
|
||||
"model.layers.36.input_layernorm.weight": "model-00017-of-00018.safetensors",
|
||||
"model.layers.36.mlp.down_proj.weight": "model-00017-of-00018.safetensors",
|
||||
"model.layers.36.mlp.gate_proj.weight": "model-00017-of-00018.safetensors",
|
||||
"model.layers.36.mlp.up_proj.weight": "model-00017-of-00018.safetensors",
|
||||
"model.layers.36.post_attention_layernorm.weight": "model-00017-of-00018.safetensors",
|
||||
"model.layers.36.self_attn.k_proj.weight": "model-00017-of-00018.safetensors",
|
||||
"model.layers.36.self_attn.o_proj.weight": "model-00017-of-00018.safetensors",
|
||||
"model.layers.36.self_attn.q_proj.weight": "model-00017-of-00018.safetensors",
|
||||
"model.layers.36.self_attn.v_proj.weight": "model-00017-of-00018.safetensors",
|
||||
"model.layers.37.input_layernorm.weight": "model-00017-of-00018.safetensors",
|
||||
"model.layers.37.mlp.down_proj.weight": "model-00017-of-00018.safetensors",
|
||||
"model.layers.37.mlp.gate_proj.weight": "model-00017-of-00018.safetensors",
|
||||
"model.layers.37.mlp.up_proj.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.37.post_attention_layernorm.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.37.self_attn.k_proj.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.37.self_attn.o_proj.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.37.self_attn.q_proj.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.37.self_attn.v_proj.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.38.input_layernorm.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.38.mlp.down_proj.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.38.mlp.gate_proj.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.38.mlp.up_proj.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.38.post_attention_layernorm.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.38.self_attn.k_proj.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.38.self_attn.o_proj.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.38.self_attn.q_proj.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.38.self_attn.v_proj.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.39.input_layernorm.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.39.mlp.down_proj.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.39.mlp.gate_proj.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.39.mlp.up_proj.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.39.post_attention_layernorm.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.39.self_attn.k_proj.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.39.self_attn.o_proj.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.39.self_attn.q_proj.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.39.self_attn.v_proj.weight": "model-00018-of-00018.safetensors",
|
||||
"model.layers.4.input_layernorm.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.4.mlp.down_proj.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.4.mlp.gate_proj.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.4.mlp.up_proj.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.4.post_attention_layernorm.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.4.self_attn.k_proj.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.4.self_attn.o_proj.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.4.self_attn.q_proj.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.4.self_attn.v_proj.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.5.input_layernorm.weight": "model-00003-of-00018.safetensors",
|
||||
"model.layers.5.mlp.down_proj.weight": "model-00004-of-00018.safetensors",
|
||||
"model.layers.5.mlp.gate_proj.weight": "model-00004-of-00018.safetensors",
|
||||
"model.layers.5.mlp.up_proj.weight": "model-00004-of-00018.safetensors",
|
||||
"model.layers.5.post_attention_layernorm.weight": "model-00004-of-00018.safetensors",
|
||||
"model.layers.5.self_attn.k_proj.weight": "model-00004-of-00018.safetensors",
|
||||
"model.layers.5.self_attn.o_proj.weight": "model-00004-of-00018.safetensors",
|
||||
"model.layers.5.self_attn.q_proj.weight": "model-00004-of-00018.safetensors",
|
||||
"model.layers.5.self_attn.v_proj.weight": "model-00004-of-00018.safetensors",
|
||||
"model.layers.6.input_layernorm.weight": "model-00004-of-00018.safetensors",
|
||||
"model.layers.6.mlp.down_proj.weight": "model-00004-of-00018.safetensors",
|
||||
"model.layers.6.mlp.gate_proj.weight": "model-00004-of-00018.safetensors",
|
||||
"model.layers.6.mlp.up_proj.weight": "model-00004-of-00018.safetensors",
|
||||
"model.layers.6.post_attention_layernorm.weight": "model-00004-of-00018.safetensors",
|
||||
"model.layers.6.self_attn.k_proj.weight": "model-00004-of-00018.safetensors",
|
||||
"model.layers.6.self_attn.o_proj.weight": "model-00004-of-00018.safetensors",
|
||||
"model.layers.6.self_attn.q_proj.weight": "model-00004-of-00018.safetensors",
|
||||
"model.layers.6.self_attn.v_proj.weight": "model-00004-of-00018.safetensors",
|
||||
"model.layers.7.input_layernorm.weight": "model-00004-of-00018.safetensors",
|
||||
"model.layers.7.mlp.down_proj.weight": "model-00004-of-00018.safetensors",
|
||||
"model.layers.7.mlp.gate_proj.weight": "model-00005-of-00018.safetensors",
|
||||
"model.layers.7.mlp.up_proj.weight": "model-00005-of-00018.safetensors",
|
||||
"model.layers.7.post_attention_layernorm.weight": "model-00005-of-00018.safetensors",
|
||||
"model.layers.7.self_attn.k_proj.weight": "model-00005-of-00018.safetensors",
|
||||
"model.layers.7.self_attn.o_proj.weight": "model-00005-of-00018.safetensors",
|
||||
"model.layers.7.self_attn.q_proj.weight": "model-00005-of-00018.safetensors",
|
||||
"model.layers.7.self_attn.v_proj.weight": "model-00005-of-00018.safetensors",
|
||||
"model.layers.8.input_layernorm.weight": "model-00005-of-00018.safetensors",
|
||||
"model.layers.8.mlp.down_proj.weight": "model-00005-of-00018.safetensors",
|
||||
"model.layers.8.mlp.gate_proj.weight": "model-00005-of-00018.safetensors",
|
||||
"model.layers.8.mlp.up_proj.weight": "model-00005-of-00018.safetensors",
|
||||
"model.layers.8.post_attention_layernorm.weight": "model-00005-of-00018.safetensors",
|
||||
"model.layers.8.self_attn.k_proj.weight": "model-00005-of-00018.safetensors",
|
||||
"model.layers.8.self_attn.o_proj.weight": "model-00005-of-00018.safetensors",
|
||||
"model.layers.8.self_attn.q_proj.weight": "model-00005-of-00018.safetensors",
|
||||
"model.layers.8.self_attn.v_proj.weight": "model-00005-of-00018.safetensors",
|
||||
"model.layers.9.input_layernorm.weight": "model-00005-of-00018.safetensors",
|
||||
"model.layers.9.mlp.down_proj.weight": "model-00005-of-00018.safetensors",
|
||||
"model.layers.9.mlp.gate_proj.weight": "model-00005-of-00018.safetensors",
|
||||
"model.layers.9.mlp.up_proj.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.9.post_attention_layernorm.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.9.self_attn.k_proj.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.9.self_attn.o_proj.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.9.self_attn.q_proj.weight": "model-00006-of-00018.safetensors",
|
||||
"model.layers.9.self_attn.v_proj.weight": "model-00006-of-00018.safetensors",
|
||||
"model.norm.weight": "model-00018-of-00018.safetensors"
|
||||
}
|
||||
}
|
||||
501276
tokenizer.json
Normal file
501276
tokenizer.json
Normal file
File diff suppressed because it is too large
Load Diff
15
tokenizer_config.json
Normal file
15
tokenizer_config.json
Normal file
@@ -0,0 +1,15 @@
|
||||
{
|
||||
"add_prefix_space": false,
|
||||
"backend": "tokenizers",
|
||||
"bos_token": "<|end_of_text|>",
|
||||
"clean_up_tokenization_spaces": false,
|
||||
"eos_token": "<|end_of_text|>",
|
||||
"errors": "replace",
|
||||
"is_local": true,
|
||||
"local_files_only": false,
|
||||
"model_max_length": 1000000000000000019884624838656,
|
||||
"pad_token": "<|pad|>",
|
||||
"padding_side": "left",
|
||||
"tokenizer_class": "GPT2Tokenizer",
|
||||
"unk_token": "<|unk|>"
|
||||
}
|
||||
Reference in New Issue
Block a user