Initialize project; model provided by the ModelHub XC community
Model: jorge-erdb/gpt-oss-20b-sonnet-flavored-gguf Source: Original Platform
40
.gitattributes
vendored
Normal file
@@ -0,0 +1,40 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
gpt-oss-20B-sonnet-flavored-v1-BF16.gguf filter=lfs diff=lfs merge=lfs -text
gpt-oss-20B-sonnet-flavored-v1-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
gpt-oss-20B-sonnet-flavored-v1-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
gpt-oss-20B-sonnet-flavored-v1-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
hf/gpt-oss-20b-sonnet-flavored-v1/tokenizer.json filter=lfs diff=lfs merge=lfs -text
609
README.md
Normal file
@@ -0,0 +1,609 @@
---
license: apache-2.0
pipeline_tag: text-generation
tags:
- sonnet-flavored
- gpt-oss-20b
- gpt-oss
- moe
- mxfp4
- lora
- fine-tune
- sonnet-style
- claude-style
- distilled
- gguf
base_model:
- openai/gpt-oss-20b
language:
- en
---
# gpt-oss-20b-sonnet-flavored-v1

A LoRA fine-tune of [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) that transfers Claude Sonnet 4.6's response style to GPT-OSS-20B. Training targeted final answers only, so the base weights' reasoning and capabilities are retained, along with their flaws.

This repo contains both the merged HF safetensors (in `hf/`) and a GGUF ladder with MXFP4 experts preserved natively across all variants.

---

## What changed and what didn't

This LoRA targets **response style only**: phrasing, tone, structure, register. It does not target reasoning capability, factual knowledge, or numerical calibration. For factual or numerical tasks, the underlying GPT-OSS-20B's calibration limits apply: this model will produce confidently wrong answers in the same situations the base model does, just phrased in Sonnet's voice. Style is not a substitute for capability.

The MoE experts (~19B of 20B parameters) are unchanged from base GPT-OSS-20B. The LoRA modified attention projections only, at rank 16. Reasoning happens through the same expert routing as the base model.
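
To give a sense of how small a rank-16, attention-only adapter is next to the ~19B untouched expert parameters, here is a rough parameter count. The layer dimensions and the projection names (`q_proj`, `k_proj`, `v_proj`, `o_proj`) are assumptions about the GPT-OSS-20B configuration, not values stated in this README, so treat the result as an order-of-magnitude sketch rather than the exact adapter size.

```python
# Rough LoRA size estimate. All dimensions below are ASSUMPTIONS about the
# GPT-OSS-20B config (not stated in this README); adjust them as needed.

def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """A rank-r LoRA on a d_in x d_out linear layer adds A (r x d_in) and B (d_out x r)."""
    return rank * (d_in + d_out)

HIDDEN = 2880      # assumed hidden size
N_HEADS = 64       # assumed number of query heads
N_KV_HEADS = 8     # assumed number of key/value heads (grouped-query attention)
HEAD_DIM = 64      # assumed per-head dimension
N_LAYERS = 24      # assumed number of transformer layers
RANK = 16          # from the README

# (d_in, d_out) per attention projection; the names are hypothetical labels.
ATTENTION_PROJECTIONS = {
    "q_proj": (HIDDEN, N_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "o_proj": (N_HEADS * HEAD_DIM, HIDDEN),
}

def total_lora_params(rank: int = RANK) -> int:
    """Sum the adapter parameters over all projections and layers."""
    per_layer = sum(
        lora_param_count(d_in, d_out, rank)
        for d_in, d_out in ATTENTION_PROJECTIONS.values()
    )
    return per_layer * N_LAYERS

if __name__ == "__main__":
    n = total_lora_params()
    print(f"LoRA parameters: {n:,} (~{n / 19e9:.4%} of the ~19B expert parameters)")
```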

---

## Response Comparisons

<details>
<summary><b>Prompt 1:</b> Coffee shop subsidy critique</summary>

**Prompt:**

A city planner argues that because neighborhoods with an abundance of boutique coffee shops have lower crime rates, the city should heavily subsidize new coffee shops in high-crime areas to make them safer. What are the specific logical and statistical flaws in this policy proposal?

<details>
<summary><b>Base GPT-OSS-20B:</b></summary>

Below is a concise but comprehensive checklist of the **logical** and **statistical** errors that make the proposal to “heavy‑subsidize boutique coffee shops in high‑crime neighborhoods so they become safer” scientifically weak at best and policy‑dangerous at worst.

---

## 1. Logical Flaws

| # | Flaw | Why it matters |
|---|------|----------------|
| **1** | **Correlation ≠ Causation** | The planner assumes that because coffee‑shop‑dense areas *correlate* with lower crime, the shops *cause* the drop. No evidence is presented that coffee shops drive crime reductions. |
| **2** | **Reverse causality / self‑selection** | Business owners tend to open stores in places that are already safe, accessible and have enough disposable income. Lower crime may *enable* cafés, not the other way around. |
| **3** | **Ecological fallacy** | The observation is made at the neighborhood level; it cannot be applied to individual neighborhoods or even to a single coffee shop’s effect on crime in its immediate vicinity. |
| **4** | **Missing causal mechanism** | The argument gives no plausible path – e.g., “more foot traffic → more eyes on the street → fewer crimes” – and ignores counter‑effects (e.g., increased commercial activity may invite shoplifting or vandalism). |
| **5** | **Oversimplification of crime dynamics** | Crime is driven by a bundle of factors (poverty, unemployment, policing, social networks, housing quality, etc.). A single intervention (coffee shops) cannot reasonably be expected to shift that equilibrium. |
| **6** | **Neglecting opportunity cost** | Subsidizing cafés diverts public funds that might better reduce crime through proven measures (community policing, after‑school programs, affordable housing, mental‑health services). |
| **7** | **Potentially anti‑equity** | Coffee‑shop subsidies may accelerate gentrification, raise rents, and displace the very residents the policy is supposed to protect. |
| **8** | **No testable hypothesis** | The proposal does not specify an observable effect size, a target variable, or a way to evaluate success (e.g., crime statistics before/after, foot‑traffic counts). |

---

## 2. Statistical Flaws

| # | Flaw | Why it matters |
|---|------|----------------|
| **1** | **Omitted‑variable bias** | Socio‑economic status, police presence, education, and community cohesion are all correlated with both coffee‑shop density and crime. Ignoring them confounds the estimated relationship. |
| **2** | **Multicollinearity** | Variables like median income, property values, and employment levels often co‑vary with coffee‑shop density. In a regression that includes only coffee‑shops, coefficients are unstable. |
| **3** | **Ecological data quality** | Crime counts may be under‑reported in low‑policing areas; cafés are often “commercial clusters” that attract visitors, leading to *reporting bias* (more crimes recorded simply because more eyes are watching). |
| **4** | **Small‑sample / aggregation bias** | Neighborhoods with few shops and few crimes yield noisy estimates. Aggregating across heterogeneous census tracts can mask local variations. |
| **5** | **Spurious correlation** | Random chance or a lurking variable (e.g., historic districts) may produce a spurious link. Correlation coefficients lack context without a causal model. |
| **6** | **Regression to the mean** | High‑crime neighborhoods tend to fluctuate; a temporary dip could be misattributed to coffee‑shop openings. |
| **7** | **No causal inference design** | The data (presumably observational) is unsuited to estimate causality without instrumental variables, difference‑in‑differences, or randomized trials. |
| **8** | **Endogeneity of subsidies** | If subsidies themselves influence where coffee shops open (e.g., subsidizing a neighborhood raises its attractiveness), then coffee‑shop density becomes endogenously determined by the policy you are evaluating – classic “simultaneity” problem. |

---

## 3. Practical Policy Issues

1. **Implementation barriers** – High‑crime neighborhoods often lack the foot traffic, infrastructure, or supply chain needed for a profitable café; subsidies alone may not generate business.
2. **Accountability & measurement** – How will “safety” be measured? Crime indices are noisy; alternative metrics (perceived safety, community cohesion) may be harder to quantify.
3. **Unintended outcomes** – More cafés could increase traffic congestion, noise, and risk of small‑scale shoplifting or vandalism, potentially offsetting any benefit.
4. **Gentrification risk** – Café openings often precede rising rents; displaced residents may move to poorer neighborhoods, possibly spreading crime rather than containing it.
5. **Legal & regulatory hurdles** – Zoning, licensing, and permitting processes can delay or thwart even a subsidized venture.

---

## 4. A Better Evidence‑Based Approach

| Step | What to Do | Why |
|------|------------|-----|
| **1** | **Collect high‑resolution data** on crime, foot traffic, income, policing, and business density at the block or establishment level. | Avoids ecological fallacy; captures the true environment surrounding a coffee shop. |
| **2** | **Use quasi‑experimental designs** (difference‑in‑differences, instrumental variables, or regression discontinuity) to isolate the impact of cafés on crime. | Controls for confounders and endogeneity. |
| **3** | **Pilot subsidies** in a small set of comparable high‑crime neighborhoods while leaving control groups untouched. | Allows measurement of causal effect and cost‑effectiveness before scaling. |
| **4** | **Include complementary interventions** (community centers, policing reforms, affordable housing) in the evaluation. | Recognizes that coffee shops are likely a small part of a larger toolbox for crime reduction. |
| **5** | **Set explicit metrics** (e.g., 10‑year trend in residential burglary rates, perceived safety survey scores). | Provides clear accountability and a way to revise or abandon the policy if it fails. |

---

### Bottom line

The planner’s proposal hinges on a **spurious correlation** that has not been verified by a rigorous causal analysis. It ignores established drivers of crime, introduces potential displacement harms, and misallocates public resources without a defensible evidence base. Prior to spending money on subsidies, the city should conduct an empirical study that either validates or refutes the coffee‑shop–crime link under controlled conditions. Only then can policymakers decide whether cafés should be part of a comprehensive crime‑reduction strategy.
</details>

<details>
<summary><b>Sonnet-flavored:</b></summary>

This is a classic policy proposal worth rigorously examining. I'll break it into several categories of flaw so it's clear what's wrong:

## The Fundamental Mistake: Correlation ≠ Causation
The planner assumes coffee shops *cause* lower crime, but the data actually points to the opposite

- High-income communities have both more disposable income and better policing/resources
- Retail density may be **resulting from** socioeconomic conditions that also reduce crime
- The underlying factor (community investment, employment, cohesion) could cause **both** outcomes

## Selection Bias Specifically Hard on Urban Planning
The planner ignores what the data actually tells them

- Coffee shops *choose* where profit exists - they're predicting unobserved variables (income level)
- What predicts coffee shop density is probably the *real* predictor of low crime

## Omitted Variable Problem
Multiple confounders simultaneously:
- Police presence/contact hours
- Neighborhood socioeconomic status
- Demographic composition
- Street lighting and design
- Community organization strength

The regressions planners use are likely missing all the actual drivers.

## Partial Substitution Fallacy
Even if coffee shops *actually* somewhat reduce crime,
the planner assumes each new shop works like **all** existing shops in coffee-dense neighborhoods, ignoring dose-response nuance.

## The Gentrification Accusation Gets It Wrong (Ironically)
This specific policy is the exact mechanism planners fear:
- New commercial districts attract higher income residents
- Property values rise
- Original residents are displaced before any safety benefits reach them
This addresses symptoms in transformed neighborhoods rather than problems unchanged

## Cost vs. Effectiveness Issues
The planner focuses on "creating coffee shops," not measurable outcomes
- What evidence supports one shop = X level reduction?
- Wouldn't same funding better support improved lighting, increased police shifts, job programs?
- Opportunity costs aren't being considered

## Bad Counterfactual Reasoning
The planner treats present coffee shop concentration as evidence about absent shops:
- This directionality is suspect because the baseline condition varies across neighborhoods

## Summary of What Should Be Asked Instead
- RCTs: Does adding a shop *in a controlled setting* reduce crime?
- Panel data with lagged independent variables
- Identifying strategies (difference-in-differences after comparable policy changes)
- Cost/benefit against alternative interventions

The strongest criticism is that this appeals to warm comfort images while the actual economics points elsewhere - which planners use to justify precisely this sort of top-down commercialism.
</details>

</details>
---
<details>
<summary><b>Prompt 2:</b> Hibernation premise-questioning</summary>

**Prompt:**

If a technological breakthrough allowed humans to safely and comfortably hibernate for up to six months a year to save money and energy, how would this alter global labor markets, real estate, and the structure of the traditional education system?

<details>
<summary><b>Base GPT-OSS-20B:</b></summary>

**What would it mean for humans to be able to “hibernate” safely for half the year?**

If the technology were as reliable, inexpensive, and comfortable as a good mattress, it would act like a new category of **time‑share** for the human body: 6 months in a low‑energy state and 6 months of normal activity. That simple shift would ripple through every sector that depends on human presence—workplaces, housing markets, and schools. Below is a map of likely transformations, grouped by the three domains you asked about.

---

## 1. Global Labor Markets

| Feature | Current Reality | With 6‑month Hibernation | Likely Shift |
|---------|-----------------|-------------------------|--------------|
| **Annual labor supply** | 52 weeks ≈ 2,080 hrs/year (full‑time) | ~1,040 hrs/year | Supply per worker halves. The economy has to either get the same output with fewer people or pay more for shorter hours. |
| **Wages & Pay‑Per‑Hour** | Stable hourly rate; most workers earn 40 hrs/week | Hourly rates might rise as supply shrinks, or employers pay a premium for “awake” periods | We could see a two‑tier wage structure: “awake wage” versus “hibernation‑allowance.” |
| **Hours worked per employer** | 5‑day week; on‑call shifts, overtime | Employers may restructure to 6‑month blocks: e.g., “active quarter” followed by a “pause.” | Predictable quarterly patterns give firms a built‑in rest cycle. |
| **Job design & staffing models** | Continuous cycles of production, service, or creative work | Rotational teams that keep a project going while others sleep; plug‑and‑play gig workers for the 6 months when most of the workforce is “asleep.” | Many short‑term contracts, hybrid tele‑work + on‑site “shift‑swap” schedules. |
| **Automation & AI** | Machines take over repetitive or hazardous tasks, but still need human oversight | During hibernation, AI and robots can run facilities autonomously; the “awake” workforce can focus on oversight, strategy, or value‑adding tasks that only humans do. | Accelerated AI deployment in sectors that require 24/7 uptime; humans become supervisors rather than operators. |
| **Workforce‑flexibility & Gig Economy** | Growing but still anchored in weekly contracts | Perfect fit for gig workers: people can “log on” during the 6 active months and take the next 6 months off. | Gig platforms may create monthly/quarterly “turn‑over cycles” where each worker’s shift is known in advance. |
| **Cross‑border labor & immigration** | Visa restrictions tied to continuous employment | Companies can hire globally for short bursts, reducing visa bottlenecks. | Potential rise in remote work visas; countries may open shorter‑term “researcher” or “contractor” permits. |
| **Equity / Stratification Risks** | Overtime drives inequality | If hibernation is expensive, those who cannot afford it may work longer, increasing burnout and health disparities. | Policy may need to guard against a class of “perpetual laborers.” |

### Bottom line

- **Supply shock** – fewer hours per worker → higher wages or more workers hired.
- **Demand shift** – firms need to plan around the predictable rhythm of workers’ “on” and “off” periods.
- **Productivity gains** – deeper rest could improve focus, reduce errors, and heighten creativity during awake periods.
- **New industries** – hibernation maintenance, cryogenic storage leasing, sleep‑health tech, AI‑supervision firms.

---

## 2. Real Estate

| Feature | Current Reality | With 6‑month Hibernation | Likely Shift |
|---------|-----------------|--------------------------|--------------|
| **Occupancy rates** | 24 hrs/day, full occupancy; houses consume heating, cooling, utilities constantly | Half the time consumption drops to near zero | Residential space can be re‑valued by “active” days rather than sheer square footage. |
| **Property taxes & insurance** | Often based on property value and full-year occupancy | Potential for tax rebates or reduced premiums for homes used only half the year; new “hibernation‑friendly” insurance products. | Policy adjustments to reward lower utility consumption. |
| **Urban density** | High density in cities to serve constant demand for commuting & services | Less daily traffic, fewer commuters → city centers could pivot to cultural, recreational, or retail uses during active months. | Some neighborhoods may become “sleep hubs” with communal hibernation pods. |
| **Home design** | Focus on all‑year comfort, heating/cooling systems | New design elements: integrated cryo‑pods, isolated thermal control zones, ultra‑low‑energy architecture for extended dark periods. | Architecture firms may offer “hibernation‑ready” homes as a niche market. |
| **Commercial real estate** | 9‑to‑5 office occupancy; warehouses operate around the clock | Offices can be reduced in size or repurposed: one “awake” coworking space per 6‑month block, with additional robotic labs running during hibernation. | Increase in mixed‑use developments that host both human offices and autonomous manufacturing. |
| **Resort & vacation markets** | People travel during their free time (often winter or summer) | “Winter to bed” people may shift vacations to the active months; conversely, tourism could expand into the hibernation months if visitors are comfortable spending nights in “sleep stations.” | Travel industry might create “sleep‑lifted” packages: stay at a hibernation facility while the host sleeps. |
| **Infrastructure** | Power grid must supply 24/7 loads | Significant opportunity for micro‑grids, battery storage, and demand‑response systems that take advantage of long, low‑load periods. | Renewable‑energy projects may become more viable with predictable “night” curtailment periods. |

### Bottom line

- **Reduced utility footprint** → cheaper homes, lower energy taxes, new property valuations.
- **Design innovation** – specialized pods, adaptive architecture, and modular housing that can be “offline.”
- **Urban re‑imagining** – lower commuter traffic, more public spaces, potential shift from high‑density office cores to low‑density mixed‑use hubs.

---

## 3. Traditional Education System

| Feature | Current Reality | With 6‑month Hibernation | Likely Shift |
|---------|-----------------|--------------------------|--------------|
| **Academic calendar** | Fall–spring semesters + summer break (~4 months) | Two “active” halves (e.g., Jan–June and July–Dec); each half could become a full academic year. | Schools may adopt a **bi‑annual model**: compact, intensive periods of 2–3 months separated by a hibernation‑break. |
| **School hours & days** | 5 days/week, 6–8 hrs/day | Potential for *all‑day* immersive modules during the active period (e.g., 12‑hour “lab” sessions). | Accelerated curricula that cram subjects into tighter windows. |
| **Distance vs. in‑person** | Shifted heavily during COVID, but core remains physical | Lowered necessity for continuous on‑campus presence → more hybrid / virtual learning when students are awake. | Massive adoption of “synchronous learning” platforms that allow real‑time engagement only during the 6 months, with recorded modules stored for later completion. |
| **Student lifecycle** | 18–22 yrs for undergrad; structured apprenticeship tracks | Possible *dual‑track* programs: a “sleep‑break” year after freshman year to fund tuition or life experience. | Universities may offer *“Sleep‑Break Credits”* that count toward graduation. |
| **Faculty workload** | Continuous tenure cycles, semester‑based contracts | Reduced teaching load: faculty can focus on research or mentorship during the 6‑month window; meanwhile AI tutors fill gaps. | More emphasis on “compressed teaching” and micro‑teaching units. |
| **Teacher pay & job security** | Annual contracts, some reliance on overtime | Possible shift to quarterly or semi‑annual contracts with built‑in sabbaticals. | Rise in *“Instructor‑A‑to‑Z”* part‑time roles, more emphasis on project‑based teaching. |
| **Curriculum design** | Spread over 4 years with prerequisites | Shorter, modular courses that can be taken in a single 2–3 month stint; prerequisites become “boot‑camp” modules. | "Stackable" credits – learners acquire packages of knowledge within each active half. |
| **Student life & co‑curricular** | Sports, clubs, internships spread throughout the year | Clubs may operate only during active periods, but the break can be used for internships or research projects. | New *“Intercalated Summer Internships”* that replace long “summer breaks.” |
| **Higher education financing** | Tuition + scholarships; rising debt burden | Tuition costs might fall (less per‑hour instruction), but also student life expenses may rise if people stay in dorms for the whole active period. | Debt consolidation may shift toward *“hibernate‑interest”* packages: lower cost during sleeping months. |

### Bottom line

- **Intensification** – learning compressed into fewer but denser periods.
- **Flexibility** – more modular, stackable courses that align with the two active windows.
- **Tech‑driven teaching** – AI‑facilitated tutoring between periods, reduced need for continuous faculty presence.
- **Policy changes** – accreditation bodies may adjust standards to reflect new calendar structures.

---

## What Makes These Shifts Work (or Not)?

1. **Adoption Rate**
   - If only a subset of the population hibernates, we’ll see *segmented* labor markets and housing usage: some houses stay live, others become “sleep lofts.”
   - High adoption could trigger near‑universal “half‑year work” norms.

2. **Cost & Accessibility**
   - If cryo‑pods are expensive, only the wealthy will hibernate; widening the gap between “wake” and “hibernation” labor pools.
   - Subsidies or public sectors might need to intervene to avoid exploitation.

3. **Health & Safety Regulations**
   - Need for standards: carbon‑ dioxide levels, monitoring devices, emergency protocols if a hibernation fails.
   - Occupational health laws will need new guidelines for “continuous rest cycles.”

4. **Cultural Acceptance**
   - Some industries (healthcare, emergency services) may resist any downtime; others may embrace it.
   - Public sentiment may shape policy: mass public‑service workers might lobby against mandatory sleep breaks.

5. **Technological Reliability**
   - The best‑case scenario presumes error‑free hibernation.
   - Failure modes could lead to emergency work shifts, eroding trust.

---

## A Quick “Future‑Scenario” Sketch

1. **The Workplace:**
   *A software firm runs three 6‑month cycles a year. Two teams are on duty simultaneously, each covering a 12‑hour shift. Between cycles, the whole staff sleeps in cryo‑pods in a dedicated facility built beneath the office. AI bots handle routine deployments while humans supervise.*

2. **The Home:**
   *Families invest in “smart‑sleep” houses that automatically adjust temperature and lighting for hibernation months. Each house has a private pod, but houses are shared with neighbors who are active when you’re asleep.*

3. **The Classroom:**
   *A university’s biology program now consists of 4 intense “boot camps,” each lasting 3 weeks. Students attend all four during the active period and sleep through the rest. Between boot camps, they work on lab projects or internships, guided by AI‑based mentors.*

---

## Final Takeaway

A breakthrough that lets humans hibernate half the year would **halve the annual human‑time input** for every sector that currently relies on continuous presence. The resulting shifts could be powerful:

- **Labour:** *Supply compression → wage pressure, new gig structures, automated oversight.*
- **Housing:** *Lower energy use → re‑priced real estate, specialized sleep‑pods, redesigned urban cores.*
- **Education:** *Intensified, modular curricula, stronger reliance on tech tutors, re‑structured calendars.*

Everything would revolve around two “seasons”: the **awake season** (high activity, high productivity) and the **sleep season** (low demand, low consumption). The balance between these seasons would set the rhythm of economies worldwide. Whether societies choose to dance to that rhythm—or fight against it—will depend on the technology’s cost, accessibility, and the values we attach to continuous labor versus restorative rest.

</details>

<details>
<summary><b>Sonnet-flavored:</b></summary>

Honest assessment requires recognizing this creates fundamentally opposing pressures:

**What's saving money vs. what's causing new costs**

Savings would likely be outweighed or redirected rather than creating a surplus - enormous structural disruption hasn't eliminated conflict (automation did similarly but slowly)

## Labor Markets

**Actual productivity loss**
- 6 months active per year reduces labor supply by half *if work can't be simply compressed*
- Gig economy collapses without continuous availability
- Supervision that was redundant between shifts becomes necessary
- Medical/critical monitoring would require replacement -> drives *more* automation concurrently
- The research productivity puzzle: projects extended, funding cycles disrupted

**Who benefits anyway?**
- Low-wage workers using seasonal unemployment barely once may not see savings
- Pre-retirement cohorts already commuting - may opt-out entirely
- This could compress the labor force, expanding precariat status

**Honest counterarguments**
- Work-life productivity research suggests current continuous hours have diminishing returns
- Time shift could enable staggered process cycles, reducing queueing/lags
- But these are subtle effects requiring architectural redesign

## Real Estate

**Market disruptions with mixed winners/losers**
- Short-term rentals doomed -> net loss for cities dependent on tourist turnover
- Unforced: shorter occupancy duration increases depreciation pressure - capital tied up faster
- Office space: significant underutilization but potentially because manufacturing/production moved online before change, allowing gradual compression
- **But commercial needs persist** while humans aren't working

## Education

**Logical structural changes less compelling**
- Education could redistribute over the year, but existing curriculum structured around seasonal work cycles will resist modification
- Onboarding retreats replaced by interruptions in learning continuity
- Physical campuses genuinely compress values

**What would meaningfully change?**
- Less pressure on "enforced attendance" for non-essential classes -> real cultural shift required

## Significantly Harder Problems Hidden

- Age discrimination: older workers could claim greater need for biomedical intervention, raising liability costs for employers
- Global south inequality multiplies (automation + institutional dependency creates technology debt)
- Retiree patterns collapse - pension systems restructured beyond obvious replacement

## What Most Analyses Get Wrong

Assuming savings just becomes extra leisure time, whereas frameworks show:
- Occupational scarcity reactions distort markets away from productive direction
- Institutional dependency on regular labor supply creates regressive burden on low-income users

**Bottom line:** Energy and money savings are peripheral in economic modeling. Labor output contraction would likely produce the dominant macro effect. **I'd expect partial or uneven adoption for years before any genuine net increase in capacity.**

What's your interest here?
</details>
</details>

---

## Files

| File | Type | Purpose |
|---|---|---|
| `hf/gpt-oss-20b-sonnet-flavored-v1/` | safetensors | Merged HF format with MXFP4 experts native; for transformers / vLLM |
| `gpt-oss-20B-sonnet-flavored-v1-Q8_0.gguf` | GGUF | Q8_0 non-experts, MXFP4 experts |
| `gpt-oss-20B-sonnet-flavored-v1-Q6_K.gguf` | GGUF | Q6_K non-experts, MXFP4 experts |
| `gpt-oss-20B-sonnet-flavored-v1-Q5_K_M.gguf` | GGUF | Q5_K_M non-experts (with fallbacks, see note), MXFP4 experts |
| `gpt-oss-20B-sonnet-flavored-v1-BF16.gguf` | GGUF | Reference BF16 non-experts, MXFP4 experts |
|
||||
|
||||
---
|
||||
|
||||
## Training Details

| Detail | Value |
|---|---|
| Base model | [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) |
| Architecture | MoE: 20B total / 3.6B active, 32 experts, top-4 routing |
| Method | LoRA (rank 16) via [Unsloth](https://github.com/unslothai/unsloth) |
| Training dataset | Sonnet 4.6 distilled responses by [TeichAI](https://huggingface.co/TeichAI) |
| Training type | Final-answer-only (no reasoning targeting) |

The LoRA adapter is not included in this v1 release. Future versions will publish the standalone adapter alongside the merged weights.

---

## Quantization Details

| Detail | Value |
|---|---|
| Quants | BF16, Q8_0, Q6_K, Q5_K_M (all with native MXFP4 experts) |
| Quantized by | [jorge-erdb](https://huggingface.co/jorge-erdb) |
| Method | [llama.cpp](https://github.com/ggml-org/llama.cpp) with explicit MXFP4 expert preservation |

**MXFP4 experts preserved natively across the entire ladder.** The MoE expert tensors (~19B of 20B params) remain in their original MXFP4 format in every quant. Only non-expert tensors (router, attention, embeddings, layernorms) vary in precision across the ladder. No experts are decompressed and requantized at any point.

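As a rough illustration of why the expert tensors dominate file size, the sketch below estimates MXFP4 storage cost. It assumes the block layout from the OCP Microscaling (MX) specification (32 four-bit elements sharing one 8-bit scale), which this README does not itself state, and it reuses the approximate "~19B" expert-parameter figure quoted above. The function names are illustrative only, and real GGUF files carry metadata and non-expert tensors on top of this estimate.

```python
# Assumed MXFP4 layout (OCP Microscaling spec; not stated in this README).
BLOCK_SIZE = 32    # elements that share one scale
ELEMENT_BITS = 4   # each element is a 4-bit float (E2M1)
SCALE_BITS = 8     # one shared 8-bit exponent scale (E8M0) per block


def mxfp4_bits_per_weight(block_size=BLOCK_SIZE,
                          element_bits=ELEMENT_BITS,
                          scale_bits=SCALE_BITS):
    """Average storage cost per weight, including the shared block scale."""
    return (block_size * element_bits + scale_bits) / block_size


def approx_gigabytes(n_params, bits_per_weight):
    """Convert a parameter count and a per-weight bit cost to decimal GB."""
    return n_params * bits_per_weight / 8 / 1e9


bits = mxfp4_bits_per_weight()
expert_gb = approx_gigabytes(19e9, bits)  # "~19B" expert parameters, approximate
print(f"{bits} bits/weight, roughly {expert_gb:.1f} GB for the expert tensors")
```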

---

## Download

```bash
pip install -U "huggingface_hub[cli]"

# BF16 (full precision non-experts, MXFP4 experts)
huggingface-cli download jorge-erdb/gpt-oss-20b-sonnet-flavored-gguf --include "*BF16*.gguf" --local-dir ./

# Q8_0
huggingface-cli download jorge-erdb/gpt-oss-20b-sonnet-flavored-gguf --include "*Q8_0.gguf" --local-dir ./

# Q6_K
huggingface-cli download jorge-erdb/gpt-oss-20b-sonnet-flavored-gguf --include "*Q6_K.gguf" --local-dir ./

# Q5_K_M
huggingface-cli download jorge-erdb/gpt-oss-20b-sonnet-flavored-gguf --include "*Q5_K_M.gguf" --local-dir ./

# Merged HF safetensors (for transformers / vLLM)
huggingface-cli download jorge-erdb/gpt-oss-20b-sonnet-flavored-gguf --include "hf/*" --local-dir ./
```

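The same downloads can be scripted from Python. The sketch below is a minimal example, assuming the `huggingface_hub` package's `snapshot_download` function and its `allow_patterns` filter. The helper names `gguf_pattern` and `download_quant` are hypothetical and not part of this repository.

```python
REPO_ID = "jorge-erdb/gpt-oss-20b-sonnet-flavored-gguf"
QUANTS = ("BF16", "Q8_0", "Q6_K", "Q5_K_M")


def gguf_pattern(quant):
    """Build the glob that selects one GGUF file from the repository."""
    if quant not in QUANTS:
        raise ValueError(f"unknown quant {quant!r}; expected one of {QUANTS}")
    return f"*{quant}.gguf"


def download_quant(quant, local_dir="./"):
    """Download a single quant; needs network access and huggingface_hub."""
    # Imported lazily so the pattern helper works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[gguf_pattern(quant)],
        local_dir=local_dir,
    )


# Example (performs a multi-gigabyte download, so it is left commented out):
# download_quant("Q8_0")
```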
---

## Credits

- **LoRA training & quantization**: [jorge-erdb](https://huggingface.co/jorge-erdb)
- **Distilled training data**: [TeichAI](https://huggingface.co/TeichAI), Sonnet 4.6 distillation
- **Base model**: [OpenAI](https://huggingface.co/openai), GPT-OSS-20B
- **Style source**: Anthropic, Claude Sonnet 4.6 (indirectly, via distillation)
- **Tooling**: [Unsloth](https://github.com/unslothai/unsloth), [llama.cpp](https://github.com/ggml-org/llama.cpp)

---

<p align="center">
<img alt="gpt-oss-20b" src="https://raw.githubusercontent.com/openai/gpt-oss/main/docs/gpt-oss-20b.svg">
</p>

<p align="center">
<a href="https://gpt-oss.com"><strong>Try gpt-oss</strong></a> ·
<a href="https://cookbook.openai.com/topic/gpt-oss"><strong>Guides</strong></a> ·
<a href="https://arxiv.org/abs/2508.10925"><strong>Model card</strong></a> ·
<a href="https://openai.com/index/introducing-gpt-oss/"><strong>OpenAI blog</strong></a>
</p>

<br>

Welcome to the gpt-oss series, [OpenAI’s open-weight models](https://openai.com/open-models) designed for powerful reasoning, agentic tasks, and versatile developer use cases.

We’re releasing two flavors of these open models:
- `gpt-oss-120b`: for production, general-purpose, high-reasoning use cases that fit into a single 80GB GPU (like NVIDIA H100 or AMD MI300X) (117B parameters with 5.1B active parameters)
- `gpt-oss-20b`: for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters)

Both models were trained on our [harmony response format](https://github.com/openai/harmony) and should only be used with the harmony format, as they will not work correctly otherwise.

> [!NOTE]
> This model card is dedicated to the smaller `gpt-oss-20b` model. Check out [`gpt-oss-120b`](https://huggingface.co/openai/gpt-oss-120b) for the larger model.

# Highlights

* **Permissive Apache 2.0 license:** Build freely without copyleft restrictions or patent risk, ideal for experimentation, customization, and commercial deployment.
* **Configurable reasoning effort:** Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
* **Full chain-of-thought:** Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. The chain of thought is not intended to be shown to end users.
* **Fine-tunable:** Fully customize models to your specific use case through parameter fine-tuning.
* **Agentic capabilities:** Use the models’ native capabilities for function calling, [web browsing](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#browser), [Python code execution](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#python), and Structured Outputs.
* **MXFP4 quantization:** The models were post-trained with MXFP4 quantization of the MoE weights, making `gpt-oss-120b` run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the `gpt-oss-20b` model run within 16GB of memory. All evals were performed with the same MXFP4 quantization.

---

# Inference examples

## Transformers

You can use `gpt-oss-120b` and `gpt-oss-20b` with Transformers. If you use the Transformers chat template, it will automatically apply the [harmony response format](https://github.com/openai/harmony). If you use `model.generate` directly, you need to apply the harmony format manually using the chat template or use our [openai-harmony](https://github.com/openai/harmony) package.

To get started, install the necessary dependencies to set up your environment:

```
pip install -U transformers kernels torch
```

Once set up, you can run the model with the snippet below:

```py
from transformers import pipeline
import torch

model_id = "openai/gpt-oss-20b"

# Build a chat-capable text-generation pipeline; dtype and device placement
# are chosen automatically.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)
# The last entry of the returned conversation is the assistant's reply.
print(outputs[0]["generated_text"][-1])
```

Alternatively, you can run the model via [`Transformers Serve`](https://huggingface.co/docs/transformers/main/serving) to spin up an OpenAI-compatible webserver:

```
transformers serve
transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-20b
```

[Learn more about how to use gpt-oss with Transformers.](https://cookbook.openai.com/articles/gpt-oss/run-transformers)

## vLLM

vLLM recommends using [uv](https://docs.astral.sh/uv/) for Python dependency management. You can use vLLM to spin up an OpenAI-compatible webserver. The following command will automatically download the model and start the server.

```bash
uv pip install --pre vllm==0.10.1+gptoss \
    --extra-index-url https://wheels.vllm.ai/gpt-oss/ \
    --extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
    --index-strategy unsafe-best-match

vllm serve openai/gpt-oss-20b
```

[Learn more about how to use gpt-oss with vLLM.](https://cookbook.openai.com/articles/gpt-oss/run-vllm)

## PyTorch / Triton

To learn how to use this model with PyTorch and Triton, check out our [reference implementations in the gpt-oss repository](https://github.com/openai/gpt-oss?tab=readme-ov-file#reference-pytorch-implementation).

## Ollama

If you are trying to run gpt-oss on consumer hardware, you can use Ollama by running the following commands after [installing Ollama](https://ollama.com/download).

```bash
# gpt-oss-20b
ollama pull gpt-oss:20b
ollama run gpt-oss:20b
```

[Learn more about how to use gpt-oss with Ollama.](https://cookbook.openai.com/articles/gpt-oss/run-locally-ollama)

## LM Studio

If you are using [LM Studio](https://lmstudio.ai/), you can use the following command to download the model.

```bash
# gpt-oss-20b
lms get openai/gpt-oss-20b
```

Check out our [awesome list](https://github.com/openai/gpt-oss/blob/main/awesome-gpt-oss.md) for a broader collection of gpt-oss resources and inference partners.

---

# Download the model

You can download the model weights from the [Hugging Face Hub](https://huggingface.co/collections/openai/gpt-oss-68911959590a1634ba11c7a4) directly with the Hugging Face CLI:

```shell
# gpt-oss-20b
huggingface-cli download openai/gpt-oss-20b --include "original/*" --local-dir gpt-oss-20b/
pip install gpt-oss
# Replace model/ with the directory that holds the downloaded checkpoint
python -m gpt_oss.chat model/
```

# Reasoning levels

You can choose among three reasoning levels to suit your task:

* **Low:** Fast responses for general dialogue.
* **Medium:** Balanced speed and detail.
* **High:** Deep and detailed analysis.

The reasoning level can be set in the system prompt, e.g., "Reasoning: high".

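As a small illustration of the system-prompt approach, the sketch below prepends a `Reasoning: <level>` directive to a chat message list. The helper name `with_reasoning` and the way it merges with an existing system message are assumptions made for this example, not behavior defined by gpt-oss or Transformers.

```python
VALID_LEVELS = ("low", "medium", "high")


def with_reasoning(messages, level="medium"):
    """Return a copy of `messages` whose system prompt requests `level`."""
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}, got {level!r}")
    directive = f"Reasoning: {level}"
    messages = [dict(m) for m in messages]  # shallow copies; input is not mutated
    if messages and messages[0]["role"] == "system":
        # Keep any existing system text and put the directive first.
        messages[0]["content"] = directive + "\n" + messages[0]["content"]
        return messages
    return [{"role": "system", "content": directive}] + messages


chat = with_reasoning(
    [{"role": "user", "content": "Summarize MXFP4 in one line."}],
    level="high",
)
print(chat[0]["content"])  # prints "Reasoning: high"
```

The chat template shipped in this repository also documents a `reasoning_effort` keyword argument (defaulting to "medium"), which may be the more direct route when the template is rendered programmatically.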
# Tool use

The gpt-oss models are excellent for:
* Web browsing (using built-in browsing tools)
* Function calling with defined schemas
* Agentic operations like browser tasks

# Fine-tuning

Both gpt-oss models can be fine-tuned for a variety of specialized use cases.

This smaller model `gpt-oss-20b` can be fine-tuned on consumer hardware, whereas the larger [`gpt-oss-120b`](https://huggingface.co/openai/gpt-oss-120b) can be fine-tuned on a single H100 node.

# Citation

```bibtex
@misc{openai2025gptoss120bgptoss20bmodel,
      title={gpt-oss-120b & gpt-oss-20b Model Card},
      author={OpenAI},
      year={2025},
      eprint={2508.10925},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.10925},
}
```
3
gpt-oss-20B-sonnet-flavored-v1-BF16.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d047da3da2db423bfb351c77269c4d22ae20c977bd3dddad9391d9d286d7b8a9
size 13792636992
3
gpt-oss-20B-sonnet-flavored-v1-Q5_K_M.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:825307c8e47c878f17b81f62201ec178cbe785de0f385ce6e3d9b57a2d6b0fc3
size 11760485952
3
gpt-oss-20B-sonnet-flavored-v1-Q6_K.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2edc732ab14e59c6ce6c782735c3144a36a8f61f19bc7bc7129518b5cd89dd6b
size 12040997952
3
gpt-oss-20B-sonnet-flavored-v1-Q8_0.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7e34a2e7fa19024467435b00b0a07475e3afa538ab799ddbd207738c56ee87aa
size 12109564992
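Each GGUF entry above is a three-line Git LFS pointer rather than the model file itself. The sketch below parses one such pointer and reports the size of the real file. It assumes only the `version` / `oid` / `size` layout visible in the pointers above; the function name `parse_lfs_pointer` is illustrative.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into its version, hash, and byte size."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")  # "key value" pairs, one per line
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer for the BF16 GGUF, copied from the listing above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:d047da3da2db423bfb351c77269c4d22ae20c977bd3dddad9391d9d286d7b8a9
size 13792636992
"""

info = parse_lfs_pointer(POINTER)
print(f"{info['hash_algo']} object, {info['size'] / 2**30:.2f} GiB")
```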
315
hf/gpt-oss-20b-sonnet-flavored-v1/chat_template.jinja
Normal file
@@ -0,0 +1,315 @@
{# Copyright 2025-present Unsloth. Apache 2.0 License. Unsloth chat template fixes. Edited from ggml-org & OpenAI #}
{#-
In addition to the normal inputs of `messages` and `tools`, this template also accepts the
following kwargs:
- "builtin_tools": A list, can contain "browser" and/or "python".
- "model_identity": A string that optionally describes the model identity.
- "reasoning_effort": A string that describes the reasoning effort, defaults to "medium".
#}

{#- Tool Definition Rendering ============================================== #}
{%- macro render_typescript_type(param_spec, required_params, is_nullable=false) -%}
{%- if param_spec.type == "array" -%}
{%- if param_spec['items'] -%}
{%- if param_spec['items']['type'] == "string" -%}
{{- "string[]" }}
{%- elif param_spec['items']['type'] == "number" -%}
{{- "number[]" }}
{%- elif param_spec['items']['type'] == "integer" -%}
{{- "number[]" }}
{%- elif param_spec['items']['type'] == "boolean" -%}
{{- "boolean[]" }}
{%- else -%}
{%- set inner_type = render_typescript_type(param_spec['items'], required_params) -%}
{%- if inner_type == "object | object" or inner_type|length > 50 -%}
{{- "any[]" }}
{%- else -%}
{{- inner_type + "[]" }}
{%- endif -%}
{%- endif -%}
{%- if param_spec.nullable -%}
{{- " | null" }}
{%- endif -%}
{%- else -%}
{{- "any[]" }}
{%- if param_spec.nullable -%}
{{- " | null" }}
{%- endif -%}
{%- endif -%}
{%- elif param_spec.type is defined and param_spec.type is iterable and param_spec.type is not string and param_spec.type is not mapping and param_spec.type[0] is defined -%}
{#- Handle array of types like ["object", "object"] from Union[dict, list] #}
{%- if param_spec.type | length > 1 -%}
{{- param_spec.type | join(" | ") }}
{%- else -%}
{{- param_spec.type[0] }}
{%- endif -%}
{%- elif param_spec.oneOf -%}
{#- Handle oneOf schemas - check for complex unions and fallback to any #}
{%- set has_object_variants = false -%}
{%- for variant in param_spec.oneOf -%}
{%- if variant.type == "object" -%}
{%- set has_object_variants = true -%}
{%- endif -%}
{%- endfor -%}
{%- if has_object_variants and param_spec.oneOf|length > 1 -%}
{{- "any" }}
{%- else -%}
{%- for variant in param_spec.oneOf -%}
{{- render_typescript_type(variant, required_params) -}}
{%- if variant.description %}
{{- "// " + variant.description }}
{%- endif -%}
{%- if variant.default is defined %}
{{ "// default: " + variant.default|tojson }}
{%- endif -%}
{%- if not loop.last %}
{{- " | " }}
{% endif -%}
{%- endfor -%}
{%- endif -%}
{%- elif param_spec.type == "string" -%}
{%- if param_spec.enum -%}
{{- '"' + param_spec.enum|join('" | "') + '"' -}}
{%- else -%}
{{- "string" }}
{%- if param_spec.nullable %}
{{- " | null" }}
{%- endif -%}
{%- endif -%}
{%- elif param_spec.type == "number" -%}
{{- "number" }}
{%- elif param_spec.type == "integer" -%}
{{- "number" }}
{%- elif param_spec.type == "boolean" -%}
{{- "boolean" }}

{%- elif param_spec.type == "object" -%}
{%- if param_spec.properties -%}
{{- "{\n" }}
{%- for prop_name, prop_spec in param_spec.properties.items() -%}
{{- prop_name -}}
{%- if prop_name not in (param_spec.required or []) -%}
{{- "?" }}
{%- endif -%}
{{- ": " }}
{{ render_typescript_type(prop_spec, param_spec.required or []) }}
{%- if not loop.last -%}
{{-", " }}
{%- endif -%}
{%- endfor -%}
{{- "}" }}
{%- else -%}
{{- "object" }}
{%- endif -%}
{%- else -%}
{{- "any" }}
{%- endif -%}
{%- endmacro -%}

{%- macro render_tool_namespace(namespace_name, tools) -%}
{{- "## " + namespace_name + "\n\n" }}
{{- "namespace " + namespace_name + " {\n\n" }}
{%- for tool in tools %}
{%- set tool = tool.function %}
{{- "// " + tool.description + "\n" }}
{{- "type "+ tool.name + " = " }}
{%- if tool.parameters and tool.parameters.properties -%}
{{- "(_: " }}
{{- "{\n" }}
{%- for param_name, param_spec in tool.parameters.properties.items() %}
{{- "// " + param_spec.description + "\n" }}
{{- param_name }}
{%- if param_name not in (tool.parameters.required or []) -%}
{{- "?" }}
{%- endif -%}
{{- ": " }}
{{- render_typescript_type(param_spec, tool.parameters.required or []) }}
{%- if param_spec.default is defined -%}
{%- if param_spec.enum %}
{{- ", // default: " + param_spec.default }}
{%- elif param_spec.oneOf %}
{{- "// default: " + param_spec.default }}
{%- else %}
{{- ", // default: " + param_spec.default|tojson }}
{%- endif -%}
{%- endif -%}
{%- if not loop.last %}
{{- ",\n" }}
{%- else %}
{{- "\n" }}
{%- endif -%}
{%- endfor %}
{{- "}) => any;\n\n" }}
{%- else -%}
{{- "() => any;\n\n" }}
{%- endif -%}
{%- endfor %}
{{- "} // namespace " + namespace_name }}
{%- endmacro -%}

{%- macro render_builtin_tools(browser_tool, python_tool) -%}
{%- if browser_tool %}
{{- "## browser\n\n" }}
{{- "// Tool for browsing.\n" }}
{{- "// The `cursor` appears in brackets before each browsing display: `[{cursor}]`.\n" }}
{{- "// Cite information from the tool using the following format:\n" }}
{{- "// `【{cursor}†L{line_start}(-L{line_end})?】`, for example: `【6†L9-L11】` or `【8†L3】`.\n" }}
{{- "// Do not quote more than 10 words directly from the tool output.\n" }}
{{- "// sources=web (default: web)\n" }}
{{- "namespace browser {\n\n" }}
{{- "// Searches for information related to `query` and displays `topn` results.\n" }}
{{- "type search = (_: {\n" }}
{{- "query: string,\n" }}
{{- "topn?: number, // default: 10\n" }}
{{- "source?: string,\n" }}
{{- "}) => any;\n\n" }}
{{- "// Opens the link `id` from the page indicated by `cursor` starting at line number `loc`, showing `num_lines` lines.\n" }}
{{- "// Valid link ids are displayed with the formatting: `【{id}†.*】`.\n" }}
{{- "// If `cursor` is not provided, the most recent page is implied.\n" }}
{{- "// If `id` is a string, it is treated as a fully qualified URL associated with `source`.\n" }}
{{- "// If `loc` is not provided, the viewport will be positioned at the beginning of the document or centered on the most relevant passage, if available.\n" }}
{{- "// Use this function without `id` to scroll to a new location of an opened page.\n" }}
{{- "type open = (_: {\n" }}
{{- "id?: number | string, // default: -1\n" }}
{{- "cursor?: number, // default: -1\n" }}
{{- "loc?: number, // default: -1\n" }}
{{- "num_lines?: number, // default: -1\n" }}
{{- "view_source?: boolean, // default: false\n" }}
{{- "source?: string,\n" }}
{{- "}) => any;\n\n" }}
{{- "// Finds exact matches of `pattern` in the current page, or the page given by `cursor`.\n" }}
{{- "type find = (_: {\n" }}
{{- "pattern: string,\n" }}
{{- "cursor?: number, // default: -1\n" }}
{{- "}) => any;\n\n" }}
{{- "} // namespace browser\n\n" }}
{%- endif -%}

{%- if python_tool %}
{{- "## python\n\n" }}
{{- "Use this tool to execute Python code in your chain of thought. The code will not be shown to the user. This tool should be used for internal reasoning, but not for code that is intended to be visible to the user (e.g. when creating plots, tables, or files).\n\n" }}
{{- "When you send a message containing Python code to python, it will be executed in a stateful Jupyter notebook environment. python will respond with the output of the execution or time out after 120.0 seconds. The drive at '/mnt/data' can be used to save and persist user files. Internet access for this session is UNKNOWN. Depends on the cluster.\n\n" }}
{%- endif -%}
{%- endmacro -%}

{#- System Message Construction ============================================ #}
{%- macro build_system_message() -%}
{%- if model_identity is not defined %}
{{- "You are ChatGPT, a large language model trained by OpenAI.\n" -}}
{%- else %}
{{- model_identity }}
{%- endif %}
{{- "Knowledge cutoff: 2024-06\n" }}
{{- "Current date: " + strftime_now("%Y-%m-%d") + "\n\n" }}
{%- if reasoning_effort is not defined %}
{%- set reasoning_effort = "medium" %}
{%- endif %}
{{- "Reasoning: " + reasoning_effort + "\n\n" }}
{%- if builtin_tools is defined %}
{{- "# Tools\n\n" }}
{%- set available_builtin_tools = namespace(browser=false, python=false) %}
{%- for tool in builtin_tools %}
{%- if tool == "browser" %}
{%- set available_builtin_tools.browser = true %}
{%- elif tool == "python" %}
{%- set available_builtin_tools.python = true %}
{%- endif %}
{%- endfor %}
{{- render_builtin_tools(available_builtin_tools.browser, available_builtin_tools.python) }}
{%- endif -%}
{{- "# Valid channels: analysis, commentary, final. Channel must be included for every message." }}
{%- if tools is defined -%}
{{- "\nCalls to these tools must go to the commentary channel: 'functions'." }}
{%- endif -%}
{%- endmacro -%}

{#- Main Template Logic ================================================= #}
{#- Set defaults #}

{#- Render system message #}
{{- "<|start|>system<|message|>" }}
{{- build_system_message() }}
{{- "<|end|>" }}

{#- Extract developer message #}
{%- if messages[0].role == "developer" or messages[0].role == "system" %}
{%- set developer_message = messages[0].content %}
{%- set loop_messages = messages[1:] %}
{%- else %}
{%- set developer_message = "" %}
{%- set loop_messages = messages %}
{%- endif %}

{#- Render developer message #}
{%- if developer_message or tools %}
{{- "<|start|>developer<|message|>" }}
{%- if developer_message %}
{{- "# Instructions\n\n" }}
{{- developer_message }}
{%- endif %}
{%- if tools -%}
{{- "\n\n" }}
{{- "# Tools\n\n" }}
{{- render_tool_namespace("functions", tools) }}
{%- endif -%}
{{- "<|end|>" }}
{%- endif %}

{#- Render messages #}
{%- set last_tool_call = namespace(name=none) %}
{%- for message in loop_messages -%}
{#- At this point only assistant/user/tool messages should remain #}
{%- if message.role == 'assistant' -%}
{%- if "tool_calls" in message %}
{#- We assume max 1 tool call per message, and so we infer the tool call name #}
{#- in "tool" messages from the most recent assistant tool call name #}
{%- set tool_call = message.tool_calls[0] %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{%- if message.content %}
{{- "<|start|>assistant<|channel|>analysis<|message|>" + message.content + "<|end|>" }}
{%- endif %}
{{- "<|start|>assistant to=" }}
{{- "functions." + tool_call.name + "<|channel|>commentary json<|message|>" }}
{{- tool_call.arguments|tojson }}
{{- "<|call|>" }}
{%- set last_tool_call.name = tool_call.name %}
{%- elif "thinking" in message and loop.last and not add_generation_prompt %}
{#- Only render the CoT if the final turn is an assistant turn and add_generation_prompt is false #}
{#- This is a situation that should only occur in training, never in inference. #}
{{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
{#- <|return|> indicates the end of generation, but <|end|> does not #}
{#- <|return|> should never be an input to the model, but we include it as the final token #}
{#- when training, so the model learns to emit it. #}
{{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|return|>" }}
{%- set last_tool_call.name = none %}
{%- elif "thinking" in message %}
{#- CoT is dropped during all previous turns, so we never render it for inference #}
{{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|end|>" }}
{%- set last_tool_call.name = none %}
{%- elif loop.last and not add_generation_prompt %}
{#- <|return|> indicates the end of generation, but <|end|> does not #}
{#- <|return|> should never be an input to the model, but we include it as the final token #}
{#- when training, so the model learns to emit it. #}
{{- "<|start|>assistant<|message|>" + message.content + "<|return|>" }}
{%- else %}
{{- "<|start|>assistant<|message|>" + message.content + "<|end|>" }}
{%- set last_tool_call.name = none %}
{%- endif %}
{%- elif message.role == 'tool' -%}
{%- if last_tool_call.name is none %}
{{- raise_exception("Message has tool role, but there was no previous assistant message with a tool call!") }}
{%- endif %}
{{- "<|start|>functions." + last_tool_call.name }}
{{- " to=assistant<|channel|>commentary<|message|>" + message.content|tojson + "<|end|>" }}
{%- else -%}
{{- "<|start|>user<|message|>" + message.content + "<|end|>" }}
{%- endif -%}
{%- endfor -%}

{#- Generation prompt #}
{%- if add_generation_prompt -%}
<|start|>assistant
{%- endif -%}
{# Copyright 2025-present Unsloth. Apache 2.0 License. Unsloth chat template fixes. Edited from ggml-org & OpenAI #}
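To make the template's output shape easier to follow, here is a simplified Python re-implementation of its plain-chat path. It is a sketch, not a replacement for the Jinja template: it assumes messages contain only `system`/`developer`, `user`, and `assistant` roles with string content, and it deliberately omits tools, `tool_calls`, and `thinking` handling. The function name `render_prompt` and the `today` parameter are hypothetical.

```python
from datetime import date

DEFAULT_IDENTITY = "You are ChatGPT, a large language model trained by OpenAI.\n"
CHANNELS = ("# Valid channels: analysis, commentary, final. "
            "Channel must be included for every message.")


def render_prompt(messages, reasoning_effort="medium", model_identity=None,
                  today=None, add_generation_prompt=True):
    """Render a harmony-style prompt for plain chat turns (no tools, no CoT)."""
    today = today or date.today().strftime("%Y-%m-%d")
    identity = DEFAULT_IDENTITY if model_identity is None else model_identity
    system = (identity + "Knowledge cutoff: 2024-06\n"
              + f"Current date: {today}\n\n"
              + f"Reasoning: {reasoning_effort}\n\n" + CHANNELS)
    parts = ["<|start|>system<|message|>" + system + "<|end|>"]

    # A leading system/developer message becomes the developer block.
    if messages and messages[0]["role"] in ("developer", "system"):
        developer, rest = messages[0]["content"], messages[1:]
    else:
        developer, rest = "", messages
    if developer:
        parts.append("<|start|>developer<|message|># Instructions\n\n"
                     + developer + "<|end|>")

    for i, msg in enumerate(rest):
        if msg["role"] == "assistant":
            # A final assistant turn without a generation prompt ends in <|return|>.
            is_final = i == len(rest) - 1 and not add_generation_prompt
            end = "<|return|>" if is_final else "<|end|>"
            parts.append("<|start|>assistant<|message|>" + msg["content"] + end)
        elif msg["role"] == "user":
            parts.append("<|start|>user<|message|>" + msg["content"] + "<|end|>")
        else:
            raise ValueError(f"role {msg['role']!r} is not covered by this sketch")

    if add_generation_prompt:
        parts.append("<|start|>assistant")
    return "".join(parts)


prompt = render_prompt(
    [{"role": "system", "content": "Be brief."},
     {"role": "user", "content": "Hi"}],
    reasoning_effort="high",
    today="2025-01-01",
)
print(prompt)
```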
77
hf/gpt-oss-20b-sonnet-flavored-v1/config.json
Normal file
@@ -0,0 +1,77 @@
{
  "architectures": [
    "GptOssForCausalLM"
  ],
  "attention_bias": true,
  "attention_dropout": 0.0,
  "eos_token_id": 200002,
  "experts_per_token": 4,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 2880,
  "initial_context_length": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 2880,
  "layer_types": [
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "full_attention"
  ],
  "max_position_embeddings": 131072,
  "model_type": "gpt_oss",
  "num_attention_heads": 64,
  "num_experts_per_tok": 4,
  "num_hidden_layers": 24,
  "num_key_value_heads": 8,
  "num_local_experts": 32,
  "output_router_logits": false,
  "pad_token_id": 200017,
  "quantization_config": {
    "modules_to_not_convert": [
      "model.layers.*.self_attn",
      "model.layers.*.mlp.router",
      "model.embed_tokens",
      "lm_head"
    ],
    "quant_method": "mxfp4"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "beta_fast": 32.0,
    "beta_slow": 1.0,
    "factor": 32.0,
    "original_max_position_embeddings": 4096,
    "rope_type": "yarn",
    "truncate": false
  },
  "rope_theta": 150000,
  "router_aux_loss_coef": 0.9,
  "sliding_window": 128,
  "swiglu_limit": 7.0,
  "tie_word_embeddings": false,
  "transformers_version": "4.56.2",
  "unsloth_fixed": true,
  "use_cache": true,
  "vocab_size": 201088
}
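A few relationships between the fields in this config are easier to see when computed. The sketch below copies a subset of the values from the `config.json` above into a plain dictionary and derives the attention widths, the grouped-query ratio, the YaRN-extended context length, and the share of experts active per token. The `summarize` helper and its output keys are illustrative names, not part of any library.

```python
# Values copied from the config.json shown above.
config = {
    "hidden_size": 2880,
    "head_dim": 64,
    "num_attention_heads": 64,
    "num_key_value_heads": 8,
    "num_hidden_layers": 24,
    "num_local_experts": 32,
    "num_experts_per_tok": 4,
    "max_position_embeddings": 131072,
    "rope_scaling": {"factor": 32.0, "original_max_position_embeddings": 4096},
}
# The config alternates the two attention types across its 24 layers.
layer_types = ["sliding_attention", "full_attention"] * 12


def summarize(cfg, layers):
    """Derive a few quantities implied by the raw config fields."""
    rope = cfg["rope_scaling"]
    return {
        "query_width": cfg["num_attention_heads"] * cfg["head_dim"],
        "kv_width": cfg["num_key_value_heads"] * cfg["head_dim"],
        "queries_per_kv_head": cfg["num_attention_heads"] // cfg["num_key_value_heads"],
        "yarn_extended_context": int(rope["factor"] * rope["original_max_position_embeddings"]),
        "sliding_layers": layers.count("sliding_attention"),
        "full_layers": layers.count("full_attention"),
        "active_expert_fraction": cfg["num_experts_per_tok"] / cfg["num_local_experts"],
    }


summary = summarize(config, layer_types)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Note that the scaled context length (factor times the original length) matches `max_position_embeddings`, and that the query projection width differs from `hidden_size`.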
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c6a37ec5be3fb8695de43f1dfe3fb2052ab4d2d84f2b498c8019e7fba1acdf94
size 4792272488
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dade2f964049ef0681510ac27fc68c73bd2c3b14d567f50a3493fd40b431d921
size 4798702184
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:59c0be828be7b4a21cf62bcbe536b05a9c333e5dbc8973874d28030674ffde73
size 4170342232
File diff suppressed because one or more lines are too long
23
hf/gpt-oss-20b-sonnet-flavored-v1/special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<|startoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|return|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|reserved_200017|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
3
hf/gpt-oss-20b-sonnet-flavored-v1/tokenizer.json
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0614fe83cadab421296e664e1f48f4261fa8fef6e03e63bb75c20f38e37d07d3
size 27868174
186
hf/gpt-oss-20b-sonnet-flavored-v1/tokenizer_config.json
Normal file
File diff suppressed because one or more lines are too long