---
license: apache-2.0
pipeline_tag: text-generation
tags:
- sonnet-flavored
- gpt-oss-20b
- gpt-oss
- moe
- mxfp4
- lora
- fine-tune
- sonnet-style
- claude-style
- distilled
- gguf
base_model: openai/gpt-oss-20b
language:
- en
---

gpt-oss-20b-sonnet-flavored-v1

A LoRA fine-tune of openai/gpt-oss-20b that transfers Claude Sonnet 4.6's response style to GPT-OSS-20B. Training targeted final answers only, so the base model's reasoning and capabilities are preserved, along with its flaws.

This repo contains both the merged HF safetensors (in hf/) and a GGUF ladder with MXFP4 experts preserved natively across all variants.


What changed and what didn't

This LoRA targets response style only: phrasing, tone, structure, register. It does not target reasoning capability, factual knowledge, or numerical calibration. For factual or numerical tasks, the underlying GPT-OSS-20B's calibration limits apply: this model will produce confidently wrong answers in the same situations the base model does, just phrased in Sonnet's voice. Style is not a substitute for capability.

The MoE experts (~19B of ~21B parameters) are unchanged from base GPT-OSS-20B. The LoRA modified only the attention projections, at rank 16, so reasoning still runs through the same experts and router weights as in the base model.
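The sketch below estimates how many trainable parameters a rank-16 LoRA on the attention projections adds, to give a sense of scale. It is back-of-envelope arithmetic under stated assumptions. The dimensions (hidden size 2880, 24 layers, 64 query heads and 8 key/value heads of size 64) are taken from the public gpt-oss-20b config. The adapted modules are assumed to be q_proj, k_proj, v_proj and o_proj. Check both against config.json and the actual training setup before relying on the number.

```python
# Back-of-envelope size of rank-16 LoRA adapters on the attention projections.
# ASSUMPTIONS: dimensions from the public gpt-oss-20b config.json; adapters on
# q_proj, k_proj, v_proj and o_proj. The real adapter config may differ.
HIDDEN = 2880
LAYERS = 24
Q_HEADS, KV_HEADS, HEAD_DIM = 64, 8, 64
RANK = 16


def lora_params(d_in: int, d_out: int, rank: int = RANK) -> int:
    """One LoRA pair adds A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def attention_lora_params() -> int:
    q_dim = Q_HEADS * HEAD_DIM    # 4096
    kv_dim = KV_HEADS * HEAD_DIM  # 512
    per_layer = (
        lora_params(HIDDEN, q_dim)     # q_proj
        + lora_params(HIDDEN, kv_dim)  # k_proj
        + lora_params(HIDDEN, kv_dim)  # v_proj
        + lora_params(q_dim, HIDDEN)   # o_proj
    )
    return per_layer * LAYERS


total = attention_lora_params()
print(f"{total:,} trainable parameters")  # 7,962,624 trainable parameters
print(f"{total / 21e9:.4%} of ~21B total weights")
```

Under these assumptions the adapter adds roughly 8M parameters, about 0.04% of the model's weights.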


Response Comparisons

Prompt 1: Coffee shops and crime (causal-reasoning critique)

Prompt:

A city planner argues that because neighborhoods with an abundance of boutique coffee shops have lower crime rates, the city should heavily subsidize new coffee shops in high-crime areas to make them safer. What are the specific logical and statistical flaws in this policy proposal?

Base GPT-OSS-20B:
Below is a concise but comprehensive checklist of the **logical** and **statistical** errors that make the proposal to “heavy-subsidize boutique coffee shops in high-crime neighborhoods so they become safer” scientifically weak at best and policy-dangerous at worst.

---

## 1. Logical Flaws

| # | Flaw | Why it matters |
|---|------|----------------|
| **1** | **Correlation ≠ Causation** | The planner assumes that because coffee-shop-dense areas *correlate* with lower crime, the shops *cause* the drop.  No evidence is presented that coffee shops drive crime reductions. |
| **2** | **Reverse causality / self-selection** | Business owners tend to open stores in places that are already safe, accessible and have enough disposable income.  Lower crime may *enable* cafés, not the other way around. |
| **3** | **Ecological fallacy** | The observation is made at the neighborhood level; it cannot be applied to individual neighborhoods or even to a single coffee shop's effect on crime in its immediate vicinity. |
| **4** | **Missing causal mechanism** | The argument gives no plausible path (e.g., “more foot traffic → more eyes on the street → fewer crimes”) and ignores counter-effects (e.g., increased commercial activity may invite shoplifting or vandalism). |
| **5** | **Oversimplification of crime dynamics** | Crime is driven by a bundle of factors (poverty, unemployment, policing, social networks, housing quality, etc.).  A single intervention (coffee shops) cannot reasonably be expected to shift that equilibrium. |
| **6** | **Neglecting opportunity cost** | Subsidizing cafés diverts public funds that might better reduce crime through proven measures (community policing, after-school programs, affordable housing, mental-health services). |
| **7** | **Potentially anti-equity** | Coffee-shop subsidies may accelerate gentrification, raise rents, and displace the very residents the policy is supposed to protect. |
| **8** | **No testable hypothesis** | The proposal does not specify an observable effect size, a target variable, or a way to evaluate success (e.g., crime statistics before/after, foot-traffic counts). |

---

## 2. Statistical Flaws

| # | Flaw | Why it matters |
|---|------|----------------|
| **1** | **Omitted-variable bias** | Socioeconomic status, police presence, education, and community cohesion are all correlated with both coffee-shop density and crime.  Ignoring them confounds the estimated relationship. |
| **2** | **Multicollinearity** | Variables like median income, property values, and employment levels often covary with coffee-shop density.  In a regression that includes only coffee shops, coefficients are unstable. |
| **3** | **Ecological data quality** | Crime counts may be underreported in low-policing areas; cafés are often “commercial clusters” that attract visitors, leading to *reporting bias* (more crimes recorded simply because more eyes are watching). |
| **4** | **Small-sample / aggregation bias** | Neighborhoods with few shops and few crimes yield noisy estimates.  Aggregating across heterogeneous census tracts can mask local variations. |
| **5** | **Spurious correlation** | Random chance or a lurking variable (e.g., historic districts) may produce a spurious link.  Correlation coefficients lack context without a causal model. |
| **6** | **Regression to the mean** | High-crime neighborhoods tend to fluctuate; a temporary dip could be misattributed to coffee-shop openings. |
| **7** | **No causal inference design** | The data (presumably observational) is unsuited to estimate causality without instrumental variables, difference-in-differences, or randomized trials. |
| **8** | **Endogeneity of subsidies** | If subsidies themselves influence where coffee shops open (e.g., subsidizing a neighborhood raises its attractiveness), then coffee-shop density becomes endogenously determined by the policy you are evaluating, a classic “simultaneity” problem. |

---

## 3. Practical Policy Issues

1. **Implementation barriers**: High-crime neighborhoods often lack the foot traffic, infrastructure, or supply chain needed for a profitable café; subsidies alone may not generate business.
2. **Accountability & measurement**: How will “safety” be measured? Crime indices are noisy; alternative metrics (perceived safety, community cohesion) may be harder to quantify.
3. **Unintended outcomes**: More cafés could increase traffic congestion, noise, and risk of small-scale shoplifting or vandalism, potentially offsetting any benefit.
4. **Gentrification risk**: Café openings often precede rising rents; displaced residents may move to poorer neighborhoods, possibly spreading crime rather than containing it.
5. **Legal & regulatory hurdles**: Zoning, licensing, and permitting processes can delay or thwart even a subsidized venture.

---

## 4. A Better Evidence-Based Approach

| Step | What to Do | Why |
|------|------------|-----|
| **1** | **Collect high-resolution data** on crime, foot traffic, income, policing, and business density at the block or establishment level. | Avoids ecological fallacy; captures the true environment surrounding a coffee shop. |
| **2** | **Use quasi-experimental designs** (difference-in-differences, instrumental variables, or regression discontinuity) to isolate the impact of cafés on crime. | Controls for confounders and endogeneity. |
| **3** | **Pilot subsidies** in a small set of comparable high-crime neighborhoods while leaving control groups untouched. | Allows measurement of causal effect and cost-effectiveness before scaling. |
| **4** | **Include complementary interventions** (community centers, policing reforms, affordable housing) in the evaluation. | Recognizes that coffee shops are likely a small part of a larger toolbox for crime reduction. |
| **5** | **Set explicit metrics** (e.g., 10-year trend in residential burglary rates, perceived safety survey scores). | Provides clear accountability and a way to revise or abandon the policy if it fails. |

---

### Bottom line

The planner's proposal hinges on a **spurious correlation** that has not been verified by a rigorous causal analysis.  It ignores established drivers of crime, introduces potential displacement harms, and misallocates public resources without a defensible evidence base.  Prior to spending money on subsidies, the city should conduct an empirical study that either validates or refutes the coffee-shop-crime link under controlled conditions.  Only then can policymakers decide whether cafés should be part of a comprehensive crime-reduction strategy.
Sonnet-flavored:
This is a classic policy proposal worth rigorously examining. I'll break it into several categories of flaw so it's clear what's wrong:

## The Fundamental Mistake: Correlation ≠ Causation
The planner assumes coffee shops *cause* lower crime, but the data actually points to the opposite

- High-income communities have both more disposable income and better policing/resources
- Retail density may be **resulting from** socioeconomic conditions that also reduce crime
- The underlying factor (community investment, employment, cohesion) could cause **both** outcomes

## Selection Bias Specifically Hard on Urban Planning
The planner ignores what the data actually tells them

- Coffee shops *choose* where profit exists - they're predicting unobserved variables (income level)
- What predicts coffee shop density is probably the *real* predictor of low crime

## Omitted Variable Problem
Multiple confounders simultaneously:
- Police presence/contact hours
- Neighborhood socioeconomic status
- Demographic composition
- Street lighting and design
- Community organization strength

The regressions planners use are likely missing all the actual drivers.

## Partial Substitution Fallacy
Even if coffee shops *actually* somewhat reduce crime,
the planner assumes each new shop works like **all** existing shops in coffee-dense neighborhoods, ignoring dose-response nuance.

## The Gentrification Accusation Gets It Wrong (Ironically)
This specific policy is the exact mechanism planners fear:
- New commercial districts attract higher income residents
- Property values rise
- Original residents are displaced before any safety benefits reach them
This addresses symptoms in transformed neighborhoods rather than problems unchanged

## Cost vs. Effectiveness Issues
The planner focuses on "creating coffee shops," not measurable outcomes
- What evidence supports one shop = X level reduction?
- Wouldn't same funding better support improved lighting, increased police shifts, job programs?
- Opportunity costs aren't being considered

## Bad Counterfactual Reasoning
The planner treats present coffee shop concentration as evidence about absent shops:
- This directionality is suspect because the baseline condition varies across neighborhoods

## Summary of What Should Be Asked Instead
- RCTs: Does adding a shop *in a controlled setting* reduce crime?
- Panel data with lagged independent variables
- Identifying strategies (difference-in-differences after comparable policy changes)
- Cost/benefit against alternative interventions

The strongest criticism is that this appeals to warm comfort images while the actual economics points elsewhere - which planners use to justify precisely this sort of top-down commercialism.

---
Prompt 2: Hibernation premise-questioning

Prompt:

If a technological breakthrough allowed humans to safely and comfortably hibernate for up to six months a year to save money and energy, how would this alter global labor markets, real estate, and the structure of the traditional education system?

Base GPT-OSS-20B:
**What would it mean for humans to be able to “hibernate” safely for half the year?**

If the technology were as reliable, inexpensive, and comfortable as a good mattress, it would act like a new category of **timeshare** for the human body: 6 months in a low-energy state and 6 months of normal activity. That simple shift would ripple through every sector that depends on human presence: workplaces, housing markets, and schools. Below is a map of likely transformations, grouped by the three domains you asked about.

---

## 1. Global Labor Markets

| Feature | Current Reality | With 6-month Hibernation | Likely Shift |
|---------|-----------------|-------------------------|--------------|
| **Annual labor supply** | 52 weeks ≈ 2,080 hrs/year (full-time) | ~1,040 hrs/year | Supply per worker halves. The economy has to either get the same output with fewer people or pay more for shorter hours. |
| **Wages & Pay-Per-Hour** | Stable hourly rate; most workers earn 40 hrs/week | Hourly rates might rise as supply shrinks, or employers pay a premium for “awake” periods | We could see a two-tier wage structure: “awake wage” versus “hibernation-allowance.” |
| **Hours worked per employer** | 5-day week; on-call shifts, overtime | Employers may restructure to 6-month blocks: e.g., “active quarter” followed by a “pause.” | Predictable quarterly patterns give firms a built-in rest cycle. |
| **Job design & staffing models** | Continuous cycles of production, service, or creative work | Rotational teams that keep a project going while others sleep; plug-and-play gig workers for the 6 months when most of the workforce is “asleep.” | Many short-term contracts, hybrid telework + on-site “shift-swap” schedules. |
| **Automation & AI** | Machines take over repetitive or hazardous tasks, but still need human oversight | During hibernation, AI and robots can run facilities autonomously; the “awake” workforce can focus on oversight, strategy, or value-adding tasks that only humans do. | Accelerated AI deployment in sectors that require 24/7 uptime; humans become supervisors rather than operators. |
| **Workforce-flexibility & Gig Economy** | Growing but still anchored in weekly contracts | Perfect fit for gig workers: people can “log on” during the 6 active months and take the next 6 months off. | Gig platforms may create monthly/quarterly “turnover cycles” where each worker's shift is known in advance. |
| **Cross-border labor & immigration** | Visa restrictions tied to continuous employment | Companies can hire globally for short bursts, reducing visa bottlenecks. | Potential rise in remote work visas; countries may open shorter-term “researcher” or “contractor” permits. |
| **Equity / Stratification Risks** | Overtime drives inequality | If hibernation is expensive, those who cannot afford it may work longer, increasing burnout and health disparities. | Policy may need to guard against a class of “perpetual laborers.” |

### Bottom line

- **Supply shock**: fewer hours per worker → higher wages or more workers hired.
- **Demand shift**: firms need to plan around the predictable rhythm of workers' “on” and “off” periods.
- **Productivity gains**: deeper rest could improve focus, reduce errors, and heighten creativity during awake periods.
- **New industries**: hibernation maintenance, cryogenic storage leasing, sleep-health tech, AI-supervision firms.

---

## 2. Real Estate

| Feature | Current Reality | With 6-month Hibernation | Likely Shift |
|---------|-----------------|--------------------------|--------------|
| **Occupancy rates** | 24 hrs/day, full occupancy; houses consume heating, cooling, utilities constantly | Half the time consumption drops to near zero | Residential space can be revalued by “active” days rather than sheer square footage. |
| **Property taxes & insurance** | Often based on property value and full-year occupancy | Potential for tax rebates or reduced premiums for homes used only half the year; new “hibernation-friendly” insurance products. | Policy adjustments to reward lower utility consumption. |
| **Urban density** | High density in cities to serve constant demand for commuting & services | Less daily traffic, fewer commuters → city centers could pivot to cultural, recreational, or retail uses during active months. | Some neighborhoods may become “sleep hubs” with communal hibernation pods. |
| **Home design** | Focus on all-year comfort, heating/cooling systems | New design elements: integrated cryopods, isolated thermal control zones, ultra-low-energy architecture for extended dark periods. | Architecture firms may offer “hibernation-ready” homes as a niche market. |
| **Commercial real estate** | 9-to-5 office occupancy; warehouses operate around the clock | Offices can be reduced in size or repurposed: one “awake” coworking space per 6-month block, with additional robotic labs running during hibernation. | Increase in mixed-use developments that host both human offices and autonomous manufacturing. |
| **Resort & vacation markets** | People travel during their free time (often winter or summer) | “Winter to bed” people may shift vacations to the active months; conversely, tourism could expand into the hibernation months if visitors are comfortable spending nights in “sleep stations.” | Travel industry might create “sleep-lifted” packages: stay at a hibernation facility while the host sleeps. |
| **Infrastructure** | Power grid must supply 24/7 loads | Significant opportunity for microgrids, battery storage, and demand-response systems that take advantage of long, low-load periods. | Renewable-energy projects may become more viable with predictable “night” curtailment periods. |

### Bottom line

- **Reduced utility footprint** → cheaper homes, lower energy taxes, new property valuations.
- **Design innovation**: specialized pods, adaptive architecture, and modular housing that can be “offline.”
- **Urban reimagining**: lower commuter traffic, more public spaces, potential shift from high-density office cores to low-density mixed-use hubs.

---

## 3. Traditional Education System

| Feature | Current Reality | With 6-month Hibernation | Likely Shift |
|---------|-----------------|--------------------------|--------------|
| **Academic calendar** | Fall-spring semesters + summer break (~4 months) | Two “active” halves (e.g., Jan-June and July-Dec); each half could become a full academic year. | Schools may adopt a **biannual model**: compact, intensive periods of 2-3 months separated by a hibernation-break. |
| **School hours & days** | 5 days/week, 6-8 hrs/day | Potential for *all-day* immersive modules during the active period (e.g., 12-hour “lab” sessions). | Accelerated curricula that cram subjects into tighter windows. |
| **Distance vs. in-person** | Shifted heavily during COVID, but core remains physical | Lowered necessity for continuous on-campus presence → more hybrid / virtual learning when students are awake. | Massive adoption of “synchronous learning” platforms that allow real-time engagement only during the 6 months, with recorded modules stored for later completion. |
| **Student lifecycle** | 18-22 yrs for undergrad; structured apprenticeship tracks | Possible *dual-track* programs: a “sleep-break” year after freshman year to fund tuition or life experience. | Universities may offer *“Sleep-Break Credits”* that count toward graduation. |
| **Faculty workload** | Continuous tenure cycles, semester-based contracts | Reduced teaching load: faculty can focus on research or mentorship during the 6-month window; meanwhile AI tutors fill gaps. | More emphasis on “compressed teaching” and micro-teaching units. |
| **Teacher pay & job security** | Annual contracts, some reliance on overtime | Possible shift to quarterly or semi-annual contracts with built-in sabbaticals. | Rise in *“Instructor-A-to-Z”* part-time roles, more emphasis on project-based teaching. |
| **Curriculum design** | Spread over 4 years with prerequisites | Shorter, modular courses that can be taken in a single 2-3 month stint; prerequisites become “bootcamp” modules. | "Stackable" credits: learners acquire packages of knowledge within each active half. |
| **Student life & co-curricular** | Sports, clubs, internships spread throughout the year | Clubs may operate only during active periods, but the break can be used for internships or research projects. | New *“Intercalated Summer Internships”* that replace long “summer breaks.” |
| **Higher education financing** | Tuition + scholarships; rising debt burden | Tuition costs might fall (less per-hour instruction), but also student life expenses may rise if people stay in dorms for the whole active period. | Debt consolidation may shift toward *“hibernate-interest”* packages: lower cost during sleeping months. |

### Bottom line

- **Intensification**: learning compressed into fewer but denser periods.
- **Flexibility**: more modular, stackable courses that align with the two active windows.
- **Tech-driven teaching**: AI-facilitated tutoring between periods, reduced need for continuous faculty presence.
- **Policy changes**: accreditation bodies may adjust standards to reflect new calendar structures.

---

## What Makes These Shifts Work (or Not)?

1. **Adoption Rate**  
   - If only a subset of the population hibernates, we'll see *segmented* labor markets and housing usage: some houses stay live, others become “sleep lofts.”  
   - High adoption could trigger near-universal “half-year work” norms.

2. **Cost & Accessibility**  
   - If cryopods are expensive, only the wealthy will hibernate; widening the gap between “wake” and “hibernation” labor pools.  
   - Subsidies or public sectors might need to intervene to avoid exploitation.

3. **Health & Safety Regulations**  
   - Need for standards: carbon dioxide levels, monitoring devices, emergency protocols if a hibernation fails.  
   - Occupational health laws will need new guidelines for “continuous rest cycles.”

4. **Cultural Acceptance**  
   - Some industries (healthcare, emergency services) may resist any downtime; others may embrace it.  
   - Public sentiment may shape policy: mass public-service workers might lobby against mandatory sleep breaks.

5. **Technological Reliability**  
   - The best-case scenario presumes error-free hibernation.  
   - Failure modes could lead to emergency work shifts, eroding trust.

---

## A Quick “Future-Scenario” Sketch

1. **The Workplace:**  
   *A software firm runs three 6-month cycles a year. Two teams are on duty simultaneously, each covering a 12-hour shift. Between cycles, the whole staff sleeps in cryopods in a dedicated facility built beneath the office. AI bots handle routine deployments while humans supervise.*

2. **The Home:**  
   *Families invest in “smart-sleep” houses that automatically adjust temperature and lighting for hibernation months. Each house has a private pod, but houses are shared with neighbors who are active when you're asleep.*

3. **The Classroom:**  
   *A university's biology program now consists of 4 intense “boot camps,” each lasting 3 weeks. Students attend all four during the active period and sleep through the rest. Between boot camps, they work on lab projects or internships, guided by AI-based mentors.*

---

## Final Takeaway

A breakthrough that lets humans hibernate half the year would **halve the annual human-time input** for every sector that currently relies on continuous presence. The resulting shifts could be powerful:

- **Labour:** *Supply compression → wage pressure, new gig structures, automated oversight.*  
- **Housing:** *Lower energy use → repriced real estate, specialized sleep-pods, redesigned urban cores.*  
- **Education:** *Intensified, modular curricula, stronger reliance on tech tutors, restructured calendars.*

Everything would revolve around two “seasons”: the **awake season** (high activity, high productivity) and the **sleep season** (low demand, low consumption). The balance between these seasons would set the rhythm of economies worldwide. Whether societies choose to dance to that rhythm, or fight against it, will depend on the technology's cost, accessibility, and the values we attach to continuous labor versus restorative rest.
Sonnet-flavored:
Honest assessment requires recognizing this creates fundamentally opposing pressures:

**What's saving money vs. what's causing new costs**

Savings would likely be outweighed or redirected rather than creating a surplus - enormous structural disruption hasn't eliminated conflict (automation did similarly but slowly)

## Labor Markets

**Actual productivity loss**
- 6 months active per year reduces labor supply by half *if work can't be simply compressed*
- Gig economy collapses without continuous availability
- Supervision that was redundant between shifts becomes necessary
- Medical/critical monitoring would require replacement -> drives *more* automation concurrently
- The research productivity puzzle: projects extended, funding cycles disrupted

**Who benefits anyway?**
- Low-wage workers using seasonal unemployment barely once may not see savings
- Pre-retirement cohorts already commuting - may opt-out entirely
- This could compress the labor force, expanding precariat status

**Honest counterarguments**
- Work-life productivity research suggests current continuous hours have diminishing returns
- Time shift could enable staggered process cycles, reducing queueing/lags
- But these are subtle effects requiring architectural redesign

## Real Estate

**Market disruptions with mixed winners/losers**
- Short-term rentals doomed -> net loss for cities dependent on tourist turnover
- Unforced: shorter occupancy duration increases depreciation pressure - capital tied up faster
- Office space: significant underutilization but potentially because manufacturing/production moved online before change, allowing gradual compression
- **But commercial needs persist** while humans aren't working

## Education

**Logical structural changes less compelling**
- Education could redistribute over the year, but existing curriculum structured around seasonal work cycles will resist modification
- Onboarding retreats replaced by interruptions in learning continuity
- Physical campuses genuinely compress values

**What would meaningfully change?**
- Less pressure on "enforced attendance" for non-essential classes -> real cultural shift required

## Significantly Harder Problems Hidden

- Age discrimination: older workers could claim greater need for biomedical intervention, raising liability costs for employers
- Global south inequality multiplies (automation + institutional dependency creates technology debt)
- Retiree patterns collapse - pension systems restructured beyond obvious replacement

## What Most Analyses Get Wrong

Assuming savings just becomes extra leisure time, whereas frameworks show:
- Occupational scarcity reactions distort markets away from productive direction
- Institutional dependency on regular labor supply creates regressive burden on low-income users

**Bottom line:** Energy and money savings are peripheral in economic modeling. Labor output contraction would likely produce the dominant macro effect. **I'd expect partial or uneven adoption for years before any genuine net increase in capacity.**

What's your interest here?

Files

| File | Type | Purpose |
|------|------|---------|
| hf/gpt-oss-20b-sonnet-flavored-v1/ | safetensors | Merged HF format with native MXFP4 experts; for transformers / vLLM |
| gpt-oss-20B-sonnet-flavored-v1-Q8_0.gguf | GGUF | Q8_0 non-experts, MXFP4 experts |
| gpt-oss-20B-sonnet-flavored-v1-Q6_K.gguf | GGUF | Q6_K non-experts, MXFP4 experts |
| gpt-oss-20B-sonnet-flavored-v1-Q5_K_M.gguf | GGUF | Q5_K_M non-experts (with fallbacks, see note), MXFP4 experts |
| gpt-oss-20B-sonnet-flavored-v1-BF16.gguf | GGUF | Reference: BF16 non-experts, MXFP4 experts |
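Only the non-expert tensors change across the ladder, so the files should differ in size far less than the quant names suggest. The back-of-envelope estimate below rests on three assumptions:

- The parameter split comes from OpenAI's gpt-oss model card (roughly 19.12B expert parameters out of 20.91B total).
- Each llama.cpp type uses its nominal bits per weight.
- Q5_K_M is approximated as plain Q5_K.

Metadata, the tokenizer, and any per-tensor fallbacks are ignored, so actual file sizes will differ.

```python
# Rough GGUF size estimate for a ladder where only non-expert tensors vary.
# ASSUMPTIONS: parameter split from the gpt-oss model card; nominal llama.cpp
# bits per weight; Q5_K_M approximated as Q5_K; metadata and fallbacks ignored.
TOTAL_PARAMS = 20.91e9
EXPERT_PARAMS = 19.12e9
NON_EXPERT_PARAMS = TOTAL_PARAMS - EXPERT_PARAMS

MXFP4_BPW = 4.25  # 32 four-bit values plus one 8-bit shared scale per block
NON_EXPERT_BPW = {"BF16": 16.0, "Q8_0": 8.5, "Q6_K": 6.5625, "Q5_K_M": 5.5}


def estimated_size_gb(non_expert_bpw: float) -> float:
    """Experts stay MXFP4 in every file; only the non-expert precision changes."""
    bits = EXPERT_PARAMS * MXFP4_BPW + NON_EXPERT_PARAMS * non_expert_bpw
    return bits / 8 / 1e9  # decimal gigabytes


sizes = {name: estimated_size_gb(bpw) for name, bpw in NON_EXPERT_BPW.items()}
for name, gb in sizes.items():
    print(f"{name:7s} ~{gb:.1f} GB")
```

Under these assumptions the whole ladder spans only a little over 2 GB, because the roughly 10 GB of MXFP4 expert weights is the same in every file.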

Training Details

| Detail | Value |
|--------|-------|
| Base model | openai/gpt-oss-20b |
| Architecture | MoE, ~21B total / 3.6B active parameters, 32 experts, top-4 routing |
| Method | LoRA (rank 16) via Unsloth |
| Training dataset | Sonnet 4.6 distilled responses by TeichAI |
| Training type | Final-answer-only (no reasoning targeting) |
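"32 experts, top-4" means each token is sent to only 4 of the 32 expert MLPs. That is why only a fraction of the ~21B parameters is active per token. A generic top-k routing sketch follows. Normalizing with a softmax over just the selected logits is an assumption about gpt-oss, and real implementations work on batched tensors rather than Python lists.

```python
import math


def top_k_route(router_logits: list[float], k: int = 4) -> list[tuple[int, float]]:
    """Select the k highest-scoring experts and softmax-normalize over just those k."""
    if not 0 < k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # sorted() is stable, so ties keep the lower expert index first.
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]


def moe_layer(x: list[float], experts, router_logits: list[float], k: int = 4) -> list[float]:
    """Weighted sum of the k selected experts; the other experts are never evaluated."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Toy setup: 32 "experts", where expert i simply scales its input by i.
experts = [lambda v, s=i: [s * vi for vi in v] for i in range(32)]
logits = [0.0] * 32
logits[3], logits[7], logits[11], logits[30] = 2.0, 1.0, 1.0, 0.5

route = top_k_route(logits)
print([i for i, _ in route])  # [3, 7, 11, 30]
print(moe_layer([1.0], experts, logits))
```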

The LoRA adapter is not included in this v1 release. Future versions will publish the standalone adapter alongside the merged weights.
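Only merged weights ship in v1, so it helps to recall what merging does. A LoRA update on a frozen weight W is the low-rank product B·A scaled by alpha/r. Merging folds that product into W, so inference needs no adapter at all; libraries such as PEFT expose this as `merge_and_unload()`.

The toy check below uses tiny hand-picked matrices in pure Python, and the alpha and r values are placeholders, not the ones used for this fine-tune. It confirms that the merged and unmerged forward passes agree.

```python
# Toy LoRA merge check: W_merged = W + (alpha / r) * B @ A.
# All matrices and alpha/r are made-up placeholders, not this model's values.

def matmul(a, b):
    """Nested-list matrix product: rows of a times columns of b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def merge_lora(w, a, b, alpha, r):
    """Fold the scaled low-rank update B @ A into the frozen weight W."""
    delta = matmul(b, a)
    s = alpha / r
    return [[wij + s * dij for wij, dij in zip(wrow, drow)] for wrow, drow in zip(w, delta)]


W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # frozen weight: d_out=2, d_in=3
A = [[1.0, 2.0, 0.0]]                    # r x d_in, with r=1
B = [[0.5], [-1.0]]                      # d_out x r
alpha, r = 2.0, 1
x = [1.0, 1.0, 1.0]

# Unmerged: base path plus the scaled adapter path B(Ax).
base = matvec(W, x)
adapter = matvec(B, matvec(A, x))
unmerged = [yb + (alpha / r) * ya for yb, ya in zip(base, adapter)]

# Merged: a single matrix-vector product with the folded weight.
merged = matvec(merge_lora(W, A, B, alpha, r), x)
print(unmerged, merged)  # [6.0, -6.0] [6.0, -6.0]
```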


Quantization Details

| Detail | Value |
|--------|-------|
| Quants | BF16, Q8_0, Q6_K, Q5_K_M (all with native MXFP4 experts) |
| Quantized by | jorge-erdb |
| Method | llama.cpp with explicit MXFP4 expert preservation |

MXFP4 experts are preserved natively across the entire ladder. The MoE expert tensors (~19B of ~21B params) remain in their original MXFP4 format in every quant; only the non-expert tensors (router, attention, embeddings, layernorms) vary in precision across the ladder. No expert tensor is decompressed and requantized at any point.
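For readers unfamiliar with the format: MXFP4 (from the OCP Microscaling spec) stores weights in blocks of 32 four-bit E2M1 values. Each block shares one 8-bit power-of-two (E8M0) scale, which works out to about 4.25 bits per weight. The sketch below decodes one block from already-unpacked 4-bit codes. How the codes are packed into bytes in GGUF or safetensors is deliberately left out, and the helper names are illustrative.

```python
# E2M1 magnitudes indexed by the low 3 bits of a 4-bit code; bit 3 is the sign.
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32
E8M0_BIAS = 127


def decode_fp4(code: int) -> float:
    if not 0 <= code <= 0xF:
        raise ValueError("FP4 codes are 4-bit")
    sign = -1.0 if code & 0b1000 else 1.0
    return sign * FP4_E2M1[code & 0b0111]


def dequantize_block(codes: list[int], scale_bits: int) -> list[float]:
    """Decode one MXFP4 block: every element shares the scale 2**(scale_bits - 127)."""
    if len(codes) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} codes, got {len(codes)}")
    if scale_bits == 0xFF:
        raise ValueError("E8M0 value 0xFF encodes NaN")
    scale = 2.0 ** (scale_bits - E8M0_BIAS)
    return [decode_fp4(c) * scale for c in codes]


bits_per_weight = (BLOCK_SIZE * 4 + 8) / BLOCK_SIZE
print(bits_per_weight)  # 4.25
print(dequantize_block([0b0011, 0b1111] * 16, 128)[:2])  # [3.0, -12.0]
```

Leaving the experts in this format means a quant never pays for the extra rounding that a dequantize-then-requantize round trip into a different expert format could introduce.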


Download

```shell
pip install -U "huggingface_hub[cli]"

# BF16 (full precision non-experts, MXFP4 experts)
huggingface-cli download jorge-erdb/gpt-oss-20b-sonnet-flavored-gguf --include "*BF16*.gguf" --local-dir ./

# Q8_0
huggingface-cli download jorge-erdb/gpt-oss-20b-sonnet-flavored-gguf --include "*Q8_0.gguf" --local-dir ./

# Q6_K
huggingface-cli download jorge-erdb/gpt-oss-20b-sonnet-flavored-gguf --include "*Q6_K.gguf" --local-dir ./

# Q5_K_M
huggingface-cli download jorge-erdb/gpt-oss-20b-sonnet-flavored-gguf --include "*Q5_K_M.gguf" --local-dir ./

# Merged HF safetensors (for transformers / vLLM)
huggingface-cli download jorge-erdb/gpt-oss-20b-sonnet-flavored-gguf --include "hf/*" --local-dir ./
```
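If you prefer Python to the CLI, `huggingface_hub.snapshot_download` accepts glob filters through `allow_patterns`, much like `--include`. A minimal sketch follows. The helper names are illustrative, and the filename list is copied from the Files table so the patterns can be checked offline.

```python
from fnmatch import fnmatch

REPO_ID = "jorge-erdb/gpt-oss-20b-sonnet-flavored-gguf"
GGUF_FILES = [
    "gpt-oss-20B-sonnet-flavored-v1-Q8_0.gguf",
    "gpt-oss-20B-sonnet-flavored-v1-Q6_K.gguf",
    "gpt-oss-20B-sonnet-flavored-v1-Q5_K_M.gguf",
    "gpt-oss-20B-sonnet-flavored-v1-BF16.gguf",
]


def pattern_for(quant: str) -> str:
    """Glob selecting a single GGUF quant, similar to the --include filters."""
    return f"*{quant}.gguf"


def matching_files(quant: str) -> list[str]:
    """Offline check of which listed files a quant's pattern would select."""
    return [f for f in GGUF_FILES if fnmatch(f, pattern_for(quant))]


def download(quant: str, local_dir: str = "./") -> str:
    """Fetch one quant (needs network access); returns the local directory path."""
    from huggingface_hub import snapshot_download  # pip install -U huggingface_hub
    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[pattern_for(quant)],
        local_dir=local_dir,
    )


for q in ("Q8_0", "Q6_K", "Q5_K_M", "BF16"):
    print(q, matching_files(q))
```

For example, `download("Q6_K")` fetches only the Q6_K file. Passing `allow_patterns=["hf/*"]` instead would fetch the merged safetensors folder.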

Credits

  • LoRA training & quantization: jorge-erdb
  • Distilled training data: TeichAI — Sonnet 4.6 distillation
  • Base model: OpenAI — GPT-OSS-20B
  • Style source: Anthropic — Claude Sonnet 4.6 (indirectly via distillation)
  • Tooling: Unsloth, llama.cpp

gpt-oss-20b



Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.

We're releasing two flavors of these open models:

  • gpt-oss-120b — for production, general purpose, high reasoning use cases that fit into a single 80GB GPU (like NVIDIA H100 or AMD MI300X) (117B parameters with 5.1B active parameters)
  • gpt-oss-20b — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters)

Both models were trained on our harmony response format and should only be used with the harmony format; they will not work correctly otherwise.
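The renderer below is a deliberately simplified illustration of what the harmony format looks like for plain chat turns. It is a sketch based on the format's special tokens (`<|start|>`, `<|message|>`, `<|end|>`). It omits channels, the default system-message metadata, and tool calls, so use the chat template or the openai-harmony package for real inference.

```python
def render_harmony(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render turns as <|start|>{role}<|message|>{content}<|end|> (simplified harmony).

    Omits channels, the default system-message metadata, and tool calls.
    """
    allowed = {"system", "developer", "user", "assistant"}
    parts = []
    for m in messages:
        if m["role"] not in allowed:
            raise ValueError(f"unsupported role: {m['role']}")
        parts.append(f"<|start|>{m['role']}<|message|>{m['content']}<|end|>")
    if add_generation_prompt:
        parts.append("<|start|>assistant")  # the model continues from here
    return "".join(parts)


print(render_harmony([{"role": "user", "content": "Hi"}]))
# <|start|>user<|message|>Hi<|end|><|start|>assistant
```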

Note

This model card is dedicated to the smaller gpt-oss-20b model. Check out gpt-oss-120b for the larger model.

Highlights

  • Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
  • Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
  • Full chain-of-thought: Gain complete access to the model's reasoning process, facilitating easier debugging and increased trust in outputs. It's not intended to be shown to end users.
  • Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
  • Agentic capabilities: Use the models' native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.
  • MXFP4 quantization: The models were post-trained with MXFP4 quantization of the MoE weights, making gpt-oss-120b run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the gpt-oss-20b model run within 16GB of memory. All evals were performed with the same MXFP4 quantization.

Inference examples

Transformers

You can use gpt-oss-120b and gpt-oss-20b with Transformers. If you use the Transformers chat template, it will automatically apply the harmony response format. If you use model.generate directly, you need to apply the harmony format manually using the chat template or use our openai-harmony package.

To get started, install the necessary dependencies to set up your environment:

```shell
pip install -U transformers kernels torch
```

Once set up, you can run the model with the snippet below:

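```python
from transformers import pipeline
import torch

model_id = "openai/gpt-oss-20b"

# Build a text-generation pipeline; dtype and device placement are chosen automatically.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

# Chat-style input: the pipeline applies the chat template, which handles the harmony format.
messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)
# The last element of generated_text is the assistant's reply message.
print(outputs[0]["generated_text"][-1])
```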

Alternatively, you can run the model via Transformers Serve to spin up an OpenAI-compatible web server:

```shell
transformers serve
transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-20b
```

Learn more about how to use gpt-oss with Transformers.

vLLM

vLLM recommends using uv for Python dependency management. You can use vLLM to spin up an OpenAI-compatible webserver. The following command will automatically download the model and start the server.

```shell
uv pip install --pre vllm==0.10.1+gptoss \
    --extra-index-url https://wheels.vllm.ai/gpt-oss/ \
    --extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
    --index-strategy unsafe-best-match

vllm serve openai/gpt-oss-20b
```

Learn more about how to use gpt-oss with vLLM.

PyTorch / Triton

To learn about how to use this model with PyTorch and Triton, check out our reference implementations in the gpt-oss repository.

Ollama

If you are trying to run gpt-oss on consumer hardware, you can use Ollama by running the following commands after installing Ollama.

```shell
# gpt-oss-20b
ollama pull gpt-oss:20b
ollama run gpt-oss:20b
```

Learn more about how to use gpt-oss with Ollama.

LM Studio

If you are using LM Studio, you can download the model with the following command:

```shell
# gpt-oss-20b
lms get openai/gpt-oss-20b
```

Check out our awesome list for a broader collection of gpt-oss resources and inference partners.


Download the model

You can download the model weights from the Hugging Face Hub with the Hugging Face CLI:

```shell
# gpt-oss-20b
huggingface-cli download openai/gpt-oss-20b --include "original/*" --local-dir gpt-oss-20b/
pip install gpt-oss
python -m gpt_oss.chat gpt-oss-20b/original/
```

Reasoning levels

You can adjust the reasoning level that suits your task across three levels:

  • Low: Fast responses for general dialogue.
  • Medium: Balanced speed and detail.
  • High: Deep and detailed analysis.

The reasoning level can be set in the system prompt, e.g., "Reasoning: high".
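The sketch below builds a message list that sets the level through the system prompt. `build_messages` is a hypothetical helper. The resulting list can be passed to the Transformers pipeline from the example above; if results look off, compare against the rendered chat template.

```python
REASONING_LEVELS = ("low", "medium", "high")


def build_messages(user_prompt: str, effort: str = "medium") -> list[dict[str, str]]:
    """Return chat messages whose system prompt sets 'Reasoning: <level>'."""
    if effort not in REASONING_LEVELS:
        raise ValueError(f"effort must be one of {REASONING_LEVELS}, got {effort!r}")
    return [
        {"role": "system", "content": f"Reasoning: {effort}"},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("Explain quantum mechanics clearly and concisely.", effort="high")
print(messages[0])  # {'role': 'system', 'content': 'Reasoning: high'}
# With the Transformers pipeline from the example above:
# outputs = pipe(messages, max_new_tokens=256)
```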

Tool use

The gpt-oss models are excellent for:

  • Web browsing (using built-in browsing tools)
  • Function calling with defined schemas
  • Agentic operations like browser tasks
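Function calling is driven by JSON-schema tool definitions. The sketch below shows one definition in the common OpenAI-style function format, plus a small check on the arguments a model emits. `get_weather` and its parameters are hypothetical. How the tools list reaches the model depends on your inference stack; in Transformers, for example, it goes through the chat template's `tools` argument.

```python
import json

GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}


def check_arguments(tool: dict, raw_args: str) -> dict:
    """Parse a model-emitted JSON argument string and check required/enum fields.

    Deliberately minimal; use a full JSON Schema validator in production.
    """
    args = json.loads(raw_args)
    params = tool["function"]["parameters"]
    missing = [k for k in params.get("required", []) if k not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    for key, value in args.items():
        spec = params["properties"].get(key)
        if spec is None:
            raise ValueError(f"unexpected argument: {key}")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{key} must be one of {spec['enum']}")
    return args


print(check_arguments(GET_WEATHER_TOOL, '{"city": "Paris", "unit": "celsius"}'))
```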

Fine-tuning

Both gpt-oss models can be fine-tuned for a variety of specialized use cases.

This smaller model gpt-oss-20b can be fine-tuned on consumer hardware, whereas the larger gpt-oss-120b can be fine-tuned on a single H100 node.

Citation

```bibtex
@misc{openai2025gptoss120bgptoss20bmodel,
      title={gpt-oss-120b & gpt-oss-20b Model Card},
      author={OpenAI},
      year={2025},
      eprint={2508.10925},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.10925},
}
```
Description
Model synced from source: jorge-erdb/gpt-oss-20b-sonnet-flavored-gguf