OpenAI's o1 and o3 Are Hiding Their Thinking Process: Why That Matters for Your AI Workflow

A thinking trace is the step-by-step reasoning an AI model works through before delivering its final answer, and OpenAI is deliberately hiding it from most users. Unlike standard AI responses that jump straight to conclusions, thinking trace models show their logic, catch their own mistakes, and reconsider assumptions before committing to an answer. This capability has become standard across major AI labs in 2026, but OpenAI's approach to restricting access sets it apart from competitors like Anthropic and Google.

What Exactly Is a Thinking Trace, and How Does It Work?

When you ask a standard AI model like GPT-4o a question, it predicts the next word and fires back an answer. Fast, but shallow. A thinking trace model does something fundamentally different. It generates a hidden scratchpad of reasoning first, working through the problem like a human would, and only then writes the final answer you see.

The mechanism is straightforward. You send a question, but the model doesn't answer immediately. Instead, it generates a block of internal reasoning, often called "thinking tokens" or the "scratchpad." During this phase, the model breaks the problem into sub-problems, considers multiple approaches, catches its own errors mid-reasoning, and revises its work before finalizing. Only after this thinking phase does the model produce the response you see. The final answer is conditioned on everything the model worked through.
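The two-phase flow can be sketched as a toy simulation. This is not how a real model generates tokens (both phases are autoregressive in one pass); the `ReasoningResponse` container and `solve` function are hypothetical and only illustrate that the answer is derived from the scratchpad, not written independently of it:

```python
from dataclasses import dataclass

@dataclass
class ReasoningResponse:
    """Toy container mirroring the two token streams a thinking model produces."""
    thinking: list   # hidden scratchpad steps (providers may withhold these)
    answer: str      # final answer, conditioned on the scratchpad

def solve(question: str) -> ReasoningResponse:
    """Toy two-phase generation: reason first, then answer from the reasoning."""
    # Phase 1: build a scratchpad of intermediate steps before any answer exists.
    thinking = [
        f"Restate the problem: {question}",
        "Break it into sub-problems and try an approach.",
        "Check the intermediate result; revise if it fails.",
    ]
    # Phase 2: the final answer is produced from the scratchpad, so truncating
    # the thinking would change (and typically degrade) the answer.
    answer = f"Answer derived from {len(thinking)} reasoning steps."
    return ReasoningResponse(thinking=thinking, answer=answer)

resp = solve("What is 17 * 24?")
print(resp.answer)         # the only part most providers show in full
print(len(resp.thinking))  # the part that may be hidden or summarized
```

The key property the sketch preserves: delete the scratchpad and the answer cannot be computed, which is why cutting off thinking mid-way hurts hard tasks.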

Here's what surprises most users: the thinking trace is not a post-hoc explanation. The model isn't explaining what it already did. The reasoning happens in real time and genuinely affects the output. Experiments show that cutting off thinking mid-way produces noticeably worse answers on hard tasks.

Why Is OpenAI Hiding Its Thinking Traces While Competitors Show Theirs?

OpenAI actively restricts access to o1 and o3 thinking tokens, a deliberate policy decision that is not a technical limitation. The company cites AI safety and competitive advantage as reasons for this restriction. In contrast, Claude Sonnet 4.6 and Claude Opus 4.6 display adaptive thinking that users can toggle on or off, and Google's Gemini 2.0 Flash Thinking shows the full trace to users.

For o1, OpenAI shows only a summary of the thinking process, keeping the detailed reasoning hidden by policy. For o3, the consumer interface shows a summary, but the API version hides thinking tokens entirely by default. This creates a transparency gap that affects how users understand and trust the model's reasoning.
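The gap is easiest to see in what each API actually hands back. The dictionaries below are simplified stand-ins for the real response shapes (field names are illustrative approximations, not exact SDK schemas): an OpenAI-style response reports how many reasoning tokens you were billed for while withholding their content, whereas an Anthropic-style response includes the thinking as a visible content block:

```python
# Simplified, hypothetical response shapes -- not exact SDK schemas.
openai_style = {
    "output_text": "x = 4",
    "usage": {"reasoning_tokens": 7090},  # you pay for these tokens...
    "reasoning_content": None,            # ...but their content is withheld
}

anthropic_style = {
    "content": [
        {"type": "thinking", "thinking": "Isolate x on the left side..."},
        {"type": "text", "text": "x = 4"},
    ],
}

def visible_reasoning(response):
    """Return whatever reasoning text the response exposes, else None."""
    if response.get("reasoning_content"):
        return response["reasoning_content"]
    for block in response.get("content", []):
        if block.get("type") == "thinking":
            return block["thinking"]
    return None

print(visible_reasoning(openai_style))     # None: billed but hidden
print(visible_reasoning(anthropic_style))  # the trace text itself
```

In the hidden case you can still audit cost (the token count is reported) but not correctness, which is exactly the debugging problem discussed below.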

How to Evaluate Which Thinking Model Fits Your Needs

  • Visibility Level: Claude Sonnet 4.6 and Opus 4.6 show adaptive thinking automatically, adjusting effort based on task complexity. OpenAI o1 shows summary-only reasoning. Gemini 2.0 Flash Thinking displays the full trace. Choose based on whether you need to audit the model's reasoning process.
  • Token Cost: Thinking trace models cost 2x to 20x more tokens than standard responses. Claude Sonnet 4.6 on a complex math problem uses approximately 4,200 output tokens plus 4,580 thinking tokens. OpenAI o3 on the same problem uses roughly 6,800 output tokens plus 7,090 thinking tokens. Gemini 2.0 Flash Thinking uses approximately 3,100 output tokens plus 3,510 thinking tokens.
  • Task Suitability: Thinking traces excel at complex math, multi-step coding, and logical puzzles. For simple questions, thinking traces can actually produce worse answers due to overthinking, a failure mode that most articles don't discuss.

The token cost comparison reveals a critical insight: thinking trace models are not created equal. Testing conducted in March 2026 across Claude Sonnet 4.6 with adaptive thinking at high effort, OpenAI o3, and Gemini 2.0 Flash Thinking on identical advanced math problems showed significant variation in efficiency. Claude's adaptive approach at low effort uses only 1,140 output tokens on the same problem, demonstrating that effort level dramatically affects cost.
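A quick back-of-the-envelope calculation using the token counts from the comparison above makes the overhead concrete. Per-token prices vary by provider and change often, so this sketch computes totals and the thinking-to-output ratio rather than dollar costs:

```python
# Token counts from the comparison above (complex math problem, March 2026 test).
runs = {
    "Claude Sonnet 4.6 (high effort)": {"output": 4200, "thinking": 4580},
    "OpenAI o3":                       {"output": 6800, "thinking": 7090},
    "Gemini 2.0 Flash Thinking":       {"output": 3100, "thinking": 3510},
}

for model, t in runs.items():
    total = t["output"] + t["thinking"]
    overhead = t["thinking"] / t["output"]  # thinking tokens per output token
    print(f"{model}: {total} total tokens, {overhead:.2f}x thinking overhead")
```

On these numbers every model spends at least as many tokens thinking as answering, so budgeting only for visible output roughly halves your real cost estimate.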

Thinking Traces vs. Chain of Thought: A Critical Distinction

Most people confuse thinking traces with chain of thought, but they are fundamentally different. Chain of thought is a prompting technique where you tell the model "think step by step," and the model writes reasoning in its output alongside the answer. You're prompting it to reason. A thinking trace is a model architecture feature where reasoning happens in a separate, dedicated pass, often in a different token stream, before the final output is written. You don't prompt for it; the model is trained to do it automatically on hard tasks.

This distinction matters when you're building AI agents, the multi-step systems that chain several model calls together. Knowing which model is thinking and which is just responding changes how you architect the whole pipeline. Chain of thought is something you do to a model. A thinking trace is something the model does to itself.
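The distinction shows up directly in how you construct a request. In the hedged sketch below, chain of thought lives in the prompt text, while a thinking trace lives in the request configuration; the parameter and model names are illustrative (the `thinking` block loosely mirrors Anthropic's extended-thinking request shape, and the model IDs are taken from this article, not verified SDK identifiers):

```python
# Chain of thought: you change the PROMPT. Reasoning appears in the output text.
cot_request = {
    "model": "gpt-4o",
    "messages": [{"role": "user",
                  "content": "What is 17 * 24? Think step by step."}],
}

# Thinking trace: you change the REQUEST CONFIG. Reasoning happens in a
# dedicated phase regardless of prompt wording. (Parameter names vary by
# provider; this approximates Anthropic's extended-thinking shape.)
trace_request = {
    "model": "claude-sonnet-4-6",
    "thinking": {"type": "enabled", "budget_tokens": 8000},
    "messages": [{"role": "user", "content": "What is 17 * 24?"}],
}

# The whole distinction in two lines:
assert "step by step" in cot_request["messages"][0]["content"]  # prompted reasoning
assert "thinking" in trace_request                              # architectural reasoning
```

In an agent pipeline this means CoT quality depends on your prompt templates, while trace quality (and cost) is governed by a budget knob you set per call.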

The Real Cost of OpenAI's Transparency Restriction

OpenAI's policy to hide thinking tokens creates a practical problem for users who want to understand or debug model behavior. When you can't see the reasoning, you can't catch errors before they cost you. Claude's approach of showing adaptive thinking allows users to review the model's logic and identify where it went wrong. Gemini's full trace visibility provides complete transparency into the reasoning process.

For organizations building AI systems, this transparency gap affects trust and auditability. If you're using o1 or o3 in a high-stakes application, you're paying for the thinking tokens but can't inspect what the model actually thought. This creates a black box problem that competitors have solved by default.

The landscape of thinking trace models continues to evolve. As of March 2026, every top-tier AI lab ships at least one reasoning model: OpenAI has o1 and o3, Anthropic has Claude Sonnet 4.6 and Opus 4.6, and Google has Gemini 2.0 Flash Thinking. The reasoning era is here, but how much of that reasoning you actually get to see depends on which model you choose.