GPT-5.5 is OpenAI's clearest bet yet on AI that can finish the job
OpenAI launched GPT-5.5 on April 23, 2026, with stronger long-context, coding, and tool-use results, a wider Codex push, and a tighter safety posture that makes the version bump more consequential than it first looks.
Maya Chen
Enterprise AI correspondent
Published Apr 24, 2026
Updated Apr 24, 2026

Overview
GPT-5.5 landed on April 23, 2026, and OpenAI is selling it less like a chatbot refresh and more like a work model that can keep moving without constant supervision. The headline claim is not that it answers one hard question better. It is that it can take a messy job, check context across tabs and files, use tools, and keep pushing toward an output with less hand-holding than GPT-5.4 needed.
That framing matters because the launch arrived after a short, noisy rumor cycle on X. Earlier on April 23, the OpenAI Developers account posted a cryptic "NS41" teaser that many people read as a 5.5 signal. But the hard facts only arrived later that day, when OpenAI published the launch page and safety report with the rollout tiers, benchmark tables, pricing, and the reason API access is not live yet.
GPT-5.5 turns OpenAI's pitch from chat to execution
GPT-5.5 is OpenAI's clearest attempt yet to move the conversation away from one-shot answers and toward AI that can carry real work from start to finish. In the official launch post, OpenAI said the model is stronger at agentic coding, knowledge work, long-context reasoning, computer use, and even early scientific research. That is a bigger claim than a normal model bump. It says the company thinks the next buying decision is about whether an AI model can stay useful across a longer chain of actions, not just whether it can win a benchmark screenshot.
The early examples point in the same direction. OpenAI said 85% of users in the new Codex experience were choosing GPT-5.5 for work inside their coding flows. The company also highlighted finance workloads that involved reviewing 24,771 K-1 forms and said GPT-5.5 can finish the same work with far fewer tokens than GPT-5.4. That matters because token efficiency changes the economics. A model that costs more per token can still be the cheaper tool if it needs fewer retries, less scaffolding, and less cleanup after the first pass.
GPT-5.5 availability, pricing, and API timing are part of the story
OpenAI rolled GPT-5.5 out in ChatGPT to Plus, Pro, Business, and Enterprise on April 23, 2026. The same day, the company started shipping it in Codex for Plus, Pro, Business, Enterprise, Edu, and Go, with a 400,000-token context window there. The API is close, but not live yet. OpenAI said developers will get 1 million tokens of context when API access opens, and Microsoft said GPT-5.5 would become generally available in Foundry on April 24.
Pricing is not a footnote here. OpenAI listed GPT-5.5 at $5 per million input tokens and $30 per million output tokens, while GPT-5.5 Pro comes in much higher at $30 per million input tokens and $180 per million output tokens. Flex and Batch pricing cut those rates in half, and priority processing is priced at 2.5 times the standard rate. On paper, that is a material step up from GPT-5.4. But OpenAI and outside testing both point to the same counterargument: GPT-5.5 often needs significantly fewer tokens to finish a task. Artificial Analysis estimated that the net cost increase on its overall index was about 20% once token savings were factored in. That is still more expensive. It is just not as punishing as the raw rate card first suggests.
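The arithmetic behind that argument can be sketched in a few lines of Python. This is a rough illustration, not an official calculator: the per-million-token rates and tier multipliers are the ones quoted above, while the dictionary keys, the function names, the token counts, and the 2.0 price ratio in the usage note are hypothetical values chosen only for the example.

```python
# Rates quoted in the article, in US dollars per million tokens: (input, output).
# The keys are illustrative labels, not confirmed API model identifiers.
RATES = {
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.5-pro": (30.00, 180.00),
}

# Tier multipliers as described in the article: Flex and Batch halve the rate,
# priority processing costs 2.5 times the standard rate.
TIER_MULTIPLIER = {"standard": 1.0, "flex": 0.5, "batch": 0.5, "priority": 2.5}


def task_cost(model, input_tokens, output_tokens, tier="standard"):
    """Return the dollar cost of one task at the quoted rates."""
    in_rate, out_rate = RATES[model]
    base = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return base * TIER_MULTIPLIER[tier]


def net_cost_change(price_ratio, token_ratio):
    """Return the fractional change in cost per task.

    price_ratio: new per-token price divided by old per-token price.
    token_ratio: tokens the new model uses divided by tokens the old one used.
    Both inputs are assumptions supplied by the caller.
    """
    return price_ratio * token_ratio - 1.0


def break_even_token_ratio(price_ratio):
    """Return the token ratio at which the new model costs the same per task."""
    return 1.0 / price_ratio
```

For a hypothetical task with 200,000 input tokens and 20,000 output tokens, `task_cost("gpt-5.5", 200_000, 20_000)` comes to $1.60 at the standard rate, $0.80 on Batch, and $4.00 with priority processing. If a model were priced at twice the per-token rate of its predecessor (an assumed ratio, not a figure from the rate card) but used 60% of the tokens, `net_cost_change(2.0, 0.6)` gives a net increase of about 20%, and it would break even at half the tokens.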
GPT-5.5 benchmarks show where the gains look real and where caution still matters
The benchmark picture is strong, but it is not a clean sweep. OpenAI's own tables show GPT-5.5 beating GPT-5.4 on Terminal-Bench, GDPval, OSWorld, Tau2, and long-context retrieval work such as MRCR. The jump on some long-context tasks is especially striking. On OpenAI's chart, GPT-5.5 reached 74.0 on MRCR at 512K to 1M context, versus 36.6 for GPT-5.4. Graphwalks-BFS at 1M context showed a similar gap, with GPT-5.5 at 45.4 and GPT-5.4 at 9.4. That is the kind of spread that changes what teams are willing to test in document-heavy or browser-heavy flows.
Still, the launch is not a license to stop reading carefully. On public SWE-Bench Pro, OpenAI's chart puts GPT-5.5 at 58.6, behind Claude Opus 4.7 at 64.3. BrowseComp Pro also stays tight. Humanity's Last Exam with tools remains more of a clustered race than a knockout. Artificial Analysis ranked GPT-5.5 first overall on its Intelligence Index, which is a serious result, but the same group also said the model still shows a high hallucination rate on its Omniscience benchmark. So the right read is not that GPT-5.5 has solved reliability. The better read is that it appears to be more capable across a wider spread of work, while still demanding the kind of review serious teams already expect.
X spotted the launch window, but not the hard facts
X was useful before the announcement, just not in the way rumor accounts like to claim. It helped show that something was close. Earlier on April 23, OpenAI Developers posted the "NS41" teaser. Other posts and screenshots floating around the platform pointed to a model name, a near-term release, and a stronger Codex tie-in. In that limited sense, the chatter was directionally right.
But the parts that actually matter for builders and buyers did not come from the rumor cycle. X did not settle the rollout tiers, the API schedule, the safety posture, or the pricing ladder. Those details came from OpenAI's launch materials, the safety write-up, Microsoft's Foundry note, and later coverage from outlets that got the full brief. After launch, OpenAI's own posts about GPT-5.5 in Codex reinforced the same theme as the blog post: this is a model meant to browse, inspect, use tools, and keep working across a broader surface than a plain chat tab.
Why GPT-5.5 matters beyond one version bump
The most important part of the GPT-5.5 launch is not the number. It is the mix of capability, packaging, and restraint. OpenAI is pitching a model that can do more across coding, research, and browser-driven work, while also admitting that wider access needs tighter controls. The company said GPT-5.5 is treated as High for biological and chemical capability and for cyber capability under its preparedness framework, and that is part of why the API rollout is more controlled than the ChatGPT rollout.
That makes GPT-5.5 feel like a product for people who already know where AI breaks. Enterprises want more initiative from a model, but they also want clearer guardrails when that model starts touching higher-stakes work. Builders want a longer context window and better tool use, but they also need a pricing curve that does not explode the minute a task runs long. GPT-5.5 does not answer every one of those concerns. What it does do is show where OpenAI thinks the market is going next: away from isolated chat and toward AI that can take a job, hold context, use software, and finish more of the work on its own.