NEWS GPT-5.3-Spark — a model that writes code faster than you can blink

pinkman · Feb 13, 2026

It doesn't write code for you, but literally anticipates your desires in real time.

OpenAI has unveiled the GPT-5.3-Codex-Spark model , the first in its lineup designed from the ground up for real-time code processing. The system generates over 1,000 tokens per second and is designed for rapid dialogue with developers rather than long, autonomous chains of actions. The model is now available in a research preview for ChatGPT Pro users.

Codex-Spark is built on GPT-5.3-Codex, but is significantly more compact and operates with minimal latency in mind. It runs on a specialized computing platform developed in collaboration with Cerebras. This mode allows for editing program fragments , rebuilding logic, refining interfaces, and immediately seeing the results, without lengthy response times. The format is designed for collaborative programming, where not only accuracy but also instant response is crucial.

At launch, the model supports contexts of up to 128,000 tokens and works only with text. During the preview period, separate request limits apply; these do not consume the standard quota. Under high load, queues and access delays are possible.

The developers emphasize that the system's behavior is tuned for interactive work. By default, it makes targeted adjustments and doesn't run tests unless explicitly requested. This lightweight mode allows for stopping mid-response, changing tasks, and quickly iterating. In tests on the SWE-Bench Pro and Terminal-Bench 2.0 engineering problem sets, the model demonstrated high accuracy and performed significantly faster than GPT-5.3-Codex.

The speed increase is due not only to the architecture itself but also to changes in the server side. OpenAI redesigned the request processing chain from the client to the response. According to the company, latency for the full request-response cycle has been reduced by 80%, the overhead for each token has been reduced by 30%, and the time to first token has been halved. For Codex-Spark, persistent WebSocket connections are used by default, and they plan to later enable this for other models.

Computations are performed on the Cerebras Wafer Scale Engine 3 accelerator, a specialized AI chip designed specifically for fast inference. This adds a dedicated low-latency service layer to the OpenAI infrastructure. GPUs, meanwhile, remain the primary platform for training and large-scale inference. The developers envision combining both hardware types in a single task to balance speed and cost.

Codex-Spark is trained with the same security mechanisms as OpenAI's core models, including cyber-risk-related restrictions. According to the company's internal assessment, the system does not reach high-risk thresholds in the areas of cybersecurity and biology.

The release of Codex-Spark is seen as the first step toward a dual-mode Codex system, combining a fast interactive mode with longer reasoning for complex problems. Future updates are expected to include larger versions, expanded context, and support for multimodal inputs.

NEWS GPT-5.3-Spark — a model that writes code faster than you can blink

pinkman

BOSS

Similar threads