Jalapeno is a processor designed for one job: running OpenAI's models inside data centers at scale.

On 24 June 2026, OpenAI and Broadcom unveiled Jalapeno, OpenAI’s first custom processor. It is an application-specific integrated circuit (ASIC) built only for inference, the work of running a trained model to answer a request. For anyone building products on OpenAI’s models, the announcement signals that the company is moving to control its own hardware, the layer that sets the recurring cost of every API call.

What Jalapeno is

Jalapeno is a custom accelerator, not a general-purpose chip. OpenAI calls it an “Intelligence Processor” and says it is the first in a multi-generation hardware platform the two companies are building together. OpenAI handled the chip design. Broadcom contributed silicon implementation and networking technology. Celestica manages boards, racks, and system integration.

The chip targets inference rather than training. Training is the one-time process of building a model. Inference is the repeated process of serving answers from that model. Inference is where the cost shows up every day, because a product like ChatGPT runs inference on every message a user sends.

OpenAI says the design reached tape-out, the point where a finished design is sent for manufacturing, in about nine months. OpenAI describes this as the fastest ASIC development cycle for high-performance semiconductors it is aware of. The company says its own models helped accelerate parts of the design and optimization work.

How an inference chip fits the stack

A custom inference chip sits at the bottom of the stack that serves an AI product. The model and the application run on top of it. Owning this layer lets OpenAI tune the hardware to its specific models.

Application: ChatGPT, API products
Model: Trained LLM weights
Serving software: Inference runtime, which schedules and batches requests onto the hardware
Silicon: Jalapeno ASIC, Broadcom networking
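
The scheduling and batching role of the serving-software layer can be illustrated with a toy example. This is a minimal sketch, not OpenAI's runtime: the `Request` type, the `batch_requests` function, and the `max_batch_tokens` budget are all hypothetical names chosen for illustration. It shows one simple way a runtime might group pending requests so the hardware processes several together.

```python
from dataclasses import dataclass


@dataclass
class Request:
    request_id: str
    num_tokens: int  # prompt length in tokens (illustrative values only)


def batch_requests(pending: list[Request], max_batch_tokens: int) -> list[list[Request]]:
    """Greedily pack requests, in arrival order, into batches under a token budget.

    A single request larger than the budget is placed in a batch by itself.
    """
    batches: list[list[Request]] = []
    current: list[Request] = []
    current_tokens = 0
    for req in pending:
        # Close the current batch if adding this request would exceed the budget.
        if current and current_tokens + req.num_tokens > max_batch_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(req)
        current_tokens += req.num_tokens
    if current:
        batches.append(current)
    return batches


queue = [Request("a", 300), Request("b", 500), Request("c", 400), Request("d", 100)]
batches = batch_requests(queue, max_batch_tokens=1000)
batch_ids = [[r.request_id for r in batch] for batch in batches]
print(batch_ids)  # [['a', 'b'], ['c', 'd']]
```

Production runtimes typically use more elaborate policies, such as adding requests to a batch while it is already running, but the goal is the same: keep the silicon busy with as much useful work per pass as possible.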

Performance and cost claims

OpenAI says early testing shows performance per watt “substantially better” than current state-of-the-art hardware for its target inference workloads. OpenAI also notes these are its own numbers and have not been independently verified.

Broadcom chief executive Hock Tan told Reuters the chip delivers performance on par with Nvidia’s Blackwell processors and Google’s Tensor Processing Units. He claimed roughly 50% cost savings per inference token compared with current-generation graphics processing units (GPUs).

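             | Jalapeno                      | Nvidia GPU             | Google TPU
-------------|-------------------------------|------------------------|-----------------------
Type         | Custom ASIC                   | General-purpose GPU    | Custom ASIC
Primary use  | Inference                     | Training and inference | Training and inference
Owner        | OpenAI                        | Nvidia (sold to all)   | Google (internal)
Availability | OpenAI data centers           | Open market            | Google Cloud
Status       | Tape-out, deploying late 2026 | Shipping               | Shipping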

The token cost claim matters because inference is the steady operating expense behind any AI product. A lower cost per token improves the unit economics of serving millions of users.
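
The effect on unit economics is simple arithmetic, sketched below with made-up numbers. Only the roughly 50% savings figure comes from Tan's claim. The baseline cost per million tokens and the monthly token volume are hypothetical placeholders, not figures published by OpenAI, Broadcom, or Nvidia, and the result describes the operator's serving cost, not API prices.

```python
def monthly_inference_cost(tokens_per_month: float, cost_per_million_tokens: float) -> float:
    """Serving cost in dollars for a month of traffic at a given cost per million tokens."""
    return tokens_per_month / 1_000_000 * cost_per_million_tokens


# Hypothetical inputs, chosen only to make the arithmetic concrete.
BASELINE_COST_PER_MILLION = 2.00      # USD per million tokens on current GPUs (assumed)
TOKENS_PER_MONTH = 10_000_000_000     # 10 billion tokens served per month (assumed)

# The one figure taken from the article: roughly 50% savings per inference token.
CLAIMED_SAVINGS = 0.50

gpu_cost = monthly_inference_cost(TOKENS_PER_MONTH, BASELINE_COST_PER_MILLION)
asic_cost = monthly_inference_cost(
    TOKENS_PER_MONTH, BASELINE_COST_PER_MILLION * (1 - CLAIMED_SAVINGS)
)

print(f"GPU baseline:  ${gpu_cost:,.0f} per month")
print(f"With savings:  ${asic_cost:,.0f} per month")
print(f"Difference:    ${gpu_cost - asic_cost:,.0f} per month")
```

Because the saving is proportional, it scales linearly with traffic: doubling the token volume doubles the dollar difference, which is why a per-token claim matters most to operators serving very large volumes.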

Deployment timeline

Initial large-scale deployment is targeted for late 2026 at gigawatt scale, a measure of the electrical power the deployed systems will draw across data centers. OpenAI and Broadcom have described a commitment to deploy 10 gigawatts of OpenAI-designed accelerators through 2029.

Step 1, Design: OpenAI designs the ASIC for its own models.
Step 2, Tape-out: Finished design sent to manufacturing in about nine months.
Step 3, Deploy: Initial rollout targeted for late 2026 at gigawatt scale.
Step 4, Scale: Capacity grows toward 10 gigawatts through 2029.
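
For a rough sense of what these capacity figures imply, the sketch below converts gigawatts into annual energy and into a count of accelerators. The 1 and 10 gigawatt inputs come from the announcement. The per-accelerator power figure of 2,000 watts (chip plus its share of cooling, networking, and other overhead) is a hypothetical placeholder, not a published Jalapeno specification, and the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_energy_twh(capacity_gw: float, utilization: float = 1.0) -> float:
    """Energy in terawatt-hours drawn over one year at an average utilization (0 to 1)."""
    gigawatt_hours = capacity_gw * HOURS_PER_YEAR * utilization
    return gigawatt_hours / 1000  # convert GWh to TWh


def accelerators_supported(capacity_gw: float, watts_per_accelerator: float) -> int:
    """Whole number of accelerators that fit in a power budget at a given all-in draw."""
    return int(capacity_gw * 1e9 // watts_per_accelerator)


ASSUMED_WATTS_PER_ACCELERATOR = 2_000  # hypothetical all-in figure, not a Jalapeno spec

for capacity_gw in (1, 10):
    energy = annual_energy_twh(capacity_gw)
    count = accelerators_supported(capacity_gw, ASSUMED_WATTS_PER_ACCELERATOR)
    print(f"{capacity_gw} GW: {energy:.2f} TWh per year at full draw, about {count:,} accelerators")
```

The accelerator count moves inversely with the assumed power per unit, so it should be read as an order-of-magnitude illustration only.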

The buildout involves partners including Microsoft. Reporting indicates Microsoft has committed to a large share of the initial production run.

Why it matters

Designing its own inference silicon puts OpenAI alongside the other large model operators that already build custom chips. Google runs its TPUs. Amazon runs Trainium for training and Inferentia for inference. Each company built its own AI hardware to cut its dependence on Nvidia GPUs and to lower the cost of serving models.

For OpenAI, the move marks a shift toward a full-stack infrastructure company that owns the chip, the data center capacity, the model, and the product. It also reduces exposure to GPU supply and pricing. For developers building on OpenAI’s API, the relevant question is whether cheaper inference flows through to lower prices over time. The chip is one input into the broader AI factory model, where compute capacity is treated as the core production asset.
