The OpenAI API provides programmatic access to GPT-4, GPT-4o, GPT-3.5-turbo, DALL-E, Whisper, and embedding models. It is the most widely used LLM API and has become the de facto standard that other providers emulate. For enterprise AI projects, the OpenAI API is often the first integration point for proof-of-concept work, though production deployments may migrate to Azure OpenAI or Amazon Bedrock for compliance and enterprise support reasons.

Official documentation: https://platform.openai.com/docs

Core API Endpoints

Chat Completions - The primary endpoint for text generation. You send a list of messages (system, user, assistant roles) and receive a model response. This endpoint powers conversational AI, content generation, summarization, classification, and virtually every text-based AI task.

Embeddings - Converts text into numerical vectors for semantic search, clustering, and classification. The text-embedding-3-small and text-embedding-3-large models offer different cost-performance trade-offs. Embeddings are the foundation for RAG systems: embed your documents, store the vectors, and retrieve semantically similar content at query time.

Images (DALL-E) - Generates images from text descriptions. DALL-E 3 produces high-quality images and supports natural language refinement. Useful for creative content generation, product mockups, and marketing asset prototyping.

Audio (Whisper) - Speech-to-text transcription and translation. Whisper handles multiple languages with high accuracy and is available both through the API and as an open-source model for self-hosting.

Function Calling

Function calling enables the model to generate structured JSON that maps to functions you define. Instead of parsing free-text responses, you declare functions with typed parameters, and the model returns a function call when appropriate. This is the foundation for tool-using agents: the model decides which tool to use and provides the arguments.

Define functions in the API request, and the model either responds with text or requests a function call. Your application executes the function and returns the result to the model for further processing. This loop continues until the model has enough information to provide a final response.

Function calling is more reliable than prompt-based JSON extraction because the model is specifically trained to generate valid function call structures. It supports parallel function calls (requesting multiple tools simultaneously) and strict mode (guaranteeing JSON schema conformance).

Structured Outputs

The response_format parameter constrains model output to valid JSON matching a provided schema. This eliminates the parsing failures that plague free-text extraction. For enterprise applications that feed model outputs into downstream systems, structured outputs are essential for reliability.

Key Model Selection Considerations

GPT-4o - The most capable model for most tasks. Strong reasoning, instruction following, and multi-modal capabilities (accepts images and text). Use as the default for complex tasks.

GPT-4o-mini - A smaller, faster, cheaper model. Use for high-volume tasks where cost matters and the task complexity is moderate: classification, simple extraction, routine summarization.

GPT-3.5-turbo - The legacy cost-optimized model. Still useful for very simple tasks at high volume but increasingly displaced by GPT-4o-mini.

Rate Limits and Scaling

OpenAI enforces rate limits based on tokens per minute (TPM) and requests per minute (RPM). Limits vary by model and account tier. For production workloads, implement retry logic with exponential backoff and request rate limit increases through the OpenAI dashboard.

For high-throughput batch processing, the Batch API processes requests asynchronously at 50% lower cost. Submit a batch of requests, and results are available within 24 hours. This is ideal for document processing, classification, and extraction tasks that do not require real-time responses.

Enterprise Considerations

OpenAI’s standard API processes data on OpenAI infrastructure, which may not meet enterprise compliance requirements. Key considerations: data is not used for training by default on API requests (but verify the current data usage policy), SOC 2 Type II compliance is available, and data residency is limited to OpenAI’s infrastructure locations. For organizations requiring Azure-hosted infrastructure, data residency guarantees, and enterprise support, Azure OpenAI provides the same models with Microsoft’s enterprise compliance framework.

Pricing

OpenAI charges per token (input and output separately). GPT-4o is significantly more expensive than GPT-4o-mini, which in turn is more expensive than GPT-3.5-turbo. Embedding models charge per token. Estimate costs by measuring average token counts for your use case and multiplying by the per-token rate. For cost optimization, use the cheapest model that meets quality requirements and leverage caching for repeated queries.