LLM
Recent articles
Level 4: AI and Building
Production AI, vibe coding, and language models. How AI systems actually work in production, how to direct AI …

What is AI?
AI is software that learns patterns from data instead of following hand-written rules. Here is what that …

Tool Use (in Language Models)
The capability of a language model to invoke external tools (APIs, code execution, retrieval, computation) and …

Structured Output
Constraining a language model to emit output that conforms to a specified schema (JSON, regex, grammar). The …

Reasoning Models
Language models post-trained to allocate substantial inference-time compute to internal reasoning before …

Prompt Caching
Server-side caching of attention key/value tensors for repeated prompt prefixes, reducing latency and cost for …

Mixture of Experts (MoE)
A neural network architecture in which only a small subset of parameters is activated for each input, enabling …

LLM-as-a-Judge
Using a language model as an automated evaluator of another model's outputs: methodology, calibration with …

LLM Routing
Architectures that direct each request to one of several available language models based on cost, capability, …

Function Calling
Structured tool invocation by language models: how the model emits typed function calls, how runtimes execute …

Chain-of-Thought (CoT) Prompting
Eliciting intermediate reasoning steps from language models to improve performance on multi-step problems, …

Zero-Shot Learning
What zero-shot learning is, how models perform tasks without examples, and when zero-shot approaches are …

WebSocket
What WebSockets are, how they enable real-time bidirectional communication, and why they are used for …

vLLM - High-Performance LLM Serving Engine
vLLM is an open-source library for high-throughput, low-latency serving of large language models using …

Token Budget
The maximum number of tokens allocated for an LLM request or workflow, used to control costs, latency, and …

Testing LLM Applications
LLM-specific testing strategies: prompt template testing, structured output validation, guardrail …

Single Agent vs Multi-Agent Architectures
When to use a single AI agent versus a multi-agent system, covering complexity, reliability, cost, and …

Reducing LLM Inference Costs in Production
Practical strategies for reducing LLM API and hosting costs without sacrificing quality, from caching and …

Rate Limiting for LLM and AI Endpoints
How to implement rate limiting for AI API endpoints: token bucket and sliding window algorithms, per-user and …

RAG vs Long Context Windows for Knowledge Access
Comparing retrieval-augmented generation and long context windows as strategies for giving LLMs access to …

Prompt Injection Defense
Layered defense strategies against prompt injection attacks in production LLM applications: input validation, …

Prompt Injection
An attack technique where malicious input manipulates an LLM into ignoring its instructions, executing …

Prompt Chaining - Breaking Complex Tasks into Steps
How to design and implement prompt chains for complex AI tasks, covering chain architecture, error handling, …

PII Redaction Pipeline
Automated detection and removal of personally identifiable information from LLM inputs and outputs: detection …
78 articles in this section.