LLM

78 articles in this section.
Showing 24 of 78
Level 4: AI and Building
Production AI, vibe coding, and language models. How AI systems actually work in production, how to direct AI …

What is AI?
AI is software that learns patterns from data instead of following hand-written rules. Here is what that …

Tool Use (in Language Models)
The capability of a language model to invoke external tools (APIs, code execution, retrieval, computation) and …

Structured Output
Constraining a language model to emit output that conforms to a specified schema (JSON, regex, grammar). The …

Reasoning Models
Language models post-trained to allocate substantial inference-time compute to internal reasoning before …

Prompt Caching
Server-side caching of attention key/value tensors for repeated prompt prefixes, reducing latency and cost for …

Mixture of Experts (MoE)
A neural network architecture in which only a small subset of parameters is activated for each input, enabling …

LLM-as-a-Judge
Using a language model as an automated evaluator of another model's outputs: methodology, calibration with …

LLM Routing
Architectures that direct each request to one of several available language models based on cost, capability, …

Function Calling
Structured tool invocation by language models: how the model emits typed function calls, how runtimes execute …

Chain-of-Thought (CoT) Prompting
Eliciting intermediate reasoning steps from language models to improve performance on multi-step problems, …

Zero-Shot Learning
What zero-shot learning is, how models perform tasks without examples, and when zero-shot approaches are …

WebSocket
What WebSockets are, how they enable real-time bidirectional communication, and why they are used for …

vLLM - High-Performance LLM Serving Engine
vLLM is an open-source library for high-throughput, low-latency serving of large language models using …

Token Budget
The maximum number of tokens allocated for an LLM request or workflow, used to control costs, latency, and …

Testing LLM Applications
LLM-specific testing strategies: prompt template testing, structured output validation, guardrail …

Single Agent vs Multi-Agent Architectures
When to use a single AI agent versus a multi-agent system, covering complexity, reliability, cost, and …

Reducing LLM Inference Costs in Production
Practical strategies for reducing LLM API and hosting costs without sacrificing quality, from caching and …

Rate Limiting for LLM and AI Endpoints
How to implement rate limiting for AI API endpoints: token bucket and sliding window algorithms, per-user and …

RAG vs Long Context Windows for Knowledge Access
Comparing retrieval-augmented generation and long context windows as strategies for giving LLMs access to …

Prompt Injection Defense
Layered defense strategies against prompt injection attacks in production LLM applications: input validation, …

Prompt Injection
An attack technique where malicious input manipulates an LLM into ignoring its instructions, executing …

Prompt Chaining - Breaking Complex Tasks into Steps
How to design and implement prompt chains for complex AI tasks, covering chain architecture, error handling, …

PII Redaction Pipeline
Automated detection and removal of personally identifiable information from LLM inputs and outputs: detection …
