Amazon Athena - Serverless SQL Analytics
A comprehensive reference for Amazon Athena: serverless query engine for S3 data, integration with Glue Data Catalog, and analytics patterns …
A comprehensive reference for Amazon Connect: cloud contact center platform, AI integration with Lex and Bedrock, real-time analytics, and …
Amazon DynamoDB is a fully managed, serverless NoSQL database service that delivers single-digit millisecond performance at any scale for …
A comprehensive reference for Amazon EMR: managed Spark and Hadoop clusters, large-scale data processing, and feature engineering for …
A comprehensive reference for Amazon Forecast: managed time series prediction, predictor training, and integration patterns for demand …
A comprehensive reference for Amazon Fraud Detector: building fraud detection models, defining rules, and integrating real-time fraud …
A comprehensive reference for AWS Glue: serverless data integration, ETL jobs, data catalog, and data preparation for AI/ML pipelines.
A comprehensive reference for Amazon HealthLake: FHIR-compliant healthcare data storage, NLP enrichment, and analytics for health AI …
A comprehensive reference for Amazon Kendra: ML-powered enterprise search, document indexing, natural language queries, and integration …
A comprehensive reference for Amazon Lex: building chatbots and voice interfaces, intent recognition, slot filling, and integration with …
A comprehensive reference for Amazon Lookout for Metrics: automated anomaly detection in business and operational metrics, alerting, and …
A comprehensive reference for Amazon Lookout for Vision: automated visual inspection, defect detection, and deployment patterns for …
A comprehensive reference for Amazon Managed Grafana: managed visualization service, data source integration, and dashboard patterns for …
A comprehensive reference for Amazon MSK: managed Kafka clusters, event streaming patterns, and integration with AI/ML data pipelines.
Amazon MWAA is a fully managed service that runs Apache Airflow on AWS, providing workflow orchestration for data pipelines, ETL jobs, and …
A comprehensive reference for Amazon Neptune: graph data modeling, knowledge graphs, fraud detection patterns, and integration with AI/ML …
A comprehensive reference for Amazon Personalize: building recommendation engines, real-time personalization, and campaign management for …
A comprehensive reference for Amazon Pinpoint: multi-channel messaging, audience segmentation, campaign analytics, and ML-powered engagement …
A comprehensive reference for Amazon QuickSight: managed BI dashboards, ML-powered insights, natural language queries, and embedded …
A comprehensive reference for Amazon Redshift: columnar data warehousing, ML integration, and analytics patterns for AI-driven enterprise …
A comprehensive reference for Amazon Timestream: purpose-built time series storage, query patterns, and integration with IoT and operational …
Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring data workflows and ETL pipelines.
Apache Flink is a distributed stream processing framework for stateful computations over unbounded and bounded data streams.
Apache Hadoop is an open-source framework for distributed storage and processing of large data sets across clusters of commodity hardware.
Apache Hive is a data warehouse infrastructure built on top of Apache Hadoop that provides SQL-like querying capabilities for large-scale …
Apache Kafka is a distributed event streaming platform used for high-throughput, fault-tolerant real-time data pipelines and streaming …
Apache Spark is a multi-language engine for large-scale data processing, machine learning, and streaming analytics.
Apache Superset is a modern, open-source data exploration and visualization platform designed for interactive analytics and dashboard …
A comprehensive reference for AutoGen: Microsoft's framework for multi-agent AI systems, conversational patterns, code execution, and …
Google AutoML enables users to train custom ML models for vision, language, tabular data, and video with minimal machine learning expertise …
AWS Fargate is a serverless compute engine for containers that eliminates the need to manage underlying EC2 instances when running …
A comprehensive reference for AWS IoT Core: device connectivity, message routing, rules engine, and integration patterns for IoT-driven AI …
AWS WAF is a web application firewall that protects web applications and APIs from common exploits, bot traffic, and malicious requests at …
Azure Active Directory B2C is a customer identity management service that provides authentication, authorization, and user profile …
Azure AI Document Intelligence (formerly Form Recognizer) extracts text, key-value pairs, tables, and structured data from documents using …
Azure AI Search is a fully managed search service that provides keyword, vector, and hybrid search capabilities for building intelligent …
Azure AI Services (formerly Cognitive Services) provides pre-built AI models accessible via REST APIs for vision, language, speech, and …
Azure Anomaly Detector is an AI service that identifies anomalies in time series data using machine learning models that automatically adapt …
Azure Blob Storage provides massively scalable object storage for unstructured data, serving as the primary data layer for AI and machine …
Azure Bot Service provides a managed environment for building, deploying, and managing intelligent conversational bots across multiple …
Azure Communication Services provides APIs and SDKs for adding voice calling, video calling, SMS, email, and chat capabilities to …
Azure Computer Vision is an AI service that analyzes images and videos to extract visual features, detect objects, read text, and generate …
Azure Cosmos DB is a fully managed, globally distributed NoSQL and relational database service designed for low-latency, high-throughput …
Azure Custom Vision is an AI service for building custom image classification and object detection models with minimal training data and no …
Azure Data Explorer is a fast, fully managed data analytics service optimized for real-time analysis of large volumes of streaming and time …
Azure Data Factory is a managed cloud ETL service for building data integration pipelines that move and transform data at scale across cloud …
Azure Event Grid is a fully managed event routing service that enables event-driven architectures with publish-subscribe messaging across …
Azure Event Hubs is a fully managed real-time data streaming platform capable of ingesting millions of events per second for big data and AI …
Azure Functions is Microsoft's serverless compute platform that executes event-driven code without managing infrastructure, commonly used to …
Azure HDInsight is a managed cloud service for running open-source big data frameworks including Apache Spark, Hadoop, Hive, HBase, and …
Azure Health Data Services is a managed platform for ingesting, persisting, and connecting healthcare data using industry standards like …
Azure IoT Hub is a managed service that enables reliable, secure bidirectional communication between IoT devices and cloud-based AI and …
Azure Logic Apps is a cloud-based platform for creating and running automated workflows that integrate apps, data, services, and systems …
Azure Machine Learning is Microsoft's fully managed platform for building, training, deploying, and managing machine learning models at …
Azure Managed Grafana is a fully managed Grafana instance that provides rich data visualization and monitoring dashboards natively …
Azure Media Services is a cloud-based platform for encoding, packaging, protecting, and streaming video and audio content at scale.
Azure Monitor is Microsoft's comprehensive observability platform that collects, analyzes, and acts on telemetry from cloud and on-premises …
A comprehensive reference for Azure OpenAI Service: enterprise-grade GPT access, content filtering, data residency, and integration with the …
Azure Personalizer is a reinforcement learning service that selects the best content, layout, or action for individual users based on …
Azure Speech Services provides cloud-based APIs for speech recognition, speech synthesis, real-time translation, and speaker identification …
Azure Static Web Apps is a service that automatically builds and deploys full-stack web applications from a code repository with integrated …
Azure Synapse Analytics is an integrated analytics platform that combines enterprise data warehousing, big data processing, and data …
Azure Translator is a cloud-based neural machine translation service that translates text and documents across more than 100 languages in …
Google BigQuery is a serverless, highly scalable data warehouse that supports SQL analytics, ML model training, and real-time streaming …
A comprehensive reference for Chroma: the open-source embedding database for AI applications, local development, and lightweight production …
ClickHouse is an open-source columnar database management system optimized for real-time analytical queries on large datasets.
Google Cloud Armor provides web application firewall (WAF), DDoS protection, and adaptive security policies for applications deployed on …
Google Cloud Bigtable is a fully managed, scalable NoSQL wide-column database designed for low-latency, high-throughput workloads including …
Google Cloud Composer is a fully managed workflow orchestration service built on Apache Airflow for authoring, scheduling, and monitoring …
Google Cloud Dataflow is a fully managed service for executing Apache Beam pipelines for both stream and batch data processing at scale.
Google Cloud Dataproc is a fully managed service for running Apache Spark, Hadoop, Flink, and Presto clusters for big data processing and ML …
Google Cloud Firestore is a serverless, scalable NoSQL document database with real-time synchronization, offline support, and strong …
Google Cloud Healthcare API provides managed storage and access for healthcare data in FHIR, HL7v2, and DICOM formats with ML-ready data …
Google Cloud IoT Core was a managed service for connecting, managing, and ingesting data from IoT devices; it was retired in August 2023.
Google Cloud Monitoring provides metrics collection, dashboards, alerting, and uptime checks for GCP resources, applications, and AI/ML …
Google Cloud Natural Language API provides pre-trained models for sentiment analysis, entity recognition, syntax analysis, and content …
Google Cloud Pub/Sub is a fully managed real-time messaging service for asynchronous event-driven architectures, data streaming, and service …
Google Cloud Run is a fully managed serverless platform for running containerized applications that scale automatically from zero to …
Google Cloud Spanner is a fully managed, globally distributed relational database that combines the consistency of traditional databases …
Google Cloud Speech-to-Text converts audio to text using deep learning, while Text-to-Speech synthesizes natural-sounding speech from text …
Google Cloud Translation API provides neural machine translation between over 130 languages with support for custom glossaries and model …
Google Cloud Vision AI provides pre-trained models for image labeling, object detection, OCR, face detection, and explicit content …
Google Cloud Workflows is a serverless orchestration service that sequences HTTP-based API calls, Cloud Functions, and GCP services into …
Databricks is a unified analytics platform built on Apache Spark that combines data engineering, data science, and machine learning on a …
dbt (data build tool) is an open-source transformation framework that enables analytics engineers to transform data in warehouses using SQL …
Google Dialogflow is a natural language understanding platform for building chatbots, voice bots, and conversational interfaces powered by …
A comprehensive reference for DSPy: declarative language model programming, automatic prompt optimization, and systematic LLM pipeline …
DuckDB is an in-process analytical database management system designed for fast OLAP queries on local data without requiring a separate …
Eclipse Mosquitto is an open-source lightweight MQTT message broker for implementing publish/subscribe messaging in IoT and M2M …
A comprehensive reference for Elasticsearch: full-text search, vector search, hybrid retrieval, and integration patterns for AI …
Google Firebase is a comprehensive application development platform providing authentication, real-time databases, hosting, analytics, and …
Google Cloud Functions is a lightweight serverless compute platform for building event-driven microservices and AI pipeline glue logic on …
Google Cloud Storage is a unified object storage service for storing and accessing data across analytics, AI/ML, and application workloads.
Google Document AI extracts structured data from documents using pre-trained and custom ML models for forms, invoices, receipts, and other …
A comprehensive reference for Google Vertex AI: Gemini models, AutoML, model training, and enterprise ML workflows on Google Cloud Platform.
Grafana is an open-source analytics and interactive visualization platform for monitoring data from Prometheus, Elasticsearch, InfluxDB, and …
Great Expectations is an open-source Python library for validating, documenting, and profiling data to ensure data quality in pipelines.
A comprehensive reference for Guardrails AI: validating and structuring LLM outputs, the Guardrails Hub, and integration patterns for …
A comprehensive reference for Hugging Face: the model hub, Transformers library, datasets, and deployment options for open-source AI models.
Hugging Face Transformers is an open-source library providing thousands of pretrained models for NLP, computer vision, audio, and multimodal …
InfluxDB is an open-source time series database designed for high-write-throughput storage and real-time querying of timestamped data from …
A comprehensive reference for Instructor: extracting structured, validated data from LLM responses using Pydantic models, retry logic, and …
Keycloak is an open-source identity and access management solution providing single sign-on, user federation, and identity brokering for …
Knative is an open-source platform that extends Kubernetes to provide serverless workload management with automatic scaling to zero and …
Kubeflow is an open-source machine learning platform that makes deploying, scaling, and managing ML workflows on Kubernetes simple and …
A comprehensive reference for LangChain: building LLM-powered applications, chains, retrievers, agents, and integration patterns for …
Google Looker is a business intelligence and data analytics platform that uses a semantic modeling layer (LookML) to deliver consistent, …
Google Media CDN is a high-performance content delivery network optimized for streaming video, large file delivery, and media-rich …
Metabase is an open-source business intelligence tool that enables non-technical users to ask questions about data and visualize results …
MinIO is a high-performance, S3-compatible object storage system designed for large-scale AI and data infrastructure workloads.
A comprehensive reference for MLflow: experiment tracking, model registry, deployment, and lifecycle management for enterprise ML and AI …
A comprehensive reference for NVIDIA NeMo Guardrails: programmable safety rails for LLM conversations, Colang, topic control, and enterprise …
Neo4j is an open-source native graph database that stores and queries data as nodes and relationships, optimized for connected data …
Novu is an open-source notification infrastructure platform for managing multi-channel notifications across email, SMS, push, in-app, and …
Ollama is an open-source tool for running large language models locally on personal hardware with a simple command-line interface.
A comprehensive reference for the OpenAI API: GPT models, embeddings, function calling, and integration patterns for enterprise AI …
Whisper is an open-source automatic speech recognition model by OpenAI that provides robust, multilingual speech-to-text transcription.
OpenFaaS is an open-source framework for building and deploying serverless functions and microservices on Kubernetes and Docker Swarm.
OpenTelemetry is a vendor-neutral open-source observability framework for generating, collecting, and exporting telemetry data (traces, …
A comprehensive reference for pgvector: adding vector similarity search to PostgreSQL, indexing strategies, and patterns for combining …
A comprehensive reference for Pinecone: managed vector storage, similarity search, namespace management, and RAG integration patterns.
Power BI is Microsoft's business intelligence platform that transforms data into interactive visualizations and reports, integrating with …
Prefect is an open-source workflow orchestration framework that makes it easy to build, observe, and react to data pipelines using Python.
Prometheus is an open-source systems monitoring and alerting toolkit designed for reliability, featuring a dimensional data model and …
A comprehensive reference for Qdrant: vector similarity search, payload filtering, collection management, and deployment patterns for …
Rasa is an open-source framework for building contextual AI assistants and chatbots with natural language understanding and dialogue …
A comprehensive reference for Ray: distributed Python computing, Ray Train for ML training, Ray Serve for inference, and scaling AI …
Google Recommendations AI delivers personalized product recommendations for retail and media using Google's deep learning models trained on …
A comprehensive reference for Semantic Kernel: Microsoft's SDK for integrating LLMs into applications, plugin architecture, planners, and …
spaCy is an open-source library for advanced natural language processing in Python, designed for production use with fast, accurate NLP …
Supabase is an open-source backend-as-a-service platform providing a PostgreSQL database, authentication, real-time subscriptions, storage, …
Temporal is an open-source durable execution platform for building reliable, long-running workflows and distributed applications.
Tesseract is an open-source optical character recognition engine that extracts text from images and scanned documents in over 100 languages.
TimescaleDB is an open-source time-series database built as a PostgreSQL extension, optimized for fast ingest and complex queries on …
vLLM is an open-source library for high-throughput, low-latency serving of large language models using PagedAttention memory management.
A comprehensive reference for Weaviate: open-source vector search, hybrid retrieval, generative search modules, and self-hosted deployment …
A comprehensive reference for Weights & Biases: experiment tracking, hyperparameter sweeps, model evaluation, and team collaboration for ML …
How Amazon Bedrock AgentCore provides managed infrastructure for running AI agents at scale without managing servers.
Using Amazon CloudWatch for AI workloads: custom metrics for LLM cost and token usage, alarms for model quality, log insights for inference …
Using Amazon EventBridge to connect AWS AI services, trigger pipelines from S3 events, and build loosely coupled multi-step workflows.
Using Amazon OpenSearch Service for vector search, full-text search, and log analytics in AI-powered applications.
Using Amazon Polly to generate natural-sounding speech from text in AI applications, with SSML control and neural voice options.
How Amazon S3 functions as the storage backbone for AI data pipelines: ingest, staging, output, and lifecycle management.
Using Amazon Translate for real-time and batch document translation in multilingual AI applications.
Using AWS Elemental MediaConvert for transcoding, format conversion, and video processing in AI media pipelines.
GitHub Actions workflow syntax, Hugo deployment pattern, Python testing pipelines, Docker builds, Terraform plan/apply, and model evaluation …
What the Model Context Protocol is, how it enables AI agents to use tools through a standard interface, and server/client architecture.
Using Pydantic AI to build AI agents with validated inputs and outputs, Bedrock backend support, and Python type annotations.
What Strands Agents is, how it differs from CrewAI and LangGraph, and when to use it for AWS-hosted agent applications.
A comprehensive reference for Amazon Bedrock: available models, key features, use cases, and pricing patterns for enterprise teams.
Amazon Cognito User Pools and Identity Pools: JWT token structure and expiry, MFA options, SAML/OIDC federation, Lambda triggers, rate …
Sentiment analysis, entity extraction, topic modeling, and language detection with Amazon Comprehend. When to use Comprehend vs Bedrock for …
What Rekognition does, which features work well in enterprise applications, accuracy considerations, pricing, and common integration …
What SageMaker is, when to use it instead of Bedrock, key capabilities, pricing model, and the workflows that suit it best.
A reference guide to Amazon Textract: OCR capabilities, table and form extraction, query-based extraction, and integration patterns for …
Amazon Transcribe capabilities, accuracy characteristics, pricing, and the integration patterns that work well for enterprise transcription …
Using AWS Amplify to deploy front-end applications, host static sites, and connect to AWS AI backends.
Serverless inference, event-driven processing, and integration patterns with Bedrock, SageMaker, and Step Functions. Cost optimization for …
How Step Functions orchestrates multi-step AI workflows, handles retries and errors, and integrates with other AWS services - with practical …
What makes Claude useful for enterprise applications, model tiers, key strengths, access options including through Amazon Bedrock, and …
What CrewAI is, how it models multi-agent systems as crews with roles and tasks, integration with LLM backends, and when to use it versus …
Using FFmpeg in AWS Lambda layers and EC2 for video processing in AI pipelines, including common operations and integration with Rekognition …
Using Hugo to build fast, maintainable documentation sites and AI solution landing pages, with GitHub Pages and Amplify deployment.
Using Langfuse to trace LLM calls, evaluate outputs, and monitor AI application quality in production.
How LangGraph models AI agent workflows as stateful graphs, enabling cyclic execution, human-in-the-loop, and complex multi-step agent …
Using LlamaIndex for retrieval-augmented generation, data connectors, and agent workflows, with Bedrock and OpenSearch integration.
Using Remotion to generate videos programmatically from React components, with Lambda rendering for scalable AI-driven video production.
Using Terraform to provision and manage AWS infrastructure for AI projects: modular design, state management, and multi-environment …
Notion API for structured data, MCP integration, and using Notion databases as knowledge stores for AI agents. When it works and when to …