Data-Engineering

25 articles. Use search to find specific topics.
Showing 24 of 25
Stream Processing
What stream processing is, how Flink, Spark Streaming, and Kafka Streams enable real-time data transformation, …

Real-Time Data Pipelines for AI Workloads
Implementation guide for real-time streaming data pipelines: four-layer architecture, Flink feature …

Medallion Architecture - Bronze, Silver, Gold Data Quality Layers
How the medallion architecture organizes data lakehouses into progressive quality layers to support analytics …

Implementing a Data Catalog for AI Teams
How to implement metadata management with DataHub or OpenMetadata: automated ingestion, data lineage, …

Feature Stores for Machine Learning - A Practical Guide
What feature stores are, why they matter, how to choose one, and practical implementation guidance for ML …

Feature Store
What a feature store is, how it serves as a centralized repository for ML features, and why it solves the …

ETL - Extract, Transform, Load
What ETL is, how it powers data pipelines, and how it compares to ELT for modern data architectures.

ELT - Extract, Load, Transform
What ELT is, how it differs from ETL, and why modern data architectures favor loading raw data before …

Designing a Data Lakehouse for AI/ML Workloads
A practical guide to designing and implementing a data lakehouse architecture optimized for AI and machine …

Delta Lake vs Apache Iceberg for Lakehouse Architecture
Comparing Delta Lake and Apache Iceberg as open table formats for lakehouse architectures supporting AI/ML …

dbt vs AWS Glue for AI Data Transformation
Comparing dbt and AWS Glue for data transformation in AI pipelines, covering capabilities, developer …

Databricks - Unified Analytics and AI Platform
Databricks is a unified analytics platform built on Apache Spark that combines data engineering, data science, …

Data Quality Validation for AI Systems
How to implement data quality validation for AI workloads using Great Expectations and Deequ: profiling, …

Data Quality
What data quality means for AI systems, the dimensions of data quality, and how validation, profiling, and …

Data Lake
What a data lake is, how it stores raw data at scale, and when to use a data lake versus a data warehouse.

Data Contract Pattern for AI Systems
Implementing schema contracts between data producers and AI consumers: contract specification, validation …

Data Contract
What data contracts are, how schema-first agreements between data producers and consumers prevent breaking …

Data Catalog
What a data catalog is, how metadata management and data discovery tools help AI teams find, understand, and …

Change Data Capture
What change data capture (CDC) is, how Debezium and AWS DMS enable real-time data replication, and why CDC …

Apache Kafka
What Kafka is, how it provides distributed event streaming, and when to choose Kafka for AI data pipelines.

Amazon Kinesis
What Amazon Kinesis is, how it processes streaming data in real time, and when to use Kinesis versus other …

AI Data Cleaning and Normalization
AI detects and fixes data quality issues - inconsistent formats, duplicates, missing values, and outliers - …

Amazon OpenSearch Service - Search and Analytics for AI
Using Amazon OpenSearch Service for vector search, full-text search, and log analytics in AI-powered …

Data Preparation for AI Projects - A Practical Guide
How to prepare data for AI projects: assessing what you have, cleaning and normalizing it, building evaluation …
