AI Data Cleaning and Normalization
AI detects and fixes data quality issues - inconsistent formats, duplicates, missing values, and outliers - across datasets of any size.
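The four issue types named above can be sketched in a few lines of pandas. This is a minimal illustration on a hypothetical toy DataFrame, not a production pipeline: the column names, imputation strategy (mode fill), and the 1.5×IQR outlier rule are all assumptions chosen for the example.

```python
import pandas as pd

# Toy dataset exhibiting all four issues: inconsistent date formats,
# a duplicate row, a missing value, and an outlier age.
df = pd.DataFrame({
    "signup_date": ["2024-01-05", "01/06/2024", "2024-01-05",
                    "2024-01-07", None, "2024-01-09"],
    "age": [34, 29, 34, 33, 28, 430],  # 430 is the outlier
})

# 1. Inconsistent formats: parse mixed date strings into one dtype.
df["signup_date"] = pd.to_datetime(df["signup_date"], format="mixed")

# 2. Duplicates: drop exact duplicate rows.
df = df.drop_duplicates()

# 3. Missing values: impute with the column mode (one common strategy).
df["signup_date"] = df["signup_date"].fillna(df["signup_date"].mode()[0])

# 4. Outliers: flag with the 1.5x IQR rule instead of silently dropping.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
df["age_outlier"] = (df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)

print(df)
```

Flagging outliers in a separate column, rather than deleting rows, keeps the decision reversible and auditable downstream.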
What Amazon Kinesis is, how it processes streaming data in real time, and when to use Kinesis versus other streaming options.
What Kafka is, how it provides distributed event streaming, and when to choose Kafka for AI data pipelines.
What change data capture (CDC) is, how Debezium and AWS DMS enable real-time data replication, and why CDC matters for keeping AI feature …
What a data catalog is, how metadata management and data discovery tools help AI teams find, understand, and trust their data assets.
What data contracts are, how schema-first agreements between data producers and consumers prevent breaking changes, and why AI systems need …
Implementing schema contracts between data producers and AI consumers: contract specification, validation enforcement, versioning, and …
What a data lake is, how it stores raw data at scale, and when to use a data lake versus a data warehouse.
What data quality means for AI systems, the dimensions of data quality, and how validation, profiling, and monitoring prevent …
How to implement data quality validation for AI workloads using Great Expectations and Deequ: profiling, expectation suites, pipeline …
Databricks is a unified analytics platform built on Apache Spark that combines data engineering, data science, and machine learning on a …
Comparing dbt and AWS Glue for data transformation in AI pipelines, covering capabilities, developer experience, cost, and use case fit.
Comparing Delta Lake and Apache Iceberg as open table formats for lakehouse architectures supporting AI/ML workloads.
A practical guide to designing and implementing a data lakehouse architecture optimized for AI and machine learning workloads.
What ELT is, how it differs from ETL, and why modern data architectures favor loading raw data before transforming.
What ETL is, how it powers data pipelines, and how it compares to ELT for modern data architectures.
What a feature store is, how it serves as a centralized repository for ML features, and why it solves the training-serving skew problem.
What feature stores are, why they matter, how to choose one, and practical implementation guidance for ML feature management.
How to implement metadata management with DataHub or OpenMetadata: automated ingestion, data lineage, ownership, classification, and …
How the medallion architecture organizes data lakehouses into progressive quality layers to support analytics and AI workloads with …
Implementation guide for real-time streaming data pipelines: four-layer architecture, Flink feature computation, late-arriving data handling …
What stream processing is, how Flink, Spark Streaming, and Kafka Streams enable real-time data transformation, and why streaming matters for …
Using Amazon OpenSearch Service for vector search, full-text search, and log analytics in AI-powered applications.
Practical patterns for building reliable data pipelines that feed AI and ML systems - ingestion, transformation, feature engineering, and …
How to prepare data for AI projects: assessing what you have, cleaning and normalizing it, building evaluation datasets, and setting up …