Data Engineering articles

7/18/2025 • EN

Interesting links - July 2025

A monthly roundup of data engineering links covering Apache Iceberg, Kafka, Debezium, Spark, and lakehouse architecture.

Apache Iceberg Data Engineering Data Lakehouse Flink SQL Streaming Data

Robin Moffatt

7/15/2025 • EN

The Cost of Neglect — How Apache Iceberg Tables Degrade Without Optimization

Explains how Apache Iceberg tables degrade without optimization, covering small files, fragmented manifests, and performance impacts.

Apache Iceberg Data Engineering Data Lakehouse Metadata Management Table Optimization

Alex Merced

7/14/2025 • EN

Keeping your Data Lakehouse in Order: Table Maintenance in Apache Iceberg

Explains the importance of table maintenance in Apache Iceberg for data lakehouses, covering metadata and file management.

Apache Iceberg Data Engineering Data Lakehouse Metadata Management Table Maintenance

Robin Moffatt

6/2/2025 • EN

Digging into Ducklake

An analysis of DuckLake, a new open table format and catalog specification for data engineering, comparing it to existing solutions like Iceberg and Delta Lake.

Data Engineering DuckDB DuckLake Open Table Format Parquet

Robin Moffatt

5/23/2025 • EN

Interesting links - May 2025

A monthly roundup of curated links and articles covering data engineering, Kafka, stream processing, and AI, with top picks highlighted.

Apache Iceberg Data Engineering Data Modeling Kafka Snowflake

Robin Moffatt

5/2/2025 • EN

Introduction to Data Engineering Concepts | Building Scalable Pipelines

Explores core principles of scalable data engineering, including parallelism, minimizing data movement, and designing adaptable pipelines for growing data volumes.

Apache Iceberg Data Architecture Data Engineering Parallelism Scalable Pipelines

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | Scheduling and Workflow Orchestration

Explores workflow orchestration in data engineering, covering DAGs, tools, and best practices for managing complex data pipelines.

Data Engineering Directed Acyclic Graphs ETL Scheduling Workflow Orchestration

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | Storage Formats and Compression

Explains the importance of data storage formats and compression for performance and cost in large-scale data engineering systems.

Apache Iceberg Columnar Storage Compression Data Engineering Storage Formats

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | DevOps for Data Engineering

Explores how DevOps principles like CI/CD, infrastructure as code, and monitoring are applied to data engineering for reliable, scalable data pipelines.

Data Engineering Data Pipelines DevOps Infrastructure As Code Version Control

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | Cloud Data Platforms and the Modern Stack

Explores the modern data stack, cloud platforms, and principles for building flexible, cloud-native data engineering architectures.

Cloud Platforms Data Architecture Data Engineering Managed Services Modern Data Stack

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | ETL vs ELT – Understanding Data Pipelines

Explains core data engineering concepts, comparing ETL and ELT data pipeline strategies and their use cases.

Data Engineering Data Pipelines Data Transformation ELT ETL

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | Streaming Data Fundamentals

Explains streaming data fundamentals, how streaming systems work, their use cases, and challenges compared to batch processing.

Batch Processing Data Engineering Data Pipelines Real Time Processing Streaming Data

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | Batch Processing Fundamentals

Explains batch processing fundamentals for data engineering, covering concepts, tools, and its ongoing relevance in data workflows.

Apache Iceberg Batch Processing Data Engineering Data Pipelines Data Workflows

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | Data Modeling Basics

An introduction to data modeling concepts, covering OLTP vs OLAP systems, normalization, and common schema designs for data engineering.

Data Engineering Data Modeling Database Design OLAP OLTP

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | Understanding Data Sources and Ingestion

An introduction to data engineering concepts, focusing on data sources and ingestion strategies like batch vs. streaming.

Batch Processing Data Engineering Data Ingestion Data Sources Streaming

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | What is Data Engineering?

An introductory guide to data engineering, explaining its role, key concepts, and how it differs from data science in the modern data ecosystem.

Apache Iceberg Data Engineering Data Infrastructure Data Pipelines Data Warehouse

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | Data Lakes Explained

Explains data lakes, their key characteristics, and how they differ from data warehouses in modern data architecture.

Apache Iceberg Cloud Storage Data Architecture Data Engineering Data Lakes

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | Metadata, Lineage, and Governance

Explains core data engineering concepts: metadata, data lineage, and governance, and their importance for scalable, compliant data systems.

Apache Iceberg Data Engineering Data Governance Data Lineage Metadata

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | Data Quality and Validation

Explores the importance of data quality and validation in data engineering, covering key dimensions and tools for reliable pipelines.

Apache Iceberg Data Engineering Data Pipelines Data Quality Data Validation

Alex Merced

5/2/2025 • EN

Introduction to Data Engineering Concepts | Data Warehousing Fundamentals

An introduction to data warehousing concepts, covering architecture, components, and performance optimization for analytical workloads.

Apache Iceberg Data Architecture Data Engineering Data Warehousing Performance Optimization

Alex Merced
