Introduction to Data Engineering Concepts | Data Lakehouse Architecture Explained
Explains the data lakehouse architecture, a unified approach combining data lake scalability with warehouse management features like ACID transactions.
A monthly roundup of curated links and articles on data engineering, Kafka, CDC, stream processing, and AI/ML topics.
A guide to building a data pipeline using DuckDB, covering data ingestion, transformation, and analytics with real-world environmental data.
A monthly roundup of interesting links and articles about data engineering, databases, streaming tech, and data infrastructure.
A comprehensive 2025 guide to Apache Iceberg, covering its architecture, ecosystem, and practical use for data lakehouse management.
Argues that RAG system failures stem from data engineering issues like fragmented data and governance, not from model or vector database choices.
Overview of Overture Maps Foundation's updated global, open geospatial datasets, their partners, and data refresh strategy.
Monthly roundup of news and resources in data streaming, stream processing, and the Apache Kafka ecosystem, curated by industry experts.
An overview of Apache Flink CDC, its declarative pipeline feature, and how it simplifies data integration from databases like MySQL to sinks like Elasticsearch.
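The declarative pipeline feature mentioned here is driven by a YAML definition rather than code. A rough sketch of the shape such a definition takes (hostnames, credentials, and table patterns below are placeholders; consult the Flink CDC docs for the exact options each connector supports):

```yaml
source:
  type: mysql
  hostname: localhost
  port: 3306
  username: app_user
  password: "********"
  tables: app_db.\.*

sink:
  type: elasticsearch
  hosts: http://localhost:9200

pipeline:
  name: MySQL to Elasticsearch sync
  parallelism: 2
```

The pipeline runner turns this definition into a Flink job that snapshots the matched tables and then streams their change events to the sink.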
A profile of a Senior Analytics Engineer specializing in dbt, data mesh architecture, and applying library science principles to modern data teams.
Monthly roundup of news and developments in data streaming, stream processing, and the data ecosystem, featuring Apache Flink, Kafka, and open-source tools.
Explains how Parquet handles schema evolution, including adding/removing columns and changing data types, for data engineers.
An introduction to Apache Parquet, a columnar storage file format for efficient data processing and analytics.
Explains the hierarchical structure of Parquet files, detailing how pages, row groups, and columns optimize storage and query performance.
A practical guide to reading and writing Parquet files in Python using PyArrow and FastParquet libraries.
Explores using GitHub Actions for software development CI/CD and advanced data engineering tasks like ETL pipelines and data orchestration.
A former Debezium lead argues that Change Data Capture (CDC) is a feature within larger data platforms, not a standalone product.
Explores the core reasons for using Change Data Capture (CDC) to extract data from operational databases for analytics and other applications.
A comprehensive directory of Apache Iceberg resources, including tutorials, guides, and educational materials for data engineers and developers.
A technical guide on configuring Apache Flink to write data to Delta Lake tables stored on S3, including required JARs and configuration steps.