Data Engineering articles

8/26/2024 • EN

Data Engineering Duke Fall 2023-2024

Overview of a university-level Data Engineering course syllabus covering tools, pipelines, AI pair programming, and project-based learning for Fall 2024.

AI Pair Programming Cloud Platforms Data Engineering Data Pipelines Syllabus

Noah Gift

7/20/2024 • EN

Upcoming Data Talks from Alex Merced (And how to follow)

A list of upcoming tech talks and events by Alex Merced, focusing on Apache Iceberg, data lakehouses, and data engineering topics.

Apache Iceberg Data Engineering Data Lakehouse Dremio Table Format

Alex Merced

6/26/2024 • EN

Video Course - Basics of Lakehouse Engineering - Apache Iceberg, Nessie, Dremio

A video course covering the fundamentals of lakehouse engineering using Apache Iceberg, Nessie, and Dremio for data management.

Apache Iceberg Data Engineering Dremio Lakehouse Nessie

Alex Merced

5/22/2024 • EN

How I Try To Keep Up With The Data Tech World (A List of Data Blogs)

A data professional shares their curated list of data tech blogs and explains their return to using RSS feeds to stay current in the field.

Data Blogs Data Engineering Data Technology RSS Stream Processing

Robin Moffatt

5/15/2024 • EN

3 Reasons Data Engineers Should Embrace Apache Iceberg

Explains three key Apache Iceberg features for data engineers: hidden partitioning, partition evolution, and tool compatibility.

Apache Iceberg Data Engineering Data Lake Partitioning Table Format

Alex Merced

4/8/2024 • EN

Reflecting on my tenure at the City of Boston

A data engineer reflects on their 2-year career journey at the City of Boston, sharing lessons learned in data warehousing, ETL, and civic tech.

Analytics Civic Tech Data Engineering ETL Pipelines

Jenna Jordan

4/4/2024 • EN

Understanding the Future of Apache Iceberg Catalogs

Explores the evolution of Apache Iceberg catalogs, focusing on the current REST Catalog and future proposals for server-side optimizations.

Apache Iceberg Catalog Data Engineering Data Lakehouse REST API

Alex Merced

4/4/2024 • EN

A Deep Intro to Apache Iceberg and Resources for Learning More

An introduction to Apache Iceberg, a table format for data lakehouses, explaining its architecture and providing learning resources.

Apache Iceberg Big Data Data Engineering Data Lakehouse Table Format

Alex Merced

4/1/2024 • EN

End-to-End Basic Data Engineering Tutorial (Spark, Dremio, Superset)

A hands-on tutorial on building a data lakehouse pipeline using Spark, Dremio, and Superset to move and analyze data.

Apache Superset Data Engineering Data Lakehouse Dremio Spark

Alex Merced

3/12/2024 • EN

Exploring the Flink SQL Gateway REST API

A guide to using Apache Flink's SQL Gateway REST API for submitting and managing SQL jobs, including practical examples with Postman and HTTPie.

Apache Flink Data Engineering REST API SQL Gateway Stream Processing

Robin Moffatt

2/22/2024 • EN

Checkpoint Chronicle - February 2024

Monthly roundup of articles and resources on data streaming, covering Flink, Kafka, Debezium, and streaming SQL developments.

Apache Kafka Data Engineering Event Streaming Flink Stream Processing

Robin Moffatt

2/16/2024 • EN

Catalogs in Flink SQL—A Primer

Explains the role and types of catalogs in Apache Flink SQL, comparing them to traditional RDBMS systems and highlighting their importance in data management.

Apache Flink Catalog Data Engineering Flink SQL SQL DDL

Robin Moffatt

2/13/2024 • EN

Datacast Episode 132: Big Data Engineering, Data Culture from First Principles, and Reimagined Metadata with Suresh Srinivas

Interview with Suresh Srinivas on his career in big data, founding Hortonworks, scaling Uber's data platform, and leading the OpenMetadata project.

Apache Hadoop Big Data Data Engineering Metadata OpenMetadata

James Le

1/19/2024 • EN

Open Lakehouse Engineering/Apache Iceberg Lakehouse Engineering - A Directory of Resources

A comprehensive directory of resources for learning about and building Open Lakehouses using Apache Iceberg, Nessie, and Dremio.

Apache Iceberg Data Engineering Data Lakehouse Dremio Open Standards

Alex Merced

1/8/2024 • EN

Nessie - An Alternative to Hive & JDBC for Self-Managed Apache Iceberg Catalogs

Introduces Nessie as a self-managed catalog alternative to Hive & JDBC for Apache Iceberg, addressing limitations and new features.

Apache Iceberg Data Catalog Data Engineering Metadata Management Nessie

Alex Merced

11/14/2023 • EN

Can Debezium Lose Events?

Explores whether Debezium can lose database change events, explaining its at-least-once semantics and operational pitfalls like log retention.

Change Data Capture Data Engineering Debezium Event Streaming Transaction Log

Gunnar Morling

11/14/2023 • EN

Checkpoint Chronicle - November 2023

Monthly roundup of data streaming trends, featuring Apache Iceberg, Kafka Streams, Flink deployments, and streaming SQL insights.

Apache Flink Apache Iceberg Apache Kafka Data Engineering Stream Processing

Robin Moffatt

11/2/2023 • EN

CDC Use Cases: 7 Ways to Put CDC to Work

Explores seven practical use cases for Change Data Capture (CDC) in data engineering, including analytics, caches, and microservices.

Change Data Capture Data Engineering Database Integration Debezium Real Time Data

Gunnar Morling
