Data Engineering articles

10/2/2023 • EN

Learning Apache Flink S01E02: What is Flink?

An introductory overview of Apache Flink, explaining its core concepts as a distributed stream processing framework, its history, and primary use cases.

Apache Flink Big Data Data Engineering distributed systems Stream Processing

Robin Moffatt

9/21/2023 • EN

An Itch That Just Has to Be Scratched… (Or, Why Am I Joining Decodable?)

The author explains their move to Decodable to dive deeper into stream processing and Apache Flink, and to work with experts in the field.

Apache Flink Apache Kafka Data Engineering Stream Processing streaming

Robin Moffatt

9/10/2023 • EN

TWIL: September 10, 2023

A weekly tech learning digest covering Microsoft Fabric, AI topics, computer vision, Azure AI Document Intelligence, embeddings, and vector search.

Azure AI computer vision Data Engineering ETL Microsoft Fabric

André Vala

8/13/2023 • EN

Analytical Data Warehouses - an introduction

An introduction to analytical data warehouses, explaining their purpose, differences from transactional databases, and their role in team-based analytics.

analytics Data Engineering Data Warehousing database dbt

Jenna Jordan

8/12/2023 • EN

Analysis of the data job market using "Ask HN: Who is hiring?" posts

Analysis of Hacker News job posts shows the Data Scientist role declining while ML Engineer roles rise, indicating a shift in the data job market.

Data Engineering Data Science Hacker News Analysis Job Market Analysis Machine Learning Engineering

Emir U

6/16/2023 • EN

Datacast Episode 119: Experimentation Culture, Immutable Data Warehouse, The Data Collaboration Problem, and The Rise of Data Contracts with Chad Sanderson

Interview with Chad Sanderson on data platform leadership, experimentation culture, data quality, and the rise of data contracts.

Data Contracts Data Engineering Data Platform Data Quality Experimentation Culture

James Le

5/30/2023 • EN

What is Nessie and Why as a Data Engineer or Architect you should care?

Explains Project Nessie, an open-source data catalog for Apache Iceberg tables, and its importance for data engineers and architects building data lakehouses.

Apache Iceberg Data Catalog Data Engineering Data Lakehouse Table Format

Alex Merced

3/3/2023 • EN

Aligning mismatched Parquet schemas in DuckDB

How to handle mismatched Parquet file schemas when querying multiple files in DuckDB using the UNION_BY_NAME option.

Data Engineering DuckDB Parquet S3 Schema Evolution

Robin Moffatt

12/19/2022 • EN

Machine Learning @ Monzo in 2022

An update on how Monzo integrated machine learning across its organization in 2022, covering team structure, growth, and new initiatives.

Data Engineering Machine Learning MLOps Organizational Structure Staff Engineer

Neal Lathia

11/8/2022 • EN

Data Engineering in 2022: ELT tools

Explores the shift to ELT in data engineering, focusing on modern tools like dbt, Fivetran, and Airbyte for loading and transforming data.

Airbyte Data Engineering dbt ELT Fivetran

Robin Moffatt

11/3/2022 • EN

Why I Joined Decodable

A software engineer explains their decision to join Decodable, a startup building a serverless real-time data platform, focusing on stream processing.

Apache Kafka change data capture Data Engineering Real Time Stream Processing serverless

Gunnar Morling

10/24/2022 • EN

Data Engineering in 2022: Wrangling the feedback data from Current 22 with dbt

A technical walkthrough of using dbt and DuckDB to clean and analyze session feedback data from a tech conference.

Data Engineering data transformation dbt DuckDB Feedback Analysis

Robin Moffatt

10/20/2022 • EN

Data Engineering in 2022: Exploring dbt with DuckDB

A hands-on exploration of using dbt (data build tool) with DuckDB for local data engineering, based on a tutorial project.

Analytics Engineering Data Engineering data transformation dbt DuckDB

Robin Moffatt

10/2/2022 • EN

Data Engineering in 2022: Architectures & Terminology

Explains the evolution from ETL to ELT in data engineering, clarifying the role of modern tools like dbt in the transformation process.

Data Engineering Data Warehouse dbt ELT ETL

Robin Moffatt

9/16/2022 • EN

Data Engineering in 2022: Exploring LakeFS with Jupyter and PySpark

A hands-on tutorial exploring LakeFS for data versioning and branching using PySpark and Jupyter notebooks in a data engineering context.

Data Engineering Jupyter LakeFS PySpark S3

Robin Moffatt

9/14/2022 • EN

Data Engineering: Resources

A curated list of essential resources for data engineering, including articles, newsletters, podcasts, and tools.

Data Engineering Data Lake Data Warehouse dbt Modern Data Stack

Robin Moffatt

9/14/2022 • EN

Data Engineering in 2022: Storage and Access

Explores modern data engineering trends in 2022, focusing on analytical data storage formats, organization, and access patterns.

Apache Hudi Apache Iceberg Data Engineering Delta Lake Table Formats

Robin Moffatt

9/14/2022 • EN

Stretching my Legs in the Data Engineering Ecosystem in 2022

A data engineer explores the evolution of the data ecosystem, comparing past practices with modern tools and trends in 2022.

Apache Kafka Big Data Data Engineering Data Warehousing Stream Processing

Robin Moffatt

7/18/2022 • EN

Thoughts on ML Engineering After a Year of my PhD

A PhD student reflects on the complexities of ML engineering, distinguishing between Task MLEs and Platform MLEs, and shares practical lessons from production systems.

Data Engineering Machine Learning Lifecycle ML Engineering Model Monitoring Production Pipelines

Shreya Shankar

6/20/2022 • EN

Introduction to The World of Data - (OLTP, OLAP, Data Warehouses, Data Lakes and more)

An introduction to modern data systems, explaining OLTP, OLAP, data warehouses, data lakes, and the roles of data engineers, analysts, and scientists.

Data Engineering Data Lakes Data Warehousing OLAP OLTP

Alex Merced