Alex Merced

Alex Merced — Developer and technical writer sharing in-depth insights on data engineering, Apache Iceberg, data lakehouse architectures, Python tooling, and modern analytics platforms, with a strong focus on practical, hands-on learning.

https://tuts.alexmercedcoder.dev

12/31/2025

data engineering apache iceberg data lakehouse python analytics

Articles from this Blog

388 articles from this blog

11/12/2025 • EN

Comprehensive Hands-on Walk Through of Dremio Cloud Next Gen (Hands-on with Free Trial)

A hands-on tutorial exploring Dremio Cloud Next Gen's new free trial, covering its lakehouse platform, AI features, and SQL capabilities.

sql Cloud Platform Apache Iceberg

10/23/2025 • EN

2025-2026 Guide to Learning about Apache Iceberg, Data Lakehouse & Agentic AI

A comprehensive guide to learning Apache Iceberg, data lakehouse architecture, and Agentic AI with curated tutorials, tools, and resources.

agentic ai Data Engineering Apache Iceberg

10/21/2025 • EN

An Exploration of the Commercial Iceberg Catalog Ecosystem

Explores the commercial Apache Iceberg catalog ecosystem, focusing on REST Catalog standards, optimization strategies, and architectural trade-offs.

optimization Metadata Management Data Lakehouse

10/17/2025 • EN

Building a Universal Lakehouse Catalog - Beyond Iceberg Tables

Explores two paths for building a universal lakehouse catalog that extends beyond Apache Iceberg tables to manage diverse data formats and sources.

Data Lakehouse Table Format REST Catalog

10/16/2025 • EN

Intro to Apache Iceberg with Apache Polaris and Apache Spark

A technical guide on using Apache Iceberg with Apache Spark and Polaris for building and managing a data lakehouse, covering setup, operations, and optimization.

Apache Spark Data Engineering Apache Iceberg

10/14/2025 • EN

The State of Apache Iceberg v4 - October 2025 Edition

Overview of key proposals in Apache Iceberg v4, focusing on performance, metadata efficiency, and portability for modern data workloads.

Data Engineering Apache Iceberg Data Lakehouse

9/24/2025 • EN

The Ultimate Guide to Open Table Formats - Iceberg, Delta Lake, Hudi, Paimon, and DuckLake

A comprehensive guide comparing five major open table formats (Iceberg, Delta Lake, Hudi, Paimon, DuckLake) for modern data lakehouses, covering their internals and use cases.

Apache Iceberg Apache Hudi Delta Lake

9/23/2025 • EN

The 2025 & 2026 Ultimate Guide to the Data Lakehouse and the Data Lakehouse Ecosystem

A comprehensive guide to the data lakehouse architecture, its core components (Iceberg, Delta, Hudi, Paimon), and the surrounding ecosystem for modern data platforms.

Data Architecture Apache Iceberg Data Lakehouse

9/16/2025 • EN

The Endgame — Building an Autonomous Optimization Pipeline for Apache Iceberg

A guide to building an autonomous, self-healing optimization pipeline for Apache Iceberg tables to maintain performance and cost efficiency.

Metadata Management Apache Iceberg Data Lakehouse

9/9/2025 • EN

Managing Large-Scale Optimizations — Parallelism, Checkpointing, and Fail Recovery

Strategies for scaling and optimizing Apache Iceberg data compaction jobs, including parallelism, checkpointing, and failure recovery.

parallelism Checkpointing Apache Iceberg

9/2/2025 • EN

Hidden Pitfalls — Compaction and Partition Evolution in Apache Iceberg

Explores challenges and best practices for managing partition evolution and compaction in Apache Iceberg to maintain query performance.

Metadata Management Apache Iceberg Data Lakehouse

8/26/2025 • EN

Using Iceberg Metadata Tables to Determine When Compaction Is Needed

Explains how to use Apache Iceberg's metadata tables to dynamically trigger data compaction based on file size, manifest health, and snapshot patterns.

Apache Iceberg Data Lakehouse Table Optimization

8/19/2025 • EN

Designing the Ideal Cadence for Compaction and Snapshot Expiration

A guide to scheduling compaction and snapshot expiration in Apache Iceberg tables based on workload patterns and infrastructure constraints.

Data Engineering Apache Iceberg Data Lakehouse

8/12/2025 • EN

Avoiding Metadata Bloat with Snapshot Expiration and Rewriting Manifests

Explains how to manage Apache Iceberg table metadata by expiring old snapshots and rewriting manifests to prevent performance and cost issues.

Metadata Management Apache Iceberg Data Lakehouse

8/5/2025 • EN

Smarter Data Layout — Sorting and Clustering Iceberg Tables

Explains how to use sorting and Z-order clustering in Apache Iceberg tables to optimize query performance and data layout.

Sorting Clustering Apache Iceberg

7/29/2025 • EN

Optimizing Compaction for Streaming Workloads in Apache Iceberg

Explains techniques for incremental, non-disruptive compaction in Apache Iceberg tables under continuous streaming data ingestion.

Apache Iceberg Data Lakehouse Data Compaction

7/22/2025 • EN

The Basics of Compaction — Bin Packing Your Data for Efficiency

Explains data compaction using bin packing in Apache Iceberg to merge small files, improve query performance, and reduce metadata overhead.

Spark Apache Iceberg Data Compaction

7/15/2025 • EN

The Cost of Neglect — How Apache Iceberg Tables Degrade Without Optimization

Explains how Apache Iceberg tables degrade without optimization, covering small files, fragmented manifests, and performance impacts.

Metadata Management Data Engineering Apache Iceberg

7/3/2025 • EN

How to Discover or Organize Lakehouse & Apache Iceberg Meetups

A guide on how to find, join, and organize community meetups focused on Apache Iceberg and modern data lakehouse architectures.

Slack Meetup Organization Apache Iceberg

5/2/2025 • EN

Introduction to Data Engineering Concepts | What is Data Engineering?

An introductory guide to data engineering, explaining its role, key concepts, and how it differs from data science in the modern data ecosystem.

Data Pipelines Data Engineering Data Warehouse
