Metadata Management articles

2/19/2026 • EN

How a Self-Documenting Semantic Layer Reduces Data Team Toil

Explains how a self-documenting semantic layer uses AI to automate data documentation, reducing manual work and governance risks for data teams.

ai automation Data Documentation Data Governance Metadata Management Semantic Layer

Alex Merced

10/21/2025 • EN

An Exploration of the Commercial Iceberg Catalog Ecosystem

Explores the commercial Apache Iceberg catalog ecosystem, focusing on REST Catalog standards, optimization strategies, and architectural trade-offs.

Data Lakehouse Iceberg Catalog Metadata Management optimization Table Format

Alex Merced

9/16/2025 • EN

The Endgame — Building an Autonomous Optimization Pipeline for Apache Iceberg

A guide to building an autonomous, self-healing optimization pipeline for Apache Iceberg tables to maintain performance and cost efficiency.

Apache Iceberg Data Lakehouse Data Optimization Metadata Management Pipeline Automation

Alex Merced

9/2/2025 • EN

Hidden Pitfalls — Compaction and Partition Evolution in Apache Iceberg

Explores challenges and best practices for managing partition evolution and compaction in Apache Iceberg to maintain query performance.

Apache Iceberg Data Compaction Data Lakehouse Metadata Management Partition Evolution

Alex Merced

8/12/2025 • EN

Avoiding Metadata Bloat with Snapshot Expiration and Rewriting Manifests

Explains how to manage Apache Iceberg table metadata by expiring old snapshots and rewriting manifests to prevent performance and cost issues.

Apache Iceberg Data Lakehouse Manifest Rewriting Metadata Management Snapshot Expiration

Alex Merced

7/15/2025 • EN

The Cost of Neglect — How Apache Iceberg Tables Degrade Without Optimization

Explains how Apache Iceberg tables degrade without optimization, covering small files, fragmented manifests, and performance impacts.

Apache Iceberg Data Engineering Data Lakehouse Metadata Management Table Optimization

Alex Merced

7/14/2025 • EN

Keeping your Data Lakehouse in Order: Table Maintenance in Apache Iceberg

Explains the importance of table maintenance in Apache Iceberg for data lakehouses, covering metadata and file management.

Apache Iceberg Data Engineering Data Lakehouse Metadata Management Table Maintenance

Robin Moffatt

1/8/2024 • EN

Nessie - An Alternative to Hive & JDBC for Self-Managed Apache Iceberg Catalogs

Introduces Nessie as a self-managed alternative to Hive and JDBC catalogs for Apache Iceberg, covering the limitations it addresses and the features it adds.

Apache Iceberg Data Catalog Data Engineering Metadata Management Nessie

Alex Merced

7/10/2023 • EN

Project Nessie: A Look in the Depths

Project Nessie is a version control system for data lakes, bringing Git-like operations to manage and track changes in data assets.

Data Lake Data Version Control Git-Like Operations Metadata Management Root Pointer Store

Alex Merced

5/17/2021 • EN

Exploring ZooKeeper-less Kafka

An overview of Kafka's new KRaft mode, which removes the ZooKeeper dependency for metadata management and controller election.

Apache Kafka distributed systems Metadata Management Raft Protocol Zookeeper

Gunnar Morling

10/25/2020 • EN

Data Discovery Platforms and Their Open Source Solutions

An analysis of data discovery platforms, their key features, and available open-source solutions to improve data findability in organizations.

Data Catalog Data Discovery Data Engineering Metadata Management open source

Eugene Yan
