Comprehensive Hands-on Walkthrough of Dremio Cloud Next Gen (Hands-on with Free Trial)
A hands-on tutorial exploring Dremio Cloud Next Gen's new free trial, covering its lakehouse platform, AI features, and SQL capabilities.
Alex Merced — Developer and technical writer sharing in-depth insights on data engineering, Apache Iceberg, data lakehouse architectures, Python tooling, and modern analytics platforms, with a strong focus on practical, hands-on learning.