Databases Deconstructed - The Value of Data Lakehouses and Table Formats
Explains the data lakehouse architecture, its layers (storage, table format, catalog, processing), and its advantages over traditional data warehouses.
Alex Merced — Developer and technical writer sharing in-depth insights on data engineering, Apache Iceberg, data lakehouse architectures, Python tooling, and modern analytics platforms, with a strong focus on practical, hands-on learning.
388 articles from this blog
A video course covering the fundamentals of lakehouse engineering using Apache Iceberg, Nessie, and Dremio for data management.
An introduction to common sorting algorithms like Bubble Sort, Merge Sort, and Quick Sort, implemented and explained in JavaScript.
Explores Apache Iceberg's advanced partitioning features, including hidden partitioning and partition transforms, for optimizing query performance in data lakes.
Explains three key Apache Iceberg features for data engineers: hidden partitioning, partition evolution, and tool compatibility.
A tutorial on using Dremio and Docker to run SQL queries directly on Excel files from your local machine.
A comprehensive guide to functional programming concepts in JavaScript, including pure functions, immutability, currying, memoization, and monads.
Explores the evolution of Apache Iceberg catalogs, focusing on the current REST Catalog and future proposals for server-side optimizations.
An introduction to Apache Iceberg, a table format for data lakehouses, explaining its architecture and providing learning resources.
A hands-on tutorial on building a data lakehouse pipeline using Spark, Dremio, and Superset to move and analyze data.
An overview of five impactful open-source data projects, including Apache Iceberg and Arrow, that are revolutionizing data management and analytics.
Explains why Dremio is a top platform for Apache Iceberg lakehouses, highlighting features like dataset promotion and data reflections.
Explores Apache Iceberg's catalog system, its role in data lakehouse architecture, and key considerations for choosing the right catalog.
Explains the role, types, and selection criteria for catalogs in Apache Iceberg, a key component for managing data lakehouse tables.
Explores 10 reasons to adopt Apache Iceberg and Dremio for building a modern, flexible, and cost-effective data lakehouse architecture.
An introduction to ANSI SQL, covering its standardized syntax, key concepts like DDL, DML, joins, CTEs, and its importance for database interoperability.
Explains how ontologies structure data for better interoperability, integration, and analysis across domains like healthcare and finance.
An introductory guide to Python programming covering installation, syntax, data structures, and best practices for beginners.
Explains the data lakehouse architecture and the roles of Apache Iceberg, Nessie, and Dremio in modern data management.
A comprehensive guide to mastering the essential Git commands 'git pull' and 'git push', covering their anatomy, options, and best practices.