Partitioning Practices in Apache Hive and Apache Iceberg
Compares partitioning techniques in Apache Hive and Apache Iceberg, highlighting Iceberg's advantages for query performance and data management.
Alex Merced — Developer and technical writer sharing in-depth insights on data engineering, Apache Iceberg, data lakehouse architectures, Python tooling, and modern analytics platforms, with a strong focus on practical, hands-on learning.
388 articles from this blog
A comprehensive guide to JavaScript Promises, covering basics, error handling, advanced methods like Promise.all(), and real-world use cases.
Compares columnar vs. row-based data structures, explaining their optimal use in OLAP and OLTP systems for performance and scalability.
Table of Contents: Context; Introduction; Short Version for Quick Readers; My Journey with Table Formats and Lakehouses; Ecosystem Over Features; Key Takeaw
An introduction to Data Vault modeling, a flexible data warehouse design method using Hubs, Links, and Satellites for scalable data integration.
Explores the Data Lakehouse architecture and the roles of Apache Iceberg and Dremio in modern, integrated data management.
A comprehensive directory of resources for learning about and building Open Lakehouses using Apache Iceberg, Nessie, and Dremio.
Introduces Nessie as a self-managed catalog alternative to Hive & JDBC for Apache Iceberg, addressing limitations and new features.
A no-code tutorial on converting XLS/CSV files to Parquet format using Dremio, including setup via Docker.
An introduction to HTMX, a modern library for building dynamic web interfaces using HTML with minimal JavaScript, and how to use it.
Explores how Dremio's platform simplifies building and managing Apache Iceberg-based data lakehouses with governance, performance, and self-service.
A comprehensive guide to implementing Object-Oriented Programming (OOP) design patterns in JavaScript, covering creational, structural, and behavioral patterns.
Explores Apache Iceberg and Project Nessie, key open-source technologies powering the flexible and vendor-neutral Open Lakehouse data architecture.
A guide on learning software development effectively, covering language choice, early practice with simple challenges, and building a todo app.
A step-by-step tutorial on building a JSON API in Scala using the Play framework, covering project setup, database configuration, and controller creation.
A step-by-step tutorial on building a JSON API using Java Spring Boot, Maven, and PostgreSQL.
A guide to building a cost-effective, high-performance, and self-service data lakehouse architecture, addressing common pitfalls and outlining key principles.
A tutorial on building full CRUD REST APIs using Flask and FastAPI with the Psycopg2 PostgreSQL adapter, comparing it to ORMs.
A tutorial on building a local Data Lakehouse using Docker Compose with Apache Spark, MinIO, Dremio, and Nessie.
Project Nessie is a version control system for data lakes, bringing Git-like operations to manage and track changes in data assets.