Data Engineering Duke Fall 2023-2024
Overview of a university-level Data Engineering course syllabus covering tools, pipelines, AI pair programming, and project-based learning for Fall 2024.
A list of upcoming tech talks and events by Alex Merced, focusing on Apache Iceberg, data lakehouses, and data engineering topics.
A video course covering the fundamentals of lakehouse engineering using Apache Iceberg, Nessie, and Dremio for data management.
A data professional shares their curated list of data tech blogs and explains their return to using RSS feeds to stay current in the field.
Explains three key Apache Iceberg features for data engineers: hidden partitioning, partition evolution, and tool compatibility.
A data engineer reflects on their 2-year career journey at the City of Boston, sharing lessons learned in data warehousing, ETL, and civic tech.
Explores the evolution of Apache Iceberg catalogs, focusing on the current REST Catalog and future proposals for server-side optimizations.
An introduction to Apache Iceberg, a table format for data lakehouses, explaining its architecture and providing learning resources.
A hands-on tutorial on building a data lakehouse pipeline using Spark, Dremio, and Superset to move and analyze data.
A guide to using Apache Flink's SQL Gateway REST API for submitting and managing SQL jobs, including practical examples with Postman and HTTPie.
Monthly roundup of articles and resources on data streaming, covering Flink, Kafka, Debezium, and streaming SQL developments.
Explains the role and types of catalogs in Apache Flink SQL, comparing them to catalogs in traditional relational databases and highlighting their importance in data management.
Interview with Suresh Srinivas on his career in big data, founding Hortonworks, scaling Uber's data platform, and leading the OpenMetadata project.
A comprehensive directory of resources for learning about and building Open Lakehouses using Apache Iceberg, Nessie, and Dremio.
Introduces Nessie as a self-managed catalog alternative to Hive & JDBC for Apache Iceberg, addressing limitations and new features.
Explores whether Debezium can lose database change events, explaining its at-least-once semantics and operational pitfalls like log retention.
Explores whether the Debezium change data capture tool can lose database events, discussing its at-least-once semantics and operational pitfalls.
Monthly roundup of data streaming trends, featuring Apache Iceberg, Kafka Streams, Flink deployments, and streaming SQL insights.
Explores seven practical use cases for Change Data Capture (CDC) in data engineering, including analytics, caches, and microservices.