Data Quality Is a Pipeline Problem, Not a Dashboard Problem
Argues that data quality must be enforced at the pipeline's ingestion point, not patched in dashboards, to ensure consistent, reliable data.
Explains the importance of pipeline observability for data health, covering metrics, logs, and lineage to detect issues beyond simple execution monitoring.
Explains idempotent data pipelines, patterns like partition overwrite and MERGE, and how to prevent duplicate data during retries.
A guide to the core principles and systems thinking required for data engineering, beyond just learning specific tools.
A practical, tool-agnostic checklist of essential best practices for designing, building, and maintaining reliable data engineering pipelines.
A monthly roundup of curated links and articles focused on data engineering, Apache Kafka, and data platform technologies.
Explains the causes of bias in AI systems, focusing on training data and proxy variables, and offers practical steps for developers to mitigate it.
Explores the importance of data quality and validation in data engineering, covering key dimensions and tools for reliable pipelines.
Notes on dataset engineering from Chip Huyen's 'AI Engineering', covering data curation, quality, coverage, quantity, and acquisition for AI models.
An interview with Salma Bakouk, CEO of Sifflet, discussing data stack observability, data quality, lineage, and building a modern data team.
Explores the importance of high-quality human-annotated data for training AI models, covering task design, rater selection, and the wisdom of the crowd.
Interview with Chad Sanderson on data platform leadership, experimentation culture, data quality, and the rise of data contracts.
An introduction to Great Expectations, an open-source Python tool for data quality testing, documentation, and profiling.
Adding a PDF course completion report for students in a SaaS application built with Python and Django.
Explores six unexpected challenges that arise after deploying machine learning models in production, from data schema changes to organizational issues.
An enterprise architect discusses the challenges of data validation speed, automation, and the essential role of human intuition in ensuring data quality.