Data what??
A guide explaining key data engineering terms like data warehouses, data lakes, data mesh, and data pipelines, with definitions and comparisons.
A guide to configuring an external Apache Hive metastore with Azure SQL for use in an Azure Synapse Analytics Spark Pool, including troubleshooting for common connection errors.
A recap of 2021 conference talks on Debezium and Change Data Capture (CDC), exploring patterns and integrations with tools like Kafka and Pinot.
A recap of 2021 conference talks on Debezium and Change Data Capture (CDC), exploring patterns and integrations with tools like Kafka and Infinispan.
Introducing Data Fluent, an open-source Python package for analyzing and understanding PostgreSQL database structure, row counts, and growth trends.
Announcing the free release of 'Practical MongoDB Aggregations', a book with tips and examples for developers and data professionals.
Explores the concept of feature stores in machine learning, presenting a hierarchy of needs from basic access to full automation.
Using bash shell tools like kafkacat, jq, sort, and uniq to perform a GROUP BY-style analysis on data from a Kafka topic.
An analysis of data discovery platforms, their key features, and available open-source solutions to improve data findability in organizations.
Argues that data scientists should own the entire process from problem identification to solution deployment for greater impact and efficiency.
Notes from Spark+AI Summit 2020 covering application-specific talks on ML frameworks, data engineering, feature stores, and data quality from companies like Airbnb and Netflix.
Answers common questions about data science in business, covering requirements, model interpretability, web scraping, and team roles.
A summary of a panel discussion on various data roles (data scientist, ML engineer, etc.), including key skills and career insights.
A technical guide on using SQL window functions to group discrete time-series events into user sessions for data analysis.
A tutorial on using Julia's string interpolation for automating repetitive data engineering tasks like querying multiple database tables.
A data engineer shares five practical lessons and performance tips for working with Apache Hive, focusing on common pitfalls and optimizations.
A tutorial on installing and configuring an 18-node Hadoop cluster on Amazon EC2 using Cloudera Manager.