Batch vs. Streaming: Choose the Right Processing Model
A guide to choosing between batch and streaming data processing models based on actual freshness requirements and cost.
A technical walkthrough of converting the US Wind Turbine Database to Parquet format and analyzing it using tools like GDAL, DuckDB, and QGIS.
A technical walkthrough of converting the massive OpenBuildingMap dataset (2.7B buildings) into a columnar Parquet format for efficient cloud analysis.
Exploring the GM-SEUS dataset of US solar farms using GIS tools like QGIS and DuckDB for spatial data analysis.
Explores building AI agents as streaming SQL queries on platforms such as Apache Flink to improve consistency, scalability, and developer experience.
A technical guide on downloading and analyzing Canada's National Address Register (15.8M addresses) using Python, DuckDB, and QGIS to create settlement centroids.
A tutorial on building a beginner-friendly Model Context Protocol (MCP) server in Python to connect Claude AI with local CSV and Parquet files.
Part two of a series on building a personal recommendation system, covering data collection from Pocket and content extraction with the Jina Reader API.
A developer documents the first steps in building a personalized content recommendation system using saved articles, text embeddings, and algorithms.
A technical tutorial on using the UNNEST operator in Flink SQL to explode nested arrays of sensor data into separate rows.
Introduces the 'leopards' Python library for filtering and aggregating lists, offering a lightweight alternative to pandas for basic data operations.
A technical guide on processing Overture Maps' global land cover dataset, focusing on extracting and analyzing Australia's data using DuckDB and QGIS.
Exploring Japan's building footprint data from the Flateau project, which converts 3D CityGML data into 2D Parquet files for analysis.
Analysis of a research paper detailing an AI model that extracted 281 million building footprints from satellite imagery across East Asia.
A technical analysis of Maxar's high-resolution global satellite imagery basemap, examining 60GB of data across 11 cities using GDAL, Python, and DuckDB.
A technical guide on downloading and analyzing free Synthetic Aperture Radar (SAR) satellite imagery from Umbra's open data program.
A technical guide to solving the One Billion Row Challenge (1BRC) using SQL and DuckDB, including data loading and aggregation.
A benchmark comparison of several Python libraries for reading Excel files, focusing on speed, type handling, and correctness.
A guide to jq, a powerful command-line JSON processor for developers, covering installation, basic usage, and productivity tips.