Canada's 13M Buildings
An analysis of Canada's new national building footprint dataset, exploring its sources, technical setup, and initial processing steps.
A statistical reasoning test with three practical problems on sorting uncertain fractions, highlighting anomalies, and estimating population sizes.
Argues that reading raw AI input/output data is essential for developing true intuition about system behavior, beyond just metrics.
Explains the statistical concept of included-variable bias in regression models, challenging the traditional 'omitted-variable bias' framing.
Argues that effective AI product evaluation requires a scientific, process-driven approach, not just adding LLM-as-judge tools.
A technical analysis using R to classify iris images from a dataset, applying PCA and LDA for machine learning classification.
A tutorial on using pandas to calculate scoring streaks or runs in basketball data, demonstrating data manipulation techniques.
Explains the key differences between the = and <- assignment operators in the R programming language, focusing on scoping and side effects.
A hands-on review of the new DuckDB UI, exploring its features for data analysis and comparing it to previous workflows with Rill Data.
A technical walkthrough of loading and exploring UK Environment Agency flood data using DuckDB and Rill for a streaming pipeline project.
A tutorial for R users on mastering data wrangling in 5 progressive levels, using the dplyr package and the Ames housing dataset.
A comparison of the native Base R pipe (|>) and the {magrittr} pipe (%>%), covering their syntax, strictness, and use cases for data analysis.
A tech blogger analyzes the decline in their blog's ad revenue, linking it to the rise of AI tools like ChatGPT and GitHub Copilot.
A tutorial on using R to parse Apple Music XML data and create personalized listening statistics similar to Spotify Wrapped.
Analyzing ORNL's AI-generated dataset of 131 million US building footprints, including download, setup, and technical analysis.
The Big Book of R adds five new free, open-source books covering R programming for production, survey analysis, causal inference, biodiversity data, and natural resources.
Analyzing Baltic Sea maritime traffic data using the Finnish Transport Infrastructure Agency's open AIS API with Python and DuckDB.
Explores Bayesian alternatives to the frequentist t-test for comparing two means, discussing non-parametric and resampling-based approaches.
Analysis of 5 years of Hacker News 'Who's Hiring' thread data using Deno and the HN API to visualize tech hiring trends.
A technical guide explaining why ggplot2 line charts sometimes appear blank and how to fix the issue, focusing on data structure and grouping.