Half a dozen frequentist and Bayesian ways to measure the difference in means in two groups
A guide to six statistical methods (frequentist and Bayesian) for comparing group means, with R and Stan code examples.
A summary of a panel discussion on various data roles (data scientist, ML engineer, etc.), including key skills and career insights.
Announcing the completion of the open-source book 'Geocomputation with R', detailing its collaborative creation, purpose, and availability.
A guide on using the ELK Stack (Elasticsearch, Logstash, Kibana) to analyze and triage large-scale Nmap scan results for penetration testing and offensive security.
A guide to using functions and packages in R for data analysis, covering installation, recommended packages, and data manipulation tools.
Explores the 'waiting time paradox' using probability, simulation, and real bus data to explain why average wait times often exceed the naive expectation of half the scheduled interval.
A technical analysis of Stack Overflow's 2018 survey data, visualizing global developer response rates per capita using Python, pandas, and GeoPandas.
Analysis of the 2018 Stack Overflow Developer Survey results, ranking technologies developers worked with and want to work with.
A technical tutorial on using Apache Kafka, KSQL, and Elasticsearch to analyze network event data, detect anomalies, and set up automated alerts.
Explains why focusing on median or average performance metrics is misleading and advocates analyzing the long tail of the data to improve user experience.
Presentation slides for a Power BI tips and tricks talk at DataBISummit, available for download.
Discusses the proposal to lower p-value thresholds in statistical analysis, arguing it addresses symptoms not root causes of unreliable research.
Explains Chebyshev's inequality, a probability bound, and its application to calculating Upper Confidence Limits (UCL) in environmental monitoring.
Analyzes decision-making quality in sports and board games, where clear data reveals the high cost of poor choices.
A technical guide on using R's rvest package to scrape book descriptions and genres from Goodreads, adapting code from an existing project.
Part 2 of a series analyzing gender differences in dating dynamics, focusing on challenges and perspectives for nerds.
An analysis of Hacker News moderation tools and practices, based on data scraped from the site's API.
A roundup of blog posts and resources discussing various data analysis workflows and tools in the R programming language.
A critique of data visualization choices in a KCSE exam analysis, comparing heat maps to line graphs for clarity.
A technical guide on using the rgoodreads R package to analyze personal Goodreads reading data and critique the 5-star rating system.