The secular Bayesian: Using belief distributions without really believing
A data scientist's journey from dogmatic Bayesianism to a pragmatic, 'secular' use of Bayesian tools without requiring belief in the model's literal existence.
A critique of the Oxford-Munich Code of Conduct for Data Scientists, focusing on its technical recommendations on sampling and data retention.
Explains the theory behind linear regression, a fundamental machine learning method for predicting continuous numerical values.
A technical guide exploring workarounds for updating SQL Server statistics on secondary replicas in Availability Groups, with accompanying scripts.
A statistical re-analysis of a published study on the mouse microbiome and autism, examining data and p-values from behavioral experiments.
Explains the mathematical derivation of logistic regression from Bayes' theorem, connecting fundamental statistics to machine learning.
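As a sketch of how that derivation usually runs (the post's exact route is an assumption here), Bayes' theorem for a binary label y yields the logistic sigmoid of the log-odds a(x). If the class-conditional densities are further assumed to be Gaussian with means μ₁, μ₀ and a shared covariance matrix Σ, the log-odds become linear in x:

```latex
P(y=1 \mid x)
  = \frac{p(x \mid y=1)\,P(y=1)}{p(x \mid y=1)\,P(y=1) + p(x \mid y=0)\,P(y=0)}
  = \frac{1}{1 + e^{-a(x)}} = \sigma\big(a(x)\big),
\qquad
a(x) = \log \frac{p(x \mid y=1)\,P(y=1)}{p(x \mid y=0)\,P(y=0)}

% Gaussian class-conditionals with shared covariance \Sigma:
a(x) = w^\top x + b,
\qquad w = \Sigma^{-1}(\mu_1 - \mu_0),
\qquad b = -\tfrac{1}{2}\mu_1^\top \Sigma^{-1}\mu_1 + \tfrac{1}{2}\mu_0^\top \Sigma^{-1}\mu_0 + \log\frac{P(y=1)}{P(y=0)}
```

Logistic regression keeps the form σ(wᵀx + b) but fits w and b directly by maximum likelihood instead of estimating μ₁, μ₀ and Σ.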
A statistical analysis discussing the limitations of confidence intervals, using examples from small-area sampling to illustrate their weak properties.
A data scientist clarifies common misconceptions about the field, explaining that machine learning is only a small part of the job and advanced degrees aren't always required.
A technical analysis verifying a statistical calculation from an XKCD comic, involving normal distribution probabilities and R code.
A technical analysis of bus punctuality using Auckland Transport API data, with R code for data processing and visualization.
A technical exploration of Mean Squared Error, breaking it down into bias and variance to understand model performance and irreducible uncertainty.
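For reference, the decomposition is usually stated as follows, assuming y = f(x) + ε with zero-mean noise of variance σ² that is independent of the fitted model f̂, and expectations taken over training sets and noise at a fixed x (this notation is illustrative, not necessarily the post's):

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

The cross terms vanish because the noise has mean zero and is independent of f̂, and because f̂(x) − E[f̂(x)] has mean zero; σ² is the floor that no model can remove.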
A guide to six statistical methods (frequentist and Bayesian) for comparing group means, with R and Stan code examples.
An announcement of a lecture series on machine learning, covering topics such as Weka, deep learning, algorithmic fairness, and sparse supervised learning.
A tutorial on using the infer package in R for hypothesis testing through simulation, following a modern statistical approach.
An analysis of a bug in the pseudo-random number generator, based on the Wichmann-Hill algorithm, that New Zealand officially uses for electoral vote counting.
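For readers unfamiliar with the generator, here is a minimal Python sketch of the textbook Wichmann-Hill method (Applied Statistics algorithm AS 183): three small multiplicative congruential generators whose scaled outputs are summed modulo 1. It illustrates the published algorithm only; it is not New Zealand's implementation and does not reproduce the bug the analysis describes. The class name `WichmannHill` is hypothetical.

```python
class WichmannHill:
    """Textbook Wichmann-Hill (AS 183) generator: three multiplicative
    congruential generators whose scaled outputs are summed modulo 1."""

    M1, M2, M3 = 30269, 30307, 30323  # prime moduli
    A1, A2, A3 = 171, 172, 170        # multipliers

    def __init__(self, s1: int, s2: int, s3: int) -> None:
        # Each seed must be a positive integer below its modulus;
        # a zero seed would lock that component at zero forever.
        for s, m in ((s1, self.M1), (s2, self.M2), (s3, self.M3)):
            if not 0 < s < m:
                raise ValueError(f"seed {s} must lie in 1..{m - 1}")
        self.s1, self.s2, self.s3 = s1, s2, s3

    def random(self) -> float:
        # Advance each component generator by one step.
        self.s1 = (self.A1 * self.s1) % self.M1
        self.s2 = (self.A2 * self.s2) % self.M2
        self.s3 = (self.A3 * self.s3) % self.M3
        # Combine: the fractional part of the sum is approximately uniform on [0, 1).
        return (self.s1 / self.M1 + self.s2 / self.M2 + self.s3 / self.M3) % 1.0


if __name__ == "__main__":
    gen = WichmannHill(1, 2, 3)
    print([gen.random() for _ in range(3)])
```

Because the whole state is three small integers, the same seeds always reproduce the same sequence, which is what makes an officially specified generator auditable.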
Explores SQL Server 2019's improved DBCC CLONEDATABASE command for automatically extracting Columnstore Index statistics into a cloned database.
Explores the 'waiting time paradox' using probability, simulation, and real bus data to explain why average wait times often exceed the scheduled interval.
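The effect can be reproduced with a small simulation. In theory, for headway T the expected wait of a randomly arriving passenger is E[T²] / (2 E[T]): half the headway for perfectly regular buses, the full mean headway for Poisson arrivals, and longer still when headways are more variable than exponential ones. This Python sketch assumes passengers arrive uniformly at random and compares regular buses with Poisson (exponential-headway) buses that share a 10-minute mean; the function name `mean_wait` and all parameters are illustrative, not taken from the post.

```python
import bisect
import random
import statistics


def mean_wait(headways, n_passengers=100_000, rng=None):
    """Average wait of passengers arriving uniformly at random, given bus headways."""
    rng = rng or random.Random(0)
    # Convert headways into cumulative bus arrival times.
    times = []
    t = 0.0
    for h in headways:
        t += h
        times.append(t)
    horizon = times[-1]
    waits = []
    for _ in range(n_passengers):
        arrival = rng.uniform(0.0, horizon)
        # The passenger boards the first bus arriving at or after their arrival.
        next_bus = times[bisect.bisect_left(times, arrival)]
        waits.append(next_bus - arrival)
    return statistics.fmean(waits)


if __name__ == "__main__":
    rng = random.Random(42)
    mean_headway = 10.0  # minutes
    n_buses = 100_000
    regular = [mean_headway] * n_buses
    poisson = [rng.expovariate(1 / mean_headway) for _ in range(n_buses)]
    print(f"regular buses: {mean_wait(regular):.2f} min")
    print(f"Poisson buses: {mean_wait(poisson):.2f} min")
```

The printed averages should come out near 5 and 10 minutes respectively; exact figures depend on the seed. Passengers are more likely to land in long gaps than short ones, which is why irregular service lengthens the average wait.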
Discusses the proposal to lower p-value thresholds in statistical analysis, arguing that it addresses symptoms rather than the root causes of unreliable research.
Explains Chebyshev's inequality, a distribution-free probability bound, and its application to calculating upper confidence limits (UCLs) in environmental monitoring.
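One common way the bound is applied, sketched here in Python as an assumption rather than the post's exact procedure: the one-sided (Cantelli) form of the inequality, which bounds a single tail by P(X − μ ≥ kσ) ≤ 1/(1 + k²), is applied to the sample mean with k chosen so that the bound equals α. That gives UCL = x̄ + √(1/α − 1) · s/√n. Because the sample standard deviation s stands in for σ, the stated coverage is only approximate. The function name `chebyshev_ucl` is hypothetical.

```python
import math
import statistics


def chebyshev_ucl(data, alpha=0.05):
    """One-sided (1 - alpha) upper confidence limit for the mean using the
    Chebyshev/Cantelli bound: UCL = mean + sqrt(1/alpha - 1) * s / sqrt(n)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    mean = statistics.fmean(data)
    s = statistics.stdev(data)    # sample standard deviation (n - 1 denominator)
    k = math.sqrt(1 / alpha - 1)  # about 4.36 for alpha = 0.05
    return mean + k * s / math.sqrt(n)
```

Since the bound makes no distributional assumption, k ≈ 4.36 at α = 0.05 is much larger than the normal-theory multiplier, so the resulting UCL is deliberately conservative for skewed environmental data.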
Critiques a statistics position paper for ignoring computing, software, and reproducibility in modern statistical science and faculty evaluation.