Statistics articles

4/15/2022 • EN

Fast thinking on lichess.org

Analyzes chess game data from lichess.org to determine whether fast thinking is the dominant factor in game outcomes across different time controls.

chess, data analysis, Lichess, statistics, Time Control

Emir U

2/20/2022 • EN

Nine and sixty ways

Discusses the practical choices in setting up asymptotic models for statistics, using examples from clinical trials and big data.

Asymptotic Analysis, Probability Theory, Randomized Trials, statistics, Test Statistics

Thomas Lumley

1/9/2022 • EN

Is December getting warmer? Modeling weather data in NJ

A data scientist uses NOAA data and statistical modeling to analyze whether December temperatures in New Jersey are truly rising over time.

Climate Modeling, data analysis, Python, statistics, Weather Data

Will Kurt

10/3/2021 • EN

An Introduction to Fisher Information

Introduces Fisher information, a statistical concept that quantifies how much information data samples contain about unknown distribution parameters.

data analysis, Fisher Information, Parameter Estimation, Probability Theory, statistics

Awni Hannun

10/1/2021 • EN

The Logit-Normal: A ubiquitous but strange distribution!

Explores the logit-normal distribution, its mathematical properties, and its surprising role in statistical models like logistic regression.

Analytical Solutions, Logistic Regression, Logit Normal, Probability Distributions, statistics

Will Kurt

9/3/2021 • EN

Ordinal data, metadata, and models

Explores the mathematical and data science challenges of analyzing ordinal data, including tradeoffs in interpreting ordered scales and model limitations.

Data Modeling, Measurement Scales, metadata, Ordinal Data, statistics

Thomas Lumley

8/21/2021 • EN

Exploring R² and regression variance with Euler/Venn diagrams

A tutorial using Euler/Venn diagrams to visualize and explain the R² statistic and variance in regression models.

data visualization, R, R Squared, Regression, statistics

Andrew Heiss

8/21/2021 • EN

Pictures of code are not code

A critique of publishing code as images in academic papers, highlighting errors and reproducibility issues in statistical computing examples.

code quality, Maximum Likelihood Estimation, R, Reproducible Research, statistics

Thomas Lumley

7/16/2021 • EN

Not all strictly monotone functions are additive

A mathematical critique of additive scoring in grading and grant reviews, arguing for non-additive monotone functions.

education, Grading, mathematics, Policy, statistics

Thomas Lumley

6/10/2021 • EN

Causal inference 4: Causal Diagrams, Markov Factorization, Structural Equation Models

Explores the relationship between causal and statistical models, focusing on causal diagrams, Markov factorization, and structural equation models.

Causal Diagrams, Causal Inference, Markov Factorization, statistics, Structural Equation Models

Ferenc Huszár

5/2/2021 • EN

Generalisability, prediction, and causation

Explores the distinction between using regression models for causal inference versus predictive inference, and the role of generalizability in prediction.

Causal Inference, Data Science, Machine Learning, Predictive Modeling, statistics

Thomas Lumley

2/11/2021 • EN

Co-linearity

A statistical analysis of multicollinearity in regression models, discussing its impact on coefficient interpretation and prediction.

data analysis, Modeling, Multicollinearity, Regression, statistics

Thomas Lumley

1/5/2021 • EN

Inference and Prediction Part 2: Statistics

Explores the connection between machine learning and statistics by building a statistical inference model from a neural network example.

Inference, Machine Learning, Neural Network, Perceptron, statistics

Will Kurt

12/15/2020 • EN

Inference and Prediction Part 1: Machine Learning

Explores the difference between inference and prediction in data modeling, using a click-through rate (CTR) example to contrast machine learning and statistics.

Data Modeling, Inference, Machine Learning, Prediction, statistics

Will Kurt

11/5/2020 • EN

Neyman Allocation, only exact

Explains Neyman allocation for optimal stratified sampling and its exact integer solution, linking it to US Electoral College apportionment.

Allocation, Integer Programming, optimization, sampling, statistics

Thomas Lumley

9/20/2020 • EN

Simple Anomaly Detection Using Plain SQL

A guide to implementing a simple anomaly detection system using only SQL and basic statistics, aimed at developers.

Anomaly Detection, data analysis, sql, statistics, Z Score

Haki Benita

8/4/2020 • EN

Weights in statistics

Explains the three main types of statistical weights (precision, frequency, sampling), their uses, and the software documentation challenges they create.

data analysis, Software Documentation, statistics, Survey Sampling, Weighted Least Squares

Thomas Lumley