Statistics articles

9/29/2023 • EN

Solve a medical mystery with a confusion matrix 🧪

A data science tutorial using a confusion matrix to calculate the real probability of having a disease after a positive diagnostic test result.

classification Confusion Matrix Data Science Diagnostic Testing statistics

Kevin Markham

8/15/2023 • EN

Manually generate predicted values for logistic regression with matrix multiplication in R

A guide to manually generating predicted values for logistic regression using matrix multiplication in R, as an alternative to the predict() function.

Logistic Regression Matrix Multiplication Predict R statistics

Andrew Heiss

8/12/2023 • EN

The ultimate practical guide to multilevel multinomial conjoint analysis with R

A technical guide to performing multilevel multinomial conjoint analysis using R, Bayesian modeling, and statistical packages.

Bayesian Conjoint Analysis Hierarchical Models R statistics

Andrew Heiss

7/30/2023 • EN

An optimal-stopping quant riddle

A detailed analysis of an optimal stopping problem involving drawing cards for reward, exploring mathematical strategies and first-principles reasoning.

algorithm Optimal Stopping Probability Quantitative Finance statistics

Emir U

5/5/2023 • EN

Pairwise likelihood and cluster sizes

A technical exploration of using pairwise likelihood in linear mixed models with complex sampling, comparing results from svylme and lme4 packages.

Linear Mixed Models Pairwise Likelihood R statistics Svylme

Thomas Lumley

4/18/2023 • EN

Ranks in survey data

Explores the challenges of applying signed rank tests to complex survey data and proposes a design-independent rank transformation method.

Complex Sampling Rank Tests statistics Survey Data Svymean

Thomas Lumley

3/7/2023 • EN

The fourth-root thing

A technical discussion on the 'fourth-root' condition for estimator consistency in statistical models like GEE, exploring asymptotic theory and nuisance parameters.

Asymptotic Theory Estimation Theory Generalized Estimating Equations Nuisance Parameters statistics

Thomas Lumley

3/4/2023 • EN

Linear Regression, the essential theory

Explains the core theory behind linear regression models, a fundamental machine learning algorithm for predicting continuous numerical values.

Linear Regression Machine Learning Model Interpretability statistics

Stern Semasuka

2/4/2023 • EN

The Magic of Sampling, and its Limitations

Explains statistical sampling using a Go program example to estimate population percentages, highlighting its power and practical limits in tech contexts.

go programming sampling simulation statistics

Russ Cox

1/21/2023 • EN

Metal bands bring happiness (as chocolate brings Nobel Prizes)

A humorous analysis exploring the correlation between the number of metal bands per capita and national happiness scores in European countries.

Correlation data visualization Metal Music statistics

Piotr Migdał

12/31/2022 • EN

Visualize mixed effect regressions in R with GGplot2

A tutorial on visualizing mixed effect regression models and their uncertainty using non-parametric bootstrapping in R with ggplot2.

data visualization Ggplot2 Mixed Models R statistics

Andis Ariet

12/9/2022 • EN

Pairwise and joint independence

Explains why pairwise independence of variables does not imply joint independence, using a chessboard as an intuitive counterexample.

Independence mathematics Probability Random Variables statistics

Thomas Lumley

12/1/2022 • EN

Understanding Convolutions in Probability: A Mad-Science Perspective

Explores convolutions in probability theory, explaining how they combine distributions and compute sums of random variables.

Convolutions mathematics Probability Random Variables statistics

Will Kurt

12/1/2022 • EN

The sandwich and the t-test

Explores the connection between the Welch-Satterthwaite t-test and linear regression using the sandwich variance estimator.

Linear Regression Sandwich Estimator statistics T Test Welch Test

Thomas Lumley

11/29/2022 • EN

Marginal and conditional effects for GLMMs with {marginaleffects}

A guide to calculating marginal and conditional effects in generalized linear mixed models (GLMMs) using the R {marginaleffects} package.

Glmm Marginaleffects R Regression statistics

Andrew Heiss

10/10/2022 • EN

Using Censored Data to Estimate a Normal Distribution

A statistical analysis of estimating a normal distribution using binary (yes/no) predictions from multiple scientists, applied to a temperature forecasting problem.

Bayesian Inference Data Modeling Normal Distribution Probability statistics

Will Kurt

8/21/2022 • EN

SQLite has pretty limited builtin functions

Discusses SQLite's limited set of built-in functions, compares it with other databases, and introduces a Go-based standard library extension.

Aggregate Functions go sqlite standard library statistics

Phil Eaton

5/20/2022 • EN

Marginalia: A guide to figuring out what the heck marginal effects, marginal slopes, average marginal effects, marginal effects at the mean, and all these other marginal things are

A guide explaining marginal effects in regression analysis, including definitions and differences between types like average marginal effects, using R packages.

data analysis Marginal Effects R Regression statistics

Andrew Heiss