Fast thinking on lichess.org
Analyzing chess game data from lichess.org to determine if fast thinking is the dominant factor in game outcomes across different time controls.
Discusses the practical choices in setting up asymptotic models for statistics, using examples from clinical trials and big data.
A data scientist uses NOAA data and statistical modeling to test whether December temperatures in New Jersey are actually rising over time.
An introduction to Fisher Information, a statistical concept that quantifies how much information data samples contain about unknown distribution parameters.
Explores the logit-normal distribution, its mathematical properties, and its surprising role in statistical models like logistic regression.
Explores the mathematical and data science challenges of analyzing ordinal data, including tradeoffs in interpreting ordered scales and model limitations.
A tutorial using Euler/Venn diagrams to visualize and explain the R² statistic and variance in regression models.
A critique of publishing code as images in academic papers, highlighting errors and reproducibility issues in statistical computing examples.
A mathematical critique of additive scoring in grading and grant reviews, arguing for non-additive monotone functions.
Explores the relationship between causal and statistical models, focusing on causal diagrams, Markov factorization, and structural equation models.
Explores the distinction between using regression models for causal inference versus predictive inference, and the role of generalizability in prediction.
A statistical analysis of multicollinearity in regression models, discussing its impact on coefficient interpretation and prediction.
Explores the connection between machine learning and statistics by building a statistical inference model from a neural network example.
Explores the difference between inference and prediction in data modeling, using a click-through rate (CTR) example to contrast machine learning and statistics.
Explains Neyman allocation for optimal stratified sampling and its exact integer solution, linking it to US Electoral College apportionment.
A guide to implementing a simple anomaly detection system using only SQL and basic statistics, aimed at developers.
Explains the three main types of statistical weights (precision, frequency, sampling), their uses, and the software documentation challenges they create.
Overview of new features in version 4.0 of the R survey package, focusing on improved contrast estimation and replicate handling.
Explores the statistical challenges and potential bias when adjusting stratification variables during multi-wave sampling for population estimation.
A tutorial on Probability and Statistics concepts, from basics to generalized linear models, presented at PyData NYC with Python examples.