Statistics on pairs
Explores statistical estimation for complex samples, focusing on design-weighted U-statistics and their Hoeffding projections for pair-based analyses.
A data scientist's 2017 year in review, highlighting top R, Python, and data visualization resources and projects shared each month.
Explores methods for computing tail probabilities of linear combinations of chi-squared variables, focusing on applications in genetics with large datasets.
Explores Bayesian inference when data strongly contradicts prior expectations, analyzing how heavy-tailed priors and likelihoods affect posterior beliefs.
A technical article exploring tail probability bounds for sums of random variables under 'sparse correlation' conditions, extending concepts like Bernstein's Inequality.
A data scientist shares his career journey from psychology to Lazada, debunks common myths about the field, and offers practical advice for aspiring practitioners.
A technical discussion on asymptotic approximations in stratified sampling when sampling probabilities approach zero, relevant for rare disease studies.
A two-day workshop on survival analysis, covering data exploration, regression modeling, and practical sessions for time-to-event data.
Explores a potential 'Polymath' project on the Wilcoxon test's non-transitive behavior with dice, connecting math and statistics.
A statistician argues that advanced math like calculus isn't a strict prerequisite for learning statistics, using personal experience and examples.
Explores using machine learning algorithms to predict outcomes in the NCAA March Madness basketball tournament, analyzing data and modeling techniques.
Announcing a public lecture series honoring statistician Ross Ihaka, featuring talks on statistical computing, data visualization, and data journalism.
Explores statistical scenarios where the bootstrap resampling method fails to provide accurate variance estimates or confidence intervals.
Explores defining and computing design-based pseudo-R-squared statistics for logistic regression models under complex survey sampling, such as case-control designs.
Analyzing the Monty Hall problem, exploring learning strategies and optimal decisions based on observed game history and host behavior.
Critique of the classic iris dataset as a misleading example in modern machine learning education, exploring its original scientific purpose.
A data scientist shares a technical interview task on linear regression, covering data cleaning, model fitting, and assumption validation.
Explores computational challenges of large quadratic forms in genomics, focusing on eigenvalue approximations for high-dimensional statistical tests like SKAT.
Analyzing the relationship between age and desired job roles among new coders using the 2016 Kaggle survey data.
Using R code to generate permutations of digits (2,2,5,5,9,9), analyzing divisibility by 11 and primality.