Getting Started: Adobe Analytics Clickstream Data Feed
A technical guide to analyzing the Adobe Analytics Clickstream Data Feed in R, covering file structure, data verification, and initial processing.
Analyzes the historical and technical reasons behind R's controversial 'stringsAsFactors' default, explaining its origins and the problems it causes.
RSiteCatalyst v1.4.4 release notes detail a major bug fix for sparse data errors and minor updates to authentication messaging.
A critique of using Shapiro-Wilk normality tests on large, complex survey data such as NHANES, explaining why doing so is statistically inappropriate.
Explains how to use SQL window functions and percentiles in Postgres for more meaningful data analysis than simple averages.
A guide to getting started with Structural Equation Modeling (SEM) in R using the Lavaan package, based on a user group presentation.
Interview with data scientist Jeroen Janssens about his background, work on data science at the command line, and his Data Science Toolbox project.
A guide to visualizing and diagnosing Generalized Linear Mixed Models (GLMMs) in R, based on a presentation and blog post by Jaime Ashander.
The article debunks common misinterpretations of the Dunning-Kruger effect by analyzing the original study's data and findings.
A tutorial explaining the internals of Principal Component Analysis (PCA) for dimensionality reduction in machine learning and data analysis.
A technical tutorial on sessionizing log data using the dplyr package in R, comparing it to a previous SQL-based approach.
Release notes for RSiteCatalyst v1.4.1, detailing bug fixes and new API functions for Adobe Analytics reporting in R.
A technical guide to Dixon's Q test for identifying outliers in small datasets, including its method, application, and criticisms.
A follow-up analysis of U.S. federal .gov domains, tracking changes in technology, security, and accessibility over three years.
A Python tutorial covering essential tools and techniques for machine learning, including data visualization, PCA, LDA, and classification.
A tutorial on using Python tools for machine learning, covering data loading, visualization, preprocessing, and classification with scikit-learn.
A practical guide to implementing Bayesian analysis in Python using MCMC packages like emcee, PyMC, and PyStan, with a line-fitting example.
A data scientist analyzes Seattle's bicycle counter data using Python to determine if cycling is truly increasing or just affected by good weather.
A critique of a misleading report claiming there is no gender pay gap in tech, using evidence from the AAUW study to refute the claim.
A technical guide on using SQL window functions, specifically LAG, to calculate month-over-month revenue growth percentages for SaaS or recurring billing analysis.