What I Do Before a Data Science Project to Ensure Success
A data scientist shares three essential pre-project tasks—the one-pager, time-box, and breakdown—to avoid common pitfalls and ensure project success.
Overview of new features in version 4.0 of the R survey package, focusing on improved contrast estimation and replicate handling.
Explains how to use Monte Carlo analysis for product development, using TweetDeck screen capacity as a practical example.
A review of the best #TidyTuesday data visualization submissions from 2019, highlighting creative and insightful uses of R and ggplot2.
A guide on using PowerShell and a matrix/spreadsheet approach to visualize and audit Active Directory group memberships for IT administration.
Tips for using Google BigQuery's public datasets while managing and minimizing query costs, including using the free tier and setting budgets.
A guide to common SQL mistakes and optimization opportunities for developers and data professionals, covering integer division, UNION vs UNION ALL, and query performance.
Compares the runtime performance of pandas' crosstab, groupby, and pivot_table methods for data aggregation.
A statistical re-analysis of a published study on the mouse microbiome and autism, examining data and p-values from behavioral experiments.
A statistical analysis discussing the limitations of confidence intervals, using examples from small-area sampling to illustrate their weak properties.
A technical walkthrough of creating a word cloud visualization from highly-gilded Reddit comments using Python, spaCy, and BigQuery.
A data scientist clarifies common misconceptions about the field, explaining that machine learning is only a small part of the job and advanced degrees aren't always required.
An analysis of user-created Sankey diagrams from Reddit, visualizing personal Tinder match data and dating outcomes.
A tutorial on creating line graphs in R using the ggplot2 package's geom_line function, with examples using the built-in Orange dataset.
Blog author offers free 45-minute one-on-one R training sessions to 10 people, focusing on data analysis, visualization, and package development.
A developer explores investigative journalism, drawing parallels between source control diffs and uncovering truth in legal documents and online comments.
A technical analysis of bus punctuality using Auckland Transport API data, with R code for data processing and visualization.
An experiment testing whether Overwatch players with feminine usernames receive different in-game chat comments than players with masculine usernames.
An article arguing that SQL is one of the most valuable and enduring technical skills across various roles like engineering and product management.
Analysis of JSHeroes 2019 conference CFP data, revealing submission patterns and workshop details for the JavaScript event.