Getting Started With OmniSci, Part 2: Electricity Dataset
A technical tutorial on using Python and pandas to process electricity data and load it into OmniSci (formerly MapD) for dashboard creation.
A summary of a two-day workshop introducing R programming, data processing, visualization, and spatial analysis for beginners in geography and GIS.
A technical tutorial on building a data product using Python, Markov chains, and a dataset of science questions to generate random quiz questions.
A technical guide on processing millions of small text files using GNU Parallel and stream processing, without needing Hadoop or a database.
A follow-up article demonstrating a third method for sessionizing log data using R's data.table and magrittr packages.
A technical deep-dive into building a tag engine similar to Stack Overflow's, covering data processing, memory usage, and performance.
A guide to using the Unix command-line for efficient data science workflows, including data processing, exploration, and modeling.
A technical guide on using SQLite and Python's sqlite3 module to efficiently manage and query large datasets, replacing slow text file processing.
A guide to using SQLite and Python's sqlite3 module to efficiently manage and query large datasets from text files.
A guide to seven essential command-line tools (jq, csvkit, Rio, etc.) for data scientists to obtain, scrub, explore, and model data.
A tutorial on using Apache Hive to create tables and views from data loaded into a Hadoop cluster, continuing a multi-part series.
A practical guide introducing Hadoop's ecosystem and setting up a proof-of-concept cluster on Amazon EC2 using Cloudera for big data processing.
An article on optimizing OBIEE performance by pushing data processing to the database layer instead of the application server.