Big Data articles

3/5/2018 • EN

Faster generalised linear models in largeish data

A method for faster generalized linear models on large datasets using a single database query and one Newton-Raphson iteration.

Big Data Generalized Linear Models optimization R Statistical Computing

Thomas Lumley

2/1/2018 • EN

Getting Started With OmniSci, Part 1: Docker Install and Loading Data

A tutorial on installing OmniSci (formerly MapD) using Docker and loading data for GPU-accelerated SQL analytics and visualization.

Big Data data visualization docker GPU Computing SQL Analytics

Randy Zwitch

11/21/2017 • EN

Installing Oracle GoldenGate for Big Data 12.3.1 with Kafka Connect and Confluent Platform

A technical guide on installing and configuring Oracle GoldenGate for Big Data with Kafka Connect and Confluent Platform.

Apache Kafka Big Data Confluent Platform Data Integration Oracle GoldenGate

Robin Moffatt

9/20/2017 • EN

Apache Kafka™ talks at Oracle OpenWorld, JavaOne, and Oak Table World 2017

A list of 19 Apache Kafka-related technical sessions at Oracle OpenWorld, JavaOne, and Oak Table World 2017 conferences.

Apache Kafka Big Data Data Pipeline Microservices Stream Processing

Robin Moffatt

10/17/2016 • EN

Is Privacy Dead?

A personal reflection on the trade-offs between convenience and privacy in an era of AI, IoT, and pervasive data collection.

artificial intelligence Big Data cybersecurity Internet Of Things privacy

Carlos Mendible

5/20/2016 • EN

Better Python compressed persistence in joblib

Explains improvements in joblib's compressed persistence for Python, focusing on reduced memory usage and single-file storage for large numpy arrays.

Big Data compression Joblib Persistence Python

Gael Varoquaux

2/7/2016 • EN

Sentiment analysis of tweets

Technical guide on building a real-time Twitter sentiment analysis system using Apache Kafka and Storm.

Apache Kafka Apache Storm Big Data Real Time Processing Sentiment Analysis

Marçal Serrate

12/13/2015 • EN

Big Data: streams and lambdas

Explains Lambda Architecture for Big Data, combining batch processing (Hadoop) and real-time stream processing (Spark, Storm) to handle large datasets.

Batch Processing Big Data Hadoop Lambda Architecture Stream Processing

Marçal Serrate

10/25/2015 • EN

Building Data Pipelines with Microsoft Azure Data Factory

A tutorial on building data pipelines using Microsoft Azure Data Factory, covering ingestion, transformation, and orchestration.

Azure Data Factory Big Data cloud computing Data Pipelines ETL

Rahul Rai

8/22/2014 • EN

Hacking Academia: Data Science and the University

A reflection on the challenges of data science in academia, discussing the 'brain drain' of data skills and the need for systemic change.

Academia Big Data conference Data Science Research

Jake VanderPlas

6/12/2014 • EN

Five Hard-Won Lessons Using Hive

A data engineer shares five practical lessons and performance tips for working with Apache Hive, focusing on common pitfalls and optimizations.

Big Data Data Engineering Hadoop Hive sql

Randy Zwitch

5/2/2014 • EN

MongoDB Connector for Hadoop with Authentication - Quick Tip

Fixing MongoDB Connector for Hadoop authentication errors by granting the clusterManager role to the user.

authentication Big Data Connector Hadoop mongodb

Paul Done

2/25/2014 • EN

Not Your Father's Cloud: Microsoft Azure HDInsights Explained

An explanation of Microsoft Azure HDInsight, a managed Apache Hadoop service for processing big data on Azure.

Azure Big Data cloud computing data analysis Hadoop

Simon Waight

1/12/2014 • EN

Getting Started With Hadoop, Final: Analysis Using Hive & Pig

Final tutorial on analyzing airline data with Hadoop using Hive for SQL queries and Pig for scripting, covering setup and basic analytics.

Big Data data analysis Hadoop Hive Pig

Randy Zwitch

10/26/2013 • EN

The Big Data Brain Drain: Why Science is in Trouble

Explores how the demand for big data skills in industry is draining talent from academic science, threatening research.

Academia Big Data Data Science Industry Research

Jake VanderPlas

8/22/2013 • EN

Getting Started Using Hadoop, Part 4: Creating Tables With Hive

A tutorial on using Apache Hive to create tables and views from data loaded into a Hadoop cluster, continuing a multi-part series.

Big Data data processing Hadoop Hive sql

Randy Zwitch

7/8/2013 • EN

Big data linear models

Explains how to parallelize QR decomposition for linear models on big data using R's biglm package and incremental merging.

Big Data Linear Models Parallel Computing QR Decomposition R

Thomas Lumley

4/18/2013 • EN

Getting Started Using Hadoop, Part 1: Intro

A practical guide introducing Hadoop's ecosystem and setting up a proof-of-concept cluster on Amazon EC2 using Cloudera for big data processing.

Amazon EC2 Big Data Cloudera data processing Hadoop

Randy Zwitch

4/8/2013 • EN

Instructions for Installing & Using R on Amazon EC2

A guide to installing and using R on Amazon EC2 instances to overcome in-memory limitations for big data analysis.

Amazon EC2 Big Data cloud computing data analysis R

Randy Zwitch

11/21/2012 • EN

DevNexus 2013 - Feb 18/19 - Registration is Open

Announcement for DevNexus 2013, a Java/JVM technology conference in Atlanta, featuring sessions on cloud, mobile, web, and more.

Big Data Cloud Java JVM NoSQL

Gunnar Hillert
