Randy Zwitch 4/18/2013

Getting Started Using Hadoop, Part 1: Intro


This technical tutorial series introduces the Hadoop ecosystem and explains its value for parallel data processing. Part 1 covers core concepts and the rationale for using Hadoop over traditional tools for large-scale data, then outlines the upcoming steps: setting up a Hadoop cluster on Amazon EC2 with Cloudera, populating it with a sample airline dataset, and performing analytics using Hive and Pig.



