Wednesday, April 15, 2015

Hadoop Terminology

  • Hadoop Ecosystem: The Hadoop core projects (HDFS, MapReduce) plus other open source projects (Hive, Pig, Impala, Sqoop, etc.) that run on top of Hadoop to help with data analysis.
  • Pig: A high-level language for analyzing large data sets; internally it uses MapReduce (MR).
  • Hive: Offers a SQL-like language on top of MR.
  • Impala: Developed as a way to query data in Hadoop with SQL without using MR, with lower latency than Hive.
  • Sqoop: Migrates data from an RDBMS into Hadoop.
  • Flume: Moves data from external sources (logs, etc.) into Hadoop.
  • HBase: A real-time database built on HDFS.
  • Hue: A GUI front end to the cluster.
  • Oozie: Workflow management for Hadoop jobs.
  • Mahout: A machine learning library.
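
Since several of the tools above (Pig, Hive) run on MapReduce, it may help to see the model in miniature. The following is only a rough sketch in plain Python, not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and everything runs in a single process instead of across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["pig hive pig", "hive impala"]))
```

In a real cluster the map and reduce steps would run in parallel on many nodes, and the shuffle would move data between them over the network; this toy version only shows the shape of the computation.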