Hadoop

Hadoop Training in Hyderabad

Course Content

 Introduction to Hadoop

  • What is Big Data?
  • Why is Data so Important?
  • Common problems
  • Defining Big Data
  • Exploring Data Problems
  • Scaling Up vs. Scaling Out
  • 3Vs of Big Data
  • What is Hadoop?
  • Hadoop Introduction
  • History of Hadoop and its Uses
  • Different Components of Hadoop
  • Various Hadoop Distributions

 HDFS (Hadoop Distributed File System)

  • Significance of HDFS in Hadoop
  • HDFS Features
  • Daemons of Hadoop and functionalities
  • NameNode
  • DataNode
  • JobTracker
  • TaskTracker
  • Secondary NameNode

Data Storage in HDFS

  • Blocks
  • Heartbeats
  • Data Replication
  • HDFS Yarn
  • High Availability
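The storage topics above (blocks and replication) come down to simple arithmetic. The sketch below is illustrative only, using the common defaults of a 128 MB block size and a replication factor of 3; actual values depend on the cluster's `dfs.blocksize` and `dfs.replication` settings.

```python
# Illustrative sketch: how HDFS splits a file into fixed-size blocks and how
# replication multiplies the raw storage used across the cluster.
# 128 MB / replication 3 are common defaults, not guaranteed values.
import math

BLOCK_SIZE_MB = 128   # typical dfs.blocksize default
REPLICATION = 3       # typical dfs.replication default

def hdfs_storage(file_size_mb: int) -> tuple:
    """Return (number of blocks, total raw MB stored across the cluster)."""
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    raw_mb = file_size_mb * REPLICATION
    return blocks, raw_mb

blocks, raw = hdfs_storage(500)   # a 500 MB file
print(blocks, raw)                # 4 blocks, 1500 MB of raw storage
```

Note that the last block of a file is usually smaller than the block size; HDFS only consumes the actual bytes written, not a full block per fragment.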

Accessing HDFS

  • CLI (Command Line Interface): Unix and Hadoop Commands
  • Java Based Approach

Data Flow

  • Anatomy of a File Read
  • Anatomy of a File Write

Hadoop Archives

  • Setup Single Node Hadoop cluster

 MapReduce

  • Introduction to MapReduce
  • MapReduce Architecture
  • MapReduce Programming Model
  • MapReduce Algorithm and Phases
  • Data Types
  • Input Splits and Records
  • Blocks vs. Splits

Basic MapReduce Program

  • Driver Code
  • Mapper Code
  • Reducer Code
  • Combiner and Shuffle
  • Creating Input and Output formats in MapReduce Jobs
  • How to Debug MapReduce Jobs Locally
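The Driver, Mapper, and Reducer pieces listed above are Java classes in a real Hadoop job; the following plain-Python word-count sketch only simulates the data flow through the map, shuffle/sort, and reduce phases so their roles are visible before writing the Java versions.

```python
# Minimal simulation of the MapReduce phases (map -> shuffle/sort -> reduce)
# for word count. In a real job these would be Java Mapper/Reducer classes
# submitted by a Driver; this only illustrates the data flow.
from collections import defaultdict

def mapper(line):
    # Emit a (word, 1) pair for every word -- the Mapper's key/value output.
    for word in line.split():
        yield word.lower(), 1

def shuffle(mapped):
    # Group values by key, as the framework's shuffle/sort phase does.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # Sum all counts for one key -- the Reducer's job.
    return key, sum(values)

lines = ["Hadoop is fast", "Hadoop is scalable"]
mapped = [kv for line in lines for kv in mapper(line)]
counts = dict(reducer(k, v) for k, v in shuffle(mapped).items())
print(counts)   # {'hadoop': 2, 'is': 2, 'fast': 1, 'scalable': 1}
```

A Combiner would run the same summing logic on each mapper's local output before the shuffle, cutting the data moved across the network.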

YARN and MR2

  • YARN
  • HDFS High Availability
  • NameNode Federation
  • ResourceManager (RM) and NodeManager (NM)
  • Setup Single Node Hadoop cluster
  • Setup Multi-Node Hadoop cluster

Pig

  • Introduction to Apache Pig
  • MapReduce vs. Apache Pig
  • SQL vs. Apache Pig
  • Different Data types in Apache Pig
  • Modes of Execution in Apache Pig
  • Local Mode
  • Distributed Mode
  • Execution Mechanism
  • Grunt shell
  • Script

Data Processing Operators

  • Loading and Storing Data
  • Filtering Data
  • Grouping and Joining Data
  • Sorting Data
  • Combining and Splitting Data
  • How to Write a Simple Pig Script
  • UDFs in Pig

 Sqoop

  • Introduction to Sqoop
  • Sqoop Architecture and Internals
  • MySQL Client and Server Installation
  • How to Connect to a Relational Database Using Sqoop
  • Sqoop Commands
  • Different flavors of imports
  • Export
  • Hive Imports

Hive

  • The Metastore
  • Comparison with Traditional Databases
  • Schema on Read Versus Schema on Write
  • Updates, Transactions, and Indexes

HiveQL

  • Data Types
  • Operators and Functions

Tables

  • Managed Tables and External Tables
  • Partitions and Buckets
  • Storage Formats
  • Importing Data
  • Altering Tables
  • Dropping Tables
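Partitions and buckets, listed above, determine how Hive lays table data out on HDFS: each partition value becomes a directory, and a bucket is chosen by hashing the clustered-by column modulo the bucket count. The sketch below is a simplified model with invented paths and column names; Hive's actual hash function and file naming differ.

```python
# Simplified model of Hive's partitioned, bucketed table layout.
# A partition is a directory (e.g. dt=2024-01-01/) and a row's bucket is
# hash(cluster_key) % num_buckets. Paths and names here are illustrative.
NUM_BUCKETS = 4

def bucket_for(user_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    # Hive picks the bucket from a hash of the CLUSTERED BY column.
    return hash(user_id) % num_buckets

def partition_path(table: str, dt: str, bucket: int) -> str:
    # One directory per partition value, one file per bucket inside it.
    return f"/warehouse/{table}/dt={dt}/bucket_{bucket:05d}"

b = bucket_for(42)
print(partition_path("events", "2024-01-01", b))
```

This layout is why partition columns appear in the directory structure rather than in the data files, and why bucketing enables efficient sampling and bucketed map-side joins.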

Querying Data

  • Sorting and Aggregating
  • MapReduce Scripts
  • Joins
  • Subqueries
  • Views

User-Defined Functions

  • Writing a UDF
  • Writing a UDAF

 HBase

  • Introduction to HBase
  • HBase vs. HDFS
  • Use Cases

Basic Concepts

  • Column families
  • Scans
  • HBase Architecture

Clients

  • REST
  • Thrift
  • Java Based
  • Avro
  • Schema definition
  • Basic CRUD Operations
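The CRUD operations above follow the same Get/Put/Delete shape in every HBase client (Java API, REST, Thrift). As a toy model only, the class below mimics HBase's data model in memory: a row key maps to column families, each holding column-to-value cells. Class and method names are invented for illustration.

```python
# Toy in-memory model of HBase's data model and basic CRUD operations.
# Real clients add versions/timestamps, scans, and a cluster connection;
# this only shows the row key -> column family -> column -> value shape.
class ToyHTable:
    def __init__(self, families):
        # Column families are fixed at table-creation time, as in HBase.
        self.families = set(families)
        self.rows = {}

    def put(self, row, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row, {}).setdefault(family, {})[column] = value

    def get(self, row, family, column):
        return self.rows.get(row, {}).get(family, {}).get(column)

    def delete(self, row):
        self.rows.pop(row, None)

t = ToyHTable(families=["info", "stats"])
t.put("user1", "info", "name", "Ada")
print(t.get("user1", "info", "name"))   # Ada
t.delete("user1")
print(t.get("user1", "info", "name"))   # None
```

Note how columns inside a family are created on the fly by `put` while families themselves are fixed up front: that is the schema-definition split the course covers.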

Introduction to Spark

Module 1 – Introduction to Spark: Getting Started

  • What is Spark and what is its purpose?
  • Components of the Spark unified stack
  • Resilient Distributed Dataset (RDD)
  • Downloading and installing Spark standalone
  • Scala and Python overview
  • Launching and using Spark’s Scala and Python shell

Module 2 – Resilient Distributed Datasets and DataFrames

  • Understand how to create parallelized collections and external datasets
  • Work with Resilient Distributed Dataset (RDD) operations
  • Utilize shared variables and key-value pairs
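The RDD operations above (transformations like map, filter, and reduceByKey) can be sketched eagerly in plain Python on ordinary lists. This shows only their semantics, not Spark's laziness, partitioning, or distribution; the data is invented.

```python
# Plain-Python sketch of common RDD operations, run eagerly on lists.
# In Spark these would be lazy transformations on a distributed dataset.
from collections import defaultdict

data = [1, 2, 3, 4, 5]

# Equivalent of rdd.map(lambda x: x * 2)
doubled = [x * 2 for x in data]

# Equivalent of rdd.filter(lambda x: x > 4)
big = [x for x in doubled if x > 4]

# Equivalent of pairs.reduceByKey(lambda a, b: a + b) on key-value pairs
pairs = [("a", 1), ("b", 2), ("a", 3)]
sums = defaultdict(int)
for k, v in pairs:
    sums[k] += v

print(big)          # [6, 8, 10]
print(dict(sums))   # {'a': 4, 'b': 2}
```

In Spark, none of these would execute until an action such as collect() or count() is called, which is the key difference from the eager lists above.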

Module 3 – Spark application programming

  • Understand the purpose and usage of the SparkContext
  • Initialize Spark with the various programming languages
  • Describe and run some Spark examples
  • Pass functions to Spark
  • Create and run a Spark standalone application
  • Submit applications to the cluster

Module 4 – Introduction to Spark libraries

  • Understand and use the various Spark libraries

Module 5 – Spark configuration, monitoring and tuning

  • Understand components of the Spark cluster
  • Configure Spark by modifying Spark properties and environment variables
  • Monitor Spark using the web UIs, metrics, and external instrumentation
  • Understand performance tuning considerations

 

Working with Flume

  • Introduction
  • Configuration and Setup
  • Flume Sink with Example
  • Flume Source with Example

Introduction to Oozie

  • Mock Interviews

 

We are ready to start the course at a time convenient for the trainee.

To enroll or enquire about any course, please fill in the details below.