DEV-BL-203 | HDP Developer: Apache Pig and Hive (Blended)

This course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Pig and Hive.

About this course

Subject-Matter Expert | Self-Paced | Live Micro-Lessons

Overview:
This course is designed for developers who need to create applications that analyze Big Data stored in Apache Hadoop using Pig and Hive. Topics include Hadoop, YARN, HDFS, MapReduce, data ingestion, workflow definition, and using Pig and Hive to perform data analytics on Big Data. Labs are executed on a 7-node HDP cluster. Hortonworks University instructors also deliver regularly scheduled live micro-learning sessions that discuss course-related topics and supplement the self-paced content.

Target Audience:
Software developers who need to understand and develop applications for Hadoop.

Prerequisites:
Students should be familiar with programming principles and have experience in software development. SQL knowledge is also helpful. No prior Hadoop knowledge is required.

Format:
Slide Lessons and PDF Lab Guides (This course does not contain audio or video)
Downloadable VM for Lab Exercises
Regularly Scheduled Live Micro-learning Sessions (Check the live session schedule for upcoming dates)

Credit Hours: 4

Curriculum

  • Upcoming Live Micro-Learning Sessions
  • Blended Learning: Pig and Hive | Live Micro-Learning Session #2
  • Recorded Micro-Learning Sessions
  • Micro-Session #1 (02/20/2018)
  • Micro-Session #1 Notes (02/20/2018)
  • Course Documents
  • HDP Developer - Apache Pig and Hive-Student Guide-Rev 6.1.pdf
  • HDP Developer - AWS Guacamole Setup Guide-Rev 6.1.pdf
  • HDP_Developer_-_Apache_Pig_and_Hive-Downloadable_Setup_Guide-Rev_6.1.pdf
  • Apache Pig and Hive Course Slides-Rev 6.1.pdf
  • Lesson 1
  • Lesson 1: Understanding Hadoop
  • Lesson Review
  • Lab Guide: Starting an HDP 2.3 Cluster
  • Lesson 2
  • Lesson 2: Introduction to the Hadoop Distributed File System (HDFS)
  • Lesson Review
  • Demonstration: Understanding Block Storage
  • Lab Guide: Using HDFS Commands
  • Lesson 3
  • Lesson 3: Inputting Data Into HDFS
  • Lesson Review
  • Lab Guide: Importing RDBMS Data into HDFS
  • Lab Guide: Exporting HDFS Data to an RDBMS
  • Lab Guide: Importing Log Data into HDFS using Flume
  • Lesson 4
  • Lesson 4: The MapReduce Framework
  • Lesson Review
  • Demonstration: Understanding MapReduce
  • Lab Guide: Running a MapReduce Job
  • Lesson 5
  • Lesson 5: Introduction to Pig
  • Lesson Review
  • Demonstration: Understanding Pig
  • Lab Guide: Getting Started with Pig
  • Lab Guide: Exploring Data with Pig
  • Lesson 6
  • Lesson 6: Advanced Pig Programming
  • Lesson Review
  • Lab Guide: Splitting a Dataset
  • Lab Guide: Joining Datasets with Pig
  • Lab Guide: Preparing Data for Hive
  • Demonstration Guide: Computing PageRank
  • Lab Guide: Analyzing Clickstream Data
  • Lab Guide: Analyzing Stock Market Data using Quantiles
  • Lesson 7
  • Lesson 7: Hive Programming
  • Lesson Review
  • Lab Guide: Understanding Hive Tables
  • Demonstration: Understanding Partitions and Skew
  • Lab Guide: Analyzing Big Data with Hive
  • Demonstration: Computing ngrams
  • Lab Guide: Joining Datasets in Hive
  • Lab Guide: Computing ngrams of Emails in Avro Format
  • Lesson 8
  • Lesson 8: Using HCatalog
  • Lesson Review
  • Lab Guide: Using HCatalog with Pig
  • Lesson 9
  • Lesson 9: Advanced Hive Programming
  • Lesson Review
  • Lab Guide: Advanced Hive Programming
  • Lesson 10
  • Lesson 10: Hadoop 2 and YARN
  • Lesson Review
  • Lab Guide: Running a YARN Application
  • Lesson 11
  • Lesson 11: Introducing Apache Spark
  • Lesson Review
  • Lesson 12
  • Lesson 12: Programming with Apache Spark
  • Lesson Review
  • Lab Guide: Getting Started with Apache Spark
  • Lesson 13
  • Lesson 13: Spark SQL and DataFrames
  • Lab Guide: Exploring Spark SQL
  • Lesson 14
  • Lesson 14: Defining Workflow with Oozie
  • Lesson Review
  • Lab Guide: Defining an Oozie Workflow
  • Wrapping Up
  • Course Survey
