This course is designed for students who need to create applications that analyze Big Data stored in Apache Hadoop using Pig and Hive. Topics include Map-Reduce, HDFS, HBase, Hadoop, data ingestion, workflow definition, and using Pig and Hive to perform data analytics on Big Data. Labs are run on the Hortonworks Sandbox and in the cloud using AWS or Windows Azure. Students will attend multiple field trips to different local companies to see how Big Data is used in practice.
This course is an introduction to Big Data programming paradigms for senior undergraduate students. It consists of five lectures (two hours each) and five labs (two hours each), with quizzes and light midterm and final exams. First, students will explore Map-Reduce, the fundamental programming paradigm Hadoop supports for processing large data sets. Although standard Map-Reduce gives fine-grained control over how data is processed, even a small program can take hundreds of lines, which makes development slow, error-prone, and dependent on experienced programmers. Students will therefore learn Pig and Hive, high-level platforms designed by Yahoo and Facebook that wrap Map-Reduce programs on Hadoop to reduce development time and increase coding flexibility. Second, students will learn Hadoop's storage systems: HDFS (a distributed file system that spreads data across a cluster of machines while taking care of redundancy) and HBase (a column-oriented database). Finally, students will write small Hadoop programs (on the Hortonworks Sandbox and in the cloud using either Amazon Web Services (AWS) or Windows Azure) to analyze, refine, and visualize real data.
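To make the Map-Reduce paradigm concrete, here is a minimal single-machine sketch in plain Python (not actual Hadoop code, and the function names are illustrative): a mapper emits (key, value) pairs, a shuffle step groups values by key (Hadoop does this between the two phases), and a reducer aggregates each group. The classic example is word count.

```python
from collections import defaultdict

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in the input line.
    for word in line.lower().split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle/sort: group all values by key
    # (Hadoop performs this step between map and reduce).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reducer: sum the counts for one word.
    return (key, sum(values))

lines = ["big data needs big tools", "hadoop processes big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["big"])   # 3
print(counts["data"])  # 2
```

On a real cluster, the same job requires substantially more Java boilerplate, while in Pig Latin or HiveQL it shrinks to a few lines, which is exactly the motivation for covering those tools in this course.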
Upon completing this course, students will be able to design and implement Map-Reduce programs for a variety of large-data-set processing tasks using native Map-Reduce, Pig Latin scripts, or HiveQL, and will be able to design data schemas using HBase and HDFS. In addition, students will be able to use cloud computing platforms such as Azure and AWS.
Undergraduate students who need to understand and develop applications for Hadoop.
|First Day: Lecture (1) Time (2:00 pm - 4:00 pm) Room (228)||Introduction: What is Big Data? Definition, attributes, and characteristics of Big Data; sources of Big Data; why Big Data now, and why is it needed? Big Data projects that could impact your life; the challenges of processing Big Data; Apache Hadoop and Hadoop-related Apache projects; RDBMS vs. Hadoop; when to use (and when not to use) Hadoop; analyzing Big Data; and the home of the U.S. Government's open data.|
|Second Day: Lecture (2) Time (2:00 pm - 4:00 pm) Room (228)||
|Third Day: Lecture (3) Time (2:00 pm - 4:00 pm) Room (228)||
|Fourth Day: Lecture (4) Time (2:00 pm - 4:00 pm) Room (228)||
|Fifth Day: Lecture (5) Time (2:00 pm - 4:00 pm) Room (228)||
|Sixth Day: Lab (1) Time (2:00 pm - 4:00 pm) Lab (119)||
|Seventh Day: Lab (2) Time (2:00 pm - 4:00 pm) Lab (119)||
|Eighth Day: Lab (3) Time (2:00 pm - 4:00 pm) Lab (119)||
|Ninth Day: Lab (4) Time (2:00 pm - 4:00 pm) Lab (119)||
|Tenth Day: Lab (5) Time (2:00 pm - 4:00 pm) Lab (119)||
The course lectures and labs will be posted online on the Schoology website or KSU Blackboard. All students are therefore required to have accounts on these sites prior to the first class.
Students should be familiar with programming principles and have a basic understanding of data structures and databases. SQL knowledge is also helpful. No prior Hadoop knowledge is required.
Name: Salem Othman
Plagiarism of any kind will not be tolerated. It will be dealt with in accordance with Kent State University's policy on cheating and plagiarism described in the student handbook.
Students will get to visit the following places once a week: