Course Content
- Introduction to Big Data Analytics
- Characteristics of Big Data and Dimensions of Scalability
- Data Science: Getting Value out of Big Data
- Foundations for Big Data Systems and Programming
- Big Data Platforms
- Data Store & Processing using Hadoop
- Big Data Storage and Analytics
- Big Data Analytics ML Algorithms: Recommendation, Clustering, and Classification
- Linked Big Data: Graph Computing and Graph Analytics
- Handling Streaming Data
- Graphical Models and Bayesian Networks
- Big Data Visualization
- Cognitive Mobile Analytics
- Introduction to SQL in Big Data and HiveQL
Objectives
- Understand the Big Data landscape, its challenges, and its technology stack
- Develop Big Data common sense to recognize when a specific technology is needed
- Gain practical experience by experimenting with relevant technologies
Info
Lecture and practice slots are held on Zoom:
- Wednesday 12:15 – 14:00 (log in to Courses to see the link)
- Friday 14:15 – 16:00 (log in to Courses to see the link)
Syllabus
- Introduction to Big Data
- Deployment Models
- Taming Data Volume with Apache Spark
- Taming Data Velocity with Spark Structured Streaming
- Taming Data Variety with GraphFrames
- Gaining Value with Spark MLlib
Grading
- 80% on mini projects (20% each: 15% for the deliverables, 5% for the presentation)
- 20% on the mid-term MCQ (there is a bonus grade; the best two are taken)
- 10% on labs (submit at least 50% of them)
NOTE: The lecturers reserve the right to call any student for an individual interview, which can affect the final grade.
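As a rough illustration of how the weights above combine, here is a minimal Python sketch. It is not the official grading formula. It assumes four mini projects (80% at 20% each), a single mid-term MCQ score, full lab credit once at least 50% of the labs are submitted, and a cap at 100 because the listed weights add up to 110. The function name `final_grade` and all of these readings are assumptions made for the example; the "best two" bonus rule is not modeled.

```python
# Hypothetical sketch of the grading scheme; every rule below that is not
# stated in the Grading list is an assumption made for illustration.

DELIVERABLE_WEIGHT = 15.0   # per mini project, as listed
PRESENTATION_WEIGHT = 5.0   # per mini project, as listed
MCQ_WEIGHT = 20.0           # mid-term MCQ, as listed
LABS_WEIGHT = 10.0          # labs, as listed
LAB_THRESHOLD = 0.5         # "submit at least 50%"


def final_grade(projects, mcq, labs_submitted_fraction):
    """Combine scores into a grade out of 100.

    projects: list of (deliverable, presentation) scores, each in [0, 1]
    mcq: mid-term MCQ score in [0, 1]
    labs_submitted_fraction: share of labs submitted, in [0, 1]
    """
    project_points = sum(
        DELIVERABLE_WEIGHT * d + PRESENTATION_WEIGHT * p for d, p in projects
    )
    mcq_points = MCQ_WEIGHT * mcq
    # Assumption: labs are all-or-nothing once the 50% threshold is met.
    labs_points = LABS_WEIGHT if labs_submitted_fraction >= LAB_THRESHOLD else 0.0
    # Assumption: the listed weights sum to 110, so the total is capped at 100.
    return min(100.0, project_points + mcq_points + labs_points)


if __name__ == "__main__":
    # Four mini projects with half marks on deliverables, full marks on presentations.
    print(final_grade([(0.5, 1.0)] * 4, mcq=0.5, labs_submitted_fraction=0.4))
```

Under these assumptions, a student who skips more than half of the labs forfeits only the lab component; the project and MCQ components are unaffected.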
Textbooks:
- Big Data: Principles and Best Practices of Scalable Real-Time Data Systems by Nathan Marz and James Warren, 2015.
- Big Data for Beginners: Understanding SMART Big Data, Data Mining & Data Analytics for Improved Business Performance, Life Decisions & More! by Vince Reynolds, 2016.
- Mining of Massive Datasets by A. Rajaraman, J. Leskovec, and J. D. Ullman, 1st Edition, 2011.
Reference Books:
- Hadoop for Dummies Applications by Dirk deRoos, Paul C. Zikopoulos, Roman B. Melnyk, Bruce Brown, and Rafael Coss, 1st Edition, 2014.
- Big Data and Analytics by Seema Acharya and Subhashini Chellappan, 2015.