CMPT 733: Big Data Programming II (SFU, Spring 2022)


In CMPT 726 and CMPT 732, students learned machine learning algorithms and big data programming tools. When facing a real-world data problem, however, they will find a gap between what they learned in class and what they need to do in practice. The goal of this course is to close that gap so that students can apply what they have learned to real-world problems. To that end, the course covers a set of important topics that a data scientist should know and teaches state-of-the-art approaches to them. After taking this course, students should feel confident when asked to extract value from real-world data sets, and should know how to ask interesting questions about data, choose proper tools, design data-processing pipelines, and present final data products.




Final Project

Blog Post


| Week | Date | Event Type | Description | Course Materials |
|---|---|---|---|---|
| Week 1 | Monday, Jan 10 | Lecture 1 | Course Introduction: What/Why Data Science?; Data Science Lifecycle; Questions that data scientists can answer; Course Logistics | |
| | Jan 17 | A1 Due | Assignment #1 Due | [Assignment #1 (Part 1)] [Assignment #1 (Part 2)] |
| Week 2 | Monday, Jan 17 | Lecture 2 | Data Preparation: Data Collection; Data Transformation; Data Cleaning; Data Integration | |
| | Jan 24 | A2 Due | Assignment #2 Due | [Assignment #2 (Part 1)] [Assignment #2 (Part 2)] |
| Week 3 | Monday, Jan 24 | Lecture 3 | Data Visualization (Part I): Introduction to Visualization; Exploratory Data Analysis; Visualization Principles | |
| | Jan 31 | A3 Due | Assignment #3 Due | [Assignment #3] |
| Week 4 | Monday, Jan 31 | Lecture 4 | Statistics (Part I): Statistical Thinking; EDA with DataPrep.EDA; Correlation Analysis; Estimation and Bootstrapping | |
| | Feb 7 | A4 Due | Assignment #4 Due | [Assignment #4] |
| | Feb 11 | Blog Post Due | Blog Post Due | [Blog Post Task] |
| Week 5 | Monday, Feb 7 | Lecture 5 | Practical Machine Learning (Part I): Case Study: Anomaly Detection; ML Workflow; Feature Transformation and Selection; Hyperparameter Tuning | |
| | Feb 15 | A5 Due | Assignment #5 Due | [Assignment #5] |
| Week 6 | Monday, Feb 14 | Lecture 6 | Deep Learning (Part I) | [slides] |
| | Feb 15 | Proposal Due | Course Project Proposal Due | |
| | Feb 21 | A6 Due | Assignment #6 Due | [Assignment #6] |
| Week 7 | Feb 21 - 27 | Reading Break | Suggestion: Start project work (work with data sources; start prototyping; have a team meeting) | |
| Week 8 | Monday, Feb 28 | Lecture 7 | Data Visualization (Part II): Design Principles with Case studies; Graph Drawing | [slides 1][slides 2] |
| | Mar 7 | A7 Due | Assignment #7 Due | [Assignment #7]; [Graph tutorial] and [notebook] |
| Week 9 | Monday, Mar 7 | Lecture 8 | Practical Machine Learning (Part II): Automated Machine Learning (AutoML); Explainable Machine Learning | [slides 1][slides 2] |
| | Mar 10 | Milestone Presentation | Milestone Presentation | |
| | Mar 14 | A8 Due | Assignment #8 Due | [Assignment #8 (Part 1)] [Assignment #8 (Part 2)] |
| Week 10 | Monday, Mar 14 | Lecture 9 | Statistics (Part II): Hypothesis Testing; Causal Inference | [slides 1][slides 2] |
| | Mar 21 | A9 Due | Assignment #9 Due | [Assignment #9 (Part 1)] [Assignment #9 (Part 2)] |
| Week 11 | Monday, Mar 21 | Lecture 10 | Deep Learning (Part II); Natural Language Processing; Cloud Computing & AWS | [slides DL II] [slides NLP] [NLP demo] |
| | Mar 31 | A10 Due | Assignment #10 Due | [Assignment #10] |
| Week 12 | Monday, Mar 28 | Lecture 11 | Presto and openLooKeng; Responsible Data Science | [slides 1][slides 2] |
| | Apr 4 | A11 Due | Assignment #11 Due | [Assignment #11 (Part 1)] [Assignment #11 (Part 2)] |
| Week 13 | Monday, Apr 11 | Final Project Presentation & Code | Course Project Presentation | |
| | Apr 14 | Report & Video Due | Course Project Report Due | |

Final Project Showcase



  © Jiannan Wang 2022