CMPT 733: Big Data Programming II (SFU, Spring 2021)


In CMPT 726 and CMPT 732, students learned machine learning algorithms and big data programming tools. When facing a real-world data problem, however, they will find that a gap remains between what they learned in class and what they need to do in practice. The goal of this course is to close that gap and enable students to apply what they have learned to real-world problems. To that end, the course covers a set of important topics that a data scientist should know and teaches state-of-the-art approaches. After taking this course, students should feel confident when asked to extract value from real-world data sets, and should know how to ask interesting questions about data, choose proper tools, design data-processing pipelines, and present final data products.






| Week | Date | Event Type | Description | Course Materials |
| --- | --- | --- | --- | --- |
| Week 1 | Monday, Jan 11 | Lecture 1 | Course Introduction: What/Why Data Science?; Data Science Lifecycle; Questions that data scientists can answer; Course Logistics | |
| | Jan 18 | A1 Due | Assignment #1 Due | [Assignment #1 (Part 1)] [Assignment #1 (Part 2)] |
| Week 2 | Monday, Jan 18 | Lecture 2 | Data Preparation: Data Collection; Data Transformation; Data Cleaning; Data Integration | |
| | Jan 25 | A2 Due | Assignment #2 Due | [Assignment #2 (Part 1)] [Assignment #2 (Part 2)] |
| Week 3 | Monday, Jan 25 | Lecture 3 | Data Visualization (Part I): Introduction to Visualization; Exploratory Data Analysis; Visualization Principles | |
| | Feb 1 | A3 Due | Assignment #3 Due | [Assignment #3] |
| Week 4 | Monday, Feb 1 | Lecture 4 | Statistics (Part I): Statistical Thinking; EDA with DataPrep.EDA; Correlation Analysis; Estimation and Bootstrapping | |
| | Feb 8 | A4 Due | Assignment #4 Due | [Assignment #4] |
| | Feb 12 | Blog Post Due | Blog Post Due | [Blog Post Task] |
| Week 5 | Monday, Feb 8 | Lecture 5 & Quiz 1 | Practical Machine Learning (Part I): Case Study: Anomaly Detection; ML Workflow; Feature Transformation and Selection; Hyperparameter Tuning | |
| | Feb 16 | A5 Due | Assignment #5 Due | [Assignment #5] |
| | Feb 16 | Proposal Due | Course Project Proposal Due | |
| Week 6 | Feb 15 - 21 | Reading Break | Suggestion: start project work (work with data sources; do EDA; start prototyping; have a team meeting) | |
| Week 7 | Monday, Feb 22 | Lecture 6 | Data Visualization (Part II): Design Principles with Case Studies; Graph Drawing | [slides 1] [slides 2] |
| | Mar 1 | A6 Due | Assignment #6 Due | [Assignment #6]; [Graph tutorial] and [notebook] |
| Week 8 | Monday, Mar 1 | Lecture 7 | Practical Machine Learning (Part II): Automated Machine Learning (AutoML); Explainable Machine Learning | [slides 1] [slides 2] |
| | Mar 8 | A7 Due | Assignment #7 Due | [Assignment #7 (Part 1)] [Assignment #7 (Part 2)] |
| Week 9 | Monday, Mar 8 | Lecture 8 & Quiz 2 | Deep Learning (Part I) | [slides] |
| | Mar 11 | Milestone Presentation | Milestone Presentation | |
| | Mar 15 | A8 Due | Assignment #8 Due | [Assignment #8] |
| Week 10 | Monday, Mar 15 | Lecture 9 | Statistics (Part II): Hypothesis Testing; Causal Inference | [slides 1] [slides 2] |
| | Mar 22 | A9 Due | Assignment #9 Due | [Assignment #9 (Part 1)] [Assignment #9 (Part 2)] |
| Week 11 | Monday, Mar 22 | Lecture 10 | Deep Learning (Part II); Natural Language Processing; Cloud Computing & AWS | [slides DL II] [slides NLP] [NLP demo] [slides AWS] |
| | Mar 29 | A10 Due | Assignment #10 Due | [Assignment #10] |
| Week 12 | Monday, Mar 29 | Lecture 11 & Quiz 3 | Presto and openLooKeng; Responsible Data Science | [slides 1] [slides 2] |
| | Apr 5 | A11 Due | Assignment #11 Due | [Assignment #11 (Part 1)] [Assignment #11 (Part 2)] |
| Week 13 | Monday, Apr 12 | Final Project Presentation | Course Project Presentation | |
| | Apr 18 | Code & Report & Video Due | Course Project Code & Report Due | |

Final Project Showcase



  © Jiannan Wang 2021