CMPT 733: Big Data Programming II (SFU, Spring 2022)

Objectives

In CMPT 726 and CMPT 732, students learned machine learning algorithms and big data programming tools. When facing a real-world data problem, however, students often find a gap between what they learned in class and what they need to do in practice. This course aims to close that gap so that students can apply what they have learned to real-world problems. To that end, it covers a set of important topics that a data scientist should know and teaches state-of-the-art approaches. After taking this course, students should feel confident when asked to extract value from real-world data sets, and should know how to ask interesting questions about data, choose proper tools, design data-processing pipelines, and present final data products.

Topics

Logistics

Grading

Blog Post

Schedule

Week | Date | Event Type | Description | Course Materials
Week 1 | Monday, Jan 10 | Lecture 1 | Course Introduction: What/Why Data Science?; Data Science Lifecycle; Questions that data scientists can answer; Course Logistics | [slides]
Week 1 | Monday, Jan 17 | A1 Due | Assignment #1 Due | [Assignment #1 (Part 1)] [Assignment #1 (Part 2)]
Week 2 | Monday, Jan 17 | Lecture 2 | Data Preparation: Data Collection; Data Transformation; Data Cleaning; Data Integration | [slides]
Week 2 | Monday, Jan 24 | A2 Due | Assignment #2 Due | [Assignment #2 (Part 1)] [Assignment #2 (Part 2)]
Week 3 | Monday, Jan 24 | Lecture 3 | Data Visualization (Part I): Introduction to Visualization; Exploratory Data Analysis; Visualization Principles | [slides]
Week 3 | Monday, Jan 31 | A3 Due | Assignment #3 Due | [Assignment #3]
Week 4 | Monday, Jan 31 | Lecture 4 | Statistics (Part I): Statistical Thinking; EDA with DataPrep.EDA; Correlation Analysis; Estimation and Bootstrapping | [slides]
Week 4 | Monday, Feb 7 | A4 Due | Assignment #4 Due | [Assignment #4]
Week 4 | Friday, Feb 11 | Blog Post Due | Blog Post Due | [Blog Post Task]
Week 5 | Monday, Feb 7 | Lecture 5 | Practical Machine Learning (Part I): Case Study: Anomaly Detection; ML Workflow; Feature Transformation and Selection; Hyperparameter Tuning | [slides]
Week 5 | Tuesday, Feb 15 | A5 Due | Assignment #5 Due | [Assignment #5]
Week 5 | Tuesday, Feb 15 | Proposal Due | Course Project Proposal Due |
Week 6 | Feb 14 - 20 | Reading Break | Suggestion: Start project work (work with data sources; do EDA; start prototyping; have a team meeting) |
Week 7 | Monday, Feb 21 | Lecture 6 | Deep Learning (Part I) | [slides 1] [slides 2]
Week 7 | Monday, Feb 28 | A6 Due | Assignment #6 Due | [Assignment #6]
Week 8 | Monday, Mar 7 | Lecture 7 | Data Visualization (Part II): Design Principles with Case Studies; Graph Drawing | [slides 1] [slides 2]
Week 8 | Monday, Mar 7 | A7 Due | Assignment #7 Due | [Assignment #7]; [Graph tutorial] and [notebook]
Week 9 | Monday, Mar 7 | Lecture 8 | Practical Machine Learning (Part II): Automated Machine Learning (AutoML); Explainable Machine Learning | [slides]
Week 9 | Thursday, Mar 10 | Milestone Presentation | Milestone Presentation |
Week 9 | Monday, Mar 14 | A8 Due | Assignment #8 Due | [Assignment #8 (Part 1)] [Assignment #8 (Part 2)]
Week 10 | Monday, Mar 14 | Lecture 9 | Statistics (Part II): Hypothesis Testing; Causal Inference | [slides 1] [slides 2]
Week 10 | Monday, Mar 21 | A9 Due | Assignment #9 Due | [Assignment #9 (Part 1)] [Assignment #9 (Part 2)]
Week 11 | Monday, Mar 21 | Lecture 10 | Deep Learning (Part II); Natural Language Processing; Cloud Computing & AWS | [slides DL II] [slides NLP] [NLP demo] [slides AWS]
Week 11 | Monday, Mar 28 | A10 Due | Assignment #10 Due | [Assignment #10]
Week 12 | Monday, Mar 28 | Lecture 11 | Presto and openLooKeng; Responsible Data Science | [slides 1] [slides 2]
Week 12 | Monday, Apr 4 | A11 Due | Assignment #11 Due | [Assignment #11 (Part 1)] [Assignment #11 (Part 2)]
Week 13 | Monday, Apr 11 | Final Project Presentation | Course Project Presentation |
Week 13 | Sunday, Apr 17 | Code & Report & Video Due | Course Project Code & Report Due |

Final Project Showcase

References

 


  © Jiannan Wang 2022