CMPT 733: Big Data Programming II (SFU, Spring 2021)

Objectives

In CMPT 726 and CMPT 732, students learned machine learning algorithms and big data programming tools. When facing a real-world data problem, however, they will find a gap between what they learned in class and what they need to do in practice. The goal of this course is to close that gap by enabling students to apply what they have learned to real-world problems. To that end, the course covers a set of important topics that a data scientist should know and teaches state-of-the-art approaches. After taking this course, students should feel confident when asked to extract value from real-world data sets, and should know how to ask interesting questions about data, choose proper tools, design data-processing pipelines, and present final data products.

Topics

Logistics

Grading

Final Project

Blog Post

Schedule

Week | Date | Event Type | Description | Course Materials
Week 1 | Monday, Jan 11 | Lecture 1 | Course Introduction: What/Why Data Science?; Data Science Lifecycle; Questions that data scientists can answer; Course Logistics | [slides]
Week 1 | Monday, Jan 18 | A1 Due | Assignment #1 Due | [Assignment #1 (Part 1)] [Assignment #1 (Part 2)]
Week 2 | Monday, Jan 18 | Lecture 2 | Data Preparation: Data Collection; Data Transformation; Data Cleaning; Data Integration | [slides]
Week 2 | Monday, Jan 25 | A2 Due | Assignment #2 Due | [Assignment #2 (Part 1)] [Assignment #2 (Part 2)]
Week 3 | Monday, Jan 25 | Lecture 3 | Data Visualization (Part I): Introduction to Visualization; Exploratory Data Analysis; Visualization Principles | [slides]
Week 3 | Monday, Feb 1 | A3 Due | Assignment #3 Due | [Assignment #3]
Week 4 | Monday, Feb 1 | Lecture 4 | Statistics (Part I): Statistical Thinking; EDA with DataPrep.EDA; Correlation Analysis; Estimation and Bootstrapping | [slides]
Week 4 | Monday, Feb 8 | A4 Due | Assignment #4 Due | [Assignment #4]
Week 4 | Friday, Feb 12 | Blog Post Due | Blog Post Due | [Blog Post Task]
Week 5 | Monday, Feb 8 | Lecture 5 & Quiz 1 | Practical Machine Learning (Part I): Case Study: Anomaly Detection; ML Workflow; Feature Transformation and Selection; Hyperparameter Tuning | [slides]
Week 5 | Tuesday, Feb 16 | A5 Due | Assignment #5 Due | [Assignment #5]
Week 5 | Tuesday, Feb 16 | Proposal Due | Course Project Proposal Due |
Week 6 | Feb 15 - 21 | Reading Break | Suggestion: start project work (work with data sources; do EDA; start prototyping; have a team meeting) |
Week 7 | Monday, Feb 22 | Lecture 6 | Data Visualization (Part II): Design Principles with Case studies; Graph Drawing | [slides 1][slides 2]
Week 7 | Monday, Mar 1 | A6 Due | Assignment #6 Due | [Assignment #6]; [Graph tutorial] and [notebook]
Week 8 | Monday, Mar 1 | Lecture 7 | Practical Machine Learning (Part II): Automated Machine Learning (AutoML); Explainable Machine Learning | [slides 1][slides 2]
Week 8 | Monday, Mar 8 | A7 Due | Assignment #7 Due | [Assignment #7 (Part 1)] [Assignment #7 (Part 2)]
Week 9 | Monday, Mar 8 | Lecture 8 & Quiz 2 | Deep Learning (Part I) | [slides]
Week 9 | Thursday, Mar 11 | Milestone Presentation | Milestone Presentation |
Week 9 | Monday, Mar 15 | A8 Due | Assignment #8 Due | [Assignment #8]
Week 10 | Monday, Mar 15 | Lecture 9 | Statistics (Part II): Hypothesis Testing; Causal Inference | [slides 1][slides 2]
Week 10 | Monday, Mar 22 | A9 Due | Assignment #9 Due | [Assignment #9 (Part 1)] [Assignment #9 (Part 2)]
Week 11 | Monday, Mar 22 | Lecture 10 | Deep Learning (Part II); Natural Language Processing; Cloud Computing & AWS | [slides DL II] [slides NLP] [NLP demo] [slides AWS]
Week 11 | Monday, Mar 29 | A10 Due | Assignment #10 Due | [Assignment #10]
Week 12 | Monday, Mar 29 | Lecture 11 & Quiz 3 | Presto and openLooKeng; Responsible Data Science | [slides 1][slides 2]
Week 12 | Monday, Apr 5 | A11 Due | Assignment #11 Due | [Assignment #11 (Part 1)] [Assignment #11 (Part 2)]
Week 13 | Monday, Apr 12 | Final Project Presentation | Course Project Presentation |
Week 13 | Sunday, Apr 18 | Code & Report & Video Due | Course Project Code & Report Due |

Final Project Showcase

References

 


  © Jiannan Wang 2021