In CMPT 726 and CMPT 732, students learned machine learning algorithms and big data programming tools. However, when facing a real-world data problem, students will find that there is still a gap between what they learned in class and what they need to do in practice. The goal of this course is to fill this gap, enabling students to apply what they have learned to solve real-world problems. To achieve this goal, the course covers a set of important topics that a data scientist should know and teaches state-of-the-art approaches. After taking this course, students should feel confident when asked to extract value from real-world data sets, and should know how to ask interesting questions about data, choose proper tools, design data-processing pipelines, and present final data products.
Week  Date  Event Type  Description  Course Materials 
Week 1  Monday Jan 11 
Lecture 1  Course Introduction: What/Why Data Science?; Data Science Lifecycle; Questions that data scientists can answer; Course Logistics
[slides] 
Monday Jan 18 
A1 Due  Assignment #1 Due  [Assignment #1 (Part 1)] [Assignment #1 (Part 2)] 

Week 2  Monday Jan 18 
Lecture 2  Data Preparation: Data Collection; Data Transformation; Data Cleaning; Data Integration
[slides] 
Monday Jan 25 
A2 Due  Assignment #2 Due  [Assignment #2 (Part 1)] [Assignment #2 (Part 2)] 

Week 3  Monday Jan 25 
Lecture 3  Data Visualization (Part I): Introduction to Visualization; Exploratory Data Analysis; Visualization Principles
[slides] 
Monday Feb 1 
A3 Due  Assignment #3 Due  [Assignment #3]  
Week 4  Monday Feb 1 
Lecture 4  Statistics (Part I): Statistical Thinking; EDA with DataPrep.EDA; Correlation Analysis; Estimation and Bootstrapping
[slides] 
Monday Feb 8 
A4 Due  Assignment #4 Due  [Assignment #4]  
Friday Feb 12 
Blog Post Due  Blog Post Due  [Blog Post Task]  
Week 5  Monday Feb 8 
Lecture 5 & Quiz 1  Practical Machine Learning (Part I): Case Study: Anomaly Detection; ML Workflow; Feature Transformation and Selection; Hyperparameter Tuning
[slides] 
Tuesday Feb 16 
A5 Due  Assignment #5 Due 
[Assignment #5]  
Tuesday Feb 16 
Proposal Due  Course Project Proposal Due 

Week 6  Reading Break (Feb 15-21)  Suggestion: start project work (work with data sources, do EDA, start prototyping, have a team meeting)

Week 7  Monday Feb 22 
Lecture 6  Data Visualization (Part II): Design Principles with Case Studies; Graph Drawing
[slides 1][slides 2] 
Monday Mar 1 
A6 Due  Assignment #6 Due 
[Assignment #6] [Graph tutorial] and [notebook] 

Week 8  Monday Mar 1 
Lecture 7  Practical Machine Learning (Part II): Automated Machine Learning (AutoML); Explainable Machine Learning
[slides 1][slides 2] 
Monday Mar 8 
A7 Due  Assignment #7 Due 
[Assignment #7 (Part 1)] [Assignment #7 (Part 2)] 

Week 9  Monday Mar 8 
Lecture 8 & Quiz 2  Deep Learning (Part I)  [slides] 
Thursday Mar 11 
Milestone Presentation  Milestone Presentation  
Monday Mar 15 
A8 Due  Assignment #8 Due  [Assignment #8]  
Week 10  Monday Mar 15 
Lecture 9  Statistics (Part II): Hypothesis Testing; Causal Inference
[slides 1][slides 2] 
Monday Mar 22 
A9 Due  Assignment #9 Due 
[Assignment #9 (Part 1)] [Assignment #9 (Part 2)] 

Week 11  Monday Mar 22
Lecture 10  Deep Learning (Part II); Natural Language Processing; Cloud Computing & AWS
[slides DL II] [slides NLP] [NLP demo] [slides AWS] 
Monday Mar 29
A10 Due  Assignment #10 Due 
[Assignment #10]  
Week 12  Monday Mar 29 
Lecture 11 & Quiz 3  Presto and openLooKeng; Responsible Data Science
[slides 1][slides 2] 
Monday April 5 
A11 Due  Assignment #11 Due 
[Assignment #11 (Part 1)] [Assignment #11 (Part 2)] 

Week 13  Monday April 12 
Final Project Presentation  Course Project Presentation 

Sunday April 18 
Code & Report & Video Due  Course Project Code & Report Due 
© Jiannan Wang 2021