From CMPT 726 and CMPT 732, students have learned machine learning algorithms and big data programming tools. However, when facing a real-world data problem, students will find that there is still a gap between what they have learned in class and what they need to do in practice. The goal of this course is to fill this gap, enabling students to apply what they have learned to solve real-world problems. To achieve this goal, the course covers a set of important topics that a data scientist should know and teaches students the state-of-the-art approaches. After taking this course, students should feel confident when asked to extract value from real-world data sets, and should know how to ask interesting questions about data, choose proper tools, design data-processing pipelines, and present final data products.
Week  Date  Event Type  Description  Course Materials 
Week 1  Monday Jan 10 
Lecture 1  Course Introduction: What/Why Data Science?; Data Science Lifecycle; Questions that data scientists can answer; Course Logistics 
[slides] 
Monday Jan 17 
A1 Due  Assignment #1 Due  [Assignment #1 (Part 1)] [Assignment #1 (Part 2)] 

Week 2  Monday Jan 17 
Lecture 2  Data Preparation: Data Collection; Data Transformation; Data Cleaning; Data Integration 
[slides] 
Monday Jan 24 
A2 Due  Assignment #2 Due  [Assignment #2 (Part 1)] [Assignment #2 (Part 2)] 

Week 3  Monday Jan 24 
Lecture 3  Data Visualization (Part I): Introduction to Visualization; Exploratory Data Analysis; Visualization Principles 
[slides] 
Monday Jan 31 
A3 Due  Assignment #3 Due  [Assignment #3]  
Week 4  Monday Jan 31 
Lecture 4  Statistics (Part I): Statistical Thinking; EDA with DataPrep.EDA; Correlation Analysis; Estimation and Bootstrapping 
[slides] 
Monday Feb 7 
A4 Due  Assignment #4 Due  [Assignment #4]  
Friday Feb 11 
Blog Post Due  Blog Post Due  [Blog Post Task]  
Week 5  Monday Feb 7 
Lecture 5  Practical Machine Learning (Part I): Case Study: Anomaly Detection; ML Workflow; Feature Transformation and Selection; Hyperparameter Tuning 
[slides] 
Tuesday Feb 15 
A5 Due  Assignment #5 Due  [Assignment #5]  
Week 6  Monday Feb 14 
Lecture 6  Deep Learning (Part I)  [slides] 
Tuesday Feb 15 
Proposal Due  Course Project Proposal Due 

Monday Feb 21 
A6 Due  Assignment #6 Due  [Assignment #6]  
Week 7  Reading Break (Feb 21–27)  Suggestion: Start project work; work with data sources; do EDA; start prototyping; have a team meeting 

Week 8  Monday Feb 28 
Lecture 7  Data Visualization (Part II): Design Principles with Case Studies; Graph Drawing 
[slides 1][slides 2] 
Monday Mar 7 
A7 Due  Assignment #7 Due  [Assignment #7] [Graph tutorial] and [notebook] 

Week 9  Monday Mar 7 
Lecture 8  Practical Machine Learning (Part II): Automated Machine Learning (AutoML); Explainable Machine Learning 
[slides 1][slides 2] 
Thursday Mar 10 
Milestone Presentation  Milestone Presentation  
Monday Mar 14 
A8 Due  Assignment #8 Due  [Assignment #8 (Part 1)] [Assignment #8 (Part 2)] 

Week 10  Monday Mar 14 
Lecture 9  Statistics (Part II): Hypothesis Testing; Causal Inference 
[slides 1][slides 2] 
Monday Mar 21 
A9 Due  Assignment #9 Due  [Assignment #9 (Part 1)] [Assignment #9 (Part 2)] 

Week 11  Monday Mar 21 
Lecture 10  Deep Learning (Part II): Natural Language Processing; Cloud Computing & AWS 
[slides DL II] [slides NLP] [NLP demo] 
Thursday Mar 31 
A10 Due  Assignment #10 Due  [Assignment #10]  
Week 12  Monday Mar 28 
Lecture 11  Presto and openLooKeng; Responsible Data Science 
[slides 1][slides 2] 
Monday Apr 4 
A11 Due  Assignment #11 Due  [Assignment #11 (Part 1)] [Assignment #11 (Part 2)] 

Week 13  Monday Apr 11 
Final Project Presentation & Code  Course Project Presentation 

Thursday Apr 14 
Report & Video Due  Course Project Report Due 
© Jiannan Wang 2022