In CMPT 726 and CMPT 732, students have learned machine learning algorithms and big data programming tools. However, when facing a real-world data problem, students will find that there is still a gap between what they have learned in class and what they need to do in practice. The goal of this course is to fill this gap, enabling students to apply what they have learned to solve real-world problems. To achieve this goal, the course covers a set of important topics that a data scientist should know and teaches state-of-the-art approaches. After taking this course, students should feel confident when asked to extract value from real-world data sets, and should know how to ask interesting questions about data, choose proper tools, design data-processing pipelines, and present final data products.
Week  Date  Event Type  Description  Course Materials 
Week 1  Monday Jan 7 
Lecture 1  Course Introduction: What/Why Data Science?; Data Science Lifecycle; Questions that data scientists can answer; Course Logistics 
[slides] 
Monday Jan 14 
A1 Due  Assignment #1 Due: Web Scraping 
[Assignment #1]  
Week 2  Monday Jan 14 
Lecture 2  Data Preparation: Data Collection; Data Cleaning; Data Integration 
[slides] 
Monday Jan 21 
A2 Due  Assignment #2 Due: Entity Resolution 
[Assignment #2]  
Week 3  Monday Jan 21 
Lecture 3  Data Visualization: Principles; Pitfalls 
[slides] 
Monday Jan 28 
A3 Due  Assignment #3 Due: Visualization 
[Assignment #3]  
Week 4  Monday Jan 28 
Lecture 4  Statistics (I): Statistical Thinking; Exploratory Data Analysis; Bootstrapping 
[slides] 
Monday Feb 4 
A4 Due  Assignment #4 Due: EDA and Bootstrap 
[Assignment #4]  
Week 5  Monday Feb 4 
Lecture 5  Practical Machine Learning: Feature Selection; Crowdsourcing; Active Learning; Spark MLlib and ML Pipeline 
[slides] 
Monday Feb 11 
A5 Due  Assignment #5 Due 
[Assignment #5]  
Week 6  Monday Feb 11 
Lecture 6  Deep Learning (Part I): Renaissance of neural networks; Background; Construction and training of layered learners; Frameworks for deep learning 
[slides] 
Monday Feb 18 
A6 Due  Assignment #6 Due 
[Assignment #6]  
Week 7  Reading Break (Feb 18-24)  Suggestion: start project work, work with data sources, do EDA, start prototyping, have a team meeting 

Week 8  Monday Feb 25 
Lecture 7  Anomaly Detection: What is anomaly detection?; Clustering and K-Means; Feature Scaling. Introduction to AWS: What is cloud computing?; SaaS/PaaS/IaaS; AWS and its success 
[slides] [slides] 
Monday Mar 4 
A7 Due  Assignment #7 Due 
[Assignment #7]  
Monday Mar 4 
Milestone Slides Due  Course Project Milestone Slides Due  
Week 9  Monday Mar 4 
In Class Presentation  Course Project Milestone Presentation  
Week 10  Monday Mar 11 
Lecture 8  Statistics (II): Correlation Analysis; Hypothesis Testing; A/B Testing 
[slides] 
Friday Mar 15 
Blog Post Due  Blog Post Due 
[Blog Post Task]  
Monday Mar 18 
A8 Due  Assignment #8 Due 
[Assignment #8]  
Week 11  Monday Mar 18 
Lecture 9  Deep Learning (Part II): RNNs for sequences, sentiment MLVis 
[slides] 
Monday Mar 25 
A9 Due  Assignment #9 Due 
[Assignment #9]  
Week 12  Monday Mar 25 
Lecture 10  Machine Learning Visualization for Big Data Analysis Model explainability, Google Compute, Weather model highlights 
[slides] 
Week 13  Monday Apr 8 
Poster Due  Poster PDF + Printout Due 

Monday Apr 8 
Poster Presentation  Course Project Poster Presentation  
Sunday Apr 14 
Code & Report & Video Due  Course Project Code & Report Due 
© Jiannan Wang 2019