CMPT 733: Big Data Programming II (SFU, Spring 2019)

Objectives

In CMPT 726 and CMPT 732, students learned machine learning algorithms and big data programming tools. However, when facing a real-world data problem, students will find that there is still a gap between what they learned in class and what they need to do in practice. The goal of this course is to fill this gap and enable students to apply what they have learned to solve real-world problems. To achieve this goal, the course covers a set of important topics that every data scientist should know and introduces state-of-the-art approaches to them. After taking this course, students should feel confident when asked to extract value from real-world datasets, and should know how to ask interesting questions about data, choose appropriate tools, design data-processing pipelines, and present final data products.

Topics

Logistics

Grading

Final Project

Blog Post

Schedule

Week Date Event Type Description Course Materials
Week 1 Monday
Jan 7
Lecture 1 Course Introduction
What/Why Data Science?
Data Science Lifecycle
Questions that data scientists can answer
Course Logistics
[slides]
Monday
Jan 14
A1 Due Assignment #1 Due
Web Scraping
[Assignment #1]
Week 2 Monday
Jan 14
Lecture 2 Data Preparation
Data Collection
Data Cleaning
Data Integration
[slides]
Monday
Jan 21
A2 Due Assignment #2 Due
Entity Resolution
[Assignment #2]
Week 3 Monday
Jan 21
Lecture 3 Data Visualization
Principles
Pitfalls
[slides]
Monday
Jan 28
A3 Due Assignment #3 Due
Visualization
[Assignment #3]
Week 4 Monday
Jan 28
Lecture 4 Statistics (I)
Statistical Thinking
Exploratory Data Analysis
Bootstrapping
[slides]
Monday
Feb 4
A4 Due Assignment #4 Due
EDA and Bootstrap
[Assignment #4]
Week 5 Monday
Feb 4
Lecture 5 Practical Machine Learning
Feature Selection
Crowdsourcing
Active Learning
Spark MLlib and ML Pipeline
[slides]
Monday
Feb 11
A5 Due Assignment #5 Due
[Assignment #5]
Week 6 Monday
Feb 11
Lecture 6 Deep Learning (Part I)
Renaissance of neural networks
Background
Construction and training of layered learners
Frameworks for deep learning
[slides]
Monday
Feb 18
A6 Due Assignment #6 Due
[Assignment #6]
Week 7 Reading Break (Feb 18 - 24) Suggestion: start project work
work with data sources
do EDA
start prototyping
have a team meeting
Week 8 Monday
Feb 25
Lecture 7 Anomaly Detection
What is anomaly detection?
Clustering and K-Means
Feature Scaling
Introduction to AWS
What is cloud computing?
SaaS/PaaS/IaaS
AWS and its success
[slides]
[slides]
Monday
Mar 4
A7 Due Assignment #7 Due
[Assignment #7]
Monday
Mar 4
Milestone Slides Due Course Project Milestone Slides Due
Week 9 Monday
Mar 4
In Class Presentation Course Project Milestone Presentation
Week 10 Monday
Mar 11
Lecture 8 Statistics (II)
Correlation Analysis
Hypothesis Testing
A/B Testing
[slides]
Friday
Mar 15
Blog Post Due Blog Post Due
[Blog Post Task]
Monday
Mar 18
A8 Due Assignment #8 Due
[Assignment #8]
Week 11 Monday
Mar 18
Lecture 9 Deep Learning (Part II): RNNs for sequences, sentiment
ML-Vis
[slides]
Monday
Mar 25
A9 Due Assignment #9 Due
[Assignment #9]
Week 12 Monday
Mar 25
Lecture 10 Machine Learning Visualization for Big Data Analysis
Model explainability, Google Compute, Weather model highlights
[slides]
Week 13 Monday
Apr 8
Poster Due Poster PDF + Printout Due
Monday
Apr 8
Poster Presentation Course Project Poster Presentation
Sunday
Apr 14
Code & Report & Video Due Course Project Code & Report Due

Final Project Showcase

References



  © Jiannan Wang 2019