CMPT 733: Big Data Programming II (SFU, Spring 2023)


In CMPT 726 and CMPT 732, students learned machine learning algorithms and big data programming tools. Real-world data problems, however, still reveal a gap between what machine learning or data engineering addresses and what practitioners actually have to do.

The goal of this course is to fill this gap, enabling students to apply what they have learned to real-world problems. To that end, the course covers a set of important topics that a data scientist should know and teaches state-of-the-art approaches. After taking this course, students should feel confident when asked to extract value from real-world data sets, and should know how to ask interesting questions about data, choose proper tools, design data-processing pipelines, and present final data products.




Final Project

Blog Post


| Week | Date | Event Type | Description | Course Materials |
|---|---|---|---|---|
| Week 1 | Thursday, Jan 5 | Lecture 1 | Course Introduction: What/Why Data Science?; Data Science Lifecycle; Questions that data scientists can answer; Course Logistics | |
| | Jan 12/16 | A1 due | Assignment #1-1 and #1-2 due | [Assignment #1 (Part 1)] [Assignment #1 (Part 2)] |
| Week 2 | Thursday, Jan 12 | Lecture 2 | Data Preparation: Data Collection; Data Transformation; Data Cleaning; Data Integration | |
| | Jan 23 | A2 due | Assignment #2 due | [Assignment #2 (Part 1)] [Assignment #2 (Part 2)] |
| Week 3 | Thursday, Jan 19 | Lecture 3 | Data Visualization (Part I): Introduction to Visualization; Exploratory Data Analysis; Visualization Principles | |
| | Jan 31 | A3 due | Assignment #3 due | [Assignment #3] |
| Week 4 | Thursday, Jan 26 | Lecture 4 | Statistics (Part I): Statistical Thinking | |
| | Feb 6 | A4 due | Assignment #4 due | [Assignment #4] |
| | Feb 10 | Blog Post due | Blog Post due | [Blog Post Task] |
| Week 5 | Thursday, Feb 2 | Lecture 5 | Practical Machine Learning (Part I) | [slides] |
| | Feb 14 | A5 due | Assignment #5 due | [Assignment #5] |
| Week 6 | Thursday, Feb 9 | Lecture 6 | Deep Learning (Part I) | [slides] |
| | Feb 22 | A6 due | Assignment #6 due | [Assignment #6] |
| Week 7 | Thursday, Feb 16 | Lecture 7 | Data Visualization (Part II): Design Principles; Graph Drawing | [slides 1][slides 2] |
| | Feb 17 | Proposal due | Course Project Proposal due | |
| | Mar 7 | A7 due | Assignment #7 due | [Assignment #7] |
| Week 8 | Reading Break (Feb 21 - 26) | | Suggestion: Start final project | |
| Week 9 | Thursday, Mar 2 | Lecture 8 | Practical Machine Learning (Part II) | [slides 1][slides 2] |
| | Mar 20 | A8 due | Assignment #8 due | [Assignment #8 (Part 1)] [Assignment #8 (Part 2)] |
| Week 10 | Thursday, Mar 9 | Milestone Presentation | | |
| Week 10 | Thursday, Mar 16 | Lecture 9 | Statistics (Part II) | [slides 1][slides 2] |
| | Mar 27 | A9 due | Assignment #9 due | [Assignment #9 (Part 1)] [Assignment #9 (Part 2)] |
| Week 11 | Thursday, Mar 23 | Lecture 10 | Deep Learning (Part II), Natural Language Processing | [slides DL] [slides NLP] [NLP notebook] |
| | Apr 3 | A10 due | Assignment #10 due | [Assignment #10] |
| Week 12 | Thursday, Mar 30 | Lecture 11 | Responsible Data Science | [slides 1] [slides 2] |
| | Apr 17 | A11 due | Assignment #11 due | [Assignment #11 (Part 1)] [Assignment #11 (Part 2)] |
| Week 13+ | Tuesday, Apr 11 | Final Project Presentation & Code | Course Project Presentation | |
| | Apr 12 | Report & Video due | Course Project Report due | |

Final Project Showcase



  © Jiannan Wang & Steven Bergner 2023