CMPT 733: Big Data Programming II (SFU, Spring 2023)

Objectives

In CMPT 726 and CMPT 732, students learned machine learning algorithms and big data programming tools. However, when facing a real-world data problem, we find that there is still a gap between what machine learning and data engineering address and what we need to do in practice.

The goal of this course is to fill this gap, enabling students to apply what they have learned to solve real-world problems. To achieve this goal, the course covers a set of important topics that a data scientist should know and teaches students about state-of-the-art approaches. After taking this course, students should feel confident when asked to extract value from real-world data sets, and should know how to ask interesting questions about data, choose proper tools, design data-processing pipelines, and present final data products.

Schedule

Week | Date | Event Type | Description | Course Materials
Week 1 | Thursday, Jan 5 | Lecture 1 | Course Introduction: What/Why Data Science?; Data Science Lifecycle; Questions that data scientists can answer; Course Logistics | [slides]
Thursday, Jan 12/16 | A1 due | Assignment #1-1 and #1-2 due | [Assignment #1 (Part 1)] [Assignment #1 (Part 2)]
Week 2 | Thursday, Jan 12 | Lecture 2 | Data Preparation: Data Collection; Data Transformation; Data Cleaning; Data Integration | [slides]
Monday, Jan 23 | A2 due | Assignment #2 due | [Assignment #2 (Part 1)] [Assignment #2 (Part 2)]
Week 3 | Thursday, Jan 19 | Lecture 3 | Data Visualization (Part I): Introduction to Visualization; Exploratory Data Analysis; Visualization Principles | [slides]
Tuesday, Jan 31 | A3 due | Assignment #3 due | [Assignment #3]
Week 4 | Thursday, Jan 26 | Lecture 4 | Statistics (Part I): Statistical Thinking; EDA | [slides]
Monday, Feb 6 | A4 due | Assignment #4 due | [Assignment #4]
Friday, Feb 10 | Blog Post due | Blog Post due | [Blog Post Task]
Week 5 | Thursday, Feb 2 | Lecture 5 | Practical Machine Learning (Part I) | [slides]
Tuesday, Feb 14 | A5 due | Assignment #5 due | [Assignment #5]
Week 6 | Thursday, Feb 9 | Lecture 6 | Deep Learning (Part I) | [slides]
Wednesday, Feb 22 | A6 due | Assignment #6 due | [Assignment #6]
Week 7 | Thursday, Feb 16 | Lecture 7 | Data Visualization (Part II): Design Principles; Graph Drawing | [slides 1] [slides 2]
Friday, Feb 17 | Proposal due | Course Project Proposal due
Tuesday, Mar 7 | A7 due | Assignment #7 due | [Assignment #7]
Week 8 | Feb 21-26 | Reading Break | Suggestion: start the final project
Week 9 | Thursday, Mar 2 | Lecture 8 | Practical Machine Learning (Part II) | [slides 1] [slides 2]
Monday, Mar 20 | A8 due | Assignment #8 due | [Assignment #8 (Part 1)] [Assignment #8 (Part 2)]
Week 10 | Thursday, Mar 9 | Milestone Presentation
Week 10 | Thursday, Mar 16 | Lecture 9 | Statistics (Part II) | [slides 1] [slides 2]
Monday, Mar 27 | A9 due | Assignment #9 due | [Assignment #9 (Part 1)] [Assignment #9 (Part 2)]
Week 11 | Thursday, Mar 23 | Lecture 10 | Deep Learning (Part II), Natural Language Processing | [slides DL] [slides NLP] [NLP notebook]
Monday, Apr 3 | A10 due | Assignment #10 due | [Assignment #10]
Week 12 | Thursday, Mar 30 | Lecture 11 | Responsible Data Science | [slides 1] [slides 2]
Monday, Apr 17 | A11 due | Assignment #11 due | [Assignment #11 (Part 1)] [Assignment #11 (Part 2)]
Week 13+ | Tuesday, Apr 11 | Final Project Presentation & Code | Course Project Presentation
Wednesday, Apr 12 | Report & Video due | Course Project Report due



  © Jiannan Wang & Steven Bergner 2023