Professional Data Science Course

CMPT 733: Big Data Programming II (SFU, Spring 2023)

Objectives

In CMPT 726 and CMPT 732, students learned machine learning algorithms and big data programming tools. However, when facing a real-world data problem, there is still a gap between what machine learning and data engineering address and what a data scientist actually does in practice.

The goal of this course is to fill this gap, enabling students to apply what they have learned to solve real-world problems. To achieve this goal, the course covers a set of important topics that a data scientist should know and teaches state-of-the-art approaches. After taking this course, students should feel confident when asked to extract value from real-world data sets, and should know how to ask interesting questions about data, choose proper tools, design data-processing pipelines, and present final data products.

Topics

Introduction to Data Science
Data Preparation
Visualization
Statistics
Deep Learning
Practical Machine Learning (AutoML, Explainable AI, Feature Engineering)
Anomaly Detection
Cloud Computing
Responsible Data Science
Communication

Logistics

Instructor: Steven Bergner
TAs: Hiren Naresh Bangani, Gonick Nalwa, Aditya Bhadreshkumar Panchal
Lectures: Thu 2:30 PM - 4:20 PM (AQ3153)
Lab G101: Tue 1:30 PM - 5:20 PM (SECB1010)
Lab G103: Thu 9:30 AM - 1:20 PM (SECB1010)

Grading

Assignments: 11 × 3% = 33%
Blog Post: 20%
Final Project: 47% (2% proposal + 15% milestone + 15% final presentation + 15% code, report, and video)

Final Project

Blog Post

Schedule

Week	Date	Event Type	Description	Course Materials
Week 1	Thursday Jan 5	Lecture 1	Course Introduction What/Why Data Science? Data Science Lifecycle Questions that data scientists can answer Course Logistics	[slides]
Week 1	Thursday Jan 12 / Monday Jan 16	A1 due	Assignment #1-1 and #1-2 due	[Assignment #1 (Part 1)] [Assignment #1 (Part 2)]
Week 2	Thursday Jan 12	Lecture 2	Data Preparation Data Collection Data Transformation Data Cleaning Data Integration	[slides]
Week 2	Monday Jan 23	A2 due	Assignment #2 due	[Assignment #2 (Part 1)] [Assignment #2 (Part 2)]
Week 3	Thursday Jan 19	Lecture 3	Data Visualization (Part I) Introduction to Visualization Exploratory Data Analysis Visualization Principles	[slides]
Week 3	Tuesday Jan 31	A3 due	Assignment #3 due	[Assignment #3]
Week 4	Thursday Jan 26	Lecture 4	Statistics (Part I) Statistical Thinking EDA	[slides]
	Monday Feb 6	A4 due	Assignment #4 due	[Assignment #4]
	Friday Feb 10	Blog Post due	Blog Post due	[Blog Post Task]
Week 5	Thursday Feb 2	Lecture 5	Practical Machine Learning (Part I)	[slides]
Week 5	Tuesday Feb 14	A5 due	Assignment #5 due	[Assignment #5]
Week 6	Thursday Feb 9	Lecture 6	Deep Learning (Part I)	[slides]
Week 6	Wednesday Feb 22	A6 due	Assignment #6 due	[Assignment #6]
Week 7	Thursday Feb 16	Lecture 7	Data Visualization (Part II) Design Principles Graph Drawing	[slides 1][slides 2]
Week 7	Friday Feb 17	Proposal due	Course Project Proposal due
	Tuesday Mar 7	A7 due	Assignment #7 due	[Assignment #7]
Week 8	Reading Break (Feb 21 - 26)		Suggestion: Start final project
Week 9	Thursday Mar 2	Lecture 8	Practical Machine Learning (Part II)	[slides 1][slides 2]
Week 9	Monday Mar 20	A8 due	Assignment #8 due	[Assignment #8 (Part 1)] [Assignment #8 (Part 2)]
Week 10	Thursday Mar 9	Milestone Presentation
Week 10	Thursday Mar 16	Lecture 9	Statistics (Part II)	[slides 1][slides 2]
Week 10	Monday Mar 27	A9 due	Assignment #9 due	[Assignment #9 (Part 1)] [Assignment #9 (Part 2)]
Week 11	Thursday Mar 23	Lecture 10	Deep Learning (Part II), Natural Language Processing	[slides DL] [slides NLP] [NLP notebook]
Week 11	Monday Apr 3	A10 due	Assignment #10 due	[Assignment #10]
Week 12	Thursday Mar 30	Lecture 11	Responsible Data Science	[slides 1] [slides 2]
Week 12	Monday Apr 17	A11 due	Assignment #11 due	[Assignment #11 (Part 1)] [Assignment #11 (Part 2)]
Week 13+	Tuesday Apr 11	Final Project Presentation & Code	Course Project Presentation
Week 13+	Wednesday Apr 12	Report & Video due	Course Project Report due