Traditional vs. Modern Database Systems

CMPT 843: Traditional vs. Modern Database Systems (SFU, Spring 2019)

Motivation

The Big Data movement is attracting a growing number of new researchers to data-processing research. Meanwhile, the database community has been thinking about how to address data-processing challenges for over 40 years. Numerous elegant ideas were proposed in the past, and many of them are widely applied in industry. There is therefore a strong need to teach new researchers classical database knowledge, so that they can stand on the shoulders of giants rather than reinvent the wheel.

Topics

To serve this purpose, the course is divided into two parts.

The first part will guide students through classical database papers published before 2000, on topics including Data Model, Relational Database Systems, Transaction Management, Query Optimization, Data Warehouse, and Approximate Query Processing.
The second part will mostly cover papers published in the last ten years, on topics including MapReduce, Spark, Column Store, NoSQL, NewSQL, and ML over SQL.

Objectives

Through this traditional vs. modern view of data processing, students should gain a much deeper understanding of the Big Data movement and form their own opinions on what is novel about Big Data systems.

Furthermore, since this is a graduate seminar, another important objective is to train students in the basic skills of a researcher. The course will create a number of opportunities for students to learn how to read a paper, how to write a paper review, how to give a good research talk, and how to ask questions during a talk.

Logistics

Instructor: Jiannan Wang
Time: Tuesday, 1:00pm - 2:20pm; Thursday, 1:00pm - 2:20pm
Location: AQ5037
Office Hours: By appointment. E-mail me to book a slot.

Pre-requisites

Students should have taken undergraduate Database Systems courses (see the CMPT 354 and CMPT 454 course outlines).

Grading

Paper Presentation: 20%
Questions: 10%
Paper Review: 20%
Participation: 10%
Blog Post: 10%
Final Project: 30% (2% plan + 14% poster + 14% report)

Blog Post

Understand Database Systems in Simple Ways

Final Project

Big Data Systems

Schedule

If you are a speaker, please see this doc on how to upload your slides after the presentation.
If you ask a question in a Q/A session, please record it in this form (one question per row). To get full marks for Questions, you need to ask 20 questions in total (0.5 points per question).

Date	Topic	Content	Presenter
Th 1/3	Course Objective I	Course Introduction and History of Database Systems	Jiannan [slides]
Tu 1/8	Course Objective II	Essential Skills Needed for a PhD Student (How to read & review a paper? How to give a talk? How to ask questions?)	Jiannan [slides]
Part I: Traditional Database Systems and Techniques (before 2000)
Th 1/17	Background	1. Database Systems: Achievements and Opportunities (1990)	Lakshayy [slides]
Th 1/17	Background	2. The Asilomar Report on Database Research (1998)	Inder [slides]
Tu 1/22	Data Model	3. A Relational Model of Data for Large Shared Data Banks (1970) 4. What Goes Around Comes Around (1960-1970, Sec I~IV only)	Arshvir Mehvish [slides]
Th 1/24	Traditional DBMS	5. A History and Evaluation of System R (1981) 6. The Design of Postgres (1986)	Ankita [slides] Ruijia [slides]
Tu 1/29	Traditional DBMS	7. An Overview of Data Warehousing and OLAP Technology (1997)	Prabhjot [slides]
Th 1/31	Query Optimization	8. Access Path Selection in a Relational Database Management System (1979) 9. The Volcano Optimizer Generator: Extensibility and Efficient Search (1993)	Jill [slides] Ravi [slides]
Tu 2/5	Query Optimization	10. Eddies: Continuously Adaptive Query Processing (2000)	Ohoud [slides]
Th 2/7	Transaction Management	11. Granularity of Locks and Degrees of Consistency in a Shared Data Base (1976, Part 1 only) 12. Granularity of Locks and Degrees of Consistency in a Shared Data Base (1976, Part 2 only)	Muhammad [slides] Kyle [slides]
Tu 2/12	Transaction Management	13. On Optimistic Methods for Concurrency Control (1981)	Kiarash
Th 2/14	Interactive Analytics	14. Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-tab, and Sub-totals (1997) 15. An Array-Based Algorithm for Simultaneous Multidimensional Aggregates (1997)	Xiaoying Wang [slides] Pushkar
Part II: Modern Database Systems and Techniques (after 2000)
Tu 2/26	Background	16. Challenges and Opportunities with Big Data (2011)	Jetic [slides]
Tu 2/26	Column Store	17. C-Store: A Column-oriented DBMS (2005)	Ruijia [slides]
Th 2/28	Column Store	18. Dremel: Interactive Analysis of Web-Scale Datasets (2010) 19. Column-Stores vs. Row-Stores: How Different Are They Really? (2008)	Mohan [slides] Mehvish [slides]
Tu 3/5	MapReduce and Beyond	Why MapReduce & Why Spark	Jiannan
Th 3/7		20. MapReduce: Simplified Data Processing on Large Clusters (2004) 21. A Comparison of Approaches to Large-Scale Data Analysis (2009) Parallel Database	Xiaoying [slides] Inder [slides]
Tu 3/12		22. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing (2012) 23. Spark SQL: Relational Data Processing in Spark (2015)	Ravi [slides] Pushkar
Th 3/14	NoSQL	OldSQL vs. NoSQL vs. NewSQL 24. Bigtable: A Distributed Storage System for Structured Data (2006)	Jiannan Ankita [slides]
Tu 3/19	NoSQL	25. Dynamo: Amazon's Highly Available Key-Value Store (2007) 26. CAP Twelve Years Later: How the "Rules" Have Changed (2012)	Jill [slides] Ohoud [slides]
Th 3/21	NewSQL	27. OLTP Through the Looking Glass, and What We Found There (2008) 28. Hekaton: SQL Server's Memory-optimized OLTP Engine (2013)	Lakshayy [slides] Arshvir
Tu 3/26	NewSQL	29. Efficiently Compiling Efficient Query Plans for Modern Hardware (2011) 30. Scalable SQL and NoSQL Data Stores (2010)	Kyle Mohan [slides]
Th 3/28	ML and SQL	31. Accelerating Machine Learning Inference with Probabilistic Predicates (2018) 32. NoScope: Optimizing Deep CNN-Based Queries over Video Streams at Scale (2017)	Jetic [slides] Prabhjot [slides]
Tu 4/2	ML and SQL	33. Learning to Optimize Join Queries With Deep Reinforcement Learning (2018) 34. The Case for Learned Index Structures (2018)	Muhammad [slides] Ruijia [slides]
Final Project
W 4/10	Final Project	Final Project Poster Session	Groups