CMPT 843: Traditional vs. Modern Database Systems (SFU, Spring 2019)

Motivation

The Big Data movement is attracting a growing number of new researchers to data-processing research. Meanwhile, the database community has been working on data-processing challenges for over 40 years. Numerous elegant ideas were proposed in the past, and many of them are widely applied in industry. There is therefore a strong need for new researchers to learn classical database knowledge, so that they can stand on the shoulders of giants rather than reinvent the wheel.

Topics

Given this purpose, the course will be divided into two parts.

  1. The first part will guide students through classical database papers published before 2000, on topics including Data Model, Relational Database Systems, Transaction Management, Query Optimization, Data Warehouse, and Approximate Query Processing.
  2. The second part will mostly cover papers published in the last ten years, on topics including MapReduce, Spark, Column Store, NoSQL, NewSQL, and ML over SQL.

Objectives

Through this traditional vs. modern view of data processing, the students should gain a much deeper understanding of the Big Data movement and form their own opinion on what's novel about Big Data systems.

Furthermore, since this is a graduate seminar, another important objective is to train students in the basic skills of a researcher. The course will create a number of opportunities for students to learn how to read a paper, how to write a paper review, how to give a good research talk, and how to ask questions during a talk.

Logistics

Pre-requisites

Grading

Blog Post

Final Project

Schedule

Date Topic Content Presenter
Th 1/3 Course Objective I Course Introduction and History of Database Systems Jiannan [slides]
Tu 1/8 Course Objective II Essential Skills Needed for a PhD Student (How to read & review a paper? How to give a talk? How to ask questions?) Jiannan [slides]
Part I: Traditional Database Systems and Techniques (before 2000)
Th 1/17 Background 1. Database Systems: Achievements and Opportunities (1990) Lakshayy [slides]
2. The Asilomar Report on Database Research (1998) Inder [slides]
Tu 1/22 Data Model 3. A Relational Model of Data for Large Shared Data Banks (1970)
4. What Goes Around Comes Around (1960-1970, Sec I~IV only)
Arshvir
Mehvish [slides]
Th 1/24 Traditional DBMS 5. A History and Evaluation of System R (1981)
6. The Design of Postgres (1986)
Ankita [slides]
Ruijia [slides]
Tu 1/29 7. An Overview of Data Warehousing and OLAP Technology (1997) Prabhjot [slides]
Th 1/31 Query Optimization 8. Access Path Selection in a Relational Database Management System (1979)
9. The Volcano Optimizer Generator: Extensibility and Efficient Search (1993)
Jill [slides]
Ravi [slides]
Tu 2/5 10. Eddies: Continuously Adaptive Query Processing (2000) Ohoud [slides]
Th 2/7 Transaction Management 11. Granularity of Locks and Degrees of Consistency in a Shared Data Base (1976, Part 1 only)
12. Granularity of Locks and Degrees of Consistency in a Shared Data Base (1976, Part 2 only)
Muhammad [slides]
Kyle [slides]
Tu 2/12 13. On Optimistic Methods for Concurrency Control (1981) Kiarash
Th 2/14 Interactive Analytics 14. Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-tab, and Sub-totals (1997)
15. An Array-Based Algorithm for Simultaneous Multidimensional Aggregates (1997)
Xiaoying Wang [slides]
Pushkar
Part II: Modern Database Systems and Techniques (after 2000)
Tu 2/26 Background 16. Challenges and Opportunities with Big Data (2011) Jetic [slides]
Column Store 17. C-Store: A Column-oriented DBMS (2005) Ruijia [slides]
Th 2/28 18. Dremel: Interactive Analysis Of Web-Scale Datasets (2010)
19. Column-Stores vs. Row-Stores: How Different Are They Really? (2008)
Mohan [slides]
Mehvish [slides]
Tu 3/5 MapReduce and Beyond Why MapReduce & Why Spark Jiannan
Th 3/7 20. MapReduce: Simplified Data Processing on Large Clusters (2004)
21. A Comparison of Approaches to Large-Scale Data Analysis (2009)
Parallel Database
Xiaoying [slides]
Inder [slides]
Tu 3/12 22. Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing (2012)
23. Spark SQL: Relational Data Processing in Spark (2015)
Ravi [slides]
Pushkar
Th 3/14 NoSQL OldSQL vs. NoSQL vs. NewSQL
24. Bigtable: A Distributed Storage System for Structured Data (2006)
Jiannan
Ankita [slides]
Tu 3/19 25. Dynamo: Amazon's Highly Available Key-Value Store (2007)
26. CAP Twelve Years Later: How the "Rules" Have Changed (2012)
Jill [slides]
Ohoud [slides]
Th 3/21 NewSQL 27. OLTP Through the Looking Glass, and What We Found There (2008)
28. Hekaton: SQL Server's Memory-optimized OLTP Engine (2013)
Lakshayy [slides]
Arshvir
Tu 3/26 29. Efficiently Compiling Efficient Query Plans for Modern Hardware (2011)
30. Scalable SQL and NoSQL Data Stores (2010)
Kyle
Mohan [slides]
Th 3/28 ML and SQL 31. Accelerating Machine Learning Inference with Probabilistic Predicates (2018)
32. NoScope: Optimizing Deep CNN-Based Queries over Video Streams at Scale (2017)
Jetic [slides]
Prabhjot [slides]
Tu 4/2 33. Learning to Optimize Join Queries With Deep Reinforcement Learning (2018)
34. The Case for Learned Index Structures (2018)
Muhammad [slides]
Ruijia [slides]
Final Project
W 4/10 Final Project Final Project Poster Session Groups


  © Jiannan Wang 2019