CMPT 884: Human-in-the-loop Data Management (SFU, Fall 2016)

In the Big Data era, humans play an increasingly important role in almost every phase of data management. For example, humans can act as Data Producers who generate data (e.g., Twitter), as Data Processors who annotate or process data (e.g., Amazon MTurk), as Data Scientists who analyze data interactively (e.g., Jupyter, Spark), and as Data Consumers who benefit from the value extracted from data (e.g., Business/Healthcare Intelligence).

Because of this, human-in-the-loop data management has recently become an active research topic in many fields, including Databases, Machine Learning, HCI, and Visualization. This course focuses on recent research that treats humans as Data Processors or Data Scientists.

This graduate seminar has two objectives:

  1. Introducing students to cutting-edge research on Human-in-the-loop Data Management;
  2. Training students to master the basic skills of a researcher.

To achieve the first objective, the course selects a list of papers on different aspects of the topic, and students present each paper and lead a discussion in class. To achieve the second objective, the course creates a number of opportunities for students to learn how to read a paper, how to write a paper review, how to give a good research talk, and how to ask questions during a talk.

Logistics

Pre-requisites

Grading

Assignments

Final Project

Schedule

If you are a speaker, please see this doc for how to upload your slides after the presentation.
If you ask a question during a Q&A session, please record it in this form (one question per row).

Date | Topic | Content | Presenter
Wed 9/7 | Course Objective I | Introduction to Human-in-the-loop Data Management | Jiannan [slides]
Mon 9/12 | Course Objective II | Essential Skills Needed for a PhD Student (How to read & review a paper? How to give a talk? How to ask questions?) | Jiannan [slides]

Part 1: Crowdsourced Data Processing (Human as Data Processor)
Wed 9/14 | Background | Crowdsourcing systems on the world-wide web | Jiannan [slides]
Mon 9/19 | Systems and Programming Models | CrowdDB: Answering Queries Using Crowdsourcing; TurKit: Human Computation Algorithms on Mechanical Turk | Sima [slides]; Han Shen [slides]
Wed 9/21 | Systems and Programming Models | CrowdForge: crowdsourcing complex work | Han Bao [slides]
Mon 9/26 | Quality / Latency Control | Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers; SQUARE: A Benchmark for Research on Computing Crowd Consensus | Akash [slides]; Srikanth [slides]
Wed 9/28 | Quality / Latency Control | CLAMShell: Speeding up Crowds for Low-latency Data Labeling | Yan [slides]
Mon 10/3 | Data Annotation | Labeling images with a computer game; ImageNet: A Large-Scale Hierarchical Image Database | Akshay [slides]; Nazanin [slides]
Wed 10/5 | Data Annotation | Cheap and Fast — But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks | Yifang [slides]
Mon 10/10 | Thanksgiving | No classes | -
Wed 10/12 | Crowdsourced Operators I | Human-powered Sorts and Joins | Xiaoyi [slides]
Mon 10/17 | Crowdsourced Operators I | CrowdER: Crowdsourcing Entity Resolution; Leveraging Transitive Relationships for Crowdsourced Joins | Ruochen [slides]; Loong [slides]
Wed 10/19 | Crowdsourced Operators II | Cascade: Crowdsourcing Taxonomy Creation | Bandeep [slides]
Mon 10/24 | Crowdsourced Operators II | Using the crowd for top-k and group-by queries; Crowdscreen: Algorithms for filtering data with humans | Mohan [slides]; Venkatesh [slides]

Part 2: Interactive Analytics (Human as Data Scientist)
Wed 10/26 | Background | IPython: A System for Interactive Scientific Computing | Rashmisnata [slides]
Mon 10/31 | Background | Enterprise data analysis and visualization: An interview study; The Emerging Role of Data Scientists on Software Development Teams | Abhishek [slides]; Si [slides]
Wed 11/2 | Interactive Data Cleaning | Wrangler: Interactive Visual Specification of Data Transformation Scripts | Eshan [slides]
Mon 11/7 | Interactive Data Cleaning | SampleClean: Fast and Accurate Query Processing on Dirty Data; Scorpion: Explaining Away Outliers in Aggregate Queries | Jinglin [slides]; Sha [slides]
Wed 11/9 | Interactive Visualization | Polaris: A System for Query, Analysis, and Visualization of Multidimensional Relational Databases | Walther [slides]
Mon 11/14 | Interactive Visualization | Prefuse: a toolkit for interactive information visualization; SEEDB: Efficient Data-Driven Visualization Recommendations to Support Visual Analytics | Pei [slides]; Kiana [slides]
Wed 11/16 | Interactive Visualization | imMens: Real-time Visual Querying of Big Data | Saeedeh [slides]
Mon 11/21 | Interactive Machine Learning | Active Learning Literature Survey (Sec 1-4); ActiveClean: Interactive Data Cleaning For Statistical Modeling | Lovedeep [slides]; Mohamad [slides]
Wed 11/23 | Interactive Machine Learning | Power to the People: The Role of Humans in Interactive Machine Learning | Saif [slides]
Mon 11/28 | Interactive SQL Analytics | Spark SQL: Relational Data Processing in Spark; Implementing Data Cubes Efficiently | Mangesh [slides]; Manpreet [slides]
Wed 11/30 | Interactive SQL Analytics | BlinkDB: queries with bounded errors and bounded response times on very large data | Jacky [slides]

Final Project
Wed 12/7 | Final Project | Final Project Poster Session | Groups

  © Jiannan Wang 2016