Given that the database community has been working on (big) data management for decades, it is natural to ask what is novel about modern Big Data systems. This is the key question we aim to answer in this graduate seminar. In this final project, you can show off what you have learned by doing one of two projects: (1) contribute to an open-source data system; (2) write down your personal view of big data systems.
Please choose ONE project type, and complete the following steps:
The table below summarizes the deadline for each phase:
ID | Phase | Deadline | Task |
---|---|---|---|
1 | Form a Team | Thursday 02/28 at 11:59 PM | Create your team in CourSys |
2 | Initial Plan | Tuesday 03/12 at 11:59 PM | Submit the filled form to the CourSys activity Initial Plan |
3 | Poster Session | Wednesday 04/10 at 10:00 AM; Wednesday 04/10 at 2:00 PM | Submit your poster to the CourSys activity Poster; present your poster at T9204 W&E |
4 | Report | Sunday 04/14 at 11:59 PM | Submit the report to the CourSys activity Report |
Please choose one project type, and follow the corresponding instructions below.
From a functionality point of view, big data systems are quite similar to traditional RDBMSs because both are designed to store and process data better. So why are people more excited about big data systems than traditional RDBMSs right now? After this graduate seminar, I believe everyone can form a personal view on this question.
If your final project is to write a paper about your personal view of big data systems, the paper should consist of two parts:
In the first part (at least 5 pages), you need to list all the aspects in which you think big data systems differ from traditional RDBMSs. For each aspect, you need to come up with a list of questions and answer each of them in detail. For example, if you think "flexibility" is one aspect, you need to explain why you think big data systems are more flexible, what key ideas/techniques make them more flexible, why you think being flexible is very important, what they sacrifice for being flexible, and how hard it is to make traditional RDBMSs as flexible as big data systems.
In the second part (at least 7 pages), you need to pick one specific topic (e.g., Query Optimization, Transaction Management, In-memory Database, Large-Scale Dataflow Engines), and do a survey on this topic. Here is a list of steps you need to follow:
Here is a suggested outline of the paper. You don't have to use it, but the paper has to cover the material mentioned above.
Submission
This is the best time to be a data system programmer. Almost all the mainstream big data systems are open source. As a data system programmer, you can not only learn how the systems work by directly reading their source code, but also contribute to the systems (e.g., by adding a new feature or fixing a bug). Being a contributor to an open-source project can be highly rewarding.
If your final project is to contribute to an open-source data system, here is a list of steps you need to follow.
Submission