UWT/TCSS 490-590 Data Streams Research Seminar Course Syllabus
Autumn 2010

Instructors: Ankur M. Teredesai, Ph.D., Badrish Chandramouli, Ph.D. and Mohamed Ali, Ph.D.
Location: PNK212 conference room
Class sessions: Tuesdays and Thursdays, 4:15 pm - 6:20 pm

Course Overview:

Welcome. This course is an advanced computer science elective in the area of data streams. We will explore current topics in data streams research in order to understand and improve the state of the art. The focus is on the design and development of systems that operate on large streams of data. Familiarity with database systems, in particular database design, is expected.

Prerequisites:

TCSS 445/545 or equivalent database systems design experience at work or at internships. You should be proficient in database design and have an understanding of basic database system implementation techniques.

We also recommend that all students have prior experience with at least one programming language such as C, C++, Java, or C#, and a demonstrated ability to code algorithms and data structures. Students with no programming experience should not take this course.

Readings: Links to the readings will be made available on Moodle during the term. Please bring your readings to each class session so that you can refer to them during discussion. You will also choose a number of additional readings for the project and the accompanying research paper that you will write. Your instructors may assign specific readings to individual students.

Course Content:

There will be several aspects to this course:

1. A group project (2 students max)

2. Instructor-led and student-led lectures on class topics

3. Discussion on streams implementation techniques

4. Guest lectures from industry experts describing the latest research trends and implementation issues in commercial database systems

5. Demo days: Class sessions toward the end of the term where you demonstrate the projects you are working on

6. Annotated Bibliographies: For each reading assignment you will be expected to write an annotated bibliography. Bring a hard copy to class and hand it in to the instructor. The guidelines are provided as an attached document as well as on Moodle.

7. Class quiz: Each week the Tuesday class will begin with a short quiz based on the reading assignments for that week. This quiz is to ensure that you are taking the reading assignments seriously and to encourage a good discussion of topics in the classroom.

Class Schedule:

Columns: Week, Date, Day, Lecture, Homework/Project, Required Reading, Various Research Papers

Week 1, Thursday 9/30/2010
Lecture: Course Introduction, Introduction to Data Streams
Required Reading: An Introduction to Data Streams, Charu C. Aggarwal

Week 2, Tuesday 10/5/2010
Lecture: Overview of Data Stream Processing
Homework/Project: Download StreamInsight and related components
Required Reading: Data Stream Processing, by João Gama and Pedro Pereira Rodrigues

Week 2, Thursday 10/7/2010
Lecture: Projects: Explain and Discuss
Required Reading: Roger S. Barga, Jonathan Goldstein, Mohamed H. Ali, Mingsheng Hong: Consistent Streaming Through Time: A Vision for Event Stream Processing. CIDR 2007: 363-374

Week 3, Tuesday 10/12/2010
Lecture: Microsoft StreamInsight
Required Reading: Product Documentation

Week 3, Thursday 10/14/2010
Lecture: Introduction to Continuous Query Processing
Homework/Project: Class ends early at 5:30. Students to form teams and discuss projects
Required Reading: Mohamed F. Mokbel, Xiaopeng Xiong, and Walid G. Aref. "SINA: Scalable Incremental Processing of Continuous Queries in Spatio-temporal Databases". In ACM SIGMOD 2004.

Week 4, Tuesday 10/19/2010
Lecture: Continuous Query Processing: Latest Issues
Homework/Project: Assignment on continuous queries

Week 4, Thursday 10/21/2010
Lecture: Scheduling

Week 5, Tuesday 10/26/2010
Lecture: Load Shedding Strategies in Data Streams
Required Reading: Tatbul et al. Load Shedding in a Data Stream Manager. VLDB 2003

Week 5, Thursday 10/28/2010
Lecture: Load Shedding Strategies in Data Streams
Homework/Project: Assignment on load shedding

Week 6, Tuesday 11/2/2010
Lecture: Join Processing
Required Reading: J. Kang, J. F. Naughton, and S. D. Viglas. Evaluating window joins over unbounded streams. In ICDE, March 2003. (No annotated bibliography.)

Week 6, Thursday 11/4/2010
Lecture: Join Processing, Indexing
Homework/Project: Assignment on Join Processing
Required Reading: Indexing Data Streams - Charu's Text

Week 7, Tuesday 11/9/2010
Lecture: CM sketch, sorting, etc.; Query Optimization
Required Reading: CM Sketch algorithm; Sort Me If You Can: How to sort dynamic data - Aris Anagnostopoulos et al.
Paper on Query Optimization TBD

Week 7, Thursday 11/11/2010
University Holiday - Veterans Day
Homework/Project: N/A
Required Reading: N/A

Week 8, Tuesday 11/16/2010
Lecture: Stream vs Complex Event Processing Systems; Evolving Social Streams
Required Reading: International Workshop on Modeling, Managing and Mining of Evolving Social Networks (M3SN), in conjunction with IEEE International Conference on Data Engineering (ICDE 2010), Long Beach, CA, USA (Sindhura)

Week 8, Thursday 11/18/2010
Lecture: Dynamic Pattern Matching, Indexing/Topic Modeling, Graph and geometric problems in streams
Required Reading:
Badrish Chandramouli, Jonathan Goldstein, and David Maier. High-Performance Dynamic Pattern Matching over Disordered Streams. In Proceedings of the 36th International Conference on Very Large Data Bases (VLDB '10), Singapore, September 2010
Topic Modeling:
Yao, L., Mimno, D., and McCallum, A. 2009. Efficient methods for topic model inference on streaming document collections. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Paris, France, June 28 - July 01, 2009). KDD '09. ACM, New York, NY, 937-946. DOI= http://doi.acm.org/10.1145/1557019.1557121
Stream Graph processing:
(1) Short introduction: http://www.cs.umass.edu/~mcgregor/papers/08-graphmining.pdf
(2) Atish Das Sarma, Sreenivas Gollapudi, and Rina Panigrahy. 2008. Estimating PageRank on graph streams. In Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS '08). ACM, New York, NY, USA, 69-78. DOI=10.1145/1376916.1376928 http://doi.acm.org.offcampus.lib.washington.edu/10.1145/1376916.1376928
Data Stream Algorithms, Notes from lecture series by S. Muthu Muthukrishnan

Week 9, Tuesday 11/23/2010
Lecture: CM sketch, sorting, etc.
Required Reading: CM Sketch algorithm; Sort Me If You Can: How to sort dynamic data - Aris Anagnostopoulos et al.

Week 9, Thursday 11/25/2010
University Holiday - Thanksgiving Day
N/A

Week 10, Tuesday 11/30/2010
Lecture: Clustering in Data Streams
Required Reading: Charu Aggarwal's Clustering in Data Streams chapter

Week 10, Thursday 12/2/2010
Lecture: Indexing/Topic Modeling, Dynamic Pattern Matching
Required Reading: Badrish Chandramouli, Jonathan Goldstein, and David Maier. High-Performance Dynamic Pattern Matching over Disordered Streams. In Proceedings of the 36th International Conference on Very Large Data Bases (VLDB '10), Singapore, September 2010

Week 11, Tuesday 12/7/2010
Lecture: Frequent Pattern Mining/Classification/Social Streams

Week 11, Thursday 12/9/2010
Lecture: Final Project Demos