UWT/TCSS 490 Data Streams Research Seminar Course Syllabus
Spring 2010

Instructors: Ankur M. Teredesai, Ph.D. and Mohamed Ali, Ph.D.
Location: PNK 212
Class sessions: Tuesday only, 1:30-3:30 pm

Course Overview:

Welcome. This course is an advanced-level computer science elective in the area of data streams. We will explore current topics in data streams research in order to understand and improve the state of the art. The focus is on the design and development of systems that operate on large streams of data. Familiarity with database systems, in particular database design, is expected.

Prerequisites:

TCSS 445/545 or equivalent database systems design experience at work or at internships. You should be proficient in database design and have an understanding of basic database system implementation techniques.

We also recommend that all students have prior experience with at least one programming language, such as C, C++, Java, or C#, and a demonstrated ability to implement algorithms and data structures. Students with no programming experience should not take this course.

Readings: The links to readings will be made available on Moodle during the term. Please bring your readings to each class session so that you can refer to them during discussion. You will additionally do a number of readings of your own choosing for the project and the accompanying research paper that you will write. Your instructors may assign specific readings to individual students.

Course Content:

Week #

Topic

Papers / Readings

1

- Data Streams: a Paradigm Shift

- Introduction to Microsoft StreamInsight

- Roger S. Barga, Jonathan Goldstein, Mohamed H. Ali, Mingsheng Hong: Consistent Streaming Through Time: A Vision for Event Stream Processing. CIDR 2007: 363-374

- Microsoft StreamInsight Product documentation (Getting started / Planning and architecture)

2

- StreamInsight Adapter framework

- StreamInsight Operators

- StreamInsight Extensibility framework

- Microsoft StreamInsight Product documentation (Creating adapters / Operations / Extensibility framework)

3

Project Description and Role Assignment

 

4

- Spatiotemporal data streaming

- Microsoft SQL Server Spatial Library

- Mohamed F. Mokbel, Xiaopeng Xiong, and Walid G. Aref. "SINA: Scalable Incremental Processing of Continuous Queries in Spatio-temporal Databases". In ACM SIGMOD 2004.

- Microsoft SQL Server product documentation

5

- An example data streaming operator: JOIN

- J. Kang, J. F. Naughton, and S. D. Viglas. Evaluating window joins over unbounded streams. In ICDE, March 2003.

- M. F. Mokbel, M. Lu, and W. G. Aref. Hash-merge join: A non-blocking join algorithm for producing fast and early join results. In ICDE, 2004.

 

6

 

- Approximate Query Processing

- Sampling and load shedding

 

- Rajeev Motwani, Jennifer Widom, et al. Query Processing, Resource Management, and Approximation in a Data Stream Management System. CIDR 2003.

 

- Tatbul et al. Load Shedding in a Data Stream Manager. VLDB 2003

 

- J. Kang, J. F. Naughton, and S. D. Viglas. Evaluating window joins over unbounded streams. In ICDE, March 2003.

 

7

 

Adaptive Query Processing

 

- Avnur & Hellerstein. Eddies: Continuously Adaptive Query Processing. SIGMOD 2000.

- Madden, Shah, Hellerstein and Raman. Continuously Adaptive Continuous Queries over Streams. SIGMOD 2002.

8

 

Continuous Query Optimization

- Viglas and Naughton. Rate-based Query Optimization for Streaming Information Sources. SIGMOD 2002.

- Chen, DeWitt and Naughton. Design and Evaluation of Alternative Selection Placement Strategies in Optimizing Continuous Queries. ICDE 2002.

 

9

 

Mining Online Data Streams

 

- Manku & Motwani. Approximate Frequency Counts over Data Streams. VLDB 2002

- Project Presentation

10

 

Sensor Networks

- Yao & Gehrke. Query Processing for Sensor Networks. CIDR 2003

- Project Presentation