Syllabus:

Catalog Description: Covers established methods and current developments for finding information on the web. Understanding search engines: information extraction and indexing, vector space model, latent semantic indexing, ranking. Current trends: tagging (Web 2.0), people search, XML retrieval, question answering, local search, recommender systems, Semantic Web.


Instructors: Profs. Ankur M. Teredesai and Martine De Cock


Course Objectives:

To teach students to:

·     Understand the basics of search engine functionality.

·     Understand search design issues and performance constraints.

·     Understand search and query processing in small and large networks.

·     Understand current trends in search.


Course Outcomes:

Upon successful completion of the course, students shall be able to:

·     Explain the basic principles of information indexing and ranking.

·     Explain the current state of the art in the Semantic Web.

·     Design search technologies as part of larger systems.

·     Describe the issues facing the latest social web technologies.


WHO SHOULD TAKE THIS COURSE:

The course is structured to attract any UW student who is interested in learning about web search. Students who have completed undergraduate courses in data structures and statistics/discrete mathematics will find this course enjoyable and interesting. Experience with Java/PHP and MySQL query processing is desirable.


Workload and Style:

The course will involve discussions, research readings, and a term-long group project.

Each week, the discussion will be led by the instructor, a guest lecturer, or a student from the class. The discussions will be organized around the topics outlined for the course. The hope is that these discussions will be informal and fun, supported by in-class exercises that enhance the quality of learning. Hence, many sessions will include in-class activities that require participation from all students.


GRADING SCHEME:

Project - Total (40%): (a) working code that can be shared with the community (20%), (b) paper/report outlining the problem and results (15%), (c) weekly status presentation updates (5%).

Reading Assignments - Blog Posts (23%), Comments (2%).

Class Presentation and Discussion Leading - (25%).

Participation - (10%).


Reading Assignments:

Typically, one article or manuscript will be provided for each reading review. You MUST read and assimilate the information in the article before the class discussion. In addition, your review of the reading must be posted at least 24 hours before the class meeting.

Reviews for reading assignments should be submitted to the instructor. You may blog your reviews and publish them as an RSS feed. After posting, you may also read and comment on other students' reviews and posts to enhance the quality of discussion. Remember that even when the authors have made a sincere attempt to develop a flawless idea or theory, the work can still have flaws. Our aim is to look beyond these flaws: not merely to critique the effort, but to gain an understanding of the subject area and of the authors' contribution to advancing the state of the art. Ideally, your reviews should offer constructive comments about the work and the scope for its improvement.


TERM PROJECT:

Individual project.

Scope: The area of social networks is broadly construed to include the variety of topics in the list of topics. The goal is to develop an idea that explores one of these topics and advances the current understanding of its problem domain.


The instructor will be available for consultation on project scope and ideas before the start of the quarter and during the first week of classes. Note that your project can focus on one or more social network platforms, or on offline social networks, but you must prepare a project plan and get it approved by the instructor.


Timeline: In a quarter system, it is important to get a head start on identifying the project. The first three sessions are geared toward helping you identify a project topic and get the idea approved. During the fourth session, you will present the idea to the class and fine-tune the deliverables based on class discussion.


The final deliverable will include a paper of no more than 10 pages, in ACM Proceedings format, reflecting your contribution to the area of social networks. A concise, focused presentation of ideas and theories carries more weight than loose statements that merely add length to the paper. Please keep this in mind while writing.

TCSS590A - Web Search

T-Th 4:30 - 6:45 pm PNK104

Topics:

1.     Wk 1 - Search Basics: Understanding search engines, document retrieval, indexing, PageRank, vector space model, LSI (latent semantic indexing).

2.     Wk 2 - SEO (search engine optimization), Monetization, HITS (hyperlink-induced topic search), Indexing: R-tree, KD-tree

3.     Wk 3 - NLP and Text Mining Basics: stop-word removal, stemming, POS (part-of-speech) tagging, text categorization

4.     Wk 4 - Recommender Systems Basics: Collaborative Filtering, Machine Learning

5.     Wk 5 - Recommender Systems II: Trust-Enhanced Recommendation, Blog Recommendation

6.     Wk 6 - Focused Search Basics: Local Search, Question Answering

7.     Wk 7 - Focused Search II: NLP Issues

8.     Wk 8 - Semantic Web: Overview, OWL (Web Ontology Language), SAWSDL (Semantic Annotations for WSDL and XML Schema)

9.     Wk 9 - Semantic Web: Ontology Learning, From Unstructured to Structured Text

10.   Wk 10 - Projects