CSPP 56553 - Artificial Intelligence
Winter 2004
Homework #3: Due February 4, 2004
Goals
Through this assignment you will:
- Consider issues in the design of memory-based systems.
- Explore the effect of different expansion strategies on information
retrieval
Background
Previous homeworks have explored the representation and manipulation of
knowledge to produce inferences. Using mechanisms such as rule-based systems
and constraint satisfaction, we have seen how knowledge about the world
and about specific problems can be encoded in different ways. Further,
we have seen how the choices of knowledge representation can be
manipulated in different ways. However, all these techniques share
the fundamental requirement that the knowledge be captured manually
by some knowledge engineer to enable later automatic
inference.
This homework begins our exploration of general techniques that can
use the regularities in observed data to acquire functions from
inputs vectors to output values or classifications. In other words,
we will explore systems that can automatically learn from examples.
We begin with two well-known and widely used learning techniques
that learn functions from labeled training data: nearest neighbor
and identification trees.
Problem 1
Picking Moves
You've studied both alpha-beta minimax game play and nearest neighbor
learning. An impatient friend suggests that all that search is
simply too complicated and unnecessary since there are many on-line
game servers that record the moves of a game. Your friend suggests
that rather than doing alpha-beta you should simply download all the
games from a server and use a memory-based technique. For any given
game position (e.g. board position in a board game like chess or
hand of cards in a card game), find a move in the server's collection
that matches that position exactly and then do the move made by the
player in the recorded game.
You argue that things, unfortunately, aren't that simple. Give
three problems with your friend's approach
to game-playing. Please keep your answers brief (2-3 lines each).
Problem 2
Information Retrieval
Formulate four (4) detailed queries in a domain you know well.
You will use these queries to test your intuition in the questions
below.
Part 1
Expand the queries developed about to include all of the morphological
variants of each query word. Submit these expanded queries to two
of your favorite search engines. Does such morphological processing
seem warranted?
Part 2
Using WordNet,
an online semantic ontology/thesaurus, expand your queries to include
all the synonyms of each word in the original queries aboce. Report
on the results of submitting the queries to the two search engines used
above.
Part 3
Again using WordNet, expand your queries to include only those
synonyms that are appropriate for the senses of the terms you
intended in the original query. Submit these expanded queries to
a set of search engines and compare the results to those in the
previous exercise.
Part 4
Word sense disambiguation - picking the right senses of synonyms for
expansion - seems to have little effect on retrieval effectiveness
in settings where long queries are used. Suggest reasons why this
might be the case.
Part 5: Relevance Feedback
In class, we suggested that you could use information from
the set of documents retrieved in an initial query to improve the
query formulation by adding new terms.
Positive evidence
Consider the case where you select five documents that you believe to
be relevant to use as a source of terms to add to the query. Which
terms would you select? Consider values that we suggested should
contribute to the weight of query terms. Explain why.
Positive and Negative evidence
Now assume that you have identified 5 non-relevant documents that
were nevertheless ranked highly by the first pass of the retrieval
system, in addition to the 5 relevant documents chosen above.
How would you modify your expansion process given this new
evidence? Justify your answer.
Problem 3
Building a Recommender System
You would like to build a book recommender system similar to that
on Amazon.com. Based on purchasing patterns - since it's hard to
get people to submit actual ratings, you would like to identify
similar shoppers, and recommend to them books that were purchased
other similar purchasers.
Formulate input feature vectors for training and classification
based on book categories and purchasing patterns. Given five
example vectors.
Describe the type of labels you assign as classifications - aka
the output.
Describe the similarity function for identifying the nearest neighbor.
Some people have similar tastes in book types, but some may be
much more voracious readers than others. Does this present
a problem for your similarity measures? If not, why not?
If it does, explain how to modify your measure to over this problem.