Cite as:
Efthimiadis, E.N. "End-user's understanding of thesaural knowledge structures in interactive query expansion." In: Advances in Knowledge Organization. Proceedings of the Third International ISKO Conference, Copenhagen Denmark, June 20-24, 1994. Hanne Albrechtsen and Susanne Oernager, eds. Frankfurt am Main: Indeks Verlag, vol.4, 1994, pp 295-303.
The process of term selection for query expansion by
end-users is discussed within the context of a study of interactive query
expansion in a relevance feedback environment.
This user study focuses on how users perceive and understand term
relationships, such as hierarchical and associative relationships, in their
searches.
End-users can now have direct access to databases
without having to consult a search intermediary. End-user searching therefore raises a very important question:
"how effective are the end-users in their searching?" Since "end-users" do not have the
formal training that "intermediaries" have, it is important to study
their searching behavior and their understanding of knowledge structures used
in information retrieval, such as thesauri, in order to identify patterns, styles, and especially
problems. The knowledge gained from
such studies can then be used to suggest methods for improving the existing
systems and, therefore, make the end-user searching process less frustrating to
the end-user during the interaction, as well as more efficient in terms of
retrieval.
The research presented in this paper is an investigation of interactive query expansion[1] within a relevance feedback system (Efthimiadis, 1992).
To investigate the process of query
expansion as well as its effectiveness, one needs a real system.
Therefore, real users with their real requests were studied in an operational
environment, as opposed to searching a static test collection with fixed
(artificial) queries, in order to study query expansion and relevance feedback
in a dynamic user-centered environment. For the research reported here the
INSPEC database, on both Data-Star and ESA-IRS, was searched online using CIRT,
a front-end system that allows weighting, ranking and relevance feedback
(Robertson & Thompson, 1990).
This research is among the first user studies to have
been conducted in an operational system
with relevance feedback. It has investigated, among other things, the processes of
interactive query expansion, term
selection for query expansion, and the user perception and understanding of term
relationships.
In the context of a real system, real requests, and real
interaction, particular importance is given to the characteristics of the
interaction. The selection of terms at a particular stage in the search
process is, for example, the most
obvious characteristic. More
specifically, the process of term selection by the users is of particular interest for understanding the
users' searching behavior and its
implication for the design of retrieval systems.
This paper focuses only on the term selection
characteristics with respect to query expansion. The other facets of this research on interactive query expansion, namely the ranking algorithms and the overall
user study, are reported elsewhere (Efthimiadis,
1993; 1994). The research objectives
specific to query expansion that are reported here were to establish:
* the characteristics of user selection of terms that could be added to the
search (i.e., query expansion);
* how end-users perceive term relationships;
* what types of relationships (as specified by the end-user) exist between the
initial query terms and the query expansion terms.
Data was collected from 25 searches. The data collection
mechanisms included questionnaires, transaction logs, and relevance
evaluations.
The questionnaires elicited information about the query
expansion terms that were selected by the user from a list of terms that was
drawn from the descriptors and the identifiers of records that users had
previously seen and judged relevant. The questions concentrated on how the user
perceived the relationship between the
query terms and the term that s/he selected for query expansion. End-users were
asked to identify whether the query expansion terms were chosen because they
were thought of as being: (a) synonyms
of query terms, (b) related terms, (c) the best alternative to express the
subject that they could find in the list, or (d) terms that were representing
new ideas.
In addition, for each term users were also asked to
identify whether there was a correspondence between each new term and the
initial query terms, as well as the type of relationship that held between
them, e.g., broader, narrower or related term relationships.
The variables examined were divided into seven
categories which include: user characteristics, request characteristics,
subjective user reactions, retrieval effectiveness, user effort, search process
characteristics and term selection characteristics. These were collected and evaluated via questionnaires and logs.
The categories are discussed below.
• User characteristics. This category
covers user-specific data which are pertinent to the search process, for
example, areas of subject expertise, the level of their work, and the number of
previous online searches, either with or without an intermediary.
• Request characteristics. Scaled data were
elicited on context and problem characteristics, e.g., subject area of the
request, nature of enquiry (accurate or vague), and type of search required
(broad or narrow). These made it possible to
relate such issues as specifiability of problem, clarity of problem, work done
on problem, etc. to type of search required and success of search.
• Search process characteristics. The
questionnaires on the outcome of the search elicited scaled data on the
effectiveness and efficiency of the search. These made it possible to estimate
the degree of success or failure of the interaction from the point of view of
the user, and thus to relate it to searching behavior.
• Subjective user reactions. This category is
mostly concerned with the user's overall reactions to the search, impressions
of effort involved, and reaction to search results (not offline
prints). It also included variables
from other categories, such as how close the search was to the original/intended
enquiry, and whether the expected number of references was retrieved.
• Term selection characteristics. This
questionnaire elicited information about the query formulation and
reformulation of terms that were selected by the users.
The questions concentrated on how
the user perceived the relationship between the query terms and the term that
s/he selected for query expansion. Users were asked to identify whether the
query expansion terms were chosen because they were thought of as being
synonyms of query terms, related terms, the best alternative to express the
subject that they could find in the list, or representing new ideas.
The data collection instruments used in the study can be
divided into three categories: questionnaires, transaction logs
and offline prints. A summary of the instruments is given below and the
discussion that follows is presented by category.
• Pre‑search questionnaire.
Background information and context
of information request. Users' personal
data, assessment of subject enquiry, user's own online experience, etc.
• Search process log.
Search summary and notes, number of terms, online time, etc.
• Query expansion questionnaire.
(a) Term identification and
selection for query expansion from ranked list, and
(b) Term relationships
• End of search questionnaire.
User's satisfaction, impression of
search, assessment of query and results, and of the number of references
expected.
• Evaluation of offline prints.
Relevance judgements on a 3 point
scale and subdivisions.
• Final questionnaire.
Final assessment of search as a
whole and user remarks.
• Other logs.
Logs of the interaction between
searcher-CIRT, CIRT-vendor/host, searcher-ESA.
The methodology is summarized and presented below in a
step-by-step manner. The summary is followed by a discussion
of the steps, which describes the rules and guidelines that were employed as well
as how these worked in practice.
Steps:
• pre-search interview; user completes questionnaire (questions 1-11)
• select initial query terms (search notes)
• search INSPEC in Data-Star using CIRT:
  - online relevance judgements
• parse records of all positive relevance judgements and extract from the
  descriptor field candidate terms for query expansion
• weight and rank all candidate terms (identified in the previous step) for
  query expansion
• print (and display on the screen) the ranked list of terms
• user evaluates the ranked list:
  - user identifies all terms thought to be useful for the search
  - user selects query expansion terms
  - user completes questionnaire on term selection (questions 12-13)
• continue the search using CIRT:
  - add query expansion terms
  - search
  - print retrieved documents
  - logoff
• user completes questionnaire assessing the search process before seeing and
  evaluating the search results (questions 14-18)
• offline evaluation
The steps presented above together with a set of
procedural guidelines helped in providing a consistent method for data
collection.
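The parse, weight and rank steps listed above can be illustrated with a minimal sketch. It rests on assumptions that are not part of the study: the record layout, the function name `rank_candidate_terms`, the exclusion of terms already in the query, and above all the weighting, which here is just the number of relevant records containing a term and stands in for whatever weighting the front-end actually applied.

```python
from collections import Counter


def rank_candidate_terms(relevant_records, query_terms, field="descriptors"):
    """Rank terms drawn from one field of the records judged relevant.

    ASSUMPTION: a term's weight is simply the number of relevant records
    that contain it. The study's own weighting formula is not given here.
    """
    already_in_query = {term.lower() for term in query_terms}
    weights = Counter()
    for record in relevant_records:
        # Use a set so each term counts once per record.
        for term in {t.lower() for t in record.get(field, [])}:
            if term not in already_in_query:  # assumption: skip query terms
                weights[term] += 1
    # Highest weight first; alphabetical order breaks ties deterministically.
    return sorted(weights.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical records, invented for illustration only.
records = [
    {"descriptors": ["information retrieval", "thesauri"]},
    {"descriptors": ["thesauri", "query languages"]},
    {"descriptors": ["Thesauri", "information retrieval", "feedback"]},
]
ranked = rank_candidate_terms(records, ["information retrieval"])
for term, weight in ranked:
    print(f"{weight:>3}  {term}")
```

A ranked list of this kind is what the user would then evaluate on screen and on paper; any other term-weighting scheme could be substituted without changing the surrounding procedure.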
The records that were identified as relevant by the user
during the online relevance feedback interaction were parsed and the terms were
weighted. The terms were presented in the form of ranked lists and were
displayed on the screen as well as printed on paper. The users were asked to go through the list of terms twice:
(a) to identify ALL the terms that they considered as
being good for the purpose of the search;
(b1) to select the 5 best terms of those identified as
good in step (a);
(b2) to rank the 5 terms they selected in descending
order of importance to them.
The evaluation of the ranked list and the selection of
the query terms was followed by the completion of the questionnaire on term
selection.
The results discussed here are based on the 25 searches
finally obtained. The analysis was
essentially directed at the query expansion aspects of the searches, on the
ranking algorithm and on retrieval effectiveness. The discussion throughout is mostly qualitative in nature. The
data have also been subjected to appropriate statistical analysis using various
tests. However, because the sample is
small there are occasions where the results are presented only with the
intention to demonstrate some trends and to facilitate the discussion and there
are no claims of any statistical significance.
After the evaluation of the ranked list, the users
completed the questionnaire on term selection. The variables involved and the
procedure for collecting the data have been described above.
The results are presented in two stages. First the
answers to the questionnaire are analyzed and discussed and then the user
preferences of query expansion terms are compared to the system's suggestions.
First, users identified all the terms in the ranked
list that they thought of as being useful for the purpose of the search, i.e. they
selected terms that would be acceptable to use in the search. The results of
this part of the selection process are given in Table 1, which presents the
total number of terms in each list and the number of terms selected by the
users.
The percentage of terms chosen by the users from the
ranked lists ranged from a minimum of 4% to a maximum of 87% with a mean of 28%. This implies that on average about one third
of the terms given in the lists are thought to be useful, which corresponds to
an average of approximately 18 terms. If, however, "search #128" is excluded as
an extreme case, because 98 terms were selected out of 113, then the average
drops to 15 terms.
       Terms    Terms                  Terms    Terms
User   in list  chosen    %     User   in list  chosen    %
101       62      10     16     118       55      32     58
102       38       9     24     119      117      34     29
103      137      11      8     120       44       9     20
105       77      14     18     121       61      17     28
108       33       8     24     122       60      13     22
110       61      10     16     123       61      12     20
111       48      31     65     124       80      27     34
112       93       6      6     125       41      21     51
113       65      24     37     126       39       7     18
114       62      15     24     127       62      11     18
115       42       9     21     128      113      98     87
116       77       3      4     129       34       8     24
117       64      25     39
Table 1: Totals and percentages of terms in the
ranked lists and of terms chosen by the subjects.
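The summary figures quoted for Table 1 can be recomputed directly from its rows. The sketch below transcribes the table and derives the minimum, maximum and mean percentage, and the mean number of terms chosen with and without search #128. The function and variable names are illustrative, and the assumption that the reported mean is the mean of the per-search percentages is mine.

```python
# (user, terms in list, terms chosen), transcribed from Table 1.
TABLE_1 = [
    (101, 62, 10), (102, 38, 9), (103, 137, 11), (105, 77, 14), (108, 33, 8),
    (110, 61, 10), (111, 48, 31), (112, 93, 6), (113, 65, 24), (114, 62, 15),
    (115, 42, 9), (116, 77, 3), (117, 64, 25), (118, 55, 32), (119, 117, 34),
    (120, 44, 9), (121, 61, 17), (122, 60, 13), (123, 61, 12), (124, 80, 27),
    (125, 41, 21), (126, 39, 7), (127, 62, 11), (128, 113, 98), (129, 34, 8),
]


def summarize(rows):
    """Summary statistics over (user, terms in list, terms chosen) rows."""
    percentages = [100 * chosen / in_list for _, in_list, chosen in rows]
    return {
        "searches": len(rows),
        "min_pct": min(percentages),
        "max_pct": max(percentages),
        # ASSUMPTION: the mean is taken over the per-search percentages.
        "mean_pct": sum(percentages) / len(percentages),
        "mean_chosen": sum(chosen for _, _, chosen in rows) / len(rows),
    }


all_searches = summarize(TABLE_1)
without_128 = summarize([row for row in TABLE_1 if row[0] != 128])

for label, stats in (("all searches", all_searches), ("without #128", without_128)):
    print(label, {key: round(value, 2) for key, value in stats.items()})
```

Computed this way, the mean number of terms chosen is a little above 18 for all 25 searches and a little above 15 once search #128 is left out, which agrees with the rounded figures given in the text.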
The questionnaire on query expansion asked the users two
questions, each corresponding to some aspect of the term selection process they
had followed. First, the users were asked for
the reason(s) that made them choose those terms: how did they think the selected
terms related to the search and to the original query terms? They were asked to
consider such relationship(s) for all
terms collectively rather than for each term individually. Users were given
four options to choose from and they could select as many as they thought
appropriate. The options were: (a)
variant expressions or synonyms; (b) alternative (related) terms; (c) couldn't
find better term(s) to express the subject of the enquiry; and (d) representing
new ideas (i.e. not part of your original request).
64%  variant expressions or synonyms
88%  alternative (related) terms
 4%  couldn't find better term(s) to express the subject of the enquiry
44%  representing new ideas (i.e. not part of your original request)
Table 2: Association of terms identified from the ranked list to query.
Table 2 summarizes the results of the user perceived
association between the terms identified from the ranked list to the query. The
percentages given in Table 2 do not amount to a total of 100% because users
were asked to select as many of the four options as applied to the terms. Users
thought of the terms they selected as being alternative (related) terms to the
query terms 88% of the time and as variant expressions or synonyms 64% of the time.
These two categories account for the majority of the responses. A very small
percentage (4%) chose the terms because they could not find a better term from
those on the list to express the subject of the query. A rather interesting
finding comes from terms that do not relate directly to the original query
terms and which represent new ideas. These accounted for 44% of the responses.
This finding demonstrates the unpredictability that is
involved in subject searching and the difficulties imposed on information
retrieval. Additional information was not collected for this category. In retrospect I think some questions could
have been included to elicit information about the terms that represent new
ideas. Further research is therefore needed in this area, more specifically
on the relationships of the `new ideas' to the original query. What was the
reason for choosing these terms? Was the user aware of these new
concepts/ideas at the beginning of the search? If yes, why were these not
expressed at the pre-search interview? Was the reason for the exclusion
interview-related, e.g. communication failure, or did the user choose to
exclude them at the interview stage because s/he thought of them as
peripheral? If users had not thought of
these concepts at an earlier stage, did they know about them before? Did they
recognize and choose these concepts as the result of a learning process during
the search? On the whole, what were the
reason(s) and stimuli that made them choose these new terms? Answers to these
questions, I believe, will contribute to the understanding of the users'
searching behavior.
The second question asked the users how they perceived
the relation of the five best terms to the original query terms. As mentioned earlier, users were asked to
select the 5 best terms of all those they had identified as useful. The
questions on term relationship concentrated on whether there was a
correspondence between each of the 5 new terms and the query terms. If the user
identified a correspondence between them, the type of correspondence was noted. The term relationships
that users were asked to select from are among the standard types of
relationships found in thesauri, i.e. broader term and narrower term for
hierarchical relationships and related term for affinitive/associative
relationships.
The relationships of the 5 best terms to the query terms
are shown in Table 3. For 34% of the
chosen terms there is no relationship or other type of correspondence to a
query term. The remaining 66% of the terms are divided as follows. A narrower
term relationship between a selected term and a query term accounts for 70% of
the responses. A broader term relationship accounts for 5%. An associative
relationship (i.e. related term) holds for 25% of the terms.
34%  No relationship
66%  Some relationship:
        5%  Broader
       70%  Narrower
       25%  Related
Table 3: Relationship of the user-selected 5 best terms for query
expansion to the initial query terms.
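Because the percentages in Table 3 are nested (the broader, narrower and related shares apply only to the 66% of terms with some relationship), it can help to express them as shares of all the selected terms. The sketch below does that arithmetic; the overall shares it derives are not figures reported by the study, and the variable names are illustrative.

```python
NO_RELATIONSHIP = 34    # % of the 5 best terms with no relationship
SOME_RELATIONSHIP = 66  # % of the 5 best terms with some relationship
WITHIN_SOME = {"broader": 5, "narrower": 70, "related": 25}  # % of the 66%

# Share of ALL selected terms for each relationship type (derived figures).
overall = {kind: SOME_RELATIONSHIP * pct / 100 for kind, pct in WITHIN_SOME.items()}
overall["none"] = float(NO_RELATIONSHIP)

# Hierarchical = broader + narrower.
hierarchical_within = WITHIN_SOME["broader"] + WITHIN_SOME["narrower"]
hierarchical_overall = overall["broader"] + overall["narrower"]

for kind, share in overall.items():
    print(f"{kind:<9}{share:5.1f}% of all selected terms")
print(f"hierarchical: {hierarchical_within}% of related terms, "
      f"{hierarchical_overall:.1f}% of all selected terms")
```

On this reading, hierarchical relationships cover 75% of the terms for which a relationship was identified, which is roughly half of all the 5 best terms.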
From these results it can be established that
approximately 75% of the term associations (that is, of the terms for which some
relationship was identified) fall within a hierarchical
relationship. Users have overwhelmingly selected narrower terms as the terms
for query expansion. This finding is very important and emphasizes the possible
advantages of using an online thesaurus. Such a thesaurus
could assist users in looking up terms, establishing their relationships,
deciding on term inclusion, and exploiting hierarchical
relationships. A possible way to include terms could be in clusters or in hierarchies, for example in a fashion similar to that of the `explode' command in
Medline, where an `exploded' term retrieves itself as well as all the terms in
the hierarchy beneath it. However, all these should be user-controlled or
`machine-aided' operations rather than entirely automated. This suggestion
comes from the poor results achieved from automatic query expansion in earlier
information retrieval experiments.
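As an illustration of the kind of user-controlled `explode' operation suggested here, the following sketch walks a narrower-term hierarchy and then lets the user accept or reject each exploded term. It is not the Medline implementation. The thesaurus fragment, the function names `explode` and `expand_with_confirmation`, and the `accept` callback are all hypothetical.

```python
def explode(term, narrower):
    """Return the term followed by every term beneath it in the hierarchy."""
    found = []
    stack = [term]
    while stack:
        current = stack.pop()
        if current in found:  # guard against cycles or repeated entries
            continue
        found.append(current)
        # Reverse so that narrower terms come out in their listed order.
        stack.extend(reversed(narrower.get(current, [])))
    return found


def expand_with_confirmation(term, narrower, accept):
    """Machine-aided expansion: keep the term plus the exploded terms
    that the user accepts via the `accept` callback."""
    return [t for t in explode(term, narrower) if t == term or accept(t)]


# Hypothetical narrower-term (NT) links; not taken from a real thesaurus.
NARROWER = {
    "information retrieval": ["query formulation", "relevance feedback"],
    "query formulation": ["query expansion"],
}

exploded = explode("information retrieval", NARROWER)
# Stand-in for an interactive prompt: the user rejects one suggestion.
chosen = expand_with_confirmation(
    "information retrieval", NARROWER, lambda t: t != "relevance feedback"
)
print(exploded)
print(chosen)
```

Keeping the acceptance step as a separate callback leaves the decision with the user, in line with the preference for machine-aided rather than fully automatic expansion.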
The study has provided useful information about
interactive query expansion and relevance feedback through a front-end system
to online databases. The aim was to look at the process of interactive query
expansion when searching in an operational situation with real users and see
what can be learnt from it.
The methodology provided consistency during searching
and proved to be effective. There were no surprises during the data
collection, and ambiguities were resolved by following the guidelines set by the
methodology or as the occasion demanded.
A pattern that emerges from the user selection of terms
for query expansion suggests that about one third of the terms from a list of
candidate terms are potentially useful. This finding may have implications for
the design of a query expansion module for the selection of terms. Such a module
can be applied to both types of query expansion, interactive or automatic. The suggested cutoff level of approximately 20 terms for the display of terms for query expansion may not be representative
of the required number of useful terms, and therefore the option of using one
third of the terms may be more appropriate.
Since one third of the total number of terms in some systems may be
quite a large number, a browsing option can facilitate the
identification and selection of terms.
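The difference between a fixed cutoff and a proportional one can be made concrete with a short sketch. The function names, the rounding up of the one-third cutoff, and the page size used for browsing are assumptions made for illustration. The two list lengths are the shortest and longest lists in Table 1.

```python
def display_cutoff(list_length, fixed_cutoff=None, denominator=3):
    """Number of candidate terms to show from a ranked list.

    With fixed_cutoff set, show at most that many terms; otherwise show
    one part in `denominator` of the list, rounded up (one third by default).
    """
    if fixed_cutoff is not None:
        return min(list_length, fixed_cutoff)
    return -(-list_length // denominator)  # ceiling division on integers


def pages(ranked_terms, page_size=20):
    """Split the displayed terms into pages so a long list can be browsed."""
    return [ranked_terms[i:i + page_size]
            for i in range(0, len(ranked_terms), page_size)]


# List lengths taken from Table 1: the shortest (33) and longest (137) lists.
for length in (33, 137):
    fixed = display_cutoff(length, fixed_cutoff=20)
    third = display_cutoff(length)
    print(f"list of {length}: fixed cutoff shows {fixed}, one third shows {third}")
```

For the longest list the proportional cutoff shows more than twice as many terms as the fixed one, which is where paging through the display becomes useful.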
Users mostly identified the relationship between
the terms they chose from the list and the initial query terms as `related terms' or `synonymous terms'. A breakdown of
the relationship of the 5 best terms chosen by the users to the query terms
reveals that the hierarchical relationship predominates. Query expansion terms
were mainly narrower terms of the corresponding query terms.
A number of questions for future research emerge from
this study. Such questions include:
* how do end-users select terms to search with (i.e. initial query terms)?
* what difficulties do end-users encounter during the term selection
process (this includes both the initial query formulation and the query
expansion)?
* what is the nature of such difficulties? Are these
difficulties technical or conceptual?
* how could these difficulties be analyzed?
* what is the correspondence of the user-identified term relationships to the
relationships found for these terms in the INSPEC thesaurus?
Answers to the above questions would facilitate better
understanding of the problems that end-users face during searching. Such analysis would result in suggestions
and recommendations for improving the design of retrieval systems in order to
facilitate the display of knowledge structures, such as thesauri, as well as
their use as tools for navigation, query formulation and query expansion.
Efthimiadis, E.N. (1992) Interactive Query Expansion
and Relevance Feedback for Document Retrieval Systems. Ph.D. thesis, City
University, London, U.K., July 1992.
Efthimiadis, E.N. (1993) A user-centred evaluation
of ranking algorithms for interactive query expansion. In: Proceedings of the
16th Annual International ACM SIGIR, Pittsburgh, PA, June 27 - July 1, 1993. Korfhage, R., Rasmussen, E. and Willett, P., eds. pp. 146-159.
Efthimiadis, E.N. (1994) Interactive query
expansion: a user-based evaluation in a
relevance feedback environment. Submitted for
publication.
Robertson, S.E. and Thompson, C.L. (1990) Weighted
searching: the CIRT experiment. In: Informatics 10: prospects for
intelligent retrieval. University of York, 21-23 March 1989. London:
ASLIB, 1990, pp. 153-166.