End-user's understanding of thesaural knowledge structures in interactive query expansion.

Efthimis N. Efthimiadis

Cite as:

Efthimiadis, E.N. "End-user's understanding of thesaural knowledge structures in interactive query expansion." In: Advances in Knowledge Organization. Proceedings of the Third International ISKO Conference, Copenhagen, Denmark, June 20-24, 1994. Hanne Albrechtsen and Susanne Oernager, eds. Frankfurt am Main: Indeks Verlag, vol. 4, 1994, pp. 295-303.

 

Abstract:         

The process of term selection for query expansion by end-users is discussed within the context of a study of interactive query expansion in a relevance feedback environment.  This user study focuses on how users perceive and understand term relationships, such as hierarchical and associative relationships, in their searches.

 

 

1.         Introduction

 

End-users can now access databases directly, without consulting a search intermediary.  End-user searching therefore raises a very important question: "how effective are end-users in their searching?"  Since end-users do not have the formal training that intermediaries have, it is important to study their searching behavior and their understanding of the knowledge structures used in information retrieval, such as thesauri, in order to identify patterns, styles, and especially problems.  The knowledge gained from such studies can then be used to suggest methods for improving existing systems, making the search process less frustrating for the end-user during the interaction and more efficient in terms of retrieval.

 

The research presented in this paper is an investigation of interactive query expansion[1] within a relevance feedback system (Efthimiadis, 1992).  

In order to investigate the process of query expansion as well as its effectiveness one needs to have a real system. Therefore, real users with their real requests were used in an operational environment, as opposed to searching a static test collection with fixed (artificial) queries, in order to study query expansion and relevance feedback in a dynamic user-centered environment. For the research reported here the INSPEC database, on both Data-Star and ESA-IRS, was searched online using CIRT, a front-end system that allows weighting, ranking and relevance feedback (Robertson & Thompson, 1990).

 

This research is among the first user studies conducted in an operational system with relevance feedback. It investigated, among other things, the processes of interactive query expansion, term selection for query expansion, and user perception and understanding of term relationships.

 

In the context of a real system, real requests, and real interaction, particular importance is given to the characteristics of the interaction. The selection of terms at a particular stage in the search process  is, for example, the most obvious characteristic.  More specifically, the process of term selection by the users is of  particular interest for understanding the users' searching behavior  and its implication for the design of retrieval systems.

 

This paper focuses only on the term selection characteristics with respect to query expansion.  The other facets of this research on interactive query expansion, namely the ranking algorithms and the overall user study, are reported elsewhere (Efthimiadis, 1993; 1994).  The research questions specific to query expansion that are reported here were:

* what are the characteristics of user selection of terms that could be added to the search (i.e., query expansion)?

* how do end-users perceive term relationships?

* what types of relationships (as specified by the end-user) exist between the initial query terms and the query expansion terms?

 

 

2. Methodology 

Data was collected from 25 searches. The data collection mechanisms included questionnaires, transaction logs, and relevance evaluations.

 

The questionnaires elicited information about the query expansion terms that were selected by the user from a list of terms that was drawn from the descriptors and the identifiers of records that users had previously seen and judged relevant. The questions concentrated on how the user perceived the relationship  between the query terms and the term that s/he selected for query expansion. End-users were asked to identify whether the query expansion terms were chosen because they were thought of as being:  (a) synonyms of query terms, (b) related terms, (c) the best alternative to express the subject that they could find in the list, or (d) terms that were representing new ideas.

 

In addition, for each term users were asked to identify whether there was a correspondence between the new term and the initial query terms, as well as the type of relationship that held between them, e.g., broader, narrower or related term relationships.

  

2.1 Variables

The variables examined were divided into seven categories which include: user characteristics, request characteristics, subjective user reactions, retrieval effectiveness, user effort, search process characteristics and term selection characteristics.  These were collected and evaluated via questionnaires and logs. The categories are discussed below.

 

User characteristics.  This category covers user specific data which are pertinent to the search process, for example, areas of subject expertise, the level of their work, and the number of previous online searches, either with or without an intermediary.

Request characteristics.  Scaled data were elicited on context and problem characteristics, e.g., the subject area of the request, the nature of the enquiry (accurate or vague), and the type of search required (broad or narrow).  These made it possible to relate such issues as specifiability of the problem, clarity of the problem, work done on the problem, etc. to the type of search required and the success of the search.

Search process characteristics.  The questionnaires on the outcome of the search elicited scaled data on the effectiveness and efficiency of the search. These made it possible to estimate the degree of success or failure of the interaction from the point of view of the user, and thus to relate it to searching behavior.

Subjective user reactions.  This category is mostly concerned with the user's overall reactions to the search, impressions of the effort involved, and reaction to the search results (not the offline prints).  It also included variables from other categories, such as how close the search was to the original/intended enquiry and whether the expected number of references was retrieved.

Term selection characteristics.  This questionnaire elicited information about the query formulation and reformulation of terms that were selected by the users.

 

            The questions concentrated on how the user perceived the relationship between the query terms and the term that s/he selected for query expansion. They were asked to identify whether the query expansion terms were chosen because they were thought of as being synonyms of query terms, related terms, the best alternative to express the subject that they could find in the list, or representing new ideas.

 

 

2.2 Data Collection Instruments

The data collection instruments used in the study can be divided into three categories: questionnaires, transaction logs and offline prints. A summary of the instruments is given below and the discussion that follows is presented by category.

 

• Pre‑search questionnaire.

            Background information and context of information request.  Users' personal data, assessment of subject enquiry, user's own online experience, etc.

• Search process log.

             Search summary and notes, number of terms, online time, etc.

• Query expansion questionnaire.

            (a) Term identification and selection for query expansion from ranked list, and

            (b) Term relationships 

• End of search questionnaire.

            User's satisfaction, impression of search, assessment of query and results, and of the number of references expected.

• Evaluation of offline prints.

            Relevance judgements on a 3 point scale and subdivisions.

• Final questionnaire.

            Final assessment of search as a whole and user remarks.

• Other logs.

            Logs of the interaction between searcher-CIRT, CIRT-vendor/host, searcher-ESA.

 

 

2.3 Procedure for data collection

 The methodology is summarized and presented below in a step-by-step manner. The summary is then followed by a discussion of the steps which describe the rules and guidelines that were employed as well as how these worked in practice.

 

Steps:

• pre-search interview, user completes questionnaire (questions 1-11)

• select initial query terms (search notes)

• search INSPEC in Data-Star using CIRT:

            - online relevance judgements

• parse records of all positive relevance judgements and extract from the descriptor field candidate terms for query expansion.

• weight and rank all candidate terms (identified in the previous step) for query expansion

• print (and display on the screen) the ranked list of terms

• user evaluates the ranked list:

            - user identifies all terms thought to be useful for the search

            - user selects query expansion terms

            - user completes questionnaire on term selection (questions 12-13)

• continue the search using CIRT:

            - add query expansion terms

            - search

            - print retrieved documents

            - logoff

• user completes questionnaire  assessing the search process before seeing and evaluating the search results (questions 14-18)

• offline evaluation

 

The steps presented above together with a set of procedural guidelines helped in providing a consistent method for data collection.
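The parsing, weighting and ranking steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weighting function used in the study is not given in this section, so the sketch substitutes the Robertson/Sparck Jones relevance weight (with the usual 0.5 correction) as a stand-in. The record layout, the posting counts and the collection size are all hypothetical.

```python
import math
from collections import Counter


def rsj_weight(r, n, R, N):
    """Robertson/Sparck Jones relevance weight with the 0.5 correction.

    r: relevant records containing the term
    n: records in the collection containing the term
    R: records judged relevant
    N: records in the collection
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))


def rank_candidates(relevant_records, postings, collection_size):
    """Return (term, weight) pairs sorted by descending weight."""
    R = len(relevant_records)
    # For each descriptor, count the relevant records that carry it.
    r_counts = Counter(
        term for record in relevant_records for term in set(record["descriptors"])
    )
    scored = [
        (term, rsj_weight(r, postings[term], R, collection_size))
        for term, r in r_counts.items()
    ]
    # Ties are broken alphabetically so the ranked list is reproducible.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical data: three records judged relevant online.
RELEVANT = [
    {"descriptors": ["information retrieval", "thesauri"]},
    {"descriptors": ["information retrieval", "relevance feedback"]},
    {"descriptors": ["information retrieval"]},
]
# Hypothetical posting counts and collection size.
POSTINGS = {"information retrieval": 5000, "thesauri": 400, "relevance feedback": 150}
COLLECTION_SIZE = 1_000_000

ranked = rank_candidates(RELEVANT, POSTINGS, COLLECTION_SIZE)
for term, weight in ranked:
    print(f"{weight:6.2f}  {term}")
```

With this stand-in weight, a term that is rare in the collection can outrank a term that occurs in every relevant record, which is one reason the ranked list is shown to the user for evaluation rather than applied automatically.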

 

 

2.4 User selection of terms for query expansion

The records that were identified as relevant by the user during the online relevance feedback interaction were parsed and the terms were weighted. The terms were presented in the form of ranked lists and were displayed on the screen as well as printed on paper.  The users were asked to go through the list of terms twice:

 

(a) to identify ALL the terms that they considered as being good for the purpose of the search;

(b1) to select the 5 best terms of those identified as good in step (a);

(b2) to rank the 5 terms they selected in descending order of importance to them.

 

The evaluation of the ranked list and the selection of the query terms was followed by the completion of the questionnaire on term selection.

 

 

3. Results and Discussion

 The results discussed here are based on the 25 searches finally obtained.  The analysis was essentially directed at the query expansion aspects of the searches, on the ranking algorithm and on retrieval effectiveness.  The discussion throughout is mostly qualitative in nature. The data have also been subjected to appropriate statistical analysis using various tests.  However, because the sample is small there are occasions where the results are presented only with the intention to demonstrate some trends and to facilitate the discussion and there are no claims of any statistical significance.

 

 

3.1 Term selection characteristics

After evaluating the ranked list, the users completed the questionnaire on term selection. The variables involved and the procedure for collecting the data have been described above.

 

The results are presented in two stages. First the answers to the questionnaire are analyzed and discussed and then the user preferences of query expansion terms are compared to the system's suggestions.

 

 

3.2 User selection of terms for query expansion

At first users identified all the terms in the ranked list that they thought of as being useful for the purpose of the search, i.e. they selected terms that would be acceptable to use in the search. The results of this part of the selection process are given in Table 1, which presents the total number of terms in each list and the number of terms selected by the users.

 

The percentage of terms chosen by the users from the ranked lists ranged from a minimum of 4% to a maximum of 87%, with a mean of 28%.  This implies that on average about one third of the terms given in the lists are thought to be useful, which corresponds to an average of approximately 18 terms. If, however, "search #128" is excluded as an extreme case, because 98 terms were selected out of 113, then the average drops to 15 terms.

 

                                                                                                          

          Terms    Terms                    Terms    Terms
  User  in list   chosen     %      User  in list   chosen     %
   101       62       10    16       118       55       32    58
   102       38        9    24       119      117       34    29
   103      137       11     8       120       44        9    20
   105       77       14    18       121       61       17    28
   108       33        8    24       122       60       13    22
   110       61       10    16       123       61       12    20
   111       48       31    65       124       80       27    34
   112       93        6     6       125       41       21    51
   113       65       24    37       126       39        7    18
   114       62       15    24       127       62       11    18
   115       42        9    21       128      113       98    87
   116       77        3     4       129       34        8    24
   117       64       25    39

 

Table 1: Totals and percentages of terms in the ranked-lists and of terms chosen by the subjects.
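The summary figures quoted for Table 1 can be recomputed directly from the table. The short Python sketch below transcribes the rows and derives the range and mean of the percentages, and the mean number of terms chosen with and without search #128. The variable names are illustrative only.

```python
from statistics import mean

# (user, terms in list, terms chosen), transcribed from Table 1
TABLE_1 = [
    (101, 62, 10), (102, 38, 9), (103, 137, 11), (105, 77, 14),
    (108, 33, 8), (110, 61, 10), (111, 48, 31), (112, 93, 6),
    (113, 65, 24), (114, 62, 15), (115, 42, 9), (116, 77, 3),
    (117, 64, 25), (118, 55, 32), (119, 117, 34), (120, 44, 9),
    (121, 61, 17), (122, 60, 13), (123, 61, 12), (124, 80, 27),
    (125, 41, 21), (126, 39, 7), (127, 62, 11), (128, 113, 98),
    (129, 34, 8),
]

# Share of each list that the user marked as useful, as a whole percentage
percentages = [round(100 * chosen / in_list) for _, in_list, chosen in TABLE_1]
chosen_counts = [chosen for _, _, chosen in TABLE_1]
# Search #128 is treated as an extreme case in the text
without_128 = [chosen for user, _, chosen in TABLE_1 if user != 128]

print("range of percentages:", min(percentages), "to", max(percentages))
print("mean percentage:", round(mean(percentages), 1))
print("mean terms chosen:", round(mean(chosen_counts), 1))
print("mean terms chosen, excluding #128:", round(mean(without_128), 1))
```

The recomputed values agree with the figures given in the text once rounded to whole numbers; the mean number of terms chosen falls between 18 and 19.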

 

 

The questionnaire on query expansion asked the users two questions, each corresponding to some aspect of the term selection process they had followed.  The first asked for the reason(s) that made them choose those terms: how did they think the selected terms related to the search and to the original query terms? They were asked to consider such relationship(s) for all terms collectively rather than for each term individually. Users were given four options to choose from and they could select as many as they thought appropriate. The options were:  (a) variant expressions or synonyms; (b) alternative (related) terms; (c) couldn't find better term(s) to express the subject of the enquiry; and (d) representing new ideas (i.e. not part of your original request).

 

 

                        64%     variant expressions or synonyms

                        88%     alternative (related) terms

                         4%      couldn't find better term(s) to express the subject of the enquiry

                        44%     representing new ideas (i.e. not part of your original request)

 

Table 2: Association of terms identified from the ranked list to the query.

 

 

Table 2 summarizes the user-perceived association between the terms identified from the ranked list and the query. The percentages given in Table 2 do not add up to 100% because users were asked to select as many of the four options as applied to the terms. Users thought of the terms they selected as alternative (related) terms to the query terms in 88% of cases, and as variant expressions or synonyms in 64% of cases. These two categories account for the majority of the responses. A very small percentage (4%) chose the terms because they could not find a better term from those on the list to express the subject of the query. A rather interesting finding comes from terms that do not relate directly to the original query terms and which represent new ideas. These accounted for 44% of the responses.
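Because each user could tick several options, the percentages in Table 2 overlap. As a rough illustration, and assuming (this is not stated in the text) that each percentage is a share of the 25 searches, the implied number of searches per option and the average number of options ticked can be worked out as follows.

```python
SEARCHES = 25  # number of searches reported in the study

# Percentages from Table 2
TABLE_2 = {
    "variant expressions or synonyms": 64,
    "alternative (related) terms": 88,
    "couldn't find better term(s)": 4,
    "representing new ideas": 44,
}

# Assumption: every percentage is taken over the same 25 searches
implied_counts = {
    reason: pct * SEARCHES / 100 for reason, pct in TABLE_2.items()
}
total_ticks = sum(implied_counts.values())
ticks_per_search = total_ticks / SEARCHES

for reason, count in implied_counts.items():
    print(f"{count:4.0f} searches: {reason}")
print("options ticked per search, on average:", ticks_per_search)
```

Under that assumption every percentage corresponds to a whole number of searches, and the percentages total 200% because two options were ticked per search on average.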

 

This finding demonstrates the unpredictability that is involved in subject searching and the difficulties imposed on information retrieval. Additional information was not collected for this category.  In retrospect I think some questions could have been included to elicit information about the terms that represent new ideas. Further research is therefore needed in this area, more specifically into the relationships of the `new ideas' to the original query. What was the reason for choosing these terms? Was the user aware of these new concepts/ideas at the beginning of the search? If yes, why were these not expressed at the pre-search interview? Was the reason for the exclusion related to the interview, e.g. a communication failure, or did the user choose to exclude them at the interview stage because s/he thought of them as peripheral?  If users had not thought of these concepts at an earlier stage, did they know about them before? Did they recognize and choose these concepts as the result of a learning process during the search?  On the whole, what were the reason(s) and stimuli that made them choose these new terms? Answers to these questions will, I believe, contribute to the understanding of the users' searching behavior.

 

The second question asked the users how they perceived the relation of the five best terms to the original query terms.  As mentioned earlier, users were asked to select the 5 best terms of all those they had identified as useful. The questions on term relationship concentrated on whether there was a correspondence between each of the 5 new terms and the query terms. If the user identified a correspondence between them, the type of correspondence was noted. The term relationships that users were asked to select from are among the standard types of relationships found in thesauri, i.e. broader term and narrower term for hierarchical relationships and related term for affinitive/associative relationships.

 

The relationships of the 5 best terms to the query terms are shown in Table 3.  For 34% of the chosen terms there is no relationship or other type of correspondence to a query term. The remaining 66% of the terms are divided as follows (the percentages below are shares of this 66%): a narrower term relationship between a selected term and a query term accounts for 70%, a broader term relationship for 5%, and an associative relationship (i.e. related term) for 25%.

 

 

                        34%  No relationship
                        66%  Some relationship:     5%  Broader
                                                   70%  Narrower
                                                   25%  Related

 

            Table 3: Relationship of the user selected 5 best terms for query expansion to the initial query terms.
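The percentages in Table 3 are nested: the broader, narrower and related figures add up to 100% of the terms that have some relationship, not of all selected terms. The sketch below, with illustrative variable names, converts them to shares of all 5-best terms.

```python
SOME_RELATIONSHIP = 66  # % of the 5-best terms related to a query term

# Breakdown of that 66%, from Table 3
WITHIN_RELATED = {"broader": 5, "narrower": 70, "related": 25}

# Share of ALL selected terms falling under each relationship type
overall = {
    kind: SOME_RELATIONSHIP * pct / 100 for kind, pct in WITHIN_RELATED.items()
}

# Hierarchical = broader + narrower
hierarchical_within = WITHIN_RELATED["broader"] + WITHIN_RELATED["narrower"]
hierarchical_overall = overall["broader"] + overall["narrower"]

for kind, share in overall.items():
    print(f"{kind:9s} {share:5.1f}% of all selected terms")
print(f"hierarchical: {hierarchical_within}% of related terms, "
      f"{hierarchical_overall:.1f}% of all selected terms")
```

Hierarchical relationships thus cover three quarters of the terms that have any relationship to the query, which is roughly half of all the terms selected.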

 

 

From these results it can be established that approximately 75% of the term associations fall within a hierarchical relationship. Users have overwhelmingly selected narrower terms as the terms for query expansion. This finding is very important and emphasizes the possible advantages of using an online thesaurus. Such a thesaurus could assist users in looking up terms, establishing their relationships and deciding on term inclusion, as well as in exploiting hierarchical relationships. A possible way to include terms could be in clusters or in hierarchies, for example in a fashion similar to that of the `explode' command in Medline, where an `exploded' term retrieves itself as well as all the terms in the hierarchy beneath it. However, all these should be user-controlled or `machine-aided' operations rather than entirely automated. This suggestion stems from the poor results achieved by automatic query expansion in earlier information retrieval experiments.
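A user-controlled, explode-style expansion of the kind suggested above might look like the following Python sketch. The toy thesaurus, the term names and the `approve` callback are all hypothetical; this is neither the Medline command nor the INSPEC thesaurus, only an illustration of retrieving a term together with everything beneath it while leaving the final choice to the user.

```python
# Hypothetical toy thesaurus: each term maps to its narrower terms (NT).
NARROWER = {
    "information retrieval": ["query formulation", "relevance feedback"],
    "query formulation": ["query expansion"],
    "query expansion": [],
    "relevance feedback": [],
}


def explode(term, narrower=NARROWER):
    """Return the term followed by every term beneath it in the hierarchy."""
    expanded, stack, seen = [], [term], set()
    while stack:
        current = stack.pop()
        if current in seen:  # guards against terms reachable by two paths
            continue
        seen.add(current)
        expanded.append(current)
        # reversed() keeps the thesaurus order when popping from the stack
        stack.extend(reversed(narrower.get(current, [])))
    return expanded


def expand_query(query_terms, approve):
    """Explode each query term, adding only the terms the user approves."""
    expanded = list(query_terms)
    for term in query_terms:
        for candidate in explode(term)[1:]:  # skip the query term itself
            if candidate not in expanded and approve(candidate):
                expanded.append(candidate)
    return expanded


# Stand-in for the user's decision: accept every narrower term offered.
print(expand_query(["query formulation"], approve=lambda term: True))
```

Passing the decision in as a callback keeps the operation `machine-aided': the system proposes the narrower terms, and the user accepts or rejects each one.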

 

4. Concluding remarks

The study has provided useful information about interactive query expansion and relevance feedback through a front-end system to online databases. The aim was to look at the process of interactive query expansion when searching in an operational situation with real users and see what can be learnt from it.

 

The methodology provided consistency during searching and proved effective. There were no surprises during the data collection, and ambiguities were resolved by following the guidelines set by the methodology or as the occasion demanded.

 

A pattern that emerges from the user selection of terms for query expansion suggests that about one third of the terms from a list of candidate terms are potentially useful. This finding may have implications for the design of a query expansion module for the selection of terms. Such a module can be applied to both types of query expansion, interactive or automatic.  The suggested cutoff level of approximately 20 terms for the display of terms for query expansion may not be representative of the required number of useful terms, and therefore the option of using one third of the terms may be more appropriate.  Since one third of the total number of terms may in some systems be quite a large number, a browsing option can facilitate the identification and selection of terms.
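The one-third cutoff with a browsing option could be sketched as below. The function names are hypothetical and the rounding rule (round up) is an assumption; the text does not prescribe either.

```python
def first_screen_size(n_candidates):
    """One third of the candidate list, rounded up (integer arithmetic)."""
    return (n_candidates + 2) // 3


def split_for_browsing(ranked_terms):
    """Split a ranked list into the terms shown first and the rest,
    which stay available through a browsing option."""
    cut = first_screen_size(len(ranked_terms))
    return ranked_terms[:cut], ranked_terms[cut:]


# List lengths taken from Table 1: the longest, the shortest and a typical one
for n in (137, 33, 62):
    print(n, "candidate terms ->", first_screen_size(n), "shown first")
```

For the list lengths in Table 1 this rule displays between 11 and 46 terms, compared with a fixed cutoff of about 20.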

 

Users mostly identified the relationship between the terms they chose from the list and the initial query terms as `related terms' or `synonymous terms'. A breakdown of the relationship of the 5 best terms chosen by the users to the query terms reveals that the hierarchical relationship predominates. Query expansion terms were mainly narrower terms of the corresponding query terms.

 

A number of questions for future research emerge from this study.  Such questions include:

* how do end-users select terms to search with (i.e. initial query terms)?

* what difficulties do end-users encounter during the term selection process (this includes both the initial query formulation and the query expansion)?

* what is the nature of such difficulties? Are they technical or conceptual?

* how could these difficulties be analyzed?

* what is the correspondence of the user-identified term relationships to the relationships found for these terms in the INSPEC thesaurus?

 

Answers to the above questions would facilitate better understanding of the problems that end-users face during searching.  Such analysis would result in suggestions and recommendations for improving the design of retrieval systems in order to facilitate the display of knowledge structures, such as thesauri, as well as their use as tools for navigation, query formulation and query expansion.

 

References

Efthimiadis, E.N. (1992) Interactive Query Expansion and Relevance Feedback for Document Retrieval Systems. Ph.D. thesis, City University, London, U.K., July 1992.

 

Efthimiadis, E.N. (1993) A user-centred evaluation of ranking algorithms for interactive query expansion. In Proceedings of the 16th Annual International ACM SIGIR, Pittsburgh, PA, June 27 - July 1, 1993. Korfhage, R., Rasmussen, E. and Willett, P., eds. pp. 146-159.

 

Efthimiadis, E.N. (1994) "Interactive query expansion: a user-based evaluation in a relevance feedback environment". Submitted for publication.

 

Robertson, S.E. and Thompson, C.L. (1990) Weighted searching: the CIRT experiment. In: Informatics 10: prospects for intelligent retrieval. University of York, 21‑23 March 1989. London: ASLIB, 1990, 153‑166.



    [1] The terms interactive  query expansion and semi-automatic query expansion are used interchangeably in the text.