Ordering Author and Work Records:

  An Evaluation of Collocation in Online Catalog Displays

[Published in Journal of the American Society for Information Science, 47, 7 (July 1996): 538-554.]

October 31, 1995

Allyson Carlyle

School of Library and Information Science

Kent State University

P.O. Box 5190

Kent, OH  44240

E-mail:  acarlyle@kentvm.kent.edu

216/678-3528 (Home)

216/672-2782 (Work)

216/672-7965 (Fax)

 

ABSTRACT

            To investigate the extent to which online catalogs arrange together, or collocate, records representing particular authors and works, a survey compared the displays resulting from five author and five work queries in eighteen online catalogs.  Dependent variables to measure collocation included the number of times irrelevant records were interfiled among relevant records.  Searches for worst-case authors and works associated with large retrieval sets, including "Homer" and "Paradise Lost," revealed the effects of Boolean versus string matching, query type, and catalog size on the collocation of relevant records.  Results of the survey showed that string matching collocated relevant records more successfully than Boolean matching, that author records were collocated more successfully than work records, and, surprisingly, that catalog size had only a small effect on collocation. 

 

INTRODUCTION

            Author and work searching are frequently considered to be non-problematic aspects of online catalog use.  However, this is not necessarily true.  Consider the following scenario.  A user of UCLA's online catalog ORION is interested in checking out a textual edition of the Bible.  This user selects ORION's Boolean keyword "find title" search using the term "bible" as a query term.  A title search on "bible" retrieves approximately 18,000 bibliographic records in ORION, including a record for the book Animals of the Bible by Isaac Asimov, a sound recording of Genesis read by Judith Anderson, and a work called Woman's Bible for Survival in a Violent Society by Thomas P. McGurn.  These records are scattered among those for various textual editions of the Bible.  The first record for a textual edition of the Bible appears only after approximately 150 other records have been displayed; in fact, a user would have to look through well over 1,000 records to find the first significant grouping of such editions.  The ability of even an experienced catalog user to find items relevant to a query is compromised by large retrieval sets where many records are retrieved and irrelevant records are scattered intermittently among relevant ones.  Clearly, author and work searching are, at times, quite as problematic as subject searching.

            Displays that alleviate the large retrieval set problem by facilitating the identification and use of relevant records are particularly crucial in information retrieval (IR) systems such as online catalogs, which serve users whose understanding of the search process is limited or whose information needs are not well-defined.  Research suggests that the large retrieval set problem is prevalent in online catalogs and that long displays confuse and discourage users (for example, Wiberley, Daugherty, & Danowski, 1990).   As illustrated in the Bible example, one of the reasons that large retrieval sets may be confusing is that irrelevant records may be scattered intermittently among relevant ones.  An international cataloging standard addressing this problem requires that catalogs arrange and display together (collocate) records representing the same author, work, or subject.  Displays meeting this requirement have been mandated because catalogers believe they help make the organization and content of retrieved record sets clear and thus provide users with a means of surveying the range of records retrieved in a large set quickly and efficiently.

            This paper examines variables affecting the collocation of records for particularly problematic authors and works retrieved in large record sets in online catalogs.  The records studied represent "worst-case" authors and works, that is, authors and works associated with a large number of relevant records.

            Worst cases were used because those cases retrieving large numbers of records, both relevant and irrelevant, are more likely to be susceptible to arrangement problems in display and, as a consequence, illustrate them more clearly than searches retrieving few records.  Because the purpose of research such as this is to improve online systems, it makes sense to direct our attention toward the most problematic aspects of those systems.  Worst-case queries are also more likely to be performed in online catalogs than other types of queries.  Authors and works that appear in many editions do so precisely because publishing companies perceive a demand for them.  Research by Nelson (1988) discovered a correlation between the number of times a term was indexed and the number of times it was used.  Solomon (1993) found that children use a small number of terms frequently; 100 terms accounted for over 50 percent of the 1,210 terms children used in his study of online catalog use. 

            Authors and works were selected as query types because they have a distinct advantage over subjects in that the relevance of a particular record to a query may be posited with greater certainty.  This control over the relevance problem in retrieval frees the research somewhat from the problems engendered by user relevance judgments.

 

BACKGROUND

            Online catalogs have frequently been criticized as being confusing and hard to use.  One of the reasons for this may be that users are confronted with features that vary from catalog to catalog.  Borgman identified interface complexity as one of the reasons that online catalogs are hard to use (1986, p. 393).  Research on online catalog use supports this assertion.  Dalrymple compared user experiences in online and card catalogs and found that card catalog users were more satisfied with their search results and reformulated their searches less frequently than online catalog users (1990).  The 1983 Council on Library Resources (CLR) online catalog study found that user problems centered around "search formulation control and output (display) control."  (Matthews, Lawrence, & Ferguson, 1983, p. 123).  In focus group research by Johnson and Connaway, one participant complained, "I hate the many different ways there are to search for an author" (1992, p. 11).

            Long displays resulting from large retrieval sets may be one of the major contributors to user confusion and frustration.  Users surveyed in the CLR study ranked "scanning through a long display" fifth out of 27 reported problems and "understanding a display of multiple items" 24th (Matthews, Lawrence, & Ferguson, 1983, p. 124).  A transaction log analysis by Wiberley, Daugherty, and Danowski showed that user persistence dropped off sharply when the number of retrieved records exceeded 200; many users did not look beyond the first 30-35 records (1995, pp. 262-263).  Dwyer, Gossen, and Martin studied interlibrary loan requests for items actually held by the requesting library.  They attributed six percent of failed searches to what they called the "big-hit list" problem; that is, the title sought was one that was displayed in incomplete form and was displayed in a long list of similar-looking titles (1991, p. 232).  "Too many matches" was cited as a reason for not using online catalogs in a catalog use study by Pease and Gouke (1982, p.  288).

            The understandability of large retrieval sets may be compromised because records are sometimes arranged in random (internal record number) order.  Random arrangements are common in response to Boolean searches.  Because Boolean searches often allow matches across multiple fields, no single field, for example, an author field or a title field, stands out as a logical element by which to arrange records.  Research suggests that record arrangements do have an effect on use.  Abels (1993) found that users scanned random record arrangements more slowly than records arranged by author name, date, or subject heading.  Purgailis Parker and Johnson (1990) investigated the effect of ordering and size of retrieved record sets on users' relevance judgments.  They found that record order had no effect when record sets were composed of fewer than fifteen records, but their results were inconclusive when sets exceeded fifteen records.

            Attig (1989), Duke (1989), and Delsey (1989) identified ways in which computer technology has obstructed collocation in online catalogs.  Although collocation of relevant records is a standard underlying the construction of both card and online catalogs, no research has ever been undertaken to investigate to what extent catalogs, card or online, actually achieve it.  One of the reasons for the scarcity of this type of research may be the difficulty in defining terms.  Carpenter (1981) discussed the history of the term "author" in cataloging codes and illustrated many of the difficulties inherent in defining it.  "Work" has never been defined in  a code of American cataloging rules, although individuals have attempted to define it conceptually outside the codes (Lubetzky, 1969; Svenonius, 1988; Wilson, 1989; O'Neill & Vizine-Goetz, 1989; Smiraglia, 1992, pp. 8-9; Yee, 1995).  Another reason for the lack of research on collocation may be the difficulty in developing a means to measure it.  Collocation has various aspects, making the use of a single measure less than satisfactory. 

 

SOLUTIONS TO DISPLAY PROBLEMS

            Two major design principles have been used in library catalogs to increase the comprehensibility of retrieval sets.  Record ranking based on relevance to a query has been used in IR systems with the intention of showing users first those records most likely to be relevant to their queries.  One online catalog using this approach is OKAPI, developed at the Polytechnic of Central London (Walker & de Vere, 1990).  Record arrangements based on relevance ranking may be particularly useful for specific, well-defined subject queries because they attempt to predict relevance based on the relationship between subject words in a query and subject words in a record and thus produce meaningful record orderings.  The purpose of ranked displays is to save the time of users by displaying first the records that best match their searches.  Queries for authors and works, however, do not lend themselves so easily to record ranking.  On what basis is a retrieval system to judge which of several identical author names or titles is the "most" relevant to a particular query?  Although it makes sense to base retrieval  on the probability of a record's relevance to a query for an author or a work, it is questionable that ranked orderings  would be particularly helpful; it is more questionable still that such orderings would be comprehensible to users. 

            The collocation standard, mentioned above, was developed to increase the comprehensibility of retrieval sets.  It stipulates that relevant author, work, and subject records be arranged and displayed together, one after another and without interruption by irrelevant records (Cutter, 1904, p. 12; Lubetzky, 1960, p. ix; International Federation of Library Associations, 1971, p. xiii).  A rationale underlying this standard, in which groupings of relevant records are arranged with other groupings in alphabetical order, is that it is not possible to predict which of any particular group may be the most useful.  Displays that collocate groups of relevant records may be useful to users whose information needs represent broad topics or whose needs are not well-defined, or to users whose queries are poorly or incompletely formulated.  As Mann so aptly puts it in his discussion of the advantages of The Library of Congress Subject Headings, "...we can sharpen questions that are unfocused or fuzzy to begin with--a frequent starting point for readers--by examining the array of precoordinated subdivisions under a heading that spell out and distinguish its various aspects in ways that we couldn't think of in advance; that is, the system will clarify our range of options for us, thereby enabling users to ask better questions in the first place [italics in original]" (1993, p. 123).  The purpose of collocation in display is thus somewhat different from that of displays based on record ranking in that the aim of collocated displays is to provide users with an overview, or picture, of the entire content of a retrieved record set.  This concept has also appeared, although in somewhat different form, in IR system research.  Korfhage (1991) articulated the importance of presenting users with an overview in IR systems and proposed a model shifting the emphasis in IR system design from retrieval to display.

 

RESEARCH DESIGN

            This research investigated the effect of catalog variables on collocation by performing author and title queries for five worst-case authors and five worst-case works in eighteen different online catalogs.  The following questions were asked:  What is the effect of the type of match, Boolean versus character-string match, on collocation of particular worst-case author and work records in current online catalogs?  What is the effect of type of query, author versus work, on collocation of relevant worst-case author and work records?  What is the effect of catalog size on collocation of particular worst-case author and work records?  To address these questions, "work" and "author" needed to be defined and measures of collocation developed.

 

Operational Definitions

            The MARC record communications format (MARC Formats for Bibliographic Data, 1980) determines the structure of bibliographic records in online catalogs and was used to operationalize the concepts "author" and "work."  "Author" was operationally defined as the set of records containing the same author name in a MARC personal author field (personal author fields include 100, 400, 600, 700, or 800 MARC fields).  For example, a record set representing the author William James was composed of records having in an author field the field contents:  [subfield a] James, William, [subfield d] 1842-1910.  Other author subfields, if present, were disregarded.

            "Work" was somewhat more difficult to operationalize because of the ambiguous nature of the term.  What records should be included in the work record set?  For instance, should the Pop-Up Christmas Carol, a children's pop-up book version of Dickens' A Christmas Carol, be considered to be an edition of the Dickens work?  The extent to which the items in a work set must resemble each other is debatable and, as a result, "work" has been defined by cataloging theorists in different ways.  Furthermore, a work is usually identified by both an author name and a title, and while the Anglo-American Cataloguing Rules (AACR2) requires author names to be normalized, normalization of titles is optional (AACR2, p. 484).  For example, one edition of John Milton's Paradise Lost may be entitled "Paradise Lost," another "Milton's Paradise Lost," and yet another "Paradis perdu," but AACR2 does not require the normalized or uniform title "Paradise Lost" to be present in every bibliographic record that represents an edition of that work.

            In this research, "work" was defined in two different ways to accommodate two competing perspectives.  First, "work" was defined narrowly to represent a set of records that share the same primary author and title MARC field contents (work record set).  For example, a record representing a textual edition of Paradise Lost by John Milton published by Norton Publishers and one representing a textual edition published by Rineholt Publishers, which share the same primary author and title MARC field contents, were considered to be members of the same work record set.  A primary author field was defined as a 100 MARC field and a primary title field was defined as a 240 or a 245 MARC field.  

            Second, "work" was defined broadly to represent a set of records that may not share both primary author and title fields, but may still be relevant to a query for a work (superwork record set).  A superwork record set contains the records of a work record set and, in addition, the set of records related to a work in that they contain the same author and title field contents, but in secondary author and title fields as opposed to primary author and title fields.  So, for example, the record for a vocal score for a symphonic poem based on Paradise Lost by Marco Enrico Bossi was considered to be a member of the related work record set for Paradise Lost because it contains the author name "John Milton" and the title "Paradise Lost" in a related work field.  Secondary author/title fields included 600 and 700 fields; a secondary title field was the 740 MARC field.  For further discussion of the operational definitions, see Carlyle (1994, Chap. 2). 
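The three operational definitions can be illustrated with a minimal sketch.  It assumes a deliberately simplified record representation (a list of MARC tag and subfield pairs) and a crude normalization of punctuation and case; both are assumptions made for illustration, as are the function names and the sample records.  In particular, the superwork test shown here is only one possible reading of the definition above; the full rules are discussed in Carlyle (1994, Chap. 2).

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified record model: (MARC tag, {subfield code: value}).
Field = Tuple[str, Dict[str, str]]
Record = List[Field]

AUTHOR_TAGS = {"100", "400", "600", "700", "800"}   # personal author fields
PRIMARY_AUTHOR_TAGS = {"100"}
PRIMARY_TITLE_TAGS = {"240", "245"}
SECONDARY_AUTHOR_TITLE_TAGS = {"600", "700"}
SECONDARY_TITLE_TAGS = {"740"}


def _norm(value: str) -> str:
    """Assumed normalization: drop trailing punctuation and ignore case."""
    return value.strip().rstrip(" ,.;:/").casefold()


def _names_author(field: Field, name: str, dates: str, tags: set) -> bool:
    """True if the field is one of `tags` and carries the name in $a and dates in $d."""
    tag, sub = field
    return (tag in tags
            and _norm(sub.get("a", "")) == _norm(name)
            and _norm(sub.get("d", "")) == _norm(dates))


def in_author_set(record: Record, name: str, dates: str = "") -> bool:
    """Author set: the name appears in any personal author field."""
    return any(_names_author(f, name, dates, AUTHOR_TAGS) for f in record)


def in_work_set(record: Record, name: str, dates: str, title: str) -> bool:
    """Work set: author in a primary author field and title in a primary title field."""
    has_author = any(_names_author(f, name, dates, PRIMARY_AUTHOR_TAGS) for f in record)
    has_title = any(tag in PRIMARY_TITLE_TAGS and _norm(sub.get("a", "")) == _norm(title)
                    for tag, sub in record)
    return has_author and has_title


def in_superwork_set(record: Record, name: str, dates: str, title: str) -> bool:
    """Superwork set: the work set plus records naming the work in secondary fields."""
    if in_work_set(record, name, dates, title):
        return True
    title_in_740 = any(tag in SECONDARY_TITLE_TAGS and _norm(sub.get("a", "")) == _norm(title)
                       for tag, sub in record)
    for field in record:
        if _names_author(field, name, dates, SECONDARY_AUTHOR_TITLE_TAGS):
            # Assumption: the title may sit in the same field ($t) or in a 740 field.
            if _norm(field[1].get("t", "")) == _norm(title) or title_in_740:
                return True
    return False


# Hypothetical, abbreviated records for illustration only.
milton_edition: Record = [
    ("100", {"a": "Milton, John,", "d": "1608-1674."}),
    ("245", {"a": "Paradise lost /"}),
]
bossi_score: Record = [
    ("100", {"a": "Bossi, Marco Enrico"}),
    ("245", {"a": "Vocal score (hypothetical title)"}),
    ("700", {"a": "Milton, John,", "d": "1608-1674.", "t": "Paradise lost."}),
]
```

Under these assumptions the Milton edition belongs to the work record set, while the Bossi score belongs only to the superwork record set (and, because a 700 field is a personal author field, to the Milton author record set).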

 

Dependent Variables

            Collocation, which has to do with the position of the members of a set of relevant records in a display of a retrieved record set, can be measured in several different ways.  Consider that a set of fifty Paradise Lost records may be preceded in a display by ten unrelated or irrelevant records, followed by five irrelevant records, or interrupted six times by the display of seventeen irrelevant records.  Four measures were used in this research:

     interruption number:  the number of times the display of relevant records was interrupted by irrelevant records, including an interruption preceding the display of the first relevant record and one following the display of the last relevant record,

     intervening records:  the number of irrelevant records that were interspersed among the relevant record set; that is, the number of irrelevant records following the first relevant record and preceding the last relevant record,

     average interruption size:  the average size of the interruptions; that is, the ratio of the total number of irrelevant records retrieved to the total number of interruptions, and

     precision:  the ratio of relevant records to records retrieved.

An example is given in Figure 1.

 

[Fig. 1 about here]
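The four measures can also be stated computationally.  The following is a minimal sketch, assuming that a display is represented as a sequence of relevance flags in display order; the function name, the class name, and the sample display are illustrative assumptions and are not taken from the survey data.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Sequence


@dataclass(frozen=True)
class CollocationMeasures:
    interruption_number: int
    intervening_records: int
    average_interruption_size: float
    precision: float


def collocation_measures(display: Sequence[bool]) -> CollocationMeasures:
    """Compute the four measures; display[i] is True if the i-th record is relevant."""
    flags = list(display)
    retrieved = len(flags)
    relevant = sum(flags)
    irrelevant = retrieved - relevant

    # Each maximal run of irrelevant records is one interruption, including a
    # run before the first relevant record and a run after the last one.
    interruptions = sum(1 for is_relevant, _run in groupby(flags) if not is_relevant)

    # Intervening records: irrelevant records lying between the first and the
    # last relevant record.
    if relevant:
        first = flags.index(True)
        last = retrieved - 1 - flags[::-1].index(True)
        intervening = flags[first:last + 1].count(False)
    else:
        intervening = 0

    average_size = irrelevant / interruptions if interruptions else 0.0
    precision = relevant / retrieved if retrieved else 0.0
    return CollocationMeasures(interruptions, intervening, average_size, precision)


# Hypothetical display: "R" marks a relevant record, "I" an irrelevant one.
display = [mark == "R" for mark in "IIRRIRIIIRRI"]
measures = collocation_measures(display)
```

For this hypothetical twelve-record display, the sketch yields four interruptions, four intervening records, an average interruption size of 1.75 (seven irrelevant records over four interruptions), and a precision of about .42 (five relevant records out of twelve retrieved).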

 

            Interruption number and intervening records indicate the extent to which relevant records are scattered in display.  McGarry and Svenonius (1991) first used interruption number as a measure in their study of displays of multiple subject headings.  A user confronted with a display of records that is often interrupted by irrelevant records may be puzzled and leave a search before finding a record of interest.  The research by Wiberley, Daugherty, and Danowski indicates that a large interruption preceding the first relevant record displayed might mean search failure for users even when relevant records were retrieved, because they would not look beyond the first screen when many records were retrieved.  Intervening records indicates how many irrelevant records a user must pass over in order to see the entire set of relevant records.  Again, a user may balk at the irrelevant records and leave a search before finding records of interest or make the assumption that the retrieved record set does not contain a particular item of interest.

            On occasion, displays of irrelevant records continue for one or more screens, depending on the size of the retrieved record set.  As mentioned above, user persistence decreases as retrieval size increases, and long displays of irrelevant records may lower persistence even more.  Average interruption size, which measures the average size of irrelevant record interruptions, is thus a vital measure because it gives an indication of when and how often a particular retrieval set might cause a user to give up.

            Precision is normally regarded as a measure of retrieval effectiveness and not display effectiveness.  However, it was used in this research because, like a snapshot, it gives a quick indication or picture of the character of the retrieved record set.  One sees from it how much noise a user must go through to see all the relevant records.  The other collocation measures by themselves do not give the same kind of overall picture of the retrieval set--one of the reasons precision has been consistently used in the evaluation of IR systems.  Precision is included also because it may be used to provide a perspective from which to view the other, more "true" measures of collocation.  For example, when precision is very high, one would assume that the number of interruptions would be low.  However, if one found many interruptions with high precision, one would be able to state more assuredly that the interruptions were indeed related to the variable of interest and not simply a by-product of a retrieval set in which a high percentage of the records retrieved were relevant. 

 

Independent Variables

            Three independent variables were tested for their effect on collocation:  a) match type, that is, Boolean "and" matching versus string matching; b) query type, that is, author versus work versus superwork; and c) catalog size.  Boolean and string matching often result in displays that order records in different ways (Fig. 2).  Boolean "and" matches separate query terms, look for them independently, and often, so long as they appear in the same record, retrieve records that have one query term in one field and one in another.  String matches keep query terms together and retrieve records only if query terms occur at the beginning of a field.  Boolean searching in particular has been cited as obstructing collocation in online catalog displays because online catalogs often arrange its search results randomly.  Another reason for studying Boolean versus string matching is that, at present, these two types of matching are the predominant types of matching available in online catalogs, particularly those available in the United States.

 

[Fig. 2 about here.]
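The difference between the two match types can be sketched as follows.  This is an illustrative simplification, assuming that a record is a list of field strings and that matching is done on lowercased word tokens; the function names and sample records are assumptions, and the catalogs surveyed may have implemented matching differently.

```python
import re
from typing import List


def _tokens(text: str) -> List[str]:
    """Assumed tokenization: lowercase words, punctuation ignored."""
    return re.findall(r"[a-z0-9']+", text.casefold())


def boolean_and_match(fields: List[str], query: str) -> bool:
    """Every query term must occur somewhere in the record, in any field."""
    record_words = {word for field in fields for word in _tokens(field)}
    return all(term in record_words for term in _tokens(query))


def string_match(fields: List[str], query: str) -> bool:
    """The query terms, kept together and in order, must begin some field."""
    terms = _tokens(query)
    return any(_tokens(field)[:len(terms)] == terms for field in fields)


# Hypothetical records: an author field followed by a title field.
william_james = ["James, William, 1842-1910", "The principles of psychology"]
henry_james = ["James, Henry", "William Wetmore Story and his friends"]

boolean_hits = [r for r in (william_james, henry_james)
                if boolean_and_match(r, "james william")]
string_hits = [r for r in (william_james, henry_james)
               if string_match(r, "james william")]
```

In this sketch the Boolean "and" match retrieves both records, because "james" and "william" each occur somewhere in the Henry James record, while the string match retrieves only the record whose author field begins with "James, William."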

 

            The second variable investigated was query type:  author, work, or superwork.  As discussed above, cataloging rules treat these entities differently.  Normalization of author names is required while normalization of work titles is not.  Author record sets are defined by the contents of a single type of field, author fields, while work record sets are defined by the contents of two types of fields, author fields and title fields.  The presence of a second type of field may add a level of complexity to record arrangement that makes it difficult for catalogs to collocate these records.  These issues have to do with record structure, that is, the MARC fields in which relevant record content (author names and titles) is contained and the record content itself.  In effect, the investigation of query type is also an investigation of record structure.   

            The last variable studied was catalog size.  The effect of catalog size was evaluated by selecting equal numbers of large, medium, and small catalogs.  A large catalog was defined as having over 1,000,000 records, a medium catalog between 300,000 and 1,000,000 records, and a small catalog fewer than 300,000 records.  One of the reasons for studying catalog size was the frequent observation that catalog size is, and should be, a determiner of catalog design and cataloging rules (see, for example, Kilgour, 1979, pp. 34-35).  One might easily assume that the larger the catalog, the more difficult it is to achieve good collocation; however, no research has been conducted to test this assumption, nor has anyone looked at the impact of catalog size on worst cases. 

 

Worst-Case Method

            A random sample method is commonly used to assess overall performance of a system.  However, the very nature of an overall assessment is that it gives all aspects of a system equal attention.  Thus, those areas responsible for system breakdown would be given as much, or as little, attention as all other areas.  A worst-case method, on the other hand, focuses attention on system weaknesses, which in turn elicits information about those aspects of a system that most need improvement.  Since collocation has been identified as an aspect of online catalogs needing improvement, a worst-case method was selected.  Worst-case methods have also been used to evaluate the performance of various types of systems in engineering and other applied sciences (see, for example, Nassif, Strojwas, & Director, 1986). 

            One of the difficulties with using a worst-case method in this research was that very little was known about what constitutes a "worst case" for collocation in the online catalog environment.  Some types of queries have been identified as problematic for retrieval, such as queries that retrieve zero hits or queries that retrieve large retrieval sets, but virtually nothing has been said or is known about queries that are problematic for display.  Because large retrieval sets may also be assumed to pose problems for collocation, attributes of queries known to contribute to large retrieval sets, i.e., queries consisting of a few words that occur frequently in a database or a few words that are homonyms, were used to identify worst cases for this research.  However, what is it about a query that makes it a worst case in terms of collocation?  Relevant record sets retrieved in large retrieval sets may or may not be collocated.  It was hypothesized that, in addition to having the attributes that contribute to large retrieval sets, worst-case record sets were composed of records that exhibited a wide variety of record structures.  A preliminary taxonomy of record structures was developed to help identify candidate worst-case queries (see Carlyle, 1994, Appendix 2).

            Actual selection of worst-case queries proceeded as follows.  Candidates for worst-case author and work queries were identified based on a review of examples in AACR2, my own acquaintance with problematic cases, and suggestions from practicing librarians.  Candidate worst-case queries were then searched in UCLA's ORION.  A worst-case query was first defined as being composed of individual words or names that retrieved 1,000 or more hits in an ORION search.  Queries that retrieved extremely large sets (for example, 10,000 records) were not selected.  Queries that met the 1,000 record criterion were then pre-searched in several online catalogs.  The final list of queries was determined by selecting those that exhibited a wide variety of record structures and poor collocation.  Worst-case queries selected included:

 

                Authors                     Works

                Homer                       Charles Dickens.  A Christmas Carol.

                William James               James Joyce.  Ulysses.

                H.D. (Hilda Doolittle)      John Milton.  Paradise Lost.

                Alice Walker                Sir Thomas More.  Utopia.

                Peter Gray                  William Shakespeare.  Sonnets.

               

            Ideally, one would use a profile of a worst-case query to identify worst cases and then select a random sample of those cases for research such as this.  However, so little was known about the characteristics of worst-case queries that random sampling of this type would have been difficult and costly.  One of the purposes of this research was to collect information about worst cases so that a taxonomy of record structures could be fully developed.

 

Constants  

            In studies of operational systems, unrelated variables may influence the results.  Efforts were made to control for these variables in this research as much as possible.  Database variables unrelated to size were controlled by selecting, when possible, only those databases that had the following characteristics:

1)         over 75 percent of the library collection was contained in the online catalog

2)         the library collection was general in nature

3)         the library collection contained primarily English language materials

4)         the library collection was located in the United States.

System variables unrelated to match type were partially controlled by selecting, when possible, a large, medium, and small sized online catalog designed by the same vendor.  Vendor selection was based on two characteristics:  (1) high numbers of installations in libraries (statistics on vendor installation were available in Bridge (1992)), and (2) availability of catalogs using those vendors via Internet connections.  Record structure variables that might influence record arrangement or a record's membership in an author, work, or superwork set were held constant.  These variables were the conformance of records to specifications mandated in AACR2 and the Library of Congress Subject Cataloging Manual:  Subject Headings (1991).  Conformance to cataloging standards was controlled by dropping records whose non-conformance to standards had an effect either on record ordering or on a record's membership in an author, work, or superwork record set.

 

Data Collection

            In each catalog of the eighteen online catalogs surveyed, the five author names were searched using all the types of author search permitted, and the five works/superworks were searched using all the types of title search permitted.  Data collection parameters are given in Table 1.

 

[Table 1 about here]

 

Author searches were not used to retrieve works/superworks.  Where a catalog offered more than one type of search for a query type, all were used; for example, in catalogs that offered both a string author search and a Boolean author search, both searches were used.  An attempt was made to make author and title searches across catalogs as comparable as possible, so when Boolean searches were available that allowed limiting to author or title fields, author searches were limited to author fields and title searches were limited to title fields.  If the Boolean "and" was not the default operator, "and" was used in the construction of the queries to standardize the queries across the catalogs surveyed.  Sets retrieved in title searches were analyzed twice, first to discover the extent of collocation of work record sets, and then to discover the extent of collocation of superwork record sets.

 

Collocation Under Title

            An assumption behind the collocating standard in cataloging history has been that it applies to authors and works only when searching under an author's name; that is, it mandates the collocation of the works of an author and the editions of a work in a display under the name of an author, but not in a display under title.  Thus, in the card catalog, additional entries were not made for uniform titles, which were used for arrangement purposes under author name only.  For example, if one looked in a card catalog under the title Paradise Lost, one would not find a French edition of Milton's work entitled Paradis Perdu.  Rules do exist, however, that allow catalogers to make a reference under the title of a work having many manifestations to look under the name of the author (for example, AACR2  rule 26.6).  Thus a card might be filed under the title Paradise Lost by John Milton that stated:  "For editions of this work, see Milton, John.  Paradise lost."  These practices reflect the limitations of title searching and work collocation in the card environment.  In the online environment, however, retrieving and displaying uniform titles in addition to other titles is easily accomplished.  In fact, uniform titles were retrieved and displayed in title searches in all the online catalogs surveyed for this research.  It makes sense, then, to test the collocating objective for works by searching under title in the online environment.

 

RESULTS

            Data were analyzed using descriptive statistics since neither the selection of worst-case queries nor the sample of online catalogs was random.  As sample sizes were relatively small and standard deviations were large, medians were reported instead of means. 

Match Type Results

            The first independent variable tested was type of match performed, Boolean versus string match.  Match type proved to be a strong determiner of collocation in display, particularly for authors and superworks.  String matching was clearly superior to Boolean matching in collocating author records (Tables 2, 3).  Only ten percent of string matches were interrupted more than twice by irrelevant records.  In addition, when interruptions occurred, they were small.  Boolean matches, on the other hand, were interrupted frequently.  Thirty-nine percent of Boolean author searches were interrupted twelve or more times, and the number of intervening records and average interruption size were correspondingly large.  As might have been expected, precision was much higher for string matches (median .77) than for Boolean matches (median .26).
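
            The four measures reported in these results can be illustrated with a minimal sketch.  The definitions coded here are assumptions made for illustration only:  an interruption is taken to be a run of irrelevant records lying between the first and last relevant record in a display, intervening records is the total number of records in such runs, average interruption size is their ratio, and precision is taken in its conventional sense as relevant records divided by all records retrieved.  The function name and the sample display are hypothetical.

```python
from itertools import groupby


def collocation_measures(display):
    """Compute four display measures from a list of booleans.

    `display` lists retrieved records in display order; True marks a record
    judged relevant to the query. The definitions are illustrative assumptions.
    """
    positions = [i for i, relevant in enumerate(display) if relevant]
    if not positions:
        raise ValueError("display contains no relevant records")
    # Look only at the stretch from the first to the last relevant record.
    span = display[positions[0]:positions[-1] + 1]
    # Each run of irrelevant records inside that stretch is one interruption.
    runs = [len(list(group)) for relevant, group in groupby(span) if not relevant]
    interruptions = len(runs)
    intervening = sum(runs)
    return {
        "interruption_number": interruptions,
        "intervening_records": intervening,
        "average_interruption_size": intervening / interruptions if interruptions else 0.0,
        # Assumed: relevant records as a share of all records retrieved.
        "precision": len(positions) / len(display),
    }


# Hypothetical nine-record display: R = relevant, x = irrelevant.
display = [c == "R" for c in "xRRxxRxRx"]
measures = collocation_measures(display)
```

            In this invented display the relevant records are interrupted twice, by three intervening records in all, so the average interruption size is 1.5 records.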

 

 

[Tables 2 and 3 about here] 

 

 

            The impact of match type on work record sets was much less pronounced than the impact of match type on author record sets, although string matches again outperformed Boolean (Tables 2, 4).  Results for interruption number and precision were surprisingly similar for string and Boolean matches, with string matches providing only slightly better collocation than Boolean.  The results for intervening records and average interruption size, however, indicated string match superiority more clearly.

 

 

[Table 4 about here]

 

 

            As was true of authors, superworks were collocated notably better with string matches than Boolean (Tables 2, 5).  Superwork record sets were interrupted much less often in string matches than in Boolean matches.  String matches had a median of four interruptions, in contrast to Boolean matches, which had a median of fourteen interruptions.  The distributions for intervening records echoed those for interruption number.  Fifty-eight percent of the string matches were interrupted by fewer than ten intervening records, while 53 percent of the Boolean matches were interrupted by 30 or more intervening records.  However, in contrast to the statistics for authors, superwork statistics for average interruption size and precision were inconclusive regarding the difference between string and Boolean matching.

 

[Table 5 about here.]

 

Query Type Results

            The second aspect of collocation examined was the effect of query type (author, work, or superwork) on collocation.  Query type had a strong effect on collocation.  Author record sets were interrupted far less frequently and by far fewer intervening records than work or superwork record sets (Tables 6, 7).  Seventy percent of author record sets, as opposed to 23 and 24 percent of work and superwork record sets, respectively, were interrupted two or fewer times.  Median interruption number shows this difference more dramatically.  Half of the author record sets were not interrupted at all, while work and superwork record sets had a median of 15.5 and 18 intervening records respectively.

 

[Tables 6 and 7 about here.]

 

            One of the most interesting results of this study was the poor collocation of work record sets, which was illustrated most clearly in average interruption size and precision statistics.  Median average interruption size was largest for works at 9.7 records, while authors had a median of 5.1 records and superworks 3.2.  Precision for works was distressingly low--only three percent of work record sets had over .59 precision and nearly 90 percent were below .4.  Median precision for works was .15, in contrast to .61 and .47 for authors and superworks, respectively. 

 

Catalog Size Results

            The last variable studied was catalog size.  Quite unexpectedly, catalog size had an inconsistent relationship to collocation of relevant record sets.  Catalog size had a negligible effect on arrangement of author records (Tables 8, 9).  The distributions for large, medium, and small catalogs for interruption number were almost identical.  Even the results for the two measures based on simple record counts, intervening records and average interruption size, which one would expect to be sensitive to catalog size, showed a relatively small effect.  In fact, the distributions for intervening records were almost indistinguishable from the distributions for interruption number.  Catalog size also had no discernible influence on precision.  Only the statistics for average interruption size revealed an effect for catalog size, with a median of four records in small, five in medium, and seven in large catalogs, and even this effect was smaller than might have been expected.

 

[Tables 8 and 9 about here] 

 

            Catalog size had a more perceptible impact on collocation of work records than on collocation of author records, although it was not consistent across the measures (Tables 8, 10).  In contrast to the results for authors, the results for works showed the effect of catalog size most when measured by interruption number and intervening records.  The distributions for interruption number demonstrated the effect of catalog size dramatically; 55 percent of work record sets in small, thirteen percent in medium, and two percent in large catalogs were interrupted two or fewer times by irrelevant records.  The results for intervening records demonstrated a similarly strong effect.  In contrast, the results for average interruption size demonstrated a small and uneven impact on work records.  Median average interruption size was 7.3, 11.9, and 10.9 respectively in small, medium, and large catalogs.  The effect of catalog size on precision was negligible, with median precision almost identical for all three catalog sizes:  .14 for small, .12 for medium, and .17 for large catalogs, and what small differences there were ran contrary to expectations.

 

[Table 10 about here.]

 

            The results for superworks were similar to those for works (Tables 8, 11).  As expected, small catalogs performed notably better than medium or large catalogs when measured by interruption number and intervening records.  However, average interruption size revealed only a small difference.  In catalogs of all sizes, average interruption sizes were small.  Only a query for More's Utopia in a large catalog had an average interruption size over 20 records. The impact of catalog size on precision for superwork record sets was even less noticeable than it was on average interruption size.  Medians for all catalog sizes varied less than .04 precision (.47, .44, and .45 median precision for small, medium, and large catalogs).

 

[Table 11 about here.]

 

 

MATCH TYPE DISCUSSION

            The claim made by various catalogers that Boolean matching is inferior to string matching in collocating related records is supported by the findings of this research.  In particular, string matching provided better collocation for author queries than did Boolean matching.  The reason for this most likely has to do with the number of fields matched.  String matches are by nature limited to a single field and they arrange records by the field matched, ensuring author collocation.  Boolean matching occurs frequently in multiple fields, so no single field or type of field stands out as the element of arrangement.  As a result, records are often arranged randomly by internal record number.
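
            The contrast drawn above can be sketched in a few lines.  This is an illustration under stated assumptions, not a description of any surveyed system:  the five records are invented, the string search is modeled as a left-anchored match on the author heading with results arranged by that heading, and the Boolean search is modeled as a word match in any field with results arranged by record number.

```python
import re
from itertools import groupby

# Hypothetical records: (record number, author heading, title).
RECORDS = [
    (1, "Homer", "Iliad"),
    (2, "McCloskey, Robert", "Homer Price"),
    (3, "Homer", "Odyssey"),
    (4, "Homer, Winslow", "Watercolors"),
    (5, "Homer", "Iliad (Greek text)"),
]


def string_author_search(query):
    """Left-anchored match on the author heading, arranged by that heading."""
    q = query.lower()
    hits = [r for r in RECORDS if r[1].lower().startswith(q)]
    return sorted(hits, key=lambda r: (r[1].lower(), r[2].lower()))


def boolean_keyword_search(query):
    """Word match in any field, arranged by record number (an assumption)."""
    q = query.lower()
    hits = [r for r in RECORDS
            if q in re.findall(r"[a-z]+", f"{r[1]} {r[2]}".lower())]
    return sorted(hits, key=lambda r: r[0])


def interruption_number(hits, is_relevant):
    """Count runs of irrelevant records between the first and last relevant one."""
    flags = [is_relevant(r) for r in hits]
    if True not in flags:
        return 0
    first = flags.index(True)
    last = len(flags) - 1 - flags[::-1].index(True)
    return sum(1 for relevant, _ in groupby(flags[first:last + 1]) if not relevant)


def by_homer(record):
    """Relevance judgment for this toy query: the author heading is exactly 'Homer'."""
    return record[1] == "Homer"


string_hits = string_author_search("homer")
boolean_hits = boolean_keyword_search("homer")
```

            In this toy set the string search places the three Homer records together ahead of the Winslow Homer record, while the Boolean search interleaves them with the other two records in record-number order.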

            That string matching may be superior to Boolean matching in collocating related records has important ramifications with respect to the types of searches online catalogs provide and how they present them to users.  Because preliminary studies show that users do not use all the searches that are available to them, some researchers are beginning to recommend selection of default searches for authors, works, and subjects (Yee & Layne, in press).  Other types of searches could be used either as backup for zero-hits queries or as advanced options. 

            Online catalogs exist that provide only multiple-field Boolean matching and do not support string matching at all.  The findings of this research suggest that the selection of Boolean multiple-field matching as a default for author and title queries may be ill-advised, particularly for author queries.  Although all of the online catalogs surveyed provided author string matching, not all provided author Boolean matching in addition.  This may be because online catalog designers have assumed the superiority of string matching for author queries.  Boolean multiple-field match displays frequently mask the presence of author terms because they display brief title information instead of author name headings (Fig. 3).  For online catalogs that provide Boolean matching, limiting matches to a single field offers the advantages of string matching in that the field matched may be used as the element determining record arrangement, author name headings may be displayed, and cross references may be easily provided.  Boolean single-field matches may, in fact, be preferable to string matches in that they offer the flexibility to enter an author's name in any order as well as the advantages of string matching.  Such matching was provided for author queries in only four out of thirteen online catalogs surveyed offering Boolean author searches.  Had single-field Boolean matching been available in all the catalogs surveyed, the results of this research for the effect of match type on author queries might have been quite different.
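
            A single-field Boolean match of the kind described above might be sketched as follows.  The sketch assumes that every query word must occur somewhere in the author heading (an implicit "and"), that word order is ignored, and that results are arranged by the heading matched.  The record set and function names are hypothetical.

```python
import re

# Hypothetical records: (record number, author heading, title).
RECORDS = [
    (1, "Milton, John", "Paradise lost"),
    (2, "Parker, William Riley", "Milton: a biography"),
    (3, "Milton, John", "Areopagitica"),
    (4, "Friedman, Milton", "Capitalism and freedom"),
]


def words(text):
    """Split text into a set of lowercase words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def author_field_boolean_search(query, records=RECORDS):
    """Require every query word in the author heading; arrange by heading."""
    terms = words(query)
    hits = [r for r in records if terms <= words(r[1])]
    return sorted(hits, key=lambda r: (r[1].lower(), r[2].lower()))


# The same records come back whichever way the name is entered.
forward = [r[0] for r in author_field_boolean_search("john milton")]
inverted = [r[0] for r in author_field_boolean_search("Milton, John")]
```

            Because only the author field is searched, the biography whose title contains "Milton" is not retrieved, and because the heading is the element of arrangement, the records under "Milton, John" file together.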

 

[Fig. 3 about here]

 

 

            For work queries, a default approach that is, as yet, unavailable in most catalogs may be desirable.  Yee and Layne (in press) propose a default work search which allows catalog users to enter both author name and title as a means of identifying a particular work of interest more quickly and efficiently than title searches alone as long as authority records are included in the search.  Recent research by Kilgour (1995) supports this proposal in a study showing that queries composed of both author surname and title words produced significantly smaller retrieval sets than queries composed of title words alone. 
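
            The narrowing effect of combining author name and title can be shown with a small sketch.  It is not the search proposed by Yee and Layne nor the procedure tested by Kilgour; it simply assumes that title words are matched against both the title and the uniform title, and that the combined search also requires the surname to occur in the author heading.  All records and names are illustrative.

```python
import re

# Hypothetical records; "Doe, Jane" is an invented author.
RECORDS = [
    {"id": 1, "author": "Milton, John", "title": "Paradise lost", "uniform_title": None},
    {"id": 2, "author": "Milton, John", "title": "Paradis perdu", "uniform_title": "Paradise lost"},
    {"id": 3, "author": "Morrison, Toni", "title": "Paradise", "uniform_title": None},
    {"id": 4, "author": "Doe, Jane", "title": "Paradise lost and found", "uniform_title": None},
    {"id": 5, "author": "Milton, John", "title": "Paradise regained", "uniform_title": None},
]


def words(text):
    """Split text into a set of lowercase words; None yields an empty set."""
    return set(re.findall(r"[a-z]+", (text or "").lower()))


def title_words_of(record):
    """Words from the title and, when present, the uniform title."""
    return words(record["title"]) | words(record["uniform_title"])


def title_search(title_query):
    """Records whose title fields contain every query word."""
    terms = words(title_query)
    return [r["id"] for r in RECORDS if terms <= title_words_of(r)]


def surname_title_search(surname, title_query):
    """As title_search, but the surname must also occur in the author heading."""
    terms = words(title_query)
    return [r["id"] for r in RECORDS
            if surname.lower() in words(r["author"]) and terms <= title_words_of(r)]


title_only = title_search("paradise lost")
surname_and_title = surname_title_search("milton", "paradise lost")
```

            In this invented set the title-only search retrieves three records, one of them unrelated to Milton's work, while the combined search retrieves only the two Milton records, including the French edition found through its uniform title.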

            The results of this study also call into question the emphasis on Boolean searching techniques, sometimes to the exclusion of all other searching techniques, in bibliographic instruction for online retrieval systems (for example, see Reed, 1993).  One reason for this emphasis may be that Boolean searching techniques are more complex than string searching techniques.  However, instruction stressing Boolean strategies to the exclusion of other strategies may lead users to believe that such techniques work well for all types of queries, when, as demonstrated by this research, this may very well not be the case.

            More research must be completed to further our understanding of the impact of match type on catalog displays.  Research similar to the study reported here could determine the effect of match type on collocation of all types of queries, not just worst cases.  Also, future research could investigate the effect of match type on catalog functions other than display, for instance, on the recall of author and work records. It would be useful as well to study match type in an experimental setting in which system and database variables could be more completely controlled.  Finally, it is clear that work must be done to develop methods to improve collocation in both Boolean and string matching environments.

 

 

QUERY TYPE DISCUSSION

            The notable effect of query type indicates that record structure differences had an important impact on collocation.  Author records collocated well, undoubtedly because record structure is simple; few MARC fields are involved and AACR2 requires uniform headings for authors.  Moreover, string matching collocated author records well, and more catalogs offered string author matches than Boolean author matches. 

            Works and superworks may be more difficult to collocate because record structures are more complex.  Works and superworks are identified by two different types of information, author name and title, and only a uniform author name, not a uniform title, is required in AACR2.  In addition, work records may contain varying subtitles that cause records to scatter when arranged alphabetically by title.  Superworks are sometimes identified by two fields, author and title, and sometimes by a single field consisting of an author subfield and a title subfield (a name-title added entry or a work-as-subject added entry). 

            Various strategies are needed to enhance collocation of these records.  Work and superwork collocation could be improved by both changing current record structures and by improving online catalog systems.  For instance, uniform titles could be required by AACR2 in all cases when a new edition of a work is published under a different title.  Online catalog systems could improve collocation if work and superwork records were arranged or grouped using both author name and title fields, instead of relying solely on title fields or arranging records in random order.
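
            The second suggestion, arranging by author name and title together, can be sketched briefly.  The sketch assumes that each record carries an optional uniform title and that, where none is present, the title itself identifies the work.  The records, the invented author "Doe, Jane," and the function names are hypothetical.

```python
# Hypothetical records; uniform_title is None where no uniform title was assigned.
RECORDS = [
    {"id": 1, "author": "Dickens, Charles", "title": "Christmas carol",
     "uniform_title": None},
    {"id": 2, "author": "Doe, Jane", "title": "Christmas carols for choirs",
     "uniform_title": None},
    {"id": 3, "author": "Dickens, Charles", "title": "Weihnachtslied in Prosa",
     "uniform_title": "Christmas carol"},
    {"id": 4, "author": "Dickens, Charles", "title": "Christmas carol in prose",
     "uniform_title": "Christmas carol"},
]


def arrange_by_title(records):
    """Arrange alphabetically by title alone."""
    return [r["id"] for r in sorted(records, key=lambda r: r["title"].lower())]


def arrange_by_author_and_work(records):
    """Arrange by author heading, then by work, then by title within the work."""
    def key(r):
        # Assumed: the uniform title, when present, identifies the work.
        work = r["uniform_title"] or r["title"]
        return (r["author"].lower(), work.lower(), r["title"].lower())
    return [r["id"] for r in sorted(records, key=key)]


by_title = arrange_by_title(RECORDS)
by_author_and_work = arrange_by_author_and_work(RECORDS)
```

            Arranged by title alone, the German edition files after an unrelated record.  Arranged by author and work, the three Dickens records file together.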

            Median precision was low for all query types.  This finding was to be expected because one of the criteria for selection of worst cases was their tendency to retrieve large numbers of records.  However, median superwork precision, .47, was surprisingly higher than median work precision, .15.  Only two work record sets had precision of .8 or better.  It is quite possible that libraries collect relatively few editions of a particular work, but, for worst-case works, they collect many works related to or about that work.  Still, it is notable that even though work record sets were composed of relatively few records, they were afflicted with very poor collocation.  Interpreting the statistics for works, one sees that although work record sets were not interrupted often, they were interrupted by very large interruptions.  The effect this could have on users is profound because the few related records retrieved could be so scattered that users would have difficulty identifying even one record relevant to a query.

            Superwork record sets were interrupted far more frequently and by larger numbers of intervening records than work record sets.  Several factors may explain this.  By definition, superwork record sets contain more records than work record sets, and, as a result, they may be more susceptible to interruption.  Also, some related work records were identified as such because of information they have in secondary-title entries.  Because title commands sometimes arrange records by main-title fields (240 or 245 fields) and not by secondary-title fields, these records may be scattered throughout a retrieved record set.  Online catalogs could improve superwork collocation by using author and title fields of all types as the basis for grouping related records.

            Although collocation of superwork records was somewhat better than work collocation as measured by average interruption size and precision, this finding is not necessarily a reason for optimism as it indicates that related work records are scattered intermittently among work records.  That superworks have more interruptions and smaller average interruption sizes than works means that the large interruptions in work record sets have been filled in by related work records.  For example, videorecordings of Scrooge and The Jetson's Christmas Carol may be interfiled with records for the Dickens text.  Although this research was not intended to test the arrangement of work and related-work records per se, different levels of arrangement may be crucial when a retrieval set is large.  The problems of arranging records in a large work record set, dubbed "the Humphry Clinker problem," have been studied by O'Neill and Vizine-Goetz (1987) and Svenonius (1988).  Solutions to these problems are being studied in an experimental system at the University of Bradford which was designed specifically to collocate work records (Ayres, Nielsen, Ridley, & Torsun, 1995). 

            Future research should investigate the effect of record ordering on catalog use.  Users could be presented with displays incorporating various record orderings, for example, orderings based on format, date of publication, or language, and user understanding and preference for each type of display could be compared.  Different types of orderings may be preferable for different types of queries, and the suggestion has been made that online catalogs offer more than one ordering option (Buckland, Norgard, & Plaunt, 1993).  Future investigation could also compare the effect of the current list-type displays to the effect of graphic displays. 

 

 

CATALOG SIZE DISCUSSION

            Catalog size produced the most surprising results of this study.  Because two of the measures for collocation used in this study, interruption number and intervening records, were simple counts, one would expect them to show the effect of catalog size and be larger in large catalogs than in medium or small.  It would also be reasonable to expect that average interruption size and precision would show the influence of catalog size.  That these expectations proved unfounded has serious implications for information retrieval in online catalogs.  The intelligibility of displays even in small databases may be compromised by large retrieval sets, and, thus, even small online catalogs may have to be designed to deal with the problems presented by them.

            The fact that author collocation was little affected by catalog size is perhaps the least surprising given the strong effects of match type.  Because all of the catalogs surveyed provided both string and Boolean matches, it is perhaps to be expected that catalog size had a less powerful influence.  Even in large catalogs, string matches often resulted in perfectly collocated author record sets.  The reason for the unsystematic effect of catalog size on collocation of work and superwork records is less clear.  As mentioned in the discussion above, it is possible that record structure variations associated with query type have a significant impact on the grouping of work and superwork records.  Catalog size had the strongest effect on work collocation.  The implication is that when the influence of other variables is not strong, catalog size will have an impact on collocation. 

            One reason that catalog size had a relatively small impact overall may have to do with the nature of library collections.  Certain authors and works are collected heavily and, thus, worst cases may exist in all general library environments.  The implications of this are twofold.  First, the large retrieval set problem may be more universal than may have been supposed.  When one considers the advent of retrieval in a virtual library environment, in which full-texts and enhanced bibliographic records are included in the retrieval pool, the prospect becomes even more daunting.  On a more optimistic note, if a successful method of securing collocation is designed for small catalogs, then the same method could be successful for larger catalogs.  This certainly seems to be the case for author string matching.  A method of collocating work and superwork records may also be effective regardless of catalog size.

            Assumptions regarding catalog size have influenced the design of cataloging rules.  Cataloging folklore holds that smaller catalogs collect fewer editions of works and therefore have different needs with respect to the length of a bibliographic record and the use of uniform titles.  AACR2 rule 1.0D provides catalogers, presumably from smaller libraries, the option of creating brief records.  In AACR2 rule 25.1A, one of the reasons not to use uniform titles is based on "the extent to which the catalogue is used for research purposes," an implication being that smaller libraries do not need to use uniform titles.  The results of this research demonstrate that these assumptions may be incorrect with respect to worst cases, and that smaller libraries may be well-advised to use all the techniques available to enhance collocation.

            Further research using a larger worst-case sample is necessary to confirm the observations of this study regarding catalog size.  It would also be interesting to look at the incidence of worst cases in catalogs of various sizes.  The results of this study imply that worst-case displays are comparable across catalogs.  Does this also imply that users would have the same success finding particular worst cases regardless of catalog size?  The 1958 American Library Association card catalog use study (Jackson, p. 15) found that users were more likely to find the known-items they sought in small catalogs than in large catalogs.  This research has not been replicated in the online environment, nor has any research looked at the impact of catalog size on users' ability to find worst-case authors or works. 

 

 

GENERAL DISCUSSION

               First, a word about the dependent variables.  The measures that seemed most revealing of the extent to which relevant record sets were collocated were interruption number and intervening records.  Average interruption size and precision, on the other hand, were useful to contrast to interruption number and intervening records to obtain a more complete picture of collocation.  Although four measures of collocation may seem somewhat unwieldy, one could imagine using even more to portray the various aspects of collocation.  In this research, collocation was measured somewhat imprecisely in that the placement of different types of relevant records with respect to each other was not regarded, that is, all relevant records were treated equally.  An author record set in which works by a particular author and works about that author were scattered would have been treated in this research as a perfectly collocated record set, when, in fact, such scatter could be confusing or irritating to users.  For future research on collocation, other, more finely tuned, dependent variables could be defined to measure collocation.

            The results of this study have important implications for the revision of cataloging rules.  One of the reasons that catalogers have supposed string matching to be superior to Boolean is that string matching mimics retrieval in the manual environment.  The techniques outlined in AACR2 to ensure collocation were created for the manual environment.  Collocation in the manual environment is wholly dependent on record content, that is, on the construction of uniform author and work headings.  In a computer environment offering Boolean and other types of matching and retrieval, uniform headings alone are not sufficient to collocate author and work records.  The constructors of AACR2 and earlier Anglo-American cataloging codes limited themselves to the realm of record content perhaps, in part, because collocation in the manual environment was largely guaranteed based on content alone.  In so doing, however, they have abdicated responsibility for meeting user needs with respect to display.  If AACR2 is to truly support collocation of related author and work records in online catalogs, then it must come to grips with the fact that it can no longer restrict the province of the rules to record content alone and must include rules for arrangement and display as well.  In other words, AACR2 must become a code governing the construction of catalogs, not just catalog records.

            The reluctance of code writers to incorporate rules for arrangement and display may be due to a reluctance to specify exactly how online catalogs must provide collocation.  As online catalog software varies from catalog to catalog, provisions specifying how collocation should occur would be complex.  However, precedents exist for AACR2 to specify an outcome without specifying how that outcome must be accomplished.  For example, many rules mandate the use of cross references, but they do not specify how a particular catalog must create them.  In a similar manner, collocation of author and work records could be mandated without specifying exactly how a particular system must accomplish it.

            One of the most critical areas for future research is an investigation of the effect of collocation on catalog use.  To what extent do poorly collocated record displays prevent users from finding the items they seek?  For at least two centuries catalogers have assumed that collocation of relevant records enhances use.  However, no research has ever been done to test this assumption.  While it seems likely that this assumption is valid, it would still be useful to test it because any improvements in our catalogs should be justified by a clear gain for users.  As collocation is poor in operational catalogs, it would be necessary to develop an experimental system which provides perfectly collocated record displays.  Because of the varieties of record structures associated with work and superwork records, this task is formidable.  Two avenues present themselves for accomplishing it.  First is "cleaning up" or enhancing existing worst-case records, for example, adding uniform titles to all appropriate work records.  This would be costly,  so it would be desirable to investigate the development of automatic collocation or linking procedures using existing records.  

            It seems that little thought, much less creative design work, has gone into the display of multiple records in online catalogs.  Few catalogs have used graphic display software or sophisticated linking techniques to improve multiple-record displays, particularly multiple-record displays of authors and works.  Although much work must be completed before displays that collocate relevant records could actually be implemented in online catalogs, it is not difficult to imagine how such displays might look (Fig. 4).  In a graphic display environment using hierarchical tree structures to represent relevant works or authors, users could simply click on the part of a tree that they are interested in to retrieve other tree structure displays leading to specific records of interest.  Displays incorporating work tree structures have also been suggested by Svenonius (1988, p. 7). 
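
            As a rough text-only approximation of a hierarchical display, the sketch below groups records by author and then by work and renders the result as an indented outline.  It is not a reproduction of Fig. 4; the three-level hierarchy, the records, and the function names are assumptions made for illustration.

```python
# Hypothetical records: (author heading, work, description of the edition).
RECORDS = [
    ("Milton, John", "Paradise lost", "English text"),
    ("Milton, John", "Paradise regained", "English text"),
    ("Milton, John", "Paradise lost", "French translation"),
]


def build_tree(records):
    """Group editions under their work, and works under their author."""
    tree = {}
    for author, work, edition in records:
        tree.setdefault(author, {}).setdefault(work, []).append(edition)
    return tree


def render(tree, indent="  "):
    """Return the tree as indented lines, one level per tier of the hierarchy."""
    lines = []
    for author in sorted(tree):
        lines.append(author)
        for work in sorted(tree[author]):
            editions = sorted(tree[author][work])
            noun = "record" if len(editions) == 1 else "records"
            lines.append(f"{indent}{work} ({len(editions)} {noun})")
            for edition in editions:
                lines.append(f"{indent * 2}{edition}")
    return lines


outline = render(build_tree(RECORDS))
print("\n".join(outline))
```

            The record counts shown beside each work give the kind of overview of a large retrieval set that the text describes.  In a graphic interface each line would be a node that the user could select to expand.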

 

[Fig. 4 about here.]

 

            Further work must be done to define and identify worst cases.  The definition of worst case used in this research was based predominantly on characteristics of retrieval.  It was also somewhat loose by design, as it was a first attempt at such a definition, and the individual cases produced different results (Carlyle, 1994, pp. 120-141).  A more thorough analysis of record structure and arrangement could be used to provide a definition of worst case based on display characteristics as opposed to retrieval characteristics.  Use of a display-based definition might also reveal the existence of various types of worst case, which would, in turn, be useful for the development of automatic collocating procedures.  It would also be useful to identify specific cases that are problematic for users.  Identification of the actual cases that cause problems for users would be invaluable if record enhancement were necessary to improve collocation as it would allow researchers to limit the number of records enhanced.

 

CONCLUSION

            The results of this research show that online catalog displays sometimes scatter records relevant to a query among irrelevant records and that multiple-field Boolean matching in particular contributes to this scatter.  In addition, the findings of this study indicate that poor collocation may be a problem for online catalogs regardless of their size, and that works are collocated less successfully than authors or superworks.  Although collocation is one of the standards governing catalog design, this standard is obviously far from being operative in current online catalogs. The results of this study may help to explain the difficulties reported by catalog users who retrieve long displays.  Displays that do not demonstrate relationships among relevant items retrieved may leave users, at best, disgruntled over the amount of time necessary to find what they are looking for and, at worst, oblivious to the fact that the library actually holds the very item or items they seek.  As always, the question remains--how do we design systems that help users complete their searches easily, efficiently, and successfully? 

            Catalog displays that order records based on their probable relevance to a query offer users whose information needs are well-defined an essential strategy for identifying records relevant to their needs.  However, users whose information needs are less well-defined or whose queries consist of specific authors or works may require a different approach.  Collocation of relevant records lays the foundation for a strategy that may help users identify relevant records by allowing them to see an overview of the records in a large retrieval set.  Efforts by Larson (1991) and McGarry and Svenonius (1991) demonstrate that large subject displays may be reduced by record grouping, compression, and clustering.  Reducing large record sets by grouping collocated record sets may help users identify relevant records by giving them an overview of the records that have been retrieved.  The more rapidly online systems increase in size and complexity, the more urgently we need to solve the problems these systems engender.  The principle of collocation may serve well in the creation of displays that guide users successfully to the information they require.

 

ACKNOWLEDGEMENTS

            I wish to express my thanks and deep appreciation to Elaine Svenonius, chair of my dissertation committee, for her encouragement, guidance, and support.  I would also like to acknowledge the thoughtful comments and advice of Ann Bein, Michèle Cloonan, Milos Ercegovac, Raya Fidel, Julie Gedeon, Sara Shatford Layne, Dorothy McGarry, Dee Andy Michel, Richard E. Rubin, Diana M. Thomas, and Martha M. Yee.  I am also grateful to the many librarians who kindly answered my questions regarding the survey.

 

References

Abels, D. M.  (1993).  Sequencing Items in Multiple-item Displays on Online Public Access Catalogs.  Ph.D. diss. University of California, Los Angeles.

Anglo-American Cataloguing Rules.  (1988).  2nd. ed. revised.  Ottawa:  Canadian Library Association.

Attig, J. C.  (1989).  Descriptive Cataloging Rules and Machine-Readable Record Structures:  Some Directions for Parallel Development.  In Svenonius, E. (Ed.)  The Conceptual Foundations of Descriptive Cataloging  (pp. 135-148).  San Diego:  Academic Press.

Ayres, F. H., Nielsen, L. P. S., Ridley, M. J., & Torsun, I. S.  (1995).  The Bradford Opac:  A New Concept in Bibliographic Control.  British Library R & D Report 6183.  West Yorkshire, British Library Research and Development Dept.

Borgman, C. L.  (Nov. 1986).  Why are Online Catalogs Hard to Use?  Lessons Learned from Information-Retrieval Studies.  Journal of the American Society for Information Science.  37(6), 387-400.

Bridge, F. R.  (April 1, 1992).  Automated System Marketplace 1992.  Library Journal, 58-72.

Buckland, M. K., Norgard, B. A., & Plaunt, C.  (Sept. 1993).  Filing, Filtering, and the First Few Found.  Information Technology and Libraries.  12, 311-319.

Carlyle, A.  (1994).  The Second Objective of the Catalog:  An Evaluation of Collocation in Online Catalog Displays.  Ph.D. diss.  Los Angeles, CA:  University of California, Los Angeles.

Carpenter, M.  (1981). Corporate Authorship:  Its Role in Library Cataloging.  Westport, Conn.:  Greenwood Press.

Cutter, C. A. (1904).  Rules for a Dictionary Catalog.  4th ed., rewritten.  Washington:  Government Printing Office.

Dalrymple, P. W.  (June 1990).  Retrieval by Reformulation in Two Library Catalogs:  Toward a Cognitive Model of Searching Behavior.  Journal of the American Society for Information Science.  41(4), 272-281.

Delsey, T.  (1989).  Standards for Descriptive Cataloging:  Two Perspectives on the Past Twenty Years.  In Svenonius, E. (Ed.)  The Conceptual Foundations of Descriptive Cataloging  (pp. 51-60).  San Diego:  Academic Press.

Duke, J. K.  (1989).  Access and Automation:  The Catalog Record in the Age of Automation.  In Svenonius, E. (Ed.)  The Conceptual Foundations of Descriptive Cataloging  (pp. 117-128).  San Diego:  Academic Press.

Dwyer, C.M., Gossen, E.A., & Martin, L.M.  (1991).  Known-Item Search Failure in an OPAC.  RQ.  31(2), 228-236.

International Federation of Library Associations.  (1971).  Statement of Principles Adopted at the International Conference on Cataloguing Principles, Paris, October, 1961.   Annotated ed. by E. Verona.  London: IFLA Committee on Cataloguing.

Jackson, S. L.  (1958).  Catalog Use Study.  Chicago:  American Library Association.

Johnson, D. W. & Connaway, L. S.  (1992).  Use of Online Catalogs:  A Report of the Results of Focus Group Interviews.  Typescript.

Kilgour, F. G.  (1979).  Design of Online Catalogs.  In The Nature and Future of the Catalog:  Proceedings of the ALA's Information Science and Automation Division's 1975 and 1977 Institutes on the Catalog  (pp. 34-41).  Maurice J. Freedman and S. Michael Malinconico, eds.  Phoenix:  Oryx Press.

Kilgour, F. G.  (Mar. 1995).  Effectiveness of Surname-Title-Words Searches by Scholars.  Journal of the American Society for Information Science 46(2), 146-151. 

Korfhage, R. R.  (1991).  To See or Not to See--Is That the Query?  SIGIR '91:  Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval  (pp. 134-141).  New York:  ACM. 

Larson, R.  (1991).  Classification Clustering, Probabilistic Information Retrieval, and the Online Catalog.  Library Quarterly  61(2), 133-173.

Library of Congress.  (1991).  Subject Cataloging Manual:  Subject Headings.    Washington, D.C.:  Library of Congress.

Lubetzky, S.  (1960).  Code of Cataloging Rules:  Author and Title Entries, an Unfinished Draft.  American Library Association.

Lubetzky, S.  (1969).  Principles of Cataloging.  Los Angeles, California:  Institute of Library Research, University of California.

Mann, T.  (1993).  Library Research Models.  New York:  Oxford University Press.

MARC Formats for Bibliographic Data.  (1980 & updates).  Washington:  Automated Systems Office, Library of Congress.

Matthews, J. R., Lawrence, G. S., & Ferguson, D. K.  (1983).  Using Online Catalogs.  New York, NY:  Neal-Schuman.

McGarry, D. & Svenonius, E.  (September 1991).  More on Improved Browsable Displays for Online Subject Access.  Information Technology and Libraries.  10(3), 185-191.

Nassif, S. R., Strojwas, A. J., & Director, S. W.  (January 1986).  A Methodology for Worst-Case Analysis of Integrated Circuits.  IEEE Transactions on Computer-Aided Design.  5(1), 104-113.

Nelson, M.J. (1988).  Correlation of Term Usage and Term Indexing Frequencies.  Information Processing & Management.  24(5), 541-547.

O'Neill, E. T. & Vizine-Goetz, D.  (1989).  Bibliographic Relationships:  Implications for the Function of the Catalog.  In Svenonius, E. (Ed.).  The Conceptual Foundations of Descriptive Cataloging  (pp. 167-179).  San Diego:  Academic Press.

Pease, S. & Gouke, M. N.  (July 1982).  Patterns of Use in an Online Catalog and a Card Catalog.  College & Research Libraries.  43, 279-291.

Purgailis Parker, L. M. & Johnson, R. E.  (October 1990).  Does Order of Presentation Affect Users' Judgment of Documents?  Journal of the American Society for Information Science.  4, 493-494.

Reed, L. L.  (Winter 1993).  Locally Loaded Databases and Undergraduate Bibliographic Instruction.  RQ   33(2), 266-273.

Smiraglia, R. P.  (1992).  Authority Control and the Extent of Derivative Bibliographic Relationships. Ph.D. diss.  Chicago, Ill.:  University of Chicago.

Solomon, P.  (June 1993).  Children's Information Retrieval Behavior:  A Case Analysis of an OPAC.  Journal of the American Society for Information Science.  44(5), 245-264.

Svenonius, E.  (1988).  Clustering Equivalent Bibliographic Records.  Annual Review of OCLC Research, July 1987-June 1988  (pp. 6-8).  Dublin, OH:  OCLC.

Walker, S. & de Vere, R.  (1990).  Improving Subject Retrieval in Online Catalogues:  2.  Relevance Feedback and Query Expansion.  British Library Research Paper 72.  London:  British Library.

Wiberley, S. E., Daugherty, R. A., & Danowski, J. A. (1995).  Displaying Online Catalog Postings:  LUIS.  Library Resources & Technical Services 39(3), 247-264.

Wilson, P.  (1989).  The Second Objective.  In Svenonius, E. (Ed.)  The Conceptual Foundations of Descriptive Cataloging  (pp. 5-16).  San Diego:  Academic Press.

Yee, M. M.  (1995).  What is a Work?  Part 4:  Cataloging Theorists and a Definition. Cataloging & Classification Quarterly. 20(2):  3-24.

Yee, M. M. & Layne, S. S.  (In press).  Online Public Access Catalogs.  Encyclopedia of Library and Information Science.  New York:  Marcel Dekker.


 

 

               

                1.             Paradise lost [not by Milton]  (N)

                2.             Paradise lost / by John Milton  (R)

                3.             Paradise lost, 1667  (R)

                4.             Paradise lost, a concordance  (R)

                5.             Paradise lost, a play in three acts  (R)

                6.             Paradise lost, a poem in twelve books  (R)

                7.             Paradise lost, a poem written in ten books  (R)

                8.             Paradise lost, a tercentenary tribute ... [proceedings of a conference]  (R)

                9.             Paradise lost and the rise of the American republic  (N)

                10.          Paradise lost, books XI and XII   (R)

                11.          Paradise lost.  Latin  (R)

                12.          Paradise lost or gained, the literature of the Hispanic exile  (N)

                13.          Paradise lost; paintings of English country life and landscape ...  (N)

                14.          Paradise lost, Samson Agonistes, Lycidas  (R)

                15.          Paradise lost.  Spanish  (R)

                16.          Paradise lost : the decline of the auto industrial age (N)

                17.          Paradise lost? : the ecological economics of biodiversity (N)

 

                Interruption no.:  4                                                              R = relevant

                Intervening records:  3                                                      N = not relevant

                Interruption size:  1.5

                Precision:  .65

 

 

Fig. 1.  Dependent Variable Calculation Example:  Display from a Query for Milton's Paradise Lost
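
The calculations in Fig. 1 can be retraced with a short script.  The following Python sketch is an illustration only and was not part of the survey.  It assumes that intervening records are the irrelevant records falling between the first and last relevant records in a display, that average interruption size is the number of intervening records divided by the number of unbroken runs of such records, and that precision is the proportion of displayed records that are relevant.  The function name collocation_measures and the R/N string encoding are hypothetical.  The interruption number is not recomputed here, since its counting rule is given in the body of the article.

```python
from itertools import groupby


def collocation_measures(flags):
    """Compute display measures from a sequence of 'R'/'N' relevance flags.

    Assumption: irrelevant records before the first or after the last
    relevant record do not count as intervening records.
    """
    relevant = [i for i, flag in enumerate(flags) if flag == "R"]
    if not relevant:
        return {"intervening": 0, "avg_interruption_size": 0.0, "precision": 0.0}
    # Restrict attention to the span bounded by relevant records.
    span = flags[relevant[0]:relevant[-1] + 1]
    # Length of each unbroken run of irrelevant records inside the span.
    runs = [len(list(group)) for flag, group in groupby(span) if flag == "N"]
    intervening = sum(runs)
    avg_size = intervening / len(runs) if runs else 0.0
    precision = len(relevant) / len(flags)
    return {
        "intervening": intervening,
        "avg_interruption_size": avg_size,
        "precision": precision,
    }


# Relevance flags for the 17 records displayed in Fig. 1, in display order.
FIG1 = "NRRRRRRRNRRNNRRNN"
measures = collocation_measures(FIG1)
print(measures)
```

Applied to the display in Fig. 1, the sketch yields 3 intervening records, an average interruption size of 1.5, and a precision of .65 after rounding, which agrees with the values shown in the figure.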

 


 

 

 

String Matching Display for "A Christmas Carol" Query

                1.             A Christmas carol / by Charles Dickens

                2.             A Christmas carol  [a musical score by Charles Ives]

                3.             Christmas carol [motion picture]

                4.             Christmas carol : a poem / by Sara Teasdale

                5.             A Christmas carol and other stories / by Charles Dickens

                6.             A Christmas carol.  In prose. [by] Charles Dickens

                7.             A Christmas carol : part 1, to begin with ... / Charles Dickens

                8.             A Christmas carol pop-up book  [based on Dickens' classic story]

 

                Records retrieved in response to string matches are often arranged in alphabetical

                order by title.  In addition, in most online catalogs string matches retrieve

                query terms only if they occur at the beginning of a field.

 

Boolean Matching Display for "christmas" and "carol" Query

                1.             A Virgin unspotted (Judea):  a Christmas carol for ...

                2.             A Christmas carol / by Charles Dickens

                3.             The Birds' Christmas carol / by Kate Douglas Wiggin

                4.             A Christmas carol  [a musical score by Charles Ives]  

                5.             Christmas carol [motion picture]  

                6.             A Tale of two cities ; A Christmas carol ; The chimes

                7.             A Christmas carol.  In prose. [by] Charles Dickens

                8.             Christmas at the Cratchits : being an excerpt from "A Christmas carol"

                9.             The carol album : seven centuries of Christmas music 

                10.          The Jetson's Christmas carol

                11.          A Christmas carol and other stories / Charles Dickens

                12.          Christmas carol : a poem / by Sara Teasdale

                13.          A Christmas carol pop-up book   [based on Dickens' classic story]

                14.          Scrooge [a musical based on A Christmas carol by Charles Dickens]

                15.          Dickens's a Christmas carol

                16.          The first Canadian Christmas carol

 

                Records retrieved in response to Boolean matches are often arranged essentially

                randomly by internal record number.

 

 

 

Fig. 2.  String and Boolean Matching Displays from a Query for A Christmas Carol
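
The contrast between the two displays in Fig. 2 can be sketched in code.  The following Python fragment is a simplified illustration under stated assumptions, not a description of any surveyed system.  String matching is modeled as a left-anchored comparison of title words after an initial article is dropped, with results filed word by word.  Boolean matching is modeled as retrieval of any title containing all query terms, with results ordered by record number.  The record numbers are invented, and the function names are hypothetical.

```python
import re

ARTICLES = {"a", "an", "the"}  # assumption: initial articles are ignored in filing


def words(text):
    """Lowercased word tokens of a title or query."""
    return re.findall(r"[a-z0-9']+", text.lower())


def filing_form(title):
    """Title words with any initial article dropped."""
    tokens = words(title)
    return tokens[1:] if tokens and tokens[0] in ARTICLES else tokens


def string_match(records, query):
    """Left-anchored match; hits are filed alphabetically, word by word."""
    q = filing_form(query)
    hits = [r for r in records if filing_form(r[1])[:len(q)] == q]
    return sorted(hits, key=lambda r: filing_form(r[1]))


def boolean_match(records, terms):
    """All terms present anywhere in the title; hits ordered by record number."""
    wanted = {t.lower() for t in terms}
    hits = [r for r in records if wanted <= set(words(r[1]))]
    return sorted(hits, key=lambda r: r[0])


# (record number, title); record numbers are invented for illustration.
RECORDS = [
    (4, "A Christmas carol and other stories"),
    (8, "The carol album : seven centuries of Christmas music"),
    (15, "Paradise lost"),
    (17, "The Birds' Christmas carol"),
    (23, "A Christmas carol"),
    (42, "Christmas carol : a poem"),
]

string_hits = string_match(RECORDS, "A Christmas carol")
boolean_hits = boolean_match(RECORDS, ["christmas", "carol"])
print([number for number, _ in string_hits])
print([number for number, _ in boolean_hits])
```

In this sketch the string match retrieves only titles that begin with the query words, while the Boolean match also retrieves titles such as The Birds' Christmas carol, in which the terms occur elsewhere in the field.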


 

 

Boolean Multiple Field Matching Display for William James Query

                1.             Literary theory and structure: essays, ......................

                                New Haven, Yale University Press,

                2.             The grave, a poem. (1743) .........................................      Blair, Robert

                                Los Angeles, William Andrews Clark Memorial..

                3.             Irish literary portraits; W. B. Yeats, ..........................        Rodgers, William Robert

                                New York, Taplinger

                4.             The flamboyant judge, James D. Hamlin, .....................                Hamlin, James D.

                                Canyon, Tex. : Palo Duro Press,

                5.             The history of Western education, ..............................    Boyd, William

                                New York, Barnes & Noble,10th ed.

 

                Records retrieved in response to Boolean multiple-field matches are often

                arranged essentially randomly by internal record number.

 

Boolean Single Field Matching Display for William James Query

                1.             Adams, William James, 1947-  ................................       1 entry

                                Durant, William James, 1885- 

                2.                [search under the name]  Durant, Will, 1885- ........    12 entries

                3.             Gibson, James William.  ............................................      1 entry

                4.             James, William .........................................................          2 entries

                5.             James, William, 1842-1910 .....................................        30 entries

                7.             James, William E.  ....................................................         1 entry

                                James, William Roderick, 1892-1942

                8.               [search under the name] James, Will, 1892-1942 ... 4 entries

                9.             Reid, William James ..................................................       1 entry

                                 

                Boolean matches limited to single fields may be arranged by author name and

                make use of cross references.

 

String Matching Display for William James Query

                1.             James, William .........................................................          2 entries

                2.             James, William, 1842-1910 .....................................        30 entries

                3.             James, William E.  ....................................................         2 entries

                                James, William Roderick, 1892-1942

                4.               [search under the name] James, Will, 1892-1942 ... 4 entries

 

                Records retrieved in response to string author matches are often arranged in

                alphabetical order by author name heading.  In addition, cross references are

                provided to unused forms of an author name.

 

 

 

Fig. 3.  String and Boolean Matching Author Displays


 

 

 

                              A CHRISTMAS CAROL / Charles Dickens
                                               |
        _______________________________________|_______________________________________
        |                                      |                                      |

Textual Editions                       Works Related to                       Works About
A Christmas Carol                      A Christmas Carol                      A Christmas Carol
[Call no. ...]

For example:                           For example:                           For example:
English language editions              Text adaptations                       Criticism
Editions in other languages            Musical adaptations
Audio editions [Call no. ...]          Audiovisual adaptations

 

 

 

Fig. 4.  Superwork Hierarchical Tree Structure Summary Display for A Christmas Carol
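
A summary display of the kind shown in Fig. 4 could be generated by grouping retrieved records under the three headings of the tree.  The Python sketch below is a hypothetical illustration.  It assumes that each record already carries a group and subgroup label, which existing catalog records may not supply, and the sample records and their counts are invented.

```python
from collections import Counter

# Display order of the three branches in Fig. 4.
GROUP_ORDER = ["Textual Editions", "Works Related to", "Works About"]


def summary_display(records):
    """Return summary lines: each group heading, then its subgroups with counts."""
    counts = Counter(records)  # (group, subgroup) -> number of records
    lines = []
    for group in GROUP_ORDER:
        subgroups = sorted((sub, n) for (g, sub), n in counts.items() if g == group)
        if subgroups:
            lines.append(group)
            lines.extend(f"    {sub} ({n})" for sub, n in subgroups)
    return lines


# Invented sample: one (group, subgroup) pair per retrieved record.
RECORDS = [
    ("Textual Editions", "English language editions"),
    ("Works Related to", "Musical adaptations"),
    ("Textual Editions", "English language editions"),
    ("Works About", "Criticism"),
    ("Textual Editions", "Audio editions"),
    ("Works Related to", "Text adaptations"),
]

for line in summary_display(RECORDS):
    print(line)
```

The point of the sketch is only that a long retrieval set collapses to a few labeled lines once records are grouped; how the labels would be derived from catalog records is left open.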


 

 

Table 1.  Summary of Data Collection Parameters

      Types of Catalogs Surveyed:            DRA, Dynix, GEAC Advance,
                                             Innovative Interfaces, NOTIS, VTLS

      No. of Catalogs Surveyed:              18

      No. of Author Queries Performed:
                  String (S):                90
                  Boolean (B):               62*

      No. of Title Queries Performed:
                  String:                    90
                  Boolean:                   90

      *Some catalogs surveyed did not provide Boolean author searches.

 

 

 

 

 

 

Table 2. Match Type Descriptive Statistics (Medians)

                    Interruption    Intervening    Average Interruption
                    Number          Records        Size                    Precision

  AUTHORS
     String             2.0             0.0             5.0                   .77
     Boolean            3.5             6.5             6.1                   .26

  WORKS
     String             4               11              6.5                   .21
     Boolean            5               29              16.7                  .07

  SUPERWORKS
     String             4               6               3.2                   .51
     Boolean            14              40              2.8                   .40

 

 

 

 

 

 

 

 


Table 3. Match Type Results:  Authors

  VARIABLE                 STRING    PERCENT    BOOLEAN    PERCENT

  Interruption No.
      0 to 2                 81        90%         26        42%
      3 to 5                  7         8%          8        13%
      6 to 8                  2         2%          4         6%
      9 to 11                 0         0%          0         0%
      12 up                   0         0%         24        39%

  Intervening Recs.
      0 to 9                 85        94%         32        52%
      10 to 19                1         1%          3         5%
      20 to 29                0         0%          0         0%
      30 up                   4         4%         27        44%

  Avg. Int. Size
      0-9.9                  64        71%         37        60%
      10-19.9                19        21%          6        10%
      20-29.9                 5         6%          3         5%
      30 up                   2         2%         16        26%

  Precision
      .8-1                   39        43%         10        16%
      .6-.79                 24        27%          5         8%
      .4-.59                 15        17%          2         3%
      .2-.39                  7         8%         20        32%
      0-.19                   5         6%         25        40%

Percents may not add up to 100 due to rounding error.
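
The rounding note can be checked against the counts in Table 3.  The short Python sketch below recomputes the percent columns for the intervening records rows; it assumes each percent is the count divided by the column total and rounded to the nearest whole number.  The function name percent_column is hypothetical.

```python
def percent_column(counts):
    """Whole-number percents of each count relative to the column total."""
    total = sum(counts)
    return [round(100 * count / total) for count in counts]


# Intervening records rows of Table 3 (0 to 9, 10 to 19, 20 to 29, 30 up).
STRING_INTERVENING = [85, 1, 0, 4]     # 90 string author queries
BOOLEAN_INTERVENING = [32, 3, 0, 27]   # 62 Boolean author queries

string_percents = percent_column(STRING_INTERVENING)
boolean_percents = percent_column(BOOLEAN_INTERVENING)
print(string_percents, sum(string_percents))
print(boolean_percents, sum(boolean_percents))
```

Under this assumption the string percents sum to 99 and the Boolean percents to 101, which illustrates why the columns need not total exactly 100.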

 

 

 

 

 


 

Table 4.  Match Type Results:  Works

  VARIABLE                 STRING    PERCENT    BOOLEAN    PERCENT

  Interruption No.
      0 to 2                 27        30%         15        17%
      3 to 5                 36        40%         37        41%
      6 to 8                 18        20%         17        19%
      9 to 11                 6         7%          8         9%
      12 up                   3         3%         13        14%

  Intervening Recs.
      0 to 9                 44        49%         34        38%
      10 to 19               13        14%          7         8%
      20 to 29                6         7%          4         4%
      30 up                  27        30%         45        50%

  Avg. Int. Size
      0-9.9                  65        72%         26        29%
      10-19.9                20        24%         26        29%
      20-29.9                 3         3%         20        22%
      30 up                   2         2%         18        20%

  Precision
      .8-1                    1         1%          1         1%
      .6-.79                  2         2%          2         2%
      .4-.59                 12        13%          2         2%
      .2-.39                 35        39%         11        12%
      0-.19                  40        44%         74        82%

Percents may not add up to 100 due to rounding error.

 

 

 

 

 


 

 

Table 5. Match Type Results:  Superworks

  VARIABLE                 STRING    PERCENT    BOOLEAN    PERCENT

  Interruption No.
      0 to 2                 37        41%          7         8%
      3 to 5                 21        23%         14        16%
      6 to 8                 12        13%          9        10%
      9 to 11                16        18%          7         8%
      12 up                   4         4%         53        59%

  Intervening Recs.
      0 to 9                 52        58%         20        22%
      10 to 19                9        10%         13        14%
      20 to 29                9        10%          9        10%
      30 up                  20        22%         48        53%

  Avg. Int. Size
      0-9.9                  84        93%         86        96%
      10-19.9                 5         6%          4         4%
      20-29.9                 0         0%          0         0%
      30 up                   1         1%          0         0%

  Precision
      .8-1                   18        20%         16        18%
      .6-.79                 19        21%         11        12%
      .4-.59                 13        14%         18        20%
      .2-.39                 33        37%         21        23%
      0-.19                   7         8%         24        27%

Percents may not add up to 100 due to rounding error.

 

 

 

 

 

 


 

Table 6. Query Type Descriptive Statistics (Medians)

                    INTERRUPTION    INTERVENING    AVERAGE INTERRUPTION
  QUERY TYPE        NUMBER          RECORDS        SIZE                    PRECISION

  Author                2               0               5.1                   .61
  Work                  4               15.5            9.7                   .15
  Superwork             7               18              3.2                   .47

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Table 7.  Query Type Results

  VARIABLE               AUTHORS   PERCENT     WORKS   PERCENT     SUPERWORKS   PERCENT

  Interruption No.
      0 to 2               107       70%         42      23%           44         24%
      3 to 5                15       10%         73      41%           35         19%
      6 to 8                 6        4%         35      19%           21         12%
      9 to 11                0        0%         14       8%           23         13%
      12 up                 24       16%         16       9%           57         32%

  Intervening Recs.
      0 to 9               117       77%         78      43%           72         40%
      10 to 19               4        3%         20      11%           22         12%
      20 to 29               0        0%         10       6%           18         10%
      30 up                 31       20%         72      40%           68         38%

  Avg. Interr. Size
      0-9.9                101       66%         91      51%          170         95%
      10-19.9               25       16%         46      26%            9          5%
      20-29.9                8        5%         23      13%            0          0%
      30 up                 18       12%         20      11%            1          1%

  Precision
      .8-1                  49       32%          2       1%           34         19%
      .6-.79                29       19%          4       2%           30         17%
      .4-.59                17       11%         14       8%           31         17%
      .2-.39                27       18%         46      26%           54         30%
      0-.19                 30       20%        114      63%           31         17%

Percents may not add up to 100 due to rounding error.

 

 

 

 

 

 

 


 

Table 8.  Catalog Size Statistics (Medians)

                        INTERRUPTION    INTERVENING    AVERAGE INTERRUPTION
                        NUMBER          RECORDS        SIZE                    PRECISION

  AUTHORS
     Small Catalogs         2               0               3.7                   .56
     Medium Catalogs        2               0               5.0                   .65
     Large Catalogs         2               0               6.6                   .59

  WORKS
     Small Catalogs         2               1.5             7.3                   .14
     Medium Catalogs        5               24.5            11.9                  .12
     Large Catalogs         7               38.5            10.9                  .17

  SUPERWORKS
     Small Catalogs         3               6.0             2.9                   .47
     Medium Catalogs        9               22.5            3.1                   .44
     Large Catalogs         11              48.5            4.0                   .45

 

 

 

 

 

 

 

 

 

Table 9.  Catalog Size Results:  Authors

  VARIABLE               SMALL CATS.   PERCENT     MEDIUM CATS.   PERCENT     LARGE CATS.   PERCENT

  Interruption No.
      0 to 2                 38          78%           35           71%           34          63%
      3 to 5                  4           8%            4            8%            7          13%
      6 to 8                  1           2%            1            2%            4           7%
      9 to 11                 0           0%            0            0%            0           0%
      12 up                   6          12%            9           18%            9          17%

  Intervening Recs.
      0 to 9                 41          84%           38           78%           38          70%
      10 to 19                0           0%            0            0%            4           7%
      20 to 29                0           0%            0            0%            0           0%
      30 up                   8          16%           11           22%           12          22%

  Avg. Interr. Size
      0-9.9                  34          69%           35           71%           32          59%
      10-19.9                 8          16%            8           16%            9          17%
      20-29.9                 4           8%            3            6%            1           2%
      30 up                   3           6%            3            6%           12          22%

  Precision
      .8-1                   18          37%           16           33%           15          28%
      .6-.79                  6          12%           11           22%           12          22%
      .4-.59                  6          12%            6           12%            5           9%
      .2-.39                  9          18%            8           16%           10          19%
      0-.19                  10          20%            8           16%           12          22%

Percents may not add up to 100 due to rounding error.

 

 

 

 

 

 

 

 


Table 10. Catalog Size Results:  Works

  VARIABLE               SMALL CATS.   PERCENT     MEDIUM CATS.   PERCENT     LARGE CATS.   PERCENT

  Interruption No.
      0 to 2                 33          55%            8           13%            1           2%
      3 to 5                 23          38%           30           50%           20          33%
      6 to 8                  4           7%           14           23%           17          28%
      9 to 11                 0           0%            6           10%            8          13%
      12 up                   0           0%            2            3%           14          23%

  Intervening Recs.
      0 to 9                 45          75%           20           33%           13          22%
      10 to 19                7          12%            8           13%            5           8%
      20 to 29                2           3%            6           10%            2           3%
      30 up                   6          10%           26           43%           40          67%

  Avg. Interr. Size
      0-9.9                  38          63%           26           43%           27          45%
      10-19.9                15          25%           15           25%           16          27%
      20-29.9                 3           5%            9           15%           11          18%
      30 up                   4           7%           10           17%            6          10%

  Precision
      .8-1                    2           3%            0            0%            0           0%
      .6-.79                  2           3%            2            3%            0           0%
      .4-.59                  5           8%            3            5%            6          10%
      .2-.39                 13          22%           13           22%           20          33%
      0-.19                  38          63%           42           70%           34          57%

Percents may not add up to 100 due to rounding error.

 

 

 

 

 

 

 

 

Table 11. Catalog Size Results:  Superworks

  VARIABLE               SMALL CATS.   PERCENT     MEDIUM CATS.   PERCENT     LARGE CATS.   PERCENT

  Interruption No.
      0 to 2                 26          43%            9           15%            9          15%
      3 to 5                 14          23%           14           23%            7          12%
      6 to 8                  9          15%            5            8%            7          12%
      9 to 11                 7          12%            7           12%            9          15%
      12 up                   4           7%           25           42%           28          47%

  Intervening Recs.
      0 to 9                 38          63%           18           30%           16          27%
      10 to 19                7          12%            9           15%            6          10%
      20 to 29                8          13%            5            8%            5           8%
      30 up                   7          12%           28           47%           33          55%

  Avg. Interr. Size
      0-9.9                  59          98%           57           95%           54          90%
      10-19.9                 1           2%            3            5%            5           8%
      20-29.9                 0           0%            0            0%            0           0%
      30 up                   0           0%            0            0%            1           2%

  Precision
      .8-1                   13          22%           12           20%            9          15%
      .6-.79                 10          17%            9           15%           11          18%
      .4-.59                 11          18%            9           15%           11          18%
      .2-.39                 19          32%           19           32%           16          27%
      0-.19                   7          12%           11           18%           13          22%

Percents may not add up to 100 due to rounding error.