DEVELOPING ORGANIZED INFORMATION DISPLAYS FOR VOLUMINOUS WORKS:  A STUDY OF USER CLUSTERING BEHAVIOR

 

 

 

 

 

 

 

 

Allyson Carlyle

Assistant Professor

Information School

University of Washington

Box 352930

Seattle, WA  98195-2930 USA

 

acarlyle@u.washington.edu

(206) 543-1887 (office)

(206) 616-3152 (fax)

 

 

 

 

Acknowledgments

            A grant from Kent State University funded this research.  An earlier version of this paper won the 2000 OCLC/ALISE Research Paper Award.  Research assistants Rebecca Albrecht and Melanie Rapp participated in the data collection.  I would also like to acknowledge the following for their invaluable assistance along the way:  Julie Gedeon, Rick Rubin, Jan Winchell, and Sue Gong from Kent State University; Raya Fidel, Terry Brooks, Dean Billheimer, Peter F. Cragmile, and David Poole from the University of Washington; Hur-Li Lee from the University of Wisconsin - Milwaukee; Robert M. Hayes and Elaine Svenonius from UCLA; Lisa M. Fusco; and the anonymous reviewers.

 

Publication information:

"Developing Organized Information Displays for Voluminous Works: A Study of User Clustering Behavior." Information Processing & Management, 35, 5 (July 2001):  677-699.

 



This paper investigates the ways in which people group or categorize documents associated with a voluminous work, in order to guide the construction of organized displays for information retrieval systems.  Fifty participants completed an unconstrained sorting task in which they were asked to sort 47 documents associated with the voluminous work A Christmas Carol, by Charles Dickens, into groups.  Participants were asked to group the documents according to how similar they were to each other, and in a way that would help them remember where to find the documents at a later time.  The sorting data were summarized using cluster analysis to discover the groupings that participants commonly created.  The groupings discovered frequently shared physical format, language, and audience attributes.

 

Keywords:  user studies, information retrieval system design, online catalog design, sorting task, cluster analysis, known items, categories, classification, information displays



1.         Introduction

            The purpose of the research presented in this paper is to discover the ways in which people group or categorize documents associated with a voluminous work, such as Shakespeare's Hamlet, in order to guide the construction of organized displays in information retrieval systems (IRSs).  The research emphasizes an area of information seeking seldom examined in information retrieval research, namely, queries for known items.  Two types of query are common in IRSs:  queries for subjects or topics, and queries for documents already known to the searcher, or known-item queries.  Much of the emphasis in the research and development of IRSs has been on subject queries.  Because subject queries pose serious problems to users and to system designers, it makes sense that research has focused on them.  Known-item queries, on the other hand, are frequently assumed to be unproblematic because they involve formal search terms, i.e., author names and titles.  However, queries for known items, like subject queries, present a variety of significant challenges for retrieval and interface design.

            One of the challenges that user interfaces for IRSs must meet with respect to known-item searching is how to display large groups of documents, or records representing documents, that share various types of bibliographic relationships (see Tillett, 1991 for a taxonomy of these relationships).  One such group of related documents is the following:  the motion picture Gone With the Wind; the typescript screenplay used to produce it; a book of trivia about the making of the movie; and the original book by Margaret Mitchell, upon which the screenplay and movie are based.  This group of documents is representative of a type of document group that will, in this paper, be called a voluminous work or work set.[1]  A voluminous work is a large group of documents sharing a variety of relationships that evolve out of and are linked to a common originator document.  In the case of Gone With the Wind, the common originator document is the original written text by Margaret Mitchell.  By this definition, the Koran, Stephen Hawking’s A Brief History of Time, Charles Dickens’ A Christmas Carol, or any of the individual works of Shakespeare would qualify as a voluminous work.  Voluminous works may generate hundreds or thousands of related documents, including editions, translations, works of criticism, and adaptations for other audiences or into other media.  For example, although it was published only in 1988, Hawking’s A Brief History of Time has already generated over fifty editions, translations, videorecordings, and other related works.

 

2.         Rationale for the Study

            Records for documents associated with voluminous works appear frequently in bibliographic databases such as online catalogs.  Even relatively small databases may contain large numbers of records representing these works.  In the Internet environment, this problem has been identified as the "versions" problem (e.g., Leazer and Smiraglia, 1996; Levy, 1995).  Works that embody many bibliographic relationships are an important focus for research.  Because they are popular, they are likely to be sought frequently by users.  In addition, they often engender searches that retrieve large numbers of relevant records (Carlyle, 1996).  Such searches may impede the identification of documents relevant to a user's particular information need (Matthews, Lawrence and Ferguson, 1983).

            In IRSs such as online catalogs, relatively unsophisticated multiple-record displays are presented in which records are listed one after another, often in an order (e.g., record number order) that appears random to the user.  In these types of display, relationships among records are often obscured, either because irrelevant records are interspersed among relevant ones or because related records are listed with no information indicating how they are related (e.g., see figs. 1 & 2).  For example, in figure 1 it is not clear what relationship, if any, exists between Dickens' A Christmas Carol and All I want for Christmas is my two front teeth.

 

[Figure 1 about here]

 

In figure 2, no information other than publisher and date distinguishes one item from another, so a user would have to look at each individual record in the list to discover the character and nature of each item represented, for example, whether or not it is illustrated.

[Figure 2 about here]

 

            In other types of IRS, records or documents are displayed in order of their predicted relevance to the search terms, a practice that reflects the focus of IRS development on subject queries.  In general, relevance in these systems is predicted by the frequency with which query terms appear in a record or document.  Queries for known items, however, are composed of formal terms such as author names or titles that may appear only once in a record or document.  Further, when voluminous works are involved, hundreds or thousands of records or documents may satisfy the query, but it may not be possible to predict their relevance to it in a meaningful way.  This may occur when a user is unable to specify a particular edition or version of interest, or does not know in advance that he or she even wants a particular edition or version.  For example, a user who simply wants to read a particular Shakespeare play or Dickens novel may not be aware that the work exists in many different formats, such as videorecordings or sound recordings, or in adapted or simplified versions, and so may not include these specifications in the query.  Thus, queries for voluminous works may require different approaches to display, including those that provide overviews of the records or documents retrieved by highlighting the relationships among them.
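To make the point about frequency-based ranking concrete, the following is a minimal Python sketch, assuming that a record's score is simply the number of times the query terms occur in it (operational systems typically use weighted schemes, so this is a simplification). The records and the function name tf_score are hypothetical. Because a known-item query consists of formal terms that each appear once in every matching record, every record below receives the same score, and the resulting ranking says nothing about which edition a user might want.

```python
from collections import Counter

def tf_score(query: str, record: str) -> int:
    """Score a record by the total number of occurrences of the query terms (simple TF)."""
    counts = Counter(record.lower().split())
    # Counter returns 0 for terms that do not occur in the record.
    return sum(counts[term] for term in query.lower().split())

# Hypothetical brief records for editions of one voluminous work.
records = [
    "dickens christmas carol illustrated edition",
    "dickens christmas carol sound recording",
    "dickens christmas carol spanish translation",
    "dickens christmas carol videorecording",
]

query = "dickens christmas carol"
scores = [tf_score(query, r) for r in records]
print(scores)  # [3, 3, 3, 3]: every record ties
```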

            One approach to display that would aid users who submit known-item queries for voluminous works is the creation of easily scanned, single-screen summary displays that group records or documents by type of relationship or other common attribute.  Grouping or clustering records or documents retrieved in a search and clearly identifying the types of attribute characterizing the groups or clusters is a promising means by which brief, organized summary displays may be created.  Grouped or clustered displays may enable IRS users to identify records or documents of interest more quickly and efficiently than long, unorganized lists. 
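A grouped summary display of this kind can be sketched in a few lines of Python. This is an illustration only, assuming each retrieved record already carries physical format and language values; the records, the choice of attributes, and the function name summarize are hypothetical and are not drawn from any particular catalog.

```python
from collections import defaultdict

# Hypothetical retrieved records: (record id, physical format, language).
records = [
    ("r1", "sound recording", "English"),
    ("r2", "book", "English"),
    ("r3", "book", "Spanish"),
    ("r4", "sound recording", "English"),
    ("r5", "book", "English"),
]

def summarize(records):
    """Group records by (format, language); return one summary line per group."""
    groups = defaultdict(list)
    for record_id, fmt, lang in records:
        groups[(fmt, lang)].append(record_id)
    # Sort the groups so the summary display has a stable, predictable order.
    return [f"{fmt}, {lang} ({len(ids)})" for (fmt, lang), ids in sorted(groups.items())]

for line in summarize(records):
    print(line)
```

Each line names the attributes shared by a group and the number of records in it, so a long list of records collapses into a short, scannable overview from which a user could expand only the groups of interest.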

            Displays that simplify large retrieval sets by grouping or organizing records have been suggested as a means of relieving the long display problem in online catalogs (e.g., Svenonius, 1988; Massicotte, 1988; Kinnucan, 1992; Carlyle, 1997a; Yee and Layne, 1998).  Systems featuring clustered or overview-type displays are becoming more common and are also seen as a means of improving the effectiveness of subject searching (e.g., Larson, 1991; Hearst and Pedersen, 1996; Lin, 1997; Zamir, Etzioni, Madani, and Karp, 1997).  A study of user behavior that investigates the ways in which users themselves organize voluminous works is an obvious avenue to pursue in the development of organized work displays.

 

3.         Review of Relevant Research

            One of the most serious obstacles to the representation and display of information contained in an information retrieval system (IRS) is information overload.  Information overload plagued system design even in the manual environment.  Findings from a major study of card catalogs in the 1950s, for example, showed that search failures tended to increase as catalog size increased (Jackson, 1958, p. 19).  This study also noted that search failures occurred more often in searches for known authors and works than in subject searches.  Information overload problems continue in current systems.  In research on the use of fisheye views, Schaffer et al. (1996) remark that “… today’s computers encourage ‘tunnel vision’ interfaces, for they supply users with very small screens to view large complex information spaces.”  In online catalog research, "scanning through a long display" was ranked fifth in a list of 27 interface problems identified by online catalog users in the first major study of online catalog use (Matthews, Lawrence and Ferguson, 1983, p. 124).  Other online catalog research has shown that users are unlikely to look at all of the screens presented to them when many screens have been retrieved (e.g., Wiberley, Daugherty and Danowski, 1995).  Some users will display all of the screens retrieved, but even relatively persistent users will look through at most 200 catalog records before moving to another search.

            Difficulties surrounding the representation of voluminous works in both manual and online environments have occupied library catalogers for many years (e.g., Panizzi, 1848; Pettee, 1936; Lubetzky, 1969; Yee, 1995).  The development of displays that organize and clearly identify records related to a voluminous work in the catalog has often been advocated as a solution to these problems (e.g., Lubetzky, 1963).  Displays composed of clusters of records identified by attributes common among the items in the clusters have the potential to be an effective method of addressing this problem; they reduce the number of screens a user has to review and they clarify relationships present among items.

Interface design based on user knowledge, behavior, and preferences has been advocated as a means of developing more effective systems (e.g., Fidel, 1994).  Dennis Wixon defines a user-centered design process as “… one that sets users or data generated by users as the criteria by which a design is evaluated or as the generative source of design ideas” (in Karat, Atwood, Dray, Rantzer, and Wixon, 1996, p. 161).  While systems are frequently evaluated with user input, they are seldom designed based on it (Wilson, Bekker, Johnson, and Johnson, 1997, p. 179).  Basing clustered display designs on user categorization behavior offers the possibility of creating more effective and efficient interfaces that are understandable by more users than system-centered or designer-centered designs.

One of the areas in which designs based on user categorization behavior have been implemented with some success is the design of categories for pull-down menus in a variety of software applications.  Menu design is relevant here because menus are similar to summary displays:  both use categories to attempt to alleviate information overload and to make the retrieval of items efficient and accurate.  McDonald and Schvaneveldt (1988) summarize a large body of cognitive-science research in interface design demonstrating that menu designs improve when they incorporate user-derived categories:  "The studies ... have demonstrated that menus organized according to users' empirically-derived cognitive structures are superior to other alternatives (e.g., alphabetical, random, and subjective organizations)." (p. 318).  In this type of research, users’ conceptual schemas are modeled, and information systems are then designed based on the resulting model of user behavior.

A variety of studies have shown that organizing menus by category improves their effectiveness.  In one of them, McDonald, Stone, and Liebelt (1983) tested the effectiveness of menus organized in three different ways:  alphabetically, categorically, and randomly.  The results of their experiment showed that categorical menus were clearly more effective than either of the other two types.  Of particular significance for the online catalog environment, in which many users are novice or casual users unfamiliar with catalog designs, this experiment showed that categorical menus were even more effective when study participants were uncertain about what they were looking for.

Later studies show not only that categorical displays are more effective than other types of display, but also that those constructed from groupings common to a variety of people are more effective still.  Hayhoe (1990)[2] used an unconstrained sorting task methodology similar to the one used in this study.  In that study, he assigned a sorting task to two types of study participants, experts and non-experts in a particular subject area.  He then performed one cluster analysis on the groupings of the experts, another on the groupings of the non-experts, and a third on the groupings of all participants together (combined participant grouping).  He then created menus based on the results of each of the three cluster analyses, as well as menus that reflected each individual participant’s own grouping.  In Hayhoe’s study, the combined participant grouping was found to be more effective for menu construction than the groupings created by experts.  Surprisingly, the combined participant groupings were also more effective for participants than the groupings they had created for themselves.

Many of the studies investigating user-based menu categories have employed cluster analysis, the method used in this paper, to derive categories from users (Lewis, 1991).  A common methodology that employs cluster analysis to derive user categories is the unconstrained sorting task.  In an unconstrained sorting task, participants are given a set of items, frequently cards with words and definitions printed on them, and asked to sort them into as many groups as they like based on item similarity.  This methodology is also referred to as free clustering, free sorting, F-sorting, and bottom-up sorting.  Sorting tasks have been used in psychological studies to discover the characteristics people use to group words or other items (e.g., Miller, 1969).  Sorting tasks of various types have also been used as a knowledge elicitation technique for the creation of expert systems (for a summary, see Cooke, 1994) and to assist in the design of large WWW sites.[3]  In these cases, the underlying assumption is that categories derived from user conceptual schemas are more effective than other types of categories.

User-centered information retrieval system designs incorporating user-based categorizations are, unfortunately, rather rare.  One that has not only been tested and evaluated but also implemented in a real-life setting is BookHouse (Rasmussen, Pejtersen, and Goodstein, 1994).  BookHouse is a fiction catalog developed for library users.  Its icon-and-text-based interface design was based on an extensive study of user needs, expectations, and conventions (Chapters 9-11).  In the evaluation stage, two prototype fiction retrieval systems, one with a traditional command-driven interface and the other with a text-based interface with mouse capabilities, were developed for testing in a real library setting against the icon-and-text-based design (BookHouse).  A variety of methods of evaluation were combined, including questionnaires, interviews, observations, and transaction log analyses.  The BookHouse design was shown to be more effective overall than the other, more traditional systems (Chapter 12).

            The unconstrained sorting task methodology has been used previously in library and information science research to discover the categories individuals use to group images.  Vidal (1995) asked 58 study participants to sort 48 images of the Brooklyn Bridge.  Cluster analysis of the sorting data discovered groups based on media type, age of image, image content, and purpose of image (pp. 56-60).  Jörgensen (1995) used an unconstrained sorting task of images to elicit think-aloud protocols that were then analyzed using content analysis to discover the categories or attributes used for grouping.  Categories discovered in this study included art historical information such as medium and style, content/story information such as setting and activity, and color.  Faiks and Hyland (2000) used a card sort technique to obtain user input on the organization of help screens for the Cornell University Library digital library.

 

4.         Research Design

            The underlying assumption of this research was that IRSs that reflect the ways in which people themselves organize documents would be more responsive to users' information needs.  The specific goal of the research was to begin to understand how people organize the documents that comprise a particular voluminous work set.  The study addressed two questions:  What common groups do people create when they categorize documents related to a voluminous work?  And what characteristics do people use to group these documents?  To answer these questions, an unconstrained sorting task was assigned to study participants.  Participants sorted documents related to a particular work based on their perceptions of how similar the documents were to each other and on their perceived ability to retrieve them at a later time.  To answer the second question, participants also wrote descriptions of each group identifying the characteristics used to create it.  The results of the qualitative analysis of these written descriptions appear in Carlyle (1999).

            A pilot study was conducted to test the methodology and to determine the extent to which participants might be influenced by the physical format of documents when grouping.  In the pilot study, half of the participants grouped actual documents and half grouped photocopies of covers, title pages, and other parts of the documents.  It was thought that the participants grouping actual documents might be more influenced by physical format than participants given photocopies and, thus, that the characteristics the two sets of participants used for grouping would differ.  Analysis of the results revealed that the difference between the characteristics used for grouping actual documents and those used for grouping photocopies was negligible.  Because pilot participants appeared to be more comfortable and engaged in the task when grouping actual documents, and because the photocopies sometimes stuck together, making data collection somewhat error-prone, actual documents were used for the main study.

            Fifty participants 18 years of age or older, who could read and write in English, agreed to participate in the study, which was conducted at the Chapel Hill Mall in Akron, Ohio, U.S.A.  Participant information, reported fully in Carlyle (1999), is briefly summarized here.  Almost half (48%) of the participants reported reading at least one book per month.  More than half (66%) reported visiting a bookstore or library at least once per month.  Sixty-four percent of the participants were between the ages of 18 and 30, and 63 percent had not completed an undergraduate degree.  More than half (60%) were female.

            Participants were paid ten U.S. dollars to spend approximately 30 to 45 minutes grouping 47 documents related to Charles Dickens' A Christmas Carol.  A Christmas Carol was selected for the study because many different types of edition and work related to it were in print and were easily obtainable.  Documents used in the study included book, video, and sound recording editions as well as such unusual editions as an Advent calendar and a book of Christmas Carol trivia (see Appendix 1 for a complete bibliography of the documents used).

            The study was conducted at a shopping mall for a variety of reasons.  It seemed more likely that a shopping mall venue, as opposed to a university or library venue, would ensure that a wide variety of people would participate in the study.  One of the objectives of IRS design is to make systems reflect the expectations of any potential user.  As libraries may be intimidating and threatening to some users (e.g., Mellon, 1986), research in interface design benefits from the study of a variety of potential users outside traditional information seeking environments.  In addition, it was hoped that the mall setting would reduce the influence of the traditional groupings present in libraries, making participants more likely to rely on their own preferences for grouping.

            Participants were asked to sort the documents for A Christmas Carol into as many or as few groups as they wanted, based on how similar the documents were to each other.  They were told that the purpose of the groups was to help them find the documents later on; in other words, the characteristics they used to create the groups should help them remember where to find the documents later.  Participants were also asked to write down names and descriptions for each group.  After completing the written task, participants were asked to fill out a brief form asking for demographic information.

 

5.         Cluster Analysis Results

            Hierarchical cluster analysis was used to determine common groupings of documents based on a comparison of the composition of all the groups created by participants in the study.  Data recording which documents each participant placed together in groups were compiled, and the frequency with which any two documents were placed in the same group across all study participants was calculated.  Clusters were formed one step at a time, with the documents grouped together most frequently joining earliest.  At any step of the analysis, a cluster, whether a single document or a group of documents, may merge with another single document or with another cluster.

            To summarize sorting data using cluster analysis, the first step is to determine a dissimilarity, or distance, measure (Aldenderfer and Blashfield, 1984).  Distance measures allow the researcher to compute the proximity or distance of every item in the sample to every other item.  When sorting data are used, a distance measure usually represents the frequency with which participants place any two items into separate groups.  Distances among all the items are calculated, a distance matrix is constructed, and clustering is accomplished by placing into groups those items that are closest to each other.

            The distance measure used for this study was the percentage of participants who placed any two documents into different groups, a common distance measure used with sorting data (Dunn-Rankin, 1983, pp. 36-37, 41-42).  An SPSS program using Ward's method, an agglomerative hierarchical clustering method, was selected to cluster the documents.  Agglomerative hierarchical clustering methods begin by regarding each individual document as its own separate cluster and proceed by merging two clusters at each step until all documents are merged into a single cluster.  As this study had 47 documents, the clustering program took 46 steps.
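The distance computation and clustering described above can be sketched in Python with NumPy and SciPy. This is a minimal sketch, not the SPSS program used in the study: the sorts below are hypothetical, the function name sort_distance_matrix is illustrative, and SciPy's implementation of Ward's method may report fusion coefficients on a different scale from SPSS. Ward's method is formally defined for Euclidean distances, so applying it to percentage distances like these is best treated as a heuristic.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def sort_distance_matrix(sorts, n_docs):
    """Percentage of participants who placed each pair of documents in different groups."""
    together = np.zeros((n_docs, n_docs))
    for groups in sorts:              # one participant's sort
        for group in groups:          # one group of document ids
            for i in group:
                for j in group:
                    together[i, j] += 1
    dist = 100.0 * (1.0 - together / len(sorts))
    np.fill_diagonal(dist, 0.0)       # a document is never separated from itself
    return dist

# Hypothetical sorts: 3 participants, 5 documents (ids 0-4).
sorts = [
    [[0, 1], [2, 3, 4]],
    [[0, 1, 2], [3, 4]],
    [[0, 1], [2], [3, 4]],
]

D = sort_distance_matrix(sorts, n_docs=5)
# Ward's method on the condensed distance vector; 5 documents -> 4 merge steps.
Z = linkage(squareform(D), method="ward")
print(D[0, 1], D[0, 4], Z.shape)  # 0.0 100.0 (4, 4)
```

Documents 0 and 1, which every hypothetical participant kept together, are at distance 0, while documents 0 and 4, which no participant grouped together, are at the maximum distance of 100. As in the study, n documents produce n - 1 merge steps.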

            To select a particular level of grouping as the most natural, one examines the distances, or fusion coefficients, at which documents or groups are merged at each step.  The step immediately before the step at which the fusion coefficient becomes quite large should be selected as the natural grouping of the documents.  In other words, when the groups or items being merged are quite distant from each other, merging should stop (Aldenderfer and Blashfield, 1984, pp. 53-58).  Selecting the natural grouping is somewhat difficult when successive coefficients do not differ greatly, which was the case here.  However, participants in the study typically formed about 7 groups (median 7, mean 7.3; smallest number of groups formed:  3; largest number of groups formed:  13).  Thus, it makes sense to look, near the 7-group level, for the step that shows the largest difference between successive fusion coefficients.  A six-cluster solution was selected as best representing the data because it showed the largest such difference.
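The stopping rule described above, which looks near the participants' typical number of groups for the largest jump between successive fusion coefficients, can be sketched as a small Python function. The function name, the window bounds, and the coefficients in the example are hypothetical; with SciPy, the coefficients would be column 2 of a linkage matrix.

```python
def pick_cluster_count(heights, k_min, k_max):
    """Pick the number of clusters in [k_min, k_max] that lies just before the
    largest jump between successive fusion coefficients.

    heights[s - 1] is the fusion coefficient at merge step s (steps counted
    from 1), e.g. column 2 of a SciPy linkage matrix; n items give n - 1 steps.
    """
    n = len(heights) + 1
    best_k, best_jump = None, float("-inf")
    for k in range(k_min, k_max + 1):
        s = n - k                       # merges performed when k clusters remain
        if s < 1 or s > n - 2:          # need both step s and step s + 1
            continue
        jump = heights[s] - heights[s - 1]   # coefficient increase at step s + 1
        if jump > best_jump:
            best_k, best_jump = k, jump
    return best_k

# Hypothetical fusion coefficients for 10 items (9 merge steps).
heights = [1, 2, 3, 4, 5, 6, 12, 13, 14]
print(pick_cluster_count(heights, k_min=2, k_max=5))  # 4
```

Because each merge step reduces the number of clusters by one, stopping after step 41 with the study's 47 documents corresponds to the six-cluster solution.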

            The clusters presented below are numbered according to the step at which they were formed in the clustering process.  Knowing when the clusters formed is of interest because the documents in clusters formed in early steps may be viewed as "more alike" than documents in clusters formed in later steps, since study participants grouped them together more frequently.  In each cluster, the number appearing after a document indicates the step at which it was merged with the document(s) listed before it.  Blank lines between documents indicate the formation of sub-clusters within a larger cluster.  Thus, in cluster 2, the two Japanese versions clustered first, at step 6; the two Spanish versions then clustered at step 8; at step 16, the Spanish and Japanese subclusters merged; and at step 29, the French version merged with the Japanese/Spanish subcluster, completing cluster 2.  Because the two Japanese versions merged at the earliest step, they may be regarded as the "most alike" documents in that cluster:  participants grouped them together more frequently than, say, the Spanish and French versions.
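The step numbers used in the cluster listings can be read directly from an agglomerative clustering result. The following Python sketch assumes SciPy's linkage-matrix convention, in which row i (counting from 0) records the two clusters joined at step i + 1 and creates a new cluster with id n + i; the hand-built matrix, the document labels, and the function name merge_history are hypothetical.

```python
import numpy as np

def merge_history(Z, labels):
    """Return (step, left_members, right_members) for each merge, steps numbered from 1."""
    n = len(labels)
    # Cluster ids 0..n-1 are the individual documents.
    members = {i: [labels[i]] for i in range(n)}
    history = []
    for i, row in enumerate(Z):
        left, right = int(row[0]), int(row[1])
        # SciPy convention: the merge in row i creates cluster id n + i.
        members[n + i] = members[left] + members[right]
        history.append((i + 1, members[left], members[right]))
    return history

# Hypothetical linkage matrix for four documents:
# columns are (cluster a, cluster b, fusion coefficient, size of new cluster).
labels = ["doc A", "doc B", "doc C", "doc D"]
Z = np.array([
    [0, 1, 0.5, 2],   # step 1: A + B -> cluster 4
    [2, 3, 0.8, 2],   # step 2: C + D -> cluster 5
    [4, 5, 2.0, 4],   # step 3: (A, B) + (C, D)
])

for step, left, right in merge_history(Z, labels):
    print(f"step {step}: {left} + {right}")
```

Reading the history this way reproduces the kind of account given for cluster 2: which subclusters formed first, and at which later step they were joined.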

            The presentation of each of the clusters is accompanied by a discussion and analysis of the types of attribute that are present in the documents that comprise the cluster.  Most of the types of attribute discussed are taken from the qualitative analysis of the written data.  In the qualitative analysis, participant descriptions of their groups were analyzed and the researchers derived types of attribute present in the descriptions.  Types of attribute discovered in the qualitative analysis and identifiable in the clusters formed here are: 

physical format
usage
audience
language
content description
physical characteristics
pictorial elements
content age or integrity

Physical format is distinguished from physical characteristics in the following way:  physical format refers to document type, for example, book, videorecording, or cassette;  physical characteristics refer to physical attributes of individual documents, for example, document size (small books, big books).

 

Cluster 1 (formed at step 11):   Sound Recordings

 

Cassette version.  Performed by Sir Laurence Olivier.

CD version.     Performed by Patrick Stewart.    (1)

Cassette version.  Performed by Patrick Stewart. (merged with above at step 4)

Cassette version.  Read by Geoffrey Palmer.  (merged with above at step 11)

 

            The first cluster was formed at step 11, very early in the clustering process, and two of its documents formed the very first subcluster created.  That participants found these documents very similar is also shown by the fact that no separate subclusters were formed; once the initial subcluster formed, the additional documents were simply merged with it.  The attribute shared by all of the documents, and almost certainly the one that accounts for the formation of this cluster, is that they are all sound recordings.  Although the sound recordings represent two different physical formats, cassettes and CDs, grouping them together may indicate that participants found these formats to be similar.  In the qualitative analysis, physical format was, by a large margin, the type of characteristic that participants identified most often in their descriptions of their groups, so it would be expected to appear frequently in the clusters themselves.

 

Cluster 2 (formed at step 29):  Non-English Language Versions

 

A.        Japanese hardcover version.

            Japanese paperback cartoon version. (6)

           

B.         Spanish paperback.  1990.

            Spanish paperback.  1993.  (8)

 

C.        French paperback.  1994. 

 

            Subcluster A. merged with subcluster B. at step 16.

            Subcluster A./B. merged with C. at step 29.

 

            Two attributes are shared by all of the documents in the second cluster:  language (all are non-English) and physical format (book).  The Japanese and Spanish versions formed subclusters very early in the clustering process, showing that participants very frequently grouped them together.  The fact that the French paperback did not cluster early with the other non-English versions may be explained by a subjective observation made during data collection:  this paperback looked very much like the English language paperbacks (unlike the other non-English editions), and some participants apparently did not notice that it was not in English, because they grouped it with similar-looking English language paperbacks.  I suspect that if all of the participants had been aware that this was a French version, this cluster would have formed earlier.  A large majority of the participants (86 percent) identified language at least once in their written descriptions as a characteristic they used to group documents, and so, again, it is expected that language will appear in the cluster formations.

 

Cluster 3 (formed at step 34):  Paperback Versions

 

A.        Pocket Books. 

            Washington Square Press.     (5)

 

B.         Airmont. 

            Bantam Books.    (9)

 

C.        Pocket Library. 

            Watermill Classic.    (14)  

 

 

D.        A Christmas Carol and Other Christmas Stories.  Signet. 

            A Christmas Carol and Other Christmas Books.  Dent.    (13)

            The Christmas Books, Volume 1, A Christmas Carol, The Chimes.  Penguin.

                                      (merged with above at step 21)

 

E.         Puffin Books. 

            Dover.     (24)

 

            Subcluster A. merged with subcluster C. at step 19.

            Subcluster A./C. merged with subcluster B. at step 23.

            Subcluster A./B./C. merged with subcluster D. at step 31.

            Subcluster A./B./C./D. merged with subcluster E. at step 34.

 

            Cluster 3 is composed of all of the English language (language), adult (audience) paperbacks (physical format) in the sample, and also includes, in subcluster E, two paperback editions that could be considered to be aimed at a young adult or older child audience.  Audience was also a frequently mentioned attribute type in the written descriptions, so it should appear frequently in the clusters and subclusters formed.  The documents in this cluster are also alike in that they are all unabridged editions of A Christmas Carol (content age or integrity) and have few or no illustrations (pictorial elements).  The first two documents merged, in subcluster A, are virtually identical paperback editions: their covers differ only in color (one is red, the other green) and in minor details of layout.

            Other subclusters reflect documents that have similar physical characteristics.  The two documents in subcluster B, for example, have nearly identical dimensions, as do the two documents in subcluster E, although the paperbacks in subcluster E are somewhat larger than those in subcluster B.  Subcluster D is of particular interest because it represents not only physical format (paperback) and audience (adult) but also a content description characteristic: these documents are the only English language paperback collections, in which A Christmas Carol is only one of the stories.  It seems odd at first that these collections merged with the earlier subclusters before subcluster E did, but physical characteristics may explain this: subclusters A through D all consist of smaller paperbacks, while the paperbacks in subcluster E are somewhat larger.

 

Cluster 4 (formed at step 38):  Videorecordings

 

A.        A Flintstones Christmas Carol.  

            The Muppet Christmas Carol.      (2)

            Mickey's Christmas Carol.      (3)

 

B.         A Christmas Carol.  (George C. Scott)

            An American Christmas Carol. (Henry Winkler)    (7)

            Scrooge.   (Albert Finney)   (merged with above at step 10)

            A Christmas Carol.  (Alastair Sim)   (merged with above at step 17)

 

            Subclusters A. and B. merged at step 38.

 

            Cluster 4 is another clear instance of the predominance of physical format and language in grouping.  In this case, the format is videorecordings and the language, again, English.  The formation of Cluster 4 is quite striking: both subclusters formed early (subcluster A was complete by step 3 and subcluster B by step 17), and both remained intact until they merged at step 38.  The difference between the two subclusters is audience: subcluster A consists of children's videorecordings, and subcluster B of adult videorecordings.  The early formation and late merging of the subclusters indicate the significance of the audience attribute for grouping.

 

 

Cluster 5 (formed at step 40):  Hardcover Versions

 

A.        Baronet Books.  Adapted by Malvina G. Vogel. Illustrations by Pablo Marcos Studio.

            Holiday House.  Illustrated by Trina Schart Hyman.   (12)

            Dial Books.  Illustrated by Michael Foreman.   (merged with above at step 18)

 

B.         Abridged by Vivian French.  Illustrated by Patrick Benson.  Candlewick Press. 

            A Christmas Carol:  Adapted for Theater.  Andrews and McMeel.    (20)

 

C.        Weathervane Books.  Illustrated by Arthur Rackham. 

            Christmas Stories:  A Christmas Carol, ...  McLoughlin Brothers. (1913?) (22)

 

D.        Mula, Tom. Jacob Marley's Christmas Carol.  Adams Publishing. 

            A Christmas Carol and Other Stories.  Modern Library.   (25)

            A Christmas Carol:  A Facsimile Edition ... Pierpont Morgan Library.

                        (merged with above at step 26)

            Davis, Paul.  The Lives and Times of Ebenezer Scrooge.  Yale University Press.

                        (merged with above at step 36)

 

            Subcluster A. merged with subcluster C. at step 30.

            Subcluster A./C. merged with subcluster B. at step 39.

            Subcluster A./B./C. merged with subcluster D. at step 40.

 

            The dominant attributes present in cluster 5 are, once again, physical format and language.  All of the documents in this cluster are English language hardcover versions.  Audience attributes, children and adult, are also present.  The first three subclusters consist of documents intended for older children, and the last subcluster consists of adult documents.  The fact that the first three subclusters merged with the last subcluster very late in the clustering process again indicates the importance of audience characteristics in distinguishing documents. 

            Subclusters A and C probably grouped together earliest because they are similar in size.  In addition, all but one of the documents in these subclusters are unabridged editions (a content age or integrity attribute), and the adapted document "looks like" an unabridged edition; that is, it is relatively thick and carries no indication on the cover that it is an adaptation.  The only discernible difference between the documents in these two subclusters is that those in subcluster C appear to be somewhat older than those in subcluster A (a physical characteristics attribute).  The documents in subcluster B are large picture books.  They differ from the picture books in cluster 6 in several ways.  First, they contain more text than the documents in cluster 6; second, they are intended for an older-child audience; and third, they are "true" hardcovers, with cloth bindings and dust jackets, whereas the documents in cluster 6 have "hard" covers made of cardboard.

            Subcluster D is somewhat unusual in that its documents are as unlike one another as those in any subcluster formed in any of the clusters, which is reflected in how late the subcluster was completed (step 36).  The first two documents were merged at step 25, more than halfway through the clustering process.  These two documents are alike in that they are hardcovers of a similar small size.  Their content, however, is different: one is a retelling of the Dickens story, while the other is a collection of Dickens stories.  The next document to merge with them is a much larger hardcover consisting of a facsimile of the original manuscript with the text in typescript.  The last document to merge is also a large hardcover, but it is a work of criticism on A Christmas Carol.  One possible explanation of why these documents grouped together is that at least three of them (the retelling, the facsimile, and the criticism) are quite unlike any of the other documents in the sample, and the fourth, although a collection like the collections in cluster 3, is a hardcover with physical characteristics somewhat similar to those of the retelling.

 

 

Cluster 6 (formed at step 41):  Children's and Activity Versions

 

A.        Taylor, Mark A.  The Christmas Carol.     Landoll's. 

            Disney's Mickey's Christmas Carol.  Mouse Works.    (15)

 

B.         A Christmas Carol.  Retold by I.M. Richardson.  Troll Associates. 

            Dubowski, Cathy East.  Scrooge.   Grosset & Dunlap.    (27)

 

C.        A Christmas Carol, Easy Piano Picture Book.  Faber & Faber. 

            A Christmas Carol Story Book Set & Advent Calendar.  Workman Publishing.   (28)

 

D.        A Christmas Carol.  Dramatized by Darwin Reid Payne.  So. Illinois Univ. Press. 

            Luxfield Consultants (adapters).  A Christmas Carol.  Oxford Progressive English

                        Readers.    (32)

            Sammon, Paul.  The Christmas Carol Trivia Book.  Citadel Press. 

                        (merged with above at step 37)

 

            Subclusters A. and B. merged at step 33.

            Subclusters A./B. and C. merged at step 35.

            Subclusters A./B./C. and D. merged at step 41.

 

            Because it formed last, the final cluster represents the documents that participants saw as most dissimilar: they were least often placed in the same groups.  Nonetheless, interesting similarities exist among most of the documents in the cluster.  All of them are English language documents.  All but one (the Advent calendar) are books (physical format).  With two, and possibly three, exceptions (Sammon's The Christmas Carol Trivia Book, Payne's dramatized version, and possibly the Oxford Progressive English Reader), the documents are all aimed at a children's audience and are heavily illustrated, mostly in color (pictorial elements).

            In addition, several of the documents represent the usage attribute type; that is, they could be used for activities other than passive reading.  These documents include the Advent calendar and the piano book (subcluster C); the large picture books (subcluster A), which were sometimes described as documents that could be read to children at bedtime; and possibly the documents in subcluster D, one of which is a play version, one a version for learning English, and the third a trivia book that could be used for a game or at a party.  Physical characteristics may explain the first two subclusters: the two documents in subcluster A are both very large children's picture books with cardboard covers, and the two documents in subcluster B are both very thin, medium-sized paperback children's picture books.

 

6.         Discussion

            The most frequently appearing attributes discovered in the cluster analysis include:  physical format, language, and audience; other attributes that appear are: content age or integrity, physical characteristics, pictorial elements and usage.  Dominant attributes, that is, ones used by a majority of study participants in their written descriptions, discovered in the qualitative study, include:  physical format, language, content description, audience, pictorial elements, and usage (Carlyle, 1999).  Thus, all of the attributes that were frequently identified in the qualitative study of written descriptions were present in the clusters described here, indicating that the written descriptions were largely accurate identifications of the attributes actually used by participants when they created their groups. 

            The most notable difference between the results of the qualitative analysis of written descriptions and those of the cluster analysis is that the physical characteristics attribute was scarce in the written descriptions (only 4 percent of all written descriptions mentioned physical characteristics, and only 20 percent of the participants mentioned them at all) but comparatively frequent in subcluster formation.  Because physical characteristics played a larger role in the formation of subclusters than in the formation of clusters, it seems reasonable to speculate that they played only a minor, and perhaps subconscious, role in participant grouping.

 

7.         Implications

7.1       Implications for catalog and other information displays

            Library cataloging records already contain indicators representing many of the attributes identified in this study.  In fact, card catalogs featured arrangements that grouped cards based on several of the attributes discovered in the study, including language and content age or integrity.  If one takes differences in physical format, audience, and usage to indicate significant changes in text, which they often do, then card catalog arrangements also reflected these attributes.  Library classification numbers contain indicators for some of the attributes identified by participants in this study.  Classification numbers created using the Universal Decimal Classification (UDC), for example, contain indicators of physical format, audience, language, and content age or integrity (Carlyle, 1997b).  Indeed, current bibliographic (MARC) records in online catalogs could be used to create groupings automatically based on many of the attribute types identified here (Carlyle, 1999).
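As a rough illustration of how catalog data might drive such groupings, the Python below groups records on whatever combination of attributes is chosen. It is a minimal sketch that assumes physical format, language, and audience values have already been extracted from each record (extraction from actual MARC fields is not shown); the `Record` class, the `group_records` function, and the attribute values in the sample records are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    title: str
    fmt: str       # physical format (assumed already extracted from the record)
    language: str  # language code in MARC style, e.g. "eng", "spa"
    audience: str  # intended audience, e.g. "adult", "juvenile"

def group_records(records, keys=("fmt", "language", "audience")):
    """Group record titles by the tuple of attribute values named in `keys`."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(getattr(rec, k) for k in keys)].append(rec.title)
    return dict(groups)

# Hypothetical records; attribute values are illustrative, not real catalog data.
records = [
    Record("A Christmas Carol (Bantam)", "book", "eng", "adult"),
    Record("A Christmas Carol (Airmont)", "book", "eng", "adult"),
    Record("Spanish paperback, 1990", "book", "spa", "adult"),
    Record("Mickey's Christmas Carol", "videorecording", "eng", "juvenile"),
    Record("The Muppet Christmas Carol", "videorecording", "eng", "juvenile"),
    Record("A Christmas Carol (Alastair Sim)", "videorecording", "eng", "adult"),
]

for key, titles in group_records(records).items():
    print(key, titles)
```

Passing a shorter `keys` tuple, such as `("fmt",)`, yields coarser top-level groups that could later be subdivided by language or audience.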

            One of the problematic features of card catalog displays was that not all groupings of cards were identified or flagged in any way, so that users may not have been aware of them, or may have been confused by them.  In the online environment, groupings may be made clear using images such as boxes to indicate clusters.  In addition, all of the groups may be labeled with simple, informative labels indicating the attributes of the items present in the cluster (Fig. 3).

 

[Fig. 3 about here]

 

 

Figure 3 is an example of how a clustered display of a voluminous work might appear in an online catalog or other information retrieval system.  A variety of flexible searching features could be available.  By clicking on a box, users could browse either another, sub-clustered display (for example, nonbook formats could be subclustered into groups for specific formats) or a brief listing of items.  Records could be sorted by a variety of attributes, for example, date, publisher name, format, or language.  Users could also search for specific attributes such as a translator’s or illustrator’s name.  Preliminary findings of a current research project indicate that a display such as this could be created largely automatically (Carlyle & Summerlin, 2000).  The clustered display shown in Fig. 3 would be a useful default display for a work, especially helpful to people who do not have a particular edition in mind when they begin their search.  People who have particular specifications could use the “Sort” or “Search” boxes to identify the specific editions they are looking for if the clusters do not feature attributes useful to them such as publisher or date.
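The drill-down and sort behavior described for Fig. 3 can be sketched as a small tree of labeled boxes, where clicking a box returns either its sub-boxes with item counts or a brief item listing sorted by a chosen attribute. This is a minimal Python sketch under that assumption; `Box`, `Item`, and `click` are hypothetical names, the titles come from the study's videorecordings cluster, and the years are the films' original release years, used here only to demonstrate sorting (they are not the dates of the editions in the study sample).

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    title: str
    year: int

@dataclass
class Box:
    label: str
    children: list = field(default_factory=list)  # sub-boxes (a sub-clustered display)
    items: list = field(default_factory=list)     # brief listing, used when there are no sub-boxes

    def count(self):
        """Number of items in this box, including all sub-boxes."""
        return len(self.items) + sum(child.count() for child in self.children)

    def click(self, sort_by="year"):
        """Simulate clicking the box: list its sub-boxes, or its items sorted by `sort_by`."""
        if self.children:
            return [f"{child.label} ({child.count()})" for child in self.children]
        ordered = sorted(self.items, key=lambda item: getattr(item, sort_by))
        return [f"{item.year}  {item.title}" for item in ordered]

videos = Box("Videorecordings", children=[
    Box("Children's", items=[Item("The Muppet Christmas Carol", 1992),
                             Item("Mickey's Christmas Carol", 1983)]),
    Box("Adult", items=[Item("Scrooge (Albert Finney)", 1970),
                        Item("A Christmas Carol (Alastair Sim)", 1951),
                        Item("A Christmas Carol (George C. Scott)", 1984),
                        Item("An American Christmas Carol (Henry Winkler)", 1979)]),
])

print(videos.click())                             # sub-clusters with item counts
print(videos.children[1].click())                 # adult titles sorted by year
print(videos.children[1].click(sort_by="title"))  # the same titles sorted by title
```

Keeping sorting as a parameter of the listing, rather than of the cluster structure, mirrors the idea that the clusters form the default display while “Sort” remains available to users with particular specifications.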

            Another cluster-based approach that could be used to display voluminous works is distortion-oriented presentation, a method that attempts to aid navigation of large information spaces (Leung & Apperley, 1994).  In distortion-oriented displays, the entire information space is displayed; however, one small part of the space, the focus, is enlarged, while the rest of the space is made smaller.  Any of the cluster attributes identified in this study could be used as a focal point for a distortion-oriented display.  The contents of one cluster at a time could then be presented in the focal information space.
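To make the focus-plus-context idea concrete, the Python below shows one simple way to divide display space among clusters: each cluster receives space in proportion to its number of documents, and the focal cluster's share is multiplied by a magnification factor, so it grows while the others shrink but remain visible. The function name, the magnification value, and this proportional scheme are illustrative assumptions, not a technique taken from Leung & Apperley; the cluster sizes are the document counts of clusters 2 through 6 as listed above.

```python
def allocate_space(sizes, focus, total=100.0, magnification=4.0):
    """Divide `total` units of display space among clusters, enlarging the focal cluster.

    Every cluster is weighted by its document count; the focal cluster's weight is
    multiplied by `magnification`, so it expands while the others shrink but stay visible.
    """
    if focus not in sizes:
        raise KeyError(f"unknown focal cluster: {focus!r}")
    weights = {name: n * (magnification if name == focus else 1.0) for name, n in sizes.items()}
    scale = total / sum(weights.values())
    return {name: round(w * scale, 1) for name, w in weights.items()}

# Document counts of clusters 2-6 in the study.
sizes = {
    "Non-English language versions": 5,
    "Paperback versions": 11,
    "Videorecordings": 7,
    "Hardcover versions": 11,
    "Children's and activity versions": 9,
}

undistorted = allocate_space(sizes, "Videorecordings", magnification=1.0)
focused = allocate_space(sizes, "Videorecordings")
for name in sizes:
    print(f"{name:34} {undistorted[name]:5.1f} -> {focused[name]:5.1f}")
```

With a magnification of 4, the videorecordings cluster grows from about 16 percent of the display to about 44 percent, while every other cluster shrinks but stays on screen.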

7.2       Implications for metadata standards[4]

            Many resources on the Internet represent editions of voluminous works or related items; as a consequence, the organizational issues they raise are similar to those discussed here.  For example, a search on “christmas carol charles dickens” in a search engine retrieves a wide variety of editions, adaptations, and other sites related to the original text.  Metadata standards, particularly general standards such as the Dublin Core (DC; available at: http://purl.org/metadata/dublin_core_elements), are being developed to aid resource discovery on the World Wide Web.  These standards may therefore be analyzed to determine the extent to which they facilitate the identification of the attributes discovered in this research.  In comparison to cataloging standards, metadata standards are spare; DC identifies only fifteen elements to promote item identification and discovery.  Two of the three dominant attributes identified here are present in DC:  language (DC “language”) and physical format (DC “format”).  The “audience” attribute is missing.  Because DC is extensible, specialized applications of DC may add attributes.  Indeed, at least one extension of DC, the Gateway to Educational Materials (GEM: available at http://www.thegateway.org) element set, has added audience as an element.

            Some of the other attributes discovered here, which would also be useful in clustering documents available on the Internet, are not often available explicitly in metadata standards.  For example, clustering electronic documents that share complex relationships using attribute types such as content age or integrity (abridged vs. unabridged versions, versions for readers with lower reading skills, adapted versions, etc.) would be difficult in the current Internet environment, because these attributes are seldom identified explicitly in metadata standards.

            An important problem with metadata standards is that, for the most part, they do not require the use of standardized vocabularies for naming entities (e.g., Rust, 1998).  For example, if one were using DC to describe items, various translated editions of a work might carry their translated titles in the DC title element.  As a result, retrieval of all of the editions and versions of a particular work would be inhibited, because different editions would have different titles.  DC offers a possible solution to this problem in its source or relation elements, where standardized author and title information could be recorded to facilitate retrieval of all of the items related to a work on the Web.
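The collocation problem can be shown with a minimal Python sketch, assuming DC records represented as plain dictionaries keyed by element name. Grouping on the title element scatters the translations, while grouping on a standardized heading recorded in the relation element brings them together. The heading string, the sample records, and the `collocate` function are illustrative assumptions; DC itself does not prescribe this value format.

```python
from collections import defaultdict

# Hypothetical standardized name-title heading, assumed to be stored in the DC "relation" element.
WORK = "Dickens, Charles, 1812-1870. Christmas carol"

def collocate(records, element="relation"):
    """Group DC-style records (dicts keyed by element name) on one element's value.

    Records lacking the element fall back to their own title, so they stay separate.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec.get(element) or rec["title"]].append(rec["title"])
    return dict(groups)

records = [
    {"title": "A Christmas Carol", "language": "en", "relation": WORK},
    {"title": "Cuento de Navidad", "language": "es", "relation": WORK},
    {"title": "Un chant de Noël", "language": "fr"},  # translation with no standardized heading
]

print(collocate(records, element="title"))  # by title: three unrelated groups
print(collocate(records))                   # by relation: the two linked editions collocate
```

The French record shows the limit of the approach: the benefit depends on a standardized heading actually being recorded for every edition.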

7.3       Implications for digital libraries

            If taken broadly, the findings of this study have a variety of possible implications for the design of digital libraries.  It is useful to begin by looking at the attributes identified by study participants and analyzing their place in the current organization of, and provision of access to, library materials.  All of the attribute types identified by study participants are identified and used in libraries in ways that go beyond their appearance in cataloging records.  Needs related to several of the most frequently identified attribute types (physical format, language, and audience) are often addressed in libraries by the physical arrangement of materials, as when libraries have separate sections for videorecordings, sound recordings, maps, and books; for non-English language materials; and for adults and children.  While these physical arrangements are often recorded in holdings information in cataloging records, they are used not in catalog arrangements or displays but only for location and identification purposes, usually in single-record displays.

            Other attribute types having to do with physical or content features (content description, pictorial elements, physical characteristics, and content age or integrity attributes) are often discovered by users only when they handle the physical documents, although to some extent these attributes are already described in cataloging records.

            Finally, the usage attribute type is addressed in libraries in a number of ways.  First, librarians themselves often give suggestions for some of the uses suggested by study participants (books good to read to children at bedtime, books good for introducing other cultures to primary school students).  Second, librarians produce specialized lists of items in their own collection meeting usage needs; for example, a printed bookmark with a brief list of scary books.  Third, libraries collect published bibliographies that group items by various uses. 

            What is most interesting about all of these means by which libraries address the attribute types identified in the study is that virtually none of them involves the structure of catalog displays; in other words, catalog displays seldom highlight or organize information based on the types of attributes discovered in the study.  Some of these attribute types are available only as limits after a search has been done, for example, limiting by language.  Others, as mentioned above, may be seen only when individual cataloging records are displayed.

            In a world in which remote access to the library collection is not possible, the present means of addressing the attribute types described above may be sufficient.  However, we no longer live in such a world.  Many users use library catalogs remotely, requesting that documents be delivered to their homes or offices.  More and more documents are electronic and may be printed on the user’s own printer or read or experienced on the user’s computer.  These documents cannot be physically located next to tangible documents held by the library.  One intriguing implication of the study, then, is that attributes such as physical format, audience, and language should be incorporated into catalog structures.  For example, catalogs could be designed to allow users to browse materials in specific formats, such as videorecordings, in a specific language, or for a specific age group.  Most of the information necessary to create such browsable displays is already contained in cataloging records and could be harnessed to implement them.

            Much could also be done in a virtual environment to present some of the attributes.  For example, more cataloging records could include tables of contents to enhance content description, or catalogs could use graphics to signify that a document contains illustrations or other pictorial elements.  The usage attribute would be harder to implement because usage information is rarely included in cataloging records.  Libraries may want to consider, as some already have, providing specialized document lists online as well as in print, or providing readers’ advisory services online.  This would provide a virtual counterpart to the current practice of providing such lists or advice in-house.

 

8.         Future Research

            This study represents a first step toward improving displays for voluminous works in IRSs.  Future research could begin by investigating a variety of different types of voluminous work.  The "typical" composition of a set of documents related to a particular voluminous work is unknown; moreover, it is likely that the composition of work sets varies a good deal from one work to another.  Thus, the notion of a "typical" work set may be inappropriate.  For example, documents related to the Koran, Charles Darwin's Origin of Species, or Amy Tan's Joy Luck Club are doubtless different in character from those related to A Christmas Carol.  Attributes of the Christmas Carol work set used in this study that may make it different from other work sets are the presence of a wide variety of audio-visual and children's versions and the relative scarcity of documents about A Christmas Carol.  Attributes of a work such as the audience the work is intended for, whether the work is fiction or non-fiction, and physical format of the original edition undoubtedly affect the ultimate composition of a work set.  In addition, future studies could investigate electronic documents such as those commonly found on the Web, which may vary in character from documents traditionally handled by IRSs.  Upon completion of research investigating different types of work, further research is needed in which prototype systems incorporating organized displays for voluminous works are tested to determine whether or not such displays actually do result in more effective IRSs.

            As stated above, most of the attributes identified in the study are already included in cataloging records.  However, two of the attributes, audience and usage, present inherent problems.  Audience is an attribute frequently identified in the qualitative study, and it played a significant role in cluster creation as well.  However, it is an attribute that may be seen as somewhat subjective.  Currently in library cataloging, documents are (somewhat inconsistently) identified as either being for children or not; further, no distinction is made as to the age of the children for whom they are intended.[5]  Investigation of the extent to which such distinctions might be helpful is an obvious avenue for future research.

            The other attribute, usage, has to do with how a document could be used.  Indexing and cataloging practice has seldom, if ever, allowed or encouraged the indexing of this type of highly subjective “attribute.”  Often in the qualitative study, probably because of the type of work selected, usage descriptions had to do with the utility of the document for reading to children or for classroom use.  In the cluster analysis, one of the clusters manifested the usage attribute, in that several of the documents clustered there were obviously created for specific uses (e.g., piano book, Advent calendar).  Others could easily be seen as useful for activities such as bedtime story reading or classroom use.  This supports the finding of the qualitative study that usage could be an important attribute for indexing.  Research is needed to determine whether indexing such an attribute would actually help users, given the subjectivity involved in its assignment and the difficulty indexers might have in assigning it.

 

9.         Conclusion

            Known-item searches, far from being uninteresting and non-problematic, pose a myriad of fascinating challenges to IRS interface designers.  In addition, solutions to problems presented by these searches may lead to exciting innovations in the structure and quality of information system displays.  Incorporation of alternatives to the long-list retrieval model in our IRSs has the potential to enhance the information environment of users by increasing their ability to identify documents of interest quickly and efficiently.  Organized displays featuring categorization, clustering, or classification may serve users well for a wide variety of information needs. 

 

References

Aldenderfer, M. S. & Blashfield, R. K.  (1984).  Cluster analysis.  Quantitative applications in the social sciences, Series no. 07-044.  Newbury Park:  Sage Publications.

 

Carlyle, A.  (1999).  User categorisation of works:  Toward improved organisation of online catalogue displays.  Journal of Documentation, 55 (2), 184-208.

 

Carlyle, A.  (1997a).  Fulfilling the second objective in the online catalog:  schemes for organizing author and work records into usable displays.  Library Resources & Technical Services, 41, 79-100. 

 

Carlyle, A.  (1997b).  The role of classification in the creation of author and work displays in online catalogues.  In Knowledge organization for information retrieval:  Proceedings of the Sixth International Study Conference on Classification Research, held at University College London, 16-18 June 1997.  FID 716 (pp. 90-96).  The Hague:  International Federation for Information and Documentation.

 

Carlyle, A.  (1996).  Ordering author and work records:  an evaluation of collocation in online catalog displays.  Journal of the American Society for Information Science, 47, 538-554.

 

Carlyle, A. & Summerlin, J. (2000).  Transforming catalog displays:  Record clustering for works of fiction.  In Beghtol, C., Howarth, L. C., & Williamson, N. J. (Eds.), Dynamism and stability in knowledge organization:  Proceedings of the Sixth International ISKO Conference, 10-13 July 2000, Toronto, Canada (pp. 320-326).  Würzburg:  ERGON Verlag.

 

Cooke, N. J.  (1994).  Varieties of knowledge elicitation techniques.  International Journal of Human-Computer Studies, 41, 801-849.

 

Dunn-Rankin, P.  (1983).  Scaling methods.  Hillsdale, N.J.:  Lawrence Erlbaum.

 

Faiks, A. & Hyland, N. (2000).  Gaining user insight:  A case study illustrating the card sort technique.  College & Research Libraries, 61 (4), 349-357.

 

Fidel, R.  (1994).  User-centered indexing.  Journal of the American Society for Information Science, 45 (8), 572-576.

 

Hayhoe, D.  (1990).  Sorting-based menu categories.  International Journal of Man-Machine Studies, 33,  677-705.

 

Hearst, M. A. & Pedersen, J. O.  (1996).  Reexamining the cluster hypothesis:  Scatter/Gather on retrieval results.  In H.-P. Frei (Ed.), Proceedings of the 19th Annual International ACM SIGIR Conference on Research & Development in Information Retrieval (pp. 76-84).  New York, NY:  ACM.

 

Jackson, S. L.  (1958).  Catalog use study.  V. Vostecky (Ed.).  Chicago:  American Library Association.

 

Jörgensen, C.  (1995).  Image attributes:  An investigation.  Ph.D. diss., Syracuse University.

 

Karat, J., Atwood, M. E., Dray, S. M., Rantzer, M. & Wixon, D. R.  (1996).  User centered design:  Quality or quackery?  CHI ’96, Conference Proceedings on Human Factors in Computing Systems (pp. 161-162).

 

Kinnucan, M. T.  (1992).  Fisheye views as an aid to subject access in online catalogues.  Canadian Journal of Information Science, 17 (2), 25-40.

 

Larson, R. R.  (1991).  Classification clustering, probabilistic information retrieval, and the online catalog.  Library Quarterly, 61, 133-173.

 

Leazer, G. H. & Smiraglia, R. A.  (1996).  Toward the bibliographic control of works:  Derivative bibliographic relationships in an online union catalog.  In Digital Libraries '96 (pp. 36-43).  Bethesda, MD:  ACM.

 

Leung, Y. K. & Apperley, M. D.  (1994).  A review and taxonomy of distortion-oriented presentation techniques.  ACM Transactions on Computer-Human Interaction, 1 (1), 126-160.

 

Levy, D. M.  (1995).  Naming the nameable:  Names, versions, and document identity in a networked environment.  In Scholarly publishing on the electronic networks:  Filling the pipeline and paying the piper.  Washington, D.C.:  Association of Research Libraries.

 

Lewis, S.  (1991).  Cluster analysis as a technique to guide interface design.  International Journal of Man-Machine Studies, 35,  251-265.

 

Lin, X.  (1997).  Map displays for information retrieval.  Journal of the American Society for Information Science, 48, 40-54.  

 

Lubetzky, S.  (1963).  Function of the main entry in the alphabetical catalogue--one approach.  Working paper no. 2.  In International Federation of Library Associations.  International Conference on Cataloguing Principles, Paris, 9th-18th October 1961.  Report.  pp. 139-143.  London:  Clive Bingley on behalf of IFLA.

 

Lubetzky, S.  (1969).  Principles of Cataloging.  Final Report.  Phase I:  Descriptive Cataloging.  Los Angeles, California:  Institute of Library Research, University of California.

 

Massicotte, M.  (1988).  Improved browsable displays for online subject access. Information Technology and Libraries, 7 (4), 373-380.

 

Matthews, J., Lawrence, G.S. & Ferguson, D.K.  (1983).  Using online catalogs.  New York, NY:  Neal Schuman.

 

McDonald, J. E. & Schvaneveldt, R. W. (1988).  The application of user knowledge to interface design.  In Guindon, R. (Ed.), Cognitive science and its applications for human-computer interaction (pp. 289-338).  Hillsdale, NJ:  Lawrence Erlbaum.

 

McDonald, J. E., Stone, J. D., & Liebelt, L. S. (1983).  Searching for items in menus:  The effects of organization and type of target.  Proceedings of the 27th Annual meeting of the Human Factors Society (pp. 834-837).

 

Mellon, C.  (1986).  Library anxiety:  A grounded theory and its development.  College and Research Libraries, 47, 160-165.

 

Miller, G. A. (1969).  A psychological method to investigate verbal concepts.  Journal of Mathematical Psychology, 6, 169-191.

 

Panizzi, A.  (1985).  Mr. Panizzi to the Right Hon. the Earl of Ellesmere.--British Museum, January 29, 1848.  In M. Carpenter & E. Svenonius (Eds.), Foundations of cataloging:  A sourcebook (pp. 18-47).  Littleton, Colo.:  Libraries Unlimited.

 

Pettee, J.  (1936).  The development of authorship entry and the formulation of authorship rules as found in the Anglo-American code.  Library Quarterly, 6, 270-290.

 

Rasmussen, J., Pejtersen, A.M., & Goodstein, L.P.  (1994).  Cognitive systems engineering.  (Chapters 9-12).  New York:  J. Wiley. 

 

Rust, G.  (1998).  Metadata:  The right approach, an integrated model for descriptive and rights metadata in e-commerce.  D-Lib Magazine (July/August). (http://www.dlib.org/dlib/july98/rust/07rust.html)

 

Schaffer, D., Zuo, Z., Greenberg, S., Bartram, L., Dill, J., Dubs, S., & Roseman, M.  (1996).  Navigating hierarchically clustered networks through fisheye and full-zoom methods.  ACM Transactions on Computer-Human Interaction, 3 (2), 162-188.

 

Svenonius, E.  (1988).  Clustering equivalent bibliographic records.  In:  Annual review of OCLC research, July 1987-June 1988 (pp. 6-8).  Dublin, OH:  OCLC.

 

Tillett, B. B.  (1991).  A taxonomy of bibliographic relationships.  Library Resources & Technical Services, 35 (2), 150-158.

 

Vidal, N. K.  (1995).  Experimental image taxonomy:  An inquiry into spontaneous image organization.  Master's thesis, Cornell University. 

 

Wiberley, S. E., Daugherty, R. A., & Danowski, J. A. (1995).  Displaying online catalog postings:  LUIS.  Library Resources & Technical Services, 39, 247-264.

 

Wilson, S., Bekker, M., Johnson, P. & Johnson, H. (1997).  Helping and hindering user involvement – a tale of everyday design.  CHI ’97, Conference Proceedings on Human Factors in Computing Systems, March 22-27, 1997, Atlanta, Georgia.  Also available at: http://www.acm.org/pubs/contents/proceedings/chi/258549/

 

Yee, M. M.  (1995).  What is a work?  Part 4:  Cataloging theorists and a definition.  Cataloging & Classification Quarterly, 20, 3-24.

 

Yee, M. M. & Layne, S. S.  (1998).  Improving online public access catalogs.  Chicago:  American Library Association.

 

Zamir, O., Etzioni, O., Madani, O., & Karp, R. M.  (1997).  Fast and intuitive clustering of web documents.  In Heckerman, D., Mannila, H., Pregibon, D. & Uthurusamy, R. (Eds.), Proceedings of the Third International Conference on Knowledge Discovery and Data Mining.  Menlo Park, CA: AAAI Press.


FIGURE 1. 

Sample retrieval set for a keyword search on the terms "christmas carol" in a large online catalog

 

            12 Christmas carols, tuba quintet

            Afro-American carols for Christmas

   *       All I want for Christmas is my two front teeth

            American children sing Christmas carols

   *       The annotated Christmas carol:  a Christmas carol

            As Joseph was a-walking :  a Christmas carol :  Old English      

   *       Batman : ghosts : a tale of Halloween in Gotham City, inspired by

                        Charles Dickens' "A Christmas carol"

            The best of off-off Broadway

            The birds' Christmas carol, together with its sequel

            The Birds' Christmas carol

            The birds' Christmas carol, a play in one act

            A Carol Christmas

            A carol for Christmas

   *       A Charles Dickens Christmas

   *       Charles Dickens Christmas ghost stories

   *       Christmas carol

   *       A Christmas carol

   *       A Christmas carol, in prose, being a ghost story of Christmas

   *       A Christmas carol : in seven staves

   *       A Christmas carol : selections

   *       Christmas carol : vocal score

   *       The Christmas carol

            Christmas favorites

            A Christmas gift

   *       Christmas stories

   *       Dickens' Christmas carol

   *       The facts about A Christmas carol

            I'll be home for Christmas

   *       The lives and times of Ebenezer Scrooge

            The peace album

   *       Scrooge

   *       A tale of two cities.  A Christmas carol.  The chimes.

            The Tubadors

            Wexford carol

            White Christmas

   *       The works of Charles Dickens

           

 

* Titles marked with an asterisk are editions of Charles Dickens' A Christmas Carol, works related to it, or documents that contain such editions or related works.  Titles that do not display the words "Christmas carol" contain these words somewhere else in the retrieved record.


FIGURE 2. 

Sample retrieval set for an author/title search on Charles Dickens/A Christmas Carol

 

Title                 Author              Publisher         Date
A Christmas carol /   Dickens, Charles,   André Deutsch     1998
Christmas carol.      Dickens, Charles,   Prószyński i      1998
A Christmas carol /   Dickens, Charles,   Holt, Rinehart    1998
Christmas carol.      Dickens, Charles,   Gerstenberg,      1998
A Christmas carol.    Dickens, Charles,   Gallimard jeun    1998
A Christmas carol /   Dickens, Charles,   Fenn,             1998
Christmas carol       Dickens, Charles,   Candy Cane Pre    1998
A Christmas carol /   Dickens, Charles,   Scholastic Inc    1999
Christmas carol.      Dickens, Charles,   Edivisión Com     1999
A Christmas carol.    Dickens, Charles,   Everyman,         1999
A Christmas carol /   Dickens, Charles,   LRS,              1999
A Christmas carol /   Dickens, Charles,   Umberto Press,    2000
A Christmas carol /   Dickens, Charles,   North-South Bo    2000
Christmas carol.      Dickens, Charles,   Héritage,         2000

 

 


FIGURE 3.

Clustered Display for Dickens’ A Christmas Carol

 


APPENDIX 1. 

Complete Bibliography of Documents Used in the Study

 

Cluster 1:  Sound Recordings

Dickens, Charles.  A Christmas Carol.  Performed by Sir Laurence Olivier and others.  New York, NY:  Multilingua, Inc., 1990?  (cassette)

Dickens, Charles.  A Christmas Carol. Performed by Patrick Stewart.  New York, NY:  Simon & Schuster Audioworks, 1991.    (compact disc)

Dickens, Charles.  A Christmas Carol. Performed by Patrick Stewart.  New York, NY:  Simon & Schuster Audioworks, 1991.  (cassette) 

Dickens, Charles.  A Christmas Carol.  Read by Geoffrey Palmer.  London:  Penguin Books, 1995.  (cassette)

 

Cluster 2:  Non-English Language Versions

Dickens, Charles.  A Christmas Carol.  [Japanese hardcover version:  Dikenzu gensaku.  Kurisumasu kyaroru.  Tamura Sumiyo sakka.  Sekai no meisaku.  Samaku Shuppan.], 1995.  (ISBN:  4-309-46568-4) 

Dickens, Charles.  A Christmas Carol.  [Japanese paperback cartoon version:  Dikenzu.  Kurisumasu kyaroru. Muraoka Hanako yaku.  Sato Aki kaisetsu.  Sekai bungaku no tamatebako.  Kawade Shobo Shinsha, c1994.]  (ISBN:  4-7631-8299-4)

Dickens, Carlos.  Canción de Navidad, El Grillo del Hogar, Historia de Dos Ciudades.  Sexta edición.  Mexico:  Editorial Porrúa, S. A., 1990.  (Spanish)

Dickens, Charles.  Cuentos Navideños:  Canción de navidad, el poseído.  Bogotá:  Ediciones Universales, 1993.  (Spanish)

Dickens, Charles.  Un Chant de Noël.  Évreux:  Gallimard, Folio Junior, 1994.  (French)

 

Cluster 3:  Paperback Versions

Dickens, Charles.  A Christmas Carol.  New York:  Pocket Books (Simon & Schuster), 1963.

Dickens, Charles.  A Christmas Carol.  New York:  Washington Square Press (Pocket Books), 1963.

Dickens, Charles.  A Christmas Carol.  New York:  Airmont, 1963.

Dickens, Charles.  A Christmas Carol.  New York:  Bantam Books, 1993.

Dickens, Charles.  A Christmas Carol.  New York:  Pocket Library, 1959.

Dickens, Charles.  A Christmas Carol.  Mahwah, New Jersey:  A Watermill Classic, 1980.

Dickens, Charles.  A Christmas Carol and Other Christmas Stories.  New York:  A Signet Classic (Penguin), 1984.

Dickens, Charles.  A Christmas Carol and Other Christmas Books.  London:  Dent, 1972.

Dickens, Charles.  The Christmas Books, Volume 1, A Christmas Carol, The Chimes.  London:  Penguin Books, 1985.

Dickens, Charles.  A Christmas Carol.  New York:  Dover.  (Dover Thrift Editions), 1991.

Dickens, Charles.  A Christmas Carol.  London:  Puffin Books, 1994.

 

Cluster 4:  Videorecordings

A Flintstones Christmas Carol.  [S.l.]:  Hanna-Barbera Cartoons, 1994.

The Muppet Christmas Carol.  Burbank, CA:  Distributed by Buena Vista Home Video, 1993.

Mickey's Christmas Carol.  Burbank, CA:  Distributed by Buena Vista Home Video, 1992.

A Christmas Carol.  Beverly Hills, CA:  Twentieth Century Fox, 1984.  (George C. Scott) 

An American Christmas Carol.  New York:  Ft. Lauderdale, FL:  Goodtimes Home Video, 1979. (Henry Winkler)

Scrooge.  Beverly Hills, CA:  Fox Video, 1970.  (Albert Finney)

A Christmas Carol.  Blockbuster Classics, 1951.  (Alastair Sim) 

 

 

Cluster 5:  Hardcover Versions

Dickens, Charles.  A Christmas Carol.  New York:  Baronet Books.  Adapted by Malvina G. Vogel. Illustrations  by Pablo Marcos Studio, 1990.

Dickens, Charles.  A Christmas Carol.  New York:  Holiday House.  Illustrated by Trina Schart Hyman, 1983.

Dickens, Charles.  A Christmas Carol.  New York:  Dial Books.  Illustrated by Michael Foreman, 1983.

Dickens, Charles.  A Christmas Carol.  Abridged by Vivian French.  Illustrated by Patrick Benson.  Cambridge, Mass.:  Candlewick Press, 1993.

Dickens, Charles.  A Christmas Carol:  Adapted for Theater.  Kansas City: Andrews and McMeel, 1993.

Dickens, Charles.  A Christmas Carol.  New York:  Weathervane Books.  Illustrated by Arthur Rackham, 1977.

Dickens, Charles.  Christmas Stories:  A Christmas Carol, The Holly Tree.  New York:  McLoughlin Brothers, 1913?

Mula, Tom.  Jacob Marley's Christmas Carol.  Holbrook, Mass.:  Adams Publishing, 1995.

Dickens, Charles.  A Christmas Carol and Other Stories.  New York:  Modern Library, 1995.

Dickens, Charles.  A Christmas Carol:  A Facsimile Edition ...  New York:  Pierpont Morgan Library, 1993.

Davis, Paul.  The Lives and Times of Ebenezer Scrooge.  New Haven:  Yale University Press, 1990.

 

Cluster 6:  Children's and Activity Versions

Taylor, Mark A.  The Christmas Carol.  Ashland, Ohio:   Landoll's.  1995?

Disney's Mickey's Christmas Carol.  Burbank, CA:  Mouse Works.  1995?

Richardson, I.M.  A Christmas Carol.  USA:  Troll Associates, 1988.

Dubowski, Cathy East.  Scrooge.   New York:  Grosset & Dunlap, 1994.

Lillington, Kenneth.  A Christmas Carol, Easy Piano Picture Book.  Text by Kenneth Lillington after Charles Dickens, Illustrations by Annabel Spenceley, Carols arranged by Timothy Roberts.  London:  Faber & Faber, 1988.

Packard, Mary.   Dickens, Charles:  A Christmas Carol Story Book Set & Advent Calendar.  Illustrations by Ray Bartkus; Story retold by Mary Packard.  New York:  Workman Publishing, 1995. 

Payne, Darwin Reid.  A Christmas Carol.  Dramatized by Darwin Reid Payne. Carbondale:  So. Illinois Univ. Press, 1981.

Luxfield Consultants (adaptors).  A Christmas Carol.  Hong Kong: Oxford University Press.  Oxford Progressive English Readers, 1992.

Sammon, Paul.  The Christmas Carol Trivia Book.  New York:  Citadel Press, 1994.

 



[1] Technically, a “work” is an abstract notion that refers to the content of an item.  The IFLA Study Group on the Functional Requirements for Bibliographic Records defines it as “a distinct intellectual or artistic creation” in Functional Requirements for Bibliographic Records:  Final Report (München:  K.G. Saur, 1998).  Its physical embodiment is a “work set,” the group of documents that represent the work.  For stylistic reasons, both of these notions will be referred to here as “the work.”

[2] Hayhoe's article contains a useful discussion of methodological issues in sorting studies, as well as a review of previous studies that used this methodology in menu creation research.

[3] For example, Microsoft has used sorting tasks to guide the design of its intranet site (Amy Stevenson, personal communication, 1997).

[4] See Carlyle (1999) for a discussion of implications for Anglo-American cataloging standards.

[5] MARC includes fixed field codes and an audience note field in which detailed information about audience may be recorded; in practice, however, they are seldom used.