DEVELOPING ORGANIZED INFORMATION DISPLAYS FOR VOLUMINOUS
WORKS: A STUDY OF USER CLUSTERING
BEHAVIOR
Allyson Carlyle
Assistant Professor
acarlyle@u.washington.edu
(206) 543-1887 (office)
(206) 616-3152 (fax)
Acknowledgments
A grant from
Publication
information:
"Developing Organized
Information Displays for Voluminous Works: A Study of User Clustering
Behavior." Information
Processing & Management, 35, 5 (July 2001): 677-699.
This paper investigates the ways in which
people group or categorize documents associated with a voluminous work to guide
the construction of organized displays for information retrieval systems. Fifty participants completed an unconstrained
sorting task in which they were asked to sort into groups 47 documents
associated with the voluminous work A
Christmas Carol, by Charles Dickens.
Participants were asked to group documents based on how similar they
were to each other and such that the groups would help them to remember how to
find them at a later time. Data
collected from the sorting task were summarized using cluster analysis, employed
to discover common groupings created by participants. Groupings discovered frequently shared
physical format, language, and audience attributes.
Keywords:
user studies, information retrieval system design, online catalog
design, sorting task, cluster analysis, known items, categories,
classification, information displays
1. Introduction
The purpose of the research
presented in this paper is to discover the ways in which people group or
categorize documents associated with a voluminous work such as Shakespeare's Hamlet, to guide the construction of
organized displays in information retrieval systems (IRSs). The research emphasizes an area of
information seeking seldom examined in information retrieval research, namely,
queries for known items. Two types of
query are common in IRSs: queries for subjects or topics, and queries
for documents already known to the searcher, or known-item queries. Much of the emphasis in research and development
of IRSs has been on subject queries. Subject queries pose serious problems to
users and to systems designers, so it makes sense that research has focused on
them. Known-item queries, on the other
hand, are frequently assumed to be unproblematic because they involve formal
search terms, i.e., author names and titles.
However, queries for known items, like subject queries, present a
variety of significant challenges for retrieval and interface design.
One of the challenges that user
interfaces for IRSs must meet with respect to
known-item searching is how to display large groups of documents, or records
representing documents, sharing various types of bibliographic relationships
(see Tillett, 1991 for a taxonomy of these
relationships). An example of a group of
documents sharing such relationships includes:
the motion picture Gone With the
Wind; the typescript screenplay used to produce it; a book of trivia about
the making of the movie; and the original book by Margaret Mitchell, upon which
the screenplay and movie are based. This
group of documents is representative of a type of document group that will, in
this paper, be called a voluminous work or work set.[1] A
voluminous work is a large group of documents sharing a variety of
relationships that evolve out of and are linked to a common originator
document. In the case of Gone With
the Wind, the common originator document is the original written text by
Margaret Mitchell. Thus, the Koran, Stephen Hawking’s A Brief History of Time, Charles Dickens’ A Christmas Carol, or any of the individual works of
Shakespeare would fit the definition of voluminous work. Voluminous works may generate hundreds or
thousands of related documents, including editions, translations, works of
criticism, and adaptations for other audiences or into other media. For example, although published only in 1988,
Hawking’s A Brief History of Time has already generated over fifty
editions, translations, videorecordings, and other
related works.
2. Rationale for the Study
Records for documents associated
with voluminous works appear frequently in bibliographic databases such as
online catalogs. Even relatively small
databases may contain large numbers of records representing these works. In the Internet environment, this problem has
been identified as the "versions" problem (e.g., Leazer
and Smiraglia, 1996; Levy, 1995). Works that embody many bibliographic
relationships are an important focus for research. Because they are popular, they are likely to
be sought frequently by users. In
addition, they often engender searches that retrieve large numbers of relevant
records (Carlyle, 1996). Such searches
may impede the identification of documents relevant to a user's particular
information need (Matthews, Lawrence and Ferguson, 1983).
In IRSs
such as online catalogs, relatively unsophisticated multiple-record displays
are presented in which records are listed one after another, often in an order
(e.g., record number order) that appears to the user to be random. In these types of display, relationships
among records are often obscured by the listing of irrelevant records among
relevant ones or by the listing of related records with no information
indicating relationships among the relevant records retrieved (e.g., see figs.
1 & 2). For example, in figure 1 it
is not clear what relationship, if any, exists between Dickens' A Christmas Carol and All I want for Christmas is my two front
teeth in this display.
[Figure 1 about here]
In figure 2, no information other than publisher and date is included that
distinguishes one item from another, so that a user would have to look at each
individual record in the list to discover the character and nature of each item
represented, for example, whether or not they are illustrated.
[Figure 2 about here]
In other types of IRS, records or documents are displayed in order of their
predicted relevance to search terms, a manner reflecting the IRS development
focus on subject queries. In general,
relevance in these systems is predicted by the frequency with which query terms
appear in a record or document. Queries
for known items, however, are composed of formal terms such as author names or
titles that may appear only once in a record or document. Further, when voluminous works are involved,
hundreds or thousands of records or documents may satisfy the query, but it may
not be possible to predict their relevance to a query in a meaningful way. This may occur when a user is unable to
specify a particular edition or version of interest, or does not know in
advance that he or she even wants a
particular edition or version. For
example, a user who simply wants to read a particular Shakespeare play or a
Dickens novel may not be aware of the existence of that work in many different
formats, such as videorecordings or sound recordings,
or of adapted or simplified versions, and so may not include these
specifications in the query. Thus,
queries for voluminous works may require different approaches to display,
including those that provide overviews of records or documents retrieved by
highlighting relationships among them.
One approach to display that would
aid users who submit known-item queries for voluminous works is the creation of
easily scanned, single-screen summary displays that group records or documents
by type of relationship or other common attribute. Grouping or clustering records or documents
retrieved in a search and clearly identifying the types of attribute
characterizing the groups or clusters is a promising means by which brief,
organized summary displays may be created.
Grouped or clustered displays may enable IRS users to identify records
or documents of interest more quickly and efficiently than long, unorganized
lists.
Displays that simplify large
retrieval sets by grouping or organizing records have been suggested as a means
of relieving the long display problem in online catalogs (e.g., Svenonius, 1988; Massicotte,
1988; Kinnucan, 1992; Carlyle, 1997a; Yee and Layne,
1998). Systems featuring clustered or
overview-type displays are becoming more common and are also seen as a means of
improving the effectiveness of subject searching (e.g., Larson, 1991; Hearst
and Pedersen, 1996; Lin, 1997; Zamir, Etzioni, Madani, and Karp,
1997). A study of user behavior that
investigates the ways in which users themselves organize voluminous works is an
obvious avenue to pursue in the development of organized work displays.
3. Review of Relevant Research
One of the most serious obstacles to
the representation and display of information contained in an information
retrieval system (IRS) is information overload.
Information overload plagued system design even in the manual
environment. Findings from a major study
of card catalogs in the 1950's, for example, showed that search failures tended
to increase as catalog size increased (Jackson, 1958, p. 19). In this study it was noted that search
failures occurred more often in searches for known authors and works than they
did in subject searches. Information
overload problems continue in current systems.
In research on the use of fisheye views, Schaffer et al. (1996) remark
that “… today’s computers encourage ‘tunnel vision’ interfaces, for they supply
users with very small screens to view large complex information spaces.” In online catalog research, "scanning
through a long display" was ranked fifth in a list of 27 interface
problems identified by online catalog users in the first major study of online
catalog use (Matthews, Lawrence and Ferguson, 1983, p. 124). Other online catalog research has shown that
users are unlikely to look at all of the screens presented to them when many
screens have been retrieved (e.g., Wiberley,
Daugherty and Danowski, 1995). Some users will display all of the screens
retrieved, but even relatively persistent users will look through only as many
as 200 catalog records before moving to another search.
Difficulties surrounding the
representation of voluminous works in both manual and online environments have
occupied library catalogers for many years (e.g., Panizzi,
1848; Pettee, 1936; Lubetzky,
1969; Yee, 1995). The development of
displays that organize and clearly identify records related to a voluminous
work in the catalog has often been advocated as a solution to these problems
(e.g., Lubetzky, 1963). Displays composed of clusters of records
identified by attributes common among the items in the clusters have the
potential to be an effective method of addressing this problem; they reduce the
number of screens a user has to review and they clarify relationships present
among items.
Interface design based on user knowledge, behavior, and
preferences has been advocated as a means of developing more effective systems
(e.g., Fidel, 1994). Dennis Wixon defines a user-centered design process as “… one that
sets users or data generated by users as the criteria by which a design is
evaluated or as the generative source of design ideas” (in Karat, Atwood, Dray, Rantzer,
and Wixon, 1996, p. 161). While systems are frequently evaluated
with user input, they are seldom designed based on it (Wilson, Bekker, Johnson, and Johnson, 1997, p. 179). Basing clustered display designs on
user-categorization behavior offers the possibility of creating more effective,
efficient interfaces, understandable by larger numbers of users than
system-centered or designer-centered designs.
One of the areas in which designs based on
user-categorization behavior have been implemented with some success is in the
design of menu categories for pull-down menus in a variety of software
applications. Menu design issues are
relevant here since they are similar to summary display designs: both use categories to attempt to alleviate
information overload and to make retrieval of items efficient and
accurate. McDonald and Schvaneveldt (1988) summarize a large body of
cognitive-science research in interface design demonstrating the improvement of
menu designs based on the incorporation of user-derived categories: "The
studies ... have demonstrated that menus organized according to users' empirically-derived
cognitive structures are superior to other alternatives (e.g., alphabetical,
random, and subjective organizations)." (p. 318). In this type of research, users’ conceptual
schemas are modeled, and information systems are then designed based on the user
behavior model.
A variety of studies have shown improvement of menu design
based on use of categories. In one of
them, McDonald, Stone, and Liebelt (1983) tested the
effectiveness of menus organized in three different ways: alphabetically, categorically, and
randomly. The results of their
experiment showed that categorical menus were clearly more effective than
either of the other two types. Of
particular significance for the online catalog environment, wherein many users
are novice or casual users who are unfamiliar with catalog designs, this
experiment showed that categorical menus were even more effective when study
participants were uncertain about what they were looking for.
Later studies show that not only are categorical displays
more effective than other types of displays, but moreover, those that are
constructed based on groupings common to a variety of people are even more
effective. Hayhoe
(1990)[2] used an unconstrained sorting task methodology
similar to the methodology used in this study.
In that study, he assigned a sorting task to two types of study
participants, experts and non-experts in a particular subject area. He then performed one cluster analysis on the
groupings of the experts, another one on the groupings of the non-experts, and a third one on the groupings of all
participants together (combined participant grouping). He then created different menus, one each
based on the results of the three cluster analyses, and one that reflected each
individual participant’s own particular grouping. In Hayhoe’s study,
the combined participant grouping was found to be more effective for menu
construction than groupings created by experts.
Surprisingly, combined participant groupings were also more effective
for participants than the groupings they had created for themselves.
Many of the studies investigating user-based menu categories
have employed the method used in this paper, cluster analysis, to derive
categories from users (Lewis, 1991). A
common methodology that employs cluster analysis to derive user categories is
the unconstrained sorting task. In an
unconstrained sorting task, participants are given a set of items, frequently
cards with words and definitions printed on them, and asked to sort them into
as many groups as they like based on item similarity. This methodology is also referred to as free
clustering, free sorting, F-sorting, and
bottom-up sorting. Sorting task studies
have been used in psychological studies to discover the characteristics people
use to group words or other items (e.g., Miller, 1969). Sorting tasks of various types have also been
used as a knowledge elicitation technique for the creation of expert systems
(for a summary, see Cooke, 1994) and to assist in the design of large WWW
sites.[3] In these cases, the underlying assumption is
that categories derived from user conceptual schemas are more effective than
other types of categories.
User-centered information retrieval system designs
incorporating user-based categorizations are, unfortunately, rather rare. One that has not only been tested and
evaluated, but implemented in a real-life setting is BookHouse
(Rasmussen, Pejtersen, and Goodstein, 1994). BookHouse is a
fiction catalog developed for library users.
Its icon text-based interface design was based on an extensive study of
user needs, expectations, and conventions (Chapters 9-11). In the evaluation stage, two traditional
prototype fiction retrieval systems, one a traditional command-driven interface,
and the other, a text-based interface with mouse capabilities, were developed
for testing in a real library setting against the icon text-based design (BookHouse). A
variety of methods of evaluation were combined, including questionnaires,
interviews, observations, and transaction log analyses. The BookHouse
design was shown to be more effective overall than the other, more traditional
systems (Chapter 12).
The unconstrained sorting task
methodology has been used previously in library and information science
research to discover the categories individuals use to group images. Vidal (1995) asked 58 study participants to
sort 48 images of the
4. Research Design
The underlying assumption of this
research was that if IRSs reflect the ways in which
people themselves organize documents, they would be more responsive to users'
information needs. The specific goal of
the research was to begin to understand how people organize documents that
comprise a particular voluminous work set.
The study addressed two questions:
What are the common groups that are created by people when they
categorize documents related to a voluminous work? and, What are the
characteristics that people use to group documents related to a voluminous
work? To answer these questions, an
unconstrained sorting task was assigned to study participants. Documents related to a particular work were
sorted based on participant perceptions of how similar the documents were to
each other and on the participants' perceived ability to retrieve them at a
later time. To answer the second
question, participants wrote descriptions of each group identifying the characteristics
used to create the group. The results of
the qualitative analysis of the written descriptions appear in Carlyle
(1999).
A pilot study was conducted to test
the methodology and to determine the extent to which participants might be
influenced by the physical format of documents when grouping. In the pilot study, half of the participants
grouped actual documents and half grouped photocopies of covers, title pages,
and other parts of the documents. It was
thought that the participants grouping actual documents might be more
influenced by the physical format of the documents than participants given
photocopies of the documents and, thus, that the characteristics they used for
grouping would differ. Analysis of the
results revealed that the difference between characteristics used for grouping
documents versus photocopies was negligible.
Because pilot participants appeared to be more comfortable and engaged
in the task when grouping actual documents, and because the photocopies
sometimes stuck together, making the data collection process somewhat prone to
error, actual documents were used for the main study.
Fifty participants 18 years of age
or older, who could read and write in English, agreed to participate in the
study, which was conducted at the Chapel Hill Mall in Akron, Ohio, U.S.A. Complete information regarding participants
reported in Carlyle (1999) is briefly summarized here. Almost half (48%) of the participants
reported reading at least one book per month.
More than half (66%) reported visiting a bookstore or library at least
once per month. Sixty-four percent of
the participants were between the ages of 18 and 30, and 63 percent had not
completed an undergraduate degree. More
than half (60%) were female.
Participants were paid ten U.S.
dollars to spend approximately 30 to 45 minutes grouping 47 documents related
to Charles Dickens' A Christmas Carol. A
Christmas Carol was selected for the study because many different types of
edition and work related to it were in print and were easily obtainable. Documents used in the study included book,
video, and sound recording editions as well as such unusual editions as an
Advent calendar and a book of Christmas
Carol trivia (see Appendix 1 for complete bibliography of documents used).
The study
was conducted at a shopping mall for a variety of reasons. It seemed more likely that a shopping mall
venue, as opposed to a university or library venue, would ensure that a wide
variety of persons would participate in the study. One of the objectives of IRS design is to make
systems reflect the expectations of any potential user. As libraries may be intimidating and
threatening to some users (e.g., Mellon, 1986), research in interface design
benefits from study of a variety of potential users outside of traditional
information seeking environments. In
addition, it was hoped the mall setting would inhibit the influence of
traditional groupings present in libraries, making participants more likely to
rely on their own preferences for grouping.
Participants were asked to sort
documents for A Christmas Carol into
as many or few groups as they wanted based on how similar they were to each
other. They were told that the purpose
of the groups was to help them find the documents later on; in other words, the
characteristics they used to create the groups should help them remember where
to find the documents later.
Participants were also asked to write down names and descriptions for
each group. After participants completed the written task, they were also asked
to fill out a brief form asking for demographic information.
5. Cluster Analysis Results
Hierarchical cluster analysis was
used to determine common groupings of documents based on a comparison of the
composition of all the groups created by participants in the study. Data collected on which documents appeared
together in groups for each participant were compiled, and the cluster analysis
calculated the frequency with which any two documents were placed in the same
group by all of the study participants.
Clusters were formed one step at a time, representing those documents
grouped together most frequently.
Clusters composed of one or more documents may merge with other single
documents or with other clusters at any step of the analysis.
To summarize sorting data using
cluster analysis, the first step is to determine a dissimilarity, or distance,
measure (Aldenderfer and Blashfield,
1984). Distance measures allow the
researcher to compute the proximity or distance of every item in the sample to
every other item. When using sorting
data, a distance measure usually represents the frequency with which
participants place any two items into separate groups. Distances among all the items are calculated,
a distance matrix is constructed, and clustering is accomplished by placing
into groups those items that are closest to each other.
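The construction of a distance matrix from sorting data can be sketched as follows. This is a minimal illustration rather than the program used in the study: the function name sorting_distance_matrix, the four item labels, and the three sample sorts are all hypothetical, and the distance is assumed to be the proportion of participants who placed two items in different groups.

```python
from itertools import combinations


def sorting_distance_matrix(sorts, items):
    """Map each pair (a, b) to the fraction of participants who separated a and b.

    `sorts` holds one entry per participant; each entry is a list of sets,
    one set per group that the participant formed.
    """
    n_participants = len(sorts)
    dist = {}
    for a, b in combinations(items, 2):
        # Count the participants who placed both items in the same group.
        together = sum(
            1 for groups in sorts if any(a in g and b in g for g in groups)
        )
        dist[(a, b)] = (n_participants - together) / n_participants
    return dist


# Hypothetical data: three participants sorting four documents.
items = ["cassette", "cd", "paperback", "hardcover"]
sorts = [
    [{"cassette", "cd"}, {"paperback", "hardcover"}],
    [{"cassette", "cd"}, {"paperback"}, {"hardcover"}],
    [{"cassette", "cd", "paperback"}, {"hardcover"}],
]
dist = sorting_distance_matrix(sorts, items)
```

In this invented sample every participant grouped the cassette with the CD, so their distance is 0, while no participant grouped the CD with the hardcover, so their distance is 1.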
The distance measure used for this
study was the percentage of participants who placed any two documents into
different groups, a common distance measure used with sorting data
(Dunn-Rankin, 1983, pp. 36-37; 41-42).
An SPSS program using Ward's method, an agglomerative hierarchical
clustering method, was selected to cluster the documents. Agglomerative hierarchical clustering methods
begin by regarding each individual document as its own separate cluster and
proceed by merging two clusters at each step until all
documents are merged into a single cluster.
As this study had 47 documents, the cluster program clustered in 46
steps.
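The agglomerative procedure can be illustrated with a short sketch. It is not the SPSS routine used in the study; it assumes the standard Lance-Williams form of Ward's update applied to squared dissimilarities, and the function name ward_schedule and the four-item distance table are hypothetical.

```python
def ward_schedule(labels, dist):
    """Return one (members, coefficient) entry per merge step.

    `dist` maps frozenset({a, b}) to a dissimilarity between two labels.
    Dissimilarities are squared here because Ward's update is normally
    stated for squared distances (an assumption of this sketch).
    """
    clusters = {i: [lab] for i, lab in enumerate(labels)}
    d = {
        frozenset((i, j)): dist[frozenset((labels[i], labels[j]))] ** 2
        for i in clusters for j in clusters if i < j
    }
    schedule = []
    next_id = len(labels)
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of active clusters
        i, j = sorted(pair)
        d_ij = d.pop(pair)
        n_i, n_j = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        # Lance-Williams update: distance from each other cluster to the merger.
        for k, members in clusters.items():
            n_k = len(members)
            d_ki = d.pop(frozenset((k, i)))
            d_kj = d.pop(frozenset((k, j)))
            d[frozenset((k, next_id))] = (
                (n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * d_ij
            ) / (n_i + n_j + n_k)
        schedule.append((sorted(merged), d_ij))
        clusters[next_id] = merged
        next_id += 1
    return schedule


# Hypothetical dissimilarities among four documents.
labels = ["cassette", "cd", "paperback", "hardcover"]
dist = {
    frozenset(("cassette", "cd")): 0.0,
    frozenset(("cassette", "paperback")): 2 / 3,
    frozenset(("cassette", "hardcover")): 1.0,
    frozenset(("cd", "paperback")): 2 / 3,
    frozenset(("cd", "hardcover")): 1.0,
    frozenset(("paperback", "hardcover")): 2 / 3,
}
schedule = ward_schedule(labels, dist)
```

Four items yield three merge steps, just as the 47 documents in the study yield 46. The coefficient recorded at each step is the (squared) distance between the two clusters merged at that step.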
In order to select a particular
level of grouping as the most natural, one examines the distances, or fusion
coefficients, between any two documents or groups as they are merged at a given
step. The step immediately before the step
at which the fusion coefficient becomes quite large is the step that should be
selected as the natural grouping of
the documents. In other words, when
groups or items are merged which are quite distant from each other, the merging
of groups or items should stop (Aldenderfer and Blashfield, 1984, pp. 53-58).
Selection of the natural grouping of items is made somewhat difficult
when the distances between coefficients do not differ greatly, which was the
case here. However, the average number
of groups formed by participants in the study was 7 (median 7, mean 7.3;
smallest number of groups formed: 3;
largest number of groups formed: 13).
Thus, it makes sense to look for a step near to 7 groups that shows the
largest difference between fusion coefficients.
A six-cluster solution was selected as best representing the data
because it showed the largest difference between coefficients.
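The selection rule described above, looking near the typical number of participant groups for the largest jump between successive fusion coefficients, can be sketched as follows. The function name pick_cluster_count, the window parameter, and the coefficient values are all hypothetical; the numbers are invented for illustration and are not the coefficients obtained in the study.

```python
def pick_cluster_count(coefficients, n_items, target, window=2):
    """Choose a cluster count near `target` with the largest coefficient jump.

    `coefficients[m - 1]` is the fusion coefficient of merge step m.
    After m merges, n_items - m clusters remain. The jump associated with
    stopping after m merges is the increase in coefficient that the next
    merge (step m + 1) would incur.
    """
    best_k, best_jump = None, float("-inf")
    for m in range(1, len(coefficients)):
        k = n_items - m
        jump = coefficients[m] - coefficients[m - 1]
        if abs(k - target) <= window and jump > best_jump:
            best_k, best_jump = k, jump
    return best_k, best_jump


# Invented coefficients for 10 items (9 merge steps).
coefficients = [0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 1.6, 1.9, 2.5]
k, jump = pick_cluster_count(coefficients, n_items=10, target=4)
```

Restricting the search to a window around the target reflects the reasoning in the text: the largest jump overall may occur at a level of grouping far from the number of groups participants actually formed.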
The clusters presented below are
numbered according to the step at which they were formed in the clustering
process. Knowing when the clusters
formed is of interest because the documents comprising the clusters formed in
the first steps may be viewed as "more alike" than documents in
clusters formed in later steps because study participants grouped them together
more frequently. In each cluster,
numbers appearing after individual documents indicate the step at which they
were merged with the document(s) appearing in the list before them. Blank lines between documents indicate the
formation of sub-clusters within a larger cluster. Thus, in cluster 2, the two Japanese versions
clustered first at step 6; then the two Spanish versions clustered at step
8; at step 16, the Spanish and Japanese
clusters merged; and at step 29 the French version merged with the
Japanese/Spanish cluster, completing cluster 2.
The two Japanese versions formed at the earliest step, and may therefore
be regarded as the "most alike" because participants grouped them
together more frequently than, say, the Spanish and French versions.
The presentation of each of the
clusters is accompanied by a discussion and analysis of the types of attribute
that are present in the documents that comprise the cluster. Most of the types of attribute discussed are
taken from the qualitative analysis of the written data. In the qualitative analysis, participant descriptions
of their groups were analyzed and the researchers derived types of attribute
present in the descriptions. Types of
attribute discovered in the qualitative analysis and identifiable in the
clusters formed here are:
physical format
usage
audience
language
content description
physical characteristics
pictorial elements
content age or integrity
Physical format is distinguished from physical characteristics in the
following way: physical format refers to document type, for example, book, videorecording, or cassette; physical
characteristics refer to physical attributes of individual documents, for
example, document size (small books, big books).
Cluster 1 (formed at step 11): Sound Recordings
Cassette version. Performed by Sir Lawrence Olivier.
CD version. Performed by Patrick Stewart. (1)
Cassette version. Performed by Patrick Stewart. (merged with above at step 4)
Cassette version. Read by Geoffrey Palmer. (merged with above at step 11)
The first cluster was formed at step
11, very early in the clustering process, with two of the documents forming the
very first subcluster created. That participants found these documents very
similar is also shown by the fact that no separate subclusters
of documents were formed; once the initial subcluster
formed, the additional documents were simply merged with the original subcluster. The type
of attribute present in all of the documents that undoubtedly accounts for the
formation of this cluster is that all of the documents are sound
recordings. Although sound recordings
represent two different physical formats, cassettes and CDs, the grouping of
these documents together may indicate that participants found these formats to
be similar. In the qualitative analysis,
physical format was, by a large margin, the predominant
type of characteristic participants identified in their descriptions of their
groups, so it would be expected that physical format
would appear frequently in the clusters themselves.
Cluster 2 (formed at step 29): Non-English Language Versions
A. Japanese hardcover version.
Japanese paperback cartoon version. (6)
B. Spanish paperback. 1990.
Spanish paperback. 1993. (8)
C. French paperback. 1994.
Subcluster A. merged with subcluster
B. at step 16.
Subcluster A./B. merged with C. at step 29.
Two attributes are shared by all of
the documents in the second cluster, language
(all are non-English) and physical format
(book). The subclustering
of the Japanese and Spanish versions occurred very early in the clustering
process, showing that they were very frequently grouped together by study
participants. The fact that the French
language paperback did not cluster early with the other groups may be explained
by a subjective observation made during the data collection, which is that this
paperback looked very much like the English language paperbacks (unlike the
other non-English language editions), and it seemed that some of the
participants did not notice that it was not in English because they grouped it
with other, similar-looking English language paperbacks. I suspect that if all of the participants had
been aware that this was a French version, this cluster would have formed
earlier. The language attribute was identified at least once by a large majority
of the participants (86 percent) in their written descriptions as a
characteristic they used to group documents, and so, again, it is expected that
language will appear in the cluster formations.
Cluster 3 (formed at step 34): Paperback Versions
A. Pocket Books.
B. Airmont.
Bantam Books. (9)
C. Pocket Library.
Watermill Classic. (14)
D. A Christmas Carol and Other Christmas Stories. Signet.
A Christmas Carol and Other Christmas Books. Dent. (13)
The Christmas Books, Volume 1, A Christmas Carol, The Chimes. Penguin. (merged with above at step 21)
E. Puffin Books.
Subcluster A. merged with subcluster C. at step 19.
Subcluster A./C. merged with subcluster B. at step 23.
Subcluster A./B./C. merged with subcluster D. at step 31.
Subcluster A./B./C./D. merged with subcluster E. at step 34.
Cluster 3 is composed of all of the
English language (language), adult (audience) paperbacks (physical format) in the sample, and
includes as well, in subcluster E, two paperback
editions that could be considered to
be aimed at a young adult or older child audience. Audience
was also a frequently mentioned attribute type in the written descriptions, so
it should appear frequently in the clusters and subclusters
formed. The documents in this cluster
are also identical in that they all are unabridged editions of A Christmas Carol (content age or integrity), and they have either few or no
illustrations (pictorial elements). The first two documents merged in subcluster A are virtually identical paperback
editions; the covers vary only in color
(one cover is red, the other green) and other minor differences in layout.
The second two subclusters
reflect documents that have similar physical
characteristics. Thus, the two
documents in subcluster B have nearly identical
dimensions. The same is true of the two
documents in subcluster E, except that the documents
in subcluster E are somewhat larger sized paperbacks
than the documents in subcluster B. Subcluster D is of
particular interest because it represents both physical format (paperback) and audience
(adult), but also represents a content
description characteristic--these documents represent the only English
language paperback collections, of which A
Christmas Carol is only one of the stories.
Although it seems odd initially that the collections merged with the
earlier subclusters prior to merging with subcluster E, this might be explained by looking at physical characteristics. The first four subclusters
all consist of smaller paperbacks, while the paperbacks in subcluster E are somewhat larger.
Cluster 4 (formed at step 38): Videorecordings
A. A Flintstones Christmas Carol.
The Muppet Christmas Carol. (2)
Mickey's Christmas Carol. (3)
B. A Christmas Carol. (George C. Scott)
An American Christmas Carol. (Henry Winkler) (7)
Scrooge. (Albert Finney) (merged with above at step 10)
A Christmas Carol. (Alastair Sim) (merged with above at step 17)
Subclusters A. and B. merged at step 38.
Cluster 4 is another clear
instance of the predominance of physical
format and language in
grouping. In this case, the format is videorecording and the language is, again, English. The formation of Cluster 4 is quite
striking: the two subclusters
formed early (subcluster
A at step 3 and subcluster B at step 17), and both remained intact until they merged at
step 38. The difference between the two subclusters is audience. Subcluster A
consists of children's videorecordings, and subcluster B of adult videorecordings. The early formation and late merging of the subclusters indicate the significance of the audience attribute for grouping.
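The step numbers reported for each cluster come from an agglomerative procedure in which the documents most often sorted together merge first. The following is a minimal sketch of that idea, not the analysis used in the study: the three participant sorts, the short document labels, and the choice of average linkage are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical sorts: each participant's groups of document labels (illustrative only).
sorts = [
    [{"Flintstones", "Muppet", "Mickey"}, {"Scott", "Sim"}],
    [{"Flintstones", "Muppet"}, {"Mickey", "Scott", "Sim"}],
    [{"Flintstones", "Muppet", "Mickey"}, {"Scott", "Sim"}],
]

def co_occurrence(sorts):
    """Count how many participants placed each pair of documents in the same group."""
    counts = {}
    for groups in sorts:
        for group in groups:
            for a, b in combinations(sorted(group), 2):
                counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

def similarity(c1, c2, counts):
    """Average linkage (an assumed choice): mean pair count across two clusters."""
    total = sum(counts.get(tuple(sorted((a, b))), 0) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def agglomerate(sorts):
    """Merge the two most similar clusters at each step; return the merge history."""
    counts = co_occurrence(sorts)
    docs = sorted({d for groups in sorts for g in groups for d in g})
    clusters = [frozenset([d]) for d in docs]
    history = []
    step = 0
    while len(clusters) > 1:
        step += 1
        best = max(combinations(clusters, 2),
                   key=lambda pair: similarity(pair[0], pair[1], counts))
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
        history.append((step, sorted(best[0]), sorted(best[1])))
    return history

history = agglomerate(sorts)
for step, left, right in history:
    print(f"step {step}: {left} + {right}")
```

In this toy data, pairs that every participant grouped together merge at the earliest steps, while the children's and adult groups join only at the final step, which mirrors the early formation and late merging described above.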
Cluster 5 (formed at step 40): Hardcover versions
A. Baronet Books. Adapted by Malvina G. Vogel. Illustrations by Pablo Marcos Studio.
Dial Books. Illustrated by Michael Foreman. (merged with above at step 18)
B. Abridged by Vivian French. Illustrated by Patrick Benson. Candlewick Press.
A Christmas Carol: Adapted for Theater. Andrews and McMeel. (20)
C. Weathervane Books. Illustrated by Arthur Rackham.
Christmas Stories: A Christmas Carol, ... McLoughlin Brothers. (1913?) (22)
D. Mula, Tom. Jacob Marley's Christmas Carol.
A Christmas Carol and Other Stories. Modern Library. (25)
A Christmas Carol: A Facsimile Edition ... Pierpont Morgan Library. (merged with above at step 26)
Davis, Paul. The Lives and Times of Ebenezer Scrooge. (merged with above at step 36)
Subcluster A. merged with subcluster C. at step 30.
Subcluster A./C. merged with subcluster B. at step 39.
Subcluster A./B./C. merged with subcluster D. at step 40.
The dominant attributes present in
cluster 5 are, once again, physical
format and language. All of the documents in this cluster are
English language hardcover versions. Audience attributes, children and adult,
are also present. The first three subclusters consist of documents intended for older
children, and the last subcluster consists of adult
documents. The fact that the first three
subclusters merged with the last subcluster
very late in the clustering process again indicates the importance of audience characteristics in
distinguishing documents.
Subclusters
A and C probably grouped together earliest because they are similar in
size. In addition, all but one of the
documents in these subclusters are unabridged editions
(a content age or integrity attribute), and the adapted document
"looks like" an unabridged edition; that is, it is relatively thick
and carries no indication on the cover that it is an adaptation. The only difference discernible between the
documents in these two subclusters is that the
documents in subcluster C appear to be somewhat older
than the documents in A (a physical
characteristic attribute). The
documents in subcluster B are large-sized picture
books. They differ from the picture
books in cluster 6 in several ways.
First, they contain more text than the documents in cluster 6 and
second, they are intended for an older-child audience. In addition, these are "true"
hardcover documents in that they have cloth binding with dust jackets; the
documents in cluster 6 have "hard" covers, but they are made of
cardboard.
Subcluster
D is somewhat unusual in that the documents comprising it are as unlike one another as those in any
of the subclusters formed among all of the clusters,
which is reflected in how late this subcluster was
formed (step 36). The first two
documents were merged at step 25, more than halfway through the clustering
process. These two documents are alike
in that they are hardcover documents of a similar small size. The content of these two documents, however,
is different. One is a retelling of the
Dickens story, while the other is a collection of Dickens short stories. The
next document to merge with them is a much larger sized hardcover document, which
consists of a facsimile of the original manuscript with the text in
typescript. The last document to merge
is also a large-sized hardcover document, but it consists of a criticism of A Christmas Carol. Perhaps one explanation of why these
documents grouped together is that at least three of them are quite unlike any
of the other documents in the sample (the retelling, the facsimile, and the
criticism), and the fourth, although a collection like the collections in
cluster 3, is a hardcover that has somewhat similar physical characteristics to
the retelling.
Cluster 6 (formed at step 41): Children's and Activity Versions
A. Taylor, Mark A. The Christmas Carol. Landoll's.
Disney's Mickey's Christmas Carol. Mouse Works. (15)
B. A Christmas Carol. Retold by I.M. Richardson. Troll Associates.
Dubowski, Cathy East. Scrooge. Grosset & Dunlap. (27)
C. A Christmas Carol, Easy Piano Picture Book. Faber & Faber.
A Christmas Carol Story Book Set & Advent Calendar. Workman Publishing. (28)
D. A Christmas Carol. Dramatized by
Luxfield Consultants (adapters). A Christmas Carol.
Readers. (32)
Sammon, Paul. The Christmas Carol Trivia Book. Citadel Press. (merged with above at step 37)
Subclusters A. and B. merged at step 33.
Subclusters A./B. and C. merged at step 35.
Subclusters A./B./C. and D. merged at step 41.
The last cluster formed obviously
represents the documents that were seen as most dissimilar by the participants,
because they were placed less frequently in the same groups. Nonetheless, interesting similarities do
exist among the majority of the documents in the cluster. All of the documents are English language
documents. All but one (the Advent
calendar) are books (physical format). With only two, and possibly three, exceptions
(Sammon's The
Christmas Carol Trivia Book, Payne's dramatized version, and possibly the
Oxford Progressive English Reader), the documents are all aimed at a children's
audience, and are heavily illustrated with mostly color illustrations (pictorial elements).
In addition, several of the
documents represent the usage
attribute type; that is, they are documents that could be used for activities
other than passive reading. These
documents include the Advent calendar and the piano book (subcluster
C); the large sized picture books (subcluster A),
which were sometimes described as documents that could be used to read to children
at bedtime; and possibly the documents in subcluster
D, one of which is a play version, one a version to be used for learning
English, and the third a trivia book that could be used for a game or at a
party. Physical characteristics may explain the first two subclusters, in that the two documents in subcluster A are both very large sized
children's picture books with cardboard covers, and the two documents in subcluster
B are both very thin, medium sized paperback children's picture books.
6. Discussion
The most frequently appearing
attributes discovered in the cluster analysis are physical format, language, and audience;
other attributes that appear are content age or integrity, physical
characteristics, pictorial elements, and usage. Dominant attributes, that is, ones used by a
majority of study participants in their written descriptions, discovered in the
qualitative study, include physical
format, language, content description, audience, pictorial
elements, and usage (Carlyle, 1999).
Thus, all of the attributes that were frequently identified in the
qualitative study of written descriptions were present in the clusters
described here, indicating that the written descriptions were largely accurate
identifications of the attributes actually used by participants when they
created their groups.
The most notable difference between
the results of the qualitative analysis of written descriptions and the cluster
analysis is the relative scarcity of the physical
characteristics attribute in the written descriptions (only 4 percent of
all written descriptions mentioned physical
characteristics, and only 20 percent of the participants mentioned physical characteristics at all), as
opposed to the comparatively frequent presence of physical characteristics in subcluster
formation. Because physical characteristics played a larger role in the formation of subclusters than clusters, it seems reasonable to speculate
that physical characteristics played only a minor, and perhaps
subconscious, role in participant grouping.
7. Implications
7.1 Implications for catalog and other information displays
Library cataloging records already
contain indicators representing many of the attributes identified in this
study. In fact, card catalogs featured
arrangements that grouped cards based on several of the attributes discovered
in the study, including language and content age or integrity. If one takes differences in physical format,
audience, and usage to indicate significant changes in text, which they often
do, then card catalog arrangements also reflected these attributes. Library classification numbers contain
indicators for some of the attributes identified by participants in this
study. Classification numbers created
using the Universal Decimal Classification (UDC), for example, contain
indicators of physical format, audience, language, and content age or integrity
(Carlyle, 1997b). Indeed, current
bibliographic (MARC) records existing in online catalogs could be used
automatically to create groupings based on many of the attribute types
identified here (Carlyle, 1999).
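As a rough illustration of how such automatic grouping might work, the sketch below groups simplified records by physical format, language, and audience. The record dictionaries, their field names, and their values are hypothetical stand-ins; real MARC records carry these attributes in coded fields that would have to be decoded first.

```python
from collections import defaultdict

# Hypothetical, simplified records; field names are illustrative, not MARC tags.
records = [
    {"title": "A Christmas Carol", "format": "paperback",
     "language": "eng", "audience": "adult"},
    {"title": "The Muppet Christmas Carol", "format": "videorecording",
     "language": "eng", "audience": "juvenile"},
    {"title": "Un Chant de Noël", "format": "book", "language": "fre"},
    {"title": "Scrooge", "format": "videorecording",
     "language": "eng", "audience": "adult"},
    {"title": "A Christmas Carol and Other Christmas Stories", "format": "paperback",
     "language": "eng", "audience": "adult"},
]

def group_records(records, attributes):
    """Group record titles by the tuple of values they hold for the given attributes."""
    groups = defaultdict(list)
    for record in records:
        # A record that lacks an attribute is grouped under "unknown" for it.
        key = tuple(record.get(attr, "unknown") for attr in attributes)
        groups[key].append(record["title"])
    return dict(groups)

clusters = group_records(records, ["format", "language", "audience"])
for key, titles in sorted(clusters.items()):
    print(" / ".join(key), "->", titles)
```

Changing the attribute list (for example, to language alone) regroups the same records, which is one way a display could offer alternative arrangements of a single work set.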
One of the problematic features of
card catalog displays was that not all groupings of cards were identified or
flagged in any way, so that users may not have been aware of them, or may have
been confused by them. In the online
environment, groupings may be made clear using images such as boxes to indicate
clusters. In addition, all of the groups
may be labeled with simple, informative labels indicating the attributes of the
items present in the cluster (Fig. 3).
[Fig. 3 about here]
Figure
3 is an example of how a clustered display of a voluminous work might appear in
an online catalog or other information retrieval system. A variety of flexible searching features
could be available. By clicking on a
box, users could browse either a further subclustered display (for example, nonbook formats could be subclustered
into groups for specific formats) or a brief listing of items. Records could be sorted by a variety of
attributes, for example, date, publisher name, format, or language. Users could also search for specific
attributes such as a translator or illustrator’s name. Preliminary findings of a current research
project indicate that a display such as this could be created largely
automatically (Carlyle & Summerlin, 2000). The clustered display shown in Fig. 3 would
be a useful default display for a work, especially helpful to people who do not
have a particular edition in mind when they begin their search. People who have particular specifications
could use the “Sort” or “Search” boxes to identify the specific editions they
are looking for if the clusters do not feature attributes useful to them such
as publisher or date.
Another cluster-based approach
to displaying voluminous works is
distortion-oriented presentation, a method intended to aid navigation of large
information spaces (Leung and Apperley,
1994). In distortion-oriented displays,
an entire information space is displayed; however, one small part of
the space is enlarged as the focus, while the rest of the space is made smaller. Any of the cluster attributes identified in
this study could be used as a focal point for a distortion-oriented
display. The contents of one cluster at
a time could then be presented in the focal information space.
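One simple way to picture this idea is to divide a fixed number of display rows among the clusters, giving the focal cluster a larger share. The sketch below is only an illustration under assumed numbers (a 20-row display and a focus weight of 4); it is not one of the specific techniques reviewed by Leung and Apperley, and the function name is hypothetical.

```python
def allocate_rows(cluster_names, focus, total_rows=20, focus_weight=4):
    """Split total_rows among clusters, weighting the focal cluster more heavily.

    Every cluster keeps at least one row, so the context shrinks but
    does not disappear from the display.
    """
    if focus not in cluster_names:
        raise ValueError(f"unknown focus cluster: {focus}")
    weights = {name: (focus_weight if name == focus else 1) for name in cluster_names}
    total_weight = sum(weights.values())
    rows = {name: max(1, (total_rows * w) // total_weight)
            for name, w in weights.items()}
    # Give any rows left over from integer division to the focal cluster.
    rows[focus] += max(0, total_rows - sum(rows.values()))
    return rows

clusters = ["Sound recordings", "Non-English", "Paperbacks",
            "Videorecordings", "Hardcovers", "Children's and activity"]
layout = allocate_rows(clusters, focus="Videorecordings")
for name, n in layout.items():
    print(f"{name}: {n} rows")
```

Moving the focus to another cluster re-runs the same allocation, so all six clusters remain on screen while only one is shown in detail.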
7.2 Implications for metadata standards[4]
Many resources exist on the Internet
that represent editions of voluminous works or related items and as a
consequence, the organizational issues are similar. For example, a search on “christmas
carol charles dickens” in a search engine retrieves a
wide variety of editions, adaptations, and other sites related to the original
text. Metadata standards, particularly
general standards such as the Dublin Core (DC; available at: http://purl.org/metadata/dublin_core_elements), are being developed to aid resource
discovery on the World Wide Web. Thus,
these standards may be analyzed to determine the extent to which they
facilitate the identification of attributes discovered in this research. In comparison to cataloging standards, metadata
standards are spare; DC identifies only fifteen elements to promote item
identification and discovery. Two of the
three dominant attributes identified here are present in DC: language (DC “language”) and physical format
(DC “format”). The “audience” attribute
is missing. Because DC is extensible,
specialized applications of DC may add attributes. Indeed, at least one extension of DC, the
Gateway to Educational Materials (GEM: available at http://www.thegateway.org)
element set, has added audience as an element.
Some of the other attributes
discovered here, which would also be useful in clustering documents available
on the Internet, are not often available explicitly in metadata standards. For example, clustering electronic documents
that share complex relationships using attribute types such as content or age characteristics (abridged
vs. unabridged versions, lower reading skill level versions, adapted versions,
etc.) would be difficult in the current Internet environment because these
attributes are seldom identified explicitly in metadata standards.
An
important problem with metadata standards is that, for the most part, they do
not require use of standardized vocabularies for naming entities (e.g., Rust,
1998). For example, if one were using DC
to describe items, various translated editions of a work might include the
translated titles in the DC title element.
As a result, retrieval of all of the editions and versions of a
particular work would be inhibited because different editions would have
different titles. DC offers a solution
to this problem in its source or relation elements, where inclusion of
standardized author and title information could facilitate retrieval of all of
the items related to a work on the Web.
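To make the point concrete, the sketch below collocates records by a standardized work heading carried in a relation-like field, falling back to the record's own title when none is present. The records, the dictionary keys, and the form of the heading ("Dickens, Charles. Christmas carol") are assumptions for illustration only, not values prescribed by DC.

```python
from collections import defaultdict

# Hypothetical DC-like records; keys and values are illustrative only.
records = [
    {"title": "A Christmas Carol", "language": "en",
     "relation": "Dickens, Charles. Christmas carol"},
    {"title": "Un Chant de Noël", "language": "fr",
     "relation": "Dickens, Charles. Christmas carol"},
    {"title": "Canción de Navidad", "language": "es",
     "relation": "Dickens, Charles. Christmas carol"},
    {"title": "12 Christmas carols, tuba quintet", "language": "en"},
]

def collocate(records):
    """Group titles by standardized work heading, else by the record's own title."""
    works = defaultdict(list)
    for record in records:
        key = record.get("relation") or record["title"]
        works[key].append(record["title"])
    return dict(works)

works = collocate(records)
for heading, titles in works.items():
    print(heading, "->", titles)
```

Matching on the title element alone would have placed the English, French, and Spanish editions in three separate groups; the shared heading brings them together while leaving the unrelated record on its own.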
7.3 Implications for digital libraries
If taken broadly, the findings of
this study have a variety of possible implications for the design of digital
libraries. It is useful to begin by
looking at the attributes identified by study participants and analyze their
place in the current organization of and provision of access to library
materials. All of the attribute types
identified by study participants are identified and used in libraries beyond
their identification in cataloging records.
Needs for several of the most frequently identified attribute types, physical
format, language, and audience, are often addressed in
libraries by physical arrangements of materials, as when libraries have
separate sections for videorecordings, sound
recordings, maps, and books; for non-English language materials; and for adults
and children. While these physical
arrangements are often recorded in holdings information in cataloging records,
they are not used in catalog arrangements or displays, but for location and
identification purposes only, usually in single record displays.
Other attribute types having to do
with physical or content features (content description, pictorial elements,
physical characteristics, content age or integrity attributes) are
described to some extent in cataloging records; otherwise, users discover them only when they
handle physical documents.
Finally, the usage attribute
type is addressed in libraries in a number of ways. First, librarians themselves often give
suggestions for some of the uses suggested by study participants (books good to
read to children at bedtime, books good for introducing other cultures to
primary school students). Second,
librarians produce specialized lists of items in their own collection meeting
usage needs; for example, a printed bookmark with a brief list of scary
books. Third, libraries collect
published bibliographies that group items by various uses.
What is most interesting to note
about all of these means by which libraries address the attribute types
identified in the study is that virtually none of them include the structure of
catalog displays; in other words, catalog displays seldom highlight or display
information based on the types of attributes discovered in the study. Some of these attribute types are available
only as limits after a particular search has been done; for example, limiting
by language. Some, as mentioned above,
may be seen only when individual cataloging records are displayed.
In a world in which remote access to
the library collection is not possible, it may be that the present means of
addressing the attribute types described above is sufficient. However, we no longer live in such a world. Many users make use of library catalogs
remotely, requesting documents to be delivered to them in their homes or
offices. More and more documents are
electronic, and may be printed out at the user’s printer or read or experienced
from the user’s computer. These
documents cannot be physically located next to tangible documents held by the
library. One of the intriguing
implications of the study is, thus, that attributes such as physical format,
audience, and language be incorporated into catalog structures. For example, catalogs could be designed to
provide the ability to browse materials of specific format types such as videorecordings or in a specific language or for a specific
age group. Most of the information
necessary to create such browsable displays is already contained in cataloging records
and could be harnessed to implement them.
Much could also be done in a virtual
environment to present some of the attributes.
For example, more cataloging records could include tables of contents to
enhance content description, or catalogs could use graphics to signify that a
document contains illustrations or other pictorial elements. The usage attribute would be harder to
implement because usage information is rarely included in cataloging records. Libraries may want to consider, as some already
have, providing specialized document lists online in addition to in print, or
providing readers' advisory services online.
This would provide a virtual counterpart to the current practice of
providing such lists or advice in house.
8. Future Research
This study represents a first step
toward improving displays for voluminous works in IRSs. Future research could begin by investigating
a variety of different types of voluminous work. The "typical" composition of a set
of documents related to a particular voluminous work is unknown; moreover, it
is likely that the composition of work sets varies a good deal from one work to
another. Thus, the notion of a
"typical" work set may be inappropriate. For example, documents related to the Koran, Charles Darwin's Origin of Species, or Amy Tan's Joy Luck Club are doubtless different in
character from those related to A
Christmas Carol. Attributes of the Christmas Carol work set used in this
study that may make it different from other work sets are the presence of a
wide variety of audio-visual and children's versions and the relative scarcity
of documents about A Christmas Carol. Attributes of a work such as the audience the
work is intended for, whether the work is fiction or non-fiction, and physical
format of the original edition undoubtedly affect the ultimate composition of a
work set. In addition, future studies
could investigate electronic documents such as those commonly found on the Web,
which may vary in character from documents traditionally handled by IRSs. Upon
completion of research investigating different types of work, further research
is needed in which prototype systems incorporating organized displays for
voluminous works are tested to determine whether or not such displays actually
do result in more effective IRSs.
As stated above, most of the
attributes identified in the study are already included in cataloging
records. However, problems are inherent
with two of the attributes, audience and usage. Audience is an attribute frequently identified
in the qualitative study, and it played a significant role in cluster creation
as well. However, it is an attribute
that may be seen as somewhat subjective.
Currently in library cataloging, documents are (somewhat inconsistently)
identified as either being for children or not; further, distinctions are not
made with respect to how old the children are intended to be.[5] Investigation of the extent to which such
distinctions might be helpful is an obvious avenue for future research.
The other attribute, usage,
has to do with how a document could be used.
Indexing and cataloging practice has seldom, if ever, allowed or
encouraged the indexing of this type of highly subjective “attribute.” Often in the qualitative study, probably
because of the type of work selected, usage descriptions had to do with the
utility of the document for reading to children or using in a classroom. In the cluster analysis, one of the clusters
manifested the usage attribute, in that several of the documents
clustered there were obviously created for specific uses (e.g., piano book,
Advent calendar). Others could be easily
seen as being useful for activities such as bedtime story reading or classroom
use. This confirms the findings of the
qualitative study that usage could be an important attribute for
indexing. Research is needed to
determine whether or not it actually would be helpful to users to index such an
attribute, given the subjectivity involved in its assignment and the difficulty
indexers might have assigning it.
9. Conclusion
Known-item searches, far from being
uninteresting and non-problematic, pose a myriad of fascinating challenges to
IRS interface designers. In addition,
solutions to problems presented by these searches may lead to exciting
innovations in the structure and quality of information system displays. Incorporation of alternatives to the
long-list retrieval model in our IRSs has the
potential to enhance the information environment of users by increasing their
ability to identify documents of interest quickly and efficiently. Organized displays featuring categorization,
clustering, or classification may serve users well for a wide variety of
information needs.
References
Aldenderfer,
M. S. & Blashfield, R. K. (1984).
Cluster analysis. Quantitative
applications in the social sciences, Series no. 07-044.
Carlyle, A. (1999).
User categorisation of works: Toward improved organisation
of online catalogue displays. Journal of
Documentation, 55 (2), 184-208.
Carlyle, A. (1997a).
Fulfilling the second objective in the online catalog: schemes for organizing author and work
records into usable displays. Library
Resources & Technical Services, 41, 79-100.
Carlyle, A. (1997b).
The role of classification in the creation of author and work displays
in online catalogues. In Knowledge
organization for information retrieval:
Proceedings of the Sixth International Study Conference on
Classification Research, held at University College London, 16-18 June 1997. FID 716 (pp. 90-96).
Carlyle, A. (1996).
Ordering author and work records:
an evaluation of collocation in online catalog displays. Journal of the American Society for Information
Science, 47, 538-554.
Carlyle,
A. & Summerlin, J. (2000). Transforming catalog displays: Record clustering for works of fiction. In Beghtol, C., Howarth, L. C., & Williamson, N. J. (Eds.), Dynamism
and stability in knowledge organization:
Proceedings of the Sixth International ISKO Conference, 10-13 July 2000,
Toronto, Canada (pp. 320-326).
Cooke, N. J. (1994).
Varieties of knowledge elicitation techniques. International Journal of Human-Computer
Studies, 41, 801-849.
Dunn-Rankin, P. (1983).
Scaling methods.
Faiks, A. & Hyland, N. (2000). Gaining user insight: A case study illustrating the card sort
technique. College & Research
Libraries, 61 (4), 349-357.
Fidel, R.
(1994). User-centered
indexing. Journal of the American
Society for Information Science, 45 (8), 572-576.
Hayhoe, D.
(1990). Sorting-based menu
categories. International Journal of
Man-Machine Studies, 33, 677-705.
Hearst, M. A. & Pedersen, J. O. (1996).
Reexamining the cluster hypothesis:
Scatter/Gather on retrieval results.
In Frei, H.-P. (Ed.), Proceedings of
the 19th Annual International ACM SIGIR Conference on Research &
Development in Information Retrieval (pp. 76-84).
Jackson, S. L. (1958).
Vostecky, V. (Ed.), Catalog use study.
Jörgensen, C.
(1995). Image attributes: An investigation. Ph.D. diss.,
Karat, J., Atwood, M. E., Dray, S. M., Rantzer, M. & Wixon, D.
R. (1996). User centered design: Quality or quackery? CHI ’96, Conference Proceedings on Human
Factors in Computing Systems (pp. 161-162).
Kinnucan, M. T.
(1992). Fisheye views as an aid
to subject access in online catalogues.
Canadian Journal of Information Science, 17 (2), 25-40.
Larson,
R. R. (1991). Classification clustering, probabilistic
information retrieval, and the
online catalog. Library Quarterly, 61, 133-173.
Leazer, G. H. & Smiraglia,
R. A. (1996). Toward the bibliographic control of
works: Derivative bibliographic
relationships in an online union catalog.
In Digital Libraries '96 (pp. 36-43).
Leung, Y. K. & Apperley,
M. D. (1994). A review and taxonomy of distortion-oriented
presentation techniques. ACM
Transactions on Computer-Human Interaction, 1 (1), 126-160.
Levy, D. M. (1995).
Naming the nameable: Names,
versions, and document identity in a networked environment. In Scholarly publishing on the electronic
networks: Filling the pipeline and
paying the piper.
Lewis, S.
(1991). Cluster analysis as a
technique to guide interface design.
International Journal of Man-Machine Studies, 35, 251-265.
Lin, X.
(1997). Map displays for information
retrieval. Journal of the American
Society for Information Science, 48, 40-54.
Lubetzky, S.
(1963). Function of the main
entry in the alphabetical catalogue--one approach. Working paper no. 2. In International Federation of Library Associations. International Conference on Cataloguing
Principles,
Lubetzky, S.
(1969). Principles of
Cataloging. Final Report. Phase I:
Descriptive Cataloging.
Massicotte, M.
(1988). Improved browsable displays for online subject access. Information
Technology and Libraries, 7 (4), 373-380.
Matthews, J.,
McDonald, J. E. & Schvaneveldt,
R. W. (1988). The application of user
knowledge to interface design. In Guindon, R. (Ed.), Cognitive science and its applications
for human-computer interaction (pp. 289-338).
McDonald, J. E., Stone, J.
D., & Liebelt, L. S. (1983). Searching for items in menus: The effects of organization and type of
target. Proceedings of the 27th
Annual meeting of the Human Factors Society (pp. 834-837).
Mellon, C. (1986).
Library anxiety: A grounded
theory and its development. College and
Research Libraries, 47, 160-165.
Miller, G. A. (1969). A psychological method to investigate verbal
concepts. Journal of Mathematical
Psychology, 6, 169-191.
Panizzi, A.
(1985). Mr. Panizzi
to the Right Hon. the Earl of Ellesmere.--
Pettee, J.
(1936). The development of
authorship entry and the formulation of authorship rules as found in the
Anglo-American code. Library Quarterly,
6, 270-290.
Rasmussen, J., Pejtersen,
A.M., & Goodstein, L.P. (1994). Cognitive systems engineering. (Chapters 9-12).
Rust, G.
(1998). Metadata: The right approach, an integrated model for
descriptive and rights metadata in e-commerce.
D-Lib Magazine (July/August). (http://www.dlib.org/dlib/july98/rust/07rust.html)
Schaffer,
D., Zuo, Z., Greenberg, S., Bartram, L., Dill, J., Dubs, S., & Roseman,
M. (1996). Navigating hierarchically clustered networks
through fisheye and full-zoom methods.
ACM Transactions on Computer-Human Interaction, 3 (2), 162-188.
Svenonius, E.
(1988). Clustering equivalent
bibliographic records. In: Annual review of OCLC research, July
1987-June 1988 (pp. 6-8).
Tillett, B. B. (1991). A taxonomy of bibliographic
relationships. Library Resources &
Technical Services, 35 (2), 150-158.
Vidal, N. K. (1995).
Experimental image taxonomy: An
inquiry into spontaneous image organization.
Master's thesis,
Wiberley, S. E., Daugherty, R. A., & Danowski, J. A. (1995).
Displaying online catalog postings:
LUIS. Library Resources &
Technical Services, 39, 247-264.
Wilson, S., Bekker, M., Johnson, P.
& Johnson, H. (1997). Helping and
hindering user involvement – a tale of everyday design. CHI ’97, Conference Proceedings on Human
Factors in Computing Systems,
Yee, M. M. (1995).
What is a work? Part 4: Cataloging theorists and a definition
abstract. Cataloging &
Classification Quarterly, 20, 3-24.
Yee, M. M. & Layne, S. S. (1998).
Improving online public access catalogs.
Zamir, O., Etzioni,
O., Madani, O., & Karp, R. M. (1997).
Fast and intuitive clustering of web documents. In Heckerman, D., Mannila,
H., Pregibon, D. & Uthurusamy,
R. (Eds.), Proceedings of the Third International Conference on Knowledge
Discovery and Data Mining.
FIGURE 1.
Sample retrieval set for a keyword search on the terms "christmas carol" in a large online catalog
12 Christmas carols, tuba quintet
Afro-American carols for Christmas
* All I want for Christmas is my two front teeth
American children sing Christmas carols
* The annotated Christmas carol: a Christmas carol
As Joseph was a-walking : a Christmas carol : Old English
* Batman : ghosts : a tale of Halloween in
Charles Dickens' "A Christmas carol"
The best of off-off Broadway
The birds' Christmas carol, together with its sequel
The Birds' Christmas carol
The birds' Christmas carol, a play in one act
A Carol Christmas
A carol for Christmas
* A Charles Dickens Christmas
* Charles Dickens Christmas ghost stories
* Christmas carol
* A Christmas carol
* A Christmas carol, in prose, being a ghost story of Christmas
* A Christmas carol : in seven staves
* A Christmas carol : selections
* Christmas carol : vocal score
* The Christmas carol
Christmas favorites
A Christmas gift
* Christmas stories
* Dickens' Christmas carol
* The facts about A Christmas carol
I'll be home for Christmas
* The lives and times of Ebenezer Scrooge
The peace album
* Scrooge
* A tale of two cities. A Christmas carol. The chimes.
The Tubadors
Wexford carol
White Christmas
* The works of Charles Dickens
* Titles with an asterisk are editions of, or are related
to, or contain editions of or works related to Charles Dickens' A Christmas Carol. Titles that do not have the words
"Christmas carol" displayed have these words somewhere in the text of
the retrieved record.
FIGURE 2.
Sample retrieval set for an author/title search on Charles Dickens/A Christmas Carol
A Christmas carol / Dickens, Charles, André Deutch 1998
Christmas carol. Dickens, Charles, Prószýnskii 1998
A Christmas carol / Dickens, Charles, Holt, Rinehart 1998
Christmas carol. Dickens, Charles, Gerstenberg, 1998
A Christmas carol. Dickens, Charles, Gallimard jeun 1998
A Christmas carol / Dickens, Charles, Fenn, 1998
Christmas carol Dickens, Charles, Candy Cane Pre 1998
A Christmas carol / Dickens, Charles, Scholastic Inc 1999
Christmas carol. Dickens, Charles, Edivisíon Com 1999
A Christmas carol. Dickens, Charles, Everyman, 1999
A Christmas carol / Dickens, Charles, LRS, 1999
A Christmas carol / Dickens, Charles, Umberto Press, 2000
A Christmas carol / Dickens, Charles, North-South Bo 2000
Christmas carol. Dickens, Charles, Héritage, 2000
FIGURE 3.
Clustered Display for Dickens’ A Christmas Carol

APPENDIX 1.
Complete Bibliography of Documents Used in the Study
Cluster 1: Sound Recordings
Dickens, Charles. A Christmas Carol. Performed by Sir Lawrence Olivier and others.
Dickens, Charles. A Christmas Carol. Performed by Patrick Stewart.
Dickens, Charles. A Christmas Carol. Performed by Patrick Stewart.
Dickens, Charles. A Christmas Carol. Read by Geoffrey Palmer.
Cluster 2: Non-English Language Versions
Dickens, Charles. A Christmas Carol. [Japanese hardcover version: Dikenzu gensaku. Kurisumasu kyaroru. Tamura Sumiyo sakka. Sekai no meisaku. Samaku Shuppan.], 1995. (ISBN: 4-309-46568-4)
Dickens, Charles. A Christmas Carol. [Japanese paperback cartoon version: Dikenzu. Kurisumasu kyaroru. Muraoka Hanako yaku. Sato Aki kaisetsu. Sekai bungaku no tamatebako. Kawade Shobo Shinsha, c1994.] (ISBN: 4-7631-8299-4)
Dickens, Carlos. Canción de Navidad, El Grillo del Hogar, Historia de Dos Ciudades. Sexta edición. Mexico: Editorial Porrúa, S. A., 1990. (Spanish)
Dickens, Charles. Cuentos Navideños: Cancion de navidad, el poseido. Bogota: Ediciones Universales, 1993. (Spanish)
Dickens, Charles. Un Chant de Noël. Évreux: Gallimard, Folio Junior, 1994. (French)
Cluster 3: Paperback Versions
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol and Other Christmas Stories.
Dickens, Charles. A Christmas Carol and Other Christmas Books.
Dickens, Charles. The Christmas Books, Volume 1, A Christmas Carol, The Chimes.
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol.
Cluster 4: Videorecordings
A Flintstones Christmas Carol. [S.l.]: Hanna-Barbera Cartoons, 1994.
The Muppet Christmas Carol.
Mickey's Christmas Carol.
A Christmas Carol.
An American Christmas Carol.
Scrooge.
A Christmas Carol. Blockbuster Classics, 1951. (Alastair Sim)
Cluster 5: Hardcover Versions
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol. Abridged by Vivian French. Illustrated by Patrick Benson.
Dickens, Charles. A Christmas Carol: Adapted for Theater.
Dickens, Charles. A Christmas Carol.
Dickens, Charles. Christmas Stories: A Christmas Carol, The Holly Tree.
Mula, Tom. Jacob Marley's Christmas Carol. Publishing, 1995.
Dickens, Charles. A Christmas Carol and Other Stories. Library, 1995.
Dickens, Charles. A Christmas Carol: A Facsimile Edition ...
Davis, Paul. The Lives and Times of Ebenezer Scrooge. University Press, 1990.
Cluster 6: Children's and Activity Versions
Taylor, Mark A. The Christmas Carol.
Disney's Mickey's Christmas Carol.
Dubowski, Cathy East. Scrooge.
Lillington, Kenneth. A Christmas Carol, Easy Piano Picture Book. Text by Kenneth Lillington after Charles Dickens, Illustrations by Annabel Spenceley, Carols arranged by Timothy Roberts.
Packard, Mary. Dickens, Charles: A Christmas Carol Story Book Set & Advent Calendar. Illustrations by Ray Bartkus; Story retold by Mary Packard.
Payne, Darwin Reid. A Christmas Carol.
Dramatized by
Luxfield Consultants (adaptors). A Christmas Carol.
Sammon, Paul. The Christmas Carol Trivia Book.
[1]Technically, a “work” is an abstract notion that refers to the content of an
item. It is defined as “a distinct intellectual or artistic creation” in
Functional Requirements for Bibliographic Records: Final Report, by the IFLA
Study Group on the Functional Requirements for Bibliographic Records (München:
K.G. Saur, 1998). Its physical embodiment is a “work set,” the group of
documents that represent the work. For stylistic reasons, both of these notions
will be referred to here as “the work.”
[2]Hayhoe's article contains a useful discussion of
methodological issues of sorting studies as well as a review of previous
studies using this methodology for menu creation research.
[3]For
example, Microsoft has used sorting tasks to guide the design of their Intranet
site (Amy Stevenson, personal communication, 1997).
[5]MARC provides fixed field codes and an audience note field in which detailed
information about audience may be recorded; in practice, however, they are
seldom used.