USER CATEGORISATION OF WORKS: TOWARD IMPROVED ORGANISATION OF ONLINE CATALOGUE DISPLAYS
ALLYSON CARLYLE
acarlyle@u.washington.edu
Graduate
Examines a user categorisation of documents related to a particular
literary work. Fifty study participants
completed an unconstrained sorting task of documents related to Charles
Dickens' A Christmas Carol. After they had finished the sorting task,
participants wrote descriptions of the attributes they used to create each
group. Content analysis of these
descriptions revealed categories of attributes used for grouping. Participants used physical format, audience,
content description, pictorial elements, usage, and language most frequently
for grouping. Many of the attributes
participants used for grouping already exist in bibliographic records and may
be used to cluster records related to works automatically in online catalogue
displays. The attributes used by people
in classifying or grouping documents related to a work may be used to guide the
design of summary online catalogue work displays.
INTRODUCTION
The content and organisation
of information presented in screen displays is critical to the successful
performance of online catalogues.
Current research indicates that many catalogue users do not look beyond
the first few screens of search results presented to them [e.g., 1], which
places a significant burden on those screens to present information clearly and
effectively. Unfortunately, information
displayed in online catalogues is frequently in the form of long lists of unorganised bibliographic records. Such lists do little to inform users of the
nature of the materials retrieved or to highlight the relationships present
among those materials.
One strategy for improving the
effectiveness of screen displays in online catalogues is to summarise
the contents of sets of retrieved records in one or two screens instead of
displaying long lists. Displays
employing classification, clustering, and other structured approaches have been
suggested as a means of summarising the contents of
large sets of retrieved records. Massicotte [2], the Association for Library Collections and
Technical Services [3], Buckland, Norgard, and Plaunt [4], Fattahi [5], and Yee
and Layne [6], among others, argue persuasively for the implementation of
catalogue displays in which the information presented is grouped or summarised. Larson
[7] and McGarry and Svenonius
[8], for example, demonstrate that bibliographic records may be clustered by
subject in online catalogue displays. Svenonius [9] proposes clustering of records representing
editions of the same work in displays based on relationships among items.
One of the questions that may be
asked prior to the construction of clustered displays for online catalogues is,
On what basis should such displays be constructed? Interface design for information retrieval
systems is frequently based on the intuition and judgement
of systems designers, or on the structure and content of the database. For instance, in much of the research cited
in the previous paragraph, clustering is based on the content of individual
records. An alternative to
system-designer or content-centered design is design based on user behaviour [e.g., 10].
Ideally, information retrieval
systems will reflect users' perceptions and expectations, so that the
information presented to them is understandable, and responds effectively to
their needs. Unfortunately, few
information retrieval systems have been designed based on empirical study of
user behaviour and perceptions. One reason for this is the lack of sufficient
research on user behaviour. A notable exception is BOOK HOUSE, a fiction
retrieval system based on in-depth study of user queries for fiction [11]. The development of BOOK HOUSE was preceded by research that included an analysis of three
hundred user-librarian conversations, identifying dimensions of works of
fiction that users mentioned in their searches for new fiction to read [12].
To construct clustered displays that
respond to user needs and perceptions, empirical research must be conducted
that identifies the categories of documents that people use naturally and
describe frequently. The research
reported here investigates how people organise
documents related to a particular work.
In the study, participants sorted documents related to Charles Dickens' A Christmas Carol into groups. For each group the participants created, they
wrote a description of the attribute or attributes used to form that
group. These descriptions are analysed using content analysis to discover the categories
of attributes users employed in sorting.
The frequency with which each of the categories was employed is also
calculated. Finally, the categories are
evaluated for their suitability for use in current online catalogue displays.
RATIONALE
Facilitation of the location of the
editions of a work is one of the functions, or objects, of the library
catalogue. This function, as articulated
by Lubetzky [13], requires the catalogue to 'relate
and display together the editions which a library has of a given work' [p.
ix]. Card and book catalogues frequently
employ highly organised displays to meet this
objective [14]. Categories of editions
and works related to a particular work were arranged in classified displays to
facilitate the location of particular editions, translations, adaptations, etc.
of that work. Classified displays serve
the second objective of the catalogue because they highlight relationships
among items by grouping like items together.
Organised
displays have seldom been incorporated into online catalogue designs. Research by Carlyle [15] shows that records
relevant to searches for individual works existing in many editions are
frequently scattered randomly among irrelevant records. Some evidence also exists indicating that
poorly designed catalogue displays contribute to search failure. For instance, a study of interlibrary loan
search failures at the University at
[an] intractable problem [resulting from an online
catalog] which lists only brief titles if there are many matches for a
search. Sometimes this can produce a
very long list of brief entries, and it can be time-consuming and frustrating
to identify a particular volume.
Apparently, many people just give up in frustration or do not understand
what is going on... [p. 234]
They
attributed six percent of interlibrary loan search failures to these long,
brief title displays.
The poor organisation
of works in information retrieval system displays has been cited as a factor
contributing to the duplicate record problem in bibliographic utilities
[17]. O'Neill, Rogers, and Oskins [18], while investigating the duplicate record
problem in OCLC, report that:
Failure to find an existing record can be due to
inability or unwillingness to thoroughly search the database. Derived keys may be inadequate to retrieve an
existing record. Improper searching
techniques or errors in a bibliographic record can also prevent retrieval. [p. 60]
One reason
for OCLC users' 'inability or unwillingness to search the database' may be the
long, strictly alphabetical lists of records retrieved from OCLC searches.
Organised,
summary displays are currently being introduced into experimental information
retrieval systems. A research team at
the
In summary, research suggests that
current online catalogue displays, which often consist of unorganised
lists of retrieved items, are inadequate, particularly for searches in which
many items are retrieved. Early research
efforts show that summary displays that classify or cluster search results may
enhance a user's ability to identify items of interest. Cataloguing theory and practice have long
supported the use of classification to enhance catalogue displays. One of the first steps in the development of
cluster-based or summary screen displays for online catalogues is to discover
the types of clusters or categories that people actually use for
themselves. This study investigates the
categories people use to organise items related to a
particular work.
RESEARCH DESIGN
User Sorting and Categorisation Studies in Information Retrieval Research
An unconstrained sorting task
methodology, also called free-sort, F-sort, or bottom-up sort, was used to
discover how users group documents related to a particular work for
themselves. McDonald and Schvaneveldt [21] argue that research on user knowledge
employing methodologies such as the unconstrained sorting task must guide
interface design in such areas as menu construction, citing existing evidence
that such research improves system design by making menus easier to use and
understand. The sorting task methodology
has been employed in various ways to guide the design of several types of
computer systems. Lohse
et al. [22, 23] conducted studies employing the unconstrained sorting
methodology to determine how people sort images that represent information or
knowledge, which they call 'visual representations'. Hayhoe [24] used
the results of a sorting task to obtain information on categories to aid in
software menu construction. His results
showed that menu construction based on categories constructed from sorting task
data provided better performance than other menu constructions. The sorting methodology has also been used by
Microsoft to guide the design of their WWW intranet site [Amy Stevenson,
personal communication, 1997].
The unconstrained sorting task
methodology has been used in LIS research to investigate user behaviour. Vidal
[25] presented 58 study participants with 48 varying images of the
Jörgensen
[26] used both a sorting task and a
variety of other tasks in an attempt to discover how people categorise
and describe images. As a part of this
study, eighteen participants sorted 77 images into groups. A talk-aloud methodology was used to elicit
descriptions of attributes used for grouping.
Participant comments were transcribed, and content analysis was performed
on the transcribed group descriptions to discover the categories of attributes
used for grouping and the frequency with which these categories were used. Participants employed art historical
attributes (e.g., artist, format, medium, style, and technique), abstract
attributes (e.g., atmosphere, theme), content/story attributes (e.g., activity,
event, setting) and object attributes (e.g., object, text, body part, clothing)
most frequently in the sorting task. Less
frequently used attributes included people attributes, viewer response (e.g.,
personal reaction, conjecture), colour, visual
elements (e.g., composition, orientation, perspective), and description.
LIS researchers have used various
other qualitative techniques to study the organisational
habits of individuals to inform information retrieval system interface
design. Several studies have
investigated how scholars organise materials in their
offices. Kwasnik
[27] used content analysis on verbal data collected from eight faculty members
regarding the organisation of documents in their
offices to determine dimensions used in classifying and storing those
documents. She found that form, use,
location, circumstance, and time were dimensions of documents that were most
frequently used in making decisions regarding where a document would be
stored. Case [28], studying the organisation of the offices of historians, found that
spatial constraints, such as the need to keep some documents close at hand to
serve as a reminder to complete a task, and form of document, such as book
versus periodical article, were used to determine the storage location of a
document in a given historian's office.
Research Design
A test of the user categorisation methodology was carried out in a pilot study
prior to the initiation of the main study.
In June 1996, twenty people at
After completion of the pilot study,
user categorisation data were collected for the main
study from 50 participants solicited in a shopping mall in
Each participant examined and sorted
into groups 47 items related to Charles Dickens' A Christmas Carol. Items
included unabridged hardcover and paperback editions, children's adaptations,
videos, sound recordings, and works about A
Christmas Carol. A complete list of
items appears in Appendix 1.
Participants were given the following verbal instructions:
In the box are items that are
related to A Christmas Carol, by
Charles Dickens. Please look at each
item carefully and put items into groups based on how alike they are to each
other. That is, things that are similar
should go into the same group. Each
group may be as large or as small as you want it to be. The purpose of the groups is to help you find
the items later; so the characteristics you use to create the groups should
help you remember how to find the items at a later time.
Participants
were not informed about the specific object of the study so as not to influence
the sorting process. It was thought that
if the words 'library' or 'catalogue' were mentioned, participants might have
been inclined to create groups based on their impressions of current
library practice or on how they thought librarians
might create them. Since the ultimate
purpose of the study was to reflect user needs and preferences, the
instructions were specifically worded to make the grouping based on individual
preference, while at the same time remaining within the framework of
information retrieval ('to help you find the items later...').
When finished sorting, participants
wrote down a name for each group and a brief description of the attributes they
used to create that group. The name and
the description could be the same if the name was a description of the attributes of the group. After participants finished, one of the
researchers read the written names and descriptions of the groups while the
participant filled out a brief form that supplied information about themselves
(see figure 1 for summary data regarding participants). Occasionally, participants had to be
questioned further to clarify ambiguous names or descriptions. For example, one participant named and
described a group as 'Assorted Christmas books.' In clarifying the description, the
participant said the group contained documents that 'have other stories' in
them, so this phrase was added to the description.
[Figure 1 about here]
DATA ANALYSIS
Data collected from the names and
brief descriptions that participants gave to each of their groups were analysed using content analysis to discover the categories
of attributes used for grouping. This
analysis did not distinguish between a group name and a group description, but
considered the two together as a single description. After this point, group names and
descriptions will be referred to collectively as descriptions. All of the potential categories of attributes
that may have been used to group items were identified; thus, a single group
description often revealed more than one category. For example, one description read: 'These books are very animated and have lots
of pictures. The kind parents read to
children who can't read yet.' The categories
of attributes present in this description include: physical format (books), audience
(parents, children), pictorial
elements (very animated and have lots of pictures), and usage (the kind parents read to children
who can't read yet).
Descriptions were analysed independently by two researchers who identified
categories, or types of attributes, used for grouping. After the researchers completed their
independent analyses, they compared results to determine a working list of
attribute categories. Categories were
refined further when attributes were coded and attribute frequencies were
tallied. Independent analysis by
different researchers was considered to be particularly important in this study
because it was not possible to check with the participants of the study later
to see if the categories identified matched their perceptions, which is one of
the primary ways in which trustworthiness of qualitative analyses is ensured
[e.g., 29, 30].
Categories of attributes used for
grouping are summarised in figure 2. Examples of participant descriptions follow
the name of each category of attribute.
Categories are listed in order of the frequency with which they were
used.
[Figure 2 about here.]
Once all of the attributes used in
the descriptions were coded, the frequency with which each category was used
was calculated. Frequency of use of a
category may indicate the extent to which display of that category would be
useful in an online catalogue. However,
this is speculation; further research is necessary to determine the usefulness
of individual category types in catalogue displays.
Frequency was calculated in two
ways: first, the number of times each
category was used [total use], and second, the number of participants who used
a category at least once [participant use].
For the total use count, each category was counted only once per group
formed, regardless of the number of times it was used in the group
description. For example, one group had
the following description: 'foreign
language ... books are written in a different language.' This was counted in the language category
only once.[1] Again, independent researcher coding was
used to increase trustworthiness of the analysis. Intercoder
reliability between the two researchers was 83 percent; that is, 83 percent of
the descriptions coded were coded identically by the two researchers. Figure 3 summarises
statistics for total use of categories in all descriptions and use by
individual participant.
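
The two frequency counts and the intercoder agreement figure described above can be illustrated with a minimal sketch. The data layout, the participant identifiers, and the function names below are hypothetical and are not taken from the study; the sketch only assumes that each group description has been coded as a list of category labels.

```python
from collections import Counter

# Hypothetical coded data (not from the study): one entry per group formed.
# "categories" lists every category a coder found in that group's description;
# a label may repeat when the description mentions the category more than once.
coded_groups = [
    {"participant": "P01", "categories": ["language", "language", "physical format"]},
    {"participant": "P01", "categories": ["physical format"]},
    {"participant": "P02", "categories": ["audience", "pictorial elements"]},
    {"participant": "P02", "categories": ["language"]},
]


def total_use(groups):
    """Total use: each category is counted at most once per group formed."""
    counts = Counter()
    for group in groups:
        counts.update(set(group["categories"]))  # set() drops repeats within a group
    return counts


def participant_use(groups):
    """Participant use: number of participants who used a category at least once."""
    used_by = {}
    for group in groups:
        for category in group["categories"]:
            used_by.setdefault(category, set()).add(group["participant"])
    return {category: len(people) for category, people in used_by.items()}


def percent_agreement(coder_a, coder_b):
    """Percentage of descriptions that two coders coded identically."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of descriptions")
    matches = sum(1 for a, b in zip(coder_a, coder_b) if set(a) == set(b))
    return 100.0 * matches / len(coder_a)
```

In this sketch the first group mentions language twice but contributes only one to the total use count for language, mirroring the 'foreign language' example given above. Treating two codings as identical only when their category sets match exactly is one plausible reading of 'coded identically'; other agreement measures would give different figures.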
[Figure 3 about here.]
Physical format
Physical format refers to the type
of physical object the item represents.
For example, books, videorecordings, CDs, and
calendars are physical formats.
Occasionally physical format descriptions were not format per se, but
instead the type of information typically contained in a particular format. For example, some participants described a
group of CDs and cassettes as 'music' or 'audio'. In addition, some participants described the
way in which the format was used or the purpose of the format, e.g., 'viewing
... you can look or listen instead of reading.'
These descriptions were all coded as physical format because all of the
descriptions of this nature described groups composed of items in a particular
physical format.
Physical format was the category
used most frequently by participants in the study. It was mentioned more than twice as often as
the next most frequently used category.
Forty-eight out of fifty participants used this category at least once
for grouping. Almost all of the
participants who used physical format more than once for grouping used it
often; half of the participants used it five or more times. Some participants relied almost exclusively
on physical format characteristics for their grouping, creating paperback, hard
cover, audio, and video groups.
Audience
Audience refers to the type of
person by whom the item was intended to be used. Characteristics mentioned varied from age of
audience ('elderly') to ability ('sight impaired') to occupation
('teachers'). Age characteristics were
the most frequently mentioned type of audience characteristic. It is probable that one of the reasons that
audience, and particularly age characteristics, were mentioned so frequently
was the large number of items used in the study that were intended to be used
by children. Had the study used another
literary work, such as Margaret Mitchell's Gone
with the Wind or a Shakespeare play, or a nonfiction work such as
Content description
Content description includes all
attributes that describe item content, for example, attributes describing the
form of an item ('manuscript of A Christmas Carol'), the subject of an item
('about Scrooge', '[not] actually the story but facts, trivia etc. about
it'), characters ('cartoon characters
act out', 'scary or angry characters'), type of content included in an item
('version allows the feeling the emotion the "intent" of each
character to be not only heard but understood', 'told from another character's
point of view', 'events and when happened and questions and thinking about the
movie'), and the presence of the story in a collection with other stories
('other stories included', '...feature A Christmas Carol but also include other
stories'). No single type of content
description predominated.
Content description was one of the
most difficult categories to identify and define. At first, it seemed desirable to create
separate categories for content-related characteristics such as subject of an
item, form of an item, etc. However, as
has often been noted in the literature on subject analysis, it is not always
possible to distinguish clearly between subject and form. Because of this, all of the content-related
characteristics were folded into a single category.
Pictorial elements
Pictorial elements refer to
illustrations or pictures appearing in or on the items and to other aesthetic
qualities of items. These descriptions
included general pictorial elements ('pictures', 'cartoons', '[illustrations]
resemble ink drawings or sketches') and descriptions of the content of
pictorial elements ('very simple cover designs,' 'covers containing images of
Tiny Tim'). Other aesthetic qualities
included 'bright covers', 'beautiful', and 'these covers gave the impression to
me that they were the original covers they looked old-fashioned'.
Although pictorial elements were
mentioned relatively little overall, just over half of the study participants (52
percent) mentioned them at least once in their descriptions. As with audience, the frequent identification
of pictorial elements in this task may have resulted largely from the many
children's items, particularly picture books, contained in the collection used.
Usage
Usage attributes include
descriptions that discuss when, where (setting), or why the items might be
used, or how the items make one feel (emotion evoked). For example, one participant was a school
teacher who formed almost all of her groups based on how they could be used in
the classroom. One of her descriptions
read: 'these could be used to show
world-wide interest in the story, and thus to get students inspired to read it,
or to open up the story to students from the cultures represented.' All of the foreign language books were in
this group. Examples from other
participants include: 'books that would be good to read to children before bed
one night or a few nights', 'deep thinking', 'I picked these 2 because I would
enjoy reading them', 'research
applications', and 'opens door for games and discussions'. Most of the attributes in this category were
indicative of setting; a very small number described the emotion evoked by the item
or in the participant.
Language
Language refers to the language in
which the items were written. All
identifications of this attribute referred to items that were not written in
English ('foreign', 'different language', 'not in English', 'foreign hand
writes of a Christmas Carol') or mentioned specific languages
('Japanese'). One participant in the
pilot study who was Japanese grouped the two Japanese versions in one pile and
the other non-English versions (French, Spanish) in another. This suggests that an individual's language
groupings may be unique to that individual's native language and to the
predominant language spoken where he or she lives.
Although language accounted for a
relatively small proportion of total use, a large majority of the participants used language
at least once (82 percent). In fact, all
of the participants who mentioned language except one used language only once
to create a 'foreign language' group.
Even those participants who created groups based primarily on physical
format almost always created a foreign language group as well.
Physical characteristics
Physical characteristics include
physical dimensions of items and physical age characteristics of items. Examples of descriptions including physical
characteristics are: 'books that are
smaller in size (around 8 x 4 inches or so)', 'they are shorter and a little
thicker', and 'older versions not as glossy modern'. Physical characteristics could also include
item provenance. In the pilot study, one
participant grouped one of the items, which was a discard from a school
library, on its own based on its having been owned previously by a library.
Content age, integrity
Content age, integrity refers to the
perceived age of the content of an item or to the integrity or similarity of
the content of an item to an original or unabridged edition of the work. Descriptions in this category ranged from
the extent to which the original text had been changed to how difficult or easy
the content would be to read or understand.
Also included here were descriptions regarding the manner in which the
original content had been changed.
Examples of age and authenticity attributes include: 'modern version', 'older style stories', and
'fairly accurate to the text'. Examples
of text changes include: 'abridged versions' and 'adaptations that do not
follow the original story line'.
Examples of difficulty or complexity level include: 'basic level' and 'lower reading skill level'.
The length of an item, for example,
'they are all pretty long', was also included in this category. Although length could be interpreted as a
physical characteristic, or even as content description, it was finally
included here because the few times it was mentioned it was associated with
content integrity. It is interesting to
note that, in a study of cataloguing records looking for elements that reliably
distinguished one edition from another, paging was the most reliable indicator
of change of edition [31]. This
distinguishing capability of paging lends support to including length in the
content age, integrity category. Length
was used in this study only three times, each time by different participants.
Age and integrity attributes are
closely related to content description attributes described in an earlier
category, because they also have to do with the character of the content of the
item. For example, a dramatisation
or theater version has something to do with how the original text has been
changed, although this aspect may not have been mentioned explicitly in participant
descriptions. This category could easily
be merged with the content description category to become a single
content-based category.
One reason the age, integrity
category was presented separately is that it roughly corresponds to the
practice of identifying main authorship in Anglo-American cataloguing. Editions of works whose content has been
significantly changed from original editions, for example, condensed versions
or children's versions, are normally identified with different main authorship
from those editions whose content closely matches the original edition. Knowledge of how many people regard these
changes as significant enough to describe them in a sorting task provides a
preliminary indication of whether or not the practice of identifying primary
authorship is one that may be useful to catalogue users.
Although only four percent of total
categories mentioned referred to content age, integrity, 32 percent of the
study participants mentioned this category at least once. Most of the references were to text changes
or difficulty/integrity level; fewer referred to the age of the content. These results indicate that catalogue users
may indeed benefit from the identification of changes in primary authorship in
the catalogue.
Textual characteristics
Textual characteristics include
attributes that have to do with specific words appearing on items, for example,
'all the soft back books that are not named A Christmas Carol', and
descriptions of the physical shape or size of printed text, for example,
'larger print.' One participant grouped
all items that had titles beginning with the articles 'the', 'a-an', and no
initial articles into separate groups.
Creator, performer
Creator, performer characteristics
refer to descriptions that mention the name of an author, actor, producer,
publisher, etc. Authors and performers
were mentioned most frequently, for example, 'Charles Dickens is the author to
these books', 'Patrick Stewart' (who reads A
Christmas Carol on two audio versions), 'George Scott' (who plays Scrooge
in one of the video versions), and 'played by Disney characters'. It is notable that, for the
most part, the descriptions containing this type of information did not mention Dickens,
and the groups they described did not contain textual editions of A Christmas Carol.
Instead, these descriptions mentioned creators and performers other than Dickens, and
their groups contained adaptations of the Dickens work for children or videorecording
or sound recording adaptations of it.
Again, this may be an indication of the importance of the identification
of primary authorship responsibilities in cataloguing practice.
'Odds and ends'
The 'odds and ends' category
contains descriptions that singled out items because they did not fit in other
categories. Descriptions included: 'these items did not fit into any of the
other categories', 'odd things like calendars and CDs, etc.', and 'misc.' When this category was used, the actual items
grouped often consisted of the Advent calendar, or the Advent calendar and the
Christmas Carol trivia book.
Ambiguous
Only two groups were created for
which no category could be discerned.
These two groups were created by two different participants. The number of ambiguous descriptions was low
because all descriptions were read by researchers while participants waited, so
that most of the ambiguous responses were clarified at the time of data
collection.
Overall Group and Category Usage Statistics
Figure 4 contains statistics for
overall group formation and category usage per participant. Categories were
often used more than once by individual participants. For instance, some participants formed five
or six groups, but used only three or four categories for grouping.
[Figure 4 about here]
POTENTIAL USE OF PARTICIPANT CATEGORIES IN ONLINE CATALOGUE DISPLAYS
The impetus for this research was to
discover types of categories that people use to organise
items related to a particular work in order to guide the construction of
summary work displays in online catalogues.
In this section, categories used by participants in the study are analysed, first, with respect to their current use in
library organisational schemes such as shelf
arrangements, cataloguing rules, subject headings lists, and classification
schemes and, second, with respect to their potential use in online catalogue
displays. Particular attention is given
to the possibility of creating clusters of bibliographic records automatically,
although detailed recommendations for implementation are not included. Figure 5 summarises
the areas in existing bibliographic records that may contain the types of
information identified in the study.
[Figure 5 about here.]
Physical format displays
Physical format issues are complex,
and many theoretical aspects of these issues have yet to be resolved. As a result, the discussion here must be recognised as covering a wide range of issues, all of which
could be discussed in much greater detail.
Physical format, the category
identified most frequently by participants in this study, has been used in a
variety of ways in traditional library organisation
schemes. Shelf arrangements in libraries
are often based on physical format, where videos, audiocassettes, CDs, and books,
for example, are shelved in separate areas.
The information regarding where an item is shelved is frequently
included in holdings information in a bibliographic record. If shelf location information is included in
a record in such a way as to make it easy to extract automatically, online
catalogues could use it to group records by physical format automatically. Some libraries shelve items of the same
physical format in different areas, for example, adult videos in one section
and children's videos in another. In
this case the online catalogue could become a mechanism by which items that are
of the same physical format but are shelved in different locations could be
brought together.
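The grouping described above can be sketched as follows. This is a minimal illustration, assuming that holdings data has already been reduced to one shelf-location label per record. The location labels, the mapping table, the function name, and the locations assigned to the sample titles are all hypothetical and are not drawn from any cataloguing standard or existing system.

```python
from collections import defaultdict

# Hypothetical mapping from local shelf-location labels to a physical format.
# Real libraries use their own location codes; these are illustrative only.
LOCATION_TO_FORMAT = {
    "Adult Videos": "Videorecordings",
    "Children's Videos": "Videorecordings",
    "Audiobooks": "Sound recordings",
    "Adult Fiction": "Books",
    "Children's Fiction": "Books",
}


def group_by_shelf_format(records):
    """Group (title, shelf_location) pairs by the format implied by location."""
    groups = defaultdict(list)
    for title, location in records:
        # Locations missing from the table fall into a catch-all group.
        fmt = LOCATION_TO_FORMAT.get(location, "Unknown format")
        groups[fmt].append(title)
    return dict(groups)


# Titles come from Appendix 1; the shelf locations are invented for the example.
sample = [
    ("The Muppet Christmas Carol", "Children's Videos"),
    ("Scrooge", "Adult Videos"),
    ("A Christmas Carol (read by Geoffrey Palmer)", "Audiobooks"),
    ("Jacob Marley's Christmas Carol", "Adult Fiction"),
]
grouped = group_by_shelf_format(sample)
for fmt, titles in grouped.items():
    print(fmt, "->", titles)
```

In this sketch the two videos shelved in different areas end up in a single 'Videorecordings' group, which is the collocating effect the paragraph describes.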
Other parts of the bibliographic
record also contain information regarding physical format. Physical format information is required by
both the Anglo-American Cataloguing
Rules, 2nd. ed. revised (AACR2) and the MARC format. Unfortunately, the identification of physical
format information in bibliographic records is not consistent. As a consequence, it could prove difficult to
use record content to assemble records automatically, particularly that
information regulated by AACR2.
In AACR2, physical format is identified first in the general material
designation (GMD). Unfortunately, the
GMD may not be very efficient for automatic clustering of records associated
with particular physical formats for two reasons. First, it is an optional element of a
bibliographic record. Second, although
many libraries use the GMD, they do not use the same terms to describe the same
formats. For example, cataloguing
agencies in the
MARC also contains provisions for
identification of physical format. In
the 006 field, form of material is specified for fourteen material types, for
example, printed language material, projected medium (e.g., motion pictures, videorecordings, filmstrips), three-dimensional materials,
kits, and cartographic materials. The
007 field contains other material type information and is tied to the AACR2
mandated information in the physical description and notes areas. Because it is coded, physical format
information in the 006 and 007 fields may be better suited for automatic
manipulation than AACR2 regulated areas of a bibliographic record.
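As a rough illustration of why coded values lend themselves to automatic manipulation, the sketch below clusters records by the first character of the 007 field, which in MARC identifies the category of material. It assumes each record has been reduced to an identifier and a raw 007 string. The record identifiers, the truncated 007 values, and the function name are hypothetical, and only three of the defined category codes are included.

```python
from collections import defaultdict

# A few category-of-material codes from the first character of the 007 field.
# The list is deliberately partial; the MARC documentation defines many more.
CATEGORY_OF_MATERIAL = {
    "s": "Sound recording",
    "t": "Text",
    "v": "Videorecording",
}


def cluster_by_007(records):
    """Cluster record identifiers by the coded value in the first 007 position.

    `records` maps a record identifier to its raw 007 string, or to None
    when the record has no 007 field.
    """
    clusters = defaultdict(list)
    for record_id, field_007 in records.items():
        code = field_007[0] if field_007 else ""
        label = CATEGORY_OF_MATERIAL.get(code, "Other or uncoded")
        clusters[label].append(record_id)
    return dict(clusters)


# Hypothetical records; 007 values are truncated to two characters.
sample = {
    "rec1": "vf",   # videorecording
    "rec2": "sd",   # sound recording
    "rec3": "tb",   # text
    "rec4": None,   # no 007 field present
}
clusters = cluster_by_007(sample)
print(clusters)
```

A single dictionary lookup suffices here because the value is a fixed code, in contrast to the free-text GMD terms that vary between cataloguing agencies.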
Other potential sources of physical
format information exist in the subject content portions of bibliographic
records. First is 'form' subdivision
information from the Library of Congress
Subject Headings (LCSH), which is used occasionally to indicate physical
format. Physical characteristics information, now coded in the 655 field in
bibliographic records, may also contain information regarding physical
format. For example, 'miniature books'
and 'shaped books' are terms that appear in Printing
and Publishing Evidence [32], a thesaurus for rare book and special
collections cataloguing published by the Association of College and Research
Libraries, Rare Books and Manuscripts Section, Standards Committee. Some classification schemes, for instance,
the Universal Decimal Classification
(UDC), provide physical format facets in their notational scheme. In the UDC,
a special auxiliary subdivision ((0.0)) exists for physical features of
items.
Audience displays
Audience characteristics have also
been identified in various ways by libraries and information centers. Age, the audience attribute mentioned most
often by participants in the study, is frequently used in shelf arrangements,
where children's materials are shelved in one area and adult materials in
another. Materials for other special
audiences are also sometimes shelved together, for example, large print or
Braille materials, English as a second language (ESL) materials, etc. Again, if holdings information in
bibliographic records includes indications of shelving areas for particular
audiences, this information could be used to construct audience categories in
displays automatically.
The MARC 008 field also contains
information regarding target audience of the item. Audiences identified include various school-aged
audiences (preschool, primary, elementary, secondary), adults, children
(juvenile), and specialized audiences.
MARC also provides an indication as to the nature of the terms assigned
in the 650 and 655 fields, in that particular thesauri are identified. When a thesaurus is directed at a particular
audience such as children, this information could be used to group materials
for specific audiences automatically.
Audience information is also included in LCSH. Children's materials are designated with special headings and
with form subdivisions, for example, 'Juvenile films', 'Juvenile sound
recordings', and 'Juvenile software', which could be used to group children's
materials automatically and subarrange them by
physical format.
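The use of LCSH form subdivisions to group children's materials and subarrange them by physical format might be sketched as follows. The sketch assumes each record is reduced to a title and a list of subject heading strings, with ' -- ' separating a heading from its subdivisions. The headings attached to the sample titles, the display labels, and the function name are hypothetical; only the three subdivisions themselves come from the text.

```python
from collections import defaultdict

# Form subdivisions named in the text, mapped to a display label for format.
JUVENILE_SUBDIVISIONS = {
    "Juvenile films": "Films",
    "Juvenile sound recordings": "Sound recordings",
    "Juvenile software": "Software",
}


def group_juvenile(records):
    """Return {format_label: [titles]} for records with a juvenile subdivision.

    Each record is (title, [subject_heading, ...]).
    """
    groups = defaultdict(list)
    for title, headings in records:
        for heading in headings:
            parts = [p.strip() for p in heading.split("--")]
            # Only subdivisions (everything after the main heading) are checked.
            match = next((p for p in parts[1:] if p in JUVENILE_SUBDIVISIONS), None)
            if match:
                groups[JUVENILE_SUBDIVISIONS[match]].append(title)
                break  # one juvenile subdivision is enough to place the record
    return dict(groups)


# Titles come from Appendix 1; the subject headings are invented for the example.
sample = [
    ("Mickey's Christmas Carol", ["Christmas stories -- Juvenile films"]),
    ("The Muppet Christmas Carol", ["Christmas stories -- Juvenile films"]),
    ("Scrooge", ["Christmas stories"]),
]
juvenile = group_juvenile(sample)
print(juvenile)
```

Records without one of the listed subdivisions are simply left out of the children's cluster, so they remain available for other groupings.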
Content description displays
Content description covers a wide
variety of attributes only some of which have been identified in bibliographic
records. Form attributes are sometimes
indicated by LCSH form subdivisions
and sometimes by terms in other thesauri.
However, form subdivisions have only very recently been differentiated
from other types of subdivisions in MARC fields and, as a consequence, may, at
least for the near future, be of little use in organising
records in catalogue displays. Subject
attributes may not contribute much in the construction of organised
work displays because of the similarity of items related to a particular work;
these items would often be on the same
subject. In addition, the few subject
attributes identified by participants in this study were often at a level of
detail not covered in traditional subject cataloguing.
Other types of content descriptions,
such as point of view or character of content, for example, 'research
oriented', have rarely been included in bibliographic records. Occasionally, this information appears in
genre or physical characteristics headings in 655 fields or in tables of
contents (505 field), indexes, or document abstracts or summaries (now included
for children's materials in the 520 field).
For many years the mandatory addition of table of contents information
into bibliographic records has been discussed, but it has not yet become
standard practice for all items.
Finally, whether or not an item is composed of a collection of works is
frequently not recorded at all in bibliographic records. Current practice would have to change
significantly to incorporate the range of content description information
identified by participants in this study.
Pictorial elements displays
Pictorial element information is available
in several places in bibliographic records.
Information regarding the existence of at least a few illustrations in
an item is available in two places. The first is the MARC 006 field, where
information for books is coded regarding the presence and type of
illustrations, for example, maps, portraits, charts, and illustrations.
The second is the physical description area, where this information is
almost always repeated. Information regarding the presence
of illustrations may also be indicated in the statement of responsibility area,
in which illustrators are identified, or in the LCSH form subdivision 'Pictorial works'. The MARC 006 field and the LCSH 'Pictorial works' subdivision could
both be employed in creating clusters of items with pictorial elements. Terms from other thesauri may also contain
information regarding pictorial elements of items. Unfortunately, little to no information is
included in records describing particular aspects of pictorial elements, for
example, the types of illustrations included, such as cartoons or
oil-painting-like drawings, which were sometimes described by study
participants.
Usage displays
Information regarding when, where,
how (including emotion evoked) or why items might be used is seldom available
in formal library organisation schemes or
bibliographic records. One explanation
for this lack may be the difficulty of identifying this type of information
objectively. However, this type of
information is sometimes provided by librarians in booklists and occasionally
occurs in reference books. For example,
children's librarians frequently make lists for specific uses, for example,
'bedtime story' books or 'scary' books.
Also, children's reference books exist that contain reading lists for
various settings such as reading aloud to children. The fact that so many study participants
identified this category at least once in their descriptions suggests that the
incorporation of this type of information into bibliographic descriptions be
considered. Further research is necessary
to discover whether this is a category of information missing in our library organisational schemes that would be useful to incorporate
or if resources such as booklists and reference books are sufficient to
accommodate user needs in this area. It
may be that such information is most desirable for children's materials, which,
perhaps more than other types of materials, have many different types of uses.
Language displays
Language attributes of items, like
physical format and audience attributes, are frequently identified in physical
document arrangements, particularly in public libraries. Public libraries often shelve items in the
English language and materials in other languages in separate locations. Again, if shelving locations are noted in the
holdings portion of bibliographic records, they could be used to group items by
specific language automatically in displays.
Language is also noted in other locations in bibliographic records, in
the MARC 008 and 041 fields, and in the language subfield of uniform title
fields. In the 008 field a code
indicates the predominant language of the item.
The 041 field is used both for items that are in more than one language
and for items that represent a translation from one language to another. The language subfield of uniform title fields
is used to indicate translations.
Indications of translations in 041 fields and uniform title language
subfields could be used to group translations of a work automatically, either all
together or by individual language, in catalogue displays.
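Both grouping options mentioned above, all translations together or translations by individual language, can be illustrated with a short sketch. It assumes each record has been reduced to a dictionary holding the language code from the 008 field and any original-language codes recorded in the 041 field. The dictionary keys, the function name, and the treatment of 'has an original language in 041' as the test for a translation are assumptions made for this example.

```python
from collections import defaultdict


def group_translations(records, by_language=True):
    """Group translated items using simplified 008 and 041 data.

    Each record is a dict with hypothetical keys:
      'title'     - display title
      'lang_008'  - language code from the 008 field
      'orig_041'  - original-language codes from the 041 field ([] if none)
    A record is treated as a translation when 'orig_041' is non-empty.
    """
    groups = defaultdict(list)
    for rec in records:
        if not rec["orig_041"]:
            continue  # no original language recorded; not treated as a translation
        key = rec["lang_008"] if by_language else "Translations"
        groups[key].append(rec["title"])
    return dict(groups)


# Titles come from Appendix 1; the coded values are invented for the example.
records = [
    {"title": "Canción de Navidad", "lang_008": "spa", "orig_041": ["eng"]},
    {"title": "Un Chant de Noël", "lang_008": "fre", "orig_041": ["eng"]},
    {"title": "Kurisumasu kyaroru", "lang_008": "jpn", "orig_041": ["eng"]},
    {"title": "A Christmas Carol", "lang_008": "eng", "orig_041": []},
]
by_lang = group_translations(records)
all_together = group_translations(records, by_language=False)
print(by_lang)
print(all_together)
```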
Physical characteristics displays
A few physical characteristics are
noted in library organisation schemes. The height of books is sometimes used as a
basis for shelving. Books of a certain
height or higher are sometimes shelved in a special area, the 'oversize'
area. Again, if this information is
included in holdings information in bibliographic records, it could be used to
cluster records in displays. A few
existing online catalogues use height and page number information to create
graphical displays of books on shelves.
Information regarding height is also available in the physical
description portion of a bibliographic record and could possibly be used for
automatic clustering. Notes are
sometimes used in bibliographic descriptions to describe special physical
characteristics of items.
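A minimal sketch of how height in the physical description might be used for automatic clustering is given below. It assumes the physical description is available as a single string in which height is given in centimetres. The 30 cm threshold, the function names, and the sample descriptions are hypothetical, since each library sets its own oversize limit.

```python
import re

OVERSIZE_CM = 30  # hypothetical threshold; each library sets its own


def height_cm(physical_description):
    """Return the height in cm from a physical description, or None."""
    match = re.search(r"(\d+)\s*cm", physical_description)
    return int(match.group(1)) if match else None


def is_oversize(physical_description, threshold=OVERSIZE_CM):
    """True when a height is recorded and meets or exceeds the threshold."""
    height = height_cm(physical_description)
    return height is not None and height >= threshold


# Invented physical descriptions, for illustration only.
for description in [
    "48 p. : col. ill. ; 32 cm.",
    "xii, 243 p. ; 21 cm.",
    "1 videocassette (85 min.) : sd., col. ; 1/2 in.",
]:
    print(description, "->", height_cm(description), is_oversize(description))
```

The sketch handles only the simple single-dimension form. Descriptions that give more than one dimension, or that use other units, would need additional rules.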
Other information regarding physical
characteristics, for example, whether books are smooth or thin, is not
available. Since only four percent of
attributes used were physical characteristics, it may not be worth the cost of
incorporating this information into item descriptions.
Content age, integrity displays
Attributes having to do with the
difficulty or integrity of item content or with the age of the content of an
item are sometimes available indirectly or partially in bibliographic
records. One of the functions of
bibliographic records in general is to distinguish differences between
editions; thus, the description itself may be viewed as an indicator of content
age, integrity. Publication and copyright
dates, available both in the MARC 008 field and in the publication information
area, are sometimes correct indicators of content age. Content integrity information is available
indirectly in a number of ways. If a
text has been adapted, this fact appears in a change of main entry heading as
well as an addition either to the statement of responsibility or to the notes
area. If a text has been abridged or
revised, the statement of responsibility, edition, or notes areas may reflect
it. Difficulty level, for example, with
respect to reading level for beginning readers, may be available indirectly
through certain types of audience information, discussed above.
Length of an item is often recorded
in the physical description area of a record.
However, grouping items automatically based on varying lengths could
prove difficult. For example, books
frequently have several separately numbered groups of pages, such as a group of
preliminary pages numbered in roman numerals and the main part of the text in
Arabic numerals.
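The difficulty can be made concrete with a small sketch that totals the page sequences in an extent statement. It assumes a simple pattern in which comma-separated sequences of roman or arabic numerals are followed by 'p.'. The function names and sample statements are hypothetical, and anything outside that pattern is returned as None rather than guessed at.

```python
import re

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral):
    """Convert a lower- or upper-case roman numeral to an integer."""
    values = [ROMAN[ch] for ch in numeral.lower()]
    total = 0
    for i, value in enumerate(values):
        # Subtract when a smaller value precedes a larger one (e.g. 'ix').
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def total_pages(extent):
    """Sum every page sequence in a statement such as 'xii, 243 p.'."""
    match = re.match(r"\s*(.*?)\s*p\.", extent)
    if not match:
        return None
    total = 0
    for seq in match.group(1).split(","):
        seq = seq.strip().strip("[]")
        if seq.isdigit():
            total += int(seq)
        elif seq and all(ch in ROMAN for ch in seq.lower()):
            total += roman_to_int(seq)
        else:
            return None  # unrecognised sequence; do not guess
    return total


print(total_pages("xii, 243 p. ; 21 cm."))
print(total_pages("48 p. : col. ill. ; 32 cm."))
print(total_pages("1 videocassette (85 min.)"))
```

Even this small example needs special handling for roman numerals and bracketed counts, which suggests why length-based grouping would require considerably more work in practice.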
Not all content age or integrity
attributes are included in records, nor would those that are included be easy
to manipulate for clustering in displays.
Further research would be necessary to discover how easily such displays
might be created automatically, as well as the extent to which they would be
helpful to catalogue users.
Textual characteristics displays
Some textual characteristics are
available for use in displays, and others are not. Large print, one of the notable examples from
the study, is an attribute that may be identified in the GMD in a bibliographic
record, in the edition area, if the item catalogued has a 'large print edition'
statement on it, or in the LCSH heading,
'Large type books'. Some libraries
shelve large print materials in a separate section, making it possible, again,
to use holdings information for automatic grouping of this particular type of
material. Textual characteristics such
as the existence of particular words in a title are easily retrievable via
keyword search functions. Textual
characteristics having to do with print sizes other than large print, for
example, tiny print, or particular text markings, are not noted in library organisational schemes.
Again, because the addition of information to bibliographic descriptions
is costly, research would be necessary to determine whether or not it would be
worthwhile to add textual characteristics of this type.
Creator, performer displays
Creator, performer attributes are
readily available in bibliographic records.
Most of the creators/performers identified by participants in the study
would be controlled access points in currently constructed records. Summary displays based on subsidiary authors
could be created easily by using added entry fields. Difficulty in creating such displays
would arise only if the desired creator or performer were not included in the
bibliographic description, or if the name existed only in uncontrolled areas
such as notes or statements of responsibility.
Further research is desirable to determine the extent to which catalogue
users would request the names of creators/performers who are not currently
assigned access points.
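A summary display built from added entries could be sketched along the following lines. The sketch assumes each record has been reduced to a title and a list of controlled name headings taken from its added entry fields. The function name and the simplified name forms are hypothetical, and names that appear only in notes or statements of responsibility would not be found by this approach.

```python
from collections import defaultdict


def group_by_added_entries(records):
    """Group titles under each controlled name found in added entries.

    Each record is (title, [name_heading, ...]); a record with several
    names appears under each of them.
    """
    groups = defaultdict(list)
    for title, names in records:
        for name in names:
            groups[name].append(title)
    return dict(groups)


# Contributors come from Appendix 1; the heading forms are simplified.
sample = [
    ("A Christmas Carol (abridged)", ["French, Vivian", "Benson, Patrick"]),
    ("A Christmas Carol (sound recording 1)", ["Stewart, Patrick"]),
    ("A Christmas Carol (sound recording 2)", ["Stewart, Patrick"]),
    ("A Christmas Carol (paperback)", []),
]
by_name = group_by_added_entries(sample)
for name, titles in sorted(by_name.items()):
    print(name, "->", titles)
```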
'Odds and ends' displays
An 'odds and ends' category
consisting of records that were in categories by themselves could easily be
envisioned in an online catalogue display. Each library's holdings of items
related to a particular work is unique, and items that are one of a kind would
also be unique to a particular library.
An 'odds and ends' display cluster could be constructed automatically by
collapsing the smallest clusters into a single 'miscellaneous' cluster. Some experimentation on how such displays
could be constructed and when they would be most helpful to users would be
desirable.
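Collapsing the smallest clusters into one miscellaneous cluster could be sketched as follows. The minimum cluster size, the cluster label, the function name, and the sample clusters are assumptions made for illustration. An appropriate size threshold would be one of the matters for the experimentation suggested above.

```python
def collapse_small_clusters(clusters, min_size=2, label="Odds and ends"):
    """Merge clusters smaller than `min_size` into one miscellaneous cluster."""
    collapsed = {}
    leftovers = []
    for name, items in clusters.items():
        if len(items) < min_size:
            leftovers.extend(items)
        else:
            collapsed[name] = list(items)
    if leftovers:
        # Extend rather than overwrite in case a cluster already uses the label.
        collapsed.setdefault(label, []).extend(leftovers)
    return collapsed


# Titles come from Appendix 1; the cluster names are invented for the example.
clusters = {
    "Videorecordings": ["Scrooge", "An American Christmas Carol"],
    "Sound recordings": ["A Christmas Carol (Stewart)", "A Christmas Carol (Palmer)"],
    "Kits": ["A Christmas Carol Story Book Set & Advent Calendar"],
    "Trivia": ["The Christmas Carol Trivia Book"],
}
display = collapse_small_clusters(clusters)
for name, items in display.items():
    print(name, "->", items)
```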
DIRECTIONS FOR FUTURE RESEARCH
User behaviour,
particularly with respect to author and work searching, is a rich area for
future research. Although we know much
about simple aspects of users' searching behaviour,
for example, how often users search for authors and titles, we know little
about more complex aspects, such as what they are actually looking for when
they enter these searches, or what attributes of items would actually satisfy
their information needs.
Research is needed that follows up
on this study by investigating how people group items related to a variety of
different kinds of works such as nonfiction works, literary works of a
different character from A Christmas
Carol, and works that originate in nonbook
formats. After further research of this
type is completed, common categories could be derived from the individual
studies and used to design a prototype catalogue featuring summary
displays. The prototype summary display
catalogue could then be tested for its effectiveness compared to existing
alphabetical or random arrangement catalogues, or to a summary display
catalogue in which the summaries are based on non-user derived categories such
as the ones derived from filing rules and bibliographic relationships suggested
by Carlyle [14]. Another extension of
the research presented here could study the particular attributes that appear
in each category, for example, the particular pictorial elements people use in
grouping.
CONCLUSION
Current online catalogue displays
are frequently composed of linear orderings of bibliographic records. It has often been noted that multiple-record
displays that are so constructed take little advantage of the power of the
technology upon which they are designed.
Computer technology allows us to create any number of different types of
displays, including displays which cluster individual records into groups based
on similar attributes. As with any
aspect of information system design, the content and organisation
of screen displays should be informed by extensive knowledge of user behaviour. This
study has furthered our understanding of user behaviour
by discovering categories of attributes that people may use when they organise items related to a particular work for
themselves.
ACKNOWLEDGEMENTS
This research was supported by a
grant from
REFERENCES
1. Wiberley, S.
E., Daugherty, R. A., and Danowski, J. A.
Displaying online catalog postings:
LUIS, Library Resources & Technical
Services, 39, 1995, 247-264.
2. Massicotte, M. Improved browsable displays for online subject access,
Information Technology and Libraries, 7 (4), 1988, 373-380.
3. Association for Library Collections and
Technical Services. Headings for tomorrow: public
access display of subject headings.
4. Buckland, M.K., Norgard, B.A. and Plaunt, C. Filing, filtering, and the first few
found, Information Technology and Libraries, 12, 1993, 311-319.
5. Fattahi,
R. Super records: an approach toward the description of works
appearing in various manifestations, Library
Review, 45 (4), 1996, 19-29.
6. Yee, M.M. and Layne, S.S. Online
public access catalogs, Encyclopedia of
Library and Information Science, v. 58, supp. 21, 1996, 149-238.
7. Larson, R.R. Classification clustering, probabilistic
information retrieval, and the online catalog, Library Quarterly, 61 (2), 1991, 133-173.
8. McGarry, D. and Svenonius,
E. More on improved browsable
displays for online subject access, Information
Technology and Libraries, 10 (3), 1991, 185-191.
9. Svenonius,
E. Clustering equivalent bibliographic
records. In: Annual review of OCLC
research, July 1987-June 1988.
10. Fidel, R.
User-centered indexing, Journal of
the American Society for Information Science, 45 (8), 1994, 572-576.
11. Pejtersen, A.M. 1989. A library system for information retrieval
based on a cognitive task analysis and supported by an icon-based interface.
In: Belkin, N.J. and Van Rijsbergen, C.J., eds. SIGIR '89: Proceedings of the twelfth annual
International ACM SIGIR Conference on Research and Development in Information
Retrieval.
12. Pejtersen, A.M. and Austin, J. Fiction retrieval: experimental design and evaluation of a
search system based on users' value criteria (part 1), Journal of Documentation, 39 (4), 1983,
230-246.
13. Lubetzky,
S. Code
of cataloging rules: author and title
entries. An unfinished draft. [S.l.]: American Library Association, 1960.
14. Carlyle, A. Fulfilling the second objective in the online
catalog: schemes for organizing author
and work records into usable displays, Library
Resources & Technical Services, 41
(2), 1997, 79-100.
15. Carlyle, A. Ordering author and work records: an evaluation of collocation in online
catalog displays, Journal of the American
Society for Information Science, 47 (7), 1996, 538-554.
16. Dwyer, C.M., Gossen, E.A. and Martin, L.M. Known-item search failure
in an OPAC, RQ, 31 (2), 1991, 228-236.
17. Ayres, F.H. Duplicates and other manifestations: a new approach to the presentation of
bibliographic information, Journal of
Librarianship, 22 (4), 1990, 236-251.
18. O'Neill, E.T.,
19. Ayres, F.H., Nielsen, L.P.S., Ridley,
M.J., and Torsun,
I.S. The
Bradford OPAC: a new concept in
bibliographic control. British
Library R & D Report 6183.
20. Fattahi, R. [Prototype catalog available at:
http://wilma.silas.unsw.edu.au/students/rfattahi] 1996.
21. McDonald, J.E. and Schvaneveldt, R.W. The application of user knowledge to
interface design. In: Guindon, R., ed. Cognitive
science and its applications for human-computer interaction.
22. Lohse, J., Rueter, H., Biolsi, K., and Walker, N. Classifying visual knowledge
representations: A foundation for
visualization research. In: Kaufman,
A., ed. Visualization '90: proceedings of the first IEEE Conference on
Visualization.
23. Lohse, G.L., Biolsi, K.,
24. Hayhoe, D. Sorting-based menu categories. International
Journal of Man-Machine Studies, 33, 1990, 677-705.
25. Vidal, N.K. Experimental image
taxonomy: An inquiry into spontaneous
image organization. Master's thesis,
26. Jörgensen,
C. Image
attributes: An investigation. Ph.D. diss.,
27. Kwasnik,
B.H. The importance of factors that are
not document attributes in the organisation of
personal documents, Journal of
Documentation, 47(4), 1991, 389-398.
28. Case, D.O. Collection and organization of written
information by social scientists and humanists:
a review and exploratory study, Journal
of Information Science, 12, 1986,
97-104.
29.
30. Maykut, P. and Morehouse, R. Beginning
qualitative research: a philosophic and
practical guide.
31. Pierce, E.G. Appendix D:
Testing value of full title-page transcription in cataloguing. In: Studies of descriptive cataloging: a report to the Librarian of Congress by the
Director of the Processing Department.
32. Association of College and Research
Libraries. Rare Books and Manuscripts
Section. Standards Committee. Printing
and publishing evidence: thesauri for
use in rare book and special collections cataloging.
APPENDIX 1:
LIST OF ITEMS USED IN THE STUDY
Arranged by physical format and other user
characteristics.
Books - Hardcover - Children's
Dickens, Charles. A Christmas Carol. Abridged by Vivian French. Illustrated by Patrick Benson.
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol.
Dickens, Charles. Christmas Stories: A Christmas Carol, The Holly Tree.
Disney's
Mickey's Christmas Carol.
Taylor, Mark A. The Christmas Carol.
Books - Paperback - Children's
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol. Retold by I.M. Richardson.
Dickens, Charles. A Christmas Carol.
Dubowski, Cathy East. Scrooge.
Lillington, Kenneth.
A Christmas Carol, Easy Piano
Picture Book. Text by Kenneth
Lillington after Charles Dickens, Illustrations by Annabel Spenceley,
Carols arranged by Timothy Roberts.
Books - Hardcover - Adult
Davis, Paul. The Lives and Times of Ebenezer Scrooge.
University
Press, 1990.
Dickens, Charles. A Christmas Carol and Other Stories.
Library,
1995.
Dickens, Charles. A Christmas Carol: A Facsimile Edition ...
Mula, Tom. Jacob Marley's Christmas Carol.
Publishing,
1995.
Books - Paperback - Adult
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol.
Dickens, Charles. A Christmas Carol and Other Christmas Books.
Dickens, Charles. A Christmas Carol and Other Christmas
Stories.
Dickens, Charles. The
Christmas Books, Volume 1, A Christmas Carol, The Chimes.
Dickens, Charles. A Christmas Carol: Adapted for Theater.
Payne, Darwin Reid.
A Christmas Carol. Dramatized by
Sammon, Paul. The Christmas Carol Trivia Book.
Books - Foreign Language
Dickens, Carlos. Canción de Navidad, El Grillo del Hogar, Historia de Dos
Ciudades. Sexta edición. Mexico: Editorial Porrúa, S. A., 1990. (Spanish)
Dickens, Charles. Cuentos Navideños: Cancion de navidad, el poseido.
Bogota: Ediciones Universales, 1993. (Spanish)
Dickens, Charles. Un Chant de Noël. Évreux:
Gallimard, Folio Junior, 1994. (French)
Dickens, Charles. A Christmas Carol. [Japanese hardcover version: Dikenzu gensaku. Kurisumasu kyaroru. Tamura Sumiyo sakka. Sekai no meisaku. Samaku Shuppan.], 1995.
(ISBN: 4-309-46568-4)
Dickens, Charles. A Christmas Carol. [Japanese paperback cartoon version: Dikenzu. Kurisumasu kyaroru. Muraoka Hanako yaku. Sato Aki kaisetsu. Sekai bungaku no tamatebako. Kawade Shobo Shinsha, c1994.] (ISBN:
4-7631-8299-4)
Sound Recordings
Dickens, Charles. A Christmas Carol. Performed by Sir Laurence Olivier and others.
Dickens, Charles. A Christmas Carol. Performed by Patrick
Stewart.
Dickens, Charles. A Christmas Carol. Performed by Patrick
Stewart.
Dickens, Charles. A Christmas Carol. Read by Geoffrey Palmer.
Videorecordings - Adult & General
A Christmas
Carol.
Blockbuster Classics, 1951. (Alastair Sim version)
A Christmas
Carol.
An American
Christmas Carol.
Scrooge.
Videorecordings - Children's
A Flintstones Christmas Carol. [S.l.]: Hanna-Barbera Cartoons, 1994.
Mickey's Christmas Carol.
The Muppet Christmas Carol.
Miscellaneous
Packard, Mary. Dickens, Charles: A Christmas Carol Story Book Set & Advent
Calendar. Illustrations by Ray Bartkus; Story retold by Mary Packard.
                                        Number   Percent
Average amount of book reading
  1-2 books a month or more               24       48%
  1 book every two to three months        13       26%
  a few books a year                       8       16%
  not much of a book reader                5       10%
  Total                                   50      100%
Visit the library or a bookstore
  1 or more times a month                 33       66%
  once every two to four months           13       26%
  once or twice a year                     4        8%
  Total                                   50      100%
Education (highest level attained)
  high school or less                     31       63%
  undergraduate degree (bachelors)        14       29%
  graduate degree (masters)                4        8%
  degree higher than masters               0        0%
  Total                                   49**    100%
Age
  18 - 30                                 32       64%
  31 - 40                                  6       12%
  41 - 50                                  9       18%
  51 - 60                                  2        4%
  over 60                                  1        2%
  Total                                   50      100%
Sex
  Female                                  30       60%
  Male                                    20       40%
  Total                                   50      100%
Total number of participants: 50
** One subject did not report data for education.
FIGURE 1. Summary Data for Research Participants
Physical format: audio tapes, hard back books, VCR tapes, movies, little kid tapes
Audience: youth, sight impaired, juvenile, grown up people, piano players
Content description: novel, play, more involved plots with more details, Americanized version of the same moral central to A Christmas Carol
Pictorial elements: animated, cartoon pictorial, not as many pictures, had a mans face on the front, color artwork, dull covers
Usage: could be read by small group for presentation, theater, for relaxation, fun, dull
Language: foreign language, foreign version, Spanish, non-English
Physical characteristics: medium size, largest books, thick hard bind
Content age, integrity: unabridged, abbreviated versions, classic, standard edition, original text-line, short version
Textual characteristics: big print, large print, book [sic.] that say Scrooge on them, every five lines ... marked in book as in a poem reading
Creator, performer: produced other than Charles Dickens, Disney type story, adapted by other author's take from original
'Odds & ends': alone!, miscellaneous, other
FIGURE 2. Categories of attributes used for grouping, with sample participant descriptions
                               Total Use           Participant Use
Category                    Number   Percent    Number   Percent of Participants
                                                         Using Category
Physical format               239      37         48       96
Audience                       93      14         34       68
Content description            90      14         35       70
Pictorial elements             54       8         26       52
Usage                          44       7         25       50
Language                       32       5         41       82
Physical characteristics       26       4         10       20
Content age, integrity         25       4         16       32
Textual characteristics        22       3         12       24
Creator, performer             16       2          9       18
'Odds and ends'                 8       1          8       16
Ambiguous                       2       1          2        4
TOTAL                         651     100         **       **
**Totals not applicable.
FIGURE 3. Frequency of Category Use
Mean number of groups formed per participant: 7.3
  Smallest number of groups formed: 3
  Largest number of groups formed: 13
Mean number of total categories used per participant: 13.5
  Smallest number of total categories used: 5
  Largest number of total categories used: 34
Mean number of unique categories used per participant: 5.3
  Smallest number of unique categories used: 3
  Largest number of unique categories used: 9
FIGURE 4. Group and Category Usage Statistics
Physical format: MARC 006, 007; Notes; GMD; LCSH & other thesauri; Physical description; Holdings
Audience: MARC 006; LCSH & other thesauri; Holdings
Content description: Notes; LCSH & other thesauri
Pictorial elements: MARC 006; Statement of responsibility; Physical description; LCSH & other thesauri
Usage: None?
Language: MARC 006, 041; Uniform title language subfields; Holdings
Physical characteristics: Physical description; Holdings
Content age, integrity: MARC 006, 245; Publication information; Main entry heading; Physical description; Edition statement; Notes
Textual characteristics: GMD; Holdings; Edition statement; LCSH or other thesauri
Creator, performer: All access points; Statement of responsibility; Notes
FIGURE 5. Possible Sources of Information for Automatic Grouping of User-Identified Categories
Publication Information:
"User Categorisation of Works: Toward Improved Organisation of Online Catalogue Displays." Journal of Documentation,
55, 2 (March 1999): 184-208.