THE ROLE OF CLASSIFICATION IN THE CREATION OF AUTHOR AND WORK DISPLAYS IN ONLINE CATALOGS
1. INTRODUCTION
The role of classification in information retrieval is usually associated with the subject attributes of documents. However, classification in its broadest sense, grouping by attribute, has potential uses in information retrieval systems (IRSs) beyond grouping by subject. The purpose of this paper is to explore ways in which classification may enhance online catalog interface design by organizing author and work records in online catalogs.
One
of the major problems affecting online catalog design is the large retrieval
set problem (e.g., Wiberley, Daugherty & Danowski 1995).
Larson (1991) and others promote the use of library classification schemes
such as the Library of Congress Classification (LCC) to cluster bibliographic
records in online catalogs when retrieval sets from subject searches are
large. The shorter displays resulting
from clustering would provide users with an overview of the range of items
retrieved, organizing them by subject or discipline. An emphasis on overview-type displays has
appeared in general IRS research as well as online catalog research. Lin (1997), for example, has developed a
system that presents graphic, map-like displays of terms to give users a
picture of term distributions among documents.
Although author and work searches are frequently dismissed as unproblematic, the large retrieval set problem affects them as well. Searches for works by William Shakespeare, for example, may retrieve hundreds or thousands of records, even in relatively small online catalogs. Large numbers of records representing well-known authors and works may be retrieved in a search, and these records are often presented in poorly organized displays (Carlyle 1996). Ayres (1990) suggests that the problem of record duplication in bibliographic utilities may result in part from displays that do little to help users identify the records they seek. Fattahi (1996) has developed a possible solution in a prototype system incorporating a "super record" approach, which presents organized displays of authors and works (see:
http://wilma.silas.unsw.edu.au/students/RFATTAHI/super.htm). Researchers at the University of Bradford have explored a related approach in the Bradford OPAC (Ayres et al. 1995).
This paper analyzes the role of classification in the design of organized displays for author and work records, which include records for works by particular authors and for editions of particular works, as well as records for works about them. Classification may be employed to enhance author and work displays in a variety of ways. User classifications may be used to discover the common ways in which people organize editions of works and the works of prolific authors for themselves. Traditional library classifications, including physical document arrangements, catalog filing rules, and library classification schemes, also offer methods for organizing author and work records that may be used to enhance online catalog displays.
2. USER CLASSIFICATIONS
One method of improving IRS design is to build systems that reflect the way users themselves view and manage information. For author and work displays, this means that organizational schemes should reflect how users organize author and work documents for themselves.
In
July 1996, I conducted a study exploring how people organize items related to
works. Fifty participants for the study
were solicited in a shopping mall and asked to group 47 documents related to
Charles Dickens' A Christmas Carol,
including hard cover and paperback editions, nonbook
editions, and works about A Christmas
Carol. Grouping was to be based on
two criteria: 1) item similarity and 2) ability of the
groups to help participants find items at a later time. When finished, participants wrote down a name
and a brief description for each group including the characteristics they used
to create the groups. Data analysis
examined two types of data: the
frequency with which any two items appeared in the same group, and
participants' written group names and descriptions. The frequency data were analyzed quantitatively
using cluster analysis to determine common groupings used by participants in
the study. The written data were
analyzed qualitatively using content analysis to discover categories of
characteristics used by participants to create their groups.
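The first type of data, how often any two items appeared in the same group, amounts to a pairwise co-occurrence count over all participants' sorts. The sketch below is a minimal illustration, not the software used in the study; it assumes each participant's sort is recorded as a list of groups of item identifiers, and the item names are hypothetical stand-ins for the 47 documents.

```python
from collections import Counter
from itertools import combinations

def co_occurrence(sorts):
    """Count how many participants placed each pair of items in the same group.

    `sorts` has one entry per participant; each entry is a list of groups,
    and each group is a collection of item identifiers.
    """
    counts = Counter()
    for groups in sorts:
        for group in groups:
            # Sorting gives each unordered pair a single canonical key.
            for a, b in combinations(sorted(set(group)), 2):
                counts[(a, b)] += 1
    return counts

# Hypothetical sorts from three participants (item names are illustrative).
sorts = [
    [["cd", "cassette"], ["kids_video", "adult_video"], ["paperback"]],
    [["cd", "cassette", "kids_video"], ["adult_video", "paperback"]],
    [["cd", "cassette"], ["kids_video"], ["adult_video"], ["paperback"]],
]
counts = co_occurrence(sorts)
print(counts[("cassette", "cd")])  # 3
```

Because `Counter` returns 0 for missing keys, pairs that no participant grouped together need no special handling.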
2.1 Cluster Analysis
Cluster analysis calculates the frequency with which each pair of items appeared in the same group and then, one step at a time, merges into clusters the items that participants placed together most frequently. The groups resulting from the cluster analysis are presented below. Thirteen groups appear in this clustering, which discriminates relatively finely among items.
• audios (cassettes and CDs)
• children's videos
• adult videos
• large format paperbacks
• small format paperbacks
• foreign language materials
• adult hard cover materials
• children's illustrated hard cover materials
• trivia book
• picture book versions with lots of text
• picture book versions with not much text
• activity versions (piano book, Advent calendar)
• about A Christmas Carol
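The stepwise merging described in this section can be illustrated with a small agglomerative clustering routine. The paper does not state which linkage rule or stopping point was used, so this sketch assumes average linkage (the mean co-occurrence between members of two clusters) and stops at a chosen number of clusters; the items and counts are invented for illustration.

```python
from itertools import combinations

def agglomerate(items, similarity, n_clusters):
    """Average-linkage agglomerative clustering (an assumed linkage rule).

    `similarity` maps frozenset({a, b}) to the number of participants who put
    a and b in the same group; pairs not in the mapping count as 0.
    Clusters are merged one pair at a time until `n_clusters` remain.
    """
    clusters = [frozenset([item]) for item in items]

    def linkage(c1, c2):
        # Mean co-occurrence over all cross-cluster item pairs.
        total = sum(similarity.get(frozenset((a, b)), 0) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > max(n_clusters, 1):
        # Merge the two clusters whose members were grouped together most often.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

# Invented co-occurrence counts for five items.
items = ["cd", "cassette", "kids_video", "adult_video", "paperback"]
similarity = {
    frozenset(("cd", "cassette")): 3,
    frozenset(("kids_video", "adult_video")): 2,
    frozenset(("adult_video", "paperback")): 1,
}
print(agglomerate(items, similarity, 3))
```

Stopping at a fixed cluster count is only one choice; cutting the merge sequence at a similarity threshold would be an equally plausible reading of "discriminates relatively finely among items."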
2.2 Content Analysis
Content analysis was performed on the names and descriptions that participants assigned to their groups, in order to discover the categories of characteristics they used for grouping. These categories follow, each accompanied in parentheses by a sample of participant descriptions. Except for the first four categories, which participants mentioned most frequently, the categories are in no particular order.
• physical format (books, tapes, cd)
• audience characteristics (kids, adults, sophisticated readers)
• pictorial elements of items (ghosts or spirits on the covers, minimal illustrations)
• language of text (foreign, Japanese, different version than English)
• content, form, topic description (focus on specific characters, detailed book, poem book, play, trivia book, has background factual materials on the text)
• emotion or feeling (fun, mysterious, serious, scary, I would enjoy reading them)
• textual characteristics, both graphic and content (you got to squint to read it, smaller print, book that say Scrooge on them, every 5 lines ... marked.. as in a poem)
• length of item (short, all pretty long)
• where, why, or how to use item; setting (for relaxation, read to children, for theater)
• creator, performer characteristics (authors other than Charles Dickens, Disney type story, has a different author [from Charles Dickens])
• aesthetic quality of item (beautiful, dull covers, more modern, interesting covers)
• Christmas Carol published in a collection (multiple stories)
• 'odds and ends' (miscellaneous, other, odds and ends)
• age or authenticity of content (older original stories, more original text, modern version)
• physical characteristics (smooth, thin, largest books, more pages, pocket books)
• difficulty or complexity level (simple version, unabridged copies, abbreviated versions)
3. LIBRARY CLASSIFICATIONS
Libraries
classify documents and records representing documents in a variety of ways
using methods that can also be used to enhance author and work displays. Three methods are analyzed here: physical document arrangements, catalog
filing rules, and library classification schemes.
3.1 Physical Document Arrangements
Libraries
frequently group documents physically based on audience age level and document
format (Rowley 1992: 471). For example,
American public libraries often divide their collections into adult, young
adult, and children's sections. All
types of libraries group items according to physical format, with separate
sections for books, videos, and sound recordings, for example. These two major characteristics used for
grouping documents physically correspond directly to two of the categories of
characteristics used most frequently by participants in the user classification
study reported above. Although the
correspondence is not surprising, the results of the study support the
usability of physical document arrangements currently employed by libraries.
3.2 Catalog Filing Rules
Catalog filing rules frequently make use of classification in hierarchical arrangements provided for the display of author and work records. In recent research, I analyzed various codes of filing rules to discover the groupings that were commonly used to organize author and work records (for a complete presentation of the filing rules analysis, see Carlyle 1997). The analysis revealed that many of the codes of filing rules provided similar types of groupings. Groupings commonly used include the following.
For the editions of a work:
• editions in the original language
• analytical entries
• translations
• related works
• works about the work
For the works of an author:
• complete works of the author
• selected works
• selections
• single works (filed as above)
• works about the author.
Current
codes of filing rules provide less classificatory structure than they did
previously. One of the motivations for
eliminating highly classified arrangements was the widespread adoption of the
online catalog. Early online catalogs
had great difficulty creating the complex classified arrangements that earlier
codes of filing rules required. In
addition, changes in bibliographic records themselves would have been required
to create arrangements such as these online.
It was deemed more cost effective to simplify the rules than to make the changes necessary to create classified displays. An additional incentive to avoid classified
arrangements was the confusion they sometimes caused card catalog users. Classified arrangements are difficult to see
in the card environment unless guide cards alert users to the non-alphabetical
structure. Because guide cards are
seldom used in card catalogs, filing rules that created complex classified
structures were seen as detrimental to catalog use.
Current
technology, however, allows us to exploit the classificatory structure present
in relationships among author and work records without the risks accompanying
the card environment. Graphic display
programs may be used to create outline or tree structure displays that make
clear the classificatory structure inherent in the relationships among items
displayed. For example, editions of a
particular textual work could be organized in an outline display that follows
the structure suggested by traditional filing rules arrangements:
• Book editions
   • Editions arranged in some systematic manner, e.g., by date
• Recordings
   • CDs
   • Cassettes
• Special text formats
   • Large print
   • Braille
• Revisions, updated editions, ...
• Translations
   • Language 1 translations
   • Language 2 translations
• Works about the work
In the online environment,
classified arrangements such as this would make it possible to display large
numbers of records on a single screen.
Further, displays that show classificatory structure explicitly are less
likely to obscure items of interest to users, so long as the content of the
classes is clear.
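An outline display of this kind can be produced from records that each carry a class and, optionally, a subclass. The sketch below assumes that class assignment has already been made; the class order follows the outline in this section, while the subclass labels and titles in the example are hypothetical.

```python
# Top-level class order follows the outline above; the subclass labels and
# titles in the example are hypothetical.
CLASS_ORDER = [
    "Book editions",
    "Recordings",
    "Special text formats",
    "Revisions, updated editions",
    "Translations",
    "Works about the work",
]

def outline(records):
    """Render (class, subclass, title) records as an indented outline.

    `subclass` may be None for records filed directly under a class. Classes
    follow CLASS_ORDER, empty classes are omitted, and subclasses and titles
    are sorted alphabetically. An unknown class raises KeyError.
    """
    tree = {name: {} for name in CLASS_ORDER}
    for cls, sub, title in records:
        tree[cls].setdefault(sub, []).append(title)
    lines = []
    for cls in CLASS_ORDER:
        subs = tree[cls]
        if not subs:
            continue
        lines.append(f"• {cls}")
        # Titles filed directly under the class (subclass None) come first.
        for sub in sorted(subs, key=lambda s: (s is not None, s or "")):
            indent = "   "
            if sub is not None:
                lines.append(f"   • {sub}")
                indent = "      "
            lines.extend(f"{indent}- {title}" for title in sorted(subs[sub]))
    return "\n".join(lines)

records = [
    ("Recordings", "Cassettes", "Unabridged reading (1990)"),
    ("Book editions", None, "Facsimile of the first edition"),
    ("Recordings", "CDs", "Radio dramatization (2001)"),
    ("Translations", "French", "French translation (1990)"),
]
print(outline(records))
```

Omitting empty classes keeps the display compact, which supports the goal of fitting many records on a single screen.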
3.3 Library Classification Schemes
Library classification schemes frequently provide classified arrangements of the works of particular authors and the editions of particular works. Because documents in most library collections are assigned classification numbers, those numbers have great potential to organize author and work records in online catalogs automatically. Here, the Universal Decimal Classification (UDC), the Dewey Decimal Classification (DDC), and the Library of Congress Classification (LCC) are analyzed to discover the classified arrangements they provide for editions of works and works of authors. In each scheme, I scanned selected tables manually. Because of the logistical problems of examining the entire contents of all three classification schemes manually, I limited the analysis to general provisions for all subjects, i.e., the UDC auxiliary tables and the DDC tables, and to tables for religion and literature only.
3.3.1 Universal Decimal Classification
The
UDC (International Medium ed., English ed., 2nd ed.) auxiliary tables and main
tables in religion (2) and literature (8) were scanned manually for instances
of author and work classifications. The
results of this analysis show that of the three classification schemes analyzed
here, the UDC makes the most generous provisions for organizing editions of
works and works of authors. Common
auxiliary subdivisions may be used to organize any item classified, regardless
of its subject content. This
organization includes:
• Multilingual or polyglot editions (=00)
• Originals or their adaptations untranslated (=02)
   • Original versions (=021)
   • Adapted, edited, amended versions (=025)
• Translations (=03).
Common
auxiliary subdivisions allow groupings of editions of works and the works of
authors by language (=1/=9).
Additionally, UDC makes extensive provisions in the special auxiliary
subdivisions for grouping specific types
of works by:
• physical or external form ((0.02))
• method of production ((0.03))
• stage of production ((0.04))
• particular kinds of user [audience] ((0.05))
• level of presentation: low-level, elementary, popular exposition ((0.062-0.064))
• availability ((0.067-0.068))
• presence of supplementary matter ((0.07))
• publication of separately issued supplement or part ((0.08)).
Special auxiliary subdivisions
identify pictorial or graphic elements of documents ((084)) and various
physical formats of documents ((086)).
All of these type-of-work/audience and element-of-work subdivisions could
facilitate the construction of organized author and work displays. Furthermore, many of them correspond to the
categories of characteristics users identified in the research reported in
section 2 above.
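Because the edition-type auxiliaries listed earlier (=00, =02, =021, =025, =03) have unique notation, a program could read them directly from a UDC number and group records accordingly. The sketch below is a minimal illustration under stated assumptions: notations are plain strings, any "=" auxiliary beginning with 0 is treated as an edition type, and the base class "82" and language code "=111" in the example are illustrative only.

```python
import re
from collections import defaultdict

# Edition-type auxiliaries as listed in the text above.
EDITION_TYPES = {
    "=00": "Multilingual or polyglot editions",
    "=02": "Originals or their adaptations untranslated",
    "=021": "Original versions",
    "=025": "Adapted, edited, amended versions",
    "=03": "Translations",
}

# An equals-sign auxiliary: "=" followed by digits (e.g. "=03", "=111").
AUXILIARY = re.compile(r"=\d+")

def edition_type(notation):
    """Label a UDC notation by its most specific edition-type auxiliary.

    Auxiliaries beginning "=0" are treated as edition types; others
    (=1/=9) are language auxiliaries and are skipped here.
    """
    for code in AUXILIARY.findall(notation):
        if not code.startswith("=0"):
            continue
        # Try the full code, then shorter prefixes (a hypothetical "=0251"
        # falls back to "=025").
        for end in range(len(code), 2, -1):
            if code[:end] in EDITION_TYPES:
                return EDITION_TYPES[code[:end]]
    return "Unspecified edition type"

def group_by_edition_type(records):
    """Group (title, notation) pairs under their edition-type labels."""
    groups = defaultdict(list)
    for title, notation in records:
        groups[edition_type(notation)].append(title)
    return dict(groups)

# Hypothetical records with illustrative notations.
records = [("Text A", "82=021"), ("Text B", "82=03=111"), ("Text C", "82=03")]
print(group_by_edition_type(records))
# {'Original versions': ['Text A'], 'Translations': ['Text B', 'Text C']}
```

The same pattern would extend to the form and audience auxiliaries, which likewise carry their own notation.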
One particular work, the Bible (22), is given an extensive classification of its own. It is represented by 92 principal divisions, including Old Testament, Pentateuch, Genesis, and Poetic books of the Old Testament, and by twenty special auxiliary subdivisions, including textual criticism, concordances, texts, and paraphrases. In Literature (82), UDC makes special provisions for classifying the works of literary authors. The following organization is applied to the works of these authors:
• Kinds of edition (82...A/Z, Works of specific authors)
   • In original language (.01)
   • In bilingual or annotated edition (.02)
   • Translated text. Transliterated (e.g. romanized) text (.03)
   • Abridgments. Paraphrases (.05)
   • Interpretations. Recensions (.07)
   • Language and style (of author or work) (.08)
   • Sources of the work. Author's source material (.091)
   • Continuations, sequels (with different title or by other writers) (.092)
   • Vindications. Apologias (.093)
• Complete works or sets (82...A/Z1)
• Incomplete collections (82...A/Z2)
• Selections. Anthologies (82...A/Z3)
• Contributions to others' works. Joint works (82...A/Z4)
   • Introductions. Forewords ... (82...A/Z411)
   • Joint, composite, collective works (82...A/Z412)
• Anonymous works attributed to an author ... (82...A/Z5)
• Works in the author's name but attributed to others. Spurious works (82...A/Z6)
• Individual works (82...A/Z7)
3.3.2 Dewey Decimal Classification
In
the DDC (21st ed.), the main classes of religion (200) and literature (800),
and Tables 1 and 3 were scanned manually. Dewey for Windows was searched using selected
words (editions, translations, adaptations, selections, paraphrases, criticism,
commentaries, format, revision, abridgments, spurious
works) as a check on the manual scan.
An
analysis of the DDC shows that no general provisions are made for organizing
works of authors or editions of works.
However, specialized arrangements for some individual works and authors
are provided. For example, editions and
works about the Bible (220) are organized extensively. Editions of the Bible are grouped into classes consisting of:
• original texts, early versions, and early translations (220.4)
   • subdivided into language groupings
• modern versions and translations (220.5)
   • subdivided into language groupings.
Further organization is provided
for English versions such as the Authorized (King James) version and works
about them, which are arranged as follows (see table, 220.5201-220.5209):
• standard editions
• concordances, indexes, dictionaries;
• special editions (e.g., annotated, study editions, editions
notable for illustrations)
• selections
• paraphrases
• history, criticism, explanation of the translation.
Separately published parts of the Bible are organized in a similar fashion
(221-229).
Editions of other sacred works receive special arrangements as well, although not as extensive as the arrangement for the Bible. For example, see 294.382 for Buddhist sacred works, 294.592 for Hindu sacred works, 296.12-296.14 for Talmudic literature and Midrash, and 297.1224-297.1225 for the Koran.
In
literature, works by individual literary authors may be scattered according to
the language in which they were written or literary form of the work. However, an optional
arrangement collocates works by and about literary authors (see
arrangement at 822.33, William Shakespeare).
This arrangement includes classes for:
biography, sources, concordances, adaptations, complete works, partial
collections, and individual works, in addition to others. Works about individual literary authors are
grouped with the comprehensive or main number used for the author.
3.3.3 Library of Congress Classification
In the LCC, the religion (BL-BX) and selected literature (PN, PQ-PT) tables were scanned manually for instances of author and work classifications. In addition, selected literature tables (tables XXX-XLIV) were scanned. As in the DDC, no general provisions are made for classifying authors and works. Instead, many special classifications are created. Some of these are built from internal tables (e.g., the author table at BL1316, Jain authors, or the work table at BM499+, Babylonian Talmud), and many others follow similar arrangements. These common arrangements are summarized in the following analysis.
Arrangements
for authors are frequently composed of the following classes:
• collected works by the author, organized by date or by editor
• translations of collected works
• selected works
• translations of selected works
• individual works in the original language
• translations of individual works arranged by language
• works about the author, including biography and criticism.
In the special classifications
provided for individual authors, translations or editions in specific languages
pertinent to that author may have their own numbers, while translations in
other languages are grouped in a single number.
In individual arrangements, some of the classes listed above may be
missing, while other classes pertinent to that individual author are added.
Works existing in many editions are sometimes given their own class numbers in the LCC. Editions of and works about the Bible make up an entire subclass (BS). The general arrangement of editions of and works about the Bible is similar to the arrangements provided for other religious texts, and includes classes for:
• early versions
• modern texts and versions, subarranged by language
• works about the Bible
• parts (for the Bible, separate numbers for the Old Testament and the New Testament)
Provisions for other works are similar to those listed under authors above and include classes for:
• editions in the original language, subarranged by editor or date
• translations, subarranged by language
• parts, subarranged either alphabetically or in the order provided in the work
• works about the work, including history, commentary, and criticism.
Special classes
are occasionally provided for adaptations, paraphrases, excerpts, and
selections.
4. BOOK NUMBERING SCHEMES
Book numbering schemes, in particular Cutter numbers, also make general provisions for classifying authors and works. An individual Cutter number under a class number may be used to represent a single author. Works by that author may then be subarranged by appending to the Cutter number letters that designate particular works. Subarrangements for individual works may then be created by adding numbers or letters representing specific editions or translations of the work, or works about the work. In the LCC, author and work arrangements are sometimes provided entirely by Cutter numbering (Comaromi 1981).
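The layered subarrangement just described (author, then work, then edition) can be sketched as a parser that splits a book number into those components and sorts on them. The book-number shape assumed here is a hypothetical simplification: a capital letter and digits for the author, lowercase letters (work marks) for the work, and a whitespace-separated edition designation such as a date. Actual Cutter practice varies across libraries and schemes.

```python
import re

# Assumed book-number shape (illustrative only): an author Cutter (a capital
# letter plus digits), an optional lowercase work mark, then an optional
# edition designation such as a date, separated by whitespace.
BOOK_NUMBER = re.compile(
    r"^(?P<cutter>[A-Z]\d+)(?P<work>[a-z]*)(?:\s+(?P<edition>.+))?$"
)

def parse_book_number(book_number):
    """Split a book number into (author Cutter, work mark, edition part)."""
    match = BOOK_NUMBER.match(book_number.strip())
    if match is None:
        raise ValueError(f"unrecognized book number: {book_number!r}")
    return match.group("cutter"), match.group("work"), match.group("edition") or ""

def subarrange(book_numbers):
    """Order book numbers by author Cutter, then work mark, then edition.

    Comparing the Cutter digits as strings orders them decimally
    (D548 before D55), which is how Cutter numbers file.
    """
    return sorted(book_numbers, key=parse_book_number)

print(subarrange(["D548o", "D548c 1995", "D548", "D548c 1843"]))
# ['D548', 'D548c 1843', 'D548c 1995', 'D548o']
```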
5. AUTOMATIC GROUPING OF AUTHOR AND WORK RECORDS
A
variety of methods making use of the data in existing bibliographic records may
organize records for the works of an author or the editions of a work in an
online catalog automatically. Of all of
the existing data elements in bibliographic records, classification numbers may
prove the most effective for this task.
Faceted classification schemes, such as UDC, that provide facet
indicators with unique notation for types of works, language of text, etc.,
could be used to organize author and work records into groups with great
accuracy. Classifications in schemes
that provide a series of numbers for particular authors and works would also
provide accurate groupings automatically.
For
classifications that separate the works of an author by subject area or that
separate works by a particular author from works about that author,
classification numbers could be used together with author names appearing in
author fields in MARC records to create displays of author records
automatically. These displays, organized
in broad subject categories, could be offered as an alternative to alphabetical
title listings. For example, LCC numbers
could be used for a display of Albert Schweitzer's works arranged in major
subject groupings such as philosophy, religion, history, political science,
music, literature, science, and medicine.
Displays constructed from DDC numbers for literary authors could
organize literary works by authors into language and literary form
groupings.
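A subject-grouped author display of the kind suggested for Albert Schweitzer could be driven by the leading letters of LCC call numbers. This is a minimal sketch assuming the records have already been restricted to one author (for example, by matching the MARC author field) and that each carries an LCC call number; only the main-class letters for the subjects named above are mapped, and the titles and call numbers in the example are hypothetical.

```python
import re
from collections import defaultdict

# Main-class letters for the subject groupings named in the text. Class P
# (language and literature) is labeled "Literature" to match the example.
SUBJECTS = {
    "D": "History", "E": "History", "F": "History",
    "J": "Political science", "M": "Music", "P": "Literature",
    "Q": "Science", "R": "Medicine",
}

def lcc_subject(call_number):
    """Map an LCC call number to a broad subject from its leading letters."""
    match = re.match(r"[A-Z]+", call_number.strip().upper())
    if match is None:
        return "Other"
    letters = match.group()
    if letters[0] == "B":
        # Subclasses BL-BX cover religion; the remaining B subclasses cover
        # philosophy, psychology, aesthetics, and ethics.
        if len(letters) > 1 and "L" <= letters[1] <= "X":
            return "Religion"
        return "Philosophy"
    return SUBJECTS.get(letters[0], "Other")

def group_by_subject(records):
    """Group (title, call number) pairs for one author by broad subject."""
    groups = defaultdict(list)
    for title, call_number in records:
        groups[lcc_subject(call_number)].append(title)
    return dict(groups)

# Hypothetical titles and call numbers, for illustration only.
records = [("Title A", "B3329 .S3"), ("Title B", "BT301 .S3"),
           ("Title C", "ML410 .B1"), ("Title D", "R154 .S3")]
print(group_by_subject(records))
# {'Philosophy': ['Title A'], 'Religion': ['Title B'], 'Music': ['Title C'], 'Medicine': ['Title D']}
```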
Classification
numbers in combination with book numbers may prove the most effective tool for
organizing editions of works automatically in collections cataloged with Cutter
numbers. Uniform titles, which are
provided in AACR2-constructed bibliographic records to identify editions of
works, have not been assigned with any consistency because of their optional
status in AACR2. However, even if
uniform titles are not assigned, classification numbers in combination with
book numbers often identify the editions of a work uniquely and may make it
possible to organize the editions of a particular work automatically. Cutter numbers would have to be used with
MARC author fields because Cutter numbers by themselves do not identify authors
uniquely. Thus a combination of
classification number, Cutter number, and author name could be used to organize
the works of an author and the editions of a work automatically in online
catalogs.
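The combined key proposed above (classification number, Cutter number, and author name) can be sketched as a simple grouping operation. The record structure is a hypothetical simplification that assumes the book number has already been reduced to the author Cutter plus work mark, with the edition part removed; the book numbers and the second author are illustrative only.

```python
from collections import defaultdict
from typing import NamedTuple

class Record(NamedTuple):
    """A simplified, hypothetical record holding only the fields used here."""
    class_number: str   # e.g. a DDC number
    book_number: str    # author Cutter plus work mark, edition part removed
    author: str         # heading from the MARC author field
    title: str

def group_editions(records):
    """Group records whose class number, book number, and author all agree.

    The author heading is part of the key because a Cutter number by itself
    does not identify an author uniquely.
    """
    groups = defaultdict(list)
    for record in records:
        key = (record.class_number, record.book_number, record.author)
        groups[key].append(record.title)
    return dict(groups)

# Illustrative book numbers; the second author is hypothetical.
records = [
    Record("823.8", "D548c", "Dickens, Charles, 1812-1870", "Edition 1"),
    Record("823.8", "D548c", "Dickens, Charles, 1812-1870", "Edition 2"),
    Record("823.8", "D548c", "Hypothetical Author", "Unrelated work"),
]
for key, titles in group_editions(records).items():
    print(key, titles)
```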
Holdings
information included in bibliographic records may also be used to help organize
records automatically. Physical
arrangements in libraries are noted frequently in location information in
holdings fields, so that age and format groupings may be made automatically in
libraries whose holdings information reflects these characteristics.
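Where location codes in holdings fields encode audience and format, a lookup table is enough to form age and format groups. The location codes below are hypothetical; every library defines its own, so the table would have to be built from local holdings practice.

```python
from collections import defaultdict

# Hypothetical local location codes mapped to (audience, format) groups.
LOCATION_GROUPS = {
    "adult-book": ("Adult", "Books"),
    "adult-av": ("Adult", "Videos and sound recordings"),
    "ya-book": ("Young adult", "Books"),
    "juv-book": ("Children", "Books"),
    "juv-av": ("Children", "Videos and sound recordings"),
}

def group_by_location(holdings):
    """Group (title, location code) pairs into (audience, format) groups.

    Codes are matched case-insensitively; unknown codes fall into an
    ("Unknown", "Unknown") group rather than being dropped.
    """
    groups = defaultdict(list)
    for title, location in holdings:
        key = LOCATION_GROUPS.get(location.lower(), ("Unknown", "Unknown"))
        groups[key].append(title)
    return dict(groups)

holdings = [("A", "JUV-AV"), ("B", "adult-book"), ("C", "juv-av"), ("D", "stacks")]
print(group_by_location(holdings))
```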
6. CONCLUSION
The
power and versatility of classification have the potential to be invaluable
assets in the improvement of IRSs. That potential has not yet been
realized. As Bryce Allen said in his
recent work, Toward a User-Centered
Approach to Information Systems, "One feature of information devices
that has not always been used to its best advantage in meeting ... information
need is the classified arrangement of knowledge" (1996: 159). This is particularly true of classification's role in meeting information need with respect to authors and works. At present, the disorganization of online catalog displays may cause users to abandon their searches, leaving the catalog frustrated and confused. Even experienced catalog users may be thwarted in their searches for particular authors and works by the lack of organization that characterizes current online catalog displays. Classification may have as critical a role in improving catalog displays as any other aspect of library and information science.
7. WORKS CITED
Allen, Bryce (1996). Toward a user-centered approach to information systems.
Ayres, F. H. (1990). 'Duplicates and other manifestations: a new approach to the presentation of bibliographic information', Journal of Librarianship 22:4 (1990), pp. 236-251.
Ayres, F. H., Nielsen, L. P. S., Ridley, M. J., and Torsun, I. S. (1995). The Bradford OPAC: a new concept in bibliographic control. British Library R & D Report 6183.
Carlyle, Allyson (1996). 'Ordering author and work records: an evaluation of collocation in online catalog displays', Journal of the American Society for Information Science 47:7 (1996), pp. 538-554.
Carlyle, Allyson (1997). 'Fulfilling the second objective in the online catalog: schemes for organizing author and work records into usable displays', Library Resources & Technical Services 41:2 (1997), pp. 79-100.
Comaromi, John P. (1981). Book numbers: a historical study and practical guide to their use.
Fattahi, Rahmatollah (1996). 'Super records: an approach toward the description of works appearing in various manifestations', Library Review 45:4 (1996), pp. 19-29.
Larson, Ray R. (1991). 'Classification clustering, probabilistic information retrieval, and the online catalog', Library Quarterly 61:2 (1991), pp. 133-173.
Lin, Xia (1997). 'Map displays for information retrieval', Journal of the American Society for Information Science 48:1 (1997), pp. 40-54.
Rowley, Jennifer (1992). Organizing knowledge: an introduction to information retrieval. 2nd ed.
Wiberley, S. E., Daugherty, R. A., and Danowski, J. A. (1995). 'Displaying online catalog postings: LUIS', Library Resources & Technical Services 39:3 (1995), pp. 247-264.
Publication
Information:
"The Role of
Classification in the Creation of Author and Work Displays in Online
Catalogs." In Knowledge
Organization for Information Retrieval, Proceedings of the Sixth International
Study Conference on Classification Research, Held at