THE ROLE OF CLASSIFICATION IN THE CREATION OF AUTHOR AND WORK DISPLAYS IN ONLINE CATALOGS

 

1.         INTRODUCTION

            The role of classification in information retrieval is often assigned to subject attributes of documents.  However, classification in its broadest sense, grouping by attribute, has potential uses in information retrieval systems (IRSs) beyond its use in grouping subjects.  The purpose of this paper is to explore methods by which classification may enhance online catalog interface design by organizing author and work records in online catalogs.

            One of the major problems affecting online catalog design is the large retrieval set problem (e.g., Wiberley, Daugherty & Danowski 1995).  Larson (1991) and others promote the use of library classification schemes such as the Library of Congress Classification (LCC) to cluster bibliographic records in online catalogs when retrieval sets from subject searches are large.  The shorter displays resulting from clustering would provide users with an overview of the range of items retrieved, organizing them by subject or discipline.  An emphasis on overview-type displays has appeared in general IRS research as well as online catalog research.  Lin (1997), for example, has developed a system that presents graphic, map-like displays of terms to give users a picture of term distributions among documents.

            Although author and work searches are frequently dismissed as being non-problematic, the large retrieval set problem is associated with them as well.  Searches for works by William Shakespeare, for example, may retrieve hundreds or thousands of records, even in relatively small online catalogs.  Large numbers of records representing well-known authors and works may be retrieved in a search, and these records are often presented in poorly organized displays (Carlyle 1996).  Ayres (1990) suggests that the problem of record duplication in bibliographic utilities may result in part from displays that do little to aid users in identifying records sought.  Fattahi (1996) has developed a possible solution in a prototype system incorporating a "super record" approach, which presents organized displays of authors and works (see:  http://wilma.silas.unsw.edu.au/students/RFATTAHI/super.htm).  Researchers at the University of Bradford are also working on implementing an online catalog that collocates the editions of a work (Ayres et al. 1995).

            This paper analyzes the role of classification in the design of organized displays for author and work records, which include records for works by particular authors and editions of particular works as well as records for works about them.  Classification may be employed to enhance author and work displays in a variety of ways.  User classifications may be used to discover the common ways in which people organize editions of works and works of prolific authors for themselves.  Traditional library classifications, including physical document arrangements, catalog filing rules, and library classification schemes, also offer methods for organizing author and work records that may be used to enhance online catalog displays.

 

2.         USER CLASSIFICATIONS

            One method of improving IRS design is to build systems that reflect the way users themselves view and manage information.  In the case of author and work displays, the organizational schemes should address how users organize author and work documents for themselves.

            In July 1996, I conducted a study exploring how people organize items related to works.  Fifty participants for the study were solicited in a shopping mall and asked to group 47 documents related to Charles Dickens' A Christmas Carol, including hard cover and paperback editions, nonbook editions, and works about A Christmas Carol.  Grouping was to be based on two criteria:  1)  item similarity and 2) ability of the groups to help participants find items at a later time.  When finished, participants wrote down a name and a brief description for each group including the characteristics they used to create the groups.  Data analysis examined two types of data:  the frequency with which any two items appeared in the same group, and participants' written group names and descriptions.  The frequency data were analyzed quantitatively using cluster analysis to determine common groupings used by participants in the study.  The written data were analyzed qualitatively using content analysis to discover categories of characteristics used by participants to create their groups.

2.1       Cluster Analysis

            Cluster analysis calculates the frequency with which any two items have appeared in the same group, and then clusters into groups, one step at a time, items that have been placed most frequently in the same groups by participants.  The groups resulting from the cluster analysis are presented below.   Thirteen groups appear in this clustering, which discriminates relatively finely among items.

 

     audios (cassettes and CDs)

     children's videos

     adult videos

     large format paperbacks

     small format paperbacks

     foreign language materials

     adult hard cover materials

     children's illustrated hard cover materials

     trivia book

     picture book versions with lots of text

     picture book versions with not much text

     activity versions (piano book, Advent calendar)

     about A Christmas Carol
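
The two steps described at the start of this section, counting how often each pair of items shares a group and then merging the most frequently co-grouped items one step at a time, can be sketched in Python.  This is a minimal illustration and not the software used in the study: the five items and three participant sorts are invented, and the average-linkage merging rule is an assumption, since the particular clustering method is not specified here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(sorts):
    """Count how often each pair of items was placed in the same group."""
    counts = Counter()
    for groups in sorts:                 # one participant's complete sort
        for group in groups:             # one group within that sort
            for a, b in combinations(sorted(group), 2):
                counts[(a, b)] += 1
    return counts


def cluster(items, counts, n_clusters):
    """Merge, one step at a time, the two clusters whose members were
    grouped together most often on average (assumed average linkage)."""
    clusters = [frozenset([item]) for item in items]

    def similarity(c1, c2):
        pairs = [(min(a, b), max(a, b)) for a in c1 for b in c2]
        return sum(counts[p] for p in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        c1, c2 = max(combinations(clusters, 2), key=lambda p: similarity(*p))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


# Hypothetical data: three participants sorting five items.
sorts = [
    [{"cassette", "cd"}, {"paperback", "hardcover"}, {"video"}],
    [{"cassette", "cd", "video"}, {"paperback", "hardcover"}],
    [{"cassette", "cd"}, {"paperback"}, {"hardcover", "video"}],
]
items = ["cassette", "cd", "hardcover", "paperback", "video"]
result = cluster(items, cooccurrence(sorts), n_clusters=3)
```

Stopping the merging at a larger or smaller number of clusters yields finer or coarser groupings; the thirteen groups listed above correspond to one such stopping point.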

2.2       Content Analysis

            Content analysis was performed on the names and descriptions of groups assigned by participants in the study.  Group names and descriptions were analyzed to discover the categories of characteristics used by participants in the study.  The categories of characteristics used for grouping follow.  Each is accompanied by a sample of participant descriptions in parentheses following the name of the characteristic.  Except for the first four categories, which were the most frequently mentioned by participants, the categories are not in any particular order.

 

     physical format   (books, tapes, cd)

     audience characteristics   (kids, adults, sophisticated readers)

     pictorial elements of items   (ghosts or spirits on the covers, minimal illustrations)

     language of text  (foreign, Japanese, different version than English)

     content, form, topic description   (focus on specific characters, detailed book, poem book, play, trivia book, has background factual materials on the text)

     emotion or feeling  (fun, mysterious, serious, scary, I would enjoy reading them)

     textual characteristics, both graphic and content   (you got to squint to read it, smaller print, book that say Scrooge on them, every 5 lines ... marked.. as in a poem)

     length of item   (short, all pretty long)

     where, why, or how to use item; setting   (for relaxation, read to children, for theater)

     creator, performer characteristics (authors other than Charles Dickens, Disney type story, has a different author [from Charles Dickens])

     aesthetic quality of item    (beautiful, dull covers, more modern, interesting covers)

     Christmas Carol published in a collection   (multiple stories)

     'odds and ends'  (miscellaneous, other, odds and ends)

     age or authenticity of content    (older original stories, more original text, modern version)

     physical characteristics   (smooth, thin, largest books, more pages, pocket books)

     difficulty or complexity level   (simple version, unabridged copies, abbreviated versions)

 

3.         LIBRARY CLASSIFICATIONS

            Libraries classify documents and records representing documents in a variety of ways using methods that can also be used to enhance author and work displays.  Three methods are analyzed here:  physical document arrangements, catalog filing rules, and library classification schemes.

3.1  Physical Document Arrangements

            Libraries frequently group documents physically based on audience age level and document format (Rowley 1992: 471).  For example, American public libraries often divide their collections into adult, young adult, and children's sections.  All types of libraries group items according to physical format, with separate sections for books, videos, and sound recordings, for example.  These two major characteristics used for grouping documents physically correspond directly to two of the categories of characteristics used most frequently by participants in the user classification study reported above.  Although the correspondence is not surprising, the results of the study support the usability of physical document arrangements currently employed by libraries.

3.2  Catalog Filing Rules

            Catalog filing rules frequently make use of classification in hierarchical arrangements provided for the display of author and work records.  In recent research, I analyzed various codes of filing rules to discover the groupings that were commonly used to organize author and work records (for a complete presentation of the filing rules analysis, see Carlyle 1997).  The analysis revealed that many of the codes of filing rules provided similar types of groupings.  Groupings commonly used for the editions of a work and for the works of an author include:

for works:

     editions in the original language

     analytical entries

     translations

     related works

     works about the work

for the works of an author:

     complete works of the author

     selected works

     selections

     single works (filed as above)

     works about the author

            Current codes of filing rules provide less classificatory structure than they did previously.  One of the motivations for eliminating highly classified arrangements was the widespread adoption of the online catalog.  Early online catalogs had great difficulty creating the complex classified arrangements that earlier codes of filing rules required.  In addition, changes in bibliographic records themselves would have been required to create arrangements such as these online.  It was deemed more cost effective to simplify the rules than to instigate the changes necessary to create classified displays.  An additional incentive to avoid classified arrangements was the confusion they sometimes caused card catalog users.  Classified arrangements are difficult to see in the card environment unless guide cards alert users to the non-alphabetical structure.  Because guide cards are seldom used in card catalogs, filing rules that created complex classified structures were seen as detrimental to catalog use. 

            Current technology, however, allows us to exploit the classificatory structure present in relationships among author and work records without the risks accompanying the card environment.  Graphic display programs may be used to create outline or tree structure displays that make clear the classificatory structure inherent in the relationships among items displayed.    For example, editions of a particular textual work could be organized in an outline display that follows the structure suggested by traditional filing rules arrangements:

 

         Book editions

              Editions arranged in some systematic manner, e.g., by date

         Recordings

              CDs 

              Cassettes

         Special text formats

              Large print               

              Braille

         Revisions, updated editions, ...

         Translations

              Language 1 translations

              Language 2 translations

         Works about the work

In the online environment, classified arrangements such as this would make it possible to display large numbers of records on a single screen.  Further, displays that show classificatory structure explicitly are less likely to obscure items of interest to users, so long as the content of the classes is clear.
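
An outline display of this kind can be sketched as follows.  The sketch assumes that each record has already been assigned to a class and, optionally, a subclass; the field names (category, subcategory, title) and the sample records are hypothetical and do not correspond to fields in any existing catalog record format.

```python
from collections import defaultdict

# Top-level classes, in the order of the outline above.
CATEGORY_ORDER = [
    "Book editions",
    "Recordings",
    "Special text formats",
    "Revisions, updated editions",
    "Translations",
    "Works about the work",
]


def build_outline(records):
    """Return outline lines: each class with its record count, followed by
    its subclasses (indented) with their counts.  Empty classes are omitted."""
    tree = defaultdict(lambda: defaultdict(list))
    for rec in records:
        tree[rec["category"]][rec.get("subcategory", "")].append(rec)

    lines = []
    for category in CATEGORY_ORDER:
        if category not in tree:
            continue
        total = sum(len(recs) for recs in tree[category].values())
        lines.append(f"{category} ({total})")
        for sub in sorted(tree[category]):
            if sub:  # records without a subclass are counted in the total only
                lines.append(f"    {sub} ({len(tree[category][sub])})")
    return lines


# Hypothetical records for the editions of one work.
records = [
    {"category": "Book editions", "title": "Edition A"},
    {"category": "Book editions", "title": "Edition B"},
    {"category": "Recordings", "subcategory": "CDs", "title": "Recording A"},
    {"category": "Recordings", "subcategory": "Cassettes", "title": "Recording B"},
    {"category": "Translations", "subcategory": "French", "title": "Translation A"},
]
outline = build_outline(records)
```

Showing a count beside each class is one way to keep the content of the classes clear while still fitting a large retrieval set on a single screen.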

3.3  Library Classification Schemes

            Library classification schemes frequently provide classified arrangements of the works of particular authors and the editions of particular works.  Because documents in most library collections are assigned classification numbers, classification numbers have a great potential to organize author and work records in online catalogs automatically.  Here, the Universal Decimal Classification (UDC), the Dewey Decimal Classification (DDC), and the Library of Congress Classification (LCC) are analyzed to discover classified arrangements provided for editions of works and works of authors.  In each scheme, I scanned selected tables manually.  Because of the logistical problems of examining the entire contents of all three classification schemes manually, I limited the analysis to general provisions for all subjects, i.e., the UDC auxiliary tables and the DDC tables, and to tables for religion and literature only.

3.3.1    Universal Decimal Classification

            The UDC (International Medium ed., English ed., 2nd ed.) auxiliary tables and main tables in religion (2) and literature (8) were scanned manually for instances of author and work classifications.  The results of this analysis show that of the three classification schemes analyzed here, the UDC makes the most generous provisions for organizing editions of works and works of authors.  Common auxiliary subdivisions may be used to organize any item classified, regardless of its subject content.  This organization includes:

 

     Multilingual or polyglot editions (=00)

     Originals or their adaptations untranslated (=02)

           Original versions (=021)

           Adapted, edited, amended versions (=025)

     Translations (=03). 

            Common auxiliary subdivisions allow groupings of editions of works and the works of authors by language (=1/=9).  Additionally, UDC makes extensive provisions in the special auxiliary subdivisions for grouping specific types of works by:

     physical or external form ((0.02))

     method of production ((0.03))

     stage of production ((0.04))

     particular kinds of user [audience] ((0.05))

     level of presentation: low-level, elementary, popular exposition ((0.062-0.064))

     availability ((0.067-0.068))

     presence of supplementary matter ((0.07))

     publication of separately issued supplement or part ((0.08)). 

Special auxiliary subdivisions identify pictorial or graphic elements of documents ((084)) and various physical formats of documents ((086)).  All of these type-of-work/audience and element-of-work subdivisions could facilitate the construction of organized author and work displays.  Furthermore, many of them correspond to the categories of characteristics users identified in the research reported in section 2 above.

            One particular work, the Bible (22), is given an extensive classification of its own.  It is represented by 92 principal divisions, including Old Testament, Pentateuch, Genesis, Poetic books of the Old Testament and by twenty special auxiliary subdivisions, including textual criticism, concordances, texts, and paraphrases.  In Literature (82), UDC makes special provisions for classifying the works of literary authors.  The following organization is applied to the works of these authors:

 

     Kinds of edition (82...A/Z, Works of specific authors)

           In original language (.01)

           In bilingual or annotated edition (.02)

           Translated text.  Transliterated (e.g. romanized) text (.03)

           Abridgments.  Paraphrases (.05)          

           Interpretations.  Recensions (.07)

           Language and style (of author or work) (.08)    

           Sources of the work.  Author's source material (.091)

           Continuations, sequels (with different title or by other writers) (.092)

           Vindications.  Apologias (.093)

     Complete works or sets (82...A/Z1)

     Incomplete collections (82...A/Z2)

     Selections.  Anthologies (82...A/Z3)

     Contributions to others' works.  Joint works  (82...A/Z4)

           Introductions.  Forewords ...  (82...A/Z411)    

           Joint, composite, collective works (82...A/Z412)

     Anonymous works attributed to an author ...  (82...A/Z5)

     Works in the author's name but attributed to others.  Spurious works (82...A/Z6)

     Individual works  (82...A/Z7)

3.3.2    Dewey Decimal Classification

            In the DDC (21st ed.), the main classes of religion (200) and literature (800), and Tables 1 and 3 were scanned manually.  Dewey for Windows was searched using selected words (editions, translations, adaptations, selections, paraphrases, criticism, commentaries, format, revision, abridgments, spurious works) as a check on the manual scan. 

            An analysis of the DDC shows that no general provisions are made for organizing works of authors or editions of works.  However, specialized arrangements for some individual works and authors are provided.  For example, editions of and works about the Bible (220) are organized extensively.  Editions of the Bible are grouped into classes consisting of:

 

     original texts, early versions, and early translations (220.4)

           subdivided into language groupings

     modern versions and translations (220.5)

           subdivided into language groupings. 

Further organization is provided for English versions such as the Authorized (King James) version and works about them, which are arranged as follows (see table, 220.5201-220.5209):

 

     standard editions

     concordances, indexes, dictionaries

     special editions  (e.g., annotated, study editions, editions notable for illustrations)

     selections

     paraphrases

     history, criticism, explanation of the translation.

 

Separately published parts of the Bible  are organized in a similar fashion (221-229). 

            Editions of other sacred works receive special arrangements as well, although not as extensive as the arrangement for the Bible.  For example, see 294.382 for Buddhist sacred works, 294.592 for Hindu sacred works, 296.12-296.14 for Talmudic literature and Midrash, and 297.1224-297.1225 for the Koran.

            In literature, works by individual literary authors may be scattered according to the language in which they were written or literary form of the work.  However, an optional arrangement collocates works by and about literary authors (see arrangement at 822.33, William Shakespeare).  This arrangement includes classes for:  biography, sources, concordances, adaptations, complete works, partial collections, and individual works, in addition to others.  Works about individual literary authors are grouped with the comprehensive or main number used for the author. 

3.3.3    Library of Congress Classification

            In the LCC, the religion (BL-BX) and selected literature (PN, PQ-PT) tables were scanned manually for instances of author and work classifications.  In addition, selected literature tables (tables XXX-XLIV) were scanned.  As in the DDC, no general provisions are made for classifying authors and works.  Instead, many special classifications are created.  However, some of these classifications are created from internal tables (e.g., author table at BL1316, Jain authors, or work example at BM499+, Babylonian Talmud) and many others follow similar arrangements.  These common arrangements are summarized in the following analysis.

            Arrangements for authors are frequently composed of the following classes:

 

     collected works by the author, organized by date or by editor

     translations of collected works

     selected works

     translations of selected works

     individual works in the original language

     translations of individual works arranged by language

     works about the author, including biography and criticism. 

In the special classifications provided for individual authors, translations or editions in specific languages pertinent to that author may have their own numbers, while translations in other languages are grouped in a single number.  In individual arrangements, some of the classes listed above may be missing, while other classes pertinent to that individual author are added.

            Works existing in many editions are sometimes given their own class numbers in the LCC.  Editions of and works about the Bible comprise an entire main class (BS).  The general arrangement of editions and works about the Bible is similar to arrangements provided for other religious texts, and includes classes for:

 

     early versions

     modern texts and versions, subarranged by language

     works about the Bible

     parts (for the Bible, separate numbers for the Old Testament and the New Testament)

Provisions for other types of works are similar to the list provided under authors above, which includes classes for:

 

     editions in the original language subarranged by editor or date

     translations, subarranged by language

     parts, subarranged either alphabetically or in the order provided in the work

     works about the work, including history, commentary, and criticism. 

 

Special classes are occasionally provided for adaptations, paraphrases, excerpts, and selections.

4.         BOOK NUMBERING SCHEMES

            Book numbering schemes, in particular, Cutter numbers, also make general provisions for classifying authors and works.  An individual Cutter number under a class number may be used to represent a single author.  Editions of works by that author may then be subarranged by the addition of letters at the end of the Cutter number designating particular works.  Subarrangements for the individual works may then be created by adding numbers or letters representing specific editions or translations of the work, or works about the work.  In the LCC, author and work arrangements are sometimes provided entirely by Cutter numbering (Comaromi 1981).

 

 

5.         AUTOMATIC GROUPING OF AUTHOR AND WORK RECORDS

            A variety of methods making use of the data in existing bibliographic records may organize records for the works of an author or the editions of a work in an online catalog automatically.  Of all of the existing data elements in bibliographic records, classification numbers may prove the most effective for this task.  Faceted classification schemes, such as UDC, that provide facet indicators with unique notation for types of works, language of text, etc., could be used to organize author and work records into groups with great accuracy.  Classifications in schemes that provide a series of numbers for particular authors and works would also provide accurate groupings automatically. 
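
As a rough illustration of how facet indicators might drive grouping, the following Python sketch extracts the edition, language, and form auxiliaries cited in section 3.3.1 from a notation string.  It is a sketch under stated assumptions only: the sample notations (with "8XX" standing in for a main number) are invented, and the simple pattern matching shown here ignores the citation order and combination rules of real UDC numbers.

```python
import re

# Labels taken from the common auxiliaries listed in section 3.3.1.
EDITION_FACETS = {
    "00": "multilingual or polyglot edition",
    "02": "original or adaptation, untranslated",
    "021": "original version",
    "025": "adapted, edited, or amended version",
    "03": "translation",
}
FORM_FACETS = {
    "084": "pictorial or graphic elements",
    "086": "physical format",
}


def facets(notation):
    """Return the display groups signalled by facet indicators in a notation."""
    found = []
    # "=" introduces the edition and language auxiliaries.
    for code in re.findall(r"=(\d+)", notation):
        if code in EDITION_FACETS:
            found.append(EDITION_FACETS[code])
        elif code[0] in "123456789":      # =1/=9: individual languages
            found.append(f"language ={code}")
    # "(0...)" introduces the form auxiliaries.
    for code in re.findall(r"\((0\d+)\)", notation):
        if code in FORM_FACETS:
            found.append(FORM_FACETS[code])
    return found


# Invented notations; "8XX" is a placeholder, not a real class number.
illustrated_translation = facets("8XX(084)=03")
original_version = facets("8XX=021")
```

Because each indicator has its own unambiguous notation, records could be assigned to display groups by pattern matching alone, without any interpretation of the main number.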

            For classifications that separate the works of an author by subject area or that separate works by a particular author from works about that author, classification numbers could be used together with author names appearing in author fields in MARC records to create displays of author records automatically.  These displays, organized in broad subject categories, could be offered as an alternative to alphabetical title listings.  For example, LCC numbers could be used for a display of Albert Schweitzer's works arranged in major subject groupings such as philosophy, religion, history, political science, music, literature, science, and medicine.  Displays constructed from DDC numbers for literary authors could organize literary works by authors into language and literary form groupings. 
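
A display of this kind could be produced along the following lines.  The sketch is hedged in several respects: the record fields (author, lcc, title) are hypothetical stand-ins for the corresponding MARC fields, the sample records and class numbers are invented, and grouping is done only on the first letter of the LCC number, so that philosophy and religion, both in class B, are not separated as they would be if subclass letters were used.

```python
from collections import defaultdict

# LCC main classes keyed by the first letter of the class number (partial list).
LCC_MAIN_CLASSES = {
    "B": "Philosophy, psychology, religion",
    "D": "World history",
    "J": "Political science",
    "M": "Music",
    "P": "Language and literature",
    "Q": "Science",
    "R": "Medicine",
}


def group_by_subject(records, author):
    """Group one author's titles under broad LCC subject headings."""
    groups = defaultdict(list)
    for rec in records:
        if rec["author"] != author:       # author name taken from the author field
            continue
        subject = LCC_MAIN_CLASSES.get(rec["lcc"][:1].upper(), "Other")
        groups[subject].append(rec["title"])
    # Sort the headings and the titles under each heading for display.
    return {subject: sorted(titles) for subject, titles in sorted(groups.items())}


# Invented records; titles and class numbers are placeholders.
records = [
    {"author": "Schweitzer, Albert", "lcc": "B 0000", "title": "Title A"},
    {"author": "Schweitzer, Albert", "lcc": "ML 0000", "title": "Title B"},
    {"author": "Schweitzer, Albert", "lcc": "R 0000", "title": "Title C"},
    {"author": "Schweitzer, Albert", "lcc": "ML 0000", "title": "Title D"},
    {"author": "Another, Author", "lcc": "Q 0000", "title": "Title E"},
]
display = group_by_subject(records, "Schweitzer, Albert")
```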

            Classification numbers in combination with book numbers may prove the most effective tool for organizing editions of works automatically in collections cataloged with Cutter numbers.  Uniform titles, which are provided in AACR2-constructed bibliographic records to identify editions of works, have not been assigned with any consistency because of their optional status in AACR2.  However, even if uniform titles are not assigned, classification numbers in combination with book numbers often identify the editions of a work uniquely and may make it possible to organize the editions of a particular work automatically.  Cutter numbers would have to be used with MARC author fields because Cutter numbers by themselves do not identify authors uniquely.  Thus a combination of classification number, Cutter number, and author name could be used to organize the works of an author and the editions of a work automatically in online catalogs.
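
The combined key proposed here can be sketched as follows.  The sketch assumes, for illustration only, a simplified Cutter number consisting of one capital letter and digits for the author, followed by optional letters designating the work, as described in section 4; actual book numbers vary considerably in form.  The field names and sample records are hypothetical.

```python
import re
from collections import defaultdict

# Assumed simplified form: author Cutter (letter + digits), then work letters.
CUTTER = re.compile(r"^([A-Z]\d+)([A-Za-z]*)$")


def work_key(rec):
    """Combine class number, Cutter number, and author name into one key."""
    match = CUTTER.match(rec["cutter"])
    if match is None:
        return None                       # book number not in the assumed form
    author_cutter, work_mark = match.groups()
    return (rec["class_number"], author_cutter, rec["author"], work_mark)


def group_editions(records):
    """Group record ids that share the same work key."""
    works = defaultdict(list)
    for rec in records:
        key = work_key(rec)
        if key is not None:
            works[key].append(rec["id"])
    return dict(works)


# Invented records.  Record 4 shares a Cutter number with records 1 and 2
# but has a different author, so the author name keeps it in a separate group.
records = [
    {"id": 1, "class_number": "823.8", "cutter": "D55c", "author": "Dickens, Charles"},
    {"id": 2, "class_number": "823.8", "cutter": "D55c", "author": "Dickens, Charles"},
    {"id": 3, "class_number": "823.8", "cutter": "D55o", "author": "Dickens, Charles"},
    {"id": 4, "class_number": "823.8", "cutter": "D55c", "author": "Dixon, Ann"},
]
groups = group_editions(records)
```

Dropping the work letters from the key would give the coarser grouping needed for a display of all the works of an author.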

            Holdings information included in bibliographic records may also be used to help organize records automatically.  Physical arrangements in libraries are noted frequently in location information in holdings fields, so that age and format groupings may be made automatically in libraries whose holdings information reflects these characteristics.

 

6.         CONCLUSION

            The power and versatility of classification have the potential to be invaluable assets in the improvement of IRSs.  That potential has not yet been realized.  As Bryce Allen said in his recent work, Toward a User-Centered Approach to Information Systems, "One feature of information devices that has not always been used to its best advantage in meeting ... information need is the classified arrangement of knowledge" (1996: 159).  This is particularly true of classification's role in meeting information need with respect to authors and works.  At present, the disorganization of current online catalog displays may cause users to abandon their searches, leaving the catalog frustrated and confused.  Even experienced catalog users may be thwarted in their searches for particular authors and works by the lack of organization that exemplifies current online catalog displays.  Classification may have as critical a role in the task of improving catalog displays as any other aspect of library and information science. 

 

 

7.         Works Cited

 

Allen, Bryce  (1996).  Toward a user-centered approach to information systems.  San Diego:  Academic Press

 

Ayres, F. H.  (1990).  'Duplicates and other manifestations:  a new approach to the presentation of bibliographic information',  Journal of Librarianship  22:4 (1990), pp.  236-251

 

Ayres, F.H., Nielsen, L.P.S., Ridley, M.J., and Torsun, I.S. (1995). The Bradford OPAC:  A new concept in bibliographic control.  British Library R & D Report 6183.  West Yorkshire:  British Library Research and Development Department.

 

Carlyle, Allyson  (1996).  'Ordering author and work records:  an evaluation of collocation in online catalog displays',  Journal of the American Society for Information Science  47:7 (1996), pp.  538-554

 

Carlyle, Allyson  (1997).  'Fulfilling the second objective in the online catalog:  schemes for organizing author and work records into usable displays',  Library Resources & Technical Services  41:2 (1997),  pp. 79-100

 

Comaromi, John P.  (1981).  Book numbers:  a historical study and practical guide to their use.  Littleton, Colorado:  Libraries Unlimited

 

Fattahi, Rahmatollah  (1996).  'Super records:  an approach toward the description of works appearing in various manifestations',  Library Review  45:4 (1996), pp.  19-29

 

Larson, Ray R.  (1991).  'Classification clustering, probabilistic information retrieval, and the online catalog',  Library Quarterly  61:2 (1991), pp.  133-173

 

Lin, Xia  (1997).  'Map displays for information retrieval',  Journal of the American Society for Information Science  48:1 (1997), pp. 40-54

 

Rowley, Jennifer  (1992).  Organizing knowledge:  an introduction to information retrieval.  2nd ed.  Hants, England:  Ashgate

 

Wiberley, S. E., Daugherty, R. A., & Danowski, J. A. (1995).  'Displaying online catalog postings:  LUIS',  Library Resources & Technical Services  39:3 (1995), pp. 247-264

 

 

 

 

Publication Information:

"The Role of Classification in the Creation of Author and Work Displays in Online Catalogs." In Knowledge Organization for Information Retrieval, Proceedings of the Sixth International Study Conference on Classification Research, Held at University College London, 16-18 June 1997. The Hague, Netherlands: International Federation for Information and Documentation, 1997: 90-96.