CREATING EFFICIENT AND SYSTEMATIC CATALOGS: LUBETZKY'S SECOND OBJECTIVE AND EMPIRICAL INVESTIGATIONS OF AUTHORS AND WORKS

 

by

ALLYSON CARLYLE

 

 

 

 

1.         Introduction

 

The intellectual challenge stimulated by the study of descriptive cataloging is matched by few topics in library and information science.  As a student entering library school at UCLA in 1982, I had no idea that cataloging would provide such a rich area of study.  I was, however, quickly enlightened, and my life changed as a result.  Seymour Lubetzky was largely responsible for this change:  it was his work, which has provided the conceptual foundation for the study of descriptive cataloging in this century, that first fascinated me.  I was fortunate to have been introduced to descriptive cataloging by Betty Baughman, who worked with Seymour Lubetzky in developing the cataloging courses at UCLA.  The combination of her excellent teaching and the challenges posed by Lubetzky's analysis of cataloging problems drew me and my research to the heart of descriptive cataloging.

 

Central to Lubetzky's thought is the notion of a catalog as "a systematically designed instrument in which all entries, as component parts, must be properly integrated" (Lubetzky, 1969: p. 3).  One important means by which a catalog becomes such an instrument is by meeting the second objective of the catalog.  The second objective, most clearly articulated by Lubetzky in his Code of Cataloging Rules, Author and Title Entry:  An Unfinished Draft, states that a catalog must "relate and display together the editions which a library has of a given work and the works which it has of a given author" (Lubetzky, 1960: p. ix).

 

In this paper I briefly review my research, which focuses on the second objective and, in particular, on the organization of author and work records in online catalogs.  My dissertation (1994), published in summary form in the Journal of the American Society for Information Science (1996) as "Ordering Author and Work Records:  An Evaluation of Collocation in Online Catalog Displays," examines the effect of various system features on the collocation of author and work records in online catalogs.  "Fulfilling the Second Objective in the Online Catalog:  Schemes for Organizing Author and Work Records into Usable Displays," published in Library Resources & Technical Services (1997), investigates how codes of filing rules and Barbara Tillett's bibliographic relationship taxonomy (1991) may be used to organize author and work records more effectively in online catalogs.  "The Role of Classification in the Creation of Author and Work Displays in Online Catalogues," delivered at the Sixth International Study Conference on Classification Research (1997), investigates the methods by which library classification schemes have organized author and work records.  My current research, "User Categorisation of Works:  Toward Improved Organisation of Online Catalogue Displays" (in press), examines the characteristics people use to group the editions and works related to a particular work.

 

 

2.         Ordering Author and Work Records:  An Evaluation of Collocation in Online Catalog Displays

 

2.1       Introduction

Computerization has vastly expanded the catalog's power to retrieve records.  It has, at the same time, confounded catalog designers' attempts to create sensible displays.  In my search for a dissertation topic, I was intrigued that no study had ever been done to determine how well a catalog of any kind fulfilled the second objective.  I also suspected that the computerization of catalogs had an effect on the ability of catalogs to fulfill the second objective.  As a result, I decided that my dissertation research would consist of a survey of online catalogs posing the question:  What is the effect of online catalog features on the collocation of author and work records in online catalogs?

 

Several measures were used to analyze the effect of online catalog system features on collocation in online catalogs.  Five worst-case authors (Homer, William James, H.D., Alice Walker, and Peter Gray) were searched in eighteen sample online catalogs using author commands available in those catalogs.  Five worst-case works (Paradise Lost, John Milton; A Christmas Carol, Charles Dickens; Ulysses, James Joyce; Sonnets, William Shakespeare; Utopia, Sir Thomas More) were searched in the same catalogs using title commands.  The worst-case method was used because worst cases were seen as more likely to bring out the weaknesses of online catalogs than a random sample of searches. 

 

Dependent variables used to measure the collocation of author and work records included the number of interruptions of author and work record sets, the number of irrelevant intervening records retrieved, and precision.  Independent variables representing system features analyzed for their effects on collocation included match type (character-string vs. keyword) and catalog size (large, medium, or small).
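A short Python sketch can make the three dependent variables concrete. The operational definitions below are simplified assumptions for illustration and may differ from those used in the study:

- A display is modeled as the list of retrieved records in display order. Each record is flagged as belonging (True) or not belonging (False) to the target author or work record set.
- An interruption is counted as a run of non-member records lying between the first and last members of the set.
- Irrelevant intervening records are the non-members in that stretch.
- Precision is the number of set members retrieved divided by the total number of records retrieved.

The function name collocation_measures is hypothetical.

```python
from typing import Dict, Sequence


def collocation_measures(display: Sequence[bool]) -> Dict[str, float]:
    """Simplified, assumed versions of three collocation measures.

    `display` lists retrieved records in display order; True marks a record
    that belongs to the target author or work record set.
    """
    members = [i for i, in_set in enumerate(display) if in_set]
    if not members:  # nothing from the set was retrieved
        return {"interruptions": 0, "irrelevant_intervening": 0, "precision": 0.0}

    # Stretch of the display from the first to the last member of the set.
    span = list(display[members[0]:members[-1] + 1])

    # Non-member records falling inside that stretch.
    irrelevant_intervening = span.count(False)

    # Each run of non-members begins where a member is followed by a
    # non-member, so counting those transitions counts the interruptions.
    interruptions = sum(1 for prev, cur in zip(span, span[1:]) if prev and not cur)

    precision = len(members) / len(display)
    return {
        "interruptions": interruptions,
        "irrelevant_intervening": irrelevant_intervening,
        "precision": precision,
    }


if __name__ == "__main__":
    # Hypothetical eight-record display: four members, four non-members.
    example = [True, True, False, True, False, False, True, False]
    print(collocation_measures(example))
```

For the example display, the sketch reports two interruptions, three irrelevant intervening records, and a precision of 0.5. The final non-member falls after the last member of the set, so it lowers precision but is not counted as intervening.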

 

The authors and works selected for sample searches, as mentioned above, were examples of worst cases.  Worst cases were defined as authors and works that had many records in online catalogs and that were represented by a variety of record types.  Sample catalogs were selected based on catalog vendor (major vendors were selected), availability via the Internet, size (three sizes of catalog per vendor were selected:  small, medium, and large), and collection characteristics (retrospective conversion to the online catalog at least 75 percent complete; English-language, general library collection; located in the United States).

 

2.2       Selected Results and Discussion

Descriptive statistics were used to analyze the data collected in this study because the selection of worst-case searches and sample catalogs was not random.  Because sample sizes were relatively small and many of the standard deviations were large, the median was reported instead of the mean.
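A small set of invented numbers illustrates the reasoning behind reporting the median (these are not data from the study). When a single search produces an extreme value, as in the long upper tail of the keyword author searches in Figure 1, the mean is pulled well away from the typical case while the median is not:

```python
import statistics

# Hypothetical interruption counts for five searches; the last is an outlier.
interruptions = [1, 2, 2, 3, 110]

mean = statistics.mean(interruptions)      # pulled upward by the single outlier
median = statistics.median(interruptions)  # stays at the middle value
print(f"mean = {mean}, median = {median}")
```

Here the mean is 23.6 interruptions, although four of the five hypothetical searches had three or fewer. The median of 2 describes the typical search. The same pattern appears in Figure 1, where keyword author searches had a mean of 17.15 interruptions but a median of 3.50.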

 

Match type, that is, left-to-right phrase matching (here called character-string matching) versus keyword matching, was investigated for its effect on collocation.  Character-string matches performed better than keyword matches for most searches.  When author searches were measured by number of interruptions, the results were dramatic (Figure 1).  Only ten percent of character-string matches had three or more interruptions, compared with 58 percent of keyword matches.  The median number of interruptions for keyword matches (3.5) was greater than the median for character-string matches (2.0).  That most character-string matches had two or fewer interruptions indicated that in most cases character-string matches accomplished the goal of collocating author record sets.

 

 

[Figure 1:  Match Type (Authors):  No. of Interruptions about here]

 

 

The effect of match type on number of interruptions in work record sets was less clear, although character-string matches still performed better than keyword matches (Figure 2).  Seventy percent of character-string matches for work record sets had three or more interruptions, as opposed to 83 percent of keyword matches.  The median number of interruptions for character-string matches was four and keyword, five. 

 

 

[Figure 2:  Match Type (Works):  No. of Interruptions about here]

 

 

 

These results were not unexpected, particularly with respect to author record sets.  Character-string matches matched a search string in a single field.  Most character-string matches arranged records with identical author headings together, ensuring collocation of author record sets.  Although author commands with character-string matches were available in every catalog surveyed, author commands with keyword matches were not.  This was perhaps because systems designers assumed the superiority of character-string matches for searching for individual authors.  The results of this research support such an assumption.
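The contrast between the two match types can be sketched in Python. This is a deliberately simplified model under two assumptions, both following the description above:

- Character-string matches are left-anchored on a single heading and displayed alphabetically by heading.
- Keyword matches accept any heading containing all the query words and display the hits in record-number order.

The miniature catalog, the Record type, and the function names are hypothetical. Actual online catalogs differ in detail.

```python
from dataclasses import dataclass


@dataclass
class Record:
    number: int    # record number (roughly, the order records were added)
    heading: str   # author heading as it appears in the record


def _words(text):
    """Lower-case the text and split it into words, ignoring commas."""
    return text.lower().replace(",", " ").split()


def character_string_match(records, query):
    """Left-anchored match on the whole heading, displayed in heading order."""
    prefix = query.lower()
    hits = [r for r in records if r.heading.lower().startswith(prefix)]
    return sorted(hits, key=lambda r: (r.heading.lower(), r.number))


def keyword_match(records, query):
    """All query words anywhere in the heading, displayed in record-number order."""
    wanted = set(_words(query))
    hits = [r for r in records if wanted <= set(_words(r.heading))]
    return sorted(hits, key=lambda r: r.number)


# Invented miniature catalog for a search on William James.
CATALOG = [
    Record(1, "James, William, 1842-1910"),
    Record(2, "Sidis, William James, 1898-1944"),
    Record(3, "James, William, 1842-1910"),
    Record(4, "Jones, Henry"),
    Record(5, "James, William, 1842-1910"),
]

if __name__ == "__main__":
    print([r.number for r in character_string_match(CATALOG, "James, William")])
    print([r.number for r in keyword_match(CATALOG, "James, William")])
```

The character-string search returns records 1, 3, and 5 as one unbroken set. The keyword search returns records 1, 2, 3, and 5, and the record for William James Sidis interrupts the William James record set.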

 

Character-string matches performed nearly as poorly as keyword matches in collocating worst-case work record sets.  Some weakness was to be expected:  because uniform titles for works are not required in AACR2, it was not surprising that work records did not collocate well.  Also, a work is, by definition, identified by the contents of two fields, an author field and a title field, whereas an author is identified by the contents of a single field, the author field.  The additional complexity introduced by the title field may itself have affected record arrangement; this variable, record structure, was not studied.  What was unexpected was that character-string matches differed so little from keyword matches in achieving collocation, especially considering that keyword matches often arranged records in essentially random (record-number) order, while character-string title matches almost always arranged records in alphabetical order by title.

 

Catalogs searched were selected to be representative of catalog databases of different sizes.  Small catalogs contained fewer than 300,000 bibliographic records, medium catalogs contained between 300,000 and 999,999 bibliographic records, and large catalogs contained more than 1,000,000 records.

 

Catalog size had a much smaller effect on the collocation of author and work records than might have been expected; only about half of the results in the study showed an impact.  Catalog size had the smallest effect on the collocation of author record sets.  When collocation was measured by the number of irrelevant records intervening in an author record set (no. irrelevant intervening records), very little effect was seen; the median was zero for catalogs of all three sizes (Figure 3).  Although catalog size had some effect on the collocation of work record sets, that effect was negligible when measured by precision (Figure 4).

 

 

[Figure 3:  Catalog Size (Authors):  No. Irrelevant Intervening Records about here]

 

 

[Figure 4:  Catalog Size (Works):  Precision about here]

 

 

The finding that catalog size affected collocation in only about half the searches performed was surprising for two reasons.  First, two of the measures used in the study, number of interruptions and number of irrelevant intervening records, were directly related to the number of records retrieved.  One would expect increasing numbers of records to be retrieved in small, medium, and large catalogs, respectively, and measures based on record counts to reflect that increase.  Second, it is reasonable to expect that the number of interruptions would increase as the number of records retrieved increases, and that precision would be lower in a large catalog search than in a small catalog search.

 

That author record sets were little affected by catalog size was perhaps not so surprising when one examines the performance of the system variables.  For author searches, the variables that determined collocation most strongly were those associated with match type; character-string matches collocated author records more successfully than did keyword matches.  Since catalogs of all sizes had character-string and keyword matches, one might predict that catalog size would not be an important factor.

 

The finding that catalog size did not have a powerful influence on collocation has implications for catalog maintenance and cataloging policies in small and medium-sized catalogs.  Cataloging folklore holds that collocation is better in smaller catalogs because fewer records exist to interrupt a related record set.  For example, two rules in Anglo-American Cataloguing Rules, 2nd ed., 1988 Revision (AACR2) are evidence that smaller catalogs have been seen as having different requirements from larger catalogs.  Rule 1.0D provides options for three different levels of description, and Rule 25.1A allows catalogers the option not to use uniform titles.  The findings of this research indicate that this assumption may be incorrect.  Smaller catalogs may not be exempt from collocation problems, particularly for worst cases.  They may require the use of uniform author names and uniform titles as much as larger catalogs do.

 

 

3.         Fulfilling the Second Objective in the Online Catalog:  Schemes for Organizing Author and Work Records Into Usable Displays

 

3.1       Introduction

 

A study of filing rules as schemes for display came about as a result of both my dissertation research and Lubetzky's perception of the importance of the catalog as a "systematically designed instrument."  I had attempted a study of filing rules for my doctoral qualifying exam, which, for various reasons, did not work out.  However, an initial analysis of filing rules informed my thinking as I collected data for my dissertation, which in turn informed my thinking further with respect to filing rules.  What followed from this interaction is the research described in this section.

 

3.2       Filing Rules as Schemes for Display

 

Catalog displays as constructed by codes of filing rules are frequently highly organized, consisting of a variety of categories or groups of similar records.  A historical analysis of codes of filing rules identified the following common categories:

 

Work Categories

     editions of the work in the original language

     analytics, that is, editions of the work contained within collections

     translations

     special classes of materials, including selections and manuscripts

     works about the work

 

Author Categories

     complete works

     selected works

     selections from a single work or from various works

     single works

     spurious and doubtful works

     works about the author

 

While these categories may be used to create systematic displays in online catalogs, they fall short with respect to works, particularly when viewed in the light of Tillett's bibliographic relationship theory (1991).  They do not, for instance, clearly distinguish or identify works related to a work, nor do they clearly distinguish or identify sequential relationships.

 

3.3       Tillett's Taxonomy of Bibliographic Relationships as a Scheme for Display

 

Tillett's taxonomy of bibliographic relationships (1991) and Smiraglia's refinement of the derivative relationship (1992) were analyzed for their potential contribution to the creation of systematic displays of work records in the catalog.  The analysis suggested the following interpretation of the bibliographic relationships taxonomy to be used as a basis of a scheme for display organization:

 

     equivalence relationships, including:

           equivalent texts, which share identical content and authorship    

           near equivalents, which, in addition to identical content and authorship, share other characteristics as well

     derivative relationships, including:

           revisions          

           adaptations

           translations

           extractions

           amplifications

     whole-part relationships

     sequential relationships

     descriptive relationships

     shared characteristic relationships

 

Using this scheme as a basis for online catalog work displays also has limitations, particularly the lack of a distinction between derivations whose intellectual or artistic content is close to that of the original edition and those whose content is not.

 

3.4       A Relationship-Based, Organized Scheme for Display of Author and Work Records in the Online Catalog

 

Analysis of the strengths of the filing rules scheme and the bibliographic relationships scheme led to the proposal of a new, organized scheme, based on relationships among items, for the display of author and work records in online catalogs.  The proposed scheme also incorporated records retrieved in keyword searching that might or might not be related to the author or work searched, including records for items that might be only peripherally related to them (see Figures 5 and 6).

 

 

 

[Figure 5:  An Organized Display for Works about here]

 

 

 

[Figure 6:  An Organized Display for Authors about here]
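The Python sketch below shows one way such an organized display might be generated. It assumes that each retrieved record has already been assigned one of the display categories, for example from relationship data in the bibliographic record; how that assignment is made is outside the sketch. The category list condenses the work display in Figure 5. The function name organize_display, the fallback rule for unrecognized categories, and the sample titles are hypothetical.

```python
from collections import defaultdict

# Category headings condensed from the organized work display (Figure 5),
# in display order.
DISPLAY_ORDER = [
    "Editions",
    "Revisions, updated editions",
    "Translations",
    "Parts, selections",
    "Adaptations & related works",
    "Works about the work",
    "Items probably related",
    "Items that may or may not be related",
    "Other works by the author",
]
FALLBACK = "Items that may or may not be related"


def organize_display(records):
    """Arrange (title, category) pairs into an ordered, grouped display.

    Records whose category is not in DISPLAY_ORDER go to the fallback
    group; empty groups are left out of the display.
    """
    groups = defaultdict(list)
    for title, category in records:
        groups[category if category in DISPLAY_ORDER else FALLBACK].append(title)

    lines = []
    for category in DISPLAY_ORDER:
        if groups[category]:
            lines.append(category)
            lines.extend("    " + title for title in sorted(groups[category]))
    return lines


if __name__ == "__main__":
    # Hypothetical keyword-search results for A Christmas Carol.
    results = [
        ("A study of A Christmas Carol", "Works about the work"),
        ("A Christmas Carol (1990 edition)", "Editions"),
        ("A Christmas Carol (Spanish translation)", "Translations"),
        ("Christmas carols for piano", "unknown"),
        ("A Christmas Carol (animated video)", "Adaptations & related works"),
    ]
    print("\n".join(organize_display(results)))
```

With the sample results, the display lists editions first, then the translation, the adaptation, and the work about the work. The piano book, with an unrecognized category, goes to the final group of items that may or may not be related. Fixing the order in a single list keeps the display systematic regardless of the order in which records are retrieved.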

 

 

 

4.         The Role of Classification in the Creation of Author and Work Displays in Online Catalogues

 

Other organizational schemes that could be used to improve online catalog displays for author and work records are library classifications.  Library classification schemes such as the Universal Decimal Classification (UDC), the Library of Congress Classification (LCC) and the Dewey Decimal Classification (DDC) organize authors and works associated with many items into specific classes, each with its own notation.  In this research I analyzed the types of classes used in selected religion and literature schedules and in auxiliary tables in the UDC, the DDC, and the LCC. 

 

Classes identified correspond closely to the types of groupings created by the codes of catalog filing rules.  Commonly occurring classes for authors in the classification schemes included:

 

     complete works of the author

     partial collections or selected works

     individual works

     biographies, criticism, concordances, etc.

 

Commonly occurring classes for works in the classification schemes included:

 

     editions in the original language, sometimes including groups for early versions, translations, annotated editions, and sequels

     translated editions, sometimes including a special group for bilingual editions

     auxiliary materials, including concordances, indexes, dictionaries, sources

     parts or selections

     adaptations, paraphrases

     works about the work, including history, commentary, criticism, etc.

 

Classification numbers, possibly in combination with book numbers such as Cutter numbers, which further refine the groupings of authors and works on the shelf, might be used to create summary or grouped author and work displays automatically in online catalogs.  Further research is necessary to determine whether or not automatic grouping would organize records successfully.
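A minimal Python sketch of automatic grouping by classification number follows. It rests on two assumptions:

- Each record carries a call number whose first element is a class number.
- A table maps class numbers to the types of classes listed above.

The class numbers beginning with X are invented placeholders and do not reproduce any actual UDC, DDC, or LCC notation. The function name group_by_class is also hypothetical.

```python
# Invented class numbers standing in for an author's span in a schedule.
HYPOTHETICAL_AUTHOR_CLASSES = {
    "X100": "Complete works",
    "X101": "Partial collections, selected works",
    "X102": "Individual works",
    "X109": "Biography, criticism, concordances",
}


def group_by_class(call_numbers):
    """Group call numbers under the class their class number belongs to.

    A call number is assumed to look like 'CLASS BOOKNUMBER ...'; only the
    class part is used for grouping. Plain string sorting stands in for
    true shelf order, which real call numbers would need a dedicated
    sorting routine to achieve.
    """
    groups = {}
    for call_number in sorted(call_numbers):
        class_number = call_number.split()[0]
        label = HYPOTHETICAL_AUTHOR_CLASSES.get(class_number, "Other")
        groups.setdefault(label, []).append(call_number)
    return groups


if __name__ == "__main__":
    holdings = ["X102 .C5 1995", "X100 1980", "X109 .S6", "X102 .A7 2001"]
    for label, items in group_by_class(holdings).items():
        print(label, items)
```

With the sample holdings, the two individual works fall under a single heading in book-number order. A fuller sketch could also use the book numbers to sub-group editions of each individual work, in the way book numbers refine groupings on the shelf.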

 

 

5.         User Categorisation of Works:  Toward Improved Organisation of Online Catalogue Displays

 

The design of online information systems, including online catalogs, should respond effectively to user needs and searching behavior.  The last research project I review here investigated how people organize items related to a work.  In this project, fifty study participants were recruited in a shopping mall in Akron, Ohio, and asked to divide 47 editions and works related to Charles Dickens' A Christmas Carol into groups.  Items in the study included hardcover and paperback versions, translations, children's versions (including picture book versions), videorecordings of motion pictures and animated film versions, sound recordings, a trivia book, and an advent calendar.  Participants were asked to sort the items into groups based on the items' similarity to each other and on how well the groups would help them find the items at a later time.  Any number of groups was allowed, as long as there was more than one.  When participants had finished the sorting task, they were asked to name and describe in writing each of the groups they had created.  The project produced two types of data:  written descriptions and grouping data.

 

Written descriptions were analyzed using content analysis to discover the types of characteristics that were used to sort items in the study.  Eleven types of characteristics were discovered.  In the list below, types of characteristics are listed with sample participant descriptions in parentheses following each type.

 

     physical format  (hard back books, VCR tapes, little kid tapes)

     audience  (youth, sight impaired, grown up people, piano players)

     content description  (play, more involved plots with more details,

            short version)

     pictorial elements  (animated, cartoon pictorial, had a mans face

            on the front, color artwork, dull covers)

     usage  (could be read by small group for presentation, theater,

            for relaxation, fun, dull)

     language  (foreign language, Spanish, non-English)

     physical characteristics  (medium size, largest books, thick hard

            bind)

     content age, integrity  (unabridged, abbreviated versions,

            classic, original text-line)

     textual characteristics  (big print, book [sic.] that say

            Scrooge on them)

     creator, performer  (produced other than Charles Dickens,

            Disney type story,  adapted by other author's take from original)

     'odds & ends'  (alone!, miscellaneous, other)

 

The grouping data were analyzed using cluster analysis (this part of the study has not yet been published).  Preliminary analysis indicated the common groups listed below: 

 

     audios (cassettes and CDs)        

     children's videos                         

     adult videos                                 

     large format paperbacks            

     small format paperbacks            

     foreign language materials         

     adult hard cover materials

     illustrated hard cover materials (children's)

     trivia book

     picture book versions with lots of text

     picture book versions with not much text

     activity versions (piano book, Advent calendar)

     item about A Christmas Carol
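The paper does not specify the clustering procedure used. One common way to analyze sorting data of this kind is assumed here for illustration. First, compute for each pair of items the proportion of participants who placed the two items in the same group. Then merge items by agglomerative clustering with average linkage. The Python sketch below does this without external libraries. The items, the three participants' sorts, the 0.5 merging threshold, and the function names are all hypothetical.

```python
from itertools import combinations


def co_occurrence(sorts, items):
    """Proportion of participants who put each pair of items in the same group.

    `sorts` holds one entry per participant: a list of groups (sets of items).
    Keys are alphabetically ordered item pairs.
    """
    counts = {pair: 0 for pair in combinations(sorted(items), 2)}
    for groups in sorts:
        for group in groups:
            for pair in combinations(sorted(group), 2):
                counts[pair] += 1
    return {pair: n / len(sorts) for pair, n in counts.items()}


def pair_similarity(sim, a, b):
    """Look up the similarity of two items regardless of argument order."""
    return sim[(a, b) if a < b else (b, a)]


def average_linkage(items, sim, threshold):
    """Agglomerative clustering with average linkage.

    Repeatedly merges the two clusters with the highest mean pairwise
    similarity, stopping when no pair of clusters reaches `threshold`.
    """
    clusters = [frozenset([item]) for item in sorted(items)]
    while len(clusters) > 1:
        best_pair, best_score = None, -1.0
        for c1, c2 in combinations(clusters, 2):
            total = sum(pair_similarity(sim, a, b) for a in c1 for b in c2)
            score = total / (len(c1) * len(c2))
            if score > best_score:
                best_pair, best_score = (c1, c2), score
        if best_score < threshold:
            break
        c1, c2 = best_pair
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


if __name__ == "__main__":
    items = ["animated video", "audio cassette", "audio cd",
             "film video", "spanish edition"]
    sorts = [  # three hypothetical participants
        [{"audio cd", "audio cassette"}, {"animated video", "film video"},
         {"spanish edition"}],
        [{"audio cd", "audio cassette"}, {"animated video"},
         {"film video", "spanish edition"}],
        [{"audio cd", "audio cassette", "spanish edition"},
         {"animated video", "film video"}],
    ]
    sim = co_occurrence(sorts, items)
    for cluster in average_linkage(items, sim, threshold=0.5):
        print(sorted(cluster))
```

With these sorts, the two audio formats form one cluster and the two videos another. The foreign-language edition, which different participants grouped with different items, remains on its own. Lowering the threshold would merge it into a larger cluster, so the threshold controls how coarse the resulting display groups are.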

 

The research described here is exploratory; the findings indicate possible types of groups and characteristics that may be useful for organizing online catalog displays.  Every work is unique, and the set of editions and works related to a particular work is equally unique.  Further research using different works, and different types of works (for example, non-literary works), is necessary before generalizations can be made about which groups and types of characteristics are commonly associated with people's perceptions of works.  In addition, further research is required to determine the impact of grouping in online catalog displays on user searching.  It should also test whether grouping based on user categories is more effective than grouping based purely on relationships among items, such as the grouping suggested by the filing rules and bibliographic relationships taxonomy analysis.

 

 

6.         Conclusion

 

The extent to which a library catalog fulfills the second objective affects catalog users every day.  During my doctoral program at UCLA, I supported myself by working as a librarian at the Beverly Hills Public Library.  One day at the information desk I received a long-distance telephone call from a woman in Arizona who was looking for an edition of The Haunted Pool by George Sand.  She had telephoned four libraries in two states and reported that none of them had an edition of the work she was seeking.  My cataloging experience told me that this was unlikely, so I carefully browsed the titles by Sand in the Beverly Hills catalog and found several records for editions entitled The Devil's Pool.  After looking at the MARC record for one of these editions, I explained that the work she was seeking, La Mare au Diable, was most often translated into English as The Devil's Pool.  I suggested that she phone her own public library again and ask for that title instead of The Haunted Pool.

 

I submit that it should not take five telephone calls to libraries in two states and a cataloger at the information desk for a patron to find an edition of a work held commonly in American libraries.  The catalog should be, as the Paris Principles state, "an efficient instrument for ascertaining ... which works by a particular author and which editions of a particular work are in the library." It should not be a roadblock, preventing users and librarians alike from finding the items they seek.  Lubetzky's life's work was to make library catalogs efficient and systematically designed instruments; unfortunately, as the George Sand story illustrates all too clearly, much has yet to be done to accomplish this task.

 

What stimulates Lubetzky's work also inspires my own:  a recognition that library catalogs can be effective and intelligible instruments.  Such catalogs help users discover valuable resources they may not have known about before, and they show the relationships among the items held in the library clearly and unambiguously.  I am grateful for Lubetzky's invaluable contribution to the literature and theory of cataloging.  It sparked my interest in a topic that has become the center of my professional life, and it remains a source of inspiration and guidance.  I am proud to follow in the footsteps of the most important cataloging scholar of this century.

 

 

7.         List of Cited Works

 

Anglo-American Cataloguing Rules, 2nd ed., 1988 Revision (1988).  Ottawa:  Canadian Library Association.

 

Carlyle, Allyson (1996).  "Ordering Author and Work Records:  An Evaluation of Collocation in Online Catalog Displays." Journal of the American Society for Information Science.  47 (7):  538-554.

 

----- (1997). "Fulfilling the Second Objective in the Online Catalog:  Schemes for Organizing Author and Work Records into Usable Displays."  Library Resources & Technical Services.  41 (2):  79-100.

 

----- (1997).  "The Role of Classification in the Creation of Author and Work Displays in Online Catalogues."  In Knowledge Organization for Information Retrieval:  Proceedings of the Sixth International Study Conference on Classification Research, held at University College London, 16-18 June 1997.  The Hague, Netherlands:  International Federation for Information and Documentation (FID):  90-96.

 

----- (In press).  "User Categorisation of Works:  Toward Improved Organisation of Online Catalogue Displays." Journal of Documentation.

 

Lubetzky, Seymour (1960).  Code of Cataloging Rules, Author and Title Entry:  An Unfinished Draft for a New Edition of Cataloging Rules Prepared for the Catalog Code Revision Committee.  [S.l.]:  American Library Association (March).

 

----- (1969).  Principles of Cataloging, Final Report, Phase I:  Descriptive Cataloging.  Los Angeles:  Institute of Library Research, University of California.

 

Smiraglia, Richard (1992).  Authority Control and the Extent of Derivative Bibliographic Relationships.  Ph.D. diss., University of Chicago.

 

Tillett, Barbara B. (1991).  "A Taxonomy of Bibliographic Relationships." Library Resources & Technical Services. 35 (2):  150-158.


 

 

Distribution of Match Type (Authors):  No. of Interruptions

Interruptions    Char. String   Percent   Keyword   Percent
0 to 2                     81       90%        26       42%
3 to 5                      7        8%         8       13%
6 to 8                      2        2%         4        6%
9 to 11                     0        0%         0        0%
12 up                       0        0%        24       39%

Statistics for Match Type (Authors):  No. of Interruptions

Statistic         Char. String   Keyword
Mean                      1.71     17.15
Standard Error             .12      2.91
Median                    2.00      3.50
Mode                      2.00      2.00
Standard Dev.             1.18     22.94
Variance                  1.40    526.03
Range                     7.00    110.00

Figure 1--Match Type (Authors):  No. of Interruptions

 


 

 

 

Distribution of Match Type (Works):  No. of Interruptions

Interruptions    Char. String   Percent   Keyword   Percent
0 to 2                     27       30%        15       17%
3 to 5                     36       40%        37       41%
6 to 8                     18       20%        17       19%
9 to 11                     6        7%         8        9%
12 up                       3        3%        13       14%

Statistics for Match Type (Works):  No. of Interruptions

Statistic         Char. String   Keyword
Mean                      4.67      6.49
Standard Error            0.36      0.60
Median                    4.00      5.00
Mode                      2.00      3.00
Standard Dev.             3.42      5.73
Variance                 11.66     32.81
Range                    19.00     36.00

Figure 2--Match Type (Works):  No. of Interruptions

 


 

 

Distribution of Catalog Size (Authors):  No. Irrelevant Intervening Records

No. of Records      Small  Percent   Medium  Percent    Large  Percent
0 - 9                  40      84%       38      78%       38      70%
10 - 19                 0       0%        0       0%        4       7%
20 - 29                 0       0%        0       0%        0       0%
30 - 39                 0       0%        1       2%        0       0%
40 - 49                 0       0%        1       2%        1       2%
50 up                   8      16%        9      18%       11      20%

Statistics for Catalog Size (Authors):  No. Irrelevant Intervening Records

Statistic             Small     Medium      Large
Mean                  26.96      84.98     187.46
Standard Error        10.10      31.42      65.08
Median                 0.00       0.00       0.00
Mode                   0.00       0.00       0.00
Standard Dev.         70.69     219.93     478.24
Variance            4996.91   48369.94  228717.61
Range                366.00     956.00    2101.00

Figure 3--Catalog Size (Authors):  No. Irrelevant Intervening Records


 

 

Distribution of Catalog Size (Works):  Precision

Precision           Small  Percent   Medium  Percent    Large  Percent
.8 - 1                  2       3%        0       0%        0       0%
.6 - .79                2       3%        2       3%        0       0%
.4 - .59                5       8%        3       5%        6      10%
.2 - .39               13      22%       13      22%       20      33%
0 - .19                38      63%       42      70%       34      57%

Statistics for Catalog Size (Works):  Precision

Statistic             Small     Medium      Large
Mean                    .22        .17        .19
Standard Error          .03        .02        .02
Median                  .14        .12        .17
Mode                    .17        .13        .17
Standard Dev.           .22        .16        .14
Variance                .05        .03        .02
Range                   .99        .74        .51

Figure 4--Catalog Size (Works):  Precision


 

 

 

WORK NAME / AUTHOR NAME

 

            Editions:

                       Books   

                       Recordings 

                       Large print, Braille, ...

                       Work Name published with other works

 

                       Revisions, updated editions

                       Translations

 

                       Parts, selections, ...

 

      Adaptations & Related Works

                       Abridgements, simplified versions, summaries

                       Sequels, supplements

                       Videos, motion pictures

                       Musical versions

                       Pictures or other images

                       Multimedia, computer versions

                       Indexes, concordances

                       Miscellaneous

 

      Works about Work Name

 

      Items probably related to Work Name

 

      Items that may or may not be related to Work Name

 

      Other works by Author Name

 

 

 

Figure 5--An Organized Display for Works

 


 

 

 

AUTHOR NAME

 

            Single Works:

                       Work names A - H        

                       Work names I - O         

                       Work names P - Z        

 

            Collected Works

 

            Selections from Author Name's works

 

            Spurious and doubtful works

 

            Works about Author Name

 

 

            Items probably related to Author Name

 

            Items that may or may not be related to Author Name

 

            Works by the same/related author:   Author Name 2

 

 

 

Figure 6--An Organized Display for Authors

 

 

Reproduced by permission from The Future of Cataloging: Insights from the Lubetzky Symposium, edited by Tschera Harkness Connell and Robert L. Maxwell.

© 2000 by The American Library Association.