The library community has a long history of working to achieve standardization in various aspects of bibliographic control, including cataloging rules, filing rules, and, in the past, even the size of the catalog card. With the computerization of bibliographic control, this tradition has continued and has been applied to such things as formats, character sets, and sorting rules. Indeed, the potential for sharing bibliographic data that computerization makes possible is an even more compelling reason for standardization.
One aspect of computerization in which standardization plays an important role, particularly in a shared environment, but which is not often discussed, is the set of normalization conventions that various systems apply to data. In a sense, this may be a case of the "medium" affecting the "message." What is involved is the treatment, common or otherwise, of certain fundamental aspects of data by various systems: such aspects as capitalization, diacritical marks, and special symbols. When different systems treat these aspects differently, the same character string may be processed differently. This is of crucial importance in an enterprise such as the building of a common authority file, since the whole objective of the operation is to treat the same things alike, to distinguish uniquely those entities that are different, and to ensure that each unique entity is represented only once. Thus the "normalization" of data has direct bearing on:
In authority systems, one is really talking about the "normalization" conventions applied to headings and references. In normalization, the data are reduced to what is judged, by common agreement, to be the bare essentials--what is to be regarded and what is not. Conventions for treating data consistently usually entail:
a) in some cases translating to a single letter equivalency (e.g. treat Polish "el" as "l"); in some cases to a multiple letter equivalency (e.g. treat the digraph "ae" as the separate letters "a" and "e");
b) in some cases translating to a blank (e.g. treat "!" as a blank; treat "&" as a blank);
c) in some cases ignoring instead of translating (do not regard an apostrophe);
d) in some cases treating a character as is (e.g. #).
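The four kinds of convention listed above can be sketched in a few lines of Python. This is only an illustration, not the agreed table of conversions: the function name `normalize_heading` and the three small lookup tables are hypothetical, they cover only the characters named in the examples above, and the case folding and collapsing of blanks at the end are assumptions added so that strings differing only in capitalization or spacing compare as equal.

```python
# Illustrative subset only; the real conversion table covers many more characters.
TO_LETTERS = {"ł": "l", "Ł": "L", "æ": "ae", "Æ": "AE"}  # (a) letter equivalencies
TO_BLANK = {"!", "&"}                                     # (b) translate to a blank
IGNORED = {"'"}                                           # (c) ignore entirely


def normalize_heading(text: str) -> str:
    """Reduce a heading string to a comparison form (hypothetical sketch)."""
    out = []
    for ch in text:
        if ch in TO_LETTERS:
            out.append(TO_LETTERS[ch])
        elif ch in TO_BLANK:
            out.append(" ")
        elif ch in IGNORED:
            continue
        else:
            out.append(ch)  # (d) everything else, such as "#", is kept as is
    # Assumption: fold case and collapse runs of blanks before comparing.
    return " ".join("".join(out).upper().split())


print(normalize_heading("O'Brien & Sons!"))  # OBRIEN SONS
print(normalize_heading("Pawłowski"))        # PAWLOWSKI
print(normalize_heading("C#"))               # C#
```

Two systems that apply the same tables would produce the same comparison form for the same heading, which is the property the following paragraphs rely on.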
If the multiple systems participating in building the common authority file adopt the same normalization conventions, then normalized forms perform at least two functions:
It is the normalization conventions that make it possible to build a common authority file through the use of interconnecting systems. When normalization rules for indexing are the same among systems, a user can issue the same search on a local system and a target system and expect to retrieve the same type of results. To the extent the rules are different, searching records among systems becomes more difficult. When normalization rules for uniqueness are the same, a user can direct a search to a target system for a heading not in the local system and, if the heading is found, expect it to be acceptable for adding to the local system's files as a unique heading. To the extent the rules are different, sharing of records among systems becomes more difficult.
In 1983 the agencies (LC, RLIN, WLN, and later OCLC) engaged in what was then called NACO/LSP planning (LSP = Linked Systems Project) agreed to a table of conversions that states the conventions for treating diacritics, punctuation, etc. when normalizing heading strings. These conventions are used to carry out machine comparisons with other headings in the name authority file to ensure heading uniqueness. All the institutions plan eventually to have machine checks for uniqueness based on this normalization. In the interim, systems without such automatic checks must rely on catalogers to understand which characters do and do not differentiate one heading from another.
Planning continued and in about 1985 there was agreement with respect to rules on conflicts, stated as follows in an email message from Sally McCallum, LC Network Development/MARC Standards Office, to Ed Glazier, RLIN:
These specifications were translated into software that RLIN uses to run against newly created records or newly modified records. The results are transmitted daily to LC for investigation and resolution of problems. For 1996, the total number of problems received by LC was 12,935 (an average of 1,078 per month). Of these, 4,119 (32%) were reported as duplicates and 8,806 (68%) were reported as conflicts (see Table 1).
| January '96 | |||
| February | |||
| March | |||
| April | |||
| May | |||
| June | |||
| July | |||
| August | |||
| September | |||
| October | |||
| November | |||
| December | |||
| Total: |
The following condition reflects an aspect of LC's experience with the current normalization conventions applied by RLIN to the name authority file as described in the previous section. This condition currently requires considerable resources both in initially creating authority records and in maintaining them. Thus it appears prudent to explore any possible changes that might reduce the time and energy now spent on this aspect of authority work.
Tag not part of uniqueness of heading.
Condition:
Under current specifications, the heading type part of the tag is not taken into account when determining the uniqueness of a heading. Therefore, persons (X00), corporate names (X10, X11), geographic names (X51), and uniform titles (X30) are all compared against one another for conflict. This very broad approach has several untoward consequences, namely: extensive searching on the part of catalogers, a need to qualify an increasing number of headings (sometimes with less than satisfactory results in terms of the heading itself or accessing methods), and a considerable amount of time spent in resolving conflicts not initially identified at the cataloging stage. This condition will require the use of increased time and resources as the authority file grows. (There are already more than 6.5 million 1XX and 4XX fields in the Name Authority file.) When the name and subject authority files are combined, the problem will become even worse.
To better understand the "texture" of the problems reported as conflicts, staff in LC's Cataloging and Policy Support Office analyzed the error conditions reported as conflicts for the seven days of December 2, 4, 9, 12, 1996 and January 8-9, 13, 1997. The results are given in Table 2. For these seven days, the total number of conflict situations reported was 112. Those within the same tag entity (e.g. X00/X00, X10/X10, X30/X30) are investigated and there is an attempt to resolve them. The ones that are for separate entities that go across tags (e.g. X00/X10, X00/X51) are currently ignored except those that involve the X30 tags. (However, when the same entity, e.g. a personal name, is tagged both X00 and a different tag such as X10, there is an attempt to correct the mistagging.) Of 112 total conflicts for the seven days analyzed, 33 (29%) involved X30 tags, of which 22 (19%) went across tags and require an attempted resolution. These figures represent only the maintenance dimension; they in no way provide a measure of the amount of time and energy catalogers spend initially applying the current conventions for what is regarded as conflict.
The number of conflicts identified suggests that it is worth pursuing adjustments in the comparison universes for what are seen as conflicts with a view to reducing the time spent
| X00/X00 | |||||
| X00/X10 | |||||
| X00/X51 | |||||
| X10/X10 | |||||
| X11/X10 | |||||
| X11/X11 | |||||
| X30/X00 | |||||
| X30/X00/X10 | |||||
| X30/X00/X10/X51 | |||||
| X30/X10 | |||||
| X30/X11 | |||||
| X30/X11/X10 | |||||
| X30/X30 | |||||
| X30/X51 | |||||
| X51/X51 | |||||
| Total |
AACR2 is somewhat inconsistent in what kinds of headings should be compared against one another for conflicts. AACR2 rule 25.5B specifies that a qualifier be added to the heading for a uniform title to distinguish it from an identical or similar heading for a person or corporate body or from an identical or similar uniform title used as a heading or reference. However, the rules for conflict for persons require a qualifier only if the heading is the same as that of another person, and the rules for corporate bodies require it only for conflict with another corporate body.
Possible change:
One approach to this condition is to consider changing the normalization rules so that the last two digits of the tag would be taken into account to determine uniqueness, i.e., X00s would be compared against other X00s, X10s against other X10s, etc. If this solution were adopted, it would probably be desirable to change AACR2 rule 25.5B to parallel the guidelines for other types of headings. This would bring the cataloging rules and the normalization rules into synchronization. As an interim position, LC is ignoring error reports that go across tags, except for X30s (because of AACR2 rule 25.5B).
Note, however, that although the suggested change alleviates the problem from the perspective of reallocating cataloging resources (time now spent on resolving conflicts could be redirected to time spent on creating new headings or doing other maintenance), there still remain conditions in the catalog that would be confusing to users. These will occur when the same character string occurs as a heading and a reference for different entities, for example, between persons and corporate bodies or between titles and corporate bodies.
Another impact concerns mistagging. The software now identifies cases in which the same entity is tagged inconsistently (e.g. the same personal name is tagged X00 and also X10). This detection would no longer occur if the last two digits of the tag were taken into account to determine uniqueness.
The person/body condition can be illustrated as follows:
Condition in the authority file:
100 Ada
110 American Dental Association
410 ADA
Possible display under certain conditions:
Ada [heading for person]
ADA [reference for body]
American Dental Association
The title/corporate name condition can be illustrated as follows:
Condition in the authority file:
130 Irlande
151 Ireland
451 Irlande
Possible display under certain conditions:
Irlande [heading for title]
Irlande [reference for body]
Ireland
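The difference between the current comparison and the possible change can be sketched with the two illustrations above. This is a hypothetical outline, not the software RLIN runs: the record numbers, the function names, and the simplified stand-in normalization (case folding and blank collapsing only) are all assumptions made for the example.

```python
from collections import defaultdict

# (record number, tag, text); record numbers are invented for the example.
FIELDS = [
    (1, "100", "Ada"),
    (2, "110", "American Dental Association"),
    (2, "410", "ADA"),
    (3, "130", "Irlande"),
    (4, "151", "Ireland"),
    (4, "451", "Irlande"),
]


def normalize(text: str) -> str:
    """Stand-in normalization: fold case and collapse blanks (assumption)."""
    return " ".join(text.upper().split())


def find_conflicts(fields, use_tag=False):
    """Group 1XX/4XX fields by normalized string, optionally also by tag type.

    A group is reported as a conflict when it spans more than one record.
    """
    groups = defaultdict(list)
    for record, tag, text in fields:
        # The last two digits of the tag give the heading type (00, 10, 30, 51).
        key = (normalize(text), tag[-2:] if use_tag else None)
        groups[key].append((record, tag, text))
    return [g for g in groups.values() if len({rec for rec, _, _ in g}) > 1]


print(len(find_conflicts(FIELDS)))                # 2  (Ada/ADA and Irlande/Irlande)
print(len(find_conflicts(FIELDS, use_tag=True)))  # 0  (no same-tag matches)
```

With `use_tag=True` neither pair is reported, which reflects both the saving in conflict resolution and the two costs noted above: the identical strings would still appear side by side in a display, and a genuinely mistagged duplicate would pass unnoticed.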