In the sections that follow, two collaborative tools will be examined: DocReview and Research Webs. DocReview is a web-based tool that allows readers of documents to become reviewers. This critical capability allows the collaborators to correct, expand, and refine the documents. DocReview is an integral part of Research Web Essays, the principal textual tool of the Research Web.
The Research Web is a customizable collaborative environment that permits the research team in a long-term, large-scale enterprise to examine an issue domain thoroughly. The Research Web (RW) has a WWW site that serves as the repository of the team’s corporate memory and research results. Tools available include a basic set that includes scholarly services of an annotatable bibliography and glossary, and an augmented web page format used for research essays. It incorporates any tool that the team finds necessary to its mission, provided that tool can be made web-compatible. Research Webs are unique, and for that reason may best be examined as case studies.
Author, authoring team: the owner(s) of a document.
Five sets of DocReviews have been selected for detailed quantitative analysis in order to examine several propositions (see §5.1.6) that arise from the basic research questions. The basis of selection was their similarity to DocReviews that might be found in active Research Webs. These DocReviews were all under the control of a knowledgeable facilitator (the author). These 101 DocReviews contained 1929 review segments that attracted 294 comments. These comments were coded into 767 Bales codes (see Table V) and 425 Meyers argumentation codes (see Table VI). The data was mounted in a relational database to support data conditioning and analysis. Analysis was performed with a spreadsheet program.
Two of the selected sets of DocReviews were the minutes of 59 meetings. The meetings were task-oriented meetings with an attendance averaging six members, with occasional participation of others by telephone. The minutes were quite comprehensive and averaged two pages of text. DocReview was integrated into the meeting routines by directing the attendees to review the minutes on the WWW before the next meeting. At the next meeting the scribe would distribute copies of the minutes with commentary inserted inline. The scribe would then explain how the minutes were revised in light of received annotations, and the team would then approve the minutes or suggest other changes. Usually this discussion was over in two or three minutes, thus saving considerable meeting time.
One set of seven DocReviews comprised sections of a draft of a professional paper. The paper was divided into seven sections in order to reduce the time required for each reviewing session; the smaller time slices allowed the reviewers' busy schedules to accommodate the reviews. The reviewers were professional colleagues of the author, some of whom were involved in the design of DocReview. The author found the annotations very useful, and most were incorporated into the final draft of the paper.
Another set of DocReviews was 19 workshop position papers for the 1999 conference on Computer-Supported Cooperative Learning (CSCL). Reviewing position papers was seen as an excellent application of DocReview from the beginning of design. In practice it lived up to its presumed promise. Perhaps the greatest impact was not intellectual, but in opening networking channels.
The final set of DocReviews was a set of 17 documents, Research Web Essays, written for a Research Web for the issue domain of chromium (CrVI) contamination on the Hanford Nuclear Reservation. The set was quite successful in accomplishing the objective of refining the initial versions of the documents, each of which centered on one aspect of the contamination.
5.1.1 Research Questions
The major research questions and the propositions derived from those
questions are:
A. How does the behavior of dialog using DocReview compare to
dialog that is face-to-face?
B. How should DocReview be segmented in order to maximize the
effectiveness of the participants?
C. How does the design of DocReview serve the research team?
D. How does the quality of the document being reviewed affect
the participation in the review?
A program written by the author extracts and formats data from the files mentioned above. The program (makecsv.pl) creates several comma-separated values (.csv) files suitable for import into a database and thence into a spreadsheet program for analysis. This program also does a word count on the base document and each of the document's review segments.
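The original makecsv.pl was a Perl script and is not reproduced here. The following is a minimal Python sketch of the same extraction step, assuming DocReviews already parsed into simple records; the field names (and the plain whitespace word count) are illustrative assumptions, not the program's actual layout.

    import csv

    def word_count(text):
        # Plain whitespace word count; the study's counter applied a
        # stricter rule (words of more than three characters, see 5.1.6.3).
        return len(text.split())

    def write_docrev_csv(docreviews, path="docrev.csv"):
        # One row per DocReview: identifier, sponsor, creation date, and
        # a word count summed over the base document's review segments.
        with open(path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(["docreview_id", "sponsor", "created", "doc_words"])
            for dr in docreviews:
                words = sum(word_count(s) for s in dr["segments"])
                writer.writerow([dr["id"], dr["sponsor"], dr["created"], words])

The real program also emitted per-segment and per-comment rows; only the DocReview-level file is sketched here.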
The analyst supplements two of the .csv files in order to add information that cannot be automatically extracted. A file (docrev.csv) that captures attributes of each DocReview is augmented by including a description of the DocReview and a document type attribute designed to indicate the degree of quality, or the degree of completeness, of the document. This attribute is entered as a number from 1 to 5, ranging from conceptual sketches to completed canonical documents:

1. Conceptual sketch
2. Rough, speculative draft
3. Working draft
4. Finished document
5. Completed canonical document
The coder modifies the comments.csv file to add both the Bales codes (Interaction Process Analysis) and the Meyers codes (Structurational Argumentation).
Base Document

The document that is prepared for DocReview is called the base document. It varies in size and in quality (the degree of development). Very large base documents are usually broken into sections, each its own DocReview, in order to allow the usually busy reviewers to complete a section at one sitting.
Data collected on the base document for a DocReview includes a word count, the document type (quality), the sponsor (author), and date of creation of the DocReview. The text of the base document is also available.
Base Document Size (word count)

| Characteristic | All | Type 2 | Type 3 | Type 4 |
| --- | --- | --- | --- | --- |
| Mean | 459.26 | 135.61 | 465.47 | 798.82 |
| Median | 422 | 130 | 469 | 598 |
| Standard Deviation | 325.27 | 91.26 | 196.71 | 695.66 |
| Sample Variance | 105801 | 8328 | 38693 | 483946 |
| Kurtosis | 20.77 | -0.59 | 2.49 | 5.44 |
| Skewness | 3.44 | 0.48 | 0.88 | 2.22 |
| Range | 2647 | 309 | 1279 | 2657 |
| Minimum | 10 | 10 | 140 | 206 |
| Maximum | 2657 | 309 | 1279 | 2657 |
| Count | 100 | 13 | 76 | 11 |
Table III Words in Review Segments

Review Segment Size (word count)

| Characteristic | All Types | Type 2 | Type 3 | Type 4 |
| --- | --- | --- | --- | --- |
| Mean | 24.89 | 35.60 | 21.09 | 73.86 |
| Median | 14 | 8 | 14 | 65 |
| Standard Deviation | 28.19 | 43.33 | 20.41 | 55.23 |
| Sample Variance | 794 | 1877 | 417 | 3051 |
| Kurtosis | 17 | 1.5 | 5.6 | 4.7 |
| Skewness | 3.2 | 1.4 | 2.1 | 1.7 |
| Range | 306 | 165 | 158 | 306 |
| Minimum | 2 | 3 | 2 | 2 |
| Maximum | 308 | 168 | 160 | 308 |
| Count (number of segments) | 1822 | 48 | 1656 | 118 |
The sample variances of both the raw data and a logarithmic transformation are too heteroscedastic to support a reliable analysis of variance, so a null hypothesis of no differences among the three document types cannot be rejected. Examination of the means and standard deviations nevertheless points out an obvious difference between types 3 and 4. This observation is supported by the nature of the genres represented: type 4 documents are drafts of conventional papers dominated by paragraph-long segments, while type 3 documents are dominated by meeting minutes composed of short segments such as action items and list bullets.
Comments
Each review segment attracts a set of comments,
usually an empty set. The set may include not only comments on the
review segment, but also comments on the other comments on the review
segment. The comments are entirely free form, either text or HTML,
and may include emphasis, paragraphing and even images.
Data collected on comments includes: the text of the comment, a word count, the name of the commentator, the commentator's e-mail address, the time and date, and the qualitative coding of the comment, both Bales codes (see §5.1.4.1) and Meyers codes (see §5.1.4.2). Due to the unrestricted length of comments, the unit of analysis for coding purposes must usually be a fragment of the message. The Bales codes were assigned to comments by dividing multi-sentence comments into written equivalents of speech acts. These fragments, as noted by Henri, cannot be rigidly determined, but must be parsed out based on the analytic objectives (Henri 1991, 126). The same conclusion is seen in Meyers et al., where the units were "complete thoughts" rather than words or turns (Meyers et al. 1991, 53). Occasionally, there are additional, usually social, meanings that can be read into the commentary. For example, the wording of a comment may contain aggressive or supportive intent.
Comment Size (word count)

| Characteristic | All Types | Type 2 | Type 3 | Type 4 | General |
| --- | --- | --- | --- | --- | --- |
| Mean | 31.83 | 34.80 | 21.49 | 54.46 | 30.73 |
| Median | 19 | 22.5 | 12.5 | 43 | 12.5 |
| Standard Deviation | 37.61 | 36.98 | 26.79 | 48.01 | 36.77 |
| Sample Variance | 1414 | 1367 | 717 | 2305 | 1352 |
| Kurtosis | 14.6 | 1.2 | 40.3 | 7.9 | 0.8 |
| Skewness | 3.1 | 1.5 | 5.1 | 2.2 | 1.5 |
| Range | 289 | 122 | 256 | 288 | 124 |
| Minimum | 1 | 3 | 1 | 2 | 1 |
| Maximum | 290 | 125 | 257 | 290 | 125 |
| Number of Comments | 233 | 20 | 148 | 65 | 40 |
In the analysis of the DocReview commentary, it was discovered that the DocReviews of meeting minutes constituted a subset of commentary that demonstrated very random annotative behavior (see Figure VII). When the comments for meeting minutes are removed from the Type 3 comments, the sample variances are too heteroscedastic to employ an F test, so a logarithmic transformation was attempted. The sample variances found in the transform were 1.18, 1.26, and 1.01. Using the transformed data, a value of 2.65 was found for F. The critical value F(2, 104) at the .10 level is 2.36, so a null hypothesis of no differences among those three document types can be rejected at the .10 level.
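The transform-then-test procedure is easy to reproduce with standard tools. A minimal sketch follows; the word-count arrays are hypothetical stand-ins for the study's grouped comment sizes, not the actual data.

    import numpy as np
    from scipy import stats

    # Hypothetical comment word counts grouped by document type.
    type2 = np.array([12, 40, 7, 95, 23, 55])
    type3 = np.array([5, 18, 9, 30, 11, 22, 14])
    type4 = np.array([60, 120, 45, 80, 33, 70])

    # Log transform to stabilize the variances before the F test.
    logs = [np.log(g) for g in (type2, type3, type4)]
    print([round(g.var(ddof=1), 2) for g in logs])  # sample variances

    f_stat, p_value = stats.f_oneway(*logs)  # one-way ANOVA F test
    print(f_stat, p_value)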
5.1.4 Qualitative Coding Systems

Any classification scheme must serve to differentiate between members
of a group of cases. In our study the cases are DocReviews, an
object that consists of a document that is partitioned into "review
segments", and a set of comments made on each segment. The number of
comments may be zero or more, and is usually zero. In uncommented
segments, the question of implied agreement must be raised. One may
be tempted to assume, since there is no limitation on
reflection, that the reviewers agree with
the review segment. Implied assent is very dangerous because it
enables power mechanisms. No comment just means that the reviewers
chose not to add to the dialog
(Sheard 2000). So how can we differentiate between the DocReviews? Certainly
there are descriptive statistics such as size of the base document, the
number of review segments, the number of comments, when the comments were
made with respect to opening the review process, the size of the
comments, and who made comments. These data were maintained in the
log files, which are features of DocReview. Beyond these physical statistics lie the study of the character of the social interactions of the review team, interaction process analysis (IPA), and the study of the efficacy of the review process: how the review
contributed to the refinement of the knowledge represented by the
document. Both the IPA and studies of efficacy can be conducted
only by analysis of the content of the annotations. Measurement of
the value of the comments to the collaboration is quite impossible in most
cases, but a qualitative categorization of comments can be done by at
least two classification schemes: an observational scheme and a scheme
based on how the comments would fit into a formal argument. We must
then code the DocReview multilogues twice, once for the social dimension
of process-orientation and again for the knowledge content dimension of
task-orientation. To analyze the interpersonal processes at work in DocReviews, I classified the annotations using the Bales codes (Bales 1950, 9), a well-developed and
respected tool. Analysis of how comments within a DocReview
contributed to the knowledge-building content of the document will be
conducted using a coding system based on the function of the comment from
a task-oriented viewpoint, rather than from a social viewpoint as in
IPA. The task-oriented functions are defined as the character of the
comment (or comment fragment) in a formal argumentation framework.
Meyers, Seibold and Brashers developed this coding system that was based
on, and extended from, their previous work
(Meyers, Seibold and Brashers 1991). Classification schemes need to satisfy three conditions (Bowker and Star 1999, 10): they operate with consistent, unique classificatory principles; their categories are mutually exclusive; and they are complete.
The coding schemes I use vary in compliance with these
desiderata. Bales codes are not complete; there is no place for
nonsense or muttering. The Bales codes are not mutually exclusive; instead they are derived from four fairly distinct major categories, each divided into three quasi-ordinal codes with very fuzzy boundaries (e.g., what is the difference between giving information and giving an opinion?). Bales attempts to close the ambiguities in the
codes by a very thorough explanation of each
(Bales 1950, 177-195), but overlaps and gaps exist. Meyers' scheme provides a less complete guide to coding
(Meyers et.al. 1991, 54)
(appropriate for a research article as opposed to Bales' book).
Both coding schemes are well described and coders can become facile with
them in a reasonable time period. With respect to mutual exclusivity, a continuous system like Bales' IPA must have fuzzy boundaries. Meyers' system is not continuous, so it is immune from this criticism. Meyers' scheme neatly solves the completeness problem with the
introduction of the category "non-arguable." Fortunately, this
category can contain no contextual knowledge, so it can safely be excluded
from our analyses. Bales asserts that his categories are made
complete and continuous by being concerned with the interaction content
rather than the topical content and by eliminating any requirement for the
observer "to make judgments of logical relevance, validity, rigor,
etc."
(Bales 1950, Chapt. 2). Correct assignment of codes could perhaps be tested by comparing actual
results from dialog in the source research and the coding of the same
material by the author. In short, such testing would require
studying intercoder reliability between the teams of Bales and Meyers and
the team (myself) that would code the annotations. Bales offers six
pages of coded dialog
(Bales 1950, 93-99). Meyers et al. offer some short
examples. Both works offer good definitions of the
categories. The categories are based on dialog quite familiar to any
literate individual. A larger issue is the absence of gestural
side-channel communication (head nodding, eye-rolling) in
DocReview. Because face-to-face dialog presents frequent "speech acts" that are gestures, facial expressions, or voice tones, that portion of the dialog is lost in the coding of DocReview annotations.
This loss may account for some of the significantly lower "social-emotive"
codes in the DocReview annotations. I can only compare DocReviews to DocReviews since there was no attempt
to set up a control review method by other means. In the DocReview
study, all DocReviews use the WWW and are thus device independent.
Usually, the participants within a given set of DocReviews are
homogeneous, though between sets, they may vary in number. The same
task is always performed: review of a document, though the nature of the
documents may change (meeting minutes, position papers). Almost all
users are invited, since most DocReviews are on intranet sites. Other than
the exceptions noted, most dependent variables are identical. Most
studies that apply IPA compare computer-mediated communication with
face-to-face communication. In a meta-analysis of studies of
computer-mediated
collaboration, McGrath
and Berdahl
(McGrath and Berdahl 1998)
make several cautionary points based on differences between face-to-face
interaction and computer-mediated interaction: studies often use different
computer systems; different kinds of participants are used; different
types of tasks are performed; and there are different patterns of
dependent variables.
Analysis of the content of the annotations must start with the selection
or invention of a qualitative classification system. Many investigators
have seen the wisdom of creating coding systems that are fitted closely to their
problem. I chose to use existing systems, thus providing a possibility
of drawing comparisons. I chose two systems: one for gauging the social functioning of the research team, and the other to show how commentary became argumentation in the review of the document.
5.1.4.1 Interaction Process Codes: The Bales Codes
These codes are intended to assign speech acts, including backchannel
communication, to categories that are based on social processes rather
than substantive content. Since we are social animals, the nature of
our dialog will to a great extent determine both how we respond
emotionally to our collaborative environment and how effective that
environment is in attracting productive participation.
Commentary on hyperdocuments in DocReview can be evaluated by categorization, volume, and quality. DocReview comments can be
categorized by using Bales codes
(Bales 1955). Depending on the
issue domain, these codes can be used to order value
between categories. For instance, detection of errors in spelling or
grammar is a low value contribution in studies of social behavior, but a
high value contribution in the development of a manifesto or epic.
| Main Categories | Frequency | Types of Acts | Frequency |
| --- | --- | --- | --- |
| Positive reactions | 25.9% | Shows solidarity | 3.4% |
| | | Shows tension release | 6.0% |
| | | Shows agreement | 16.5% |
| Problem-solving attempts | 56.0% | Gives suggestion | 8.0% |
| | | Gives opinion | 30.1% |
| | | Gives information | 17.9% |
| Questions | 7.0% | Asks for information | 3.5% |
| | | Asks for opinion | 2.4% |
| | | Asks for suggestion | 1.1% |
| Negative reactions | 11.2% | Shows disagreement | 7.8% |
| | | Shows tension | 2.7% |
| | | Shows antagonism | 0.7% |
Commentary that expresses support or disagreement is not valueless, for such commentary does influence the behavior of the author and other contributors. So most commentary is of some value, even if it merely reinforces the recognition of a team effort. Sadly, comments of negative worth occasionally emerge, such as personal attacks or senseless graffiti.
Gay et al. and classroom discussion forums
Geri Gay and others studied the character of student contributions by computer-mediated communication in university classes (Gay et al. 1999).
The discussion forums were conducted in CoNote, a WWW-based annotation
program functionally similar to DocReview. Gay's study included
questionnaires and observer data as well as a repository of documents and
comments thereon. Gay's codes, like Bales' codes, are not based on
the relationship of the annotation to the collaboration task, but on the
character of interpersonal activity. Content of the annotations was
organized into three categories: technical comments, affiliative comments
and advice. Presumably, a single comment could contain all three categories, but not multiple occurrences of a category. Of 197 comments, 50.3% contained technical content, 45.2% affiliative content, and 68.5% advice; because the categories are not exclusive, the percentages sum to more than 100.
These percentages were obtained in an environment dominated by students
who came into frequent contact, thus by age and group structure more
inclined to engage in affiliative commentary than professional groups
might be.
These codes are equivalent to portions of the twelve-category Bales codes for interaction process analysis. The affiliative comments, which presumably could be positive or negative, would fall into one of six categories: Shows Solidarity, Shows Tension Release, Agrees, Disagrees, Shows Tension, or Shows Antagonism. The technical comments would fall into the neutral task-oriented area: Gives Opinion, Gives Orientation, Asks for Orientation, Asks for Opinion. The advice category corresponds to the extreme range of the task-oriented area: Gives Suggestion and Asks for Suggestion.
5.1.4.2 Argumentation Based Codes
If research is analogous to argumentation, as Eisenhart and Borko
suggest
(Eisenhart and Borko 1993),
then a coding system that is based on the argumentation process would seem
to be a more effective alternative for characterizing task-oriented
activity than the more process-oriented Bales IPA coding. The value
of a comment fragment (coding unit) to the collaboration is more closely
related to task than process. Perhaps we can assign a value to a
specific type, or if the coder is familiar with the document, we can
actually assign an interval measure for value. Three coding systems
have been considered: informal argumentation codes, structurational
argument codes, and an observational categorization.
Informal Argumentation
In An Introduction to Reasoning, Toulmin, Rieke and Janik develop a
dialog classification based on argumentation
(Toulmin, Rieke and Janik 1979). Their system is proposed to
be the basis for development of a tool (The Landscape of Reason) to
organize dialog for the
Research Web. Argumentation is broadly defined in
this work, having a place in any "rational enterprise." As the
authors put it, "... scientific arguments are sound only to the extent
that they can serve the deeper goal of improving our scientific
understanding." Every coding unit of a comment can be assigned a type
based on this classification. The value of the comment to the collaboration can be established through a surrogate: the value of the comment in the argument. There are six elements in
argumentation: claims, grounds, warrants, backing, modal qualifiers, and
rebuttals.
Structurational Argument Codes
In research on decision-making discussions in a face-to-face environment,
a set of seventeen categories describing statements in terms of their
place in argumentation was developed and used by a team that studied 45
discussions. This research had its roots in research by Toulmin (in
1958) and two other research teams in 1969 and 1980
(Meyers, Seibold and Brashers 1991, 50). I can find no subsequent application of this
coding scheme in the literature. Coding is extremely difficult, as
meanings can shift with context. The coder must be thoroughly
immersed in the argument, not just the words, but also the intent of the
words.
In Meyers et al. the discussions were analyzed into 8,408 codes, with the distribution given in the following table (Meyers et al. 1991, 45). This dissertation found 425 codes in the DocReview annotations.
| Category | Subcategory | Code | Definition |
| --- | --- | --- | --- |
| ARGUABLES (67.4%) | Potential Arguables | Assertions | Statements of fact or opinion. |
| | | Propositions | Statements that call for support, action, or conference on an argument-related statement. |
| | Reason-using Arguables | Elaborations | Statements that support other statements by providing evidence, reasons, or other support. |
| | | Responses | Statements that defend arguables met with disagreement. |
| | Reason-giving Arguables | Amplifications | Statements that explain or expound upon other statements in order to establish the relevance of the argument through inference. |
| | | Justifications | Statements that offer validity of previous or upcoming statements by citing a rule of logic (provide a standard whereby arguments are weighed). |
| REINFORCERS (13.6%) | | Agreement | Statements that express agreement with another statement. |
| | | Agreement+ | Statements that express agreement with another statement and then go on to state an arguable, promptor, delimitor, or nonarguable. |
| PROMPTORS (2.3%) | | Objection | Statements that deny the truth or accuracy of any arguable. |
| | | Objection+ | Statements that deny the truth or accuracy of any arguable and then go on to state another arguable, promptor, delimitor, or nonarguable. |
| | | Challenge | Statements that offer problems or questions that must be solved if agreement is to be secured on an arguable. |
| DELIMITORS (2.1%) | | Frames | Statements that provide a context for and/or qualify arguables. |
| | | Forestall/Secure | Statements that attempt to forestall refutation by securing common ground. |
| | | Forestall/Remove | Statements that attempt to forestall refutation by removing possible objections. |
| NONARGUABLES (14.5%) | | Process | Non-argument-related statements that orient the group to its task or specify the process the group should follow. |
| | | Unrelated | Statements unrelated to the group's argument or process (tangents, side issues, self-talk, etc.). |
| | | Incomplete | Statements that do not provide a cogent or interpretable idea (due to interruption or stopping to think in midstream) and are not completed as a cogent idea elsewhere in the transcript. |
While Meyers et al. conclude that the structurational argumentation codes reflect both process-orientation and task-orientation (or system and structure, as they put it), the coding scheme clearly supports task-orientation much better than the Bales IPA. In terms of support to a collaborative task, some categories have more value than others.
These argument codes provide places for every element in the Toulmin informal argumentation scheme. The nonarguables Process and Unrelated are very convenient "bins" for trivial or procedural content. One of the seventeen codes is extremely unlikely to be used: the nonarguable Incomplete. The argument codes were developed to analyze transcripts of face-to-face interactions, an environment where interruptions are frequent. It is difficult to imagine how an asynchronous contribution could be interrupted; if the writer is interrupted at the terminal, the task can simply be resumed when the interruption ends.
The Meyers et al. study used transcripts of actual face-to-face multilogue, with recourse to videotape only when the expression needed clarification (Meyers et al. 1991, 56). Interruptions and incomplete expressions were frequent, as in normal conversation. The computer-mediated environment of DocReview makes interruption unlikely and incomplete thoughts rare. I expect the distribution of message fragments in DocReviews to be quite different from conversational multilogues. As McGrath and Berdahl cautioned, these differences may be due to many different factors (McGrath and Berdahl 1998); nevertheless, if the differences are great, the argument in favor of computer-mediated communication as a more reflective medium gains support.
An Observational Categorization
The author's five years of experience in the use of DocReview has led to a
potential coding system based on observation and sorting.
Interpretation and characterization of the codes are based not only on the original context of the commentary, but also on assumptions of what character the comments would take in a fully implemented Research Web.
This scheme categorizes several nominal classes of comments seen in DocReviews. It has the advantage of being completely specific to DocReviews; that is, it is not time-restricted but asynchronous and document-centric. Most DocReview review segments, especially paragraphs, will contain an assertion, a conclusion, and evidence showing how the conclusion follows from the assertion. In addition to this logical imperative (substantial) there is also the requirement to conform to appropriate standards of scholarship and presentation (formal). In the Research Web environment, the documents are also subject to both the criticism process and an editing process.
| | Substantial | Formal |
| --- | --- | --- |
| Editing Process | | |
| Criticism Process | | |
5.1.5 Qualitative Coding Reliability
In the analysis of the data, the distribution of codes in the
DocReview commentary is compared to the distribution of codes in the
studies that defined the codes. In comparing the distributions,
there is the necessary assumption that all coding would be consistent and
correct. Bales points out three sources of variation between coders:
unitizing, the correct parsing of dialog into units of analysis;
categorizing, correct assignment of codes; and attributing, the source and
target of the dialog
(Bales 1950, 101). There is, in the DocReview analysis, no
question of the source and target. Because this dissertation was not
well funded, the author did all coding, so the skill and consistency of
coding was not established by comparing the coding of dialog by
independent coders.
Unitizing is a significant source of variability, induced by uncertainty in interpretation. Some methods of unitizing are less susceptible to variability than others. Time-based unitization, using segments of elapsed real time, is not subject to interpretation (Nyerges et al. 1998, 141). Turn taking in speech dialog is more variable due to complications that arise in parsing monologues; annotations in DocReview are essentially monologues. Parsing face-to-face dialog into speech acts (Bales) is yet more variable because there is a need for insertion of implied speech acts and gestural acts. Even more variable is the event-based coding that was used in the argumentation coding (Meyers). Nyerges et al. chose time-based coding over event-based coding because event-based coding required at least two coding passes (Nyerges et al. 1998).
In the Bales coding, DocReview annotations were parsed during coding into approximations of "speech acts" by dividing the annotation into phrases, sentences, or sets of contiguous sentences that dealt with a single topic. Not infrequently, when the coder understood both the review segment and an annotation well, implied codes emerged. One comment usually contained a few codes (mean = 2.6), sometimes as many as a dozen. This parsing is assumed to be equivalent to the turn taking of face-to-face dialog.
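The unitizing itself was done by hand, but a crude first pass could be automated. The sketch below is an illustration only, not part of the study's toolchain: it splits an annotation on sentence-ending punctuation, after which the coder would still merge or split fragments by topic.

    import re

    def rough_units(annotation):
        # First approximation of written "speech acts": split on
        # sentence-ending punctuation; topical merging must follow.
        parts = re.split(r"(?<=[.!?;])\s+", annotation.strip())
        return [p for p in parts if p]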
In the argumentation coding, the unitizing protocol used in Meyers et al. could not be employed, since their unitizing was done by two judges concurrently. As Meyers used transcripts of dialog, so I used written dialog. The unitizing rule that Meyers et al. used was: "any statement that functioned as a complete thought or change of thought." The Meyers team coded dialog that was parsed into turns, while DocReview comments are relatively long monologues. Rather than parsing the monologue into speech acts, I parsed it into argument units that might include several sentences. Such units fit well into the Meyers categories. One comment usually contained one to a few codes (mean = 1.4), sometimes as many as eight.
Coding and unitization of DocReview annotation requires the coder to place the annotation into the context of the review segment being annotated. This contextualization is done by mentally converting the annotation unit and review segment into a narrative equivalent. Unfortunately, returning to the exact same mind set is difficult for either independent judges or for the same coder repeating the coding at a later time.
5.1.5.1 Coding Reliability Tests
In order to test the reliability of the coding, it was decided to take
a 12.5% random sample of all review segments that received comments.
The author, who was the original coder of the entire set of comments, then
recoded this sample. There was no recall of the original coding.
Four sets of codes were tested for reliability: the Bales codes (twelve categories), the Bales categories (four sets of three codes each), the structurational argumentation codes (seventeen categories), and the five structurational argumentation categories derived from the seventeen codes.
5.1.5.2 Data Conditioning

When the two coding sessions produce code strings of different lengths for the same comment, the pairing of codes depends on how the strings are aligned. Suppose one session yields the code string acbbbca and the other yields cbbbca. Aligning codes at the beginning gives:

    acbbbca
    cbbbca

which pairs only two codes correctly. If on the other hand we align like this:

    acbbbca
     cbbbca

then six pairs match exactly.
If such realignment is allowed it is subject to much abuse, so I allow only a shift of the entire shorter code string within the limits of the longer code string. If the code strings are of equal length, then no shifting is allowed. Any unmatched codes resulting from unequal code string lengths are removed. Both Bales and the structurational argumentation codes were conditioned this way, and the resulting conditioned data was converted to the aggregated categorical data (the four Bales categories and the five structurational argumentation categories).
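A minimal sketch of this conditioning rule, assuming each session's codes for a comment are held as a string of single-character code symbols (the function name and representation are illustrative):

    def condition_pairs(first, second):
        # Shift the entire shorter code string within the limits of the
        # longer one and keep the shift that maximizes exact matches.
        # Codes left unpaired by the length difference are dropped, and
        # equal-length strings are compared without shifting.
        longer, shorter = (first, second) if len(first) >= len(second) else (second, first)
        best_pairs, best_hits = [], -1
        for shift in range(len(longer) - len(shorter) + 1):
            pairs = list(zip(longer[shift:shift + len(shorter)], shorter))
            hits = sum(a == b for a, b in pairs)
            if hits > best_hits:
                best_hits, best_pairs = hits, pairs
        return best_pairs

    # The example above: shifting "cbbbca" one position to the right
    # under "acbbbca" pairs six codes exactly instead of two.
    print(condition_pairs("acbbbca", "cbbbca"))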
5.1.5.3 Analysis
Intercoder or recoder reliability can be measured by several
methods. Cohen
(Cohen 1960) and
Landis & Koch
(Landis & Koch 1977), in their examples, use nominal categories that are
clear, complete and mutually exclusive. On the other hand Perreault
and Leigh use more qualitative (though unstated) codes
(Perreault and Leigh 1989). On
this basis, plus favorable arguments from the Meyers et.al. paper, I am
inclined to use the Perreault and Leigh measure. Since Cohen's kappa
is so widely used, I include it for comparison purposes.
The conditioned data were placed in contingency tables comparing the two coding sessions. From the contingency tables, Cohen's kappa and Perreault and Leigh's Index of Reliability were calculated for the four sets of data.
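Both measures are straightforward to compute from the contingency tables. The sketch below implements Cohen's kappa and Perreault and Leigh's index; the index formula reproduces the values reported below (0.792 for the Bales codes, 0.984 for the Bales categories) from the stated agreement counts. The confidence intervals are not sketched.

    import numpy as np

    def cohens_kappa(table):
        # Square contingency table: rows are session-one codes and
        # columns are session-two codes for the same units.
        t = np.asarray(table, dtype=float)
        n = t.sum()
        p_o = np.trace(t) / n                               # observed agreement
        p_e = (t.sum(axis=0) * t.sum(axis=1)).sum() / n**2  # chance agreement
        return (p_o - p_e) / (1 - p_e)

    def reliability_index(agreements, n, k):
        # Perreault and Leigh (1989): I_r = sqrt((p_o - 1/k) * k/(k-1)),
        # defined when observed agreement p_o is at or above chance 1/k.
        p_o = agreements / n
        return ((p_o - 1 / k) * k / (k - 1)) ** 0.5 if p_o >= 1 / k else 0.0

    print(reliability_index(54, 82, 12))  # Bales codes: ~0.792
    print(reliability_index(80, 82, 4))   # Bales categories: ~0.984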
Bales codes
From the initial set of 99 Bales codes, there were 82 codes remaining in
the conditioned data. Each code could assume one of twelve
values. Comparing the two sets showed 54 pairs in agreement, 28
pairs in disagreement and 17 unmatched codes. Cohen's kappa
(Cohen 1960) for the Bales codes is
0.538, showing only moderate agreement between the two coding sessions
(Landis and Koch 1977, 165). The
Index of Reliability
(Perreault and Leigh 1989) is 0.792 with a 95% confidence level of +/-
0.088. This mediocre result, in conjunction with some very low
counts of several codes, provided the argument to use only the four Bales
categories in the analysis of DocReview annotations.
Bales categories
In analyzing the four Bales categories, each code could assume one of
four values. Comparing the two sets showed 80 pairs in agreement, 2
pairs in disagreement and 17 unmatched codes. For the Bales
categories, Cohen's kappa is 0.878, showing almost perfect agreement
between the two coding sessions. The Index of Reliability is 0.984
with a 95% confidence level of +/- 0.027.
Structurational argumentation codes
From the initial set of 70 structurational argumentation codes, there were
48 codes remaining in the conditioned data. Each code could assume
one of seventeen values. Comparing the two sets showed 21 pairs in
agreement, 27 pairs in disagreement and 22 unmatched codes. Cohen's
kappa for these codes is 0.402, showing only fair agreement between the
two coding sessions. The Index of Reliability is 0.668 with a 95%
confidence level of +/- 0.133. As with the Bales codes, there were
a large number of codes with low to zero counts.
Structurational argumentation categories
In analyzing the five structurational argumentation categories, each
code could assume one of five values. Comparing the two sets showed
28 pairs in agreement, 20 pairs in disagreement and 22 unmatched
codes. Cohen's kappa is 0.383, showing only fair agreement between
the two coding sessions. The Index of Reliability is 0.673 with a
95% confidence level of +/- 0.133.
The structurational argumentation codes were too numerous and difficult to code to produce acceptable reliability. Applying argumentation codes to the analysis of DocReview annotations will require at least pairs of coders working together (as Meyers et al. did). The unitization problem was extremely serious, producing a nearly one-third rate of unmatched codes. The combination of arbitrarily long review segments and arbitrarily long annotations will demand a very clever unitization scheme to produce any hope of consistent coding.
5.1.6 Analytical Results
The proposition designations below (e.g. A2) refer to the research
questions discussed in §5.1.1. Three techniques were used to test
the propositions: Chi-squared, regression analysis, and case studies.
Four of the propositions use the Chi-squared test, comparing the counts of DocReview codes against the coding distributions in the original Bales and Meyers studies. In order to normalize the sample sizes, a pseudo-sample of the Bales or Meyers codes was drawn with the same distribution as in the original studies but with a size equal to the DocReview sample.
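A sketch of the pseudo-sample comparison, using Bales' face-to-face category proportions from the table in §5.1.4.1; the observed DocReview counts here are hypothetical placeholders, not the study's data.

    import numpy as np
    from scipy.stats import chisquare

    # Face-to-face Bales category proportions (positive reactions,
    # problem-solving attempts, questions, negative reactions).
    reference = np.array([0.259, 0.560, 0.070, 0.112])
    reference = reference / reference.sum()  # remove rounding drift

    observed = np.array([53, 650, 56, 1])  # hypothetical DocReview counts

    # Pseudo-sample: the reference distribution scaled to the sample size.
    expected = reference * observed.sum()

    chi2, p = chisquare(observed, f_exp=expected)
    print(chi2, p)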
Four of the propositions were tested using single-variable regression analysis. In all these cases the independent variable (X) was the word count of the base document or of a review segment of the base document. In some cases the dependent variable (Y) was confounded with the independent variable. This confounding was due to the definition of effectivity as the ratio of commentary to the size of the document (effectivity = Y/X). The shape of the best-fitting regression line was found to be logarithmic.
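The logarithmic fit amounts to an ordinary least-squares regression on the log of the word count. A sketch with invented data points:

    import numpy as np

    # Hypothetical (word count, effectivity) pairs.
    x = np.array([150.0, 320.0, 480.0, 700.0, 1200.0, 2400.0])
    y = np.array([0.90, 0.45, 0.30, 0.22, 0.12, 0.05])

    # Fit effectivity = a + b * ln(words); b is expected to be negative.
    b, a = np.polyfit(np.log(x), y, 1)

    # Correlation between observed and fitted values.
    r = np.corrcoef(y, a + b * np.log(x))[0, 1]
    print(a, b, r)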
One of the propositions was a case study comparing DocReview to three other web-based annotation programs. The comparison was made on the basis of a universe of features found in all the programs.
5.1.6.1 Proposition A1. The social character of comments in DocReview differs from comments in face-to-face dialog.
One of the most important questions arising from the use of DocReview is how dialog in DocReview differs from face-to-face dialog. Fortunately we have from Bales' work a distribution of codes assembled from thousands of face-to-face speech acts. If one assumes that DocReview annotation is equivalent to one side of a face-to-face dialog, and further assumes that in face-to-face dialog the two participants each produce an identical distribution of coded speech acts, then we can make a valid comparison. The assumption of equivalence is strained by the odd nature of this communication: essentially the document is the source of a series of propositions, and the annotation is a set of responses by the readers to the proposition presented in the review segment. This set of responses is further complicated by the not infrequent presence of commentary on other annotations.
Operationalization:
Assigning Bales code categories to all annotations operationalizes the
social character of the comments. The Bales Interaction Process Analysis
categorizes all speech acts, including gestures, into twelve codes. The
differences between some of the Bales codes are very slight. These fine
nuances result in a high variability between coders or between coding
sessions by the same person. In order to reduce the intercoder variability
it was decided to use Bales' broader classification: categories. Bales
grouped the twelve codes into four categories that are generic and form a
good basis of comparison. These categories are: positive reactions,
problem-solving attempts, questions, and negative reactions.
Problem-solving Attempts and Questions are further generalized into a
supercategory of the task area, while Positive and Negative Reactions are
generalized into the social-emotive area.
Data conditioning:
None.
Data Analysis:
The counts of codes for the entire set of DocReview annotations by Bales category demonstrate that DocReview annotations show a much higher degree
of task-related dialog and a much lower degree of social-emotive dialog
than is seen in face-to-face dialog. The comparisons
(DocReview/face-to-face) are: for Negative Reactions -- 0.1%/11.2%; for
Questions -- 7.3%/7.0%; for Problem-Solving Attempts -- 85.5%/56.0%; and
for Positive Reactions -- 7.0%/25.9%.
We find that the null hypothesis that there will be no difference between face-to-face and DocReview dialog under Bales coding can be rejected: with three degrees of freedom, Chi-squared = 213.2, significant at p < 0.000001.
Discussion of Findings:
The very low percentages of DocReview annotation in the social-emotive area (positive and negative reactions) may show the effect of moderation in dialog induced by the reflection afforded by DocReview, as opposed to the more spontaneous nature of face-to-face dialog. The similarly low, though less extreme, percentages in the positive reactions category may show that less need is felt for social reinforcement than in face-to-face dialog. Though DocReview annotations show less positive reinforcement, the reinforcement is there; it is simply less effusive. Questions (task area: negative) show an almost identical distribution. Problem-solving attempts (task area: positive) are much higher in DocReview annotation than in face-to-face dialog. This disparity may be the result of the ability of the reader to reflect much longer than is possible in face-to-face dialog. I suggest that this is the most important finding, demonstrating the value of DocReview in problem solving.
Interpretation of Findings:
The conclusions must be tempered with the realization that there are
no gestural acts in the DocReviews and their annotations. While Bales
does not record the percentages of gestural acts captured in his research,
in his description of the codes gestures such as winks, nods, frowns, and
even blushing appear. From Bales' description of the codes one can clearly
see that most gestural acts are in the social-emotive categories. If an
arbitrary portion of the Bales social-emotive codes (comprising 37.1% of the total face-to-face acts) were assumed to be gestural, then in the annotation coding the missing percentage would need to be reassigned from the task-oriented categories. This reassignment would make the comparison of positive task-orientation somewhat less marked, while the comparison of negative task-orientation would shift from almost equal to somewhat less negative than in face-to-face dialog.
5.1.6.2 Proposition A2: The substantive character of comments in DocReview differs from comments in face-to-face dialog.
The substantive nature of comments in DocReview is measured by determining the intent of the comment, or a portion of the comment. Intent is defined in this analysis as what place the comment would take in argumentation.
As in the analysis of the social character of the comments in Proposition A1, we have to assume that the dialog is quite one-sided, with the document providing propositions and the readers arguing with those propositions. Clearly there can be no negotiation of meaning, and the document can make no rebuttals. In terms of argumentation, then, we can have but one round, though with several people participating.
Operationalization:
Assigning Meyers structurational argumentation code categories to each
comment operationalizes the substantive character of the comments.
Data conditioning:
The raw data percentage comparisons
(DocReview/face-to-face) are: for non-arguables -- 22.6%/14.5%; for
delimitors -- 8%/2.1%; for promptors -- 23.1%/2.3%; for reinforcers --
10.3%/13.6%; and for arguable -- 36%/67.4%.
Argumentation codes in the non-arguable category were excised from the dialog. In the raw data, DocReview annotations were 22.6% non-arguable, compared to 14.5% in the Meyers study. The difference in non-arguables is attributed to the frequent assignment to that category of annotations complaining about grammar and spelling. Arguably such commentary does not contribute to productive argumentation, and furthermore such corrections are seldom made in face-to-face dialog.
Codes in the arguable class were also excised. Difficulties in adjusting for the asymmetrical nature of DocReview argumentation are simply insurmountable. In the one-turn dialog, responses to propositions (the base document's review segments) are much more prevalent than responses to annotations. Responding to an annotation usually requires re-reading the comments, and busy participants are not likely to return to review comments, even if they are reminded by e-mail notification. This would not be the case in face-to-face argumentation.
The data conditioning leaves us with three categories of codes: Reinforcers, Promptors and Delimitors. Unfortunately the excision of troublesome categories reduces our number of data points by 58% to 176. Since the central action of argumentation is carried out in these categories, I feel that they are an adequate basis for comparison.
Data Analysis:
The conditioned data comparisons
(DocReview/face-to-face) are: for reinforcers -- 25%/75.6%; for promptors
-- 55.7%/13.3%; and for delimitors -- 19.3%/11.1%.
Comparing face-to-face distributions to the distributions found in the DocReviews shows a very strong difference in both promptors and reinforcers. There are four promptors in DocReviews for each face-to-face promptor and three face-to-face reinforcers for every DocReview reinforcer.
We find that the null hypothesis that there will be no difference between face-to-face and DocReview dialog under Meyers coding can be rejected: with two degrees of freedom, Chi-squared = 93.3, significant at p < 0.000001.
Discussion of Findings:
The differences between face-to-face argumentation and DocReview annotation are
clear: people are much more inclined to suggest changes to the document in
DocReview than in face-to-face dialog; people are much less inclined to
agree with the document in DocReviews than they are in face-to-face
dialog. I see this finding as suggesting that there may be some
satisficing occurring as people are less inclined to annotate texts that
they see as not far enough wrong to complain about. The vast difference in
promptors may be explained by the nature of DocReview: documents are
mounted with the intent of drawing out errors and omissions. A portion of
the differences may also be explained by social mechanisms: it is much
easier to praise than object; and power effects may also be seen as people
are more inclined to agree with a proposition offered in a meeting
(usually by a leader).
5.1.6.3 Proposition B1: Long base documents are ineffective relative to short documents.
The lives of researchers are fragmented into scores of tasks of varying importance. This produces the need to engage in multitasking, a mosaic of activity that fills the available time with periods of variable lengths. There will be short periods to review documents, provided they are of a size that will fit into the time slot. Very long documents may encourage a shallow reading, thus shallow and short commentary.
Operationalization:
Effectivity is operationalized as the ratio of the sum of comment size to
the size of the base document. Size of comments and base documents are
both established by software that counts the words of more than three
characters. For each DocReview that attracted annotation (n = 78), the word counts for annotations to commented segments were accumulated in one column and the word count for the DocReview was placed in another. The DocReview word count was plotted on the X-axis and the effectivity on the Y-axis.
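A sketch of the effectivity calculation under the stated counting rule; the tokenization pattern is an assumption of this sketch, not the study's software.

    import re

    def word_count(text):
        # Count only words of more than three characters, per the rule above.
        return sum(1 for w in re.findall(r"[A-Za-z']+", text) if len(w) > 3)

    def effectivity(comments, base_document):
        # Ratio of total annotation size to base document size.
        return sum(word_count(c) for c in comments) / word_count(base_document)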
Data conditioning:
Records for DocReviews that attracted no commentary were excluded. A
DocReview with segments containing graphics was excluded due to the low
word count in the segments, and the heavy annotation of the segments. The
same DocReview contained an anomalously long general comment.
Data Analysis:
A correlation of 0.665 on the logarithmic regression line confirms the
hypothesis. With 77 degrees of freedom a value of F = 60.1 was found. As expected the slope was negative, with P = 3.27 x 10^-11. The P value of the intercept was 1.64 x 10^-12. A study of DocReviews
by document type (see
§5.1.6.7) suggests that
the logarithmic relationship is even stronger among base documents that
are not meeting minutes.
Discussion of Findings:
The hypothesis is accepted. Smaller base documents produce more effective
DocReviews. This leads to the conjecture that fragmenting a very long
document will increase the effectivity of the review process. This
conjecture could be tested, but not with the data from this study.
5.1.6.4 Proposition B2: The amount of commentary received on a review segment will be directly proportional to the segment's length.
An extremely long review segment may tax the reader’s concentration, leading to a decline of effectivity. Short review segments such as list "bullets" are sharply focused and easy to grasp and critique. Due to a small denominator, the effectivity of such short segments may be inflated. The deleterious effect of long review segments is one of the basic assumptions of the design of DocReview.
Operationalization:
Size of comments and base documents are
both established by software that counts the words of more than three
characters.
Data conditioning:
Segments not attracting annotation are removed. Segments that were graphic
images were discarded. General comment segments were discarded.
Data Analysis:
A correlation of 0.235 on the linear regression line weakly confirms the
hypothesis showing a direct relationship between segment size and received
annotation. With 49 degrees of freedom a value of F = 2.80 was found.
As expected the slope was positive with P = 0.101. The P value of the
intercept was 0.391.
Discussion of Findings:
The hypothesis is accepted.
Commentary size is directly proportional to segment length; but while larger segments attract more commentary due to the positive slope, they are not necessarily more effective (see §5.1.6.5), as seen by the low value (<1.0) of the slope of the regression line.
5.1.6.5 Proposition B3: The ratio of size of comments received to size of review segment (effectivity) will decline in proportion to review segment size.
Short entries in lists and cells in tables are very sharply focused, and when they attract annotation, the annotations are likely to contain more information than the entry (effectivity > 1.0). The context of lists and tables is usually quite clear and contributes to their focus. When long segments such as paragraphs receive annotation, the annotations are likely to contain less information than the segment.
Operationalization:
Size of comments and review segments are both established by software that
counts the words of more than three characters.
Data conditioning:
For this analysis general comment segments were excluded, as they are not
focused review segments. Segments that applied to graphic images were
removed because the number of words in the graphic segment is simply the
number of words in the title, and a picture is indeed often worth a
thousand words. At this point outliers were examined and one more point
was removed. This outlier was a document section heading that drew much
commentary from the review segments within the section. Making a section
heading a review segment is an error on the part of the facilitator;
section headings are for ease of reading and are devoid of real content.
Data analysis:
The remaining segments that received comments were selected and two
columns were produced by database query: size of the segment and the
summation of the size of the commentary on the segment. This table was
imported into the spreadsheet. For each segment, the size of the
commentary was divided by the size of the segment to yield effectivity. A
column was created for the effectivity. An XY scattergram was produced
with segment size on the X-axis and effectivity on the Y-axis. A
correlation of 0.451 on the logarithmic regression line confirms the
hypothesis. With 184 degrees of freedom a value of F = 46.8 was found. As
expected the slope was negative, with P = 1.1 x 10^-10. The P value of the intercept was 1.1 x 10^-18.
Discussion of Findings:
The hypothesis is accepted, with strong indications that effectivity
decays logarithmically rather than linearly. This hypothesis is also
supported by style guides for printed text
(Zinsser 1980, 111),
(Strunk and White 1979, 15) and for the WWW
(Nielsen 2000, 110 et seq.). Long paragraphs are problem-laden when
reading from a screen: scrolling may be required, especially when small
displays are used and when the user has the font size increased to
compensate for poor eyesight. When the user has set the window to single
column width, even moderate length paragraphs may need to be scrolled.
5.1.6.6 Proposition C1: Products similar to DocReview will emerge and will, by similarity, validate the design.
At least four other web-based annotation products have been put into service. One of these (Third Voice) was forced to withdraw after it was subjected to numerous lawsuits centered on copyright issues, specifically its allowing anyone to copy any publicly available web page onto another site for annotation.
Since DocReview's debut in 1995, three similar products have emerged: Living Documents in 1998, PageSeeder in 2000, and QuickTopic in 2001. The four products may be compared on a set of core features. The core features are: notification service, in-line commentary option, security, segmentation flexibility, comments on comments, general comments, and review all comments.
Operationalization:
The four products are compared on a set of core features.
A DocReview demo may be used at http://faculty.washington.edu/~bkn/DocReview/review.cgi?name=DrDemo.
Several Interactive Papers may be examined at http://lrsdb.ed.uiuc.edu:591/ipp/.
A Document Review may be examined at http://www.quicktopic.com/6/D/QXx3sZA2kptQpnq9Rqwv.html.
A PageSeeder demo may be used at http://ps.pageseeder.com/ps/ps/demos/tryit/choco/choco.pshtml.
| Feature | DocReview | Interactive Papers | QuickTopic Document Review | PageSeeder |
| --- | --- | --- | --- | --- |
| Notification Service | Yes | No | Yes | Yes |
| In-line Commentary | Yes, click for alternative format. | Yes, by request. | No | Yes, no other alternative format. |
| Security | Yes, your server. | Yes, your server. | By obscure URL. | Yes, commercial service. |
| Segmentation Flexibility | Yes | No | No, paragraphs and list elements only. | No, chunks only. |
| Comments on Comments | No, by design. | Yes, three deep. | No, by design. | Yes, unlimited. |
| General Comments | Yes | Yes | Yes | No |
| Review all Comments | Yes | No | Yes | No |
Discussion of Findings:
DocReview's design has been validated by the similarity of several
commercial and academic products that were developed in the five years
following DocReview's original release.
5.1.6.7 Proposition D1: Higher quality documents will attract more participation.
Document quality may be categorized on an ordinal scale of degree of completion, ranging from conceptual sketches to completed canonical documents. We have categorized the documents on a five-valued quality scale (see §5.1.2).
Operationalization:
Participation is considered equivalent
to effectivity and is operationalized as the ratio of the sum of comment
size to the size of the base document. There were three document types
represented: types 2, 3, and 4.
Data Conditioning:
DocReviews without comments were
discarded. A DocReview with segments containing graphics was excluded due
to the low word count in the segments, and the heavy annotation of those
segments.
Data analysis:
| Document Type | n | Total Words in Documents | Total Words in Commentary | Effectivity |
| --- | --- | --- | --- | --- |
| Type 2 | 10 | 1302 | 696 | 0.535 |
| Type 3 | 58 | 27636 | 3181 | 0.115 |
| Type 3 w/o minutes | 8 | 4433 | 909 | 0.205 |
| Type 4 | 10 | 8581 | 2914 | 0.340 |
| All Types | 78 | 37519 | 6791 | 0.181 |
The DocReviews that received comments were analyzed and two columns were produced by database query: size of the base document and the summation of the size of the commentary on the DocReview. This table was imported into the spreadsheet. For each DocReview, the size of the commentary was divided by the size of the base document to yield effectivity. A column was created for the effectivity. An XY scattergram was produced with base document size on the X-axis and effectivity on the Y-axis. Five effectivity distributions were studied: all DocReviews by document type (three distributions), meeting minutes (most of the type 3 documents), and all DocReviews less the meeting minutes.
Studying the distributions of the three types shows three very distinct populations: type 2, with very strong logarithmic decay of effectivity with increasing base document size; type 3, with very low effectivity and an almost random distribution (see Figure VI); and type 4, with logarithmic decay of effectivity. Considering the strong (R^2 = 0.4416) logarithmic decay of effectivity with increasing base document size seen in Proposition B1 (see §5.1.6.3), the nature of type 3 documents needs to be examined more closely.
| Type | df | F | P (slope) | P (intercept) | R | Std Err |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | 9 | 22.2 | 0.0015 | 3.3 x 10^-8 | 0.858 | 0.593 |
| 3 | 57 | 0.001 | 0.966 | 0.644 | 0.0057 | 0.117 |
| 4 | 9 | 8.72 | 0.018 | 0.013 | 0.722 | 0.658 |
Type 3 documents are working drafts, in the data examined here either position papers submitted for a workshop or minutes of weekly group meetings. Meeting minutes are a highly stable and consistent genre that does not attract much discussion, unless discussion topics were not reported or were reported incorrectly. All the meeting minutes were consistently formatted and prepared by only three people. The minutes were separated from the position papers and examined; their effectivity was found to be essentially randomly distributed (R = 0.05) with respect to document length (see Figure VII). With 49 degrees of freedom a value of F = 1.21 was found. The slope was positive with P = 0.730. The P value of the intercept was 0.033.
Based on the finding that the meeting minutes formed an essentially random cluster of data points, well distributed around the knee (document size 200-800 words) of the logarithmic regression line, it was decided to plot all DocReviews except the meeting minutes. This distribution contains the documents (n = 28) that are more likely to stimulate substantive dialog.
A correlation of 0.714 on the logarithmic regression line confirms a strong negative logarithmic relationship. With 27 degrees of freedom, a value of F = 27.1 was found. As expected, the slope was negative with P = 1.98×10⁻⁵; the P value of the intercept was 1.7×10⁻⁶.
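A fit of this kind can be reproduced along the following lines (a sketch only; the sizes and eff arrays are hypothetical stand-ins for the per-DocReview data, which is not reproduced here):

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical per-DocReview data: base document size (words), effectivity.
    sizes = np.array([150, 300, 450, 700, 1100, 1800, 2600])
    eff = np.array([0.62, 0.41, 0.30, 0.22, 0.15, 0.11, 0.08])

    # Logarithmic model: effectivity = a + b * ln(size).
    X = sm.add_constant(np.log(sizes))
    fit = sm.OLS(eff, X).fit()

    print(fit.fvalue)             # F statistic
    print(fit.pvalues)            # P values of intercept and slope
    print(np.sqrt(fit.rsquared))  # magnitude of the correlation R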
Discussion of Findings:
The hypothesis is soundly
rejected. It is clear that less finished documents attract more
participation than do more polished documents. This is likely due to the
presence of more opportunities for change through collaborative
critique.
5.1.6.8 Proposition D2: The nature of social commentary will vary with the type of document.
It is expected that the more formal nature of higher-quality documents will evoke more formal commentary, while the informal and preliminary nature of the less mature documents will evoke correspondingly informal commentary.
Operationalization:
The social character of the comments is operationalized as the distribution of the Bales code categories for each of the document types. The Bales Interaction Process Analysis categorizes all speech acts, including gestures, into twelve codes. Many of the Bales codes are specific to face-to-face dialog, so those codes must be eliminated in order to make a comparison. Bales grouped the twelve codes into four generic categories that form a good basis of comparison: social-emotive area, positive (positive reactions); task area, positive (problem-solving attempts); task area, negative (questions); and social-emotive area, negative (negative reactions). The central two categories are further generalized into a supercategory of the task area, while the extremes are generalized into the social-emotive area.
For each of the four Bales categories, the percentages of commentary codes by document type (n = 3) are graphed.
Data Conditioning:
None.
Data Analysis:
The Bales category distributions of DocReview annotations by document type demonstrate that the annotations are almost never negative reactions. Annotations showing positive reactions are more often directed at the more finished documents (type 4) than at the working and rough drafts (types 3 and 2). Questions are asked over twice as often in type 2 (rough) documents as in type 4 (finished) documents.
We find that the null hypothesis that there will be no difference in the Bales category distribution between document types can be rejected. With six degrees of freedom, Chi-squared = 46.5; this result is significant at p < 0.000001.
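A contingency test of this form can be sketched with scipy (the counts below are illustrative only, not the study's actual code tallies):

    from scipy.stats import chi2_contingency

    # Illustrative 3x4 contingency table: rows are document types 2, 3, 4;
    # columns are the four Bales categories (social-emotive positive,
    # task positive, task negative, social-emotive negative).
    table = [
        [12, 60, 30, 2],    # type 2 (rough)
        [25, 150, 40, 3],   # type 3 (working)
        [40, 180, 25, 1],   # type 4 (finished)
    ]

    chi2, p, dof, expected = chi2_contingency(table)
    print(chi2, p, dof)    # dof = (3 - 1) * (4 - 1) = 6, as in the analysis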
Discussion of Findings:
Finished documents are viewed more
positively than rough documents in DocReview. Most commentary is directed
toward problem solving.
5.1.6.9 Proposition D3: The nature of substantive commentary will vary with the type of document.
High-quality documents such as Research Web Essays (type 4) will attract relatively few negative comments, simply because such documents are likely to contain few errors and omissions. On the other hand, speculative documents (type 2) are likely to attract negative commentary due to their incomplete and unfinished nature. Working documents (type 3) are likely to occupy an intermediate position.
Operationalization:
The substantive character of the comments is operationalized as the distribution of the Meyers structurational argumentation code categories for each of the document types.
Data Conditioning:
None.
Data Analysis:
Of interest is the distribution of reinforcer percentages among the types of DocReviews. The more polished (types 3 and 4) documents draw over twice the percentage of reinforcers that the rough (type 2) documents do. This distribution is weakly mirrored, inversely, by a lower percentage of promptors in the polished documents as compared to the rough documents.
We find that the null hypothesis that there will be no difference in the Meyers argumentation code category distribution among the document types can be rejected, but only very weakly. With four degrees of freedom, Chi-squared = 3.92; this result is significant only at p < 0.5.
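The quoted level corresponds to the upper-tail probability of the chi-squared distribution, which can be checked directly (a quick check in Python, not part of the original spreadsheet analysis):

    from scipy.stats import chi2

    # Upper-tail probability of Chi-squared = 3.92 with 4 degrees of freedom.
    p = chi2.sf(3.92, df=4)
    print(p)    # about 0.42, i.e. significant only at p < 0.5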
Discussion of Findings:
The distribution of argumentation
categories is only weakly contingent on document type. There are
indications that polished documents will attract more agreement and
somewhat fewer objections than rough documents.
5.1.6.10 Other Findings
The frequency of multiple comments decays exponentially: the regression line shows a correlation of 0.941 over comment-count classes 0 to 6.
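Fitting an exponential decay of this kind reduces to a linear regression in log space, as the sketch below shows; the freq values are hypothetical, since the underlying frequency table is not reproduced here.

    import numpy as np
    from scipy.stats import linregress

    # Comment-count classes 0..6 with hypothetical segment frequencies.
    counts = np.arange(7)
    freq = np.array([1500, 300, 80, 30, 12, 5, 2])

    # Exponential decay freq = A * exp(b * count) is linear in log space.
    result = linregress(counts, np.log(freq))
    print(result.rvalue)    # correlation of the fitted regression line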
The substantive nature of dialog in DocReviews [prop A2] is very concentrated in constructive disagreements with the statements in the DocReview. Conversely, agreements are much less frequent than in face-to-face dialog. Most of these agreements include amplifications. This finding reinforces the similar findings in the study of the social nature of the dialog [prop A1].
Findings related to the size of the base document and the segment size show that the effectivity of the DocReview decays logarithmically with increasing base document size [prop B1]. Commentary size is directly, but not strongly, proportional to segment size [prop B2]. The effectivity of a review segment shows a logarithmic decline with increasing segment length [prop B3]. This finding indicates that the document segmentation strategy should avoid long segments.
Analysis of the descriptive statistics on document size shows that annotations are significantly longer on more finished documents (type 4), perhaps reflecting a willingness to spend more time on "serious" documents, and shortest on working documents (type 3). Annotations on rough documents (type 2) fall into an intermediate length class, perhaps because such documents need more work to bring them to acceptable quality.
Comparing DocReview to roughly comparable products shows
that no important features were overlooked in DocReview, though no product
has implemented the features just as DocReview has [prop C1]. This
convergence of design demonstrates that DocReview's design is in the
mainstream. The differences in design implementation are largely due to
differences in audience and commercial aspirations.
An attempt to measure the effect of base document quality on the effectivity (the ratio of words of commentary to words in the base document) of the DocReview found [prop D1] that, with exceptions, the effectivity of documents declined with increasing quality, corroborating the findings of prop B1. Measuring the effect of base document quality on the social nature of the dialog showed comparable distributions among the Bales categories [prop D2] in all document types; the minor differences speak perhaps more to the consistent categorization of documents than to the significance of the differences. In the case of substantive dialog (Meyers codes), comparable distributions were also seen [prop D3]; however, there was an apparent, but insignificant, increase in agreements (reinforcers) with increasing quality, and a corresponding decrease in objections.