Webster Criteria-based Grading

Making it Public:

Criteria-based grading to ease the pain of grading, build stronger student self-assessment and enable more effective peer review

Though one could imagine many different grading systems, two dominate classroom practice in American education. The first is “holistic”—far and away the most common mode of grading. It assigns a single grade, either a number or a letter, to a given piece of work. The second, and much less commonly practiced, is “primary trait” grading. This mode defines a set of “primary traits” or characteristics to be assessed, and then assigns a set of grades, one for each of the defined traits. I have come to use both systems in a way that seems to me to take advantage of the strengths of each.

Though I do not grade first drafts of work submitted, and though I give holistic number grades to revised drafts, for both I also give primary trait scores, using a set of criteria (see the “Six Criteria for Good College Writing”) together with a numerical system in which I assign a rating of 1 to 5 to each of the six “characteristics” my sheet defines. One is low, five is high; for each criterion they correspond to the following descriptions:

1 Not yet enough sense of this category to be functional in college level work. (e.g., a paper without a conceptual center, or with a center so hidden or unfocused as to leave the paper essentially without one.)

2 Some sense of what the category is, but not much more. (e.g., a paper that notices words, but only points to them without showing how the work is affected by those words.)

3 Functional success with this category, but not yet showing full control. (e.g., a paper with a conceptual center, but one which is confused, or misplaced, or split between two or more other potential centers.)

4 Functional success with this category, with only minor problems. (e.g., a paper with a conceptual center, usefully deployed and responsive to the assignment, but not fully successful in that it might, for example, not be fully developed, or it simply might not be particularly compelling.)

5 Full success with this category. (e.g., a paper with a strong, clear, interesting, and relevant conceptual center.)

I tell students that these numbers do not correspond exactly to grades, though I also tell them that they certainly do have a definite relation to them. If students write a paper in which center and focus are poor, their grade will be lower than if those two characteristics were clear and compelling. That much ought to be obvious.

I find this system helpful in several ways, each of which saves me time, and yet also keeps my comments specific and practical. First, it allows me to avoid writing some things over and over, while it nevertheless keeps the central writing goals of the course constantly before the class. Second, the fact that I’ve already given a skeletal evaluation of each of these different dimensions of the paper frees me to concentrate my written commentary on the paper’s two or three most pressing problems. Having given at least a minimal opinion about each of the Criteria, I need not fear that I have neglected the whole. Third, the very brevity of my numbers will later give me a good way to begin a student conference, since I can ask students to respond to my estimation. If “Fullness” got a “1,” for example, I can ask them if they can describe why; if it got a “4,” I ask the same question. Since the mere number doesn’t represent a complete description of how well their papers fulfilled each of the criteria, in responding to my question students can’t be feeling as if they are merely giving my comments back to me. We can have a conversation instead.

The system’s main usefulness, however, has been its capacity to make my grading process a public procedure, and thereby establish a rhetorical relation between me and my students which is far more useful and productive than what more private grading systems create. Again, this is something I’ve learned from experience. In the past, I have often been frustrated in my assigning of grades by the development of a particular sort of “me” vs. “them” way of talking. I don’t think that sort of oppositional relationship was unique to my classes; indeed, I think it is probably the norm in many writing classes. But it certainly didn’t seem productive. For once that dichotomy was established, students frequently stopped seeing me as a helpful resource. Instead I became an antagonist or, at the very least, a roadblock to students’ success in the class, and much of my students’ energies often then seemed to shift to the task of manipulating me in order to lobby for higher grades.

That was not fun, and neither was it good either for me or for the students. For I ended up either playing the tough guy and having to deal with the resentment and general unfriendliness of those students who had a passive aggressive streak, or yielding ground in ways which made me uncomfortable because I was assigning grades I knew to be too high, and only because I felt that I couldn’t avoid damaging teacher-student relations unless I did so. In a word, I was not managing grades—grades were managing me.

My solution to this problem has been to break out of that “me-them” dynamic, and to replace it with a “criteria-them” dynamic, in which my chief role is that of referee between students on the one hand, and a set of goals describing what my students are to be learning in my course on the other. Which is to say, in order to avoid my becoming the problem which students must solve to do well in the course, I now offer them my primary trait Criteria instead. Thus if students in my class want to lobby me for a better grade, I tell them that they don’t have to work on me, because I’m not the problem. Rather, they have to deal with whichever of the skills their criteria scores show them still to be weak in.

In this respect, my courses have (happily) become much more like courses in other disciplines. Few math teachers, for example, get lobbied in the way students used to work on me, because math students’ work either shows they can do the problems, or it shows that they can’t. To be sure, “writing” is considerably less quantifiable than is math or chemistry—no doubt that’s one reason I like to do it. But even if the subject is hard to define, I can still come up with relatively specific goals. I don’t claim that my Criteria are the only criteria a writing class could use, nor do I claim much originality for the list. But because they do define characteristics central to assignments I make, I work hard to make sure that every student understands both what those goals mean for the papers they write, and how well each student is doing in efforts to master them.

In short, the real benefit of public, external standards is that one’s role as teacher becomes more clearly that of a third-person coach (the person who can help students make progress towards success at meeting general, public standards) and monitor (the person who supplies a disinterested but in fact terribly helpful progress report on how well students are doing in their working towards the acquisition of certain writing-thinking skills). In either case, the effect is to shift the usual writing class grading rhetoric away from that of student vs. teacher, to one in which the teacher is merely a third party whose job it is to help students see how they currently stand as writers in relation to the larger world around them.

Developing Grades from Criteria scores

Although I explain to students that I see no exact correspondence between the Six Criteria numbers and the grades I assign, I do indeed use my numbers to establish the basic range for my grading. Indeed, I’d undermine my whole approach if I didn’t. Since I claim that those criteria really do matter, if I were to lose track of how any student’s paper stacks up against the writing goals I’ve set out for the course, I’d simply be sending the message that I don’t mean what I say, and that those goals don’t really matter after all.

So to establish a base range for my grade, I add my figures together and use the resulting total. That range set, I then feel free to adjust the grade up or down the odd tenths of a point in order to better accommodate my holistic sense of the paper (i.e., to account for the fact that the whole is sometimes more or less than the sum of its parts), or to accommodate other criteria which seem to me to be important at that point in the student’s development. But again, although I do allow myself maneuvering room, it is nevertheless obvious that if I say that I value a paper which measures up well to a certain set of criteria, then the grade I assign had better honor that claim.
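
As a rough illustration of the arithmetic, the following Python sketch adds six criteria scores into a total and then nudges a base grade by a tenth or two. The function names, the sample scores, and the base grade of 3.2 are assumptions made up for the example. In particular, nothing above says how a total translates into a base grade, so that step is left to the grader here.

```python
from typing import Sequence

NUM_CRITERIA = 6              # the criteria sheet defines six characteristics
MIN_SCORE, MAX_SCORE = 1, 5   # one is low, five is high


def criteria_total(scores: Sequence[int]) -> int:
    """Add the six criteria scores together, checking each one first."""
    if len(scores) != NUM_CRITERIA:
        raise ValueError(f"expected {NUM_CRITERIA} scores, got {len(scores)}")
    for score in scores:
        if not MIN_SCORE <= score <= MAX_SCORE:
            raise ValueError(f"score {score} is outside {MIN_SCORE}-{MAX_SCORE}")
    return sum(scores)


def adjusted_grade(base_grade: float, holistic_adjustment: float = 0.0) -> float:
    """Move a base grade up or down by a few tenths of a point.

    How the total becomes base_grade is NOT specified in the essay;
    the caller supplies it (hypothetical step).
    """
    return round(base_grade + holistic_adjustment, 1)


# Made-up scores for one paper, one per criterion.
total = criteria_total([4, 3, 4, 3, 5, 4])

# Hypothetical: suppose the grader reads that total as a base grade of 3.2,
# then raises it two tenths because the whole is better than its parts.
grade = adjusted_grade(3.2, 0.2)
```

With six criteria scored 1 to 5, the total always falls between 6 and 30, which is what makes it usable as a range before any holistic adjustment.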

Sometimes, by the way, I will go through this process (which is quite rapid once you’ve grown accustomed to it), and find myself with a lower or higher grade for a particular paper than my first, holistic take on the paper would have produced. But what does that really mean? Usually it means that for one reason or another my first response to a paper has led me away from my goals altogether. I like the sentences, or the energetic listing of things, and so I tend not to notice that the paper has no center, no focus. Or, conversely, I DON’T like the surfaces, or even the particular argument the student is making. And so I think poorly of the paper until I’m forced by my criteria to think again, to grant that the student has done well with a center, has effectively noticed lots of detail, and has actually argued at least well enough to make me quarrelsome. But whether my holistic instinct is higher or lower than my criteria-based scores, the discrepancy tends to guard against my being either seduced or offended by someone’s prose. By keeping my criteria public and consistently present not just to my students but to myself, I’ve become more consistent and fair as a grader.

Peer Review and Self-Assessment

Two other virtues of the system have to do with peer review and with the oh-so-underachieved goal of building student self-assessment skills. Peer review is often problematic because peers don’t fully understand the criteria they are asked to read by. With this system, not only are the criteria more fully defined, but I can also do an in-class criteria-norming session. I can hand out two sample papers, and then have students read them both and do their best to apply criteria scores. Students do this overnight, and then in class I put them in groups of 3 or 4 and ask them, for Paper 1, to come to a consensus as a group as to what score each criterion would get. At the next class session I can then lead a discussion about their answers. I especially look for gaps, or outliers, for then I can ask a group who gave Fullness a 5 (while everyone else gave a 2) what they saw in the paper to justify that score. I then ask the 2’s to speak to the same question. We then have a discussion of the situation, and a whole class consensus exercise. There actually are real characteristics of papers that can be identified and assessed; it isn’t all (as many students have come to believe) a big crapshoot.

We repeat all this with Paper 2, asking them first to rethink the scores they have already given it. I then put each group consensus on the board and look to see how coherent the pattern is; usually there is no more than a single point’s difference (a four instead of a three) from one group or two. One point’s difference, I tell them, is really pretty small given the complicated nature of both writing and reading.

This process allows me to make three points dear to my heart:

• First, my grades are not mysterious outputs of a kind of Potterian sorting hat;
• Second, students now have something clearer than they had before with which to read each other’s work; and
• Third, to have done this exercise successfully means they have made progress building within themselves an ability both to assess their own strengths and to guide their future work.

Peer Review Basics

Traditionally, students write papers, and faculty grade papers. That system has worked for a very long time, but it has two drawbacks. The first is that faculty can only grade so much, and thus are limited in the amount of writing they can ask students to do. And the second is that the whole process is not actually always good for the students it is aimed to help. Many feel unclear, even mystified, about why they are writing or how to improve, as well as fearful about getting a response. That fear can lead in turn to ignoring comments or even to growing angry and frustrated with the faculty’s grading.

One solution to this set of problems is to use peer review to involve students in the reading and evaluating of papers. Unfortunately, however, peer review has often won a bad name for itself. Faculty complain that students don’t really “review” so much as skim and praise, and students whose work has been peer reviewed may complain that the comments they received were useless. From both sides the verdict has sometimes been that peer review is simply a waste of time.

My view, however, is that when peer review does not work it is usually a case of bad peer review—a process in which students have not learned how to be observant and helpful, and faculty have not known how to change that. Good peer review can in fact do wonders for both students and faculty—even if it won’t make your work load zero or make students learn better writing habits at warp speed.

The keys to good peer review include:

1. Effective criteria
2. Full understanding by students of what the criteria mean
3. A peer norming exercise in applying criteria intelligently and helpfully

I’ve had very good experience using peer review, but only when I’ve used peer norming as well. Properly done, a peer norming exercise sets a safe environment for students to try out making and discussing informed grade-like judgments of their own, and offers faculty a way to demonstrate that grades are not just a matter of “opinion.”

Among peer review and peer norming’s potential virtues:

• When students know other students will be reading their work, they very often are more careful about what they hand in.

• It’s a way of sharing the paper load—especially with low stakes writing and with drafts of high stakes work. In some circumstances peer review can provide all the feedback necessary to validate an assignment; in others your commenting can be more efficient because you have one or more student comments to draft off.

• Having to read other student work on an assignment they themselves have also written helps students develop the capacity to read their own work critically. That is not easy for them. Criteria are lifeless and formulaic until they are used and tested and talked about. A norming exercise of some sort helps that happen.

• When students see how others solved the same assignment problem they have solved, they often both increase their understandings of the course material that was central to the writing and give themselves something of a contrastive template with which to return to their own.

Peer Review—three ways to do it.

So if you are willing to give it a chance, here are three of the many, many ways to organize a peer review.

I. Peer Norming. The steps?

• One. Share your assignment criteria with students, using examples.

• Two. Either as an overnight or as an in-class assignment, have students read the sample to be normed, and assign a number score (you can use word-labels if that works better for you—like Very Strong, Strong, Functional, Still Working on it, Not yet Functional) for each of the criteria.

• Three. Put students in groups of 3-4 and have them come to consensus about their number scores. Tell them that if they don’t agree then each of them should locate in the paper the specific reasons they had for assigning the score they did. Those conversations are very important, and they prepare students for the next step.

• Four. Write the group scores on the board, and look for discrepancies. Locate the big discrepancies first, and ask the groups to explain what they saw in the paper that led them to the decision they made. This offers you the chance to help them see better, to avoid mistaking weak evidence for strong—whatever comes up—but it does so by modeling reasoned and careful judgment.

• Five. Look at the criteria scores where people agreed, too. Ask them about these. If everyone agreed on a three, for example, you can play devil’s advocate and ask why they didn’t assign it a one, say, or a four. Again, the point is to have them articulate their reasons for assigning value as a way to increase their critical understanding of the kinds of thinking you are asking them to do.
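
If you want to tabulate the board for Steps Four and Five, here is a minimal Python sketch. It ranks criteria by how far apart the group scores are, so the big discrepancies come up first and the unanimous ones are easy to spot for devil's-advocate questions. The function name, the group labels, and every score below are invented for illustration; "spread" (highest group score minus lowest) is just one simple way to measure disagreement, not a prescribed method.

```python
from typing import Dict, List, Tuple


def rank_discrepancies(board: Dict[str, Dict[str, int]]) -> List[Tuple[str, int]]:
    """Return (criterion, spread) pairs, largest spread first.

    board maps a criterion name to {group name: consensus score}.
    spread is the highest group score minus the lowest group score.
    """
    spreads = [
        (criterion, max(groups.values()) - min(groups.values()))
        for criterion, groups in board.items()
    ]
    return sorted(spreads, key=lambda pair: pair[1], reverse=True)


# Made-up board: four groups, three of the criteria.
board = {
    "Center":   {"Group A": 3, "Group B": 4, "Group C": 3, "Group D": 3},
    "Focus":    {"Group A": 3, "Group B": 3, "Group C": 3, "Group D": 3},
    "Fullness": {"Group A": 2, "Group B": 2, "Group C": 5, "Group D": 2},
}

ranked = rank_discrepancies(board)

# Step Four: start the discussion with the widest gap.
widest_criterion, widest_spread = ranked[0]

# Step Five: criteria where every group agreed.
agreed = [criterion for criterion, spread in ranked if spread == 0]
```

In this invented board, Fullness would be discussed first, and Focus is the criterion on which to play devil's advocate.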

This exercise can be done first with a model paper from other sections, or with excerpts you yourself have written. Then you move on to using class examples. When I use examples from the class, I avoid weak papers. Using one would be embarrassing to the student; it might also give the exercise a negative tone. Given how negatively many students think about their writing in the first place, there’s no reason to reinforce that here. I avoid very strong papers, too, since I want to have students talking about ways something good can be made better. So I choose a paper that has both strengths and some things that can be profitably improved. If I have time I’ll use two with contrastive strengths. I also write the student a note first, asking if it’s OK to use his or her paper (unless this is a writing class where everyone knows they’ll be expected to share their work) and explaining that this will give them terrific feedback on how to make their already strong paper stronger. I explain that the exercise will be done anonymously. Most students are fine with this; most, in fact, are actually flattered by the attention.

I usually budget from 50 minutes to an hour for this exercise, and I have learned NOT to rush it. That seems a lot of time, but believe me, this hour can be extraordinarily effective in helping students actually understand what you want them to do—and (depending on how fully integrated your assignment is into the whole course) often not just on the single paper, but also in the course as a whole.

Last bit of advice: Once your students have learned to apply your criteria, you can ask them to use these criteria to pre-grade their own work. If you assign two papers, they can learn your criteria on the first and then use them on their second. I do this by having them write on the back of their paper before turning it in. I’ll ask two questions, and then ask them to give themselves criteria scores. The questions are usually: What do you think you did particularly well in this paper? And: What did you find to be the most challenging part of writing this paper? I find that reading these responses is easy and quick, and it eases my commenting by making it possible to begin by responding briefly to what they have written.

II. The Quick Exchange. (To make this one work you should be sure to let them know ahead of time what you are going to ask them to do.) Professor Jan Sjåvik (Department of Scandinavian Studies) worked this out as a method for dealing with a series of seven short in-class writing assignments in a class of 68 students. His purpose in giving these short writes was to offer students the opportunity to write low stakes trial-runs of possible examination questions:

“I prepared and showed overhead slides with individual questions, of which the following is a sample:

Is there really any truth in Ludvig Holberg’s play Erasmus Montanus?
Should the Old Norse poem “The Lay of Thrym” be read as a story about a victory over evil, or as a story about genocide?
What is the author’s purpose in Alexander Kielland, “At the Ball”?
Do you see any problems with creating a coherent interpretation of Jonas Lie, “The Cormorants of Andvaer”?

“The students wrote for seven minutes, then traded papers with someone sitting close to them and provided peer feedback. By instruction, the feedback was to call attention to one positive aspect of the paper and to mention one thing about it that could be improved, but only if the suggestion would be stated in a positive manner. The students were given three minutes to provide the peer feedback. At the end of the term the students were asked to submit their in-class writing assignments as a portfolio, which counted for five percent of the course grade. Students who had missed one or more assignments were allowed to submit make-up work.

“In slightly different form, … several of the topics for the brief writing assignments in class showed up on the midterm and final examinations. The low-stakes writing clearly worked as intended, for the quality of the exam essays was much higher than what would be expected according to past experience.”

III. The Read Around. Have everyone bring in their drafts, whether for rewrite or to turn in. (I ask for two copies so I can keep one on file.) Put one copy of all the papers in a stack in the middle of the room, or on the table up front. Ask each person to take a paper, and read it. No marginal comments. Then ask that when they finish reading the paper they write two comments: first, a noticing of something that seemed strong, meaning something specific and connected to the criteria. Second, something they didn’t follow or understand or find convincing. No corrections, no instructions, only description. I then ask everyone to sign their comment.

Depending on the length of their papers, students will need 10 to 15 minutes per paper. When they have finished the first paper, they then return that paper to the stack and take another. (I sometimes have to give out a couple from my second stack to keep people reading.) They repeat this for 45 minutes to an hour.

At the end of the allotted time, every paper will have had two or three or sometimes four readings, with comments. Students will have seen a range of other student work, will have had some practice applying criteria, and will also have seen other students’ comments as they read their second and third papers.

I then often close the exercise by asking them to write (on a separate piece of paper) for five minutes about the experience. I’ll help them by asking questions like: What surprised you most in the papers you read? What was the greatest strength in the papers you read? What did you think people had most trouble doing? What piece of advice would you give the class as a whole about how they could produce stronger writing next time?

A special bonus of this enterprise is that many students realize for the first time how hard it is to read a set of papers. Having taken 45 minutes to an hour to read three or four papers they end up with some appreciation of a part of your job that is otherwise invisible.

As for my own next steps, if these are final drafts I will read the comments along with the paper—and put question marks by those I don’t follow or agree with, and comment on the comments when appropriate.

My students have liked this exercise a lot, especially when preceded by a criteria norming session to help them understand how to recognize strengths and weaknesses in a paper in the first place. Indeed, my most recent use of this exercise was with a set of drafts that students had been asked to revise, and almost exactly half of the 34-person class ranked the Read Around as the single most helpful thing we did to enable them to rewrite effectively.