UW STUDENT RATINGS RESEARCH
Supplement to UW's December 4 Press Release
Colleges and universities rely on student ratings of instruction as the primary method for evaluating teaching effectiveness. University of Washington (UW) research revealing substantial problems in student ratings measures was reported by Anthony G. Greenwald and Gerald M. Gillmore in the December 1997 issue of the Journal of Educational Psychology, published by the American Psychological Association.
WHAT DID WE FIND? (A Disturbing Result)
It was no surprise for Greenwald and Gillmore to find that high-graded courses (ones in which students expected their highest grades) were also the courses to which students gave high ratings. This has also been reported by many others. It was surprising, however, to discover that the high-graded courses were also the ones for which students reported doing the least work.
As Greenwald and
Gillmore explain, this finding has disturbing implications regarding the
impact of student ratings
measures on the educational process.
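To make the pattern concrete, the sketch below computes the three pairwise correlations on made-up course-level numbers. The data are purely hypothetical, chosen only to mimic the direction of the relationships just described; they are not the UW data.

    import numpy as np

    # Hypothetical course-level averages, invented only to mimic the
    # direction of the reported pattern (not the actual UW data).
    expected_grade = np.array([3.8, 3.5, 3.3, 3.0, 2.8, 2.6])   # mean expected GPA
    course_rating  = np.array([4.5, 4.2, 3.9, 3.5, 3.1, 2.9])   # mean rating, 1-5 scale
    hours_per_week = np.array([4.0, 5.0, 6.5, 8.0, 9.5, 11.0])  # reported workload

    # Grades and ratings rise together; workload moves opposite to both.
    print(np.corrcoef(expected_grade, course_rating)[0, 1])    # strongly positive
    print(np.corrcoef(expected_grade, hours_per_week)[0, 1])   # strongly negative
    print(np.corrcoef(course_rating, hours_per_week)[0, 1])    # strongly negative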
Greenwald and Gillmore used novel measures and analyses in seeking to
understand why high
work demands and low course ratings go hand in hand with strict grading.
Their answer centers
on variations in professors' goals for their courses. On the one hand there
are professors whose
chief aim is for students to perform well; these professors are likely to set
the amount of course
material at a level that permits most students to achieve a high level of
mastery. On the other
hand there are professors who seek to maximize the content coverage in their
courses. Professors
of the first type can give relatively high grades and their students
(especially the brighter ones)
need not work very hard. Professors of the second type, by contrast, must
challenge students,
which means making it relatively difficult to get high grades -- if it is too
easy to get a high grade,
students may skip some assigned work and will not learn as much as the
professor hopes.
It is of course too simple to think that every professor is purely one of these two types. Nevertheless, the
UW data did reveal that professors vary considerably in the amount of work
they expect of
students. The professor's workload expectation, in turn, becomes a central
factor in shaping the
course's grading scheme -- the greater the goal for student work, the stricter
the grading must be
in order to ensure that students will do the work.
Data obtained recently at UW indicate that instructors resembling the
second type -- ones who
appear to subscribe to a 'no pain, no gain' theory -- are found especially in
math and science
courses. In many math and science courses, students report on their ratings surveys that they expect low grades and do a lot of work. Although
students may rate
these courses and instructors as 'good', ratings for these courses
nevertheless often fall well below
the average for the university as a whole.
WHAT DOES IT MEAN? (Some Undesirable Consequences)
When difficult courses get relatively low ratings, several undesirable things can happen:
1. New instructors with high standards may be discouraged.
New teachers may expect
students to work as hard as they themselves did as undergraduates. These new
instructors all too
soon discover that their students find the workload excessive and the grading
standards
unreasonable. The low ratings that are likely to result can be quite
discouraging.
2. Students may avoid math and science courses.
University-level math and science courses
have the reputation of being ones in which even good students can get low
grades. Is it surprising
that many students plan their undergraduate programs with few or no math and
science courses?
3. Higher education becomes education lite. When
instructors are discouraged from teaching
challenging courses and students gravitate toward less demanding, high-grading
courses, there
will necessarily be a reduction both in the number of demanding courses available and in the enrollments of those that survive. These trends can be described as a 'dumbing down' of higher education or as the evolution of higher education 'lite'. (The 'lite' label is borrowed from Mark Edmundson, writing in the September 1997 issue of Harper's Magazine, pp. 39-49.)
4. Grades creep up. Although grade inflation appears to
go hand in hand with the lite-ening of
higher education, grade inflation may not itself be a cause for concern. If
gradually increasing
grades went together with gradually increasing educational content, the upward
movement of
grades might be taken as a positive sign. However, the actuality may be just the
reverse -- gradually
increasing grades appear to be associated with gradually decreasing
educational content.
WHAT CAN BE DONE? (Repairing the Student Ratings System)
Despite having some problems, student ratings have two very attractive
features. First, they are
easy to administer. Second, they provide a simple numerical index. (No
matter that the index is
questionable as a measure of quality of instruction.) With these two very
desirable properties, it
makes more sense to repair the student ratings system than to abandon it. Two types of changes will improve the use of ratings. First, ratings should be corrected by statistical adjustment. Second, the users of ratings should become better educated about what student ratings can and do measure.
Statistical adjustment. Several well-known numerical indexes incorporate statistical adjustments that correct for extraneous influences. Some examples: (a) The US
monthly
unemployment index is statistically adjusted to correct for seasonal
employment fluctuations in
industries such as tourism, construction, and agriculture; without that
correction it would be
inappropriate to directly compare index values for winter and summer months.
(b) IQ measures
are statistically adjusted by correcting raw performance scores for the test
taker's chronological
age; without that correction it would be inappropriate to compare IQs of
people of different ages.
(c) Computerized rankings of college football teams in the US are
statistically adjusted by
correcting won-lost records for the difficulty of the opponents played;
without that correction it
would be inappropriate to compare teams in different conferences. In the same
fashion, student
ratings of instruction can be statistically adjusted in order to correct for
unwanted influences of
grading policy (higher grades produce higher ratings), class size (larger
classes get lower ratings),
and potentially other unwanted influences. Such adjustments are now beginning to be used at the University of Washington.
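To make the adjustment idea concrete, here is a minimal sketch of one regression-based approach: predict each course's rating from expected grade and class size, then report the campus mean plus the unexplained residual. The model, variable names, and sample data are assumptions for this illustration, not the actual procedure adopted at UW.

    import numpy as np

    # Hypothetical data: one row per course section (not UW data).
    raw_rating = np.array([4.6, 4.1, 3.2, 3.9, 2.8, 4.4])    # mean rating, 1-5 scale
    exp_grade  = np.array([3.8, 3.5, 2.9, 3.4, 2.7, 3.7])    # mean expected GPA
    class_size = np.array([25., 40., 180., 60., 220., 30.])  # enrollment

    # Predictors: intercept, expected grade, and log of class size
    # (assuming the size effect is plausibly diminishing).
    X = np.column_stack([np.ones_like(exp_grade), exp_grade, np.log(class_size)])

    # Ordinary least squares: the part of each rating that is predictable
    # from grading leniency and class size alone.
    beta, *_ = np.linalg.lstsq(X, raw_rating, rcond=None)
    predicted = X @ beta

    # Adjusted rating = campus mean + the residual, i.e. the part of
    # the rating NOT explained by grades or class size.
    adjusted = raw_rating.mean() + (raw_rating - predicted)

    for raw, adj in zip(raw_rating, adjusted):
        print(f"raw {raw:.2f} -> adjusted {adj:.2f}")

Under a scheme like this, a strictly graded large lecture whose raw rating sits below the campus mean can come out at or above the mean once the influences of grading policy and class size are removed.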
Interpreting ratings. Quality of instruction in a
course has two components: (a) productivity --
how much students learn from the course, and (b) satisfaction -- how
much students enjoy taking
the course. These two types of outcome (productivity and satisfaction) are
found in many
situations. In industry, managers are evaluated in terms of both how
productive and how happy
their subordinates are. In medicine, doctors are evaluated both for
effectiveness of treatment and
for bedside manner (which translates to patient satisfaction). In
professional sports, managers are
evaluated both for the team's won-and-lost record and for their players'
morale.
Productivity may appear to be the bottom line, but satisfaction is also
important -- in part because
satisfaction often affects productivity. For example, workers who don't enjoy
their jobs may quit,
patients who dislike the doctor's bedside manner may not show up for
appointments, and athletes
with low morale may perform below their peak levels. Students' enjoyment of a
course is
important in much the same fashion. Students who don't enjoy a course may
lose interest either in
the course's subject matter or, worse, in education generally.
It is obviously desirable for student ratings to measure both productivity and satisfaction.
However, present-day ratings methods do a much better job of measuring
satisfaction than
productivity. Student ratings surveys may try to measure productivity, for
example by including
questions such as, "How much did you learn from the course?" However, the use
of such
questions (a) presumes (dubiously) that students are competent to assess what
they have learned
and (b) overlooks much evidence that responses to such questions are distorted
(by influences
known technically as halo effects and self-serving attributions).
Greenwald and Gillmore participate in UW's Faculty Council on Instructional Quality and maintain a continuing research program directed at improving student ratings as measures of instructional quality.