In her latest post on Learning Curves, Rudbeckia Hirta describes two methods of grading:

Around here there are two schools of thought for grading calculus classes: straight percentages or curving the grades. I favor the former with each letter grade representing ten percentage points. In my class, 90%+ is an A, at least 80% but less than 90% is a B, etc.

I had been arguing with a colleague about which system is better, mine (percentages) or his (curve). The way he assigns grades is to set the mean to the boundary between B and C and then to allow one letter grade per standard deviation. We have been arguing about this all semester. I am convinced that my way is better.
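To make the two schemes concrete, here is a minimal sketch of both. Several details are assumptions rather than anything stated in the quote: the C and D cutoffs for percentages (the post only says "etc."), the use of the population standard deviation, the choice to give a score sitting exactly on a boundary the higher letter, and the function names themselves.

```python
from statistics import mean, pstdev


def percentage_grade(score):
    """Straight percentages: each letter spans ten percentage points.

    The 70 and 60 cutoffs are assumed; the quote spells out only A and B.
    """
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter
    return "F"


def curved_grade(score, scores):
    """Curve: the class mean is the B/C boundary, one letter per standard deviation.

    Uses the population standard deviation (an assumption).
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return "B"  # assumption: with no spread, everyone sits on the B/C boundary
    z = (score - mu) / sigma  # distance from the mean in standard deviations
    for cutoff, letter in ((1, "A"), (0, "B"), (-1, "C"), (-2, "D")):
        if z >= cutoff:
            return letter
    return "F"


strong_class = [92, 94, 96, 98, 100]
weak_class = [30, 40, 50, 60, 70]

print(percentage_grade(92), curved_grade(92, strong_class))
print(percentage_grade(70), curved_grade(70, weak_class))
```

With these made-up score lists, the same function that gives a 92 an A on straight percentages places it well below the mean of the strong class, while a 70 lands at the top of the weak class.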

I think they’re both wrong, though her way is better than his. Grading on a curve is destructive and counter-productive. It sets students up in competition with each other, it discourages cooperative study groups (which are known to be exceptionally effective tools for learning mathematics), and it rewards an entire class for not studying. We’ve all had the experience of teaching such a class, just as we’ve all had the experience of teaching a class where almost everyone is motivated and successful. Grades shouldn’t be adjusted up for the first class and down for the second, which is what a curve does. If anything, an inverse curve would make more sense: boost the grades for the class that is working hard, lower them for the class that isn’t.

The counter-argument is usually that a test might turn out to be too long or too difficult, and you don’t want to penalize the students for the teacher’s mistake in judgment. Indeed we do often misjudge length or difficulty, and some compensation is appropriate. But I don’t think a curve is the right solution, since all the objections in the previous paragraph still apply. The correct solution is… well, let’s first look at why I think Hirta’s method of grading is wrong (even if better than grading on a curve), and then we’ll see a common solution to both problems.

Straight percentage grading sounds fair and equitable. It doesn’t have all those negative effects of grading on a curve. But it also assumes that the teacher has somehow managed to control the level of difficulty of the test so precisely that an 80% really does reflect the lowest level of competence (assuming that an A represents excellence, a B competence, and so forth). It discourages writing tests that contain truly challenging problems, since the grades would then be artificially low. It also makes it difficult to decide how to assign points for partial credit, since three points on a four-point problem is 75%, which is automatically below B-level.

The solution to the problems with both grading schemes is to adopt a third alternative, which is what I’ve been doing for the past six years. After a test is written, I make a professional judgment about its level of difficulty and I assess the likely cutoff point between excellent and competent work (the lowest A), as well as the likely cutoff point between competent and merely satisfactory work (the lowest B). These serve as anchors for a tentative scale. I then correct the tests, assigning numeric point values only, following a system I learned in an Annenberg workshop: student work on a typical 4-point problem is assigned a 4 if it’s perfect or nearly so, a 3 if there’s an error or omission that’s small enough to be explained in a sticky note, a 2 if the student has demonstrated knowledge of how to solve the problem but has not successfully solved it, and a 1 if the student has demonstrated understanding of the problem but not knowledge of how to solve it. Note that a 3 on every problem would result in a 75% total score, so I usually aim for 75% as the lowest B.
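The arithmetic behind that 75% anchor can be sketched in a few lines. This is only an illustration under assumptions: the function names are hypothetical, the ten-problem test is made up, and the lowest-A cutoff of 90 is a placeholder, since the actual value comes from a per-test judgment that no formula supplies.

```python
def percent_score(points, max_per_problem=4):
    """Total percentage for a test graded with the 4-point rubric."""
    return 100 * sum(points) / (max_per_problem * len(points))


def letter_from_anchors(percent, lowest_a, lowest_b):
    """Apply the two anchors of the tentative scale.

    Only the A and B anchors are modeled here; anything under the
    lowest B is lumped together as "C or lower".
    """
    if percent >= lowest_a:
        return "A"
    if percent >= lowest_b:
        return "B"
    return "C or lower"


# A hypothetical ten-problem test where every answer earns a 3.
all_threes = [3] * 10
score = percent_score(all_threes)

# lowest_a=90 is an assumed placeholder; lowest_b=75 is the usual target named above.
print(score, letter_from_anchors(score, lowest_a=90, lowest_b=75))
```

Because the anchors are plain parameters, the same scores can be re-read against a different scale without regrading any paper.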

But before I commit myself to the tentative scale, I look at a few sample papers that hover on either side of the two cutoffs described above, and I do the same for a proposed D/F boundary. This all leads me to a scale where grades truly convey what they’re supposed to convey, where hard work and cooperation are rewarded, and where a class that doesn’t study is appropriately penalized. The median might be an A, it might be a D. But whatever it turns out to be, at least it’s an honest reflection of the class’s performance. If a test does turn out to be too long or too difficult, there’s a de facto curve in the sense that the cutoffs for the various grades will drop, since a student can successfully demonstrate competence with a score lower than 75%. This means that the sole legitimate excuse for grading on a curve is still preserved, but the negative side-effects are removed. Does that make sense?
