On Grading Scales

The standard grading scale is base 10 and sets the pass/fail frontier at 60%.

Why? What’s magical about 60%? It seems pretty arbitrary to me. If there’s a meaningful logic to this, nobody’s ever told me what it is, and I’m yet to stumble across it on my own.

One of the things that’s always struck me as weird about this is when this scale is used to grade essays or term papers. I can make reasonable determinations of the difference between a 95% and an 85% because I can judge whether the student has done A work or B work. But what the hell is the difference between a 45% and a 35%? A majority of the scale is in the F range, so presumably we’re supposed to be able to divine the difference between a good F, an average F and a lousy F?

It’s not uncommon in some science courses for the pass/fail frontier to be much lower, say 50%, or even in some cases, so I’ve heard, 40%.

And then we translate those grades into a GPA scale, which is base 4, and the whole range from 0-59% gets compressed into the range of 0. Meanwhile, the range from 93-100 also gets compressed into the single integer 4.

My colleague uses the base 4 system for grading. This doesn’t work well with multiple choice, so he doesn’t use multiple choice. But it makes grading essays a lot easier, because instead of trying to discern between close values (4.4 or 4.5 on a 5 point scale?) he just assigns whole numbers based on an assessment of the letter grade. The ease of grading this way should not be undervalued. Grading is simply excruciating, and I know for a fact I’m not the only academic who’s resorted to using Adderall to get through it.* Anything that reduces the time and energy involved in grading, while still being fair to students (i.e., I never actually have tried the bullseye method, despite threatening to do so) is a win-win.

So I adopted the base 4 system, although unlike my colleague I will use 2.5 and 3.5 instead of just whole numbers. I don’t, though, bother distinguishing between, say, a B+ and an A-. A 3.5 is B+/A-, and I figure it should all average out across multiple graded items: say I gave a 3 (B) on an answer that could reasonably have been called a 3.2 (B+) and a 3.5 on another that could have been a 3.7 (A-); the cumulative score comes out about the same when averaged together.**

But I discovered a problem with that system; one that suggests the underlying value of the base 10, frontier 6, system. Last year I had a student who dropped out halfway through the term, despite having been doing quite well, and when calculating final grades I figured out that despite having not completed about half the work in the class, she still would have passed with a C-. In other words a student who did A work (4s) on half the assignments and did not even bother to turn in any of the others (0s) would earn a C (2). C is supposed to represent average work, and sure enough, 2 is the average of 4 and 0, but…but… Well, I’m not sure what the logical argument is against giving such a student a C. It just sticks in my craw, even if perhaps irrationally. It seems as though simply not completing assignments really ought to have greater downward pull than completing them really poorly.
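The arithmetic behind that complaint can be sketched in a few lines of Python. This is only an illustration: the ten equally weighted assignments and the choice of 95 as the percentage value of "A work" are assumptions made for the example, not figures from the class.

```python
# Hypothetical gradebook: ten equally weighted assignments,
# five done at A level and five never submitted (scored 0).
N_DONE, N_MISSING = 5, 5

def course_average(done_score, missing_score=0):
    """Average when half the work earns done_score and the rest earns missing_score."""
    scores = [done_score] * N_DONE + [missing_score] * N_MISSING
    return sum(scores) / len(scores)

four_point = course_average(4)   # A = 4 on the base 4 scale
percent = course_average(95)     # assumption: A work is worth 95 on the percentage scale

print(four_point)  # 2.0, a C on the base 4 scale
print(percent)     # 47.5, an F when passing starts at 60
```

The same half-finished record lands on a C under base 4 but well below the 60% frontier under base 10, which is the extra downward pull of a zero that the percentage scale builds in.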

So after a sitdown lunch with my more mathematically inclined brother, to bounce ideas off him (payback for all the basketballs he bounced off my head when we were kids) and get his feedback, next term I’m going to tweak the system, experimenting with a 6 point scale, where 6=A, 5=B, 4=C, 3=D, and 2, 1, 0=F. 0 will be reserved for non-submissions. 1 will be used for submissions that are utter bullshit, where it’s patently obvious the student has not one fishing clue what s/he’s talking about. 2 for a “normal” F, where it’s not complete bullshit, but is so bad in various ways (non-compliance with requirements, purely ideological instead of analytical, etc.) that it can’t really earn a passing grade. This gives non-submissions and pure bullshit greater gravity than just plain ol’ lousy work, provides some reasonably coherent justification for distinguishing among the various levels of F, and leaves the ease of grading passing work on a 4 point scale in place.
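As a rough check on how the 6 point scale treats missing work, here is a small Python sketch. The two students and the ten equally weighted items are hypothetical, and the letter lookup only handles whole-number averages; how fractional averages would map to letters is not specified above and is left open here.

```python
# Point values on the proposed 6 point scale, as described above.
LETTER = {6: "A", 5: "B", 4: "C", 3: "D", 2: "F", 1: "F", 0: "F"}

def average(scores):
    """Plain unweighted mean of a list of scores."""
    return sum(scores) / len(scores)

# Hypothetical students, ten equally weighted items each (an assumption).
skipped_half = [6] * 5 + [0] * 5   # A work on half, nothing submitted for the rest
weak_half = [6] * 5 + [2] * 5      # A work on half, "normal" F work on the rest

for name, scores in [("skipped half", skipped_half), ("weak half", weak_half)]:
    avg = average(scores)
    # Both averages in this example are whole numbers, so a direct lookup works.
    print(name, avg, LETTER[int(avg)])
```

Under these assumptions the student who skips half the work averages 3.0 (a D) rather than the C the same record earns on the base 4 scale, while the student who submits weak work for the other half averages 4.0 (a C).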

Will it work well? I’m experienced enough at this point to not be overly optimistic that I’ll be wholly satisfied with it. Just because it looks logical a priori doesn’t mean that problems won’t show up with practical experience. But it’s a tinkering, rather than a radical change, so I have some optimism that the gains will outweigh any unforeseen negatives.

Any thoughts? On this scheme, or on grading schemes in general?

* Grading has actually become a serious quality of life issue for me. At its best, when students do well, it’s mind-numbingly repetitive. At its worst, it’s unbelievably frustrating and creates an overwhelming sense of failure that you’ve apparently been unable to teach your students anything at all. I so dread it, and have such a hard time focusing on it, that I tend to put it off. But knowing that it’s there, unavoidable, and increasing in amount while the time available for doing it is shrinking, stresses me out most amazingly. I lose sleep, become depressed, and get short-tempered with my family. Taking Adderall and plowing through it all in one or two days frees up multiple days from the stress and is much better for my family as well.

** In grad school I TAed for a prof who said he thought that an A-/B+ was as legitimate a grade as either an A- or a B+. He was right, but after doing that for a term I realized that the proliferation of grade categories, while conceptually legitimate, made the task harder while providing less clarity. By using 3.5 out of 4 as A-/B+ I am actually reducing the number of grade categories, because the A-/B+ is not in addition to A- and B+ but in substitution for them. And I haven’t noticed that the reduction in categories has reduced clarity in the meaning of grades.

About J@m3z Aitch

J@m3z Aitch is a two-bit college professor who'd rather be canoeing.
This entry was posted in The Democratic Process.

9 Responses to On Grading Scales

  1. Profclaus says:

    By weighting the assignments later in the semester as a larger percentage of the points possible in the course, you reduce the chance that the student who does A work early and drops off the face of the earth will pass the class. In addition you reward the student who “improves/tries harder” and has been able to understand the material cumulatively throughout the semester. I have put more of an emphasis on final exams because I was tired of students not putting effort into studying for the final and making sure they grasped the total of the class. One of my classes has the final exam worth 65% of the total for the class, but students can see how they are doing based on the other exams in the class, and the other assignments which are worth very few points in the grand scheme of things. I like the idea of the 6 point scale. I have always been upset by the 100/10 point grading scales. My general rule of thumb is that if you understood 50% of the material in the class then you can pass (D) and I build my scale off of that. Also the 100% scale means that PERFECTION is the only way to get 100. If I have six or seven typos or make a calculation mistake then I am already below the “A” range. This is a problem. It does take a while to educate students about the value of different grading schemes, however.

  2. At my Alma Mater, grading was done on a 4 point scale like this:
    A – 4
    AB – 3.5
    B – 3
    BC – 2.5

  3. J@m3z Aitch says:

    MRS, may I ask where that was?

    It does take a while to educate students about the value of different grading schemes, however
    Lord, yes, but then it takes a while to educate them on everything, doesn’t it? I’ve always been tempted to have a class where there are 73,627 points available, just to be wicked and to hammer home the point that the number of points an assignment is worth is, if not actually arbitrary, more nominal than real.

  4. University of Wisconsin – Madison; not sure if the scale applied to the whole UW system, or just the main campus, but I know every department on the Madison campus used it

  5. Jeremy Sell says:

    IMHO anything below 90% is cause for concern. Either you understand the material, or you’re merely one of various degrees of not “getting it.” If a student isn’t there to kick ass and take names, why bother?

  6. Matty says:

    A teacher of my acquaintance, school not university, once told me that he had been advised that to stop parents complaining about their little darlings’ low marks he should give marks out of ten, then add 90 and pretend it was a percentage.

  7. AMW says:

    For essays I do the following in a spreadsheet:

    1. Rate each paper on a 10-point scale (fractional points are permissible). The rating is largely in comparison to all the other papers I’ve read so far.
    2. Create a table mapping each point value to a score on a percentage scale
    3. Fiddle with the percentage values until the average assigned percentage is 87%

    It gives me the average grade (and usually grade distribution) I’m looking for every time, it saves me the agony of thinking up a particular percentage for each paper, and the final product looks familiar to the students.

  8. Troublesome Frog says:

    IMHO anything below 90% is cause for concern. Either you understand the material, or you’re merely one of various degrees of not “getting it.”

    Doesn’t that depend entirely on how the exam is structured? My Signals and Systems instructor gave some brutal exams where the median was probably somewhere in the 30% range. They could find the limits of the best students and were often interesting learning experiences on their own. It seems like having an exam where too many people hit 100% is like using a 6 foot ruler to measure human height. You just end up losing a lot of data about the distribution when you hit the saturation value.

    One EE professor I knew said that over the years he’d found that grades in the lower division engineering classes were bimodal. You had a set of students with the skills and the motivation to “make it” and they were Gaussian, centered somewhere around the B range, and you had another smaller Gaussian about the D range of students who, for various reasons, probably weren’t going to hack it. By upper division, it had normalized as people changed majors or left the university entirely and all that was left were people who were likely to finish the program.

  9. AMW says:

    I’ll chime in in support of T-Frog. On my exams I have two basic kinds of questions. The first type just tests a student’s rote memory ability. These look just like problems I’ve given them on homework assignments, but with different values plugged into the problem. The second type of problem includes some kind of twist or tricky component that a student could figure out if he really knew the reasoning behind the models we use in class. Students who can only answer the first type of problem get less than 90%.
