Proposal for a new two-axis grading system

I have often thought that the academic grading system should provide two characteristic numbers corresponding to two fundamental measures of information:

1) Comprehension (C) — What breadth of explanation, creativity, or ambition has the student expressed?

2) Percent Error (PE) — How much of the student’s expressed knowledge contradicts accepted knowledge?

This is how the proposed system would work. On a given test or assignment, a student who answers every question gets all the Comprehension points. A student who answers half the questions gets half the Comprehension points. Percent Error is then the number of mistakes divided by the number of questions answered, expressed as a percentage.

As a simple example, consider a student who answers 4 of 10 questions, with 3 answers correct and 1 incorrect. Using the two-axis grading system, the resulting scores would be C=40 and PE=25. This provides a point in a fundamentally meaningful two-dimensional space. For those who are familiar with information theory, this grading system measures the student’s Receiver Operating Characteristic (ROC).
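
For readers who prefer to see the arithmetic spelled out, here is a minimal Python sketch of the two scores. The function name `two_axis_score` and its parameters are my own illustration, not part of any grading software, and returning PE=0 when nothing is answered is an assumption, since the proposal does not say what PE should be in that case.

```python
def two_axis_score(total_questions, answered, incorrect):
    """Return (C, PE) for one test, both on a 0-100 scale.

    C  = share of questions the student attempted.
    PE = share of attempted questions that were wrong.
    """
    if total_questions <= 0:
        raise ValueError("total_questions must be positive")
    if not 0 <= incorrect <= answered <= total_questions:
        raise ValueError("need 0 <= incorrect <= answered <= total_questions")

    comprehension = 100 * answered / total_questions
    # Assumption: with no answers there is nothing to be wrong about, so PE = 0.
    percent_error = 100 * incorrect / answered if answered else 0.0
    return comprehension, percent_error


# The example from the text: 4 of 10 questions answered, 1 of them incorrect.
c, pe = two_axis_score(total_questions=10, answered=4, incorrect=1)
print(c, pe)  # 40.0 25.0
```

Note that skipped questions only lower C, while wrong answers only raise PE.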

This new system actually simplifies grading in many cases. Late work or missing answers get penalized in their C score, without affecting PE. Creative assignments become easier to grade, because the overall creativity or lack thereof goes into the C score without concern for technical flaws, which show up in PE. Classes with mixed levels of students can require students to show different levels of comprehension in order to achieve the same numerical score. Most importantly, it lets the student express directly how much he claims to know without encouraging Bogus Solutions (BS).

Teachers often ask, “What if I have to give a single score? Won’t that score be the same as I would have given using the old system?” The answer is that you can reduce C and PE to a Single Score (SS), and the result differs from the old-system score only if you grade on a curve. The formula is SS = (C/100) × (100 − PE). If you curve, apply the curve only to C, based on its mean value across all students. This method provides a better measure of information than curving SS based on the mean value of SS across all students.
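
As a quick check of the formula, the sketch below reduces the earlier example (C=40, PE=25) to a Single Score and compares it with the conventional percent-correct grade. The function name `single_score` is hypothetical, and no curve is applied because the proposal does not spell out the curving rule.

```python
def single_score(comprehension, percent_error):
    """Reduce C and PE (both 0-100) to one score: SS = (C/100) * (100 - PE)."""
    # Multiplying before dividing is algebraically the same formula.
    return comprehension * (100 - percent_error) / 100


# Earlier example: 4 of 10 questions answered, 3 correct, so C=40 and PE=25.
ss = single_score(40, 25)

# Conventional grade for the same paper: 3 correct answers out of 10 questions.
percent_correct = 100 * 3 / 10

print(ss, percent_correct)  # 30.0 30.0
```

Without a curve, the two numbers agree, which is the sense in which the single score matches the old system.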

5 Responses to “Proposal for a new two-axis grading system”

  1. Good thinking. This is basically a rubric for objective grading. Rubrics in the subjective world of essays have a similar easier-and-more-thorough effect on grading. A series of numbers can express a surprisingly nuanced feedback mechanism.

    As for how to record them with the only-one-score limitation, I found it convenient (and mathematically solvent) to record each column of my rubric as a separate score in my grade book.

    For instance, I would have “essay thoroughness”, “essay accuracy”, “essay thesis” and “essay clarity” as separate scores. Grading software averages automatically, so the problem was solved.

    • Chris, it sounds like you are using principles that are often called “standards based grading” in the assessment literature. Proponents of that grading style offer feedback in many categories, but tend to avoid numerical scores. I think that this social trend is an unfortunate side effect of avoiding punishment in education. Punishing with numerical scores alone has been shown to be ineffective in several studies. However, I believe that this ineffectiveness is itself a side effect of combining the two fundamental measures of information that I describe in this article. Therefore, I suggest that teachers first divide the score into C and PE, and then break those down further into sub-categories as desired.

      In particular, consider placing thesis and thoroughness in the C category, and accuracy and clarity in the PE category. This makes it clear to students that no amount of thesis complexity or lengthy thoroughness really makes up for errors in accuracy or clarity. Similarly, a clear but overly simple thesis does not demonstrate comprehensive understanding.

    • Hmm. If one cannot make up for the other, how would final averages affect this plan? If I score poorly on an early assessment, does not a high score on a later one work (though not completely) to “make up for” the missing credit?

      Also, I’m not sure I agree with the word “punishment” being applied to rubric-based grading. A rubric can provide a helpful summary of diverse feedback, but it does not replace teacher commentary in the margins.

      Where would, say, a sentence fragment fit in? I would say that’s evidence of a lack of understanding of grammar and usage rules, but it’s also considered bad content.

      There’s also something to be said for writing that is so unclear (which you said belongs in PE) that its meaning and practical content (which belongs in C) cannot be interpreted. I grant you that C and PE are separate measures, but I don’t think they’re always as independent as we might hope.

    • Oh no, my comment was unclear! 🙂

      Here is a list of the grading systems that are being discussed:

      G1 – Standards Based Assessment, the national reform movement that is gaining momentum despite my critique of it.
      G2 – Your current grading system.
      G3 – Two-axis grading.
      G4 – A merger of G2 and G3.

      About punishment: The proponents of G1 have shown that deducting points for mistakes without giving students positively worded objectives is interpreted as punishment, and is ineffective. They focus on “positive” labels that I would categorize under the C score, and they also try to reword “negative” labels that I would categorize under PE so that they sound like objectives (e.g., “great spelling!”). The other issue with G1 is that they throw out numerical scores altogether! They rate each rubric item as “unacceptable”, “below expectations”, “meets expectations”, or “exceeds expectations”. These ratings are then entered into software that the school uses to deliver a summary report card to parents.

      About final averages: In the two-axis system (G3), the idea is to keep two scores all the way through to graduation without lumping them into a single score (SS). If they are lumped, then the scores affect each other. The C score, comprehension, is not affected by mistakes. You can get a score of C=100 by attempting to answer all questions or meet all objectives. The PE score, percent error, will change as more objectives are attempted.

      Sentence fragments: You would be the best judge for how to score a sentence fragment. How do you do it using G2? I would say that a fragment is an attempt to meet objectives and that the result was an error.

      Overall, you’re making me think about how to apply the two-axis grading concept. In information theory, for each binary bit in a transmission, the receiver claims that the bit was detected or not detected. Then, only in the detection case, the value of the bit (0 or 1) is checked to see whether the detection was correct or mistaken. These two information metrics are so fundamental that all of our modern communication systems (radio, cell phones, internet, etc.) are based on them. Measuring information is the main idea. Beyond that, some of the details (like scaling to a maximum of 100 points) could be reconsidered.

      I am interested in helping you create G4. I would love to know how students respond to this system, and there is potential for your experience to show up in a book someday as the first shot of the revolution!

    • Let’s go back to your information-theory comparison, because I think that has untapped relevance. If the receiver does not detect the transmission, is the content of the transmission considered? If I follow you, the answer is no. Receipt/detection is a prerequisite of making sense of the information. Detection becomes a gatekeeper. No matter how amazing or important the message is, if it’s not detected, it’s as good as unsent.

      In writing, clarity (mostly mechanics, which would include sentence fragments) is like the gatekeeper. A student can have an amazing idea but write so poorly as to not be understood. I can’t help but think that the two axes are more interdependent than you’re saying, at least in the case of subjective tasks like composition.

      As I wrote the above ¶, it occurred to me that basic mechanical skills are elementary matters, and by the time students work on expressing more complex ideas, they should have successfully mastered the skill of writing a complete sentence. I suppose that, to an extent, this point would be irrelevant if the elementary grades demanded mastery before promotion. If *all* students were *guaranteed* to understand proper sentence structure, it might be different. Emphasis could be on more advanced elements of writing.

      There’s a completely different school of thought which argues that as students are learning new things, their mechanics will slip/revert until the new ideas can be processed sufficiently. In other words, more brain power is being taken up with the concepts than the writing skills. (The brain favors content to presentation; message to receipt thereof.) If that’s so, as a student struggles to improve PE on a difficult concept, the C might fall due to distracted focus.
