Hi – I’m back again! The new semester is now well underway and I am beginning to reach cruising speed. Based upon my rather slow writing speed and my teaching load this semester, I think that three posts per week will be my typical output for now. I plan to add to this blog on M/W/F mornings. We’ll see how that goes…
I have been discussing selected response (objective) assessments for the past few posts. I'd like to change gears completely today. What about high-stakes assessments (as in – you don't graduate if you don't pass) that are subjectively given a single score by an evaluator? I have to admit it – this prospect made me very uneasy (how's that for a euphemism?) at first. Once I saw this type of assessment in action, though, I quickly became a big fan of the approach.
Before joining the faculty at Ferris State University, I was an assistant professor at California State University, Long Beach. Every member of the California State University system is required to assess the writing proficiency of all students graduating from that institution. CSULB elected to create a high-stakes writing exam (the WPE) to measure student writing proficiency. In it, students who have completed 65 credits of coursework are required to write an analytic response to a provided prompt within 75 minutes. These responses must then be read and evaluated by the faculty. CSULB has 35,000 students, so thousands of papers must be scored each year. To simplify this process, a holistic scoring rubric was developed, and I have created a copy of it here. Each student paper was read by three separate faculty reviewers. Each person scored the paper using the holistic rubric, and the submission received the total of those three scores. A passing paper needed a score of 11 or higher (two fours and a three). In addition, the three reviewer scores were not allowed to vary by more than one point overall. In cases where the scores diverged by more than a point, an expert arbitrator made the final evaluation.
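For the programmatically inclined, the decision rule described above is simple enough to sketch in a few lines of Python. This is purely illustrative – the thresholds come straight from the post, but the function name and structure are my own invention, not anything CSULB actually ran:

```python
# Illustrative sketch of the WPE scoring rule described above.
# Thresholds are from the post; everything else is hypothetical.

PASSING_TOTAL = 11   # e.g., two fours and a three
MAX_SPREAD = 1       # reviewer scores may differ by at most one point

def evaluate_paper(scores):
    """Given the three reviewer scores for one paper, return 'pass',
    'fail', or 'arbitrate' (scores too far apart to trust the total)."""
    assert len(scores) == 3, "each paper gets exactly three readers"
    if max(scores) - min(scores) > MAX_SPREAD:
        return "arbitrate"   # expert arbitrator makes the final call
    return "pass" if sum(scores) >= PASSING_TOTAL else "fail"

print(evaluate_paper([4, 4, 3]))  # total 11, spread 1 -> pass
print(evaluate_paper([4, 3, 3]))  # total 10 -> fail
print(evaluate_paper([5, 3, 4]))  # spread 2 -> arbitrate
```

Notice that the spread check runs before the total is even considered – a divergent set of scores goes to arbitration regardless of whether the sum would have passed.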
The main advantages of this approach are simplicity, speed, and reliability. The scoring rubric is easy to understand and to communicate to students. The WPE website includes study suggestions and an entire workbook of sample essays and practice prompts for students. With thousands of papers to be read in triplicate, speed was essential. I am not a particularly fast reader, yet I was able to accurately score over 200 papers in about six hours at a Saturday reading. The accuracy of scoring can be evaluated by examining the inter-rater reliability (how close are the scores of separate raters when reading the same paper?). The overall reliability of this process was between 95 and 99% – very high. Only a handful of papers were ever sent to the expert arbitrators due to scoring discrepancies.
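One simple way to put a number on inter-rater reliability is percent agreement: the share of papers whose reader scores stayed within the allowed spread. The post doesn't say exactly how CSULB computed its 95–99% figure, so treat this as one plausible sketch rather than their actual method:

```python
# Percent agreement as a simple inter-rater reliability measure.
# This is my own illustration, not the documented CSULB calculation.

def percent_agreement(papers, max_spread=1):
    """papers: a list where each entry is the list of rater scores
    for one paper. A paper counts as 'in agreement' when its scores
    differ by no more than max_spread points."""
    agreed = sum(1 for scores in papers
                 if max(scores) - min(scores) <= max_spread)
    return 100.0 * agreed / len(papers)

# Toy data: two of these three papers have raters within one point.
papers = [[4, 4, 3], [5, 4, 4], [3, 5, 4]]
print(percent_agreement(papers))  # two of three papers agree
```

More sophisticated statistics (Cohen's or Fleiss' kappa, for instance) correct for chance agreement, but for a quick health check on a reading session, percent agreement is easy to compute and easy to explain to the readers themselves.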
The principal disadvantage of this approach is the discipline and training necessary to ensure that the process is fair and robust. Each of the prompts used for the writing assignments had to be tested to ensure that little or no bias (cultural, gender, etc.) was introduced. In addition, in order to achieve reproducibility in scoring, all readers needed to be trained. The first hour of any reading session began with range-finders: papers that typified each of the possible scores on the rubric were selected and given to the readers to score. After rating the papers, all of the scores were tallied on a chalkboard and the relative merits or deficiencies of the papers were discussed. Once everyone agreed upon what a 4 or a 5 looked like (for example), they were given actual papers to grade. Another, smaller set of range-finders was given after lunch to reinforce the standards.
This type of approach is probably not practical for most course-level assessments (though perhaps huge, multi-section General Education classes might consider it…). However, this may be something that we could apply at Ferris State in measuring program outcomes. What do you think? Does it look interesting or impossible to you?