Assessment with Analytic Rubrics

Grading Using Analytic Rubrics


Hi! I’m back again. Last time, I discussed how a holistic rubric can be used to score high-value assignments and provide reliable assessment of program outcomes. Today I would like to explore analytic rubrics. Unlike holistic rubrics, these tools allow us as instructors to break complex student tasks into discrete, measurable pieces. Students are given separate scores for each subtask, often accompanied by a comment or other formative feedback, and the total score is used as a measure of their overall progress toward mastering the project. This is the daily double of assessment; I would like us to go for the trifecta. The missing piece in most cases is an analytical summary and report of the overall findings for the project. This is not as hard as it may at first seem.

Like many instructors, I use analytic rubrics to score many different types of assignments in my courses. The most complex of these is a group research poster project performed over the course of an entire semester in lab. This project clearly maps onto the Ferris State University “Scientific Understanding” student learning outcome. Additionally, it maps onto the program and course outcomes regarding scientific communication and use of the scientific method. Therefore, it represents an excellent opportunity to collect assessment data with broad relevance to what we do in our unit.

Large and complicated projects require detailed instructions and clear scoring guidelines. To this end, I have developed an analytic rubric with nine measurable parameters. These are each individually described and scored, and they are completed sequentially during the semester. The scoring criteria and specific instructions for each component are given to the students in the form of a lab handout. Here are some examples of these documents.

  • Scientific Method – This first document explains the overall concepts behind the scientific method and sets the stage for the class’ own experimental design experience.
  • Hypothesis – The first step in performing a scientific experiment is to define a testable hypothesis.
  • References – Each student is required to perform some literature review for the project. These references eventually get culled down to a final list of citations for their project.
  • Protocols – The class must create their own experimental procedures. They must also take sample selection and proper controls into account.
  • Figures and Captions – Following data collection, the class must attempt to illustrate their findings with tables, graphs, drawings, and photographs as they see fit.
  • Summary – The main findings of the project must be summarized.
  • Introduction – Once everything is in place, an interesting and factually accurate introduction must be crafted.
  • Layout – The students must work as a group to insert their work into a poster template for printing and posting in the department.
  • Participation – Student participation in the poster project is graded by both the instructor and their peers.
  • Group score – Individuals earn 80 points for their contribution and up to 20 points as a group score.

The interesting part of this exercise comes in scoring it. I have created an Excel scoring sheet for these rubrics. By simply typing a letter into the corresponding score fields, the students’ contributions can be quickly scored and an overall summary of the course performance is collected. Summary statistics for each criterion are calculated, and the class performance for each criterion is plotted in a series of column graphs. These data are suitable for saving as a PDF and submitting to TracDat. With very little extra effort, I have achieved the trifecta of assessment! The students receive lots of formative feedback over the span of a semester. I collect summative scores for each student that are entered into the grade book. More importantly, I collect useful data that map onto course-, program-, and university-level student learning outcomes. All that, and I have not even touched our Course Management System (Blackboard). Next time I will show you how to make Blackboard work for you instead of against you in your assessment efforts.
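If you would rather script this than build the spreadsheet, the letter-to-score idea is easy to sketch in Python. Note that the criterion names and the letter-to-point mapping below are invented placeholders for illustration, not my actual rubric values:

```python
from statistics import mean, stdev

# Hypothetical letter-to-point conversion for a rubric criterion
# (E = excellent, G = good, S = satisfactory, P = poor, M = missing)
LETTER_POINTS = {"E": 10, "G": 8, "S": 6, "P": 3, "M": 0}

def summarize(scores_by_criterion):
    """Return (mean, stdev) summary statistics for each rubric criterion."""
    summary = {}
    for criterion, letters in scores_by_criterion.items():
        points = [LETTER_POINTS[x] for x in letters]
        summary[criterion] = (mean(points), stdev(points))
    return summary

# One letter grade per student for each criterion:
class_scores = {
    "Hypothesis": ["E", "G", "G", "S"],
    "References": ["G", "G", "S", "P"],
}
for criterion, (avg, sd) in summarize(class_scores).items():
    print(f"{criterion}: mean = {avg:.2f}, sd = {sd:.2f}")
```

From there, plotting a column graph per criterion is a single matplotlib call, and the printed summary serves the same reporting purpose as the spreadsheet.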


Assessment Using Holistic Rubrics

Holistic Assessment


Hi – I’m back again! The new semester is now well underway and I am beginning to reach cruising speed. Based upon my rather slow writing speed and my teaching load this semester, I think that three posts per week will be my typical output for now. I plan to add to this blog on M/W/F mornings. We’ll see how that goes…

I have been discussing selected response (objective) assessments for the past few posts. I’d like to completely change gears today. What about high-stakes assessments (as in – you don’t graduate if you don’t pass) that are subjectively given a single score by an evaluator? I have to admit it – this prospect made me very uneasy (how’s that for a euphemism?) at first. Once I saw this type of assessment in action, though, I quickly became a big fan of the approach.


Before joining the faculty at Ferris State University, I was an assistant professor at California State University, Long Beach. Each member of the California State University system is required to assess the writing proficiency of all students graduating from that institution. CSULB elected to create a high-stakes writing exam (the WPE) to measure student writing proficiency. In it, students that have taken 65 credits of classes are required to write an analytic response to a provided prompt within 75 minutes. These responses need to be read and evaluated by the faculty. CSULB has 35,000 students, so thousands of papers must be scored each year. To simplify this process, a holistic scoring rubric was developed, and I have created a copy of it here. Each student paper was read by three separate faculty reviewers. Each person scored the paper using the holistic rubric, and the submission received the total of those three scores. A passing paper needed a score of 11 or higher (two fours and a three). In addition, the three reviewer scores were not allowed to vary by more than one point overall. In cases where the scores diverged by more than a point, an expert arbitrator made the final evaluation.
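The scoring rules described above are simple enough to express in a few lines of code. Here is a sketch in Python (the function name is mine; the 11-point passing threshold and the one-point divergence rule come straight from the description above):

```python
def wpe_decision(scores):
    """Apply the WPE holistic scoring rules to three rater scores.

    Returns "pass", "fail", or "arbitrate" (when the three scores
    diverge by more than one point overall).
    """
    assert len(scores) == 3
    if max(scores) - min(scores) > 1:
        return "arbitrate"      # an expert arbitrator makes the final call
    return "pass" if sum(scores) >= 11 else "fail"

print(wpe_decision([4, 4, 3]))  # two fours and a three -> "pass"
print(wpe_decision([4, 3, 3]))  # total of 10 -> "fail"
print(wpe_decision([5, 3, 3]))  # scores diverge by 2 -> "arbitrate"
```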


The main advantages of this approach are simplicity, speed, and reliability. The scoring rubric is easy to understand and to communicate to students. The WPE website includes study suggestions and an entire workbook for students to see sample essays and practice prompts. With thousands of papers to be read in triplicate, speed was essential. I am not a particularly fast reader, yet I was able to accurately score over 200 papers in about six hours during a Saturday reading. The accuracy of scoring can be evaluated by examining the inter-rater reliability (how close are the scores of separate raters when reading the same paper?). The overall reliability of this process was between 95 and 99% – very high. Only a handful of papers were ever sent to the expert arbitrators due to scoring discrepancies.


The principal disadvantage of this approach is the discipline and training necessary to ensure that the process is fair and robust. Each of the prompts used for the writing assignments had to be tested to ensure that little or no bias (cultural, gender, etc.) was introduced. In addition, in order to achieve reproducibility in scoring, all readers need to be trained. The first hour of any reading session began with range-finders. Papers that typified each of the possible scores on the rubric were selected and given to the readers to score. After rating the papers, all of the scores were tallied on a chalkboard and the relative merits or deficiencies of the papers were discussed. Once everyone agreed upon what a 4 or a 5 looked like (for example), they were given actual papers to grade. Another, smaller set of range-finders was also given after lunch to reinforce the standards.

This type of approach is probably not practical for most course-level assessments (though perhaps huge, multi-section General Education classes might consider it…). However, this may be something that we could apply at Ferris State in measuring program outcomes. What do you think? Does it look interesting or impossible to you?


Selected Response (Bloom)

Assessing Critical Thinking Using Selected Response Items

Sorry for missing my self-imposed deadline yesterday! I was moving my youngest daughter down to Kendall College of Art and Design and it took much more time and effort than I expected. It seems like just last year that I was the freshman on the curb watching my parents’ reflection in the car mirror recede as they dropped me off at school. It seems very strange to be on the other side of the mirror now.

Well, now to the task at hand. One of the most pointed criticisms of multiple-choice (selected-response) assessments is that they cannot be used to measure higher levels of cognitive ability. I will readily concede the point that most multiple-choice exams do not assess much beyond the level of understanding. The vast majority of the questions that ship out in the test banks from college textbooks do not require much more than simple memorization (remembering-level stuff) with a few categorization/classification (understanding) items tossed in. This need NOT be the case. There is a fairly compelling argument that mid-level cognitive abilities (applying and analyzing) can be handled very well in a MCQ format. A relatively recent article addresses this specifically for the field of Biology (my personal interest) and I am providing a link here. I believe that it is essential to include a substantial number of mid-level questions in our assessments regardless of whether the class is 100-, 200-, 300-, or 400-level. You can refer to my earlier post on leveling for a little more info on my assessment philosophy.

Some people think that MCQ assessments can also be used for upper-level cognitive abilities (evaluating). I’m not so convinced about this. I know that problem-based learning can be evaluated with selected-response items. However, selecting the most appropriate choice isn’t really the same thing as independently evaluating a problem and its possible solutions. I occasionally try an MCQ for evaluation, but I’ve never been too happy with using those data. Creating, for its part, flatly cannot be assessed using these tools. Selected response exams are not a panacea for assessment.

I also use my selected response assessments to get at metacognition in my courses (students thinking about their thinking…). I have created a handout for my students that explains the different types of thinking required in my course and how I plan to measure them. Throwing problem-based learning at students without preparing them for it is unfair and leads to poor results (and upset students). I am attaching a copy of my document here for your comments. Next time, I want to discuss how selected response items can be implemented for course assessment. See you then.


Selected Response (construction)

Constructing Useful Selected Response Items


Today, I would like to briefly introduce some practices that I use to improve the item quality on my multiple-choice assessments. Regardless of the format that we use, we must remain mindful of reliability and validity whenever we try to measure student learning. The reliability of an assessment is a measure of scoring reproducibility: if we could repeatedly administer a highly reliable assessment to a class, the students would score at or about the same level each time. Well-constructed selected response assessments typically have high reliability. I will discuss how this is calculated and interpreted in a future post. The validity of an assessment is the degree to which the assignment actually measures what we set out to measure. This is where things could potentially get sticky for multiple-choice assessments.

One of the main threats to validity for MCQ is “test-wiseness”. You do not have to look very long on the internet to locate information with suggested strategies for taking multiple-choice exams. Here is a typical one. If we are to maintain validity when using MCQ, we must defeat (or at least mitigate) these strategies. This does not mean that we must write trick questions. Instead, we need to carefully construct our assessment items so as not to provide unintentional clues that allow students to answer correctly without actually knowing the material (guessing).

Most of what I’ve learned about multiple-choice items has come from reading Developing and Validating Multiple-Choice Test Items by Thomas Haladyna. Not exactly a summertime page-turner, but a great resource! Here are some suggestions that I’ve gleaned from it and other resources:

  • Each item should address one of your course outcomes.
    if it is not a part of your outcomes, why are you asking it?
  • Stems should be direct questions or clearly phrased statements.
    for example, War and Peace was written by _____. or Which of the following authors wrote The Great Gatsby?
  • Avoid grammatical clues in the responses.
    articles like a/an or singular/plural agreement
  • Be careful about how the choices are arranged.
    I list them from shortest to longest in length, alphabetically when they are similar in length, and from smallest to largest when numerical.
  • Use a randomized key to avoid creating patterns in the responses
    this prevents unintentional biasing toward any one response – like C.
  • Make sure that all of the possible answers are at least somewhat plausible to an uninformed reader. Unbelievable choices serve no function from an assessment point of view.
  • Do not use “all of the above”
  • Use “none of the above” sparingly
    if you do use this, make sure that it is the correct answer sometimes
  • If negatives or conditions are used, they should be highlighted
    for instance: Which of the following is NOT a prokaryote?

Haladyna suggests that the most efficient multiple-choice items have three possible answers. Most textbooks use four or five options per item. He claims (and my own experience bears this out) that most questions only have three functioning responses (those that the class chooses with much regularity). Items with just three choices tend to freak people out (I was initially pretty skeptical too). What about guessing?? Let’s do a little math… If I give a 20-item quiz with three options each, what is the chance that a student could score 70% (14 correct) or better by just blindly guessing? This can be calculated as p = 1 – binomialcdf(20, 1/3, 13) = 8.788 × 10⁻⁴, or about 0.088%. Not too likely – and the odds are even longer when there are more items. I’m not worried.
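If you want to verify that arithmetic yourself, the same calculation takes only a few lines of Python (math.comb requires Python 3.8 or later):

```python
from math import comb

def p_at_least(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more lucky guesses."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance of scoring 70% (14 of 20) or better by blind guessing on
# a 20-item quiz with three options per item:
print(f"{p_at_least(20, 14, 1/3):.3e}")  # prints 8.788e-04
```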

One thing that I did fret about was adequately randomizing my answer key. Like many people, I tend to favor B or C over A and D (edge aversion, I suppose). To overcome this, I have created a spreadsheet that will create random keys of any length up to 1,000 questions. This one will generate four-option keys (I still use these). I may share variants later to generate 2-, 3-, and 5-option keys if you ask.
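For those who would rather not use the spreadsheet, a few lines of Python will do the same job (the key length and option labels below are just examples):

```python
import random

def random_key(n_items, options="ABCD", seed=None):
    """Generate an n_items-long answer key with each option chosen uniformly,
    avoiding any instructor bias toward particular letters."""
    rng = random.Random(seed)
    return [rng.choice(options) for _ in range(n_items)]

# A reproducible 20-item, four-option key (pass seed=None for a fresh one):
key = random_key(20, options="ABCD", seed=42)
print("".join(key))
```

Fixing the seed makes the key reproducible, which is handy if you need to regenerate the same key later; generating 2-, 3-, or 5-option keys just means changing the options string.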

The really big issue with selected response items is whether or not they can be used to test critical thinking. I say yes! (up to a point) and that will be the subject of tomorrow’s post.


Selected Response

Selected Response Assessments


Over the next few weeks, I plan to spend some time covering the basics of the different sorts of assessment tools that are available for our courses. Each of these techniques has its own unique strengths and weaknesses. I do not view any one of them as the “ideal” method for assessment. They are, after all, just tools that we use in trying to measure student learning, instructional effectiveness, and programmatic efficacy. Like any other tool, they can be handled skillfully or they can be poorly wielded. The fact that some amateur carpenters might injure themselves while using a nail gun does not make the nailer a poor tool. A poorly conceived assessment strategy is similarly unproductive (though typically less physically painful). A master carpenter will use a variety of tools (each carefully chosen and carefully applied) to complete a task. Likewise, when we intelligently select from our assessment toolbox and apply those tools in our courses, we can greatly enhance our students’ learning. This week, I would like to begin with a much-maligned assessment tool: the selected response (multiple-choice) exam.


Selected-response assessments are composed of a series of questions or statements (items) that the students must answer. The question or statement for each item is usually called the stem. The students pick from a variety of potential answers – the correct response and one or more incorrect choices (usually called the distractors). The number of distractors may vary. However, most instructors use between one and four.


Selected-response assessments have several important advantages (which is why this type of assessment is still so commonplace). The following is a brief list of some of the chief advantages of this sort of assessment:

  • They are objectively scored (either correct or incorrect)
  • They are easy to automate (Scantron, clickers, and online exams can be used)
  • They are quick to respond to (more questions can be asked on an exam)
  • They are easy to analyze (there are a variety of statistical tests that can be applied to MC questions)
  • They are relatively easy to construct and modify over time


Selected response assessments have gotten quite a bit of bad press over the past couple of decades. Much of this criticism was well deserved. However, the real weaknesses that have been pointed out reflect poor assessment implementation more than a fundamental weakness in the assessment format. Some weaknesses that are often attributed to selected-response assessments include:

  • Multiple-choice questions do not test higher levels of cognition (applying, analyzing, evaluating, and creating)
  • Multiple-choice questions favor students that are test-wise
  • Multiple-choice questions reward guessing rather than knowing
  • Multiple-choice questions are not authentic assessments (they do not reflect real-world situations)
  • Multiple-choice questions give very limited feedback (formative assessment) to the students

Over the next few days, I plan to share with you some of my thoughts and experiences with this sort of assessment. I think that I have some very practical and useful suggestions to leverage this sort of assignment in many different (but not all) courses. Tomorrow, I will discuss how to construct MCQ for maximum effect. See you then.


Poll 1

How do you assess student learning in your courses?

I’m sort of curious; what types of assignments do y’all use in your courses to assess student learning? I plan to discuss some of the advantages and disadvantages of different approaches over the next few weeks. Why do you use the assignment types that you have chosen? Take a moment to vote on this poll (you can vote for more than one type of assignment if you like) and please feel free to post comments.



What level(s) are you assessing at?


Appropriate leveling is an important consideration when developing learning outcomes and assessment strategies for our courses. I think that we probably all agree that a senior-level capstone biology course ought to require a different level of cognitive effort than a freshman introductory biology course. The tricky part is moving from that nice safe generic statement to a more specific and actionable one. What cognitive abilities are appropriate at different course levels? How can our students demonstrate these abilities and how can we best measure them? What do we mean by cognitive levels anyways?

For over five decades, Bloom’s taxonomy of the cognitive learning domain has been used to define different levels of critical thinking. I personally prefer the recent (relatively speaking) modification of this system described in A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom’s Taxonomy of Educational Objectives by Anderson and Krathwohl. The revised taxonomy is actually laid out in a two-dimensional matrix that is defined by a knowledge dimension (what is known?) and a cognitive dimension (at what level of abstraction is it known?). They also provide a variety of specific examples to aid in proper categorization of items within our courses. One useful interactive website that illustrates this new taxonomy can be found at the Center for Excellence in Learning and Teaching at Iowa State University.

Well, let’s suppose that we can agree that different courses ought to assess at different levels, and that we can (for the moment at least) agree to use Anderson and Krathwohl’s taxonomy to define educational objectives. How much should courses at different levels emphasize the various levels of cognitive ability? I will throw my thoughts out and see if anyone is listening… Let’s keep things a little simpler and reduce the six cognitive levels to three (low, medium, and high, as shown in the graphic for this post). Here is one suggestion for balance at different course levels:

  • 100-level courses – 60% low, 30% medium, 10% high
  • 200-level courses – 40% low, 40% medium, 20% high
  • 300-level courses – 25% low, 45% medium, 30% high
  • 400-level courses – 10% low, 50% medium, 40% high
  • Graduate course – 0% low, 50% medium, 50% high

Admittedly, this is a somewhat arbitrary system, but here is my rationale. Lower division courses are primarily concerned with introducing a body of knowledge, and this will be most directly assessed at the lower end of the cognitive spectrum. The mid-level courses (200 and 300) tend to build upon this knowledge base and require more problem solving and critical analysis. These are best assessed in the middle of the cognitive spectrum. The upper division courses include capstone experiences and ought to focus primarily at the upper end of the cognitive pyramid. What do you think of this schema?
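If you adopt targets like these, checking an exam blueprint against them is straightforward. Here is a rough sketch; the ten-item exam below is invented purely for illustration:

```python
# Target (low, medium, high) percentage mixes by course level, from the list above
TARGETS = {100: (60, 30, 10), 200: (40, 40, 20), 400: (10, 50, 40)}

def cognitive_mix(item_levels):
    """Return the (low, medium, high) percentage mix of an exam's items."""
    n = len(item_levels)
    return tuple(100 * item_levels.count(lvl) / n
                 for lvl in ("low", "medium", "high"))

# A hypothetical 10-item exam for a 200-level course:
exam = ["low"] * 4 + ["medium"] * 4 + ["high"] * 2
mix = cognitive_mix(exam)
print(mix)                      # (40.0, 40.0, 20.0)
print(mix == TARGETS[200])      # True -- blueprint matches the 200-level target
```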

This is all well and good. However, the devil is in the details. I plan to spend the next several weeks talking about some of the specifics: types of assignments, types of assessments, strengths and weaknesses, etc. After all – this blog is supposed to be about assessment in action. See you then.


Pretest / Posttest Analysis

Pre-test / Post-test Evaluation of Learning

One common strategy to measure (assess) student learning in a course is to administer a pre-test / post-test assignment. At or near the beginning of instruction, a pretest is given to the class to determine the class’ preexisting knowledge of the content area. Later on – at or near the end of instruction – the same assessment is given in an attempt to demonstrate measurable gains in student knowledge. Some useful suggestions concerning the implementation of such an assessment strategy can be found in this guide from the International Training and Education Center for Health.

Although this assessment strategy is popular, if you read very much literature concerning pre- and post-test analysis you will discover that it is a bit controversial. Its biggest limitation seems to be that the validity of any inferences you can make is rather low. Because there is rarely a control group (the whole class generally takes the pre- and post-tests), there is no reliable comparison to be made. If a statistically significant difference in scores is detected, we cannot be certain that the instruction provided in class actually caused that difference. Another weakness is that the students do not always try their best on the assignments (their score may not truly reflect the state of their understanding).

I use pre-tests and post-tests in my courses (as one means of assessing student learning). The validity problem cannot be helped. Since this is only one measure of student learning, however, I’m not greatly bothered by that. In practice, the gains seen in my pre-test / post-test scores correlate very well (r >0.7) with exam scores. I try to mitigate the student effort complication by awarding bonus points. My students can earn five bonus points each for simply taking the pre-test and post-test. This encourages participation. They may earn an additional five points each for scoring at or above 70% on the assignments. This encourages them to actually try their best.

Analyzing and summarizing the results of these tests can be time-consuming. To simplify the process, I have created a spreadsheet that automatically does it for me. You can find a copy here or on the Resources page. You simply enter some basic course information and copy/paste the matched pre- and post-test scores in. A succinct summary of the test is generated along with a nifty graph. Simply save that one-page report as a PDF and you have documentation of student learning that can be entered into TracDat. Give it a try and let me know how you like it. Cheers until next time.
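If you prefer a script to the spreadsheet, the core of the summary might look something like this. The normalized-gain formula is Hake’s average normalized gain, and the matched scores below are made-up placeholders:

```python
from statistics import mean

def normalized_gain(pre, post, max_score=100):
    """Average normalized gain <g> = mean of (post - pre) / (max - pre)."""
    gains = [(q - p) / (max_score - p) for p, q in zip(pre, post)]
    return mean(gains)

pre  = [30, 45, 50, 40]   # matched pre-test scores (percent)
post = [65, 80, 75, 70]   # matched post-test scores (percent)
print(f"mean pre  = {mean(pre):.1f}")   # 41.2
print(f"mean post = {mean(post):.1f}")  # 72.5
print(f"<g>       = {normalized_gain(pre, post):.2f}")  # 0.53
```

The same matched lists feed directly into a paired t-test or a column graph if you want the full one-page report.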


Mapping Outcomes

Mapping Course Outcomes

Yesterday, I wrote a bit about mapping course assessments to specific outcomes. Now I would like to show you a couple of ways that I try to do this in my own classes. I’ll tell you right up front that I do not think there is any one best way of doing this. Furthermore, these examples are not necessarily what I would consider best practices – I just want to give you some concrete examples to consider.

The first way that I map assessments to outcomes is by explicitly making the connections in my course syllabus. Below is one of my course outcomes for a content area in a Medical Microbiology course that I teach.

A. Microbial Diversity – Give examples of and compare and contrast different types of microbes (including viruses, bacteria, fungi, and protozoa) as well as identify various structures and define their functions.

Assessed via the homework (1-6), laboratory quizzes (1-4), lecture exam questions (especially exam 1), the laboratory practical, and the comprehensive final exam.

I also give feedback to the students concerning their performance on these graded assignments and break down their scores by learning outcome. So, for instance, following a lecture exam they would be able to determine their current state of understanding with regard to what I call learning outcome A (microbial diversity).
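Mechanically, all that a per-outcome score breakdown requires is a map from each question to its outcome. Here is a minimal sketch with invented data (the question numbers and outcome labels are placeholders):

```python
from collections import defaultdict

def score_by_outcome(question_outcomes, correct):
    """Return per-outcome [earned, possible] given each question's outcome
    label and whether the student answered that question correctly."""
    totals = defaultdict(lambda: [0, 0])
    for q, outcome in question_outcomes.items():
        totals[outcome][1] += 1
        totals[outcome][0] += int(correct[q])
    return dict(totals)

# A hypothetical 5-question exam mapped to outcomes A and B:
q_map   = {1: "A", 2: "A", 3: "B", 4: "A", 5: "B"}
answers = {1: True, 2: False, 3: True, 4: True, 5: True}
print(score_by_outcome(q_map, answers))  # {'A': [2, 3], 'B': [2, 2]}
```

A student reading this report would see, for instance, that they earned 2 of 3 points on outcome A (microbial diversity) on that exam.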

A second – more intuitive – way I try to communicate the relationship between assessment and outcome is through a graphical syllabus. I got this idea from reading The Graphic Syllabus and the Outcomes Map by Linda Nilson. This is a way of showing a great deal of information about the structure of a class in the form of a picture. I have linked here to an old version of a graphic syllabus from one of my courses. As I said before, these are just some examples of how I have tried to explicitly tie my assessments to the course outcomes. What about you? What have you tried in your courses? I am always eager to learn from others; please feel free to comment. I would like to build some community here.


Assessment and the syllabus

Assessment and the Syllabus

Hello again! I have taken a brief hiatus to enjoy the last glorious bits of summer. Now I, like you, must refocus my energies upon my upcoming classes.

As the new semester rapidly approaches, many of us will be preparing and updating our syllabi. This is an excellent time to consider describing your course assessments as teaching and learning tools. I have recently reread an excellent book by Kathleen Gabriel – Teaching Unprepared Students: Strategies for Promoting Success and Retention in Higher Education. Don’t let that title put you off; I am not trying to denigrate the quality of our students. I think that this book has many excellent suggestions concerning ways to clearly organize course materials and communicate our expectations to students of all abilities. The third chapter deals with the first week of class and setting the tone for the upcoming term. A syllabus with clear and measurable outcomes and explicit descriptions of the assessment rationale plays an essential role in this process. Having reviewed many syllabi, I can say that most of our courses have pretty good outcomes. However, the course assessments (be they tests, quizzes, or assignments) are rarely mapped directly back to those outcomes. One practice that I would like to encourage this year is to begin explicitly linking at least some of our assigned coursework to the corresponding outcomes.

Course assessments can play three complementary roles in our classes: formative feedback to the students, intermediate evaluation of current student skills, and summative evaluation of course goals. My mission is not to add more work to your load to accomplish these ends. Rather, I want to help you hit the assessment trifecta wherein one task simultaneously plays all three roles (at the track, you are on your own). I hope that as we share our assessment successes, near misses, and insights this year, we will all benefit – not by adding more to our already full teaching loads! Instead, I want to promote a cycle of instructional refinements that will lead to more efficient teaching, improved student learning, and rational program development. Let the adventure begin…
