Constructing Useful Selected Response Items
Today, I would like to briefly introduce some practices that I use to improve item quality on my multiple-choice assessments. Regardless of the format we use, we must remain mindful of reliability and validity whenever we try to measure student learning. The reliability of an assessment is a measure of scoring reproducibility. If we could repeatedly administer a highly reliable assessment to a class, the students would score at or about the same level each time. Well-constructed selected response assessments typically have high reliability. I will discuss how this is calculated and interpreted in a future post. The validity of an assessment is the degree to which the assessment actually measures what we set out to measure. This is where things could potentially get sticky for multiple-choice assessments.
One of the main threats to validity for multiple-choice questions (MCQ) is “test-wiseness”. You do not have to look very long on the internet to locate suggested strategies for taking multiple-choice exams. Here is a typical one. If we are to maintain validity when using MCQ, we must defeat (or at least mitigate) these strategies. This does not mean that we must write trick questions. Instead, we need to construct our assessment items carefully so that they do not provide unintentional clues that allow students to answer correctly without knowing the material (that is, by guessing).
Most of what I’ve learned about multiple-choice items has come from reading Developing and Validating Multiple-Choice Test Items by Thomas Haladyna. Not exactly a summertime page-turner, but a great resource! Here are some suggestions that I’ve gleaned from it and other resources:
- Each item should address one of your course outcomes.
if it is not a part of your outcomes, why are you asking it?
- Stems should be direct questions or clearly phrased statements.
for example, “War and Peace was written by _____.” or “Which of the following authors wrote The Great Gatsby?”
- Avoid grammatical clues in the responses.
such as articles (a/an) or singular/plural agreement that matches only one of the choices
- Be careful about how the choices are arranged.
I list them from shortest to longest in length, alphabetically when they are similar in length, and from smallest to largest when numerical.
- Use a randomized key to avoid creating patterns in the responses
this prevents unintentional biasing toward any one response – like C.
- Make sure that all of the possible answers are at least somewhat plausible to an uninformed reader. Unbelievable choices serve no function from an assessment point of view.
- Do not use “all of the above”
- Use “none of the above” sparingly
if you do use this, make sure that it is the correct answer sometimes
- If negatives or conditions are used, they should be highlighted
for instance: Which of the following is NOT a prokaryote?
Haladyna suggests that the most efficient multiple-choice exams have three possible answers per item. Most textbooks use items with four or five options. He claims (and my own experience bears this out) that most questions have only three functioning responses (those that the class chooses with any regularity). Items with just three choices tend to freak people out (I was initially pretty skeptical too). What about guessing? Let’s do a little math… If I give a 20-item quiz with three options each, what is the chance that a student could score 70% (14 correct) or better by just blindly guessing? This can be calculated as p = 1 – binomialcdf(20, 1/3, 13) = 8.788 × 10⁻⁴, or about 0.088%. Not too likely – and the odds are even longer when there are more items. I’m not worried.
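If you would rather check that guessing calculation without a graphing calculator, here is a minimal Python sketch. The function name prob_at_least and its parameters are my own choices for illustration; it simply sums the binomial probabilities for 14 through 20 correct answers, which is the same quantity as 1 minus the cumulative probability of 13 or fewer.

```python
from math import comb


def prob_at_least(n_items, n_options, min_correct):
    """Chance of getting at least min_correct items right by blind guessing.

    Assumes every item has n_options equally likely choices and that
    guesses on different items are independent.
    """
    p = 1 / n_options  # chance of guessing a single item correctly
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


# 20 items, three options each, 14 or more correct (70%)
print(f"{prob_at_least(20, 3, 14):.3e}")  # prints 8.788e-04
```

Changing the arguments lets you try other quiz lengths or option counts, for example prob_at_least(40, 3, 28) for a 40-item quiz with the same 70% cutoff.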
One thing that I did fret about was adequately randomizing my answer key. Like many people, I tend to favor B or C over A and D (edge aversion, I suppose). To overcome this, I have created a spreadsheet that will create random keys of any length up to 1,000 questions. This one will generate four-option keys (I still use these). I may share variants later to generate 2-, 3-, and 5-option keys if you ask.
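For readers who prefer code to a spreadsheet, the same idea can be sketched in a few lines of Python. This is not the spreadsheet itself, just a rough equivalent under my own assumptions: the function name random_key, the options string, and the optional seed are all hypothetical choices for illustration.

```python
import random
from collections import Counter


def random_key(n_items, options="ABCD", seed=None):
    """Return a list of n_items answer letters drawn uniformly from options.

    Passing a seed makes the key reproducible; leaving it as None
    gives a different key on each run.
    """
    rng = random.Random(seed)
    return [rng.choice(options) for _ in range(n_items)]


# A 20-item, four-option key, plus a tally of how often each letter appears
key = random_key(20)
print("".join(key))
print(Counter(key))
```

Passing options="ABC" gives a three-option key. On a short quiz a truly random key can still look lopsided, so the tally is a quick way to see how the letters fell before you commit to it.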
The really big issue with selected response items is whether or not they can be used to test critical thinking. I say yes (up to a point), and that will be the subject of tomorrow’s post.