historical-thinking-assessment-designer
Historical Thinking Assessment Designer
What This Skill Does
Designs formative assessment tasks that make students' historical thinking visible — revealing whether students are sourcing, close reading, contextualising, and corroborating, and at what level of sophistication. The output includes assessment tasks with documents and prompts, a scoring guide distinguishing novice/developing/proficient responses, diagnostic guidance on how to interpret results, and validity notes identifying what the assessment does and does not measure.
The fundamental problem this skill addresses is that most history assessments test factual recall rather than historical thinking. A test that asks "When was the Battle of Gettysburg?" measures whether students memorised a date. A test that asks "Does this painting help us understand what happened at the first Thanksgiving in 1621? Explain your reasoning" — and provides a painting created in 1932 — measures whether students consider the temporal relationship between a source and the event it depicts. The first tells you what students remember; the second tells you how students think.
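The four output components listed above can be held in one record per task. What follows is a minimal Python sketch, assuming a dataclass layout. Every class and field name (`SourceDocument`, `AssessmentTask`, `temporal_gap_years`, and so on) is hypothetical, and the guide wording is illustrative rather than taken from the HATs. The example encodes the Thanksgiving painting task and computes the source-to-event gap that a sourcing-aware reader should notice.

```python
from dataclasses import dataclass, field

# Hypothetical schema: names and fields are illustrative, not from any published tool.


@dataclass
class SourceDocument:
    """A document shown to students, with the details they need to source it."""
    description: str
    kind: str           # e.g. "painting", "testimony transcript", "playbill"
    created_year: int   # when the document itself was made
    depicts_year: int   # when the event it depicts took place

    @property
    def temporal_gap_years(self) -> int:
        """Years between the event and the document's creation."""
        return self.created_year - self.depicts_year


@dataclass
class AssessmentTask:
    """One task plus the scoring, diagnostic, and validity material that goes with it."""
    document: SourceDocument
    prompt: str
    target_skill: str                    # sourcing, close reading, contextualisation, or corroboration
    scoring_guide: dict[str, str]        # level -> what a typical response looks like
    diagnostic_guidance: dict[str, str]  # level -> suggested next instructional step
    validity_notes: list[str] = field(default_factory=list)


thanksgiving = AssessmentTask(
    document=SourceDocument(
        description="Painting of the first Thanksgiving",
        kind="painting",
        created_year=1932,
        depicts_year=1621,
    ),
    prompt=("Does this painting help us understand what happened at the first "
            "Thanksgiving in 1621? Explain your reasoning."),
    target_skill="sourcing",
    scoring_guide={  # illustrative wording only
        "novice": "Describes the scene as if it were a record of 1621.",
        "developing": "Notes the 1932 date but does not say why it matters.",
        "proficient": "Explains that a painting made about 300 years later is not an eyewitness record.",
    },
    diagnostic_guidance={  # illustrative wording only
        "novice": "Model checking when and by whom a source was made before reading it.",
        "developing": "Ask what the artist could and could not have known in 1932.",
        "proficient": "Move on to contextualising the painting within its own period.",
    },
    validity_notes=[
        "Measures attention to the source's date; does not measure knowledge of events in 1621.",
    ],
)

print(thanksgiving.document.temporal_gap_years)  # 311
```

Keeping the creation date and the depicted date as separate fields makes the sourcing target explicit: the task only works if students are given both.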
Wineburg, Smith, and Breakstone (2018) documented this gap starkly. Using History Assessments of Thinking (HATs), brief constructed-response tasks targeting sourcing, contextualisation, and claim-evidence reasoning, they found that 94% of introductory college students ignored the roughly 300-year gap between a 1621 event and a 1932 painting of it. Upper-level history majors fared little better: 78% earned zero on a task requiring them to use a Senate hearing's existence (not its content) as evidence of public opposition to war. Students consistently read for content rather than context, engaging with what documents said but not with the circumstances of their creation.
These findings confirm that historical thinking skills must be explicitly assessed, not assumed. And the HATs model shows how: brief tasks centred on a document, with questions that can only be answered well by deploying specific historical thinking skills. This skill generates assessments following that model.
Evidence Foundation
Wineburg, Smith, and Breakstone (2018) developed History Assessments of Thinking (HATs) with Library of Congress support. Three HATs were administered to 78 introductory and 49 upper-level college students. Each HAT presented a document (a painting, testimony transcripts, a playbill) with full bibliographic information and asked questions that required sourcing, contextualisation, or claim-evidence reasoning to answer well. The scoring rubric used three levels: Basic (0), no evidence of the target skill; Emergent (1), some evidence but incomplete or superficial; Proficient (2), clear demonstration of the skill. The results were stark: introductory students averaged less than 0.5 out of 12, and the highest score was 3 out of 12. The HATs model demonstrates that historical thinking can be assessed briefly, authentically, and diagnostically: each task reveals whether a specific skill is present or absent.
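The rubric arithmetic can be sketched in a few lines. This is a minimal Python example, assuming each scored response receives a single 0-2 level. The 12-point maximum reported above implies six scored responses per student, but that split is an inference rather than a detail stated here. `HatLevel`, `total_score`, and `cohort_mean` are hypothetical names, and the cohort is invented.

```python
from enum import IntEnum


class HatLevel(IntEnum):
    """The three rubric levels described above, with their point values."""
    BASIC = 0       # no evidence of the target skill
    EMERGENT = 1    # some evidence, but incomplete or superficial
    PROFICIENT = 2  # clear demonstration of the skill


def total_score(levels: list[HatLevel]) -> tuple[int, int]:
    """Return (points earned, points possible) for one student's scored responses."""
    earned = sum(int(level) for level in levels)
    possible = int(HatLevel.PROFICIENT) * len(levels)
    return earned, possible


def cohort_mean(cohort: list[list[HatLevel]]) -> float:
    """Mean points earned across students (each inner list is one student)."""
    return sum(total_score(student)[0] for student in cohort) / len(cohort)


# Invented cohort, assuming six scored responses per student (12 points possible).
B, E = HatLevel.BASIC, HatLevel.EMERGENT
cohort = [
    [B, B, B, B, B, B],  # 0 / 12
    [E, B, B, B, B, B],  # 1 / 12
    [E, E, E, B, B, B],  # 3 / 12
]
print(total_score(cohort[2]))         # (3, 12)
print(round(cohort_mean(cohort), 2))  # 1.33
```

The per-response levels are kept rather than only the total, so the same data can later be broken down by skill.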
Reisman (2012) used a 30-item Historical Thinking Test (22 multiple choice + 8 constructed response, α = .79) and a 20-item Transfer test as outcome measures. The Historical Thinking Test assessed application of sourcing, close reading, contextualisation, and corroboration strategies. Treatment students showed significant gains on this measure, demonstrating that historical thinking skills are both teachable and assessable. Reisman's finding that effects were concentrated in sourcing and close reading (not contextualisation or corroboration) provides diagnostic guidance: if an assessment shows students performing well on sourcing but poorly on contextualisation, that pattern is consistent with the research and suggests that the skills develop at different rates and may call for different instructional approaches.
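Reading results skill by skill, rather than as a single total, is what turns a pattern like Reisman's into usable diagnosis. The sketch below is a minimal illustration under stated assumptions: it averages 0-2 rubric scores for each of the four skills, expresses each average as a proportion of the maximum, and flags skills below a cut-off. The 0.5 threshold, the function names `skill_profile` and `flag_gaps`, and the class data are all hypothetical. The data were chosen to mimic a strong-sourcing, weak-contextualisation profile and are not results from the study.

```python
from collections import defaultdict
from statistics import mean


def skill_profile(responses: list[tuple[str, int]], max_score: int = 2) -> dict[str, float]:
    """Mean rubric score per skill, as a proportion of max_score (hypothetical helper)."""
    by_skill: dict[str, list[int]] = defaultdict(list)
    for skill, score in responses:
        by_skill[skill].append(score)
    return {skill: mean(scores) / max_score for skill, scores in by_skill.items()}


def flag_gaps(profile: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Skills whose proportion falls below threshold (an assumed cut-off), weakest first."""
    return sorted((s for s, p in profile.items() if p < threshold), key=profile.get)


# Invented class results: 0-2 rubric scores for each scored response, grouped by skill.
scores = {
    "sourcing": [2, 1, 2, 2],
    "close reading": [1, 2, 1, 2],
    "contextualisation": [0, 1, 0, 0],
    "corroboration": [1, 0, 1, 0],
}
responses = [(skill, s) for skill, values in scores.items() for s in values]

profile = skill_profile(responses)
print(profile)
print(flag_gaps(profile))  # ['contextualisation', 'corroboration']
```

Ordering the flagged skills from weakest upward gives a teacher a rough priority list, though the threshold itself is a judgement call that should be set locally.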
Wiliam (2011) argued that formative assessment's value lies in making student thinking visible so that teachers can adjust instruction. The most useful formative assessment tasks are those that distinguish between students who have developed a skill and students who have not: tasks where the wrong answer is as informative as the right answer because it reveals a specific misconception or skill gap. For historical thinking, this means assessment tasks where a student who sources will answer differently from a student who doesn't, and where the specific way a student fails to source reveals what they need to learn next.
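Whether a task actually separates students who have a skill from those who do not can be checked after it has been administered. One conventional option is the upper-lower discrimination index, which comes from classical test theory rather than from Wiliam. Students are ranked by total score, the task's mean score is compared between the top and bottom groups (27% each by convention), and the difference is scaled by the task's maximum. Values near 1 mean the task separates the groups well, and values near 0 mean it does not. The function name and data below are hypothetical, and total score is only a proxy for "has developed the skill".

```python
def discrimination_index(item_scores: list[int], total_scores: list[int],
                         max_item_score: int = 2, fraction: float = 0.27) -> float:
    """Upper-lower discrimination index for one task (hypothetical helper).

    Students are ranked by total score; the task's mean score in the top
    group minus its mean in the bottom group is divided by max_item_score,
    giving a value between -1 and 1.
    """
    if len(item_scores) != len(total_scores):
        raise ValueError("need one item score and one total score per student")
    # Size of each comparison group (at least one student).
    n = max(1, round(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower = [item_scores[i] for i in order[:n]]
    upper = [item_scores[i] for i in order[-n:]]
    return (sum(upper) / n - sum(lower) / n) / max_item_score


# Invented class of ten students with distinct total scores.
totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
item_a = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]  # task scores rise with overall performance
item_b = [1] * 10                        # everyone gets the same partial credit

print(discrimination_index(item_a, totals))  # 1.0
print(discrimination_index(item_b, totals))  # 0.0
```

A task with an index near zero is a candidate for revision. A correct answer on such a task tells the teacher little about who can actually source. The index says nothing about *which* misconception a wrong answer reveals, so it complements rather than replaces reading the responses themselves.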