Name
Grade Expectations & The QWK Fix: Scaling Human-Level AI Scoring from Formative to High-Stakes Exams
Description

This session explores how to break the "iron triangle" of assessment (quality, accessibility, and cost) to deliver personalized learning at scale. Historically, the high cost of custom-trained scoring models has pushed organizations to assess rote recall rather than critical thinking. Modern LLMs change this by enabling "expert-level" scoring from the first essay, cutting costs to as little as $0.01 for formative feedback. Research with the University of Oxford validates this approach, achieving a Quadratic Weighted Kappa (QWK) of 0.90, which indicates almost perfect agreement with expert human graders. Beyond top-line metrics, real-world implementation has reclaimed 5 hours of teacher time per week and increased course completion rates by 7%.
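
For readers unfamiliar with the metric, QWK compares two sets of ordinal scores and penalizes each disagreement by the square of its distance, so a model that is off by two points is penalized four times as heavily as one that is off by one. The following is a minimal sketch of the standard formula, not the scoring pipeline used in the research; the function name, its parameters, and the handling of the degenerate case are assumptions made for illustration.

```python
from collections import Counter


def quadratic_weighted_kappa(human, model, min_score, max_score):
    """Return the QWK between two equal-length lists of integer scores.

    Hypothetical helper for illustration only. Scores are assumed to be
    integers in the inclusive range [min_score, max_score].
    """
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    num_levels = max_score - min_score + 1
    if num_levels < 2:
        raise ValueError("the score scale needs at least two levels")

    n = len(human)
    observed = Counter(zip(human, model))  # joint counts of (human, model) pairs
    human_hist = Counter(human)            # marginal counts per rater
    model_hist = Counter(model)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            # Quadratic penalty, normalized so the largest gap weighs 1.
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            weighted_observed += weight * observed[(i, j)]
            # Count expected by chance if the two raters were independent.
            weighted_expected += weight * human_hist[i] * model_hist[j] / n

    if weighted_expected == 0:
        # Both raters gave one identical score to every essay; kappa is
        # formally undefined here, and returning 1.0 is an assumed convention.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1.0 means the two raters agree on every essay, 0 means agreement no better than chance, and negative values mean systematic disagreement.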

Session Type
Presentation
Session Area
Education, Certification/Licensure
Primary Topic
AI in Assessment