Remote assessment programs face a fundamental tension: the systems built to catch fraud can also harm honest test takers. This session shows how to resolve that tension through two practical work streams: structured experimentation and observation-based proctoring. You'll learn how to run controlled experiments in live proctoring conditions to build evidence-based policy, measure inter-rater agreement at scale, and test review workflows without exposing operational test takers to unvalidated experiences. You'll also see how shifting from holistic, judgment-based review to structured behavioral observations improves consistency, reduces bias, and generates the labeled data needed for AI-assisted detection. Whether you're managing a large reviewer cohort or rethinking your fraud detection approach, you'll leave with a model for making remote assessment more consistent, more equitable, and better prepared for AI.
Brooke Westerlund, Duolingo English Test
William Belzak, Duolingo