Name
Predicting and Evaluating Item Responses Using Machine Learning, Text Embeddings, and LLMs
Speakers
Description
This study explores how artificial intelligence can support assessment development by simulating student responses to field-test items on a social-emotional learning assessment. We compare three methods for generating synthetic response data: (1) machine learning models trained on actual student data, (2) machine learning models enhanced with semantic text embeddings, and (3) large language models (LLMs) prompted with student profiles. Using Devereux Student Strengths Assessment (DESSA) responses from 3,982 high school students, we evaluate each method’s predictive accuracy and its ability to replicate item parameters estimated from the real data. Preliminary results suggest that machine learning models produce the most accurate and reliable synthetic responses, with LLMs showing promise but requiring refinement. This work highlights the potential for AI to reduce the burden of field testing while supporting construct validity, and it offers insight into the trade-offs between structured and generative modeling approaches in educational assessment. (An illustrative code sketch of the embedding-based approach appears below, after the session details.)
Session Type
Presentation
Session Area
Education
Primary Topic
Test Development and Psychometrics
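
The abstract does not specify the modeling pipeline, so the following is a minimal Python sketch of the second approach only: a supervised model predicting a student's item response from student-level features concatenated with a semantic embedding of the item. Everything here is an assumption for illustration — the data are simulated, the sample sizes and feature dimensions are placeholders, the gradient-boosted classifier stands in for whatever models the study used, and the random vectors stand in for real item-text embeddings (which in practice might come from a sentence encoder such as sentence-transformers). The final item-mean correlation is a deliberately simplified proxy for the IRT item-parameter comparison the study actually performs.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical sizes, not the study's actual sample (3,982 students).
n_students, n_items = 500, 40
emb_dim, n_feats = 32, 8

# Stand-in item embeddings; a real pipeline would encode the item stems
# with a sentence encoder instead of drawing random vectors.
item_emb = rng.normal(size=(n_items, emb_dim))
student_feat = rng.normal(size=(n_students, n_feats))

# Simulate ordinal responses on a 0-4 scale (DESSA items use a five-point
# frequency scale) with a simple latent-trait rule so the model has signal.
theta = student_feat[:, 0] + 0.5 * rng.normal(size=n_students)
difficulty = item_emb[:, 0]
latent = theta[:, None] - difficulty[None, :] + rng.normal(size=(n_students, n_items))
responses = np.digitize(latent, [-1.5, -0.5, 0.5, 1.5])  # categories 0..4

# One training row per (student, item) pair: student features || item embedding.
pairs = [(s, i) for s in range(n_students) for i in range(n_items)]
X = np.array([np.concatenate([student_feat[s], item_emb[i]]) for s, i in pairs])
y = np.array([responses[s, i] for s, i in pairs])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = HistGradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print("held-out response accuracy:", round(accuracy_score(y_te, model.predict(X_te)), 3))

# Crude parameter-recovery check: generate a full synthetic response matrix
# and correlate per-item mean scores with the "real" ones. The study instead
# compares item parameters from an IRT calibration of each data set.
synthetic = model.predict(X).reshape(n_students, n_items)
r = np.corrcoef(responses.mean(axis=0), synthetic.mean(axis=0))[0, 1]
print("item-mean recovery r:", round(r, 3))
```

A fuller evaluation in the spirit of the abstract would calibrate an IRT model separately on the real and synthetic response matrices and correlate the recovered item parameters, and would score predictions on held-out students or items rather than on the training pairs.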