Academic Papers

Large Language Models for Automated Grading and Synthetic Data Generation in Communication-Based Training Assessment


Effective communication is critical in high-stakes tasks, particularly in scenarios requiring precision and coordination under time pressure. Here, we explore the potential of large language models (LLMs) to evaluate communication performance and generate synthetic conversation data for training and assessment purposes. 

We present a proof-of-concept study focused on a highly structured task: the interaction between a forward observer and a fire direction center during a call-for-fire mission. Using a rubric-based approach, the LLM graded transcripts of forward observer communications, distinguishing between varying levels of trainee performance with high reliability and alignment to expected outcomes. Additionally, we demonstrate the utility of LLMs in generating synthetic transcripts that simulate varying performance levels. While this study is centered on the call for fire, the approach has broader implications for training assessment in complex, communication-intensive tasks. Our results suggest that LLMs can serve as effective tools for both grading and data generation, enabling scalable solutions for improving performance in high-stakes domains.
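A rubric-based grading pipeline of the kind described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the rubric items, function names, and scoring scale are all invented for the example, and the LLM call itself is left abstract (the sketch only shows how a grading prompt might be assembled and how per-criterion scores could be aggregated into a normalized grade).

```python
# Hypothetical rubric for a call-for-fire transcript (items invented for illustration).
RUBRIC = [
    ("warning_order", "Did the observer issue a correct warning order?"),
    ("target_location", "Was the target location transmitted accurately?"),
    ("readback", "Did the observer read back the fire direction center's message?"),
]

def build_grading_prompt(transcript: str) -> str:
    """Assemble a prompt asking an LLM to score each rubric item from 0 to 2."""
    lines = ["Score each criterion from 0 (absent) to 2 (fully correct)."]
    for key, question in RUBRIC:
        lines.append(f"- {key}: {question}")
    lines.append("Transcript:")
    lines.append(transcript)
    return "\n".join(lines)

def aggregate(scores: dict) -> float:
    """Convert per-criterion scores into a normalized grade in [0, 1]."""
    max_total = 2 * len(RUBRIC)
    return sum(scores.values()) / max_total

# Example per-criterion scores, as an LLM grader might return them (stubbed here).
example_scores = {"warning_order": 2, "target_location": 1, "readback": 2}
grade = aggregate(example_scores)  # 5/6
```

In practice the prompt produced by `build_grading_prompt` would be sent to an LLM along with instructions to return structured scores; the aggregation step then yields a grade comparable across trainees and performance levels.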

  • Year: 2025
  • Category: Artificial Intelligence
  • Tags: Communication Assessment, Automated Grading, Large Language Models, Performance Evaluation, Generative AI
  • Authors: Joseph P. Salisbury, David M. Huberdeau
  • Released: Proceedings of the Florida AI Research Society’s 38th International Conference (FLAIRS-38)

Featured Riverside Research Author(s)

Joseph P. Salisbury

Dr. Joseph Salisbury is a neuroscientist (Ph.D., Brandeis University, 2013) and software developer whose current research focuses on human-computer interaction, human-robot interaction, and applications of large language models.

LinkedIn: Joseph P. Salisbury

David M. Huberdeau

David M. Huberdeau is a research scientist at Riverside Research in the Artificial Intelligence and Machine Learning Lab. He uses concepts from cognitive science and human neuroscience along with AI and ML approaches to conduct research and development in human-machine teaming. His current focus is in optimization, autonomous planning, human behavioral modeling, and reinforcement learning. Previously he completed a post-doc at Yale University in the Department of Psychology and a PhD in biomedical engineering at The Johns Hopkins University. He has experience in human behavior analysis and modeling, physiological recording and processing, and software development project management.

LinkedIn: David M. Huberdeau
Disclaimer

The above-listed authors are current or former employees of Riverside Research. Authors affiliated with other institutions are listed on the full paper. It is the responsibility of the authors to list material disclosures in each paper, where applicable; they are not listed here. This academic papers directory is published in accordance with federal guidance to make academic research funded by the federal government publicly available.