AI grading error affects roughly 1,400 MCAS essays in Massachusetts: Here’s what went wrong
Massachusetts’ adoption of artificial intelligence to score statewide standardized tests has exposed technical vulnerabilities: a scoring error affected approximately 1,400 student essays, NBC Boston reports. The error, discovered over the summer, prompted the Massachusetts Department of Elementary and Secondary Education (DESE) to rescore the affected essays and notify the relevant school districts.
Teacher scrutiny uncovers the problem
The issue surfaced when preliminary results for the Massachusetts Comprehensive Assessment System (MCAS) were distributed to districts. In one notable instance, a third-grade teacher at Reilly Elementary School in Lowell identified anomalies while reviewing her students’ essays. The teacher noticed that some scores did not align with the quality of work submitted and raised the concern with the school principal. District leaders subsequently alerted DESE, prompting a review of the scoring process.
How the AI scoring system works
DESE and the testing contractor, Cognia, confirmed that the errors stemmed from a “temporary technical issue” in the AI scoring system, NBC Boston reports. The AI is calibrated against human-scored exemplar essays, and approximately 10% of AI-scored essays undergo a secondary human review to verify accuracy. Despite these safeguards, certain essays were scored incorrectly, with some losing points for minor discrepancies such as omitting quotation marks when referencing the reading passage.
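Neither DESE nor Cognia has published implementation details beyond this outline, but a minimal sketch of such a sample-and-verify workflow might look like the following. Everything here is assumed for illustration: the function names, modeling the 10% sample as a random draw, and the zero-tolerance discrepancy threshold.

```python
import random

def score_with_review(essays, ai_score, human_score, sample_rate=0.10, tolerance=0):
    """Illustrative sketch only (not DESE's or Cognia's actual system).

    Scores each essay with an AI model calibrated on human-scored exemplars,
    routes a random ~10% sample to a human reader, and flags any essay where
    the two scores disagree by more than `tolerance`.
    """
    flagged = []
    for essay in essays:
        machine = ai_score(essay)          # automated score
        if random.random() < sample_rate:  # ~10% secondary human review
            human = human_score(essay)
            if abs(machine - human) > tolerance:
                # Discrepancy found: escalate for rescoring
                flagged.append((essay, machine, human))
    return flagged
```

In a workflow like this, flagged essays would be rescored and, as in the MCAS case, discrepancies reported during the preliminary-results window would trigger a broader review.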
Human review and corrective action
Wendy Crocker-Roberge, assistant superintendent of the Lowell school district, said that while she personally reviewed around 1,000 essays, the precise cause of each scoring discrepancy was difficult to isolate, according to NBC Boston. However, it was evident that the AI system was deducting points without justification. DESE subsequently rescored all affected essays and corrected the data, ensuring that districts received accurate results.

In total, 145 districts received notifications that at least one student essay had been affected. DESE emphasized that the errors represent a small fraction of the roughly 750,000 MCAS essays scored statewide. Preliminary results are designed to allow districts to report discrepancies, a safeguard that facilitated the detection and correction of these errors.
The value and limits of AI in education
Mary Tamer, executive director of MassPotential, highlighted the broader value of AI in standardized testing. According to NBC Boston, she acknowledged that faster scoring can assist educators in identifying students who require additional support and inform instructional planning, while cautioning that human oversight remains essential to maintain accuracy.
A cautionary note for districts
Crocker-Roberge urged other districts to scrutinize AI-scored essays as the final MCAS results are released to parents in the coming weeks. She underscored the importance of careful implementation when introducing new technologies, noting that “artificial intelligence is just a really new learning curve for everyone, so proceed with caution,” as reported by NBC Boston.

The incident illustrates both the potential and the limitations of integrating artificial intelligence into educational assessment, emphasizing the continued need for rigorous oversight to safeguard accuracy and fairness in student evaluation.
