Technology in Speaking Assessment

After watching the recorded presentations, join these authors for a live panel discussion on December 4, 2020 at 9:30 am – 10:00 am (CST). Moderator: Agata Guskaroska


Hye-won Lee

Senior Research Manager
Cambridge Assessment English

Making it Happen: Assessing Speaking through Video-Conferencing Technology

Practical considerations such as 'administrative conditions' are especially important when new test formats are operationalised, for example, a speaking test delivered via video-conferencing technology. The literature on research-informed practical implementations of remote speaking tests is limited. This study aims to contribute to this research niche by reporting on the last phase of a research project on a high-stakes video-conferencing speaking test. The previous three phases (Nakatsuhara et al., 2016; Nakatsuhara et al., 2017; Berry et al., 2018) established that the in-room and remote delivery modes are essentially equivalent in terms of score comparability and elicited language functions, but they also identified some practical issues as potentially affecting the validity of test score interpretation and use.
The final phase was designed to extend the evidence gathered about examiner and test-taker perceptions of specific aspects of the test delivery and platform, such as the examiner script, sound quality, display of test prompts, and examiner/test-taker guidelines. Adopting a convergent mixed-methods design (Creswell & Plano Clark, 2007), questionnaire and focus group data were gathered from 373 test-takers and 10 examiners. In the presentation, I will discuss key findings and their implications for the practical implementation of the test. I will end by emphasising the importance of including research-informed administrative considerations as part of a validity argument.
Video Recording

Jing Xu, Edmund Jones, Victoria Laxton and Evelina Galaczi

Principal Research Manager
Cambridge Assessment English

Assessing L2 English speaking using automated scoring technology: Examining automarker reliability

Automated scoring is appealing for large-scale L2 speaking assessment in that it increases the speed of score reporting and reduces the logistical complexity of test administration. Despite its increased popularity, validation work on automated speaking assessment is in its infancy. The lack of transparency about how learner speech is scored, and the lack of evidence for the reliability of automated scoring, have not only raised concerns among language assessment professionals but also provoked scepticism about automated speaking assessment among language teachers, learners and test users (Fan, 2014; Xi, Schmidgall, & Wang, 2016).
This paper contributes to this niche in language assessment by providing evidence for the performance of the Custom Automated Speech Engine (CASE), an automarker designed for the Cambridge Assessment English Linguaskill Speaking test, and by problematising traditional approaches to establishing automarker reliability. We argue that correlation is inappropriate for measuring the agreement between automarker and human scores and that quadratic-weighted Kappa (Cohen, 1968) may behave strangely and is hard to interpret. Instead, we chose to use 'limits of agreement', the standard approach in medical science for comparing two concurrent methods of clinical measurement (Bland & Altman, 1986, 1999). Additionally, we examined automarker consistency and severity, as compared to trained examiners, using multifaceted Rasch analysis.
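As a rough illustration of the 'limits of agreement' idea (Bland & Altman, 1986), the sketch below computes the mean difference between paired automarker and human scores, together with the interval of that mean plus or minus 1.96 standard deviations of the differences. The function name, the example scores and the 1.96 multiplier (approximate 95% limits, assuming roughly normal differences) are assumptions made for illustration; this is not the analysis code used in the study.

```python
import statistics


def limits_of_agreement(auto_scores, human_scores, z=1.96):
    """Return (lower limit, mean difference, upper limit) for paired scores.

    Hypothetical helper: each position in the two lists is assumed to hold
    the automarker score and the human score for the same test-taker.
    """
    if len(auto_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    if len(auto_scores) < 2:
        raise ValueError("at least two paired scores are needed")

    # Difference for each test-taker: automarker score minus human score.
    diffs = [a - h for a, h in zip(auto_scores, human_scores)]

    # Mean difference (systematic bias of the automarker relative to humans).
    bias = statistics.mean(diffs)

    # Sample standard deviation of the differences.
    sd = statistics.stdev(diffs)

    return bias - z * sd, bias, bias + z * sd


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    auto = [3.0, 4.0, 5.0, 4.0]
    human = [3.0, 3.0, 5.0, 5.0]
    lower, bias, upper = limits_of_agreement(auto, human)
    print(f"bias={bias:.2f}, limits=({lower:.2f}, {upper:.2f})")
```

Unlike a correlation coefficient, the limits are expressed in score units, so they can be read directly as how far an automarker score is likely to fall from a human score for the same performance.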
Video Recording