Validation research in language assessment: Contributions from methods in applied linguistics
Validity Arguments in Language Assessment: Contributions from Applied Linguistics
Current approaches to validation for language assessments require test developers and researchers to state the claims that they want to make about assessment results, including the intended interpretations, uses, and consequences. Validation research seeking support for such a diverse set of claims necessarily draws upon a variety of quantitative and qualitative methods. I will open the conference with a look at some of the claims, warrants, and assumptions that appear in validity arguments for language assessments to demonstrate the need for research based on methods from applied linguistics. I will also point to some challenges for language testing researchers who undertake to integrate a variety of evidence from applied linguistics research within validity arguments. The papers at LARC 2018 illustrate how researchers are addressing some of today’s challenges in validation research.
Purposing Writing Assessments: Focusing Complex Constructs in Variable Contexts
Writing, languages, and human abilities are so multi-faceted and variable that construct models for their assessment are necessarily partial representations of the full construct, designed to fulfill particular purposes of assessment in specific contexts (supported by varying degrees of validation and research). Assessment purposes are normative (comparing on a common scale all people who take a test, usually for decisions about admission to educational programs, certification of professional abilities, or monitoring an educational system); formative (to inform teaching and learning for individual diagnosis, program selection, guidance, or motivation); or summative (to document and report achievement within educational programs). These purposes overlap and are easily confused in educational practices and policies because institutions, stakeholders, and educators want assessments to serve multiple functions. Exemplary writing assessments can be designed to fulfill multiple purposes systematically, as in ETS’ CBAL project: assessment of, for, and as learning. But most writing assessments for educational purposes remain limited to certain educational programs, populations, points in the lifespan, languages, genres, and purposes for writing. For these reasons, the design, uses, and evaluation of writing assessments in education should make a fundamental distinction between purposes that are normative (and so should not in principle relate to any particular curricula or teaching) and purposes that are formative or summative (which should be based directly on and inform curricula, teaching, and learning).
Validation of Language Tests in U.S. Public Schools: Roles of the Language Testers, Roles of Policy Makers
Language testing is not done in a vacuum. Current approaches to test validation ultimately support making justifiable decisions about test takers, leading to (hopefully) beneficial consequences for society. Language testers are typically well-trained technically, experts in their particular disciplines, and increasingly committed to a multidisciplinary outlook that values the contributions of other areas of expertise in the complex process of language test validation. But what role does knowing how to apply a variety of quantitative and qualitative methods in the search for evidence to support claims ultimately play when, in the public sphere, it is often policy makers who drive the decision-making? In this paper, I try to illustrate a beneficial interplay between these various roles based on experience from the ACCESS for ELLs® testing program of the WIDA Consortium.
ACCESS is an annual assessment of the development of academic English language administered across 39 states and territories to some two million English learners in grades K to 12 to meet legal and civil rights accountability requirements. Results on the assessment help determine English learner (EL) status (i.e., placement into English language services) and reclassification (i.e., exiting from such support services). In the paper, I present a model that has been useful for communicating roles in the ACCESS project, illustrate various roles played by language testers and policy makers, and argue for increased understanding by language testers of the bigger picture in which they do their work.
A challenge for language testing: The assessment of English as a Lingua Franca
This paper argues that the primary challenge facing language testing at the moment is the need to face the implications for assessment of the reality of English as a Lingua Franca. Given that much of the world’s business, its education (including national and international conferences) and its political interaction is conducted in English as a Lingua Franca, it is remarkable that few if any language tests exist specifically directed at measuring competence in English as a Lingua Franca communication. The reasons for this are complex, but are clearly associated with the fact that, as Messick pointed out, values are at the heart of the constructs in educational assessment. What values underlie the resistance of our field to the testing of English as a Lingua Franca? This paper tries to set out the radical implications for our field of an embrace of the construct of English as a Lingua Franca for the design of English language assessments, and the likely types of resistance that would result. The issue reveals the fundamentally value-driven and political character of language testing, a notion our field continues to experience as a major challenge.
Eye Tracking for Language Assessment
Eye tracking is a method of measuring the point of gaze, that is, where one is looking. This is typically accomplished by using specialized cameras to capture images of the person’s eyes, often in infrared light, and then processing these images to estimate the gaze point. In this workshop, participants will learn about the applications of eye tracking to language assessment research, including (1) user experience studies as part of the development and validation of computer-assisted methods of task delivery, (2) analyses of eye-fixation patterns in research investigating the behavior of test takers or raters, and (3) psycholinguistic studies of cognitive processes that underlie reading and writing that may inform the development of biometric methods for language assessment. Participants will gain hands-on experience using an affordable 60 Hz remote binocular eye-tracking system (Gazepoint GP3) with stock software (that comes for free with the device) and with specialized software that was developed at Iowa State University for psycholinguistic research. Examples of research studies that used various eye-tracking measures will be discussed.