Universidad Autónoma de Madrid
The focus of Mick O’Donnell’s research has always been to understand the functional nature of language, a system which is at once highly structured and organically complex. To this end, he has explored computational modelling of language, both in terms of generating texts from an underlying information representation and in terms of automatically analysing text to reveal meaningful patterns.
More recently, he has been exploring how we learn foreign languages, using learner corpora, manual error annotation, and automatic syntactic annotation, to identify the critical lexical and grammatical problems of specific sets of learners.
He has been developing corpus annotation tools since 1992, including Systemic Coder, RSTTool, and more recently, UAM CorpusTool, which has been downloaded over 17,000 times in the past seven years.
Mick’s Keynote Speech:
Exploring Learner Development in Terms of Evolving Contexts of Use
Learner corpora have been used to identify the critical linguistic problems faced by particular L1 communities (e.g., problems of when to use an article for Spanish and Japanese learners of English). Such studies indicate where we need to focus our teaching efforts for the particular group. Two types of analysis are most common: Error Analysis, to identify the errors the group produces most frequently, and automatic syntactic analysis, to identify over-use or under-use of syntactic structures by the learners relative to native speakers.
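As a rough illustration of the over-use/under-use comparison, the sketch below normalises the count of one structure per 1,000 words in a learner corpus and a native corpus and takes the ratio. The function names and all counts are hypothetical, chosen only for illustration; they are not taken from the talk or from any tool mentioned here, and an actual study would normally add a significance test.

```python
def per_thousand(count, total_words):
    """Normalised frequency: occurrences per 1,000 words."""
    return 1000.0 * count / total_words


def usage_ratio(learner_count, learner_words, native_count, native_words):
    """Learner frequency divided by native frequency.

    A ratio above 1 suggests over-use by learners; below 1 suggests under-use.
    """
    learner_freq = per_thousand(learner_count, learner_words)
    native_freq = per_thousand(native_count, native_words)
    return learner_freq / native_freq


# Made-up counts for a single structure in two corpora of equal size.
ratio = usage_ratio(
    learner_count=30, learner_words=20000,
    native_count=60, native_words=20000,
)
print(ratio)  # 0.5, i.e. learners use the structure half as often
```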
For grammatical constructions, the most basic production problems concern how to form the structure: learners lack knowledge of how the structure should be formed and thus produce syntactically incorrect text, as in “He said me that…” or “He asked me who am I”.
However, even after learners master the construction of a linguistic form, they continue to make errors with regard to when to use the structure: in what expressive context is the form appropriate? For instance, the present perfect construction is fairly similar between English and Spanish, but (at least for European Spanish) the contexts of use are not identical. While a Spanish speaker might say “He comido el desayuno esta mañana” at any time in the day, an English speaker would only say “I’ve had breakfast this morning” while still having food in the stomach.
In this presentation, I will explore a learner corpus methodology based on fine-grained coding of the “context of use” of the grammatical forms used by the learner, with the goal of identifying which contexts of use of each structure are problematic for the learners in question.
In addition, by using a learner corpus spanning six proficiency levels, I will show how the learners move developmentally from an interlanguage where the contexts of use of structures approximate those of the L1, gradually mastering each context of use until their usage conforms to native patterns. For example, we can explore problems of article optionality in terms of differing contexts of use: we can posit a number of distinct expressive contexts for nominal reference, combining reference to a specific or generic entity, abstract or concrete, definite or indefinite, singular or plural, mass or count. For many of these contexts, both Spanish and English have similar expressive choices. However, in contexts of referring to generic plurals or abstract terms, English prefers no article, while Spanish requires an article (e.g. “Drugs are a problem for society.” vs. “Las drogas son un problema para la sociedad.”). By mapping out the errors of use that learners make over developmental stages, we can identify where in their developmental path each context of use becomes critical, and thus where in a teaching curriculum attention to that context is most needed.
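The idea of mapping errors per context of use across developmental stages can be sketched as a simple tabulation. The example below assumes each annotated instance is recorded as a (proficiency level, context of use, is-error) triple. The level labels, the context names, and the sample data are all hypothetical placeholders, not the coding scheme or results presented in the talk.

```python
from collections import defaultdict


def error_rates(annotations):
    """Return the error rate for every (level, context) pair.

    `annotations` is an iterable of (level, context, is_error) triples.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for level, context, is_error in annotations:
        key = (level, context)
        totals[key] += 1
        if is_error:
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}


# Invented sample: two hypothetical levels, two hypothetical contexts.
sample = [
    ("A1", "generic-plural", True),
    ("A1", "generic-plural", True),
    ("A1", "generic-plural", False),
    ("A1", "generic-plural", False),
    ("C1", "generic-plural", True),
    ("C1", "generic-plural", False),
    ("C1", "generic-plural", False),
    ("C1", "generic-plural", False),
    ("A1", "specific-definite", False),
    ("A1", "specific-definite", False),
]

rates = error_rates(sample)
for (level, context), rate in sorted(rates.items()):
    print(f"{level:3} {context:18} {rate:.2f}")
```

Reading such a table by context, level by level, shows at which stage the error rate for a given context remains high, which is the kind of evidence the abstract proposes for deciding where in a curriculum that context needs attention.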