Three pre-conference workshops will be held on Thursday, September 15, 2016, and are open to registered conference attendees. Workshop registrations will be accepted until all available seats are filled.
Workshop #1: Statistics for Corpus Linguists with R
Stefan Th. Gries, University of California, Santa Barbara
Thursday, September 15
10:00 am – 5:00 pm
(with a break for lunch)
Location: Gold Room, Memorial Union
This workshop, aimed at beginners, will familiarize participants with the statistical programming language R and show how to use it for the (1) import and processing, (2) description, (3) visualization, and (4) analysis of linguistic data.
As for the first item, we will briefly discuss R’s four most important data structures and how spreadsheet data are loaded into R and prepared for subsequent steps. In considering issues of description, we will turn to a variety of basic descriptive statistics used for categorical and numeric data, including frequencies, central tendencies, dispersions, and correlations. Next we will explore the visualization of data in a variety of simple but useful ways, such as dotcharts, boxplots, ecdf plots, and scatterplots. The emphasis will be on creating self-sufficient plots that can draw attention to trends or important data points in a data set. Then, in terms of analyzing linguistic data, we will discuss a variety of analytical scenarios that are frequently encountered in applied linguistics research.
Unlike many introductory courses and workshops, however, this workshop will not deal with these scenarios in terms of simple monofactorial tests (chi-squared tests, t-tests, Pearson’s r, etc.). Instead we will explore all these tests from a regression perspective. This approach may appear to be more complex than learning simple functions for simple tests. However, it is superior in that it shows how many statistical tests typically taught separately can in fact be viewed as only slightly different instantiations of a more general logic: generalized linear modeling. In addition, this approach better sets the stage for subsequent exploration of multifactorial regression modeling in the participants’ own future work.
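As a rough illustration of the regression perspective described above (not material taken from the workshop itself), the sketch below runs an equal-variance two-sample t-test and then fits a linear model with a single binary predictor to the same data. The data are simulated, and the names `d`, `register`, and `word_length` are assumptions made up for this example.

```r
# Simulated data: all names and values here are illustrative assumptions.
set.seed(42)
d <- data.frame(
  register    = factor(rep(c("spoken", "written"), each = 20)),
  word_length = c(rnorm(20, mean = 4.5, sd = 1),
                  rnorm(20, mean = 5.2, sd = 1))
)

# Classical two-sample t-test, assuming equal variances in both groups.
tt <- t.test(word_length ~ register, data = d, var.equal = TRUE)

# The same comparison as a linear regression with one binary predictor.
m <- lm(word_length ~ register, data = d)
slope <- summary(m)$coefficients["registerwritten", ]

# The t statistics agree in absolute value (the sign depends only on which
# group is subtracted from which), and the p-values are the same.
same_t <- all.equal(abs(unname(tt$statistic)), abs(unname(slope["t value"])))
same_p <- all.equal(tt$p.value, unname(slope["Pr(>|t|)"]))
print(c(same_t = same_t, same_p = same_p))
```

Both comparisons should come out as TRUE: the slope of the regression is simply the difference between the two group means, so testing the slope against zero is the same test as the t-test.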
Note: Participants will work on their own laptops during this workshop. Prior to the workshop, please make sure that you have the following installed on your laptop:
- R (version >= 3.3.0)
- RStudio (version >= 0.99.902)
In addition, install the required packages by starting R and running the following line:
install.packages(c("car", "effects", "nnet", "party", "rgl", "rms"))
Workshop #2 (Parts 1 and 2): Using the BYU Corpora: From Newbie to Geeky
Mark Davies, Brigham Young University
Thursday, September 15
2:30 pm – 5:30 pm
Location: Room 212, Ross Hall
There are two parts to the workshop, and participants are welcome to come to just one or both parts. The first half will deal with the basics of the BYU corpora: frequency, concordances, collocates, word comparisons, restricting searches to and comparing sections of a corpus (e.g., historical periods, dialects, and genres), and searches involving wildcards, part of speech, lemmas, and synonyms. The second half will deal with more advanced features: customized wordlists, saving and re-using searches, “fuzzy searches”, the new (May 2016) interface, and the (new) virtual corpora. In the second half we’ll also look at WordAndPhrase.info, which provides a more user-friendly interface for the COCA data, and which allows you to analyze entire texts.
Note: Please make sure that you are a registered user of the BYU corpora by September 1, 2016. If you do not already have a user account, you can register at http://corpus.byu.edu/profile_new.asp
Workshop #3: Regular Expressions for Advanced Corpus Queries
Jesse Egbert, Northern Arizona University
Thursday, September 15
6:00 pm – 9:00 pm
Location: Room 120, Ross Hall
The purpose of this workshop is to give participants a basic introduction to regular expressions: special character strings used for matching advanced patterns in texts. In other words, regular expressions are wildcards on steroids. Thus, they provide an extremely powerful solution to many of the problems corpus researchers experience when trying to identify and quantify linguistic patterns in corpora. Regular expressions can also provide a natural gateway into programming for researchers interested in more advanced corpus analysis. The workshop will comprise three parts. First, I will introduce regular expressions, their usefulness for corpus analysis, and their limitations. Second, participants will practice using regular expressions in AntConc to find several linguistic patterns in tagged and untagged corpora. Finally, participants will learn how to analyze the accuracy of their regular expressions (in terms of precision and recall).
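To give a sense of what evaluating a regular expression in terms of precision and recall can look like, here is a minimal sketch in R rather than AntConc. The sentence, the pattern, and the hand-coded list of target forms are all invented for this illustration, and the same pattern may need adjusting for the regular expression flavor of another tool.

```r
# Toy untagged text and a hand-coded gold standard (both invented here).
text <- "She walked past the red shed, talked to Fred, and then went home."
gold <- c("walked", "talked", "went")  # past-tense verbs a human coder found

# A naive pattern for past-tense verbs: any word ending in "ed".
pattern <- "\\b\\w+ed\\b"
hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
print(hits)

# Comparing word forms is enough here because no form occurs twice;
# real corpus work would compare individual tokens in context.
true_pos  <- sum(hits %in% gold)
precision <- true_pos / length(hits)  # share of hits that are real targets
recall    <- true_pos / length(gold)  # share of real targets that were found
print(c(precision = precision, recall = recall))
```

In this toy case the pattern over-matches ("red", "shed", "Fred") and misses the irregular form "went", so both precision and recall fall well below 1, which is the kind of trade-off the accuracy analysis is meant to expose.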