Linguistics 580 (E,F)

R in Linguistic Analysis

Autumn 2012

 

 


Course overview:

Description

 

This is a graduate-level course intended primarily for students of linguistics with little-to-no computing background who wish to discover the value of using R software for linguistic analysis. It's particularly designed for students who want a bit more hand-holding when it comes to programming.  Assigned work is simple, with the intention of building a level of confidence in using R that can be scaled up over time as students work on their own projects independently.

 

In all subfields of linguistics, we find practitioners who wish to work with bodies of data. Though these datasets differ in type and size, there are certain tasks we may all wish to accomplish. We gather language data, and then we have to choose how to represent what we've gathered (in transcribed form, in summary numerical form, or as the theoretical structural units that the tokens or cases represent, to name just a few options). We may then count things, order things, test hypotheses, or find patterns. Alternatively, we might wish to discover how untrained speaker-listeners respond to a dataset we have created. In such cases, we might need to know what a subject's acceptance judgements are, or determine which phone, word, or phrase they perceive when we present a stimulus to them. Ultimately, we want to know how our findings address a particular research question. R is a tool that can aid in a number of these tasks.
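As a small illustration of the "count things, order things" tasks mentioned above, here is a minimal sketch in base R. The token list is invented for this example and does not come from any course text; the variable names are likewise only illustrative.

```r
# Toy data: a handful of invented word tokens (an assumption for illustration).
tokens <- c("the", "cat", "saw", "the", "dog", "and", "the", "bird", "saw", "it")

# Count things: table() tallies how often each distinct token occurs.
freqs <- table(tokens)

# Order things: sort the counts from most to least frequent.
sorted_freqs <- sort(freqs, decreasing = TRUE)

# Inspect the three most frequent word types.
print(head(sorted_freqs, 3))
```

The same two calls, table() and sort(), scale from this toy vector to a vector holding every token of a corpus.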

 

The syllabus is arranged into General weeks and Subfield Focus weeks.

General weeks: During the first few weeks of the quarter, we will learn about basic R functions that are of general utility to linguists, regardless of subfield.

Subfield Focus weeks: We will then move on to locating and working with supplemental packages (libraries) that are relevant to our particular interests. The two required texts supply content for specific subfields, and we will use this material for the subfield focus weeks as interest dictates. If no class members work in some of the subfields represented in the syllabus, we may remove that material and expand the time allotted to the remaining subfields, so that the content is of greatest utility to the class members present. However, we aim to include a mix of subdisciplines and project types, to demonstrate the range and utility of R's functions. Students will be responsible for the subfield focus week of their choosing.

 

The best way to learn and understand R is to use it. But linguists are not well served by resources about R written to teach practitioners of educational psychology, sociology or...[insert other field here]. Such texts use examples and data from other fields, and linguists are left to adapt the ideas to their own research. The approach we will take is to use materials prepared by linguists who use R for linguistic analysis. We will gain lots of hands-on experience in using R to analyze linguistic data. This course is not intended to provide comprehensive training in statistics. However, because many linguists must use descriptive and inferential statistical methods in linguistic work, and because R is quickly becoming many linguists' preferred software for statistical analysis, we will incorporate some basic statistics into our syllabus. Students are urged to acquire more thorough training in statistics in other classes (see the instructor for recommendations).

 

Course goals

 

By the end of this course students will:

 

... (your additions here)

 

 

Note:

 

You don't have to have taken Computational Methods in Linguistics (CML: LING580-Bender/Wassink, SPR12) to be able to grasp course content. This seminar is intended as a complement to that course. It is pitched at a basic level, and keeps a slow pace appropriate for non-programmers wishing a very specific, focused introduction to R. CML covered a broad range of computational topics. This course is just about R. For those who did take CML, the work we do in this course will cover the R versions of some of the functions we learned in Python, although it assumes no familiarity with Python. We may complete the "R sandbox" projects that were introduced in CML but not done for lack of time. We will go beyond these projects to learn other basic functions in R.

 

Prerequisites:

 

Required: LING400 (or equivalent)

Recommended: Introductory coursework in statistics. No programming background is necessary.

Please contact the instructor if you are not certain whether you satisfy course prerequisites.

 

 

Course calendar

(note: this outline is subject to change, depending on size and skill level of the class)

It will be clear that some of the topics listed under certain subfields will also be relevant to others! (That's kind of the point!)

 

Each entry below lists the week and date, the topic, the reading(s) to be discussed that week, and the work due Friday.

General Focus Weeks:

Week 1 (M 9/24)
Topic: Introduction. What can linguists do with R? Foundations: Installing and Running R; the R interface
Reading(s): Gries 3 (through 3.5)
Due Friday: Gries Exercise 3.5

Week 2 (M 10/1)
Topic: Workflow of a Basic R Session. Setting the working directory; Importing data from an external source; Data Structures, Functions and Arguments
Reading(s): Gries 3 (3.6-end)
Due Friday: Gries Exercise 3.6

Week 3 (M 10/8)
Topic: Working with Strings (Character Processing). Trimming strings; Converting to upper/lower case; Replacing strings; Regular expressions; Randomization
Reading(s): finish Gries 3.6, begin 4
Due Friday: Research Questions for presentation/term project

Week 4 (M 10/15)
Topic: Frequency counts; Concordancing. Best practices in scripting
Facilitator: Meghan
Reading(s): Gries 4
Due Friday: Gries Exercise 4.2; Progress reports on R package search

Week 5 (M 10/22)
Topic: Basic Statistics. Descriptive statistics; Interaction plots; Mosaic and box plots; Scatterplots
Reading(s): Johnson 1, 2 & Gries 5
Due Friday: Reproduce the graphs on Johnson pp. 10-15 (Figs. 1.2, 1.3, 1.5, 1.6)

Subfield Focus Weeks:

Week 6 (M 10/29)
Topic: R for phoneticians and experimental phonologists. vowels.R; Equivalence of means; Multiple regression; Principal components analysis I
Reading(s): Johnson, ch 3; vowels.pdf (Tyler Kendall's manual for vowels.R)
Due Friday: Johnson Exercises 3.3, 3.5

Week 7 (M 11/5)
Topic: R for sociolinguists. Contingency tables (the χ² distribution); Discussion of datasets for final projects
Facilitator: John
Reading(s): Johnson 5

Week 8 (M 11/12)
Topic: no class: Veterans' Day
Due Friday: Reproduce the logistic model of the dative alternation explained in the "R note" on p. 251

Week 9 (M 11/19)
Topic: R for syntacticians and semanticists. Logistic regression
Facilitator: Nigel
Topic: Exploring the Maptools package
Facilitator: Meghan
Reading(s): Johnson 7, maptools script (download Meghan's zip file)
Due Friday: (none, Happy Thanksgiving!)

Week 10 (M 11/26)
Topic: A little more statistics: Clustering and Classification. Supervised and unsupervised methods for finding patterns in multivariate datasets; Principal components analysis II, Hierarchical cluster analysis, Discriminant analysis
Reading(s): Baayen 5
Due Friday: Final List of Functions and Scripts to be used for final projects

Week 11 (M 12/3)
Topic: Final Project Previews

Final Projects Due: 6:30 p.m. Thursday, Dec. 13, 2012 (in Canvas)
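As a preview of the Week 3 string-processing topics listed in the calendar, here is a minimal sketch using base R functions only. The example strings and variable names are invented for illustration and are not taken from Gries or the other course texts.

```r
# Invented example strings (an assumption for illustration).
words <- c("  Phonology ", "syntax", " SEMANTICS")

# Trimming strings: remove leading and trailing whitespace.
trimmed <- trimws(words)

# Converting to lower/upper case.
lower <- tolower(trimmed)
upper <- toupper(trimmed)

# Replacing strings: sub() replaces the first match in each element;
# fixed = TRUE treats the pattern as a literal string, not a regex.
replaced <- sub("syntax", "morphology", lower, fixed = TRUE)

# Regular expressions: which items end in "ics" or "ogy"?
ends_match <- grepl("(ics|ogy)$", lower)

# Randomization: shuffle the items; set.seed() makes the shuffle repeatable.
set.seed(1)
shuffled <- sample(lower)
```

Each function here is vectorized, so the same calls work unchanged on a vector of thousands of word tokens.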

 

Readings

(recommended) Baayen, R. H. (2008) Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge: Cambridge University Press

 

(required) Gries, S. Th. (2009) Quantitative Corpus Linguistics with R: A Practical Introduction. London: Routledge

 

(required) Johnson, K. (2008) Quantitative Methods in Linguistics. Malden, MA: Blackwell

 

(required freeware) R Development Core Team (2008), R software package. Accessible from CRAN website <http://cran.at.r-project.org/>

 

 

Remote access

 

I have been asked whether students can take the course for credit if they will not be physically on campus to attend meetings. Students with special situations (for example, those in the field conducting research) may request permission to take the course as off-site students, but instructor permission is required. We have a Tegrity account that will allow us to record the weekly meetings for later streaming. Your course participation will be slightly different from that of in-class participants. Note that you MUST fulfill the alternate course requirements:

 

                 - you must obtain instructor permission to be an off-site class member prior to enrolling.

                 - reading questions must still be submitted on time, prior to in-person class meetings.

                 - on-time submission of the term project is required at the scheduled time during finals week.

                 - you will still make a presentation during the weeks when individual presentations are scheduled to occur in class. You will prepare and send materials for the day you lead class. You will work with the instructor well in advance to plan your presentation, and select your data and R package(s). Your presentation should be submitted as a PowerPoint presentation or PDF with sufficient detail that we may read the slides and grasp the content. You are encouraged, but not required, to audio-record yourself giving the presentation to accompany your slides (using Praat, Audacity, PowerPoint's record presentation feature, Keynote's "Record Slideshow" feature, etc.); a recording would be really helpful. If you aren't using software that integrates audio with the slides, you will be expected to indicate verbally when you are advancing slides. Send the slides and the recording (if applicable) to the instructor together for feedback and revision.

                 - your slides MUST be submitted for review and comment to the instructor 48 hours prior to your presentation date.

                 - the instructor will review your slides and provide you with feedback.  Any requested changes to the slides must be incorporated prior to the presentation.

                 - after your presentation, the class will post questions to the Canvas area for you. 

                 - you MUST respond to classmates' questions in the Canvas area within 2 days of your presentation

 

Disability Accommodations

 

To request academic accommodations due to a disability, please contact Disabled Student Services, 448 Schmitz, 206-543-8924 (V/TTY). If you have a letter from Disabled Student Services indicating that you have a disability which requires academic accommodations, please present the letter to the instructor so we can discuss the accommodations you might need in this class. 

 

 

Course requirements

 

1. Reading questions (5%)

 

2. Weekly exercises (20%)

Each week, we will complete one or two exercises from the text read for that Monday's class. Exercises are listed in the syllabus. The exercises will be due by midnight (Pacific Time) on Friday of the same week. This gives you the rest of the week to continue digesting what we read and discussed on Monday. For example, the exercises in Gries are mentioned in the text as you read. To prepare for class, you are not required to do all of these exercises as you read, unless you wish to (which will, of course, help you learn). All you must do is read the chapter. However, you ARE strongly encouraged to read with your R installation open, so you may try out the code mentioned in the body of the chapter (in gray boxes in both the Gries and Johnson texts). You then have until Friday to do the exercise(s) assigned for that chapter.

 

3. Class leading (25%)  -- days listed as "facilitated" on the syllabus.  You MUST meet with the instructor in advance of your date to discuss what you plan to cover in class (office hours preferred).

 

Each student will choose a class period to lead.  This will typically be a Subfield focus week. But if class size is large, this may also include General weeks.  There are two ways class-leading can be structured:

 

 

For off-site class members: Your presentation will be delivered remotely (as described in Remote Access, above)

 

4. Term Project (50%)