Statistical Analysis of Linguistic Data
This is a special topics course on linguistic data.  More and more data
these days have linguistic content, so this class will investigate what
it takes to drop such linguistic data into a statistical model.
The main readings will be:
Homework:
I've posted the homework from previous years (.pdf, .Rnw, and .html for a quick
view).  I'll update this file as the semester goes along, so
don't print it out!  Just keep checking the web.
  -  Homework 1 is due Sept 24th.
 
Schedule:
  -  Sept 10: Regular expressions (.pdf)
    
  
 -  Sept 12: N-grams (.pdf)
     
     -  N-grams (Chapter 4 of JM)
     
 
   -  Sept 13 at noon: Justin Rising and Josh Magarick are running a session called
"Python for Statisticians". Lunch will be served!
 
  -  Oct 1: Backoff and information theory.
  
 -  Oct 3:
 
   -  Oct 29: No class: rain day
   
 -  Oct 31: Streaming methods
     
 
  -  Nov 5: The power of large blocks
    
  
 -  Nov 7: Parsing
    
    -  .pdf
    
     -  Read chapters 13 and 14 of JM.
    
 
 
   -  Nov 12: Statistical parsing
     
     
 - Nov 14: notes
 
   -  Nov 26: Machine translation (chapter 25)
    
   
 -  Nov 28: CCA
    
    -  slides for today's lecture
    
     -  paper with Sham
    
     -  CCA goes back to the 1930s, so there should be plenty of
web material to look over.  I won't post any of it myself, but if you find
something nice, email it to me and I'll post it.
    
 
 
  -  Dec 3: Disambiguation
    
  
 -  Dec 5:  Hadamard transformations?
 
dean.foster@gmail.com
Last modified: Wed Dec  5 14:23:59 EST 2012