Presenter:  Simon Dennis
Presentation type:  Symposium
Presentation date/time:  7/28  10:30-10:55
 
A syntagmatic approach to syntactic representation: Extracting dependency and constituency information from corpora
 
Simon Dennis, University of Adelaide
 
In the linguistic literature, a distinction can be drawn between dependency grammars, which assume that syntax involves capturing the relationships between individual words in a sentence, and constituency grammars, which assume that syntax involves capturing the (hierarchical) relationships between contiguous sequences of words in a sentence. Creating the matrix of syntagmatic relationships between the words in a sentence (i.e., which words follow which words) exposes both dependency and constituency units. Dependency units correspond to the rows and columns associated with an individual word, and constituency units correspond to contiguous triangles. Compiling the matrices for all the sentences of a corpus creates a third-order tensor to which machine learning algorithms can be applied to extract these syntactic units. In this talk, I will contrast the units extracted by nonnegative matrix factorization and three versions of sparse independent component analysis. The units produced will be compared against those proposed by standard phrase structure grammar (Radford, 1988) and link grammar (Sleator & Temperley, 1993) as examples of constituency and dependency grammars, respectively.
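The construction of the tensor can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: each sentence matrix is binary and indexed by vocabulary, an entry is 1 when the column word occurs anywhere after the row word, and the matrices are stacked with the sentence as the first axis. The function names (syntagmatic_matrix, corpus_tensor) and the toy sentences are hypothetical, and the factorization step itself is not shown.

```python
import numpy as np

def syntagmatic_matrix(sentence, vocab_index):
    """Binary word-by-word matrix for one sentence (assumed encoding).

    m[a, b] = 1 if word b occurs anywhere after word a in the sentence.
    """
    size = len(vocab_index)
    m = np.zeros((size, size), dtype=np.int8)
    ids = [vocab_index[word] for word in sentence]
    for position, a in enumerate(ids):
        for b in ids[position + 1:]:
            m[a, b] = 1
    return m

def corpus_tensor(sentences):
    """Stack one matrix per sentence into a third-order tensor.

    Resulting shape: (number of sentences, vocabulary size, vocabulary size).
    """
    vocab = sorted({word for sentence in sentences for word in sentence})
    vocab_index = {word: i for i, word in enumerate(vocab)}
    tensor = np.stack([syntagmatic_matrix(s, vocab_index) for s in sentences])
    return tensor, vocab_index

# Toy corpus, purely illustrative.
sentences = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
tensor, vocab_index = corpus_tensor(sentences)
```

In this encoding, the row and column for a given word collect everything that follows or precedes it in a sentence, which is the kind of slice the abstract associates with dependency units. A position-by-position matrix would instead make the contiguous triangles visible, so the choice of indexing here is only one possible reading.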