Statistical Methods for NLP LT 2202
Lecture 7 part 2 – Dependency parsing
February 25, 2014 Richard Johansson
Syntactic parsing
• Given a sentence, compute a syntactic tree representing the grammatical
analysis of the sentence
–Constituent / phrase structure trees
– Sentence analyzed in terms of nested phrases (S, NP, VP, PP, ...)
–Dependency trees
– Sentence analyzed in terms of relations between words (subject, object, adverbial, ...)
–Feature structure grammars (HPSG, LFG)
Probabilistic context-free grammars
• Used for phrase structure trees
• Production rules (S → NP VP) are assumed to depend only on the parent phrase type (S)
• Probability of tree is equal to product of the probabilities of rules used to generate the tree
• Parsing = finding tree with highest probability
–CKY algorithm (recall from the NLP course)
–Cubic time
Example
P(S → NP VP) = 0.7
P(NP → A N) = 0.1
P(VP → V NP) = 0.4
P(NP → N) = 0.34
P(A → ”nice”) = 0.0001 ...
Probability: the product of the probabilities of the rules used in the tree
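As a rough illustration (not part of the original slides), the tree probability can be computed by walking the tree and multiplying the rule probabilities. The phrase rules below are the ones from the example; the tree structure and the lexical probabilities for the noun and verb are hypothetical, added only to make the sketch run.

```python
# Minimal sketch: probability of a tree under a PCFG as the product of
# the probabilities of the rules used to generate it.

rule_probs = {
    ("S", ("NP", "VP")): 0.7,
    ("NP", ("A", "N")): 0.1,
    ("VP", ("V", "NP")): 0.4,
    ("NP", ("N",)): 0.34,
    ("A", ("nice",)): 0.0001,
    ("N", ("dog",)): 0.001,    # hypothetical lexical probability
    ("V", ("barks",)): 0.002,  # hypothetical lexical probability
}

def tree_probability(tree):
    """tree = (label, [children]); a leaf child is just a word string."""
    label, children = tree
    child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_probs[(label, child_labels)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_probability(child)
    return p

tree = ("S", [("NP", [("A", ["nice"]), ("N", ["dog"])]),
              ("VP", [("V", ["barks"]), ("NP", [("N", ["dog"])])])])
print(tree_probability(tree))  # 0.7 * 0.1 * 0.0001 * 0.001 * 0.4 * 0.002 * 0.34 * 0.001
```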
Dependency syntax
• In dependency syntax, syntactic relations (arcs, edges, links, ...) are annotated
between words
• The grammatically dominant word is called the head and the subordinated word the dependent
• Dependency arcs normally form a tree
• Arcs may be labeled (subject, object, ...)
Dependency tree example
• ”in” is the head of ”Rome”
• ”saw” is the root of the dependency tree
Example: the Språkbanken annotation lab
• http://spraakbanken.gu.se/korp/annoteringslabb
Approaches to dependency parsing
• Incremental algorithms
–Process words one by one
–Note the similarity to human sentence processing
–Usually very fast: linear time
• Global scoring algorithms
–Compute a score for the whole tree
–Example: PCFG
–Often solved using graph search
–Usually slower than incremental algorithms
The Nivre algorithm
• The Nivre algorithm is an example of an incremental dependency parsing algorithm
– J. Nivre: “Algorithms for Deterministic Incremental Dependency Parsing” (2008), and other papers
– Implemented in MaltParser
• Basic idea:
–Proceed through the sentence word by word
–Keep track of substructures using a stack
–Terminate when all words have been processed
Example
The Nivre algorithm (standard version)
• Initialization:
–Make an empty stack
–Put all words in a work queue
• Four operations:
–(Assume top of stack is T and first word in queue is F)
–Shift: move F to top of stack
–Left-arc: add an arc from F to T; remove T from stack
–Right-arc: add an arc from T to F; move F to top of stack
–Reduce: remove T from stack
Preconditions
• Four operations (sketched in code after this list):
–(Assume top of stack is T and first word in queue is F)
–Shift: move F to top of stack
– Must be something left in the queue
–Left-arc: add an arc from F to T; remove T from stack
– Queue and stack non-empty; there is no arc to T
–Right-arc: add an arc from T to F; move F to top of stack
– Queue and stack non-empty
–Reduce: remove T from stack
– Queue non-empty; there must exist an arc to T
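A minimal sketch of these four operations and their preconditions in Python (my own illustration, not MaltParser's implementation; words are represented by their positions, and arcs as a dictionary from dependent to head):

```python
class ParserState:
    def __init__(self, words):
        self.stack = []                       # positions of partially processed words
        self.queue = list(range(len(words)))  # positions of remaining words
        self.arcs = {}                        # dependent position -> head position

    def can_shift(self):
        return bool(self.queue)               # must be something left in the queue

    def shift(self):                          # move F to top of stack
        self.stack.append(self.queue.pop(0))

    def can_left_arc(self):                   # queue and stack non-empty, no arc to T
        return bool(self.queue) and bool(self.stack) and self.stack[-1] not in self.arcs

    def left_arc(self):                       # arc from F to T; T is removed
        self.arcs[self.stack.pop()] = self.queue[0]

    def can_right_arc(self):                  # queue and stack non-empty
        return bool(self.queue) and bool(self.stack)

    def right_arc(self):                      # arc from T to F; F moves to the stack
        self.arcs[self.queue[0]] = self.stack[-1]
        self.stack.append(self.queue.pop(0))

    def can_reduce(self):                     # queue non-empty, there is an arc to T
        return bool(self.queue) and bool(self.stack) and self.stack[-1] in self.arcs

    def reduce(self):                         # remove T from the stack
        self.stack.pop()

    def done(self):                           # terminate when all words are processed
        return not self.queue
```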
Example (1)
• Next step: SHIFT
Example (2)
• Next step: LEFT ARC
Example (3)
• Next step: SHIFT
Example (4)
• Next step: RIGHT ARC
Example (5)
• Next step: REDUCE
Example (6)
• Next step: RIGHT ARC
Example (7)
• Next step: RIGHT ARC
Example (8)
• Input queue is empty. Done!
Incremental parsing by classification
• When we are given a new sentence, how do we select the sequence of actions?
• We use a statistical classifier, for instance Naive Bayes
• Classification problem: given a stack and a queue, predict the action
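A minimal sketch of the resulting parsing loop, assuming the ParserState class sketched above, a feature extraction function like the one sketched later under "Possible features", and a trained classifier with a scikit-learn-style predict method (for instance a pipeline with a DictVectorizer); all names are illustrative:

```python
def parse(tagged_words, classifier, extract_features):
    """tagged_words is a list of (word, POS) pairs for one sentence."""
    state = ParserState([w for w, _ in tagged_words])
    while not state.done():
        # ask the classifier for an action given the current stack and queue
        action = classifier.predict([extract_features(state, tagged_words)])[0]
        if action == "LEFT-ARC" and state.can_left_arc():
            state.left_arc()
        elif action == "RIGHT-ARC" and state.can_right_arc():
            state.right_arc()
        elif action == "REDUCE" and state.can_reduce():
            state.reduce()
        else:
            state.shift()   # fall back to Shift when the prediction is not applicable
    return state.arcs
```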
Reminder: Supervised machine learning
Making the training set
• For each sentence, process the words using the parsing algorithm
• In each state, peek at the human-annotated tree to select the action (see the sketch after this list)
–If the tree contains the arc T → F, then Right-arc
–If the tree contains the arc F → T, then Left-arc
–If there is some word W in the stack and the tree contains an arc between W and F, then REDUCE
–Otherwise SHIFT
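A minimal sketch of this action selection ("oracle"), assuming the ParserState class sketched above and a gold-standard tree given as a dict from dependent position to head position:

```python
def oracle_action(state, gold_heads):
    """Select the training action by peeking at the gold-standard tree."""
    if state.stack and state.queue:
        t, f = state.stack[-1], state.queue[0]
        if gold_heads.get(f) == t and state.can_right_arc():
            return "RIGHT-ARC"            # the gold tree contains T -> F
        if gold_heads.get(t) == f and state.can_left_arc():
            return "LEFT-ARC"             # the gold tree contains F -> T
        if state.can_reduce() and any(    # some word below T is linked to F
            gold_heads.get(f) == w or gold_heads.get(w) == f
            for w in state.stack[:-1]
        ):
            return "REDUCE"
    return "SHIFT"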
Example: computing the correct action
Example: actions generated for a sentence
Possible features
• The classifier uses various features to guess the correct action (see the sketch below):
–POS of T
–POS of F
–POS of word after F
–POS of word under T
–Words, ...
• Example: Noun on stack, verb in queue
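A minimal sketch of a feature extractor over the current stack and queue (assuming the ParserState class above and a sentence given as (word, POS) pairs; the feature names are illustrative, not an actual MaltParser feature model):

```python
def extract_features(state, tagged_words):
    feats = {}
    if state.stack:
        t = state.stack[-1]
        feats["pos_T"] = tagged_words[t][1]
        feats["word_T"] = tagged_words[t][0]
        if len(state.stack) > 1:                   # word under T on the stack
            feats["pos_under_T"] = tagged_words[state.stack[-2]][1]
    if state.queue:
        f = state.queue[0]
        feats["pos_F"] = tagged_words[f][1]
        feats["word_F"] = tagged_words[f][0]
        if len(state.queue) > 1:                   # word after F in the queue
            feats["pos_after_F"] = tagged_words[state.queue[1]][1]
    return feats
```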
Evaluating dependency parsers
• Dependency parsers are evaluated using attachment accuracy
–Probability of a word having the correct head
–Estimation: number of correct attachments / number of words in the corpus
–State of the art: 92-94% for English
• Also: labeled attachment accuracy
–Probability of a word having the correct head and function tag
• Scores depend on the complexity of the linguistic annotation
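A minimal sketch of unlabeled and labeled attachment accuracy, assuming each analysis is a dict mapping word positions to (head, label) pairs (the example values are hypothetical):

```python
def attachment_scores(gold, predicted):
    n = len(gold)
    uas = sum(predicted[w][0] == gold[w][0] for w in gold) / n   # correct head
    las = sum(predicted[w] == gold[w] for w in gold) / n         # correct head and function tag
    return uas, las

# hypothetical analyses of a three-word sentence
gold = {0: (1, "subject"), 1: (None, "root"), 2: (1, "object")}
pred = {0: (1, "subject"), 1: (None, "root"), 2: (1, "subject")}
print(attachment_scores(gold, pred))  # (1.0, 0.666...)
```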
VG assignment 1
• Implement the Nivre algorithm
• Train the action classifier using a treebank
• Evaluate the dependency parser on a test set