
(1)

Statistical Methods for NLP LT 2202

Lecture 7 part 2 – Dependency parsing

February 25, 2014
Richard Johansson

(2)

Syntactic parsing

• Given a sentence, compute a syntactic tree representing the grammatical analysis of the sentence

– Constituent / phrase structure trees

  – Sentence analyzed in terms of nested phrases (S, NP, VP, PP, ...)

– Dependency trees

  – Sentence analyzed in terms of relations between words (subject, object, adverbial, ...)

– Feature structure grammars (HPSG, LFG)

(3)

Probabilistic context-free grammars

• Used for phrase structure trees

• Production rules (S → NP VP) are assumed to depend only on the parent phrase type (S)

• Probability of tree is equal to product of the probabilities of rules used to generate the tree

• Parsing = finding tree with highest probability

– CKY algorithm (recall from the NLP course)
– Cubic time

(4)

Example

P(S → NP VP) = 0.7
P(NP → A N) = 0.1
P(VP → V NP) = 0.4
P(NP → N) = 0.34
P(A → "nice") = 0.0001
...

Probability: the product of the probabilities of the rules used in the tree [worked example in the slide figure]
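As a hedged illustration, here is a minimal Python sketch of this computation for a hypothetical tree over the rules above; the tree itself and the lexical probabilities other than P(A → "nice") are assumptions, not from the slides:

```python
# Score a tree under a toy PCFG: multiply the probabilities of all rules
# used in the derivation. A tree is (label, [children]); a leaf is a string.
rule_prob = {
    ("S", ("NP", "VP")): 0.7,
    ("NP", ("A", "N")): 0.1,
    ("VP", ("V", "NP")): 0.4,
    ("NP", ("N",)): 0.34,
    ("A", ("nice",)): 0.0001,
    ("N", ("dog",)): 0.001,    # assumed value
    ("N", ("cats",)): 0.001,   # assumed value
    ("V", ("saw",)): 0.002,    # assumed value
}

def tree_probability(tree):
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_probability(child)   # multiply in the subtrees
    return p

tree = ("S", [("NP", [("A", ["nice"]), ("N", ["dog"])]),
              ("VP", [("V", ["saw"]), ("NP", [("N", ["cats"])])])])
print(tree_probability(tree))   # product of the eight rule probabilities
```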

(5)

Dependency syntax

• In dependency syntax, syntactic relations (arcs, edges, links, ...) are annotated between words

• The grammatically dominant word is called the head and the subordinated word the dependent

• Dependency arcs normally form a tree

• Arcs may be labeled (subject, object, ...)

(6)

Dependency tree example

• "in" is the head of "Rome"

• "saw" is the root of the dependency tree
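A minimal sketch of how such a tree can be represented, assuming a hypothetical example sentence chosen to match the bullets above ("saw" is the root, "in" is the head of "Rome"):

```python
# One head index per word (0 = artificial root), plus an arc label.
# Sentence and label inventory are assumptions for illustration.
words  = ["I", "saw", "her", "in", "Rome"]
heads  = [2, 0, 2, 2, 4]                    # 1-based head positions
labels = ["SBJ", "ROOT", "OBJ", "ADV", "PC"]

for i, (w, h, l) in enumerate(zip(words, heads, labels), start=1):
    head_word = "ROOT" if h == 0 else words[h - 1]
    print(f"{w} <-{l}- {head_word}")        # e.g. Rome <-PC- in
```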

(7)

Example: the Språkbanken annotation lab

• http://spraakbanken.gu.se/korp/annoteringslabb

(8)

Approaches to dependency parsing

• Incremental algorithms

–Process words one by one

– Note the similarity to human sentence processing

–Usually very fast: linear time

• Global scoring algorithms

– Compute a score for the whole tree
– Example: PCFG

–Often solved using graph search

–Usually slower than incremental algorithms

(9)

The Nivre algorithm

• The Nivre algorithm is an example of an incremental dependency parsing algorithm

– J. Nivre: "Algorithms for Deterministic Incremental Dependency Parsing" (2008), and other papers
– Implemented in MaltParser

• Basic idea:

– Proceed through the sentence word by word
– Keep track of substructures using a stack

–Terminate when all words have been processed

(10)

Example

(11)

The Nivre algorithm (standard version)

• Initialization:

–Make an empty stack

–Put all words in a work queue

• Four operations:

– (Assume the top of the stack is T and the first word in the queue is F)

– Shift: move F to the top of the stack

– Left-arc: add an arc from F to T; remove T from the stack

– Right-arc: add an arc from T to F; move F to the top of the stack

– Reduce: remove T from the stack
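A minimal Python sketch of the four operations, with words represented by their positions and arcs as (head, dependent) pairs; the action sequence in the demo is the one stepped through in the example slides that follow, over a schematic five-word sentence:

```python
def shift(stack, queue, arcs):
    stack.append(queue.pop(0))           # move F to the top of the stack

def left_arc(stack, queue, arcs):
    arcs.add((queue[0], stack.pop()))    # arc from F to T; remove T

def right_arc(stack, queue, arcs):
    arcs.add((stack[-1], queue[0]))      # arc from T to F
    stack.append(queue.pop(0))           # move F to the top of the stack

def reduce_(stack, queue, arcs):
    stack.pop()                          # remove T from the stack

stack, queue, arcs = [], [1, 2, 3, 4, 5], set()
for op in [shift, left_arc, shift, right_arc, reduce_, right_arc, right_arc]:
    op(stack, queue, arcs)
print(sorted(arcs))   # [(2, 1), (2, 3), (2, 4), (4, 5)]; word 2 is the root
```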

(12)

Preconditions

• Four operations:

– (Assume the top of the stack is T and the first word in the queue is F)

– Shift: move F to the top of the stack

  – Precondition: there must be something left in the queue

– Left-arc: add an arc from F to T; remove T from the stack

  – Precondition: queue and stack non-empty; there is no arc to T yet

– Right-arc: add an arc from T to F; move F to the top of the stack

  – Precondition: queue and stack non-empty

– Reduce: remove T from the stack

  – Precondition: queue non-empty; there must exist an arc to T
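A minimal sketch of these precondition checks, using the same state representation as the sketch above:

```python
def has_head(word, arcs):
    # True if some arc already points to this word
    return any(dep == word for (_, dep) in arcs)

def legal_actions(stack, queue, arcs):
    actions = []
    if queue:
        actions.append("SHIFT")                        # queue non-empty
    if queue and stack and not has_head(stack[-1], arcs):
        actions.append("LEFT-ARC")                     # no arc to T yet
    if queue and stack:
        actions.append("RIGHT-ARC")                    # queue and stack non-empty
    if queue and stack and has_head(stack[-1], arcs):
        actions.append("REDUCE")                       # an arc to T exists
    return actions
```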

(13)

Example (1)

(14)

Example (1)

• Next step: SHIFT

(15)

Example (2)

(16)

Example (2)

• Next step: LEFT ARC

(17)

Example (3)

(18)

Example (3)

• Next step: SHIFT

(19)

Example (4)

(20)

Example (4)

• Next step: RIGHT ARC

(21)

Example (5)

(22)

Example (5)

• Next step: REDUCE

(23)

Example (6)

(24)

Example (6)

• Next step: RIGHT ARC

(25)

Example (7)

(26)

Example (7)

• Next step: RIGHT ARC

(27)

Example (8)

(28)

Example (8)

• Input queue is empty. Done!

(29)

Incremental parsing by classification

• When we are given a new sentence, how do we select the sequence of actions?

• We use a statistical classifier, for instance Naive Bayes

• Classification problem: given a stack and a queue, predict the action
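A minimal sketch of the resulting parsing loop, reusing the transition functions and legal_actions from the earlier sketches; classify stands in for a trained model (e.g. naive Bayes) that maps a feature dict to the best legal action, and is a placeholder rather than any particular library's API:

```python
ACTIONS = {"SHIFT": shift, "LEFT-ARC": left_arc,
           "RIGHT-ARC": right_arc, "REDUCE": reduce_}

def parse(n_words, pos, classify, extract_features):
    stack, queue, arcs = [], list(range(1, n_words + 1)), set()
    while queue:                                   # done when queue is empty
        feats = extract_features(stack, queue, pos)
        action = classify(feats, legal_actions(stack, queue, arcs))
        ACTIONS[action](stack, queue, arcs)        # apply the chosen action
    return arcs
```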

(30)

Reminder: Supervised machine learning

(31)

Making the training set

• For each sentence, process the words using the parsing algorithm

• In each state, peek at the human-annotated tree to select the action

– If the tree contains T → F, then Right-arc

– If the tree contains F → T, then Left-arc

– If there is some word W in the stack and the tree contains W → F, then Reduce

– Otherwise Shift
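A minimal sketch of this oracle, with gold_arcs a set of (head, dependent) pairs taken from the annotated tree:

```python
def oracle_action(stack, queue, gold_arcs):
    F = queue[0]                  # first word in the queue
    if stack:
        T = stack[-1]             # top of the stack
        if (T, F) in gold_arcs:
            return "RIGHT-ARC"    # tree contains T -> F
        if (F, T) in gold_arcs:
            return "LEFT-ARC"     # tree contains F -> T
        if any((W, F) in gold_arcs for W in stack):
            return "REDUCE"       # some W -> F with W in the stack
    return "SHIFT"
```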

(32)

Example: computing the correct action

(33)

Example: actions generated for a sentence

(34)

Possible features

• The classifier uses various features to predict the correct action:

– POS of T
– POS of F
– POS of the word after F
– POS of the word under T
– Words, ...

• Example: noun on the stack, verb in the queue [feature example in the slide figure]
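A minimal sketch of these feature templates; the feature names and the dict representation are made up for illustration, and pos is a list of POS tags indexed by the 1-based word positions used in the earlier sketches:

```python
def extract_features(stack, queue, pos):
    def tag(i):   # POS at position i, or None if off the edge
        return pos[i - 1] if i is not None and 1 <= i <= len(pos) else None
    T       = stack[-1] if stack else None
    under_T = stack[-2] if len(stack) > 1 else None
    F       = queue[0] if queue else None
    after_F = queue[1] if len(queue) > 1 else None
    return {"pos_T": tag(T), "pos_F": tag(F),
            "pos_after_F": tag(after_F), "pos_under_T": tag(under_T)}
```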

(35)

Evaluating dependency parsers

• Dependency parsers are evaluated using attachment accuracy

– Probability of a word having the correct head

– Estimation: number of correct attachments / number of words in the corpus

– State of the art: 92–94% for English

• Also: labeled attachment accuracy

–Probability of a word having the correct head and function tag

• Accuracy figures depend on the complexity of the linguistic representation
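A minimal sketch of both scores, over parallel lists covering all words of a test corpus:

```python
def attachment_scores(gold_heads, pred_heads, gold_labels, pred_labels):
    n = len(gold_heads)
    # unlabeled: correct head only
    uas = sum(g == p for g, p in zip(gold_heads, pred_heads)) / n
    # labeled: correct head and correct function tag
    las = sum(gh == ph and gl == pl
              for gh, ph, gl, pl in zip(gold_heads, pred_heads,
                                        gold_labels, pred_labels)) / n
    return uas, las
```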

(36)

VG assignment 1

• Implement the Nivre algorithm

• Train the action classifier using a treebank

• Evaluate the dependency parser on a test set

References

J. Nivre. Algorithms for Deterministic Incremental Dependency Parsing. Computational Linguistics 34(4):513–553, 2008.
