
(1)

Statistical Methods for NLP LT 2202

Lecture 7 part 2 – Dependency parsing

February 25, 2014
Richard Johansson

(2)

Syntactic parsing

• Given a sentence, compute a syntactic tree representing the grammatical analysis of the sentence

– Constituent / phrase structure trees

  – Sentence analyzed in terms of nested phrases (S, NP, VP, PP, ...)

– Dependency trees

  – Sentence analyzed in terms of relations between words (subject, object, adverbial, ...)

– Feature structure grammars (HPSG, LFG)

(3)

Probabilistic context-free grammars

• Used for phrase structure trees

• Production rules (S → NP VP) are assumed to depend only on the parent phrase type (S)

• Probability of tree is equal to product of the probabilities of rules used to generate the tree

• Parsing = finding tree with highest probability

– CKY algorithm (recall from the NLP course)
– Cubic time

(4)

Example

P(S → NP VP) = 0.7
P(NP → A N) = 0.1
P(VP → V NP) = 0.4
P(NP → N) = 0.34
P(A → "nice") = 0.0001
...

Probability: the product of the probabilities of the rules used in the tree [worked example in the slide figure]
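As a hedged illustration, here is a minimal Python sketch of this computation for a hypothetical tree over the rules above; the tree itself and the lexical probabilities other than P(A → "nice") are assumptions, not from the slides:

```python
# Score a tree under a toy PCFG: multiply the probabilities of all rules
# used in the derivation. A tree is (label, [children]); a leaf is a string.
rule_prob = {
    ("S", ("NP", "VP")): 0.7,
    ("NP", ("A", "N")): 0.1,
    ("VP", ("V", "NP")): 0.4,
    ("NP", ("N",)): 0.34,
    ("A", ("nice",)): 0.0001,
    ("N", ("dog",)): 0.001,    # assumed value
    ("N", ("cats",)): 0.001,   # assumed value
    ("V", ("saw",)): 0.002,    # assumed value
}

def tree_probability(tree):
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_probability(child)   # multiply in the subtrees
    return p

tree = ("S", [("NP", [("A", ["nice"]), ("N", ["dog"])]),
              ("VP", [("V", ["saw"]), ("NP", [("N", ["cats"])])])])
print(tree_probability(tree))   # product of the eight rule probabilities
```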

(5)

Dependency syntax

• In dependency syntax, syntactic relations (arcs, edges, links, ...) are annotated between words

• The grammatically dominant word is called the head and the subordinated word the dependent

• Dependency arcs normally form a tree

• Arcs may be labeled (subject, object, ...)

(6)

Dependency tree example

• "in" is the head of "Rome"

• "saw" is the root of the dependency tree
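A minimal sketch of how such a tree can be represented, assuming a hypothetical example sentence chosen to match the bullets above ("saw" is the root, "in" is the head of "Rome"):

```python
# One head index per word (0 = artificial root), plus an arc label.
# Sentence and label inventory are assumptions for illustration.
words  = ["I", "saw", "her", "in", "Rome"]
heads  = [2, 0, 2, 2, 4]                    # 1-based head positions
labels = ["SBJ", "ROOT", "OBJ", "ADV", "PC"]

for i, (w, h, l) in enumerate(zip(words, heads, labels), start=1):
    head_word = "ROOT" if h == 0 else words[h - 1]
    print(f"{w} <-{l}- {head_word}")        # e.g. Rome <-PC- in
```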

(7)

Example: the Språkbanken annotation lab

• http://spraakbanken.gu.se/korp/annoteringslabb

(8)

Approaches to dependency parsing

• Incremental algorithms

–Process words one by one

– Note the similarity to human sentence processing

–Usually very fast: linear time

• Global scoring algorithms

– Compute a score for the whole tree
– Example: PCFG

–Often solved using graph search

–Usually slower than incremental algorithms

(9)

The Nivre algorithm

• The Nivre algorithm is an example of an incremental dependency parsing algorithm

– J. Nivre: "Algorithms for Deterministic Incremental Dependency Parsing" (2008), and other papers
– Implemented in MaltParser

• Basic idea:

– Proceed through the sentence word by word
– Keep track of substructures using a stack

–Terminate when all words have been processed

(10)

Example

(11)

The Nivre algorithm (standard version)

• Initialization:

–Make an empty stack

–Put all words in a work queue

• Four operations:

– (Assume the top of the stack is T and the first word in the queue is F)

– Shift: move F to the top of the stack

– Left-arc: add an arc from F to T; remove T from the stack

– Right-arc: add an arc from T to F; move F to the top of the stack

– Reduce: remove T from the stack
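A minimal Python sketch of the four operations, with words represented by their positions and arcs as (head, dependent) pairs; the action sequence in the demo is the one stepped through in the example slides that follow, over a schematic five-word sentence:

```python
def shift(stack, queue, arcs):
    stack.append(queue.pop(0))           # move F to the top of the stack

def left_arc(stack, queue, arcs):
    arcs.add((queue[0], stack.pop()))    # arc from F to T; remove T

def right_arc(stack, queue, arcs):
    arcs.add((stack[-1], queue[0]))      # arc from T to F
    stack.append(queue.pop(0))           # move F to the top of the stack

def reduce_(stack, queue, arcs):
    stack.pop()                          # remove T from the stack

stack, queue, arcs = [], [1, 2, 3, 4, 5], set()
for op in [shift, left_arc, shift, right_arc, reduce_, right_arc, right_arc]:
    op(stack, queue, arcs)
print(sorted(arcs))   # [(2, 1), (2, 3), (2, 4), (4, 5)]; word 2 is the root
```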

(12)

Preconditions

• Four operations:

– (Assume the top of the stack is T and the first word in the queue is F)

– Shift: move F to the top of the stack

  – Precondition: there must be something left in the queue

– Left-arc: add an arc from F to T; remove T from the stack

  – Precondition: queue and stack non-empty; there is no arc to T yet

– Right-arc: add an arc from T to F; move F to the top of the stack

  – Precondition: queue and stack non-empty

– Reduce: remove T from the stack

  – Precondition: queue non-empty; there must exist an arc to T
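A minimal sketch of these precondition checks, using the same state representation as the sketch above:

```python
def has_head(word, arcs):
    # True if some arc already points to this word
    return any(dep == word for (_, dep) in arcs)

def legal_actions(stack, queue, arcs):
    actions = []
    if queue:
        actions.append("SHIFT")                        # queue non-empty
    if queue and stack and not has_head(stack[-1], arcs):
        actions.append("LEFT-ARC")                     # no arc to T yet
    if queue and stack:
        actions.append("RIGHT-ARC")                    # queue and stack non-empty
    if queue and stack and has_head(stack[-1], arcs):
        actions.append("REDUCE")                       # an arc to T exists
    return actions
```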

(13)

Example (1)

(14)

Example (1)

• Next step: SHIFT

(15)

Example (2)

(16)

Example (2)

• Next step: LEFT ARC

(17)

Example (3)

(18)

Example (3)

• Next step: SHIFT

(19)

Example (4)

(20)

Example (4)

• Next step: RIGHT ARC

(21)

Example (5)

(22)

Example (5)

• Next step: REDUCE

(23)

Example (6)

(24)

Example (6)

• Next step: RIGHT ARC

(25)

Example (7)

(26)

Example (7)

• Next step: RIGHT ARC

(27)

Example (8)

(28)

Example (8)

• Input queue is empty. Done!

(29)

Incremental parsing by classification

• When we are given a new sentence, how do we select the sequence of actions?

• We use a statistical classifier, for instance Naive Bayes

• Classification problem: given a stack and a queue, predict the action
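A minimal sketch of the resulting parsing loop, reusing the transition functions and legal_actions from the earlier sketches; classify stands in for a trained model (e.g. naive Bayes) that maps a feature dict to the best legal action, and is a placeholder rather than any particular library's API:

```python
ACTIONS = {"SHIFT": shift, "LEFT-ARC": left_arc,
           "RIGHT-ARC": right_arc, "REDUCE": reduce_}

def parse(n_words, pos, classify, extract_features):
    stack, queue, arcs = [], list(range(1, n_words + 1)), set()
    while queue:                                   # done when queue is empty
        feats = extract_features(stack, queue, pos)
        action = classify(feats, legal_actions(stack, queue, arcs))
        ACTIONS[action](stack, queue, arcs)        # apply the chosen action
    return arcs
```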

(30)

Reminder: Supervised machine learning

(31)

Making the training set

• For each sentence, process the words using the parsing algorithm

• In each state, peek at the human-annotated tree to select the action

– If the tree contains T → F, then Right-arc

– If the tree contains F → T, then Left-arc

– If there is some word W in the stack and the tree contains W → F, then Reduce

– Otherwise Shift
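A minimal sketch of this oracle, with gold_arcs a set of (head, dependent) pairs taken from the annotated tree:

```python
def oracle_action(stack, queue, gold_arcs):
    F = queue[0]                  # first word in the queue
    if stack:
        T = stack[-1]             # top of the stack
        if (T, F) in gold_arcs:
            return "RIGHT-ARC"    # tree contains T -> F
        if (F, T) in gold_arcs:
            return "LEFT-ARC"     # tree contains F -> T
        if any((W, F) in gold_arcs for W in stack):
            return "REDUCE"       # some W -> F with W in the stack
    return "SHIFT"
```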

(32)

Example: computing the correct action

(33)

Example: actions generated for a sentence

(34)

Possible features

• The classifier uses various features to predict the correct action:

– POS of T
– POS of F
– POS of the word after F
– POS of the word under T
– Words, ...

• Example: noun on the stack, verb in the queue [feature example in the slide figure]
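A minimal sketch of these feature templates; the feature names and the dict representation are made up for illustration, and pos is a list of POS tags indexed by the 1-based word positions used in the earlier sketches:

```python
def extract_features(stack, queue, pos):
    def tag(i):   # POS at position i, or None if off the edge
        return pos[i - 1] if i is not None and 1 <= i <= len(pos) else None
    T       = stack[-1] if stack else None
    under_T = stack[-2] if len(stack) > 1 else None
    F       = queue[0] if queue else None
    after_F = queue[1] if len(queue) > 1 else None
    return {"pos_T": tag(T), "pos_F": tag(F),
            "pos_after_F": tag(after_F), "pos_under_T": tag(under_T)}
```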

(35)

Evaluating dependency parsers

• Dependency parsers are evaluated using attachment accuracy

– Probability of a word having the correct head

– Estimation: number of correct attachments / number of words in the corpus

– State of the art: 92–94% for English

• Also: labeled attachment accuracy

–Probability of a word having the correct head and function tag

• Accuracy figures depend on the complexity of the linguistic representation
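A minimal sketch of both scores, over parallel lists covering all words of a test corpus:

```python
def attachment_scores(gold_heads, pred_heads, gold_labels, pred_labels):
    n = len(gold_heads)
    # unlabeled: correct head only
    uas = sum(g == p for g, p in zip(gold_heads, pred_heads)) / n
    # labeled: correct head and correct function tag
    las = sum(gh == ph and gl == pl
              for gh, ph, gl, pl in zip(gold_heads, pred_heads,
                                        gold_labels, pred_labels)) / n
    return uas, las
```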

(36)

VG assignment 1

• Implement the Nivre algorithm

• Train the action classifier using a treebank

• Evaluate the dependency parser on a test set

References

J. Nivre. Algorithms for Deterministic Incremental Dependency Parsing. Computational Linguistics 34(4):513–553, 2008.
