

Linköping University | Department of Computer Science

Master thesis, 30 ECTS | Datateknik

2017 | LIU-IDA/LITH-EX-A--17/014--SE

Dynamic Programming Algorithms for Semantic Dependency Parsing

Algoritmer för semantisk dependensparsning baserade på dynamisk programmering

Nils Axelsson

Supervisor: Marco Kuhlmann
Examiner: Arne Jönsson


Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Nils Axelsson

Abstract

Dependency parsing can be a useful tool for allowing computers to parse text. In 2015, Kuhlmann and Jonsson proposed a logical deduction system that parses to non-crossing dependency graphs with an asymptotic time complexity of O(n³), where n is the length of the sentence to parse. This thesis extends the deduction system by Kuhlmann and Jonsson; the extended deduction system introduces certain crossing edges, while maintaining an asymptotic time complexity of O(n⁴). In order to extend the deduction system, fifteen logical item types are added to the five proposed by Kuhlmann and Jonsson. These item types allow the deduction system to introduce crossing edges in such a way that acyclicity can be guaranteed. The number of inference rules in the deduction system is increased from the 19 proposed by Kuhlmann and Jonsson to 172, mainly because of the larger number of combinations of the 20 item types.

The results are a modest increase in coverage on test data (roughly 10 percentage points absolute, i.e. approx. 70% → 80%), and a placement comparable to that of Kuhlmann and Jonsson by the SemEval 2015 task 18 metrics. With the method employed to introduce crossing edges, derivational uniqueness is impossible to maintain. It is hard to define the graph class to which the extended algorithm, QAC, parses; it is therefore compared empirically to 1-endpoint-crossing graphs and to graphs with a page number of two or less, relative to which it achieves lower coverage on test data. The QAC graph class is not limited by page number or number of crossings.

The takeaway of the thesis is that extending a very minimal deduction system is not necessarily the best approach, and that it may be better to start with a strong idea of the graph class to which the extended algorithm should parse. Additionally, several alternative ways of extending Kuhlmann and Jonsson's system are proposed.


Keywords: semantic dependency parsing, machine learning, deduction systems, crossing edges, SemEval 2015


Acknowledgments

Professional acknowledgements: I would like to thank Marco Kuhlmann, without whose help no part of this thesis project would have been possible. I would also like to thank Ola Leifler, whose preparatory course in how to write a Master's thesis helped me much more than I realised at the time. Finally, I would like to thank the hundreds of StackOverflow contributors whose LaTeX examples helped make many of the tables and figures presented in this thesis clear enough to actually read.

The assignment tackled by this thesis would not exist without Marco Kuhlmann’s and Peter Jonsson’s Parsing to Noncrossing Dependency Graphs.

Personal acknowledgements: Thanks to all the friends and family members whose interest and engagement drove me to put in the time I needed to finish this project. Thanks to everyone whose names I use in my example sentences. Thanks to Tommy Larbrant for being my German teacher when my interest in languages first took off.


Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables

1 Introduction
1.1 Motivation
1.2 Aim
1.3 Research questions
1.4 Delimitations

2 Theory
2.1 Parsing
2.2 Dependency graphs
2.2.1 Definition of a semantic dependency graph
2.2.2 Notations for dependency relations
2.2.3 Non-crossing graphs
2.2.4 Cyclicity and acyclicity
2.2.5 Page number
2.3 Data set
2.3.1 The SemEval data set
2.3.2 The CCG data set
2.3.3 Graph IDs
2.4 Deduction systems
2.4.1 The deduction system as proposed by Kuhlmann and Jonsson
2.5 Running an algorithm described as a deduction system
2.5.1 Scorers
2.5.2 Oracle scorers
2.6 A practical example of how the parser by Kuhlmann and Jonsson parses a sentence
2.7 Deduction systems, soundness, completeness and uniqueness
2.7.1 Proving soundness of an algorithm
2.7.2 Proving completeness of an algorithm
2.7.3 Proving uniqueness of a deduction system
2.8 Deduction systems and time complexity
2.9 Coverage of test data
2.9.1 Upper bounds on performance
2.10 Other classes of graphs for comparison
2.10.1 1-endpoint crossing graphs
2.10.2 Graphs with a page number of two or less

3 Method and related work
3.1 Ensuring that results are reproducible
3.2 Propositions
3.2.1 Isolated items
3.2.2 Using isolated items to introduce crossing edges
3.2.3 Generating isolated items
3.2.4 The quartic acyclic deduction system
3.3 Implementation
3.3.1 Code generation
3.3.2 Multithreading
3.3.3 Visualisation

4 Results
4.1 Properties of the QAC deduction system
4.1.1 Graph class
4.1.2 Soundness and completeness
4.1.3 Derivational uniqueness
4.1.4 Acyclicity
4.1.5 Time complexity
4.2 Coverage
4.2.1 Empirical coverage by page number
4.2.2 Empirical coverage by graph size
4.2.3 Empirical coverage by number of edges in gold-standard graph
4.3 SemEval 2015 results
4.4 Coverage compared to 1-endpoint crossing

5 Discussion
5.1 Result discussion
5.1.1 Coverage
5.1.2 Comparison to other algorithms
5.1.3 Performance
5.2 Method discussion
5.2.1 Development of the QAC algorithm
5.2.2 Alternative methods
5.3 The work in a wider context
5.3.1 Environmental considerations

6 Conclusion
6.1 Research questions
6.2 Aim
6.3 Future work

Bibliography

List of Figures

1.1 An example of a syntactic dependency graph for the sentence “Agnes ate the apple”.
1.2 An example of a semantic dependency graph for the sentence “If Agnes ate the apple, then why was she still hungry?”.
2.1 Top: an example of a non-crossing dependency graph. This is graph #22169033 from the SemEval data set, using DM representation. Bottom: a crossing graph, for comparison. This is graph #22151005 from the SemEval data set, also using DM representation. Specifically, the edge from says to not crosses the edge from ’s to all.
2.2 An arbitrary sentence with six words, covered by a dependency graph with four edges. The dependency graph has two crossings: edges a2 and a3 cross, and so do edges a3 and a4.
2.3 The graph from Figure 2.2 after the edge a3 has been moved to a new page. The resulting sub-graph on both pages is non-crossing, so the given graph must have a page number of 2, since there was no way to construct a graph with one page where there were no crossing edges.
2.4 The sentence “It’s one more for the baseball-loving lawyers, accountants and real estate developers who ponied up about $1 million each for the chance to be an owner, to step into the shoes of a Gene Autry or have a beer with Rollie Fingers.” in DM (top), PAS (middle) and PSD (bottom) representations. This relatively long sentence was chosen because of the large differences between representations. This is sentence #20214017.
2.5 The distribution of sentence lengths across the three SemEval data representations.
2.6 The distribution of graphs in DM, PSD and PAS representations by the page number of each graph.
2.7 Left: the distribution of sentence lengths for the CCG data set. Right: the distribution of graphs in the CCG representation by the page number of each graph.
2.8 The sentence “The compromise was a somewhat softened version of what the White House had said it would accept.” in CCG representation. This sentence was chosen because it is one of the 530 sentences of the CCG set that contain a cycle. The cycle consists of the edges accept → what → had → said → would → accept. This is sentence #20250004.
2.9 The five types of items introduced by Kuhlmann and Jonsson.
2.10 An attempt at illustrating how the inference rules from Table 2.1 go together with the item types to form a deduction system.
2.11 Sentence 22102004 from the SemEval 2014 dataset, DM representation.
2.12 Sentence 22151005 from the SemEval 2014 dataset, DM representation.
2.13 An illustration of the process of making the entire test set applicable for a given subset of dependency graphs. The parts of the test set that are not possible to generate by the given algorithm are separated. This separated part of the test set is then run through a process of normalisation to get the closest graphs that can be generated by the algorithm in question. The graphs that already conformed are then added to this normalised set to create the final test set. This final test set is only meaningful for the given algorithm.
2.14 Top: the example sentence from the introduction chapter with a generated labelled dependency graph. Ate is now the graph head, with an nsubj (nominal subject) relation to Agnes and an obj (direct object) relation to apple. The has a det (determiner) relationship to apple. Bottom: a theoretical incorrect graph generated by a parser. Two labels are wrong and one edge is wrong, leading to LAS and UAS scores of 1/3 and 2/3 respectively. This is a fictional example and not representative of any parser or algorithm.
3.1 Left: inference rule R02 from Kuhlmann and Jonsson. Right: inference rule R19 from Kuhlmann and Jonsson.
3.2 Left: inference rule R09 from Kuhlmann and Jonsson. Right: R09 extended with a left-isolated consequent.
3.3 Left: inference rule R11 from Kuhlmann and Jonsson. Right: R11 extended with a set of right-isolated consequents.
3.4 Left: inference rule R15 from Kuhlmann and Jonsson. Right: R15 extended with a set of right-isolated consequents and a single isolated consequent, since the graph is double-isolated with respect to the point k − 1.
3.5 Left: inference rule R01 from Kuhlmann and Jonsson. Right: R01 extended with a left-isolated premise, resulting in a left-isolated consequent as well.
3.6 Left: inference rule R01 from Kuhlmann and Jonsson. Right: R01 extended with a right-isolated premise, resulting in a right-isolated consequent as well.
3.7 Left: inference rule R01 from Kuhlmann and Jonsson. Middle and right: R01 extended with both double-isolated premises, resulting in a double-isolated consequent in each case.
3.8 Left: inference rule R11 from Kuhlmann and Jonsson. Middle: R11 extended with a left-isolated premise; the result is a double-isolated consequent with respect to the isolated point x. Right: R11 extended with a double-isolated premise, resulting in a double-isolated consequent.
3.9 Inference rule R01 from Kuhlmann and Jonsson.
4.1 A class graph illustrating every graph of width four to which Kuhlmann and Jonsson can parse.
4.2 A class graph illustrating every graph of width four to which QAC can parse.
4.3 The 335 non-crossing acyclic graphs of width four to which the algorithm by Kuhlmann and Jonsson can parse. Each graph in this figure corresponds to a goal node in Figure 4.1, or a node which contains a graph from 1 to 4.
4.4 The graphs of size four to which QAC can parse, but to which Kuhlmann and Jonsson cannot. These graphs are the non-transparent nodes found in Figure 4.2 which contain a graph from 1 to 4. QAC can also parse to each graph found in Figure 4.3, but they are omitted from this figure because of redundancy.
4.5 Inference rule Rcrossing-left-in04iNN from QAC.
4.6 The average time taken by QAC to oracle-parse a sentence of length n, divided by the average time taken by Kuhlmann and Jonsson to parse a sentence of the same length.
4.7 Four figures illustrating the distribution of sentences across page numbers. Each figure is labelled with the representation from the data set (DM, PAS, PSD and CCG) that generated the figure.
4.8 Four figures illustrating the distribution of sentences across page numbers. Each figure is labelled with the data set (DM, PAS, PSD and CCG) that generated the figure. Each page number is split into a green bar and a red bar. The green bar represents sentences that QAC could parse, and the red bar represents sentences that QAC could not parse.
4.9 One figure per data representation. In each of the four figures, the coverage (y axis) of QAC (blue) is compared to Kuhlmann and Jonsson (orange) for increasing sentence lengths (x axis). There is a clear correspondence between increasing sentence length and lower coverage.
4.10 One figure per data representation. In each of the four figures, the coverage (y axis) of QAC (blue) is compared to Kuhlmann and Jonsson (orange) for an increasing number of edges in the gold-standard graphs (x axis). As in Figure 4.9, there is a clear correspondence between an increasing number of edges and lower coverage.
4.11 An Euler diagram displaying how many graphs are in the page number two or less, 1-endpoint crossing, and QAC graph classes for DM data in the data set.
4.12 An Euler diagram displaying how many graphs are in the page number two or less, 1-endpoint crossing, and QAC graph classes for PAS data in the data set.
4.13 An Euler diagram displaying how many graphs are in the page number two or less, 1-endpoint crossing, and QAC graph classes for PSD data in the data set.
4.14 An Euler diagram displaying how many graphs are in the page number two or less, 1-endpoint crossing, and QAC graph classes for CCG data in the data set.
5.1 Inference rules R11–R15, as given by Kuhlmann and Jonsson.

List of Tables

2.1 The nineteen inference rules in the deduction system by Kuhlmann and Jonsson.
3.1 Each item by Kuhlmann and Jonsson, extended with a left-isolated version, a right-isolated version and a double-isolated version.
3.2 Every crossing rule in the QAC deduction system.
3.3 Extended versions of R11–R15 that produce isolated items.
3.4 Extended versions of R11–R15 that take left-isolated and double-isolated items as premises. The right-isolated versions are omitted, as the consequents are identical to the corresponding rule in Table 3.3.
3.5 Extensions to every rule R16–R19, such that the extended rules produce and take left-isolated, right-isolated and double-isolated items.
3.6 Rules R01–R10 from Kuhlmann and Jonsson, extended to take non-isolated, left-isolated, right-isolated and double-isolated items as a left and right premise, as well as producing left-isolated, right-isolated and double-isolated items as consequents.
3.7 All inference rules of the QAC deduction system. Each rule is given a unique name describing its purpose.
4.1 Coverage of QAC and the algorithm by Kuhlmann and Jonsson for each graph representation in the data set.
4.2 For each of the four representations, split by page number, how many graphs are covered by QAC. This data is also presented graphically in Figure 4.8.
4.3 Results of QAC on DM data after one and five iterations of training, presented by the metrics used at SemEval 2015.
4.4 Results of QAC on PAS data after one and five iterations of training, presented by the metrics used at SemEval 2015.
4.5 Results of QAC on PSD data after one and five iterations of training, presented by the metrics used at SemEval 2015.
4.6 Overall results of QAC after one and five iterations of training, presented by the metrics used at SemEval 2015.
5.1 A table presenting all possible isolated items if paths to the left and to the right are separated. The diagonal corresponds to the isolated items that were defined as part of QAC.

1 Introduction

Dependency parsing is an important step in making computers understand what a text means. As human beings, we generally decompose sentences we hear or read into grammatical structures automatically, while handling ambiguities and unclear references based on context. It generally takes human beings years to learn to speak functionally to one another, and computers do not have the social contexts we take years to learn. It is therefore understandable that parsing sentences the way humans intuitively do is a difficult field.

In this thesis, the focus is on semantic dependency parsing. Classically, dependency parsing meant syntactic dependency parsing, and the goal was to cover all words in a sentence with some sort of structure explaining how each word grammatically connects to the other words in the sentence. The semantic approach instead tries to highlight the meaning of the sentence as a whole, which may mean omitting some words from the graph structure, or building a more complicated graph structure than the syntactic approach would require.

Chapter 2 introduces the theory necessary for a reader to understand the rest of the thesis. In Chapter 3, an extended parsing algorithm is presented. This extended parsing algorithm is used to answer the research questions presented below in Section 1.3. Results that answer the research questions are presented in Chapter 4, and both the results and the algorithm (method) are discussed in Chapter 5. Finally, the research questions are answered, to the extent that they can be, in Chapter 6.

1.1 Motivation

Early work in the field of representing grammar as a tree can be seen in Chomsky's Syntactic Structures [2]. Dependency parsing is the field of representing grammatical structures as, specifically, dependency graphs [10].

Figure 1.1 shows an example of an unlabelled syntactic dependency graph for the sentence “Agnes ate the apple”. The graph happens to be a tree, which is a common restriction in syntactic dependency parsing. Whether this dependency graph is correct or not depends on what one is trying to represent. In this case, the edges (arrows) depict a subject-verb-object relation between “Agnes”, “ate” and “apple”, and the definite article is a separate relation in the same tree. In typical dependency parsing, the edges of the tree would be labelled somehow to make the relations clearer.

Figure 1.1: An example of a syntactic dependency graph for the sentence “Agnes ate the apple”.

Figure 1.2: An example of a semantic dependency graph for the sentence “If Agnes ate the apple, then why was she still hungry?”.

Semantic dependency parsing looks a little different. Figure 1.2 depicts a labelled semantic graph for the sentence “If Agnes ate the apple, then why was she still hungry?”. A slightly longer sentence than the one shown in Figure 1.1 is required to show the peculiarities of semantic parsing. In this example, the word “was” is not part of the graph structure at all, as it is not required: the “ARG1” edge from “hungry” to “she” carries the attributive information instead. In addition, the punctuation tokens are also omitted.

The task of this thesis project was to improve the semantic parsing algorithm proposed by Kuhlmann and Jonsson [11]. The algorithm by Kuhlmann and Jonsson parses to non-crossing graphs, like the one seen in Figure 1.2. Kuhlmann and Jonsson also show that parsing to the class of all graphs with a page number of two or less (see Section 2.2.5) is NP-hard. The task in this thesis project is to find extensions to the algorithm that parse to a better class than the non-crossing class, while maintaining a reasonable time complexity. How good a graph class is, is measured by its coverage on test data, among other things.

The algorithm by Kuhlmann and Jonsson has an asymptotic time complexity of O(|V|³), where V is the set of nodes, or words, in the sentence. The coverage of various classes of graphs, or of algorithms that parse into a certain class of graphs, can be tested on pre-tagged sentences.

The dataset used by Kuhlmann and Jonsson consists of two separate parts. Most of it is the dataset that was also used in task 8 of SemEval 2014 [17], as well as in task 18 of SemEval 2015 [16]. The same dataset that was used by Kuhlmann and Jonsson is also used in this thesis.

1.2 Aim

The aim of this thesis is to find dependency parsing algorithms. These algorithms must parse to classes of dependency graphs that cover more data in the test sets than the set of non-crossing dependency graphs does. It must also be possible to show that the time complexity of the algorithms is non-NP-complex, and preferably close to the O(|V|³) obtained by Kuhlmann and Jonsson.

Finding algorithms with better coverage than the class of non-crossing graphs and non-NP time complexity helps semantic dependency parsing applications generate more accurate semantic dependency graphs, which can improve all kinds of applications where a computer needs to parse text to extract its meaning. One example of an application that benefits from enhanced semantic dependency parsing is machine translation.

Even if the explored algorithms and graph classes turn out not to perform better than previously explored algorithms or classes, the results can still be valuable. If the reason why a class performs worse, by some metric, can be explained as part of the thesis, then the results are of value to future research.

1.3 Research questions

The following research questions are explored in this thesis:

1. How can the algorithm by Kuhlmann and Jonsson be extended to include crossing graphs?

2. To what class of graphs do the proposed extensions to the algorithm parse?

3. What time complexity does the proposed extended algorithm obtain, compared to the O(|V|³) obtained by the base algorithm?

4. If research question 2 is possible to answer, is the proposed extended algorithm sound, complete and unique?

5. How do the proposed extensions to the algorithm affect its coverage on test data, and how does this compare to other, unexplored graph classes?

1.4 Delimitations

All test data is in English. Dependency parsing in general is relatively language-specific. Different languages have grammar that requires different classes of dependency graphs to represent. For syntactic dependency parsing, the performance of new algorithms is generally tested on several languages, commonly Czech because of the availability of Czech test data as part of the Prague Czech-English Dependency Treebank. Such non-English test data was not available for this thesis project, which means it is impossible to prove that the results on English improve the algorithm’s performance on other languages.


It is impossible to explore all possible classes of dependency graphs, or all extensions to an algorithm. In addition, some of the more obvious extensions to the algorithm by Kuhlmann and Jonsson, like extending the graph class to classes similar to those proposed by Pitler et al. [19] or Pitler and McDonald [20], were considered too large to implement in the limited time given to a Master’s thesis, and could not be explored at all.

Because of time restrictions, not all ideas encountered during the thesis project could be explored. The best candidates for future research are documented in the method section of the discussion chapter, Chapter 5.

2 Theory

Dependency parsing into classes of graphs is connected to graph theory and logic. This chapter attempts to introduce necessary theory to understand the thesis for a reader who has little previous experience in these specific areas.

The method employed in this thesis includes using a deduction system to generate a class of dependency graphs. As will be shown in this chapter, such a deduction system can then be analysed for time complexity (and space complexity, if needed), and the described algorithm can be implemented and run on test data to find out how well it performs.

2.1 Parsing

The word parsing itself can refer to several fields. The relevant fields for this thesis are syntactic parsing and semantic parsing, and specifically how they can be used to make computers parse text. Syntax is the role of each word in a sentence, while semantics refers to the meaning of words or a sequence of words.

Parsing a sentence can be done in many ways. Dependency parsing is the field of parsing sentences by creating structures depicting which words refer to other words [10, p. 1]. This thesis extends and evaluates a dependency parsing algorithm.

2.2 Dependency graphs

A dependency graph can be any graph that illustrates how anything depends on anything else. This thesis focusses on semantic dependency graphs, which are strictly connected to language parsing. A semantic dependency graph illustrates the semantic meaning of a sentence by creating a structure of dependencies, illustrating how certain words depend on other words in the sentence.

In this thesis, dependency graphs consist of nodes and edges. A node is a point to which an edge can connect. An edge is a directed arrow that starts in a node and ends in a node, usually another node. An edge from the node named a to the node b is denoted a → b.

2.2.1 Definition of a semantic dependency graph

This thesis will use the definition by Oepen et al. [17] to define dependency graphs, because it is specific to semantic dependency graphs. There are many other definitions, notably that by Kübler et al. [10]. The differences are mostly superficial: for example, many definitions and scientific papers call edges arcs, which is equivalent. The definition can be found in Definition 1.

Definition 1. A semantic dependency graph G is defined as follows:

• G is the dependency graph; G = (V, E, ℓV, ℓE).

• x is the covered sentence, such that x = x1, x2, ..., xn, where each xi is the word at position i, 1 ≤ i ≤ n, in the sentence.

• V and E are the nodes and edges in the graph, respectively. The nodes have a one-to-one correspondence to the words in the covered sentence. The edges go from node to node, according to E ⊆ V × V.

• ℓV is a mapping that labels the nodes in V: for each node i, ℓV(i) gives the tagging information for the word xi, such as part of speech, word form, etc.

• ℓE is a mapping that similarly labels the edges in E: for each edge e, ℓE(e) gives the label for the edge e.

The semantic graph originally seen in Figure 1.2 can be described by the components in Definition 1. The sets according to Definition 1 are then:

1. G = (V, E, ℓV, ℓE).

2. x = x1 x2 ... x12 and V = {1, 2, ..., 12}, covering the words in x. x1 is If, x2 is Agnes, and so on for each word in Figure 1.2.

3. E = {1 → 7, 1 → 3, 3 → 2, 3 → 5, 4 → 5, 7 → 12, 8 → 12, 11 → 12, 12 → 10}. Labelling is handled separately through ℓE, so no labels are needed in this set.

4. ℓV is a function that gives node tagging information for each node in V. For example, if the sentence has been tagged with word form, lemma and part of speech, the response ℓV(3) = {ate, eat, verb-indicative-preterite} could be expected.

5. ℓE is a function that gives edge labelling information for each edge in E. Examples of labels seen in the figure are ℓE(1 → 7) = ARG1, ℓE(3 → 5) = ARG2, ℓE(4 → 5) = BV.
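Definition 1 maps naturally onto a small data structure. The following sketch is hypothetical (the thesis does not prescribe any implementation); it encodes the graph from Figure 1.2 using the sets listed above:

    from dataclasses import dataclass, field

    @dataclass
    class SemanticDependencyGraph:
        """G = (V, E, lV, lE) over a sentence x = x1 ... xn (Definition 1)."""
        words: list[str]                     # x; node i corresponds to words[i - 1]
        edges: set[tuple[int, int]]          # E, a subset of V x V, as (source, target)
        node_labels: dict[int, dict[str, str]] = field(default_factory=dict)   # lV
        edge_labels: dict[tuple[int, int], str] = field(default_factory=dict)  # lE

    # The graph from Figure 1.2, with the edge set E given above.
    g = SemanticDependencyGraph(
        words="If Agnes ate the apple , then why was she still hungry ?".split(),
        edges={(1, 7), (1, 3), (3, 2), (3, 5), (4, 5),
               (7, 12), (8, 12), (11, 12), (12, 10)},
    )
    g.node_labels[3] = {"form": "ate", "lemma": "eat", "pos": "verb-indicative-preterite"}
    g.edge_labels[(1, 7)] = "ARG1"
    g.edge_labels[(3, 5)] = "ARG2"
    g.edge_labels[(4, 5)] = "BV"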

2.2.2 Notations for dependency relations

Kübler et al. [10, pp. 13-14] use the following standard notations to depict relations within syntactic dependency graphs. They are useful for introducing restrictions on such graphs later on in this chapter, and the definitions are general enough to hold for semantic graphs as well.

• xi →* xj means that there is a directed path from xi to xj. This holds either if xi is xj (that is, all nodes have a directed path to themselves, even if no such edge exists), or if xi → xk and xk →* xj for some xk ∈ x.

• xi ↔ xj means that either xi → xj or xj → xi. Pitler et al. [19] also depict this as eij.

• xi ↔* xj means that either xi is xj, or xi ↔ xk and xk ↔* xj for some xk ∈ x.

Figure 2.1: Top: an example of a non-crossing dependency graph. This is graph #22169033 from the SemEval data set, using DM representation. Bottom: a crossing graph, for comparison. This is graph #22151005 from the SemEval data set, also using DM representation. Specifically, the edge from says to not crosses the edge from ’s to all.

2.2.3 Non-crossing graphs

According to Kuhlmann and Jonsson [11], a dependency graph is non-crossing if, for all pairs of edges ei, ej ∈ E, ei and ej do not cross. Two edges cross if either of the conditions in Definition 2 holds. Figure 2.1 shows a non-crossing graph and a crossing graph.

Definition 2. Two edges ei and ej are non-crossing if neither of the two conditions below holds for them, and crossing if either condition holds. left(e) is the left endpoint of an edge e, and right(e) is its right endpoint; if e = 1 → 5 or e = 5 → 1, then left(e) = 1 and right(e) = 5.

1. left(ei) < left(ej) < right(ei) < right(ej)

2. left(ej) < left(ei) < right(ej) < right(ei)
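Definition 2 translates directly into code. A minimal sketch (the helper name edges_cross is hypothetical, not from the thesis):

    def edges_cross(e1: tuple[int, int], e2: tuple[int, int]) -> bool:
        # Edges are (source, target) pairs over 1-based word positions.
        # Direction is irrelevant for crossing, so each edge is reduced
        # to its left and right endpoints, as in Definition 2.
        l1, r1 = min(e1), max(e1)
        l2, r2 = min(e2), max(e2)
        return l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1

    # The crossing pair from Figure 2.1 (bottom): the edge 's -> all
    # is (2, 4), and says -> not is (7, 3).
    assert edges_cross((2, 4), (7, 3))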

Flajolet and Noy [6] and Orden and Santos [18] use more advanced mathematical and geometrical definitions to define a non-crossing graph, but the definition by Kuhlmann and Jonsson will be used in this thesis report, as it is specific to dependency parsing.

2.2.4 Cyclicity and acyclicity

Figure 2.2: An arbitrary sentence with six words, covered by a dependency graph with four edges. The dependency graph has two crossings: edges a2 and a3 cross, and so do edges a3 and a4.

The parsing algorithm by Kuhlmann and Jonsson only considers acyclic graphs [11, p. 560]. The SemEval 2014 test set, used both by this thesis and by Kuhlmann and Jonsson, is filtered in such a way that no cyclic graphs are contained in any representation [17, p. 66].

Kübler et al. argue that acyclicity is logical from a linguistic perspective, “as any dependency tree not satisfying this property would imply that a word implicitly is dependent upon itself” [10, p. 15].
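Acyclicity is also easy to test mechanically. The sketch below (a hypothetical helper, consistent with the edge representation used earlier) is a standard three-colour depth-first search, where reaching a node that is still on the current search path signals a cycle:

    def is_acyclic(n: int, edges: set[tuple[int, int]]) -> bool:
        succ = {v: [] for v in range(1, n + 1)}
        for s, t in edges:
            succ[s].append(t)
        WHITE, GREY, BLACK = 0, 1, 2          # unseen / on current path / done
        colour = {v: WHITE for v in range(1, n + 1)}

        def dfs(v: int) -> bool:              # True iff a cycle is reachable from v
            colour[v] = GREY
            for w in succ[v]:
                if colour[w] == GREY or (colour[w] == WHITE and dfs(w)):
                    return True
            colour[v] = BLACK
            return False

        return not any(colour[v] == WHITE and dfs(v) for v in range(1, n + 1))

    # The cycle from Figure 2.8: accept -> what -> had -> said -> would -> accept,
    # with those words at (1-based) positions 17, 9, 13, 14 and 16.
    assert not is_acyclic(17, {(17, 9), (9, 13), (13, 14), (14, 16), (16, 17)})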

2.2.5 Page number

Page number is also sometimes referred to as pagenumber (without a space) or, more rarely, book-thickness. Planarity is the preferred term by Kübler et al. and Pitler, but Kuhlmann and Jonsson explicitly call it pagenumber instead.

If the edges of a graph are split across several pages, Jacobson [8, p. 552] defines page number as the minimum number of such pages into which the edges E of a graph G = (V, E, ℓV, ℓE) need to be split so that there is no page on which any edges cross.

Definition 3. Assume that noncrossing(ej, ek) holds if the edges ej and ek are non-crossing according to Definition 2. If the edges in E of a dependency graph are split across sets P1, P2, ..., Pm so that every edge appears in exactly one set, then the page number of the graph is the minimum number m of such sets needed so that no two edges in any one set cross. This can be expressed logically as ∀i, j, k: ai, aj ∈ Pk ⟹ noncrossing(ai, aj).

By definition 3, non-crossing dependency graphs have a page number of 1, because their edges do not need to be split into any sets for the graph to become non-crossing. A graph with one crossing will need to be split exactly once so that the crossing edges end up on different pages, which results in a page number of two.
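Because the left-to-right order of the words is fixed, the test “page number ≤ 2” reduces to 2-colouring the conflict graph whose vertices are the edges and whose adjacencies are crossings. A hedged sketch, reusing the hypothetical edges_cross helper from Section 2.2.3:

    from itertools import combinations

    def page_number_at_most_two(edges: list[tuple[int, int]]) -> bool:
        # Page number <= 2 iff the crossing-conflict graph is 2-colourable.
        conflicts = {i: [] for i in range(len(edges))}
        for i, j in combinations(range(len(edges)), 2):
            if edges_cross(edges[i], edges[j]):
                conflicts[i].append(j)
                conflicts[j].append(i)
        page = {}                              # edge index -> page 0 or 1
        for start in range(len(edges)):        # 2-colour each component
            if start in page:
                continue
            page[start], stack = 0, [start]
            while stack:
                i = stack.pop()
                for j in conflicts[i]:
                    if j not in page:
                        page[j] = 1 - page[i]
                        stack.append(j)
                    elif page[j] == page[i]:   # two crossing edges on one page
                        return False
        return True

On the four edges of Figure 2.2 this returns True: the conflict graph is the path a2–a3–a4, which is 2-colourable, matching the two-page split shown in Figure 2.3.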

Figures 2.2 and 2.3 depict splitting a crossing dependency graph into pages and the resulting page number. In Figure 2.3, the optimal way to move one edge to another page is performed immediately, which shows that the page number is 2 (because the page number is defined by the minimum number of pages needed to create non-crossing subgraphs). One could conceivably also move each edge to its own page, which would create non-crossing subgraphs for any given graph, but the page number would still be 2.

2.3 Data set

This thesis uses a data set of graphs to train the new algorithm, to empirically test how the algorithm performs, and to compare the algorithm to other algorithms that used the same data set or parts of it. The data is the same as that used by Kuhlmann and Jonsson, which is split into two parts [11]: first, the English data set that was used in task 18 of SemEval 2015 [16], and second, data extracted from CCGbank [11, p. 559].

Figure 2.3: The graph from Figure 2.2 after the edge a3 has been moved to a new page. The resulting sub-graph on both pages is non-crossing, so the given graph must have a page number of 2, since there was no way to construct a graph with one page where there were no crossing edges.

2.3.1 The SemEval data set

There are three main graph types in this data set:

DM: DELPH-IN MRS-Derived Bi-Lexical Dependencies [16, p. 916].

PAS: Enju Predicate–Argument Structures [16, p. 916].

PSD: Prague Semantic Dependencies [16, p. 917].

For more information about how these data sets were created, see Oepen [16]. Notably, these three representations all contain the same sentences, tagged in three different ways. This is illustrated in Figure 2.4, which presents the same sentence for all three representations. There are 35655 sentences in the data that was used for this thesis project.

Figure 2.5 depicts the distribution of the lengths of sentences in these three representations. This semi-histogram is identical for all three representations, as the number of nodes in a given sentence is constant; only the edges in a graph differ between representations.

Figure 2.6 shows how graphs are distributed across page numbers for all three representations. This is different for each representation, as it depends on the edges in each graph.

2.3.2 The CCG data set

The CCG data set consists of 41517 sentences, some of which are different sentences than those found in the SemEval data set. The databank was originally compiled by Hockenmaier and Steedman [7]. CCG stands for Combinatory Categorial Grammar. Figure 2.7 depicts how sentence lengths are distributed in this data set on the left, and the page number distribution of the CCG set on the right.

Figure 2.4: The sentence “It’s one more for the baseball-loving lawyers, accountants and real estate developers who ponied up about $1 million each for the chance to be an owner, to step into the shoes of a Gene Autry or have a beer with Rollie Fingers.” in DM (top), PAS (middle) and PSD (bottom) representations. This relatively long sentence was chosen because of the large differences between representations. This is sentence #20214017.

Unlike the SemEval sets, the CCG set is not acyclic: 530 of its 41517 graphs are cyclic. In Figure 2.8, a cyclic graph from the CCG set is shown.

2.3.3 Graph IDs

Each graph in all four representations has a unique ID, consisting of a number sign (#) followed by eight digits. These numberings are consistent across all four representations; for example, graph #20214017 in CCG representation is the same sentence as that shown in Figure 2.4.

2.4 Deduction systems

Generating a graph G = (V, E, ℓV, ℓE) for a sentence S is a form of parsing, where data is processed to create some sort of structure that logically organises that data. Shieber et al. [22] provide the formalisms for parsing as deduction, where a logical deduction system is used to parse a sentence. Such a system consists of the following logical structures [22, p. 6]:

Items: The deduction system has one or more types of items, described by a letter and the indices of the sentence where the item exists. An item A that takes two indices (often a left and a right border of the item) is represented as [A, i, j], which represents that the sequence from word xi to word xj is an instance of the item A.

Axioms: Any logical system must contain axioms that are true before any logical rule is applied. A deduction system starts with certain such axioms. Often, axioms are of the type [A, i, i+1], which represents the initial assumption that each word xi+1 is an instance of some item A.

Goal items: The deduction system accepts certain items as completed graphs. [S, 1, n] is a common goal item, and represents that the deduction system is finished when it has generated an object S covering the entire sentence, from word x1 to word xn.

Inference rules: In the deduction system, an inference rule

    [B, i, j]   [C, j, k]
    ---------------------
          [A, i, k]

represents that two objects B and C, covering words xi to xj and xj to xk, logically mean that the item [A, i, k] can be assumed to exist. Inference rules allow the axioms to be used to create new items, and so on until a goal item is reached. [A, i, k] ← [B, i, j], [C, j, k] is an alternative way of writing the same thing.

The arguments of an inference rule are called premises: they are the items on top of the line, or to the right of the arrow. The items below the line, or to the left of the arrow, are called consequents.

Figure 2.5: The distribution of sentence lengths across the three SemEval data representations.

Figure 2.6: The distribution of graphs in DM, PSD and PAS representations by the page number of each graph.

Figure 2.7: Left: the distribution of sentence lengths for the CCG data set. Right: the distribution of graphs in the CCG representation by the page number of each graph.

Figure 2.8: The sentence “The compromise was a somewhat softened version of what the White House had said it would accept.” in CCG representation. This sentence was chosen because it is one of the 530 sentences of the CCG set that contain a cycle. The cycle consists of the edges accept → what → had → said → would → accept. This is a linguistically tricky case: what did the White House say it would accept? From the scope of this sentence, the only possible answer is “what the White House said it would accept”, which is a cyclic reference. This is sentence #20250004.

Figure 2.9: The five types of items introduced by Kuhlmann and Jonsson.

This format, starting with axioms and ending with goals, and using inference rules to transform the set of currently known logical statements into another set of logical statements until that set contains one or more of the goals, is a grammatical deduction system. Nederhof [12] introduces weights and scores to the system proposed by Shieber et al., allowing Dijkstra-style search through the inference rules to find the best goal item. This is expanded on below.

Kuhlmann and Jonsson [11, pp. 563-564] introduce five types of items and nineteen inference rules to create a deduction system that creates the maximum noncrossing dependency graph. This thesis report will use the syntax proposed by Shieber et al. to extend these five types of items and nineteen inference rules.

2.4.1 The deduction system as proposed by Kuhlmann and Jonsson

The five types of items proposed by Kuhlmann and Jonsson [11, p. 563] are depicted in Figure 2.9. Going by their label in the picture, they are:


1. A min-max covered item, corresponding to a subgraph that contains an edge from its left side to its right side, or i → k.

2. A max-min covered item, corresponding to a subgraph that contains an edge from its right side to its left side, or k → i.

3. A min-max connected item, corresponding to a subgraph that contains a directed path from its left side to its right side, or i →* k ∧ i ↛ k ∧ i ≠ k.

4. A max-min connected item, corresponding to a subgraph that contains a directed path from its right side to its left side, or k →* i ∧ k ↛ i ∧ i ≠ k.

5. A bland item, corresponding to a subgraph where there is no directed path between the left side and the right side. A bland item must also not be covered by an edge, as min-max covered or max-min covered items are.
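The five item types with their [A, i, k] indexing might be encoded as follows. This is a hypothetical representation, not the thesis implementation:

    from enum import Enum
    from typing import NamedTuple

    class ItemType(Enum):
        MIN_MAX_COVERED = 1    # covering edge i -> k
        MAX_MIN_COVERED = 2    # covering edge k -> i
        MIN_MAX_CONNECTED = 3  # directed path i ->* k, but no edge i -> k
        MAX_MIN_CONNECTED = 4  # directed path k ->* i, but no edge k -> i
        BLAND = 5              # no directed path between i and k, no covering edge

    class Item(NamedTuple):
        type: ItemType
        i: int                 # left endpoint
        k: int                 # right endpoint

    # An axiom [Bland, 1, 2]; in the worked examples of Section 2.6,
    # the goal items are bland items spanning the whole sentence.
    axiom = Item(ItemType.BLAND, 1, 2)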

The nineteen inference rules in the deduction system by Kuhlmann and Jonsson are depicted in Table 2.1. Each rule is presented as an illustration.

Table 2.1: The nineteen inference rules in the deduction system by Kuhlmann and Jonsson, each presented as an illustration (not reproduced here): R01–R10 combine items over the spans [i, j] and [j, k] into an item over [i, k], R11–R15 expand an item over [i, k−1] to one over [i, k], and R16–R19 turn an item over [i, k] into a covered item over the same span.

Figure 2.10 attempts to illustrate how the system works by connecting items and inference rules as a graph. Rules can be classified as merging, expanding and covering rules, depending on what the rule does. A merging rule connects two adjacent items, an expanding rule expands an item by one node to the right, and a covering rule adds a covering edge to an item, turning it into a covered item.

Figure 2.10: An attempt at illustrating how the inference rules from Table 2.1 go together with the item types to form a deduction system.

Through analysing their rules and items, Kuhlmann and Jonsson prove that their system is unique, sound and complete [11, p. 565]. These terms are explained in Section 2.7.

2.5 Running an algorithm described as a deduction system

Shieber et al. [22, p. 24] describe the procedure contained in algorithm 1 for checking whether a sentence can be described with a deduction system. This problem is closely related to that of finding the most appropriate derivation of the given sentence using the system. The algorithm uses the word chart for the list of items which, together with the other items in the chart, have been proven, and agenda for the list of items whose consequences have not yet been explored.

Algorithm 1. Proving that a sentence can be described by a deduction system (Shieber et al.)

1. Initialise the chart to be empty and the agenda as the axioms of the deduction system.

2. Until the agenda is empty, run the following three steps:

a) Pick an arbitrary item from the agenda.

b) Add it to the chart if it is not already on the chart.

c) If the item was added to the chart, calculate all consequences of the item and the items already on the chart (using inference rules) and add those consequences to the agenda.

3. If a goal item is in the chart, then the sentence is part of the grammar described by the deduction system.
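In code, algorithm 1 might look as follows. The sketch is hypothetical: axioms and goals stand for collections of items, and consequences(item, chart) is assumed to yield every item derivable by one inference rule from item together with items already in the chart:

    def derivable(axioms, goals, consequences) -> bool:
        chart = set()
        agenda = list(axioms)                        # step 1
        while agenda:                                # step 2
            item = agenda.pop()                      # 2a: pick an arbitrary item
            if item not in chart:                    # 2b: add it if it is new
                chart.add(item)
                agenda.extend(consequences(item, chart))   # 2c
        return any(goal in chart for goal in goals)  # step 3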

Nederhof [12, p. 140] extends algorithm 1 to instead find the graph with the best score described by the system. The algorithm is an extension of Knuth’s generalisation of Dijkstra’s algorithm. It is defined as Algorithm 2, below.

Algorithm 2. Finding the best derivation for a sentence in a deduction system (Nederhof)

The following notations are used.

D: The set of items whose lowest score has been found.

E: The set of items reachable by one inference rule from D, but not necessarily with the lowest score.

f(x1, ..., xm): A scoring function that assigns a score to each generated item, such that the score is larger than or equal to the sum of the scores of the components used to create the item.

µ(Ij): The lowest possible score for the item Ij; items for which this has been calculated should hence be in D.

With these definitions, the algorithm follows.

1. Initialise D as the empty set.

2. Set E to the set of items I0 such that I0 ∉ D and at least one inference rule creates I0 from items I1, ..., Im, m ≥ 0, where f is the weight function resulting from such an inference rule and the premises I1, ..., Im ∈ D.

3. For each I0 ∈ E, set v(I0) to f(µ(I1), ..., µ(Im)).

4. If E is empty, the algorithm has failed (the sentence could not be described by the deduction system).

5. Choose the I ∈ E with the lowest v(I).

6. Set D := D ∪ {I} and µ(I) := v(I).

7. If I is a goal item, return µ(I).

8. Go to step 2.

To run algorithm 2 efficiently, Nederhof notes that tables should be used to avoid calculating the same scores more than once [12, p. 139]. Using tables for this purpose is typical of dynamic programming approaches.
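A priority queue gives the “choose the lowest v(I)” step directly, and the table µ doubles as the memo table Nederhof mentions. A sketch under the same assumptions as before, with a hypothetical rules(item, mu) interface that yields (new_item, new_weight) pairs for every rule whose other premises are already in µ:

    import heapq
    from itertools import count

    def best_derivation(axioms, goals, rules):
        mu = {}                                      # D, as a map item -> mu(item)
        tie = count()                                # tie-breaker for equal weights
        queue = [(w, next(tie), item) for item, w in axioms]   # E
        heapq.heapify(queue)
        while queue:                                 # step 4: fail when E is empty
            w, _, item = heapq.heappop(queue)        # step 5: lowest v(I)
            if item in mu:
                continue                             # stale queue entry
            mu[item] = w                             # step 6
            if item in goals:                        # step 7
                return w
            for new_item, new_w in rules(item, mu):  # steps 2-3; f must be monotone
                if new_item not in mu:
                    heapq.heappush(queue, (new_w, next(tie), new_item))
        return None                                  # the sentence is not derivable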

2.5.1 Scorers

Scorers are implementations of the scoring function f(...) mentioned in algorithm 2. Kuhlmann and Jonsson use online passive-aggressive training as described by Crammer et al. [3] [11, p. 565]. Scorers, and training using scorers, generally rely on machine learning, as in this case. Using this method, Kuhlmann and Jonsson report a training time of 40 minutes per training set when ten iterations of training are used [11, p. 566]. The same feature models and training methods as Kuhlmann and Jonsson's are used in this thesis.

2.5.2 Oracle scorers

Another way to implement the weight function f(...) mentioned in algorithm 2 is to use an oracle scorer. Kübler et al. describe an oracle as a mechanism which, given a state, can always predict the optimal way to proceed [10, p. 25]. Applied to a weight function, an oracle scorer is a weight function that scores in such a way that the parser finds the optimal parse [11, p. 566]. Oracle scorers that can find the optimal parse for any unknown graph obviously do not exist, but when the data being parsed also has a canonical, gold-standard graph for each sentence, it is possible to construct an oracle scorer that guides the deduction system into choosing the optimal inference rules.

Kuhlmann and Jonsson implement an oracle scorer that scores correct edges (i.e. edges that are in the gold-standard graph) with a score of +1, and incorrect edges (i.e. edges that are not in the gold-standard graph) with a score of −1 [11, p. 566]. Whenever an oracle scorer is mentioned in this thesis, it should be assumed that these numbers were used.
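The core of such an oracle is a single comparison against the gold-standard edge set; a minimal sketch (the function name is hypothetical):

    def oracle_edge_score(edge: tuple[int, int],
                          gold_edges: set[tuple[int, int]]) -> int:
        # +1 for an edge in the gold-standard graph, -1 otherwise,
        # the scores used by Kuhlmann and Jonsson.
        return 1 if edge in gold_edges else -1

Summed over the edges a derivation introduces, this steers the search of algorithm 2 towards the gold-standard graph.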

2.6 A practical example of how the parser by Kuhlmann and Jonsson parses a sentence

In this section, it is illustrated how the deduction system by Kuhlmann and Jonsson, as described in Section 2.4.1, parses two sentences. Both sentences were parsed using an oracle scorer that gives a score of 1 to correct edges and a score of −1 to edges that are not part of the real graph. The gold-standard graph in DM representation was chosen as the reference for the oracle scorer.

The two sentences are the following.

1. “Numerous injuries were reported.” This is a five-node sentence, with the ID #22102004 from the SemEval dataset. There are no crossing edges in the gold-standard DM graph. The sentence as it is given in the test data can be seen in Figure 2.11.

2. “That’s not all, he says.” This is an eight-node sentence; the comma, the possessive suffix (-’s) and the dot get their own nodes. Its graph ID is #22151005, and there is one crossing edge in the canonical DM representation. The sentence as it is given in the test data can be seen in Figure 2.12.

Derivation of “Numerous injuries were reported.”

Below, each subgraph is presented together with its item form and what generated it.

Subgraphs of size 1: The axioms are bland, empty graphs of size one, and appear here. Item 2 is generated by covering the axiom above it.

1. Axiom: [Bland, 1, 2]

Figure 2.11: Sentence 22102004 from the SemEval 2014 dataset, DM representation.

Figure 2.12: Sentence 22151005 from the SemEval 2014 dataset, DM representation.

2. R17 applied to item 1: [Bland, 1, 2] ⇒ [Min-Max Covered, 1, 2]. Subgraph: Numerous injuries, with the edge Numerous → injuries (ARG1)

3. Axiom: [Bland, 2, 3]. Tokens: injuries were

Subgraphs of size 2 Merging rules can theoretically create subgraphs of size 2, but this does not occur in this specific derivation. One of the subgraphs of size 1 is extended to the right using R15, and the result is covered using R19.


1. R15 applied to item 3: [Bland, 2, 3] ⇒ [Bland, 2, 4]. Subgraph: injuries were reported (no edges)

2. R19 applied to item 1: [Bland, 2, 4] ⇒ [Max-Min Covered, 2, 4]. Subgraph: injuries were reported, with the edge reported → injuries (ARG2)

Subgraphs of size 3 The merging rule R02 is used to merge a subgraph of size 2 with a subgraph of size 1.

1. R02 applied to item 2 from the size-1 list and item 2 from the size-2 list: [Min-Max Covered, 1, 2] + [Max-Min Covered, 2, 4] ⇒ [Bland, 1, 4]. Subgraph: Numerous injuries were reported, with the edges Numerous → injuries (ARG1) and reported → injuries (ARG2)

Goal graphs (size 4) Finally, the subgraph of size 3 is extended by one node to the right, creating a bland item with the same width as the target sentence. This is the best goal item as scored by the oracle scorer, and the parser returns it.


1. R15 applied to item 1: [Bland, 1, 4] ⇒ [Bland, 1, 5]. Subgraph: the whole sentence Numerous injuries were reported . , with the edges Numerous → injuries (ARG1) and reported → injuries (ARG2)
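For reference, the whole derivation can be replayed as plain data; the rule names follow the thesis, while the tuple encoding is only an illustration:

```python
# The derivation above as (rule, premises, conclusion) triples,
# using the [Type, i, j] item forms.
derivation = [
    ("Axiom", [],                          ("Bland", 1, 2)),
    ("R17",   [("Bland", 1, 2)],           ("Min-Max Covered", 1, 2)),
    ("Axiom", [],                          ("Bland", 2, 3)),
    ("R15",   [("Bland", 2, 3)],           ("Bland", 2, 4)),
    ("R19",   [("Bland", 2, 4)],           ("Max-Min Covered", 2, 4)),
    ("R02",   [("Min-Max Covered", 1, 2),
               ("Max-Min Covered", 2, 4)], ("Bland", 1, 4)),
    ("R15",   [("Bland", 1, 4)],           ("Bland", 1, 5)),
]
for rule, premises, conclusion in derivation:
    print(f"{rule}: {premises} => {conclusion}")
```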

Derivation of “That’s not all, he says.”

This derivation produces a larger graph than the previous one.

Subgraphs of size 1 The gold standard graph contains three edges with a length of one. All of these must be created as subgraphs of size one, by covering an axiom.

1. Axiom: [Bland, 1, 2]. Tokens: That ’s

2. R19 applied to item 1: [Bland, 1, 2] ⇒ [Max-Min Covered, 1, 2]. Subgraph: That ’s, with the edge ’s → That (ARG1)

3. Axiom: [Bland, 2, 3]. Tokens: ’s not

4. R19 applied to item 3: [Bland, 2, 3] ⇒ [Max-Min Covered, 2, 3]. Subgraph: ’s not, with the edge not → ’s (neg)

5. Axiom: [Bland, 3, 4]. Tokens: not all

6. Axiom: [Bland, 6, 7]. Tokens: he says


7. R19 applied to item 6: [Bland, 6, 7] ⇒ [Max-Min Covered, 6, 7]. Subgraph: he says, with the edge says → he (ARG1)

Subgraphs of size 2 The two max-min covered items on the left of the sentence are merged, creating a max-min connected item.

1. R04 applied to items 2 and 4: [Max-Min Covered, 1, 2] + [Max-Min Covered, 2, 3] ⇒ [Max-Min Connected, 1, 3]. Subgraph: That ’s not, with the edges ’s → That (ARG1) and not → ’s (neg)

2. R15 applied to item 5: [Bland, 3, 4] ⇒ [Bland, 3, 5]. Subgraph: not all , (no edges)

Subgraphs of size 3 Here, the deduction system chooses not to include the edge ’s → all. Because there are only two edges that cross, and they cross each other, the choice of which one to include and which one to exclude is arbitrary.

1. R15 applied to item 2: [Bland, 3, 5] ⇒ [Bland, 3, 6]. Subgraph: not all , he (no edges)

Subgraphs of size 4 Subgraphs are merged and covered.

1. R10 applied to item 1 from the size-3 list and item 7 from the size-1 list: [Bland, 3, 6] + [Max-Min Covered, 6, 7] ⇒ [Bland, 3, 7]. Subgraph: not all , he says, with the edge says → he (ARG1)


2. R19 applied to item 1: [Bland, 3, 7] ⇒ [Max-Min Covered, 3, 7]. Subgraph: not all , he says, with the edges says → he (ARG1) and says → not (ARG2)

Subgraphs of size 6 The final two subgraphs are merged.

1. R08 applied to item 1 from the size-2 list and item 2 from the size-4 list: [Max-Min Connected, 1, 3] + [Max-Min Covered, 3, 7] ⇒ [Max-Min Connected, 1, 7]. Subgraph: That ’s not all , he says, with the edges ’s → That (ARG1), not → ’s (neg), says → he (ARG1) and says → not (ARG2)

Goal graphs (size 7) Because the dot at the end of the sentence is not part of the actual graph, the only way for the parse to end is to extend the existing graph by one node to the right without connecting it.


1. R14 applied to item 1: [Max-Min Connected, 1, 7] ⇒ [Bland, 1, 8]. Subgraph: the whole sentence That ’s not all , he says . , with the edges ’s → That (ARG1), not → ’s (neg), says → he (ARG1) and says → not (ARG2)

2.7 Deduction systems, soundness, completeness and uniqueness

A deduction system described using the logical syntax from Shieber et al. can be proven to be sound and complete. Definitions 4 and 5 follow Nivre [13, p. 519].

Definition 4. A deduction system is sound with respect to the class Cm if every goal item that results from using the system’s inference rules is a graph of the class Cm.

Definition 5. A deduction system is complete with respect to the class Cm if every graph in the class Cm can be generated by the system.

2.7.1 Proving soundness of an algorithm

Algorithms do not have to be described by deduction systems; Nivre [13, pp. 522-523] shows soundness of a state-based transition system instead, using methods that can be extended to also apply to deduction systems.

Nivre shows that his transition system for projective dependency forests (and therefore trees) is sound by first proving that the axioms of the system define a set of projective dependency trees, i.e. a projective dependency forest. By proving that each inference rule that can be applied to such a forest also results in a projective dependency forest, Nivre shows that, at each iteration, including the first, his system generates a set of projective dependency trees: his system is therefore sound with respect to the set of projective dependency forests, and therefore also for the set of projective dependency trees. [13, pp. 522-523]

Kuhlmann and Jonsson [11, p. 563] prove soundness of their deduction system by showing that its axioms are sound. Similarly to Nivre, they also show that the items resulting from all of their inference rules, when applied to a set of non-crossing dependency graphs, form a set of graphs that is also non-crossing. This informally shows that the system is sound.

This method, where the soundness of the axioms is shown first, and the soundness of iteration n+1 of the system is then shown under the assumption that iteration n is sound, is a type of inductive proof. It is, however, only applicable if there is a graph class Cm that can be defined independently of the deduction system for which soundness is to be proven.
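Schematically, the inductive argument has the following shape, where G(x) is notation chosen here for the graph represented by item x:

```latex
% Schematic shape of the inductive soundness argument; G(x) denotes the
% graph represented by item x, and C_m is the target graph class.
\begin{align*}
\text{Base case:}      \quad & G(a) \in C_m \text{ for every axiom } a \\
\text{Inductive step:} \quad & \text{if } G(p_1), \ldots, G(p_k) \in C_m
  \text{ and } \frac{p_1 \;\cdots\; p_k}{c} \text{ is a rule, then } G(c) \in C_m \\
\text{Conclusion:}     \quad & G(d) \in C_m \text{ for every derivable item } d
\end{align*}
```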


2.7.2 Proving completeness of an algorithm

To prove that his projective transition system is complete, Nivre applies an induction proof over the length of the sentence x = (w0, w1, ..., wn). If the length of the sentence is one, then clearly no edges need to be added for the graph to be a projective dependency tree. Nivre goes on to show that, if the claim holds for any length |x| ≤ p, p > 1, then based on the inference rules in the system, the claim must also hold for the length |x| + 1. [13, p. 524]

Kuhlmann and Jonsson show the completeness of their deduction system by also using an induction proof over the length of the sentence. The sentence of length one is covered by the axioms; any graph over a sentence of length m must consist of items over lengths n < m, for which completeness has already been proven, and the inference rules will always combine these complete sub-items to form every non-crossing graph. [11, p. 563]

Overall, proving correctness for a deduction system depends on finding an induction proof that shows that, for any length of the base sentence x, the system will generate the correct tree, graph or forest.

2.7.3 Proving uniqueness of a deduction system

A deduction system is derivationally unique if each result of the system has exactly one derivation in the system. Kuhlmann and Jonsson prove that their deduction system is unique in this sense [11, p. 565].

A deduction system does not need to be unique; uniqueness is merely a property of some such systems. Uniqueness is proven differently for different deduction systems, and a general method is hard to extract. Nivre does not mention derivational uniqueness [13]. A system with derivational uniqueness only visits each graph it can create once, which is desirable when dealing with search algorithms and trying to find the optimal graph. In addition, a derivationally unique system can be analysed theoretically, and can be used to count and display the number of graphs inside the graph class for a given length. Kuhlmann and Jonsson use their algorithm to count the number of non-crossing graphs for various lengths, which also helps to show that their algorithm must parse to the non-crossing class [11, p. 565].
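Derivational uniqueness is exactly what makes such counting possible: if the chart sums the number of ways to build each item instead of maximising scores, every graph is counted once. The toy recurrence below, which counts binary bracketings rather than non-crossing graphs, illustrates only that max-to-sum swap:

```python
from functools import lru_cache

# Counting with a chart: "max over rules" becomes "sum over rules".
# With a derivationally unique system, the sum equals the number of
# distinct graphs. This toy counts binary bracketings of a span.

@lru_cache(maxsize=None)
def count(i, j):
    if j - i <= 1:
        return 1
    return sum(count(i, k) * count(k, j) for k in range(i + 1, j))

print([count(1, n) for n in range(2, 8)])  # Catalan numbers: 1, 1, 2, 5, 14, 42
```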

2.8 Deduction systems and time complexity

If a deduction system is converted to a Nederhof-style algorithm as described in Section 2.5, and dynamic programming is used to ensure that no scores are calculated more than necessary, then the asymptotic time complexity of the algorithm is found by looking at the inference rule that has the most variable points in its definition. In the list of inference rules created by Kuhlmann and Jonsson, as seen in Section 2.4.1, the largest inference rules depend on three points as their input: therefore, the worst-case time complexity for the deduction system is that all of these items have to be calculated exactly once, leading to a time complexity of O(|V|^3), where V is the set of nodes in the graph. [11, pp. 562-563]
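The counting argument can be made concrete with a skeleton loop: the largest rules combine an item over (i, k) with an item over (k, j), so filling the chart enumerates every triple i < k < j exactly once. The loop below is illustrative only:

```python
# Why three variable points give O(|V|^3): one rule application per
# triple (i, k, j). Real rule applications would replace the counter.
n = 10                                    # |V|, the number of nodes
steps = 0
for width in range(2, n + 1):             # width of the concluded span
    for i in range(1, n - width + 1):
        j = i + width
        for k in range(i + 1, j):         # the third variable point
            steps += 1                    # one (i, k, j) rule application
print(steps)                              # grows on the order of n**3
```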

Koo and Collins [9, pp. 2-4], whose deduction system is not defined by logical inference rules, use a similar argument regarding the possible items in their system and the rules applicable to each item to arrive at the conclusion that their third-order parsing algorithms require O(n^4) time, where n is the number of nodes in the provided sentence.

2.9 Coverage of test data

How well a parsing algorithm performs, regardless of whether it is described as a logical deduction system or not, can be tested on a data set. When the algorithm can only generate graphs of a certain graph class, that class imposes an upper bound on the performance of the algorithm. These two concepts are separated in the sub-sections below.
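As a sketch, this upper bound can be measured by checking each gold-standard graph against a class-membership predicate; the non_crossing test below is a crude illustration on (min, max) edge spans, not the thesis's definition:

```python
# Share of gold-standard graphs that lie inside the graph class at all;
# a parser restricted to that class cannot recover the remaining graphs.

def coverage_upper_bound(gold_graphs, in_class):
    return sum(1 for g in gold_graphs if in_class(g)) / len(gold_graphs)

def non_crossing(edges):
    # edges are (min, max) node pairs; two spans cross if they interleave
    return not any(a < c < b < d or c < a < d < b
                   for (a, b) in edges for (c, d) in edges)

graphs = [{(1, 2), (2, 3)},   # non-crossing
          {(1, 3), (2, 4)}]   # the two edges cross
print(coverage_upper_bound(graphs, non_crossing))  # 0.5
```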
