Uniform Parsing for Hyperedge Replacement Grammars

Contents lists available at ScienceDirect

Journal of Computer and System Sciences

www.elsevier.com/locate/jcss

Henrik Björklund a, Frank Drewes a,∗, Petter Ericson b,a, Florian Starke c

a Department of Computing Science, Umeå University, Sweden
b Digital and Cognitive Musicology Lab, École Polytechnique Fédérale de Lausanne, Switzerland
c Faculty of Computer Science, TU Dresden, Germany

Article info

Article history: Received 24 April 2018; accepted 26 October 2020; available online 27 November 2020

Keywords: Graph grammar; Hyperedge replacement; Uniform parsing; Complexity; Natural Language Processing; Meaning representation

Abstract

It is well known that hyperedge-replacement grammars can generate NP-complete graph languages even under seemingly harsh restrictions. This means that the parsing problem is difficult even in the non-uniform setting, in which the grammar is considered to be fixed rather than being part of the input. Little is known about restrictions under which truly uniform polynomial parsing is possible. In this paper we propose a low-degree polynomial-time algorithm that solves the uniform parsing problem for a restricted type of hyperedge-replacement grammars which we expect to be of interest for practical applications.

© 2020 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Hyperedge-replacement grammars (HR grammars, for short) are context-free graph grammars that were introduced in [3,18], see also [17,11]. They represent one of the two most successful formal models for the description of graph languages (the other being confluent node-replacement grammars), because of their favorable algorithmic and language-theoretic properties which closely resemble those of context-free string grammars. Unfortunately, the similarities between the string and graph cases fail to extend to one of the most important computational problems in the context of formal languages: the parsing problem. It has been known for a long time that even the non-uniform membership problem for context-free graph languages is intractable (unless P = NP). In particular, there are hyperedge replacement graph languages which are NP-complete [1,19]. Severe restrictions must be placed on the grammars in order to make at least non-uniform polynomial parsing possible. Early results in this regard can be found in [20,21,10]. In [20] the degree of the polynomial that bounds the running time varies with the language. The algorithm in [21], which considers only edge replacement, and its generalization to hyperedge replacement by [10] are cubic in the size of the input graph, but depend exponentially on the grammar if considered in a uniform setting. Moreover, the restrictions [21] and [10] placed on the considered graph languages are very strong, and it was shown in [9] that even a slight relaxation results in NP-completeness again. For these reasons, these parsing algorithms are mainly of theoretical interest.

In recent years the question of efficiently parsing hyperedge replacement languages received renewed interest, because hyperedge replacement was proposed as a suitable mechanism for describing sentence semantics in natural language processing, and in particular the abstract meaning representation (AMR) proposed in [2]. Regarding the use of hyperedge replacement in this application area, see [7]. The same paper described a general recognition algorithm together with a

∗ Corresponding author.

E-mail addresses: henrikb@cs.umu.se (H. Björklund), drewes@cs.umu.se (F. Drewes), petter.ericson@epfl.ch (P. Ericson), Florian.Starke@tu-dresden.de (F. Starke).

https://doi.org/10.1016/j.jcss.2020.10.002

0022-0000/© 2020 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

detailed complexity analysis. Unsurprisingly, the running time of the algorithm is exponential even in the non-uniform case, one of the exponents being the maximum degree of nodes in the input graph. The same is true for the recent algorithm by [16] which implements parsing for so-called regular graph grammars.

Unfortunately, the node degree is one of the parameters one would ideally not wish to limit, since meaning representations do not have bounded node degree. Moreover, natural language processing often has to deal with algorithmic learning situations in which large corpora must be parsed and grammars adjusted in an iterative process. Thus, truly uniform polynomial-time solutions would be valuable, provided that the polynomials have a reasonably low degree and the restrictions on the grammars are “natural”.

Parsing a graph $G$ with respect to a given HR grammar $\mathcal{G}$ means to check whether there is a derivation tree in $\mathcal{G}$ that yields $G$. Thus, the task is to decompose $G$ recursively into subgraphs that can be generated from the nonterminals of $\mathcal{G}$. Intuitively, the NP-completeness of the problem comes from the fact that a graph has exponentially many subgraphs. This is the main difference between graph and string parsing. In the latter case, the well-known dynamic programming approach by Cocke, Kasami, and Younger is efficient because a string has only quadratically many substrings. One way to achieve polynomial parsing in the graph case as well is to make sure that only polynomially many decompositions are possible candidates for well-formed derivation trees. In this paper we achieve this by imposing restrictions on $\mathcal{G}$ which guarantee that the overall shape of a suitable decomposition of $G$ can be “read off” $G$ itself. Intuitively, what remains is to check whether appropriate rules of $\mathcal{G}$ can be assigned to the vertices of this decomposition in order to turn it into a derivation tree.
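To make the string case concrete, the following minimal Python sketch shows Cocke-Kasami-Younger recognition. It is an illustration only and not part of the algorithm developed in this paper: the function name `cyk_recognize`, the dictionary encoding of a grammar in Chomsky normal form, and the example grammar for the language of words a^n b^n with n ≥ 1 are assumptions made here. The table has one cell per substring, i.e. n(n+1)/2 cells for a word of length n, which is what keeps the procedure polynomial.

```python
from itertools import product


def cyk_recognize(word, unary_rules, binary_rules, start):
    """CYK recognition for a grammar in Chomsky normal form.

    unary_rules:  terminal -> set of nonterminals A with a rule A -> terminal
    binary_rules: (B, C)   -> set of nonterminals A with a rule A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # the empty word is not handled in this sketch
    # table[(i, j)] holds the nonterminals that derive the substring word[i:j]
    table = {}
    for i, symbol in enumerate(word):
        table[(i, i + 1)] = set(unary_rules.get(symbol, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = set()
            for k in range(i + 1, j):  # split word[i:j] into word[i:k] + word[k:j]
                for b, c in product(table[(i, k)], table[(k, j)]):
                    cell |= binary_rules.get((b, c), set())
            table[(i, j)] = cell
    return start in table[(0, n)]


# Hypothetical example grammar: S -> A B | A T,  T -> S B,  A -> a,  B -> b
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}

print(cyk_recognize("aabb", UNARY, BINARY, "S"))
print(cyk_recognize("abab", UNARY, BINARY, "S"))
```

A graph offers no comparable small family of candidate pieces, since any subset of its edges may in principle form a subgraph generated by a nonterminal.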

An attempt at a set of restrictions serving this purpose was made in [5]. Motivated by the fact that meaning representations such as AMR are typically acyclic, HR grammars were considered that generate directed acyclic graphs. However, as acyclicity alone does not make parsing any easier, additional conditions were placed on the form of the rules. In the present paper, we generalize the approach: the generated graphs may have cycles, the allowed rules are considerably more general, and the restrictions are fewer and formulated in an axiomatic way which allows for different concretizations. We impose two conditions on our grammars, called reentrancy preservation and order preservation. The latter is relative to an ordering of the nodes of input graphs that can be instantiated in different ways.

We expect our parsing algorithm to be useful for describing and processing languages of semantic graphs such as AMR. In fact, typical structures occurring in such graphs provided the starting point for the development of the restrictions proposed in this paper. Along with the formal development of the notions and results leading to our parsing algorithm, we try to illustrate the potential usefulness of our grammars by means of a small running example that stretches from Section 2 to Section 6. It shows how to generate a language of AMR-like semantic graphs by an HR grammar that satisfies our conditions for uniform polynomial parsing (Examples 2.4, 4.3 and 5.4, as well as Section 6.1).

Let us briefly describe the idea behind the restrictions we impose on HR grammars to make them efficiently parsable. When working with hyperedge replacement, a nonterminal hyperedge is a placeholder attached to a sequence of nodes. This placeholder will eventually be replaced by a subgraph that shares the attached nodes of the hyperedge (and only those) with the rest of the generated graph. One difficulty parsing has to face is that, after the replacement of a hyperedge, it may not be visible in the resulting graph which nodes the replaced hyperedge had been attached to. Reentrancy preservation, which is the first condition we describe in this paper, makes it possible to recover this set of nodes solely from the structure of the generated graph.

One difficulty remains: even if the attached nodes of a nonterminal hyperedge can uniquely be recovered, it may still be unclear in which order they had been attached to the hyperedge. This is what is avoided by the condition of order preservation. It ensures, for example, that a rule cannot replace a nonterminal hyperedge by another nonterminal hyperedge attached to the same nodes but in a different order.

Thanks to the two restrictions, we obtain a uniform parsing algorithm which is roughly quadratic in both the size of the grammar and that of the input graph.¹

As a final note on related work, we mention here that another recent approach to efficient parsing for HR grammars was presented in [13,14], where predictive top-down and bottom-up parsers are proposed, generalizing techniques from compiler construction to the graph case. The approach thus differs from ours in that it yields a parser generator which, with only the grammar as input, constructs a quadratic parser for the specific language generated by that grammar. Provided that the grammar analysis can be performed in polynomial time (which depends on the exact variant of the parser generator used), this approach is thus uniformly polynomial as well.

The next section compiles the basic notions relevant to hyperedge replacement grammars. Sections 3 and 4 define and study reentrancy and order preservation, respectively. Section 5 presents one possible concretization of our abstract notion of preserved orders. The parsing algorithm and the main result of this paper are presented in Section 6, and Section 7 concludes the paper.

¹ The exact running time depends on how efficiently the chosen order can be computed.

2. Preliminaries

The set of non-negative integers is denoted by $\mathbb{N}$. For $n \in \mathbb{N}$, $[n]$ denotes $\{1, \dots, n\}$. Given a set $S$, $S^*$ denotes the set of all finite sequences over $S$, and $S^\circledast$ denotes the set of non-repeating sequences in $S^*$, i.e. those sequences in which no element of $S$ occurs twice. The empty sequence is denoted by $\varepsilon$, $S^+ = S^* \setminus \{\varepsilon\}$, and $S^\oplus = S^\circledast \setminus \{\varepsilon\}$. The length of a sequence $w \in S^*$ is denoted by $|w|$, and $[w]$ denotes the smallest subset $A$ of $S$ such that $w \in A^*$. The canonical extensions of a mapping $f\colon S \to T$ to $S^*$ and to the powerset of $S$ are denoted by $f$ as well, i.e., $f(a_1 \cdots a_k) = f(a_1) \cdots f(a_k)$ for $a_1, \dots, a_k \in S$, and $f(S') = \{f(a) \mid a \in S'\}$ for $S' \subseteq S$. A sequence $sw \in S^*$ with $s \in S$ may also be denoted by $(s, w)$.

2.1. Ordering subsets of a set

As mentioned in the introduction, one of the prerequisites of our parsing algorithm is a way to order various subsets of the nodes of an input graph. Consider an arbitrary binary relation $\prec$ on a set $S$. Given a sequence $w = s_1 \cdots s_k \in S^*$, we say that $w$ is ordered by $\prec$ if $s_i \prec s_{i+1}$ for all $i \in [k-1]$, and moreover, $s_i \prec s_j$ implies $i < j$ for all $i, j \in [k]$. We furthermore say that $\prec$ orders a subset $A \subseteq S$ if the elements of $A$ can be arranged in a sequence $w$ which is ordered by $\prec$. Clearly, if $\prec$ orders $A$, this sequence $w$ is uniquely determined. In the following, we denote this sequence by $A_\prec$ (provided that $\prec$ indeed orders $A$). Note that, for the sake of generality, we place no further restrictions on $\prec$, and it is thus neither necessarily an order on $A$ (where it may lack transitivity) nor on $S$ (where it may be otherwise entirely arbitrary).
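The notion of a relation ordering a subset can be sketched in Python as follows. This is a minimal illustration under assumptions made here: the relation is given as a two-argument predicate `related`, and the function names `is_ordered_by` and `ordered_sequence` are hypothetical. The second function builds the candidate sequence by repeatedly taking the unique remaining element without a remaining predecessor, and returns `None` if the relation does not order the subset.

```python
def is_ordered_by(seq, related):
    """Check the two conditions of the definition for a given sequence."""
    k = len(seq)
    # consecutive elements must be related
    consecutive = all(related(seq[i], seq[i + 1]) for i in range(k - 1))
    # whenever two elements are related, the first must occur strictly earlier
    forward_only = all(
        i < j
        for i in range(k)
        for j in range(k)
        if related(seq[i], seq[j])
    )
    return consecutive and forward_only


def ordered_sequence(subset, related):
    """Return the unique sequence over `subset` ordered by `related`, or None."""
    remaining = set(subset)
    seq = []
    while remaining:
        # elements with no predecessor among the remaining ones
        sources = [s for s in remaining
                   if not any(related(t, s) for t in remaining)]
        if len(sources) != 1:
            return None  # no next element, or the next element is ambiguous
        seq.append(sources[0])
        remaining.remove(sources[0])
    # safeguard: re-check the definition on the constructed sequence
    return seq if is_ordered_by(seq, related) else None


# A relation that is not transitive: 1 is related to 2, 2 to 3, 3 to 4 only.
PAIRS = {(1, 2), (2, 3), (3, 4)}


def rel(a, b):
    return (a, b) in PAIRS


print(ordered_sequence({3, 1, 2}, rel))
print(ordered_sequence({1, 3}, rel))
```

In this example the relation orders {1, 2, 3} although it is not transitive, while it does not order {1, 3}, because 1 and 3 are unrelated.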

2.2. Hypergraphs

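Throughout this paper, we fix a countably infinite supply $\mathrm{LAB}$ of symbols called labels, such that every $\sigma \in \mathrm{LAB}$ has a unique rank $\mathit{rank}(\sigma) \in \mathbb{N}$. Similarly, we fix countably infinite supplies $\mathcal{V}$ and $\mathcal{E}$ of vertices and hyperedges, respectively.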

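Definition 2.1 (hypergraph). A (directed hyperedge-labeled) hypergraph over $\Sigma \subseteq \mathrm{LAB}$ is a tuple $G = (V, E, \mathit{att}, \mathit{lab}, \mathit{ext})$ with the following components:

• $V \subseteq \mathcal{V}$ and $E \subseteq \mathcal{E}$ are disjoint finite sets of nodes and hyperedges, respectively.

• The attachment $\mathit{att}\colon E \to V^\oplus$ assigns to each hyperedge $e$ a sequence of attached nodes. For $e \in E$ with $\mathit{att}(e) = (v, w)$ we also denote $v$ by $\mathit{src}(e)$ and $w$ by $\mathit{tar}(e)$, calling them the source and the sequence of targets of $e$, respectively.

• The labeling $\mathit{lab}\colon E \to \Sigma$ assigns a label to each hyperedge, subject to the condition that $\mathit{rank}(\mathit{lab}(e)) = |\mathit{tar}(e)|$ for every $e \in E$.

• The sequence $\mathit{ext} \in V^\oplus$ is the sequence of external nodes. If $\mathit{ext}_G = (v, w)$, then we denote the node $v$ by $\dot{G}$ and the sequence $w$ of nodes by $\ddot{G}$, respectively, and we impose the additional requirement that $\mathit{src}(e) \notin [\ddot{G}]$ for all $e \in E$.²

The size $|G|$ of $G$ is $\sum_{e \in E} |\mathit{att}(e)|$.³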

Note that we forbid $\mathit{att}(e)$ (for $e \in E$) to contain any node repeatedly. In the following, we simply call hyperedges edges and hypergraphs graphs. Our division of the attachment of every edge into a single source node and any number of target nodes is similar to that used in the literature on term (hyper)graphs. It makes it meaningful to speak about directed paths (defined below). Our graphs are, however, more general than term graphs in that we, for the moment, do not impose further structural conditions on them.
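As an informal companion to Definition 2.1, the following Python sketch shows one possible in-memory representation of such a graph together with a check of the conditions of the definition. The class name `Hypergraph`, the function `is_well_formed`, the rank table `RANK`, and the small example graph are assumptions made for illustration; they do not appear in the paper.

```python
from dataclasses import dataclass


@dataclass
class Hypergraph:
    nodes: set   # V
    edges: set   # E
    att: dict    # edge -> tuple of attached nodes, source first
    lab: dict    # edge -> label
    ext: tuple   # external nodes, first one listed first

    def src(self, e):
        return self.att[e][0]

    def tar(self, e):
        return self.att[e][1:]

    def size(self):
        # |G| is the sum of |att(e)| over all edges e
        return sum(len(self.att[e]) for e in self.edges)


def is_well_formed(g, rank):
    """Check the conditions of the definition for the rank table `rank`."""
    def non_repeating(seq):
        return len(set(seq)) == len(seq)

    if g.nodes & g.edges:
        return False  # nodes and edges must be disjoint
    if not g.ext or not non_repeating(g.ext) or not set(g.ext) <= g.nodes:
        return False
    later_externals = set(g.ext[1:])
    for e in g.edges:
        attached = g.att[e]
        if not attached or not non_repeating(attached):
            return False
        if not set(attached) <= g.nodes:
            return False
        if rank[g.lab[e]] != len(g.tar(e)):
            return False
        if g.src(e) in later_externals:
            return False  # sources must avoid the external nodes after the first
    return True


# Hypothetical example: nonterminal "A" of rank 2 and terminal "a" of rank 1.
RANK = {"A": 2, "a": 1}
example = Hypergraph(
    nodes={"s", "u", "v", "w"},
    edges={"e1", "e2"},
    att={"e1": ("s", "u", "v"), "e2": ("u", "w")},
    lab={"e1": "A", "e2": "a"},
    ext=("s", "v", "w"),
)
print(is_well_formed(example, RANK), example.size())
```

Storing each attachment as a tuple with the source in first position mirrors the notation $\mathit{att}(e) = (v, w)$ and makes the size a plain sum of tuple lengths.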

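Throughout the paper, if the components of a graph $G$ are not explicitly named, we denote them by $V_G$, $E_G$, $\mathit{att}_G$, etc. If the components of $G$ are given explicit names (and thus the subscript is dropped) we extend this in the obvious way to derived notations, dropping the subscript even there. We furthermore use the notation $\mathit{out}_G(v)$ to denote the set of all outgoing edges of a node $v \in V_G$, i.e., $\mathit{out}_G(v) = \{e \in E_G \mid \mathit{src}_G(e) = v\}$.

An isomorphism $h\colon G \to H$ is a pair of bijective mappings $(h_V\colon V_G \to V_H,\ h_E\colon E_G \to E_H)$ such that $\mathit{att}_H \circ h_E = h_V \circ \mathit{att}_G$, $\mathit{lab}_H \circ h_E = \mathit{lab}_G$, and $\mathit{ext}_H = h_V(\mathit{ext}_G)$. If such an isomorphism exists we write $G \cong H$ and say that the graphs are isomorphic.

A path of length $k \in \mathbb{N}$ from $u \in V$ to $e \in E$ in $G$ is a sequence $p = e_1 \cdots e_k \in E^+$ where $\mathit{src}(e_1) = u$, $\mathit{src}(e_{i+1}) \in [\mathit{tar}(e_i)]$ for all $i \in [k-1]$, and $e_k = e$. If furthermore $v$ is a node in $[\mathit{tar}(e_k)]$ then $pv$ is a path from $u$ to $v$. Both $p$ and $pv$ pass the nodes $\mathit{src}(e_2), \dots, \mathit{src}(e_k)$, and we say that $p$ contains $e_1, \dots, e_k$ as well as $\mathit{src}(e_1), \dots, \mathit{src}(e_k)$, while $pv$ additionally contains $v$. If $\mathit{src}(e_1) \in [\mathit{tar}(e_k)]$, the path is a cycle. We say that the path is a source path if $u = \dot{G}$.

A node $v$ or an edge $e$ is reachable from a node $u$ if $u = v$ or there is a path from $u$ to $v$ or from $u$ to $e$, respectively. We simply say that $v$ and $e$ are reachable in $G$ if they are reachable from $\dot{G}$. If $G$ is clear from the context we may just write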

² Recall that $[\ddot{G}]$ denotes the set of nodes occurring in $\ddot{G}$.

³ This simple definition of size is sufficient and appropriate for our purposes as the classes of grammars considered in the paper only generate connected hypergraphs, and by the definition of hypergraphs it holds that external nodes are pairwise distinct and $1 \le |\mathit{att}(e)| \le |V|$ for all hyperedges $e$. Thus, $|V| \le |G|$, $|E| \le |G|$, and $|\mathit{ext}| \le |G|$.

Fig. 1. Example drawing of a graph $G$. [Drawing not reproduced.]

“reachable” instead of “reachable in $G$”. Note that, by definition, paths are always directed, and thus all of these notions refer to directed paths.

The rank of $G = (V, E, \mathit{att}, \mathit{lab}, \mathit{ext})$ is $\mathit{rank}(G) = |\ddot{G}|$ and that of $e \in E$ is $\mathit{rank}_G(e) = \mathit{rank}(\mathit{lab}(e))$. The in-degree of a node $u \in V$ is $|\{e \in E \mid u \in [\mathit{tar}(e)]\}|$ and its out-degree is $|\{e \in E \mid \mathit{src}(e) = u\}|$. A node of out-degree 0 is a leaf, and a node $v$ of in-degree 0, such that every other node in $V$ is reachable from $v$, is a root. Thus, the root of a graph is unique if it exists. If it does, we say that $G$ is rooted. Note that, if the root is $\dot{G}$, then the whole graph $G$ is also reachable. Note furthermore that, by our general condition on the sources of edges, all nodes in $\ddot{G}$ are leaves. The reader should keep this fact in mind because we will occasionally make use of it without explicitly mentioning it.
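The notions of reachability, degree, leaf and root can be illustrated by a short Python sketch. It assumes, purely for illustration, that a graph is given by its node set and a dictionary `att` mapping each edge to its attachment tuple with the source first; the function names are hypothetical.

```python
def reachable_nodes(att, start):
    """Nodes reachable from `start` along directed paths (source to targets)."""
    targets_of = {}
    for attached in att.values():
        targets_of.setdefault(attached[0], []).append(attached[1:])
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for targets in targets_of.get(u, []):
            for v in targets:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return seen


def in_degree(att, u):
    # number of edges having u among their targets
    return sum(1 for attached in att.values() if u in attached[1:])


def out_degree(att, u):
    # number of edges having u as their source
    return sum(1 for attached in att.values() if attached[0] == u)


def find_root(nodes, att):
    """Return the root if the graph is rooted, and None otherwise."""
    for v in nodes:
        if in_degree(att, v) == 0 and reachable_nodes(att, v) == set(nodes):
            return v
    return None


# Hypothetical example: e1 leads from s to u and v, e2 leads from u to w.
NODES = {"s", "u", "v", "w"}
ATT = {"e1": ("s", "u", "v"), "e2": ("u", "w")}
print(find_root(NODES, ATT))
print(sorted(v for v in NODES if out_degree(ATT, v) == 0))
```

In the example, `s` is the root and `v` and `w` are leaves; a graph consisting of a single directed cycle has no node of in-degree 0 and therefore no root.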

For a label $A$ of rank $k$, we let $A^\bullet$ denote the graph $(\{0, \dots, k\}, \{e\}, \mathit{att}, \mathit{lab}, 0 \cdots k)$ such that $\mathit{att}(e) = 0 \cdots k$, and $\mathit{lab}(e) = A$.

2.3. Drawing conventions

We draw graphs as shown in Fig. 1: external nodes are depicted as bullets and non-external ones as circles. The node $\dot{G}$ is always the topmost bullet. An edge $e \in E_G$ is depicted as a box with the edge label inscribed, which can be dropped if it is not relevant. The attachment $\mathit{att}_G(e)$ is indicated by a line drawn from $\mathit{src}_G(e)$ to (the box representing) $e$, and arrows pointing from $e$ to the nodes in $\mathit{tar}_G(e)$. The arrows leave the box in the order in which they appear in $\mathit{tar}_G(e)$, from left to right. Similarly, the nodes in $\ddot{G}$ are arranged from left to right. For example, in the figure we have $\mathit{tar}_G(e) = uv$, $\dot{G} = s$, and $\ddot{G} = vw$.

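2.4. Hyperedge replacement

Let $H$ and $F$ be graphs and $e \in E_H$ such that $V_H \cap V_F = [\mathit{ext}_F]$, $E_H \cap E_F = \emptyset$, and $\mathit{att}_H(e) = \mathit{ext}_F$. The result of substituting $e$ by $F$ in $H$ is the graph $G = H[e : F]$ such that $G = (V_H \cup V_F, (E_H \cup E_F) \setminus \{e\}, \mathit{att}_G, \mathit{lab}_G, \mathit{ext}_H)$ with

$$
\mathit{att}_G(f) =
\begin{cases}
\mathit{att}_H(f) & \text{if } f \in E_H \setminus \{e\} \\
\mathit{att}_F(f) & \text{if } f \in E_F
\end{cases}
\qquad
\mathit{lab}_G(f) =
\begin{cases}
\mathit{lab}_H(f) & \text{if } f \in E_H \setminus \{e\} \\
\mathit{lab}_F(f) & \text{if } f \in E_F.
\end{cases}
$$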

For graphs $H$ and $F$ and an edge $e \in E_H$ with $\mathit{rank}_H(e) = \mathit{rank}(F)$ it should be clear that we may always choose an isomorphic copy $F'$ of $F$ such that $H[e : F']$ is defined. To avoid the cumbersome technicalities of constantly having to deal with explicit isomorphisms, we shall therefore always assume that $F$ itself fulfills the requirements. If it does not, it is assumed that $F$ is silently replaced by an appropriate isomorphic copy. Note that this is possible by our assumption that neither attachments of edges nor the sequences of external nodes of graphs contain repetitions.
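The substitution operation, including the silent choice of an isomorphic copy, can be sketched in Python as follows. The sketch assumes that a graph is a dictionary with the keys `nodes`, `edges`, `att`, `lab` and `ext`; this encoding, the helper `fresh_names` and the naming scheme for fresh nodes and edges are illustrative assumptions rather than anything prescribed by the paper.

```python
from itertools import count


def fresh_names(taken, prefix):
    """Yield names prefix0, prefix1, ... that do not occur in `taken`."""
    for i in count():
        name = f"{prefix}{i}"
        if name not in taken:
            yield name


def substitute(host, e, repl):
    """Return host[e : repl'], where repl' is a suitable isomorphic copy of repl."""
    if len(host["att"][e]) != len(repl["ext"]):
        raise ValueError("rank of the edge and rank of the graph differ")
    taken = set(host["nodes"]) | set(host["edges"])
    # external nodes of repl are identified with the attached nodes of e
    node_map = dict(zip(repl["ext"], host["att"][e]))
    new_nodes = fresh_names(taken, "n")
    for v in sorted(repl["nodes"], key=str):
        if v not in node_map:
            node_map[v] = next(new_nodes)  # all other nodes become fresh
    new_edges = fresh_names(taken, "e")
    edge_map = {f: next(new_edges) for f in sorted(repl["edges"], key=str)}
    # keep every edge of the host except e, then add the renamed edges of repl
    att = {f: a for f, a in host["att"].items() if f != e}
    lab = {f: l for f, l in host["lab"].items() if f != e}
    for f in repl["edges"]:
        att[edge_map[f]] = tuple(node_map[v] for v in repl["att"][f])
        lab[edge_map[f]] = repl["lab"][f]
    return {
        "nodes": set(host["nodes"]) | set(node_map.values()),
        "edges": (set(host["edges"]) - {e}) | set(edge_map.values()),
        "att": att,
        "lab": lab,
        "ext": tuple(host["ext"]),
    }


# Hypothetical host with one nonterminal edge x labeled A, attached to s, u, v.
HOST = {"nodes": {"s", "u", "v"}, "edges": {"x"},
        "att": {"x": ("s", "u", "v")}, "lab": {"x": "A"}, "ext": ("s", "v")}
# Hypothetical replacement graph with external nodes p, q, r and inner node m.
REPL = {"nodes": {"p", "q", "r", "m"}, "edges": {"f1", "f2"},
        "att": {"f1": ("p", "m", "q"), "f2": ("m", "r")},
        "lab": {"f1": "a", "f2": "b"}, "ext": ("p", "q", "r")}
result = substitute(HOST, "x", REPL)
print(sorted(result["att"].values()))
```

Renaming is unambiguous here because attachments and external node sequences contain no repetitions, so pairing the external nodes of the replacement graph with the attached nodes of the edge yields a well-defined mapping.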

For the remainder of the paper, we assume that $\mathrm{LAB}$ is partitioned into two disjoint subsets $\mathrm{LAB}_N$ and $\mathrm{LAB}_T$, both countably infinite, whose elements are called nonterminals and terminals, respectively. Naturally, a terminal (nonterminal) edge is an edge labeled by a terminal (nonterminal, respectively). We sometimes just call them terminals and nonterminals if there is no danger of confusion. By convention, we use capital letters to denote nonterminals, and lowercase letters for terminal symbols. Furthermore, we refer to the graph $H$ above, in which the replacement takes place, as the host graph.

Definition 2.2 (hyperedge replacement grammar). A hyperedge replacement grammar (HR grammar, for short) is a system $\mathcal{G} = (\Sigma, N, S, R)$ where $\Sigma \subseteq \mathrm{LAB}_T$, $N \subseteq \mathrm{LAB}_N$, $S \in N$ is the initial nonterminal, and $R$ is a set of rules, also called HR rules. Each rule is of the form $A \to F$ where $A \in N$ and $F$ is a graph over $\Sigma \cup N$ with $\mathit{rank}(F) = \mathit{rank}(A)$.

The size of $\mathcal{G}$ is $|\mathcal{G}| = \sum_{(A \to F) \in R} |F|$.
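As a small illustration of Definition 2.2, the following Python sketch computes the size of a grammar and checks the rank condition on its rules. The encoding of a rule as a pair of a nonterminal and a dictionary with the keys `att`, `lab` and `ext`, the function names, and the two example rules are assumptions made here for illustration.

```python
def graph_size(att):
    # |F| is the sum of |att(e)| over all edges e of F
    return sum(len(attached) for attached in att.values())


def grammar_size(rules):
    # the size of the grammar is the sum of |F| over all rules A -> F
    return sum(graph_size(rhs["att"]) for _lhs, rhs in rules)


def ranks_match(rules, rank):
    # rank(F) counts the external nodes of F after the first one
    return all(rank[lhs] == len(rhs["ext"]) - 1 for lhs, rhs in rules)


# Hypothetical grammar with nonterminals S (rank 0), A (rank 1), terminal a (rank 1).
RANK = {"S": 0, "A": 1, "a": 1}
RULES = [
    ("S", {"att": {"e": ("0", "1")},
           "lab": {"e": "A"},
           "ext": ("0",)}),
    ("A", {"att": {"f": ("0", "2"), "g": ("2", "1")},
           "lab": {"f": "a", "g": "a"},
           "ext": ("0", "1")}),
]
print(grammar_size(RULES), ranks_match(RULES, RANK))
```

Measuring the grammar by the total attachment length of its right-hand sides matches the size measure used for input graphs, so both parameters of a uniform running-time bound are expressed in the same unit.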

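For graphs $G, H$, we let $H \Rightarrow_R G$ if there exist a rule $A \to F \in R$ and an edge $e \in E_H$ with $\mathit{lab}(e) = A$ such that $G = H[e : F]$. As usual, $\Rightarrow_R^*$ denotes the reflexive transitive closure of $\Rightarrow_R$. If there is no danger of confusion we often write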
