Journal of Computer and System Sciences
Uniform parsing for hyperedge replacement grammars
Henrik Björklund a, Frank Drewes a,∗, Petter Ericson b,a, Florian Starke c
a Department of Computing Science, Umeå University, Sweden
b Digital and Cognitive Musicology Lab, École Polytechnique Fédérale de Lausanne, Switzerland
c Faculty of Computer Science, TU Dresden, Germany
Article info
Article history: Received 24 April 2018; Accepted 26 October 2020; Available online 27 November 2020
Keywords: Graph grammar; Hyperedge replacement; Uniform parsing; Complexity; Natural Language Processing; Meaning representation
Abstract
It is well known that hyperedge-replacement grammars can generate NP-complete graph languages even under seemingly harsh restrictions. This means that the parsing problem is difficult even in the non-uniform setting, in which the grammar is considered to be fixed rather than being part of the input. Little is known about restrictions under which truly uniform polynomial parsing is possible. In this paper we propose a low-degree polynomial-time algorithm that solves the uniform parsing problem for a restricted type of hyperedge-replacement grammars which we expect to be of interest for practical applications.
© 2020 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
1. Introduction
Hyperedge-replacement grammars (HR grammars, for short) are context-free graph grammars that were introduced in [3,18], see also [17,11]. They represent one of the two most successful formal models for the description of graph languages (the other being confluent node-replacement grammars), because of their favorable algorithmic and language-theoretic properties, which closely resemble those of context-free string grammars. Unfortunately, the similarities between the string and graph cases fail to extend to one of the most important computational problems in the context of formal languages: the parsing problem. It has been known for a long time that even the non-uniform membership problem for context-free graph languages is intractable (unless P = NP). In particular, there are hyperedge replacement graph languages which are NP-complete [1,19]. Severe restrictions must be placed on the grammars in order to make at least non-uniform polynomial parsing possible. Early results in this regard can be found in [20,21,10]. In [20] the degree of the polynomial that bounds the running time varies with the language. The algorithm in [21], which considers only edge replacement, and its generalization to hyperedge replacement by [10] are cubic in the size of the input graph, but depend exponentially on the grammar if considered in a uniform setting. Moreover, the restrictions [21] and [10] placed on the considered graph languages are very strong, and it was shown in [9] that even a slight relaxation results in NP-completeness again. For these reasons, these parsing algorithms are mainly of theoretical interest.
In recent years the question of efficiently parsing hyperedge replacement languages received renewed interest, because hyperedge replacement was proposed as a suitable mechanism for describing sentence semantics in natural language processing, and in particular the abstract meaning representation (AMR) proposed in [2]. Regarding the use of hyperedge replacement in this application area, see [7]. The same paper described a general recognition algorithm together with a detailed complexity analysis. Unsurprisingly, the running time of the algorithm is exponential even in the non-uniform case, one of the exponents being the maximum degree of nodes in the input graph. The same is true for the recent algorithm by [16], which implements parsing for so-called regular graph grammars.
∗ Corresponding author.
E-mail addresses: henrikb@cs.umu.se (H. Björklund), drewes@cs.umu.se (F. Drewes), petter.ericson@epfl.ch (P. Ericson), Florian.Starke@tu-dresden.de (F. Starke).
https://doi.org/10.1016/j.jcss.2020.10.002
Unfortunately, the node degree is one of the parameters one would ideally not wish to limit, since meaning representations do not have bounded node degree. Moreover, natural language processing often has to deal with algorithmic learning situations in which large corpora must be parsed and grammars adjusted in an iterative process. Thus, truly uniform polynomial-time solutions would be valuable, provided that the polynomials have a reasonably low degree and the restrictions on the grammars are “natural”.
Parsing a graph $G$ with respect to a given HR grammar $\mathcal{G}$ means to check whether there is a derivation tree in $\mathcal{G}$ that yields $G$. Thus, the task is to decompose $G$ recursively into subgraphs that can be generated from the nonterminals of $\mathcal{G}$. Intuitively, the NP-completeness of the problem comes from the fact that a graph has exponentially many subgraphs. This is the main difference between graph and string parsing. In the latter case, the well-known dynamic programming approach by Cocke, Kasami, and Younger is efficient because a string has only quadratically many substrings. One way to achieve polynomial parsing in the graph case as well is to make sure that only polynomially many decompositions are possible candidates for well-formed derivation trees. In this paper we achieve this by imposing restrictions on $\mathcal{G}$ which guarantee that the overall shape of a suitable decomposition of $G$ can be “read off” $G$ itself. Intuitively, what remains is to check whether appropriate rules of $\mathcal{G}$ can be assigned to the vertices of this decomposition in order to turn it into a derivation tree.
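To make the string-side comparison concrete, the following is a minimal sketch of the dynamic programming recognizer by Cocke, Kasami, and Younger for a string grammar in Chomsky normal form. It is not part of the algorithm developed in this paper; the function name `cyk_recognize` and the dictionary encoding of the grammar (`unary`, `binary`) are assumptions made only for this illustration. The table has one cell per non-empty substring, which is the quadratic bound mentioned above.

```python
from itertools import product


def cyk_recognize(word, unary, binary, start):
    """Return True iff `word` is derivable from the nonterminal `start`.

    Hypothetical grammar encoding (Chomsky normal form):
      unary:  terminal a   -> set of nonterminals A with a rule A -> a
      binary: pair (B, C)  -> set of nonterminals A with a rule A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # the empty word is not handled in this sketch
    # table[i][j] collects the nonterminals that derive word[i:j+1].
    # Only cells with i <= j are used: one per non-empty substring.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, a in enumerate(word):
        table[i][i] = set(unary.get(a, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):  # split into word[i:k+1] and word[k+1:j+1]
                for b, c in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary.get((b, c), set())
    return start in table[0][n - 1]
```

For graphs, the analogous table would need one cell per subgraph, and there can be exponentially many of those; this is the obstacle that the restrictions on $\mathcal{G}$ are meant to remove.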
An attempt at a set of restrictions serving this purpose was made in [5]. Motivated by the fact that meaning representations such as AMR are typically acyclic, HR grammars were considered that generate directed acyclic graphs. However, as acyclicity alone does not make parsing any easier, additional conditions were placed on the form of the rules. In the present paper, we generalize the approach: the generated graphs may have cycles, the allowed rules are considerably more general, and the restrictions are fewer and formulated in an axiomatic way which allows for different concretizations. We impose two conditions on our grammars, called reentrancy preservation and order preservation. The latter is relative to an ordering of the nodes of input graphs that can be instantiated in different ways.
We expect our parsing algorithm to be useful for describing and processing languages of semantic graphs such as AMR. In fact, typical structures occurring in such graphs provided the starting point for the development of the restrictions proposed in this paper. Along with the formal development of the notions and results leading to our parsing algorithm, we try to illustrate the potential usefulness of our grammars by means of a small running example that stretches from Section 2 to Section 6. It shows how to generate a language of AMR-like semantic graphs by an HR grammar that satisfies our conditions for uniform polynomial parsing (Examples 2.4, 4.3 and 5.4, as well as Section 6.1).
Let us briefly describe the idea behind the restrictions we impose on HR grammars to make them efficiently parsable. When working with hyperedge replacement, a nonterminal hyperedge is a placeholder attached to a sequence of nodes. This placeholder will eventually be replaced by a subgraph that shares the attached nodes of the hyperedge (and only those) with the rest of the generated graph. One difficulty parsing has to face is that, after the replacement of a hyperedge, it may not be visible in the resulting graph which nodes the replaced hyperedge had been attached to. Reentrancy preservation, which is the first condition we describe in this paper, makes it possible to recover this set of nodes solely from the structure of the generated graph.
One difficulty remains: even if the attached nodes of a nonterminal hyperedge can uniquely be recovered, it may still be unclear in which order they had been attached to the hyperedge. This is what is avoided by the condition of order preservation. It ensures, for example, that a rule cannot replace a nonterminal hyperedge by another nonterminal hyperedge attached to the same nodes but in a different order.
Thanks to the two restrictions, we obtain a uniform parsing algorithm which is roughly quadratic in both the size of the grammar and that of the input graph.¹
As a final note on related work, we mention here that another recent approach to efficient parsing for HR grammars was presented in [13,14], where predictive top-down and bottom-up parsers are proposed, generalizing techniques from compiler construction to the graph case. The approach thus differs from ours in that it yields a parser generator which, with only the grammar as input, constructs a quadratic parser for the specific language generated by that grammar. Provided that the grammar analysis can be performed in polynomial time (which depends on the exact variant of the parser generator used), this approach is thus uniformly polynomial as well.
The next section compiles the basic notions relevant to hyperedge replacement grammars. Sections 3 and 4 define and study reentrancy and order preservation, respectively. Section 5 presents one possible concretization of our abstract notion of preserved orders. The parsing algorithm and the main result of this paper are presented in Section 6, and Section 7 concludes the paper.
¹ The exact running time depends on how efficiently the chosen order can be computed.
2. Preliminaries
The set of non-negative integers is denoted by $\mathbb{N}$. For $n \in \mathbb{N}$, $[n]$ denotes $\{1, \dots, n\}$. Given a set $S$, $S^*$ denotes the set of all finite sequences over $S$, and $S^\circledast$ denotes the set of non-repeating sequences in $S^*$, i.e. those sequences in which no element of $S$ occurs twice. The empty sequence is denoted by $\varepsilon$, $S^+ = S^* \setminus \{\varepsilon\}$, and $S^\oplus = S^\circledast \setminus \{\varepsilon\}$. The length of a sequence $w \in S^*$ is denoted by $|w|$, and $[w]$ denotes the smallest subset $A$ of $S$ such that $w \in A^*$. The canonical extensions of a mapping $f\colon S \to T$ to $S^*$ and to the powerset of $S$ are denoted by $f$ as well, i.e., $f(a_1 \cdots a_k) = f(a_1) \cdots f(a_k)$ for $a_1, \dots, a_k \in S$, and $f(S') = \{f(a) \mid a \in S'\}$ for $S' \subseteq S$. A sequence $sw \in S^*$ with $s \in S$ may also be denoted by $(s, w)$.
2.1. Ordering subsets of a set
As mentioned in the introduction, one of the prerequisites of our parsing algorithm is a way to order various subsets of the nodes of an input graph. Consider an arbitrary binary relation $\prec$ on a set $S$. Given a sequence $w = s_1 \cdots s_k \in S^*$, we say that $w$ is ordered by $\prec$ if $s_i \prec s_{i+1}$ for all $i \in [k-1]$, and moreover, $s_i \prec s_j$ implies $i < j$ for all $i, j \in [k]$. We furthermore say that $\prec$ orders a subset $A \subseteq S$ if the elements of $A$ can be arranged in a sequence $w$ which is ordered by $\prec$. Clearly, if $\prec$ orders $A$, this sequence $w$ is uniquely determined. In the following, we denote this sequence by $A_\prec$ (provided that $\prec$ indeed orders $A$). Note that, for the sake of generality, we place no further restrictions on $\prec$, and it is thus neither necessarily an order on $A$ (where it may lack transitivity) nor on $S$ (where it may be otherwise entirely arbitrary).
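The definition can be checked mechanically. The following sketch, under the assumption that the relation is given as a Python predicate `prec(a, b)`, tests whether a sequence is ordered by the relation and tries to arrange a finite set into the unique ordered sequence. The names `is_ordered_by` and `ordered_sequence` are hypothetical and chosen for this illustration only. The construction relies on the observation that, in an ordered sequence, the next element is the only remaining one without a predecessor among the remaining elements.

```python
def is_ordered_by(w, prec):
    """Check both conditions of the definition for the sequence w."""
    k = len(w)
    # Condition 1: consecutive elements are related.
    consecutive = all(prec(w[i], w[i + 1]) for i in range(k - 1))
    # Condition 2: whenever w[i] is related to w[j], i comes strictly before j.
    forward_only = all(
        i < j for i in range(k) for j in range(k) if prec(w[i], w[j])
    )
    return consecutive and forward_only


def ordered_sequence(A, prec):
    """Return the unique sequence over A ordered by prec, or None if none exists."""
    remaining = set(A)
    w = []
    while remaining:
        # In an ordered sequence, exactly one remaining element has no
        # predecessor among the remaining elements: the next one.
        candidates = [
            s for s in remaining if not any(prec(t, s) for t in remaining)
        ]
        if len(candidates) != 1:
            return None
        w.append(candidates[0])
        remaining.remove(candidates[0])
    return w if is_ordered_by(w, prec) else None
```

With the successor relation on integers, which is not transitive, the set {1, 2, 3} is ordered while {1, 3} is not, since 1 and 3 are unrelated.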
2.2. Hypergraphs
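Throughout this paper, we fix a countably infinite supply $\mathrm{LAB}$ of symbols called labels, such that every $\sigma \in \mathrm{LAB}$ has a unique rank $\mathit{rank}(\sigma) \in \mathbb{N}$. Similarly, we fix countably infinite supplies $\mathcal{V}$ and $\mathcal{E}$ of vertices and hyperedges, respectively.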
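Definition 2.1 (hypergraph). A (directed hyperedge-labeled) hypergraph over $\Sigma \subseteq \mathrm{LAB}$ is a tuple $G = (V, E, \mathit{att}, \mathit{lab}, \mathit{ext})$ with the following components:
• $V \subseteq \mathcal{V}$ and $E \subseteq \mathcal{E}$ are disjoint finite sets of nodes and hyperedges, respectively.
• The attachment $\mathit{att}\colon E \to V^\oplus$ assigns to each hyperedge $e$ a sequence of attached nodes. For $e \in E$ with $\mathit{att}(e) = (v, w)$ we also denote $v$ by $\mathit{src}(e)$ and $w$ by $\mathit{tar}(e)$, calling them the source and the sequence of targets of $e$, respectively.
• The labeling $\mathit{lab}\colon E \to \Sigma$ assigns a label to each hyperedge, subject to the condition that $\mathit{rank}(\mathit{lab}(e)) = |\mathit{tar}(e)|$ for every $e \in E$.
• The sequence $\mathit{ext} \in V^\oplus$ is the sequence of external nodes. If $\mathit{ext}_G = (v, w)$, then we denote the node $v$ by $\dot{G}$ and the sequence $w$ of nodes by $\ddot{G}$, respectively, and we impose the additional requirement that $\mathit{src}(e) \notin [\ddot{G}]$ for all $e \in E$.²
The size $|G|$ of $G$ is $\sum_{e \in E} |\mathit{att}(e)|$.³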
Note that we forbid $\mathit{att}(e)$ (for $e \in E$) to contain any node repeatedly. In the following, we simply call hyperedges edges and hypergraphs graphs. Our division of the attachment of every edge into a single source node and any number of target nodes is similar to that used in the literature on term (hyper)graphs. It makes it meaningful to speak about directed paths (defined below). Our graphs are, however, more general than term graphs in that we, for the moment, do not impose further structural conditions on them.
Throughout the paper, if the components of a graph $G$ are not explicitly named, we denote them by $V_G$, $E_G$, $\mathit{att}_G$, etc. If the components of $G$ are given explicit names (and thus the subscript is dropped) we extend this in the obvious way to derived notations, dropping the subscript even there. We furthermore use the notation $\mathit{out}_G(v)$ to denote the set of all outgoing edges of a node $v \in V_G$, i.e., $\mathit{out}_G(v) = \{e \in E_G \mid \mathit{src}_G(e) = v\}$.
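As an illustration of Definition 2.1 and the derived notations, here is a minimal sketch of one possible in-memory encoding of a hypergraph. The class `Hypergraph`, the function `is_well_formed`, and the choice of storing attachments as tuples with the source first are assumptions of this sketch, not something the paper prescribes.

```python
from dataclasses import dataclass


@dataclass
class Hypergraph:
    """Hypothetical encoding of a hypergraph (V, E, att, lab, ext)."""
    nodes: set   # V
    edges: set   # E
    att: dict    # edge -> tuple (source, target_1, ..., target_k)
    lab: dict    # edge -> label
    ext: tuple   # external nodes: first external node, then the others

    def src(self, e):
        return self.att[e][0]

    def tar(self, e):
        return self.att[e][1:]

    def out(self, v):
        """Set of outgoing edges of node v."""
        return {e for e in self.edges if self.src(e) == v}

    def size(self):
        """Sum of the attachment lengths over all edges."""
        return sum(len(self.att[e]) for e in self.edges)


def is_well_formed(g, rank):
    """Check the conditions of Definition 2.1; `rank` maps labels to ranks."""
    def nonempty_nonrepeating(seq):
        return (len(seq) >= 1 and len(set(seq)) == len(seq)
                and set(seq) <= g.nodes)

    if g.nodes & g.edges:
        return False  # V and E must be disjoint
    if not nonempty_nonrepeating(g.ext):
        return False
    later_external = set(g.ext[1:])
    for e in g.edges:
        if not nonempty_nonrepeating(g.att[e]):
            return False
        if rank[g.lab[e]] != len(g.tar(e)):
            return False
        if g.src(e) in later_external:
            return False  # only the first external node may be a source
    return True
```

Storing the source at position 0 keeps the split of each attachment into source and targets implicit in the tuple.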
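An isomorphism $h\colon G \to H$ is a pair of bijective mappings $(h_V\colon V_G \to V_H,\ h_E\colon E_G \to E_H)$ such that $\mathit{att}_H \circ h_E = h_V \circ \mathit{att}_G$, $\mathit{lab}_H \circ h_E = \mathit{lab}_G$, and $\mathit{ext}_H = h_V(\mathit{ext}_G)$. If such an isomorphism exists we write $G \equiv H$ and say that the graphs are isomorphic.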
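A path of length $k \in \mathbb{N}$ from $u \in V$ to $e \in E$ in $G$ is a sequence $p = e_1 \cdots e_k \in E^+$ where $\mathit{src}(e_1) = u$, $\mathit{src}(e_{i+1}) \in [\mathit{tar}(e_i)]$ for all $i \in [k-1]$, and $e_k = e$. If furthermore $v$ is a node in $[\mathit{tar}(e_k)]$ then $pv$ is a path from $u$ to $v$. Both $p$ and $pv$ pass the nodes $\mathit{src}(e_2), \dots, \mathit{src}(e_k)$, and we say that $p$ contains $e_1, \dots, e_k$ as well as $\mathit{src}(e_1), \dots, \mathit{src}(e_k)$, while $pv$ additionally contains $v$. If $\mathit{src}(e_1) \in [\mathit{tar}(e_k)]$, the path is a cycle. We say that the path is a source path if $u = \dot{G}$, the first external node of $G$.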
A node $v$ or an edge $e$ is reachable from a node $u$ if $u = v$ or there is a path from $u$ to $v$ or from $u$ to $e$, respectively. We simply say that $v$ and $e$ are reachable in $G$ if they are reachable from $\dot{G}$. If $G$ is clear from the context we may just write “reachable” instead of “reachable in $G$”. Note that, by definition, paths are always directed, and thus all of these notions refer to directed paths.
Fig. 1. Example drawing of a graph $G$.
² Recall that $[\ddot{G}]$ denotes the set of nodes occurring in $\ddot{G}$.
³ This simple definition of size is sufficient and appropriate for our purposes as the classes of grammars considered in the paper only generate connected hypergraphs, and by the definition of hypergraphs it holds that external nodes are pairwise distinct and $1 \le |\mathit{att}(e)| \le |V|$ for all hyperedges $e$. Thus, $|V| \le |G|$, $|E| \le |G|$, and $|\mathit{ext}| \le |G|$.
The rank of $G = (V, E, \mathit{att}, \mathit{lab}, \mathit{ext})$ is $\mathit{rank}(G) = |\ddot{G}|$ and that of $e \in E$ is $\mathit{rank}_G(e) = \mathit{rank}(\mathit{lab}(e))$. The in-degree of a node $u \in V$ is $|\{e \in E \mid u \in [\mathit{tar}(e)]\}|$ and its out-degree is $|\{e \in E \mid \mathit{src}(e) = u\}|$. A node of out-degree 0 is a leaf, and a node $v$ of in-degree 0, such that every other node in $V$ is reachable from $v$, is a root. Thus, the root of a graph is unique if it exists. If it does, we say that $G$ is rooted. Note that, if the root is $\dot{G}$, then the whole graph $G$ is also reachable. Note furthermore that, by our general condition on the sources of edges, all nodes in $\ddot{G}$ are leaves. The reader should keep this fact in mind because we will occasionally make use of it without explicitly mentioning it.
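The notions of reachability, degree, and root translate directly into a breadth-first search. The sketch below assumes, purely for illustration, that the attachment is given as a dictionary `att` mapping each edge to a tuple whose first component is the source and whose remaining components are the targets; the function names are hypothetical.

```python
from collections import deque


def reachable_nodes(att, start):
    """Nodes reachable from `start` along directed paths (including `start`)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for attached in att.values():
            if attached[0] == u:          # edge with source u
                for v in attached[1:]:    # its targets
                    if v not in seen:
                        seen.add(v)
                        queue.append(v)
    return seen


def in_degree(att, u):
    """Number of edges having u among their targets."""
    return sum(1 for attached in att.values() if u in attached[1:])


def out_degree(att, u):
    """Number of edges having u as their source."""
    return sum(1 for attached in att.values() if attached[0] == u)


def find_root(nodes, att):
    """Return the root (in-degree 0, reaches every node) or None."""
    for v in nodes:
        if in_degree(att, v) == 0 and reachable_nodes(att, v) == set(nodes):
            return v
    return None
```

An edge is reachable from a node exactly when its source is, so the set of reachable edges can be read off the result of `reachable_nodes`. The scan over all edges in each step keeps the sketch short; an index from sources to outgoing edges would avoid the repeated scan.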
For a label $A$ of rank $k$, we let $A^\bullet$ denote the graph $(\{0, \dots, k\}, \{e\}, \mathit{att}, \mathit{lab}, 0 \cdots k)$ such that $\mathit{att}(e) = 0 \cdots k$, and $\mathit{lab}(e) = A$.
2.3. Drawing conventions
We draw graphs as shown in Fig. 1: external nodes are depicted as bullets and non-external ones as circles. The first external node $\dot{G}$ is always the topmost bullet. An edge $e \in E_G$ is depicted as a box with the edge label inscribed, which can be dropped if it is not relevant. The attachment $\mathit{att}_G(e)$ is indicated by a line drawn from $\mathit{src}_G(e)$ to (the box representing) $e$, and arrows pointing from $e$ to the nodes in $\mathit{tar}_G(e)$. The arrows leave the box in the order in which they appear in $\mathit{tar}_G(e)$, from left to right. Similarly, the remaining external nodes in $\ddot{G}$ are arranged from left to right. For example, in the figure we have $\mathit{tar}_G(e) = uv$, $\dot{G} = s$, and $\ddot{G} = vw$.
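2.4. Hyperedge replacement
Let $H$ and $F$ be graphs and $e \in E_H$ such that $V_H \cap V_F = [\mathit{ext}_F]$, $E_H \cap E_F = \emptyset$, and $\mathit{att}_H(e) = \mathit{ext}_F$. The result of substituting $e$ by $F$ in $H$ is the graph $G = H[e : F]$ such that $G = (V_H \cup V_F, (E_H \cup E_F) \setminus \{e\}, \mathit{att}_G, \mathit{lab}_G, \mathit{ext}_H)$ with
$$\mathit{att}_G(f) = \begin{cases} \mathit{att}_H(f) & \text{if } f \in E_H \setminus \{e\} \\ \mathit{att}_F(f) & \text{if } f \in E_F \end{cases} \qquad \mathit{lab}_G(f) = \begin{cases} \mathit{lab}_H(f) & \text{if } f \in E_H \setminus \{e\} \\ \mathit{lab}_F(f) & \text{if } f \in E_F. \end{cases}$$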
For graphs $H$ and $F$ and an edge $e \in E_H$ with $\mathit{rank}_H(e) = \mathit{rank}(F)$ it should be clear that we may always choose an isomorphic copy $F'$ of $F$ such that $H[e : F']$ is defined. To avoid the cumbersome technicalities of constantly having to deal with explicit isomorphisms, we shall therefore always assume that $F$ itself fulfills the requirements. If it does not, it is assumed that $F$ is silently replaced by an appropriate isomorphic copy. Note that this is possible by our assumption that neither attachments of edges nor the sequences of external nodes of graphs contain repetitions.
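The substitution operation can be sketched in a few lines. The encoding of graphs as dictionaries with the keys `V`, `E`, `att`, `lab`, and `ext`, as well as the function name `substitute`, are assumptions of this sketch. In contrast to the convention adopted in the text, the sketch does not silently rename $F$: it expects the caller to supply a copy that already satisfies the three side conditions and raises an error otherwise.

```python
def substitute(H, e, F):
    """Return H[e:F] for graphs given as dicts with keys V, E, att, lab, ext.

    No isomorphic copy of F is created here: the caller must pass an F
    that already satisfies the three side conditions checked below.
    """
    if H["V"] & F["V"] != set(F["ext"]):
        raise ValueError("H and F must share exactly the external nodes of F")
    if H["E"] & F["E"]:
        raise ValueError("the edge sets of H and F must be disjoint")
    if tuple(H["att"][e]) != tuple(F["ext"]):
        raise ValueError("att_H(e) must equal ext_F")
    kept = H["E"] - {e}
    # Attachments and labels are inherited from H (without e) and from F.
    att = {f: H["att"][f] for f in kept}
    att.update({f: F["att"][f] for f in F["E"]})
    lab = {f: H["lab"][f] for f in kept}
    lab.update({f: F["lab"][f] for f in F["E"]})
    return {
        "V": H["V"] | F["V"],
        "E": kept | F["E"],
        "att": att,
        "lab": lab,
        "ext": H["ext"],  # the external nodes are those of the host graph
    }
```

For instance, replacing a single edge attached to the nodes 1 and 2 by a two-edge graph that runs from 1 via a fresh node 3 to 2 yields a graph with three nodes and the two new edges.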
For the remainder of the paper, we assume that $\mathrm{LAB}$ is partitioned into two disjoint subsets $\mathrm{LAB}_N$ and $\mathrm{LAB}_T$, both countably infinite, whose elements are called nonterminals and terminals, respectively. Naturally, a terminal (nonterminal) edge is an edge labeled by a terminal (nonterminal, respectively). We sometimes just call them terminals and nonterminals if there is no danger of confusion. By convention, we use capital letters to denote nonterminals, and lowercase letters for terminal symbols. Furthermore, we refer to the graph $H$ above, in which the replacement takes place, as the host graph.
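Definition 2.2 (hyperedge replacement grammar). A hyperedge replacement grammar (HR grammar, for short) is a system $\mathcal{G} = (\Sigma, N, S, R)$ where $\Sigma \subseteq \mathrm{LAB}_T$, $N \subseteq \mathrm{LAB}_N$, $S \in N$ is the initial nonterminal, and $R$ is a set of rules, also called HR rules. Each rule is of the form $A \to F$ where $A \in N$ and $F$ is a graph over $\Sigma \cup N$ with $\mathit{rank}(F) = \mathit{rank}(A)$.
The size of $\mathcal{G}$ is $|\mathcal{G}| = \sum_{(A \to F) \in R} |F|$.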
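For graphs $G, H$, we let $H \Rightarrow_R G$ if there exist a rule $A \to F \in R$ and an edge $e \in E_H$ with $\mathit{lab}(e) = A$ such that $G = H[e : F]$. As usual, $\Rightarrow_R^*$ denotes the reflexive transitive closure of $\Rightarrow_R$. If there is no danger of confusion we often write $\Rightarrow$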