Ontology Slice Generation and Alignment for Enhanced Life Science Literature Search


(1) Institutionen för datavetenskap, Department of Computer and Information Science. Final thesis: Ontology Slice Generation and Alignment for Enhanced Life Science Literature Search, by Jonas Bergman Laurila. LIU-IDA/LITH-EX-A--09/002--SE, 2009-01-26. Linköpings universitet, SE-581 83 Linköping, Sweden.


(3) Supervisor: Patrick Lambrix, IDA, Linköpings universitet. Examiner: Patrick Lambrix, IDA, Linköpings universitet.


(5) Abstract

Query composition is often a complicated and cumbersome task for persons performing a literature search. This thesis is part of a project which aims to present possible queries to the user in the form of natural language expressions. The thesis presents methods of ontology slice generation. Slices are parts of ontologies connecting two concepts along all possible paths between them. Those slices hence represent all relevant queries connecting the concepts, and the paths can in a later step be translated into natural language expressions. Methods of slice alignment, connecting slices that originate from different ontologies, are also presented. The thesis concludes with some example scenarios and comparisons to related work.


(7) Acknowledgement

Thanks go to Patrick Lambrix, for supervising; José M. Peña, for a lecture about adjacency matrices and the usage of ditto; Maria Lingemark, for the opposition; and family and friends, for constantly asking harassing questions like "Aren't you supposed to have finished that master thesis by now?".


(9) Contents

1 Introduction
  1.1 Problem statement and contribution
  1.2 Thesis structure
2 Background
  2.1 Ontologies
    2.1.1 Definition
    2.1.2 Components
    2.1.3 OWL, RDF and XML
    2.1.4 Examples
    2.1.5 Ontology Alignment
    2.1.6 Instantiation
    2.1.7 Description logics and reasoning
  2.2 Natural Language Processing
    2.2.1 Natural Language Generation
3 Slice creation
  3.1 An ontology traversing method
  3.2 A matrix method
  3.3 Method comparisons: Advantages and disadvantages
    3.3.1 Reuse of data
    3.3.2 Matrix multiplication
  3.4 Implementation
    3.4.1 Libraries
4 Slice alignment
  4.1 Align slices using graphs
  4.2 Align slices using matrices
  4.3 Large graphs creation versus merging of ontologies
  4.4 Implementation
5 Results
  5.1 Slice creation
    5.1.1 The ghrelin scenario
    5.1.2 Statements retrieved from the Pathway Ontology
  5.2 Slice alignment
    5.2.1 The inflammation scenario
    5.2.2 Statements retrieved from the SIGNAL-ONTOLOGY and the Gene Ontology
6 Conclusion
  6.1 Comparison with related work
  6.2 Future work
Bibliography

(11) Chapter 1. Introduction

This master thesis is part of a larger project proposed and led by Patrick Lambrix¹ and Christopher J.O. Baker². The project aims, as the title implies, to enhance life science literature search. One of the major problems in search is that the users of information lack knowledge of query technologies and of the underlying structures of data. For example, a biologist may have questions and ideas formulated in the language of his or her own domain but might not be able to use those ideas and questions to compose a query. Another problem is that a user might lack the knowledge in his or her own domain needed to formulate a question even in natural language.

In today's systems, if a user performs a keyword search, e.g. 'enzyme', all documents that contain the word 'enzyme' are retrieved. In more advanced systems, documents containing subclasses of enzyme are also retrieved. But the user might also be interested in concepts closely related to enzyme. The problem is often that he or she lacks prior knowledge of related concepts and cannot formulate an advanced enough query. This problem can be solved by using ontologies, which contain such information. A system using such knowledge could produce expressions related to 'enzyme' and present them to the user; an example could be 'which enzyme has been reported to be found in fungi, and acts on substrate?'. A click on the expression would produce answers in the form of a table with instances, in this case different kinds of enzymes, fungi and substrates. Links to documents containing provenance information, for either the complete expression or parts of it, are also displayed.

A first goal of the project as a whole is to provide the user with possible queries, presented in the form of natural language expressions, and thus to support the user with hints and surveys of what can actually be found in large sets of scientific literature. If possible, direct answers to the queries presented should also be given. Another goal is to create expressions containing terms from different ontologies, to connect different fields and topics.

Previous work has been done, mostly in the form of two systems: KnowleFinder [Ang et al., 2008] and SAMBO [Lambrix and Tan, 2006]. KnowleFinder is an existing system which uses a narrower idea than the one presented above, narrower mostly because its graph mining algorithms do not make use of all ontology-contained knowledge. Aligned ontologies are not used in KnowleFinder either. It is still a great start though. SAMBO is a system that is used to align and merge biomedical ontologies. (Read more about ontologies and the alignment of ditto in section 2.1.) SAMBO could produce the alignments needed in future graph-mining algorithms (Slice Aligner in the list below).

To reach the goals of this project, technologies from both research groups are needed. A list of modules needed in a future system is given below. Two of them, Slice Generator and Slice Aligner, are investigated in more detail in this thesis.

• Named Entity Recognizer. Recognizes and normalizes named entities (ontology terms) in text.
• Ontology Instantiator. Instantiates ontologies with text segments based on the occurrence of the ontology terms in the text segments. Read more about ontology instantiation in section 2.1.6.
• Slice Generator. Finds the paths in an ontology connecting (two) given concepts in the ontology. Read more about slice creation in chapter 3.
• Slice Aligner. Finds connections between two slices from different ontologies. Read more about slice alignment in chapter 4.
• Annotator. Annotates the documents and segments using aligned slices, based on the occurrence of ontology terms in the documents/segments and slices.
• NLP Slice Translator. Generates expressions in natural language, based on the slice-contained pathways. Read more about NLP in section 2.2.
• Query Engine. Answers queries previously generated or entered by the user.

¹ Department of Computer and Information Science, Linköpings universitet, Sweden.
² Data Mining Department, Institute for Infocomm Research, A-STAR, Singapore.

1.1. Problem statement and contribution

In order to create a Slice Generator and a Slice Aligner, some questions had to be considered. What should an ontology slice consist of? How much information should it contain? How can slices be aligned and for what purpose? What previous work has been done and, if any exists, how well does it fit our problem? Can it be altered to become a suitable solution or do I have to start over from scratch?

In the case of slice generation, previous work had been done in the form of an algorithm named ARQ, which can retrieve paths between two concepts in an ontology. To create a solution more adequate to our project I altered the ARQ algorithm to make use of more ontology-contained information (An ontology traversing method, section 3.1). I also created an alternative solution with a different approach, using matrices as representations of ontologies (A matrix method, section 3.2).

Although previous work on ontology alignment has been done, slice alignment is something new. Therefore I have stated a first, fairly narrow, approach to slice alignment in my thesis. The algorithms I created to make use of this approach retrieve paths between the most important slice-contained concepts: the search concepts which were previously used to create the actual slices. Those paths can then extend the slice-contained paths, to cover concepts occurring in different fields or in different aspects of the same field.

1.2. Thesis structure

This first chapter presents the thesis in short. The second chapter covers background information about ontologies and natural language processing. The third and fourth chapters cover the methods of slice creation and alignment respectively, with justifications, comparisons and words about the implementation. Chapter five shows some results connected to example scenarios. The last chapter is a conclusion, including comparisons with related work and words about future work.


(15) Chapter 2. Background

2.1. Ontologies

2.1.1. Definition

There are different definitions of ontologies. The word ontology was first used in philosophy and was then borrowed by researchers in Artificial Intelligence and Knowledge Representation. I quote one of the best-known definitions, by Tom Gruber [Gruber, 1993]:

"An ontology is an explicit specification of a conceptualization. The term is borrowed from philosophy, where an ontology is a systematic account of Existence. For knowledge-based systems, what “exists” is exactly that which can be represented. When the knowledge of a domain is represented in a declarative formalism, the set of objects that can be represented is called the universe of discourse. This set of objects, and the describable relationships among them, are reflected in the representational vocabulary with which a knowledge-based program represents knowledge. Thus, we can describe the ontology of a program by defining a set of representational terms. In such an ontology, definitions associate the names of entities in the universe of discourse (e.g., classes, relations, functions, or other objects) with human-readable text describing what the names are meant to denote, and formal axioms that constrain the interpretation and well-formed use of these terms."

The definition has been criticized and is still being discussed. A shorter and less high-flown description of an ontology could be the following: An ontology is a formal representation of a set of concepts within a domain and the relationships between those concepts.

(16) 2.1.2. Components

Now that we have a definition of the ontologies that occur in the field of Knowledge Representation, it is time to see what they consist of. Four main components of such an ontology are [Stevens et al., 2000]:

• Concepts: sets or classes of entities, e.g. human, man, woman.
• Relations between concepts, e.g. human is a mammal.
• Axioms, which can be seen as rules or facts that are always true, e.g. a man can never be a woman.
• Instances. Most ontologies don't have instances. Those that do are simply called instantiated ontologies and contain entities of some concept, e.g. Jim and Carol are instances of human.

2.1.3. OWL, RDF and XML

For those ontologies that are to be seen as knowledge bases¹ (or those that just define the structure of a knowledge base) the standard language for representation is the Web Ontology Language (OWL) [W3C, 2004]. OWL is built upon the Resource Description Framework (RDF), which is designed to be read and understood by computers rather than by humans. RDF is commonly written in Extensible Markup Language (XML) and can, because of that, easily be exchanged between different types of operating systems and application languages. XML is used to store and transport data. XML focuses on what data is rather than on how data looks; the latter is the main goal of the more well-known markup language HTML. XML, RDF and OWL are all recommended by the World Wide Web Consortium (W3C), which makes them standards for industry and the web community.

OWL is divided into three sublanguages, OWL Lite, OWL DL and OWL Full, with different levels of expressiveness.

• OWL Lite supports those users primarily needing a classification hierarchy and simple constraint features. It can almost reach the same expressiveness as OWL DL with some tricks, such as building complex combinations of the OWL Lite features.
• OWL DL supports those users who want the maximum expressiveness without losing computational completeness. It can thus be used by reasoning systems. More about reasoning in section 2.1.7.
• OWL Full is meant for users who want maximum expressiveness and the syntactic freedom of RDF with no computational guarantees.

¹ The part of an expert system that contains the facts and rules needed to solve problems. Ontologies become machine-readable knowledge bases if they are instantiated. Read more about instantiation in section 2.1.6.
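The four components listed in section 2.1.2 can be pictured with a small data structure. The following is only an illustrative sketch in Python: the class `ToyOntology`, its field names, and the assignment of Jim to man and Carol to woman are assumptions made for the example and do not come from any ontology language or tool discussed in this thesis.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOntology:
    """Hypothetical container for concepts, relations, axioms and instances."""
    concepts: set = field(default_factory=set)
    is_a: set = field(default_factory=set)         # relations: (child, parent) pairs
    disjoint: set = field(default_factory=set)     # axioms: frozenset({a, b}) pairs
    instances: dict = field(default_factory=dict)  # instance name -> set of concepts

    def ancestors(self, concept):
        """Return the concept together with all of its is-a ancestors."""
        found = {concept}
        frontier = [concept]
        while frontier:
            current = frontier.pop()
            for child, parent in self.is_a:
                if child == current and parent not in found:
                    found.add(parent)
                    frontier.append(parent)
        return found

    def is_instance_of(self, name, concept):
        """True if the instance belongs to the concept, directly or via is-a."""
        return any(concept in self.ancestors(c)
                   for c in self.instances.get(name, ()))

    def violates_axioms(self, name):
        """True if the instance belongs to two concepts declared disjoint."""
        memberships = set()
        for c in self.instances.get(name, ()):
            memberships |= self.ancestors(c)
        return any(pair <= memberships for pair in self.disjoint)


onto = ToyOntology()
onto.concepts |= {"mammal", "human", "man", "woman"}
onto.is_a |= {("human", "mammal"), ("man", "human"), ("woman", "human")}
onto.disjoint.add(frozenset({"man", "woman"}))   # "a man can never be a woman"
onto.instances["Jim"] = {"man"}                  # assumed for illustration
onto.instances["Carol"] = {"woman"}              # assumed for illustration
```

With this toy model, `onto.is_instance_of("Jim", "mammal")` follows the is-a relations upwards, and `violates_axioms` flags an instance that is placed under both man and woman.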

(17) 2.1.4. Examples.

(24) Figure 2.1: Visualization of three concepts in the adult mouse anatomical dictionary.

(57) Figure 2.2: Two concepts, or classes as they are called in OWL.

Two examples of ontology entries written in OWL are displayed in figure 2.2. The entries are taken from the Adult Mouse Anatomical Dictionary (MA) [Hayamizu et al., 2005]. The first entry is ear bone, which is a part of the middle ear, the second entry. The names and synonyms of the concepts are represented as labels. A diagram of the concepts and the relations between them is shown in figure 2.1.

(58) 2.1.5. Ontology Alignment

Since there often exists more than one ontology within a certain domain, focusing on different aspects of the domain or simply made by different companies or groups, the need for ontology alignment arises, i.e. finding mappings between concepts in different ontologies. One can assume that many of these ontologies contain overlapping information, which is the case for many ontologies covering the field of anatomy. At the time of writing there exist 26 anatomy ontologies published on OBO². In the alignment process, concepts and relations that occur in both ontologies are searched for using different techniques such as those below [Lambrix, 2004, Lambrix and Tan, 2008].

• Linguistic matching. Concept similarity is measured with string matching. Names, synonyms and other attributes of concepts are used. This is a somewhat simple approach and might miss alignments when the ontologies use different naming.
• Structure-based strategies. By using information about a concept's environment, e.g. parents and children, similarity can be measured between concepts. Relations such as is-a and part-of can define such an environment. Usually the process is iterative, because some prior structure knowledge is needed. Therefore it is a good idea to combine this technique with others.
• Constraint-based approaches. These use information given by axioms. They are not commonly used.
• Instance-based strategies. As mentioned in section 2.1.2, instantiated ontologies are not very common. When they are available, information from instances can be used to measure similarity between the concepts to which they belong.

Together with the above mentioned techniques, auxiliary information is also used. This includes the usage of thesauri and dictionaries, or information given from previous alignment attempts. When ontologies have been aligned it is also possible to merge the ontologies into a new ontology. The newly created ontology contains all the knowledge included in the source ontologies and also the inter-ontology relations that were found in the alignment step. There exist many systems which can align ontologies; for further reading I suggest the Ontology Matching website³.

² Open Biomedical Ontologies, http://www.obofoundry.org, [Smith et al., 2007].
³ Ontology Matching, http://www.ontologymatching.org/.

2.1.6. Instantiation

Instantiation is the process where an ontology gets populated with data, for instance data gathered from scientific literature. You may say that the instantiated ontology becomes a machine-readable knowledge base [Lambrix et al., 2007]. At the moment there exist a number of tools which assist in the manual creation of instantiated ontologies. An example is Knote, which creates a prepared web-based form (concepts are already inserted from the ontology) [Motta et al., 2000]. Manual creation is considered time-consuming, which has led to the creation of semi-automatic tools, although few tools of that kind exist at the moment.

2.1.7. Description logics and reasoning

An OWL DL ontology is equivalent to a Description Logic (DL) knowledge base. Description logics are a family of logic-based knowledge representation formalisms. The different DLs are distinguished by their expressiveness, using a naming scheme of which parts are shown below. OWL DL provides the expressiveness of SHOIN(D).

• S. This allows, among other things, basic booleans such as ∩, ∪ and ¬, plus restricted quantifiers ∃ and ∀.
• H for role hierarchy (e.g., hasDaughter ⊆ hasChild)
• O for nominals/singleton classes (e.g., {Sweden})
• I for inverse roles (e.g., isChildOf ≡ hasChild⁻¹)
• N for number restrictions (e.g., ≥ 2 hasChild, ≤ 3 hasChild)
• (D) for use of data type properties (e.g., integer and string)

There is much more to say about description logics; I suggest The Description Logics Handbook [Baader et al., 2003] for further reading. Most important for the work in this thesis is that if an ontology is built within the rules given by this description logic, then reasoners can be used in a proper way. Reasoners are made to help with the design and maintenance of ontologies, e.g. by checking for unsatisfiability in the ontology (inconsistencies, contradictions and so on). Furthermore, reasoners can be used in querying over ontology classes and instances, for finding more general or more specific classes and for retrieving individuals matching a given query. In the implementation of my work I used Jena as a reasoner. Read more about Jena in section 3.4.1.

2.2. Natural Language Processing

Natural Language Processing (NLP) can be divided into two opposite approaches [Paris et al., 1991].

(60) • NLG, Natural Language Generation. Converts information stored in computers into normal-sounding human language. It is of interest to us and will be described in more detail.
• NLU, Natural Language Understanding. Converts human language into more formal representations, which can be 'understood' more easily by computers.

2.2.1. Natural Language Generation

Natural Language Generation will be used in this project in the step where pathways in slices are converted into natural language statements. The three steps shown below are almost always performed by NLG systems [Reiter, 2000].

• Content determination and text planning, which are often done simultaneously. In content determination, decisions are made on what information is to be shown to the reader. In text planning, the overall rhetorical structure is made. Those tasks could, if the underlying data structures are well known, be hard-coded into the NLG system. The system then loses some flexibility but is more robust than a somewhat more sophisticated AI approach.
• Sentence planning is used to make the text easier to read without altering the information, e.g. through aggregation: 'Ribonuclease A contains β-sheets. Ribonuclease A contains α-helices.' becomes 'Ribonuclease A contains β-sheets and α-helices.' Sentence planning also includes pronominalization⁴ and the introduction of discourse markers⁵.
• Realization is used to create individual sentences which are grammatically correct, or close enough. The realization step includes morphology (e.g. α-helices instead of α-helixs), agreement (e.g. I am instead of I is) and reflexives (e.g. Water reacts with itself instead of Water reacts with water).

⁴ Replacing nouns with pronouns which get their meaning from the context, e.g. I, She and Who.
⁵ A word or phrase that marks a boundary in a discourse, e.g. like and also.
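The aggregation step of sentence planning described above can be sketched as follows. This is a minimal illustration in Python, assuming that statements are already available as (subject, verb, object) triples; the function name `aggregate` and this triple format are assumptions made for the example, not part of any NLG system referred to in this thesis.

```python
def aggregate(statements):
    """Merge (subject, verb, object) statements that share subject and verb."""
    grouped = {}  # (subject, verb) -> list of objects, in order of first appearance
    for subject, verb, obj in statements:
        grouped.setdefault((subject, verb), []).append(obj)

    sentences = []
    for (subject, verb), objects in grouped.items():
        if len(objects) == 1:
            object_text = objects[0]
        else:
            # Join all but the last object with commas, then add "and <last>".
            object_text = ", ".join(objects[:-1]) + " and " + objects[-1]
        sentences.append(f"{subject} {verb} {object_text}.")
    return sentences


statements = [
    ("Ribonuclease A", "contains", "β-sheets"),
    ("Ribonuclease A", "contains", "α-helices"),
]
print(aggregate(statements))
# prints ['Ribonuclease A contains β-sheets and α-helices.']
```

Statements with different subjects or verbs are left as separate sentences, so the information content is unchanged; only the number of sentences shrinks.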

(61) Chapter 3. Slice creation

In this chapter I will describe methods of slice creation. A slice should represent all relevant queries containing the given concepts in an ontology, i.e. relevant with respect to the ontology rather than to the intentions of a user, and will thus not include queries containing concepts, or relations between concepts, that do not occur in the ontology. Figure 3.1 is an attempt to visualize slices and their relations to documents. Slices are marked out as parts of ontologies connected to one or more documents. Each document can also be connected to more than one slice. The actual connections are made between concepts (occurring in slices) and documents rather than between slices and documents, to make it possible to retrieve documents related to parts of expressions in addition to the entire expression. Below are some more precise definitions needed to fully understand the algorithms in the following sections.

• Path. In graph theory a path is a sequence of vertices such that from each of its vertices there is an edge to the next vertex in the sequence. A path is acyclic, i.e. it is a sequence with no repeated vertices [Chartrand, 1985a]. In the following methods, ontological concepts and relations will be seen as vertices and edges respectively.
• Slice. A slice is the set of all possible paths between two concepts.
• Search Concept. Search concepts are the concepts from which slices are created, i.e. the start and end vertices of the paths.
• Domain and Range. Both can be used in an OWL object property. Object properties are used to define relations between concepts. Domain restricts which resources can have the property and Range restricts which values the property can have. E.g. the relation 'has Sentence' should have document or text as domain and sentence as range. In some of the algorithms I have used the words domain and range for every kind of relation, as a way to define directions in the relations. A slight abuse of notation, one might say.

Figure 3.1: Uppermost is a visualization of two aligned ontologies containing three slices. At the bottom is a visualization of documents and their relations to the above slices.

An algorithm used to retrieve paths between concepts, called the ARQ algorithm and part of the KnowleFinder system, traverses the ontology by walking between object properties in a directed manner, from domain to range. Since many ontologies have a more hierarchy-like structure, using subclass relations more frequently than object properties, much information will not be retrieved with the ARQ algorithm. The algorithm does look at the ancestor concepts and investigates their involvement in object properties, but it does not preserve the subclass relations, which could in fact be of importance for the user performing a search. My approach differs from this one: I assume no directions in the ontologies while traversing (making it possible to travel from range to domain in an object property), and subclass relations are treated as equal to object properties. This will retrieve more information but may create somewhat messy expressions. So if the paths are to be used with the NLG algorithms already used within KnowleFinder, some extensions might have to be made. But the goal, as I see it, must be to handle those messy expressions too in the future.

3.1. An ontology traversing method

This method is similar to the ARQ algorithm used within KnowleFinder, but is extended with subclass relations and assumes no directions in ontologies. Two search concepts are used as input, together with the corresponding ontology. The method is divided into two steps. The reason is to avoid visiting a concept more than once.

The first step is to retrieve certain and uncertain paths between the two given search concepts in the ontology. This can be done with algorithm 1. Uncertain paths are those that have encountered an already visited concept and were therefore stopped; certain paths are those that were not stopped and reached the other search concept.

The next step is to check whether the uncertain paths can append parts of the certain paths to become certain paths themselves. This can be done with algorithm 2, which takes the certain and uncertain paths previously created as input. The algorithm works in an iterative manner as follows: For each uncertain path, retrieve the concept which caused the stop (the one already visited). Walk through the relations in all certain paths from source → target and compare the concept with the domains in those relations. Stop when they are equal and append all the relations, from this point to the last relation in the certain path, to the uncertain path. This path can now be considered a certain path. Continue in this way until no more new certain paths are found. The output consists of a slice between the two search concepts.

(64) Algorithm 1 getPaths(required parameters)

Require: C_source, C_target, pathThisFar
Ensure: Certain and uncertain paths between C_source and C_target

Add C_source to visitedlist //globally
Add C_source to pathThisFar
Retrieve adjacent relations with range¹ ∉ pathThisFar to relationlist
if relationlist is empty then
    Remove C_source from visitedlist²
end if
for all R in relationlist do
    if R.range = C_target then
        Create a new path
        Add R to path
        Add path to paths
    else
        if R.range ∉ visitedlist then
            pathsFromRange ← getPaths(R.range, C_target, pathThisFar)
            for all path in pathsFromRange do
                Add R to path
                Add path to paths
            end for
        else
            Create a new path
            Add R to path
            path.visitedConceptWasFound = true
            Add path to paths
        end if
    end if
end for
return paths

¹ Adjacent relations could be subclasses, superclasses, object properties and also inverse object properties. Every kind of relation is translated to a relation with domain and range, as object properties have, marked up to make it easy in a later step to translate back to the original relation. Note that the range in this case can be the domain in the underlying object property, if the relation is an inverse object property.
² When relationlist is empty it tells us that we have come to a dead end. Dead ends could occur when the next step would make a loop (range ∈ pathThisFar). If that is the case, then other paths should still be able to walk through this concept.
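The translation described in footnote 1 of algorithm 1, where every kind of relation becomes a step with a domain and a range, can be sketched as below. This is only an illustration in Python under assumed input formats: the names `Relation` and `adjacent_relations`, the pair and triple formats, and the concept 'abstract' are hypothetical and are not taken from the implementation described in this thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A relation normalized to one step from domain to range (hypothetical type)."""
    domain: str
    range: str
    label: str      # "subClassOf" or the name of the object property
    inverted: bool  # True if the step runs against the original direction


def adjacent_relations(concept, subclass_of, object_properties):
    """List every step leaving `concept`, in either direction.

    subclass_of: iterable of (child, parent) pairs (assumed format).
    object_properties: iterable of (name, domain, range) triples (assumed format).
    """
    steps = []
    for child, parent in subclass_of:
        if child == concept:    # walk up to the superclass
            steps.append(Relation(child, parent, "subClassOf", False))
        if parent == concept:   # walk down to the subclass
            steps.append(Relation(parent, child, "subClassOf", True))
    for name, domain, rng in object_properties:
        if domain == concept:   # walk along the property
            steps.append(Relation(domain, rng, name, False))
        if rng == concept:      # walk against the property (inverse)
            steps.append(Relation(rng, domain, name, True))
    return steps


steps = adjacent_relations(
    "document",
    subclass_of=[("abstract", "document")],
    object_properties=[("hasSentence", "document", "sentence")],
)
```

The `label` and `inverted` fields play the role of the markup mentioned in the footnote: they keep enough information to translate a step back to the original relation later.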

(65) Algorithm 2 correct(required parameters)

Require: paths
Ensure: corrected paths

//Separate uncertain from certain pathways
for all path in paths do
    if path.visitedConceptWasFound then
        Add path to uncertains
    else
        Add path to certains
    end if
end for
repeat
    correctionMade ← false
    //Copy paths to avoid conflicts while iterating
    certainsCp ← certains
    uncertainsCp ← uncertains
    for all uncertain in uncertainsCp do
        //Retrieve the already visited concept
        range ← uncertain.firstElement.range
        for all certain in certainsCp do
            connectionFound ← false
            Create a new path tempPath
            //Iterate through certain in order: source → target
            for R = certain.last to certain.first do
                if !connectionFound then
                    if range = R.domain then
                        Add R to tempPath
                        connectionFound ← true
                    end if
                else
                    Add R to tempPath
                end if
            end for
            if tempPath is free from loops then
                Turn tempPath the other way around
                Append uncertain to tempPath
                if tempPath ∉ certains then
                    Add tempPath to certains
                    correctionMade ← true
                end if
            end if
        end for
    end for
until !correctionMade
return paths
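For comparison, the result that the two algorithms aim for, the set of all loop-free paths between two search concepts in an undirected view of the ontology, can also be produced by a plain depth-first search with backtracking. The Python sketch below is such a simplified alternative and not the two-step certain/uncertain procedure of algorithms 1 and 2; the function name and the adjacency dictionary format are assumptions. The example graph is read off the adjacency matrix in equation 3.1.

```python
def all_simple_paths(adjacency, source, target):
    """Enumerate every loop-free path from source to target.

    adjacency: dict mapping a concept to the set of its neighbours, with
    every edge stored in both directions (undirected view, assumed format).
    """
    paths = []

    def walk(current, path_so_far):
        if current == target:
            paths.append(list(path_so_far))
            return
        for neighbour in sorted(adjacency.get(current, ())):
            if neighbour not in path_so_far:  # never revisit a concept on this path
                path_so_far.append(neighbour)
                walk(neighbour, path_so_far)
                path_so_far.pop()             # backtrack and try the next neighbour

    walk(source, [source])
    return paths


# Edges 1-2, 1-3, 1-4, 2-4, 3-4 and 4-5, taken from equation 3.1.
adjacency = {1: {2, 3, 4}, 2: {1, 4}, 3: {1, 4}, 4: {1, 2, 3, 5}, 5: {4}}
print(all_simple_paths(adjacency, 1, 5))
# prints [[1, 2, 4, 5], [1, 3, 4, 5], [1, 4, 5]]
```

Unlike algorithm 1, this sketch may visit the same concept several times on different paths, which is exactly the repeated work that the split into certain and uncertain paths is meant to avoid.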

(66) 3.2. A matrix method. . . . . Figure 3.2: An undirected graph with five vertices and six edges. As ontologies can be seen as graphs, with concepts and instances as vertices and different kinds of relations as edges between those vertices, it is also possible to represent ontologies as matrices. One way is to use an adjacency matrix. For a graph like the one in figure 3.2, the corresponding adjacency matrix is shown in equation 3.1. where ai,j = 1 if vertices i and j are adjacent, meaning they have an edge between them, or ai,j = 0 otherwise. [Chartrand, 1985b] Consider the matrices in equations 3.1 - 3.3. At a1,5 the number of direct edges between the vertices related to index 1 and 5 are displayed. In this case it says: "There exists no 1-way between vertex 1 and 5." When the adjacency matrix is multiplied with itself it gives us information about 2-ways between different vertices, which means ways that include walking across two edges (not necessarily two different). E.g. a21,5 = 1 in matrix A2 tells us: "There exists one single 2-way between vertex 1 and 5." In the last matrix, A3 , the number of 3-ways between vertices are displayed. a31,5 = 2 tells us: "There exist two different 3-ways between vertex 1 and 5." If we continue to multiply with the adjacency matrix in this manner, we could of course for a matrix AN get information about N-ways between vertices.   0 1 1 1 0 1 0 0 1 0     (3.1) A = 1 0 0 1 0   1 1 1 0 1 0 0 0 1 0 16.

(67) . 3 1   A2 = 1  2 1 . 4 5   A3 = 5  6 2. . 1 2 2 1 1. 1 2 2 1 1. 2 1 1 4 0. 1 1   1  0 1. 5 2 2 6 1. 5 2 2 6 1. 6 6 6 4 4. 2 1   1  4 0. (3.2). . (3.3). What have those adjacency matrices to do with slice creation, where the goal is to retrieve all possible paths between given concepts in an ontology? Those matrices only tell us about the number of N-ways, or do they? The thing is that they actually contain more information if they are used together. Say that we have looked into matrix A3 and discovered that between vertex 1 and vertex 5 there exist two different 3-ways, then we can backtrack with help of matrix A2 and A to discover the underlying 2-ways and 1-ways that constitute the 3-ways. This is done by looking at the sum shown below a31,5 =. 5 Ø. a21,k ak,5. (3.4). k=1. If a2i,k ak,j Ó= 0, there exist a number of 2-ways between vertices i and k, and 1-ways between vertices k and j, which together results in a number of 3ways between vertices i and j. With this as background I propose a method for slice creation, in a manner as follows. First some preparations: • Create an adjacency matrix representing a given ontology. Also create an array with concepts to keep a connection between matrix indices and related concepts and a matrix with information about the edges (or relations). • Multiply the adjacency matrix with itself as many times as we wish. Now we can choose how long paths we actually want, E.g. creating A9 will give us paths with maximum length ten. Everything made to this point can be stored and used for slice creation within the given ontology. Then the actual slice creation: • For two given concepts as input, that we wish to create a slice between i.e. search concepts, retrieve the appropriate indices by using the array of concepts. 17.

• Calculate all paths between the indices by using algorithm 3, for all N between 1 and the maximum path length. Note that the algorithm also makes sure that a path does not contain any loops, and thus follows the definition of a path which I previously stated. This is done by storing the visited indices and sending them forward in the recursion.

• Represent the slice in the way we wish as output, by translating the previously retrieved indices into actual concepts and relations. This is done by using the previously created array of concepts and the matrix with information about the edges.
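The preparation and slice creation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis implementation (which was written in Java): the function names `mat_mul`, `matrix_powers`, `get_paths` and `create_slice` are hypothetical, matrices are plain lists of lists, indices are 0-based (vertex v has index v - 1), and paths are returned as lists of vertex indices instead of lists of relations. The recursion adds the intermediate index k to the visited set, which has the same loop-preventing effect as the bookkeeping in algorithm 3.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_powers(A, max_len):
    """Preparation step: return {N: A^N} for N = 1..max_len."""
    powers = {1: A}
    for N in range(2, max_len + 1):
        powers[N] = mat_mul(powers[N - 1], A)
    return powers


def get_paths(powers, N, i, j, visited):
    """Loop-free paths with exactly N edges from index i to index j.

    Each path is a list of vertex indices. `visited` must initially
    contain i and j, as required by algorithm 3.
    """
    A = powers[1]
    if N == 1:
        return [[i, j]] if A[i][j] != 0 else []
    paths = []
    for k in range(len(A)):
        # One term a^(N-1)_{i,k} * a_{k,j} of the backtracking sum.
        if powers[N - 1][i][k] != 0 and A[k][j] != 0 and k not in visited:
            for path in get_paths(powers, N - 1, i, k, visited | {k}):
                paths.append(path + [j])
    return paths


def create_slice(powers, i, j):
    """All loop-free paths between i and j, for every stored length."""
    return [p for N in sorted(powers)
            for p in get_paths(powers, N, i, j, {i, j})]


# Adjacency matrix of equation 3.1 (vertex v has index v - 1).
A = [[0, 1, 1, 1, 0],
     [1, 0, 0, 1, 0],
     [1, 0, 0, 1, 0],
     [1, 1, 1, 0, 1],
     [0, 0, 0, 1, 0]]

powers = matrix_powers(A, 3)
print(powers[3][0][4])             # number of 3-ways between vertex 1 and 5
print(create_slice(powers, 0, 4))  # paths as lists of 0-based indices
```

Because the powers are computed once and kept in a dictionary, `create_slice` can be called for any pair of search concepts in the same ontology without repeating the multiplications.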

Algorithm 3 getPaths(required parameters) retrieves all the paths with a certain length N between the concepts with indices i and j in the adjacency matrix A

Require: N, i, j, visitedIndices
    // visitedIndices should be initialized with i and j to ensure that no loops occur in the paths.
Ensure: paths
    if N = 1 then
        if $a_{i,j} \neq 0$ then
            Add relation $R_{i,j}$ to path
            Add path to paths
            return paths
        else
            return null
        end if
    else
        // Investigate the sum $a^N_{i,j} = \sum_{k=1}^{n} a^{N-1}_{i,k} \, a_{k,j}$,
        // where n = number of rows or columns of A
        for k = 1 to n do
            if $a^{N-1}_{i,k} \neq 0$ and $a_{k,j} \neq 0$ and $k \notin$ visitedIndices then
                Add k to partOfPathList
            end if
        end for
        for all k in partOfPathList do
            paths ← getPaths(N − 1, i, k, [visitedIndices, j])
            for all path in paths do
                if path ≠ null then
                    Add relation $R_{k,j}$ to path
                    Add path to allPaths
                end if
            end for
        end for
        return allPaths
    end if

3.3. Method comparisons: Advantages and disadvantages

Ontologies are often very large; it is not uncommon that they consist of over 10,000 concepts and numerous relations between them. Ontology reasoning can therefore be a very demanding operation. Keeping algorithms efficient is very important, at least when they are to be used in e.g. online search engines. As an example, traversing through large graphs in a recursive

manner can produce large lists and other kinds of data structures, even if handled correctly. I should say that my main task was not to keep everything as efficient as possible; the main goal was to produce something that worked. Efficiency is nevertheless an attribute to consider when comparing methods.

3.3.1. Reuse of data

When comparing the two methods of slice creation, one thing that comes to mind is reuse of data. When using the traversing method, you have to start from the beginning in each slice creation step. But if matrices are used, much of the computation is done only once for each ontology. Possibly more could be stored during the matrix creation than in the method described previously; maybe whole paths could be stored instead of just the number of paths that exist. This could be done by adjusting the methods for matrix multiplication, which is not an easy task since they are very complex as it is. To sum up, the ability to reuse data is definitely an advantage for the matrix method.

3.3.2. Matrix multiplication

Matrix multiplication is a very heavy calculation. Algorithms are often written to suit special kinds of matrices. Attributes such as sparseness and symmetry affect the way matrix multiplication algorithms should be designed. The sparseness of the matrices used here differs depending on the structure of the corresponding ontologies. Large ontologies with few and deep relations will create sparse adjacency matrices, whereas smaller or shallower ontologies will create dense adjacency matrices. Also, after some multiplications the sparseness will decrease. Therefore, choosing efficient algorithms for matrix multiplication is not an easy task. This is left for future consideration. The adjacency matrix and its powers will be symmetric, at least when the corresponding ontologies are considered to be undirected. This is great since it will lower the computation costs.
These heavy computations could be a disadvantage for the matrix method, but keep in mind that they only have to be done once for each ontology.

3.4. Implementation

The methods were implemented with Java™ 2 Platform Standard Edition 5.0 (J2SE™ 5.0). Some external libraries were also used (see below). The implementation was divided into three parts:

• A Matrix Factory. Creates an adjacency matrix and powers of the adjacency matrix which correspond to a certain ontology. The

matrices are stored in a directory chosen by the user and can be used during the future path finding and alignment steps.

• Slice generators. Two different slice generators were implemented: one which uses the adjacency matrices and one which traverses the ontologies. The output consists of an XML file, with a format as in figure 5.1.

3.4.1. Libraries

In addition to the Java standard libraries, I also used libraries that, at this moment, are free and open source.

• Jena¹. A framework for building semantic web applications. It includes an RDF API and an OWL API. It also supports in-memory and persistent storage (useful when working with large ontologies). Jena was used during my work to read and write ontologies and to retrieve certain objects contained in ontologies.

• Colt². Provides a set of libraries for high performance scientific and technical computing in Java. It is a great example of how Java can no longer be seen as unsuited for such work. This package was used during my work to handle matrices and algorithms for matrices. It includes support for both sparse and dense matrices, matrices of dimension up to three and matrices containing just about any object. It is developed and used at CERN.

¹ http://jena.sourceforge.net/
² http://acs.lbl.gov/~hoschek/colt/


Chapter 4. Slice alignment

Slices can be created from different ontologies, covering different aspects and concepts. Sometimes ontologies contain overlapping information, and slices could have been created with the same or a similar purpose but from different ontologies. To present information from both ontologies in a proper manner, the need for slice alignment arises. A simple approach would be to create a merged ontology in an early step, before slice creation. This is a trivial case and will not be covered in my thesis. Sometimes it is not possible to merge ontologies, or it is simply not wished for. Still, the need for alignments between slices can exist, e.g. to make aggregation of expressions possible. Alignments are in this case introduced to make it possible to create expressions covering different topics and disciplines.

Since slices contain paths between source and target concepts, which we denote search concepts, a first narrow approach is to introduce connections between those search concepts in different slices. Those connections could themselves be seen as paths (and later on expressions). Seen from this perspective, one could introduce not only direct mappings between concepts contained in the slices, but also mappings that are found a bit outside the slices. To avoid too much widening, only the shortest paths are retrieved, at least in a first attempt. I therefore propose two methods of slice alignment; they have a similar approach but use different kinds of data representation.

4.1. Align slices using graphs

In graph theory there exist numerous ways to compute the shortest path between vertices. This could be used as a solution to our problem, as ontologies can be translated into graphs with relations as edges and concepts as vertices. The method could look something like this:

• Create a graph containing information from both source ontologies, and information about alignments previously produced by an alignment system, e.g. SAMBO.

• Retrieve the shortest paths between search concepts in the first slice and search concepts in the second slice. The shortest path could be retrieved with a variant of Dijkstra's algorithm, which I used in my implementation [Dijkstra, 1959].

• Translate the paths into expressions in the same manner as could have been done within the slice creation step.

4.2. Align slices using matrices

As previously mentioned, the adjacency matrix and its powers give us information about path lengths between vertices in the corresponding graph. So why not use that information when we have already created the matrices? To retrieve the shortest paths between search concepts in different slices, one can divide the task into finding paths from search concepts to concepts occurring in alignments, which together constitute paths between search concepts. To describe this method better, I have created an example case. It consists of two very small ontologies, with search concepts and concepts occurring in alignments marked out. A visualization of the example ontologies can be seen in figure 4.1.

• Create a matrix (or table) S for each slice, which will keep track of connections between search concepts and concepts occurring in alignments. The goal is to fill it with numbers that represent path lengths. The column order is critical: column i in the first matrix is related to column i in the second matrix, as they represent concepts occurring in the same alignment. In the example case, the matrices have the size 2×3, since we have three alignments and two search concepts in each slice.

• Initialize it with zeros, which says that no path has yet been found.

• Look into the adjacency matrix A for each ontology. If there exists a 1-way between a search concept and a concept occurring in an alignment, then put a one in the appropriate place in the S-matrix. In our example case the resulting matrices look like the first two in table 4.1; we have not yet found paths between all search concepts and have to move on.

• Look into the next power of the adjacency matrix and check for paths not yet found between concepts. If one is found, add the path length to the S-matrix. Iterate in this manner until all shortest paths between search concepts are found, or until we have no more powers of the adjacency matrix left. In the example case, we stop after $A^4$. Note that we don't have to find paths between sc1 and ③ or between sc2 and ①.
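The steps above can be sketched as follows. This is a minimal illustration and not tested thesis code, since the matrix-based alignment was never implemented: the names `fill_s_matrix` and `shortest_connections` are hypothetical, the two small chain-shaped ontologies are invented for the example (they are not the ontologies of figure 4.1), search concepts are assumed not to occur in alignments themselves, and the alignment mapping is assumed not to add to the path length.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def fill_s_matrix(A, search, aligned, max_len):
    """S[r][c] = length of the shortest path between search concept r
    and aligned concept c; 0 means that no path has been found yet."""
    S = [[0] * len(aligned) for _ in search]
    power = A  # power equals A^length at the top of each iteration
    for length in range(1, max_len + 1):
        for r, s in enumerate(search):
            for c, a in enumerate(aligned):
                # The first power with a non-zero entry gives the
                # shortest path length, so never overwrite an entry.
                if S[r][c] == 0 and power[s][a] != 0:
                    S[r][c] = length
        power = mat_mul(power, A)
    return S


def shortest_connections(S1, S2):
    """For each pair of search concepts (one per slice): the smallest
    combined length and the alignment column that yields it."""
    result = {}
    for r, row1 in enumerate(S1):
        for q, row2 in enumerate(S2):
            options = [(x + y, c)
                       for c, (x, y) in enumerate(zip(row1, row2))
                       if x and y]
            result[(r, q)] = min(options) if options else None
    return result


# Hypothetical example: both ontologies are the chain 0 - 1 - 2 - 3.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]

# Column c of S1 and column c of S2 belong to the same alignment.
S1 = fill_s_matrix(chain, search=[0], aligned=[1, 3], max_len=3)
S2 = fill_s_matrix(chain, search=[3], aligned=[1, 2], max_len=3)
print(S1, S2, shortest_connections(S1, S2))
```

In a real setting the powers would be read from the stored matrices instead of being recomputed inside `fill_s_matrix`.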

Figure 4.1: A graph that visualizes alignments between two ontologies. The ontologies are separated with a vertical black line. Vertices shown in grey correspond to search concepts. Vertices shown in black correspond to concepts occurring in alignments. White vertices correspond to ordinary concepts. The mappings between concepts that constitute the ontology alignment are shown with broken lines. The areas surrounded with broken lines correspond to slices made between the search concepts.

              ①   ②   ③              ④   ⑤   ⑥
A      sc1    1   0   0       sc3    0   0   0
       sc2    0   0   0       sc4    0   0   1

A^2    sc1    1   0   0       sc3    2   0   0
       sc2    0   0   2       sc4    0   2   1

A^4    sc1    1   4   0       sc3    2   4   3
       sc2    0   4   2       sc4    4   2   1

Table 4.1: The tables show how the S-matrices are altered during the alignment process. To the left, the corresponding power of the adjacency matrix, which is used to retrieve paths of a certain length between concepts, is shown. Note that a zero means that no path has yet been found.

4.3. Large graphs creation versus merging of ontologies

To generate the shortest paths between concepts as in the first method of slice alignment, a graph containing both ontologies and information about alignments has to be created. This is very similar to merging of ontologies. The only difference is that the connections are narrowed down to those between inter-ontology search concepts. The choice of method therefore depends on how much information one wishes to retrieve between inter-ontology search concepts.

4.4. Implementation

I only implemented the alignment method that uses graphs. The one using matrices came to mind much later and I didn't have the time to test it. The method is also implemented with Java™ 2 Platform Standard Edition 5.0 (J2SE™ 5.0) and is simply called Slice Aligner. The output consists of an XML file, with a format as in figure 5.2. In addition to the previously mentioned libraries (section 3.4.1), JGraphT¹ was also used: a powerful library which provides algorithms and objects from mathematical graph theory. JGraphT is optimized for data models, algorithms, high-performance applications and large-scale applications. It supports vertices of any kind of object, which is useful. There also exists a closely related library that handles visualization, called JGraph.

¹ http://jgrapht.sourceforge.net/
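The graph-based alignment of section 4.1 can be illustrated with a small standard-library sketch. It is not the Slice Aligner itself, which was written in Java on top of JGraphT; the names `build_graph` and `shortest_path` are hypothetical, all edges (relations and alignment mappings alike) are assumed to be undirected with weight 1, and the tiny graph is assembled by hand from the concepts listed in table 5.2, with the prefixes `SO:` and `GO:` added only to keep the two ontologies apart.

```python
import heapq


def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm with unit edge weights (an assumption)."""
    dist = {source: 0}
    prev = {}
    queue = [(0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == target:
            break
        if d > dist[u]:
            continue  # stale queue entry
        for v in graph.get(u, ()):
            if d + 1 < dist.get(v, float("inf")):
                dist[v] = d + 1
                prev[v] = u
                heapq.heappush(queue, (d + 1, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


so_edges = [("SO:leukotriene metabolism", "SO:leukotriene response"),
            ("SO:leukotriene response", "SO:immune response")]
go_edges = [("GO:fever", "GO:inflammatory response"),
            ("GO:inflammatory response", "GO:defense response"),
            ("GO:immune response", "GO:defense response")]
# One mapping, as an alignment system such as SAMBO might provide it.
alignment = [("SO:immune response", "GO:immune response")]

graph = build_graph(so_edges + go_edges + alignment)
path = shortest_path(graph, "SO:leukotriene response",
                     "GO:inflammatory response")
print(" -> ".join(path))
```

With unit weights a breadth-first search would give the same result; Dijkstra's algorithm is used here only because it is the algorithm named in section 4.1.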

Chapter 5. Results

This chapter is divided into two sections. The first section shows slice creation results and the second section shows slice alignment results. They both contain an example scenario, which hopefully will show the usefulness of slice creation and alignment.

5.1. Slice creation

5.1.1. The ghrelin scenario

A biochemist is interested in finding information about ghrelin¹. He does not have much prior knowledge of ghrelin, but for some reason he wants to read about it. He enters ghrelin into a search field and a list of expressions in natural language appears as a result. Amongst others, the expressions 'Ghrelin system pathway is a part of energy homeostasis pathway which has part leptin system pathway.' and 'Ghrelin system pathway is a peptide and protein hormone signaling pathway which has subclass leptin system pathway.' appear. If the biochemist finds one of the expressions useful, he can choose to retrieve provenance information, in the form of articles related to the expression. He could also retrieve a table of instances which belong to the terms in the expression (in this example that might not be the case, since the terms are already on a low and precise level). He might also be satisfied with the result this far, i.e. the expression could be all that he needed to know.

5.1.2. Statements retrieved from the Pathway Ontology

The above example scenario could be made possible if a slice had previously been created from the pathway ontology, with ghrelin system pathway and leptin system pathway as search concepts.

¹ Ghrelin is a hormone that stimulates appetite. Ghrelin levels increase before meals and decrease after meals. It is considered the counterpart of the hormone leptin.

The pathway ontology was created by the Rat Genome Database Team [Twigger et al., 2006] and contains biological pathways such as disease pathways, regulatory pathways and metabolic pathways. To show how the methods of slice creation work, I made a slice between two terms in the pathway ontology (ghrelin system pathway and leptin system pathway). I restricted the slice to only contain paths of maximum length 2. Figure 5.1 displays the slice in the actual output format produced by my implementation. In table 5.1 I replaced the IDs of the concepts with the corresponding labels, to make it easier to read and understand. I have also stated some possible natural language translations of the paths in the slice. As can be seen, two paths of length 2 were retrieved: one that goes through a subclass relation and one that goes through an object property (in this case a part of relation, which is commonly used and often mixed up with the is a relation). As the subclass relation is a standard construct, I chose to use the names is-a and hasSubClass to fill in the predicate element. Relations using object properties have a link to the corresponding object property, which could be used in a later step to retrieve labels.

Figure 5.1: The two shortest paths between search concepts ghrelin system pathway and leptin system pathway in the pathway ontology.

Statement #   Concept                  Relation   Concept                                          Relation    Concept
1             ghrelin system pathway   part_of    energy homeostasis pathway                       part_of⁻¹   leptin system pathway
2             ghrelin system pathway   is_a       peptide and protein hormone signaling pathway    is_a⁻¹      leptin system pathway

Statement #   Possible natural language representation
1             Ghrelin system pathway is a part of energy homeostasis pathway which has part leptin system pathway.
2             Ghrelin system pathway is a peptide and protein hormone signaling pathway which has subclass leptin system pathway.

Table 5.1: Uppermost: a table showing paths between two terms in the pathway ontology. At the bottom: possible natural language generations of the above paths.

5.2. Slice alignment

5.2.1. The inflammation scenario

A person interested in learning more about inflammation types the word into a search field. A list of expressions is retrieved and amongst others these are displayed: Fever is an inflammatory response, Leukotriene² metabolism is a leukotriene response and Leukotriene response is an immune response which is a defense response which has subclass inflammatory response, where the last expression is a connection between the first two expressions. Possibly the three expressions could have been aggregated into a single expression, which is one of the things that is made easier with slice alignment, besides just formulating expressions containing information from different ontologies.

5.2.2. Statements retrieved from the SIGNAL-ONTOLOGY and the Gene Ontology

The above scenario is made possible if two slices from the Gene Ontology [Ashburner et al., 2000] and the SIGNAL-ONTOLOGY [Takai-Igarashi and Takagi, 2000] are aligned, which I have done and will show below.

² Leukotrienes are partly responsible for the effects of an inflammatory response.

From each ontology only a small part was used, namely parts which contain terms in the field of immunology. Those parts had previously been extracted as test cases in the work with SAMBO, to evaluate matchers inside the alignment system. This comes in handy, as alignments had already been made and I could test my methods of slice alignment. In figure 5.2 the shortest path between two search concepts (leukotriene response and inflammatory response) from different slices is displayed in XML format, the format I chose to use in my implementation. It contains only the IDs of the concepts. In table 5.2 the label of each concept is displayed instead, along with the corresponding expression in natural language.

Figure 5.2: The shortest path between two slices created from parts of the SIGNAL-ONTOLOGY and parts of the Gene Ontology, which both cover terms related to immune defense. Stretching from search concept Leukotriene response to search concept Inflammatory response (GO_0006954). Notice the alignment, which is the actual breakpoint between the ontologies.

Statement     Concept                  Relation   Concept                 Relation   Concept
GO            fever                    is_a       inflammatory response   -          -
SO            leukotriene metabolism   is_a       leukotriene response    -          -
Alignment     leukotriene response     is_a       immune response         =          immune response
Align. con.   immune response          is_a       defense response        is_a⁻¹     inflammatory response

Statement     Possible natural language representation
GO            Fever is an inflammatory response.
SO            Leukotriene metabolism is a leukotriene response.
Alignment     Leukotriene response is an immune response which is a defense response which has subclass inflammatory response.

Table 5.2: Uppermost: a table showing paths between 1) two terms in the Gene Ontology, 2) two terms in the SIGNAL-ONTOLOGY, 3) the shortest alignment between the two above paths. At the bottom: possible natural language generations of the above paths.
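To make the step from the statement tables to the natural language representations concrete, the following template-based sketch reproduces the expressions of tables 5.1 and 5.2. It is only one possible reading: natural language generation was not implemented in this work, the phrase table and the names `PHRASES`, `verbalize` and `to_expression` are hypothetical, inverse relations are written as `part_of^-1` and `is_a^-1`, and the alignment relation `=` is assumed to be skipped because both sides denote the same concept.

```python
# Hypothetical phrase templates, read off the expressions in table 5.1.
PHRASES = {
    "part_of": "is a part of",
    "part_of^-1": "has part",
    "is_a^-1": "has subclass",
}


def verbalize(relation, concept):
    """Turn one relation and its target concept into a clause."""
    if relation == "is_a":
        article = "an" if concept[0].lower() in "aeiou" else "a"
        return f"is {article} {concept}"
    return f"{PHRASES[relation]} {concept}"


def to_expression(path):
    """path = [concept, relation, concept, relation, concept, ...]."""
    subject, steps = path[0], path[1:]
    clauses = []
    for pos in range(0, len(steps), 2):
        relation, concept = steps[pos], steps[pos + 1]
        if relation == "=":
            continue  # alignment mapping: same concept, other ontology
        clauses.append(verbalize(relation, concept))
    sentence = subject + " " + " which ".join(clauses) + "."
    return sentence[0].upper() + sentence[1:]


statement_1 = ["ghrelin system pathway", "part_of",
               "energy homeostasis pathway", "part_of^-1",
               "leptin system pathway"]
alignment = ["leukotriene response", "is_a", "immune response",
             "=", "immune response", "is_a", "defense response",
             "is_a^-1", "inflammatory response"]

print(to_expression(statement_1))
print(to_expression(alignment))
```

A full natural language generator would also have to handle labels of object properties and aggregation of several expressions, which this sketch leaves out.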


Chapter 6. Conclusion

The questions asked in the problem statement have been answered in the form of well-defined slices and slice alignments. I have also shown how they can be created with algorithms for slice creation and alignment. Having compared the methods in section 3.3, I have come to the conclusion that the matrix method is to be preferred. It is easy to see that it works in all cases, and it should be really fast if the matrices are pre-calculated. The results show that the methods work in practice and, beyond that, hopefully show how slices can come into use in a future system, although the example scenarios can seem a bit strained.

6.1. Comparison with related work

Previous work has used ontologies in the search for articles. I have listed two such systems below and compared them to the larger project which my thesis has been a part of.

• GoPubMed is a web server which allows users to explore PubMed search results with the Gene Ontology (GO). The system categorizes articles according to the GO, which the user can navigate through after a first query has been made. The result is presented in the form of a list with titles of articles, where the GO terms are highlighted. The user can view definitions of terms, see closely related terms and can thus navigate through PubMed articles with the use of an ontology [Doms and Schroeder, 2005]. GoPubMed shows the hierarchical structure of GO to present relations to other terms. Our approach can show relations other than subclass relations, as well as relations between concepts occurring in different ontologies, not only the GO. This will provide the user with more knowledge, hopefully without further look-up in articles, which are shown only if the user wishes so.

The key difference between the systems is that our approach aims to give the user knowledge in the form of natural language expressions, instead of just links to articles related to one or more terms in the query, and also to connect knowledge from different fields or from different aspects of the same field. Some of GoPubMed's benefits could be used in a future system using our approach, e.g. showing definitions of ontological terms or highlighting ontological terms occurring in abstracts. This should be fairly easy to implement and should thus not be seen as a benefit only for GoPubMed.

• Textpresso is a text-mining system for scientific literature which categorizes words and sentences according to terms contained in an ontology. The categories are divided into three groups. The first contains biological concepts. The second group comprises terms that characterize a biological entity or establish a relation between two of them. The last group covers terms that can be used for involvement in the semantics of sentences [Müller et al., 2004]. Together the groups of categories can be used to make a more precise search. A user types a keyword and then chooses categories from a list. The result consists of a list of articles, displayed together with the sentences in which the search phrase occurs. The user can then choose to read the abstract if he finds the sentences useful. I would say that this system is more similar to our approach than GoPubMed, as it aims to put the search keyword(s) in relation to others, presented in natural language. The difference lies in how the relationship to other terms is retrieved and presented. Textpresso uses sentences retrieved from articles which contain the keywords. Our approach derives expressions from ontologies and leaves the investigation of the underlying literature to a later step. Our system will also be able to produce answers in the form of instances related to the expression, which hopefully will provide the user with knowledge before the source is reviewed.

6.2. Future work

• Optimize algorithms of slice creation and alignment. Although I have put much effort into the creation of those algorithms, improvements can be made to make them more efficient. This could perhaps be done by finding and removing unnecessary steps. One way, which I previously mentioned, could be to alter the matrix multiplication algorithms to store paths when they are 'created'.

• Put the idea and methods into actual use. To prove that an approach with natural language expressions mined from ontologies is useful, we need to put it into actual use. Documents need to be mined for terms occurring in ontologies and connected to slices, which are either created in advance or created on demand. The statements also have to be translated into natural language, and for that we need a natural language generator.


Bibliography

[Ang et al., 2008] Ang, W. T., Kanagasabai, R., and Baker, C. (2008). Knowledge translation: Computing the query potential of bio-ontologies. (Poster presentation at the Semantic Web Applications and Tools for Life Sciences Workshop, November 2008).

[Ashburner et al., 2000] Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., Harris, M. A., Hill, D. P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J. C., Richardson, J. E., Ringwald, M., Rubin, G. M., and Sherlock, G. (2000). Gene ontology: tool for the unification of biology. Nature Genetics, 25(1):25–29.

[Baader et al., 2003] Baader, F., Calvanese, D., McGuinness, D. L., Nardi, D., and Patel-Schneider, P. F., editors (2003). The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press.

[Chartrand, 1985a] Chartrand, G. (1985a). Connected Graphs, page 41. In [Chartrand, 1985c].

[Chartrand, 1985b] Chartrand, G. (1985b). Graphs and Matrices, pages 217–222. In [Chartrand, 1985c].

[Chartrand, 1985c] Chartrand, G. (1985c). Introductory graph theory. Dover publications.

[Dijkstra, 1959] Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. In Numerische Mathematik, volume 1, pages 269–271.

[Doms and Schroeder, 2005] Doms, A. and Schroeder, M. (2005). Gopubmed: exploring pubmed with the gene ontology. Nucleic Acids Research, 33:W783–W786.

[Gruber, 1993] Gruber, T. R. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2):199–220.

[Hayamizu et al., 2005] Hayamizu, T. F., Mangan, M., Corradi, J. P., Kadin, J. A., and Ringwald, M. (2005). The adult mouse anatomical dictionary: a tool for annotating and integrating data. Genome Biology, 6(3):r29.

[Lambrix, 2004] Lambrix, P. (2004). Ontologies in bioinformatics and systems biology. In Dubitzky, W. and Azuaje, F., editors, Artificial Intelligence Methods and Tools for Systems Biology, chapter 8, pages 129–146.

[Lambrix and Tan, 2006] Lambrix, P. and Tan, H. (2006). Sambo - a system for aligning and merging biomedical ontologies. Journal of Web Semantics, Special issue on Semantic Web for the Life Sciences, 4(3):196–206.

[Lambrix and Tan, 2008] Lambrix, P. and Tan, H. (2008). Ontology alignment and merging. In Burger, A., Davidson, D., and Baldock, R., editors, Anatomy Ontologies for Bioinformatics: Principles and Practice, chapter 6, pages 133–150.

[Lambrix et al., 2007] Lambrix, P., Tan, H., Jakoniene, V., and Strömbäck, L. (2007). Biological ontologies. In Baker, C. J. O. and Cheung, K.-H., editors, Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences, chapter 4, pages 85–99.

[Müller et al., 2004] Müller, H.-M., Kenny, E. E., and Sternberg, P. W. (2004). Textpresso: An ontology-based information retrieval and extraction system for biological literature. PLoS Biology, 2(11).

[Motta et al., 2000] Motta, E., Buckingham, S. S., and Domingue, J. (2000). Ontology-driven document enrichment: principles, tools and applications. International Journal of Human-Computer Studies, 52(6):1071–1109.

[Paris et al., 1991] Paris, C. L., Swartout, W. R., and Mann, W. C. (1991). Natural Language Generation in Artificial Intelligence and Computational Linguistics, page xv. Springer.

[Reiter, 2000] Reiter, E. (2000). Building natural language generation systems.

[Smith et al., 2007] Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Goldberg, L. J., Eilbeck, K., Ireland, A., Mungall, C. J., Consortium, T. O., Leontis, N., Rocca-Serra, P., Ruttenberg, A., Sansone, S.-A., Scheuermann, R. H., Shah, N., Whetzel, P. L., and Lewis, S. (2007). The obo foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology, 25:1251–1255.

[Stevens et al., 2000] Stevens, R., Goble, C. A., and Bechhofer, S. (2000). Ontology-based knowledge representation for bioinformatics. Briefings in Bioinformatics, 1(4).

[Takai-Igarashi and Takagi, 2000] Takai-Igarashi, T. and Takagi, T. (2000). Signal-ontology: Ontology for cell signaling. Genome Informatics, 11:440–441.

[Twigger et al., 2006] Twigger, S. N., Shimoyama, M., Bromberg, S., Kwitek, A. E., and Jacob, H. J. (2006). The rat genome database, update 2007—easing the path from disease to data and back again. Nucleic Acids Research, 35:D658–D662.

[W3C, 2004] W3C (2004). OWL Web Ontology Language Guide. http://www.w3.org/TR/2004/REC-owl-guide-20040210/.


The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances. The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

© Jonas Bergman Laurila
