A Session-Based System for Aligning Large Ontologies


Linköping University Electronic Press

Copyright

The publishers will keep this document online on the Internet, or its possible replacement, from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for noncommercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its home page: http://www.ep.liu.se/

© Muzammil Zareen Khan


Linköping University
Department of Computer and Information Science

Abstract

Ontologies are a key technology for the Semantic Web. Within many domains, a large number of ontologies have been developed by different people and organizations, and many of them contain overlapping information. To benefit from ontologies that hold inter-related knowledge, they have to be aligned or merged. A number of systems have been developed for aligning and merging ontologies, and these systems use various alignment strategies. However, no available system adequately supports multiple alignment sessions for aligning large ontologies. In this thesis we propose a session-based framework for aligning and merging large ontologies. We have implemented two types of sessions: computation sessions, which generate mapping suggestions, and validation sessions, in which the user validates the generated suggestions. After the suggestions have been categorized as accepted or rejected, we generate a partial reference alignment (PRA) that can be used to compute similarities between terms and to filter mapping suggestions. We also propose a recommendation process, integrated with the computation and validation sessions, to find out which matchers and combinations are best suited for the alignment process. Computation and validation sessions may use the recommended settings, or the user can select other matchers and combinations.

Key words: Ontology alignment, Biomedical Ontologies, PRA

URL for the document: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-60156

Acknowledgement

I would like to express my gratitude to Patrick Lambrix, my examiner and supervisor, for his guidance in this thesis work. It was productive and rewarding to work with him on this project, and his guidance and supervision have opened many new avenues for me. I would also like to thank Qiang Liu for his help and support during my work. I owe a lot to my friend and roommate of the last two years, Syed Ahmad Aamir, not only for language proofreading but also for supporting and guiding me at every step during my stay in Linköping. Finally, special thanks go to my family and all of my friends for supporting me throughout my studies, for never questioning the choices I make in my life, and for trusting me. Thank you all!

Linköping, September 2010
Muzammil Zareen Khan

Table of Contents

1. INTRODUCTION
   1.1 MOTIVATION
   1.2 PROBLEM STATEMENT
   1.3 PROJECT GOAL
   1.4 METHOD OF WORK
   1.5 OUTLINE
2. BACKGROUND
   2.1 THE SEMANTIC WEB
   2.2 ONTOLOGY
       2.2.1 The Structure of Ontology
       2.2.2 Use of Ontology
   2.3 ONTOLOGY ALIGNMENT
       2.3.1 Alignment Framework
       2.3.2 General Alignment Strategies
   2.4 SAMBO
   2.5 RECOMMENDATION OF ONTOLOGY ALIGNMENT STRATEGY
       2.5.1 Selecting Segment Pair
       2.5.2 Segment Pair Alignment
       2.5.3 Evaluation and Recommendation
3. REQUIREMENTS
   3.1 MAIN REQUIREMENTS
       3.1.1 Framework Requirements
       3.1.2 Supporting Features
4. ANALYSIS AND DESIGN
   4.1 DESCRIPTION
   4.2 FRAMEWORK ARCHITECTURE
   4.3 COMPUTATION SESSION
   4.4 VALIDATION SESSION
   4.5 RECOMMENDATION PROCESS
   4.6 SYSTEM FLOW
5. IMPLEMENTATION
   5.1 TECHNOLOGY
   5.2 INFORMATION STORAGE
       5.2.1 User Information
       5.2.2 Session Information
       5.2.3 Session Data
   5.3 IMPLEMENTATION
6. TESTING THE SYSTEM
   6.1 DESCRIPTION
   6.2 PROCEDURE
       6.2.1 Test Cases
       6.2.2 Matchers, Weight and Threshold
   6.3 RESULTS
7. CONCLUSION AND FUTURE WORK
   7.1 CONCLUSION
   7.2 FUTURE WORK
8. REFERENCE LIST
9. APPENDICES
   9.1 APPENDIX A: IMPLEMENTED FUNCTIONS DESCRIPTION
   9.2 APPENDIX B: IMPLEMENTED CLASSES AND PUBLIC VARIABLES
   9.3 APPENDIX C: SETUP GUIDE FOR THE SYSTEM

List of Figures and Tables

Figures
FIGURE 2.1 EXAMPLE OF OWL FILE
FIGURE 2.2 EXAMPLE OF OVERLAPPING INFORMATION
FIGURE 2.3 ALIGNMENT STRATEGY
FIGURE 2.4 COMBINING AND FILTERING IN SAMBO
FIGURE 2.5 SCREEN SHOT FOR MAPPING SUGGESTION
FIGURE 2.6 SCREEN SHOT FOR MANUAL ALIGN IN SAMBO
FIGURE 4.1 SESSION BASED FRAMEWORK FOR ALIGNING LARGE ONTOLOGIES
FIGURE 4.2 COMPUTATION SESSION
FIGURE 4.3 VALIDATION SESSION
FIGURE 4.4 RECOMMENDATION PROCESS
FIGURE 4.5 THE SEQUENCE DIAGRAM
FIGURE 5.1 JSP MODEL ARCHITECTURE
FIGURE 5.2 USERS.XML
FIGURE 5.3 TESTER.XML
FIGURE 5.4 USER OPTIONS
FIGURE 5.5 TESTER_SUGGESTIONSLIST.XML
FIGURE 5.6 TESTER_HISTORYLIST.XML
FIGURE 5.7 RELATIONRECOMMENDATION.XML
FIGURE 5.8 USING COMPUTATION RECOMMENDATIONS FOR ALIGNING RELATIONS
FIGURE 5.9 CONCEPTRECOMMENDATIONS.XML
FIGURE 5.10 USING COMPUTATION RECOMMENDATION FOR ALIGNING CONCEPTS

Tables
TABLE 6.1 TEST CASES
TABLE 6.2 MATCHERS AND THRESHOLD APPLIED ON TEST CASES

1. Introduction

1.1. Motivation

The Semantic Web is an emerging technology that provides methods for allowing machines to understand the meaning (semantics) of information on the World Wide Web [1]. Ontologies are a key technology for the Semantic Web. They classify and define the concepts and relations within a specific domain [2]. The rapid development of ontologies increases the chance that ontologies within the same domain contain overlapping information. For clear definition, better organization and easy retrieval of information, it is necessary to align ontologies in order to remove overlapping information. There are many systems available for aligning small ontologies, but no available system supports the alignment of large ontologies adequately. Computing mapping suggestions for large ontologies can take much longer than for small ones. There are generally also many mapping suggestions that require user action, and no methodology is available for aligning large ontologies over multiple sessions. Further, many online portals (e.g. BioPortal [3]) now contain large numbers of ontologies and mappings, and this information should be used in the alignment process, together with a clear definition of the available alignment strategies.

1.2. Problem Statement

In different areas, a large number of ontologies have been developed so far, and many of them contain overlapping information [4, 5, 6]. Ontology alignment is the task of finding this overlap between ontologies of the same domain. There is a need for ontology alignment systems that support the alignment of large ontologies, that separate the work into computation and validation sessions, and that make use of the available information, including the acceptance or rejection of mappings as well as information about alignment strategies.

1.3. Project Goal

The goal of this thesis work was to develop a framework for session-based alignment of large ontologies, in which users can validate the suggestions over multiple sessions and the knowledge obtained in one session can be used in other sessions. We have implemented the proposed framework using Java technology, with SAMBO [4] as the base system.

1.4. Method of Work

We have divided this thesis work into three phases, described below.

Phase I Background
The initial phase was to study research papers in order to build an understanding of ontologies and ontology alignment. Furthermore, SAMBO was examined as the base system for this work.

Phase II Analysis and Design
In this phase we analyzed our work and proposed a framework for aligning large ontologies in multiple sessions, including the computation session, the validation session and the recommendation process.

Phase III Implementation
We then implemented the design using open source Java technology (JSP and Servlets), along with an XML engine to store and retrieve the users' session data.

1.5. Outline

The chapters are introduced below.

Chapter 1 Introduction
This chapter gives a brief introduction to the problem statement, the project goal and the motivation behind this work. It also describes how the thesis work was divided into phases.

Chapter 2 Background
This chapter discusses ontologies in detail and describes the structure and components of an ontology and a few uses of ontologies. It gives an overview of ontology alignment, the alignment framework and alignment strategies. It also includes an overview of SAMBO and of the method for recommending an ontology alignment strategy.

Chapter 3 Requirements
This chapter describes the main framework requirements and the supporting features of a session-based system for aligning large ontologies.

Chapter 4 Analysis and Design
This chapter presents the proposed framework for aligning large ontologies with a session-based approach and describes the different modules of the framework. It also includes the work flow of the system.

Chapter 5 Implementation
This chapter introduces the technology used to implement this thesis work and describes, with the help of screen shots, how the system has been implemented.

Chapter 6 Testing of the System
This chapter defines the test approach and presents the test cases that were used and the results obtained when testing the system.

Chapter 7 Conclusion and Future Work
This chapter summarizes the work and gives a few words about possible future work.

2. Background

2.1. The Semantic Web

The vision of the Semantic Web was conceived by Tim Berners-Lee, the director and co-founder of W3C. According to him, the Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation [7]. The main idea behind the Semantic Web is to define a web of data in such a way that computers can process the meaning of things. Ontologies are a key technology for the Semantic Web [2].

2.2. Ontology

There are different definitions of ontology in the computer science area. Ontologies define the basic concepts and relations, along with the rules for combining them, for a specific domain of interest [8]. In the context of computer and information sciences, an ontology defines a set of representational primitives, typically classes, properties (or attributes) and relationships, with which to model a domain of knowledge or discourse [9]. An ontology is thus a representation of knowledge within a specific domain by a set of concepts and the relationships between them. Ontologies offer a way for people and organizations to communicate by providing common terminologies over domains [10].

2.2.1 The Structure of Ontology

Ontologies consist of the following components [4, 11]:
o Concepts: A concept in any domain represents a set or a class of entities. There are two kinds of concepts: primitive and defined. A primitive concept defines necessary characteristics that an entity must have in order to belong to a class. A defined concept defines both necessary and sufficient characteristics for an entity to belong to a class.
o Relations: The relations describe the concepts and their characteristics, or the interaction between concepts or their properties. There are many types of relations, such as the is-a relation and the part-of relation.
o Instances: The instances represent the actual entities. For example, “Atom is a concept and potassium is an instance of that concept. It could be argued that Potassium is a concept representing the different instances of potassium and its isotopes, etc”.
o Axioms: The axioms represent knowledge about concepts and relations that can be checked logically. Axioms can be such things as domain restrictions, cardinality restrictions or disjointness restrictions.

Figure 2.1 shows an example ontology file in OWL (Web Ontology Language, see [12]) format. It is the part of the Gene Ontology that concerns behavior. The Gene Ontology project is a major initiative in bioinformatics regarding gene product attributes [13]. The ontology in figure 2.1 represents concepts as OWL classes, each with a class ID, a label for the class and, where applicable, its relation to another class expressed as sub-class information.
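The components listed above can be pictured as a small data model. The following is only an illustrative sketch in Python (the thesis system itself is implemented in Java); the class names `Concept` and `Ontology`, their fields, and the concept `metal_atom` are hypothetical and chosen for the example. Only the atom and potassium pair comes from the text, and axioms are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    concept_id: str            # e.g. the ID of an OWL class
    label: str
    is_defined: bool = False   # False: primitive concept (necessary conditions only)


@dataclass
class Ontology:
    concepts: dict = field(default_factory=dict)    # concept_id -> Concept
    relations: set = field(default_factory=set)     # (child_id, relation, parent_id)
    instances: dict = field(default_factory=dict)   # instance name -> concept_id

    def add_concept(self, concept):
        self.concepts[concept.concept_id] = concept

    def add_relation(self, child_id, relation, parent_id):
        # Both ends of a relation must be known concepts.
        if child_id not in self.concepts or parent_id not in self.concepts:
            raise KeyError("both concepts must be added before relating them")
        self.relations.add((child_id, relation, parent_id))

    def ancestors(self, concept_id, relation="is-a"):
        """All concepts reachable from concept_id by following `relation` upwards."""
        found, stack = set(), [concept_id]
        while stack:
            current = stack.pop()
            for child, rel, parent in self.relations:
                if child == current and rel == relation and parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found


def build_example():
    # "metal_atom" is a hypothetical intermediate concept, not taken from the thesis.
    onto = Ontology()
    onto.add_concept(Concept("atom", "Atom"))
    onto.add_concept(Concept("metal_atom", "Metal atom"))
    onto.add_relation("metal_atom", "is-a", "atom")
    onto.instances["potassium"] = "metal_atom"
    return onto
```

Storing relations as triples keeps is-a and part-of in one structure, which is convenient when a structure-based matcher later needs to walk the hierarchy.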

Figure 2.1 Example of OWL File.

2.2.2 Use (Role) of Ontologies

The use of ontologies provides many benefits, such as knowledge sharing and portability across platforms, reusability, reliability, maintainability and improved documentation. Ontologies provide an effective and efficient way of handling the information needed to understand a specific field [4]. There are three different types of use of ontologies [11]:
o Domain-Oriented: The ontology is used specifically for a single domain, or in a generalized form for similar domains.
o Task-Oriented: The ontology is used specifically for a single task, or in a generalized form for similar tasks.
o Generic: These ontologies capture common high-level concepts, which is useful when reusing an ontology.

A few general examples of ontology use are given here [10]:
o In natural language processing, ontologies provide the required definitions of terms (concepts or relations) for content words during information processing [14].
o WordNet is one of the largest lexical ontologies. It is a database of English which contains nouns, verbs, adverbs and adjectives grouped into sets of cognitive synonyms. The structure of WordNet makes it an effective tool for computational linguistics and natural language processing [15].
o Ontologies are used to share a common understanding of the structure of information among people or agents, which could be software or organizations. If different websites use the same underlying ontology of terms, a computer agent can extract the required information according to its input [16].
o Ontologies can also be used to separate domain knowledge from operational knowledge. For example, an ontology of PC components and their characteristics can be developed and then used to configure made-to-order PCs [16].

2.3. Ontology Alignment

There is often more than one ontology developed within a given domain by different people or organizations, so these ontologies are likely to contain overlapping information. Finding this overlap and aligning the ontologies is needed for a number of reasons. It helps in searching for the required knowledge accurately, and it is beneficial when different ontologies within the same domain are integrated with each other. For instance, researchers would benefit if the ontologies containing information about genes, gene sequences, proteins, gene functions, pathways, genetic diseases, phenotypes, etc. were connected to each other, because all of these contain interrelated knowledge [17]. Ontology alignment is the process of determining the similarities or overlapping information between concepts or relations in ontologies of the same domain. Ontology alignment allows metadata to be related and brings two ontologies of the same domain into mutual agreement. Figure 2.2 shows the overlapping information between two different ontologies of the same domain, highlighting the equivalent concepts, equivalent relations and is-a relations in both ontologies. The ontologies used are parts of the Gene Ontology and the Signal Ontology. In order to use the information in both ontologies effectively, it is necessary to align them to find the overlapping information.

Figure 2.2: Example of overlapping information [18].

Ontology alignment and merging is an important step in ontology engineering [e.g. 19], since we often would like to use multiple ontologies. Many ontology alignment systems have been developed so far (e.g. overview in [20]) that support the user in finding inter-ontology relationships. In this thesis work we have used the SAMBO [4] framework for aligning and merging ontologies.

2.3.1 The Framework

A framework [4] for aligning two ontologies, based on the computation of similarity values between terms in the source ontologies, is shown in figure 2.3. The alignment algorithm receives two source ontologies as input and includes several matchers. The matchers calculate similarities between the terms from the input ontologies by implementing strategies based on linguistic matching, structure-based strategies, constraint-based approaches, instance-based strategies, strategies that use auxiliary information, or a combination of these. Mapping suggestions are then determined by combining and filtering the results generated by one or more matchers. Using different matchers and different ways of combining and filtering their results yields different alignment strategies. Once the suggestions have been generated, they are presented to the user, who accepts or rejects them. A conflict checker in the framework is used to avoid conflicts introduced by the alignment relationships.

Figure 2.3: Alignment Strategy [4].

2.3.2 General Strategies

The matchers in the system apply different strategies [4] to calculate similarities between the terms from the different source ontologies. Here we give a general overview of the types of strategies used by current systems for aligning ontologies.
o Strategies based on linguistic matching: These strategies use textual descriptions (e.g. names, synonyms and definitions) of the concepts and relations to determine similarities.
o Structure-based strategies: These strategies use the structure of the ontologies to provide suggestions. A graph structure over the concepts is provided through is-a, part-of or other relations.
o Constraint-based approaches: These approaches use the axioms to provide suggestions.
o Instance-based strategies: When instances are available, or can be obtained, they are used to define similarities between concepts.
o Use of auxiliary information: This kind of information includes external resources and information about previously aligned or merged ontologies.
o Combining different approaches: The combination of different approaches may give better results.

2.4. SAMBO

SAMBO [4] is a web-based system for aligning and merging biomedical ontologies. This thesis work uses SAMBO as the base system for aligning large ontologies. SAMBO was developed at IDA, Linköping University. It was the best performing system in the Anatomy track of the 2008 Ontology Alignment Evaluation Initiative (OAEI) []. SAMBO lets users align and merge two ontologies in OWL format. The system handles the alignment of relations and the alignment of concepts separately, in two steps. The second step starts after the user finishes the first step. The system offers different matchers and allows users to select which ones to use for the alignment process. The system provides two working modes:
o Suggestion Align - In this mode the user decides, for each suggestion, whether or not to align it.
o Manual Align - In this mode the user chooses which relations and concepts to align without any suggestions, and each mapping is added individually.
Figure 2.4 shows which matchers can be used and how a weight value can be assigned to each of them. When the user clicks the start button, the system starts computing suggestions.

Figure 2.4 Combination and filtering [4].

Figure 2.5 shows a mapping suggestion and how the user can decide whether the terms are equivalent, whether there is an is-a relation between the terms, or whether the suggestion should be rejected. If the suggestion is not rejected, the user can also assign a new name and add comments.

Figure 2.5 Mapping Suggestion [4].

In addition to the suggestion mode, the system provides an interface for aligning ontologies manually, in which the source ontologies are illustrated using is-a and part-of hierarchies (i and p icons, respectively), as shown in figure 2.6.

Figure 2.6 Manual align [4].

2.5. Recommendation of Ontology Alignment Strategy

The recommendation process evaluates the available alignment strategies on several small pieces of the ontologies and recommends an alignment strategy based on the evaluation results. The method for recommending the alignment strategy consists of the following steps: selecting segment pairs, segment pair alignment, and evaluation and recommendation [5].

2.5.1 Selecting Segment Pairs

Instead of using the complete ontology, only a part of the ontology is selected and treated as an ontology. The segments selected from the input ontologies represent pieces of the knowledge that the ontologies represent [5].

2.5.2 Segment Pair Alignment

The alignment for the selected segment pairs is either generated or, if already available, taken as defined by domain experts [5].

2.5.3 Evaluation and Recommendation

KitAMO [21] can be used for the evaluation process. It provides an integrated system for comparative evaluation and analysis of alignment strategies and their combinations. It reports similarity values and evaluates strategies, combination weights and thresholds based on their performance and the quality of their alignments [5]. The recommendation process (discussed in section 4.5) recommends the best alignment strategies based on their performance on the given segments, and these strategies are then available for use when computing suggestions.

3. Requirements

3.1. Main Requirements

The main requirement is to develop a session-based ontology alignment system which supports large ontologies and introduces the notion of computation and validation sessions.

3.1.1 Framework Requirements

The key features required of the framework are listed below:
o The framework supports the alignment of large ontologies.
o The framework introduces the notion of computation and validation sessions for relations and concepts.
o Knowledge obtained in one session is available for use in other sessions.

3.1.2 Supporting Features

The key supporting features for implementing the framework are listed below:
o Users should be able to store and load sessions in order to align in multiple sessions.
o The accepted and rejected suggestions should be available for further use.
o The system should show the recommended alignment strategy provided by a recommendation process.
o Users should be able either to use the recommended alignment strategy or to define their own parameters.

4. Analysis and Design

4.1. Description

SAMBO [4] has been used as the base system for aligning large ontologies with the notion of computation and validation sessions. For this purpose SAMBO has been analyzed in detail and its alignment framework has been enhanced to support session-based alignment. The framework architecture describes the alignment algorithm integrated with the computation and validation sessions, along with the recommendation process. Every module is then described separately in this chapter.

4.2. Framework Architecture

Figure 4.1 shows the architecture of our session-based framework.

Figure 4.1 Session based framework for aligning large ontologies.

We have divided our framework into three processes, two of which are sessions that are integrated with each other. The computation session [more details in section 4.3] is used to compute the suggestions by applying alignment algorithms that include several matchers, combinations and filters. The output of a computation session is used in a validation session [more details in section 4.4], where users can either accept or reject the suggestions. The conflict checker is applied to the accepted suggestions in order to remove conflicts, and then a Partial Reference Alignment (PRA), i.e. a list of current mappings, is generated for further use in future iterations and in the recommendation process.

A PRA is used at different steps of ontology alignment [6]. During preprocessing, the PRA is used to divide the ontologies into mappable parts containing mappings. It can also be used to compute similarities between terms and to filter mapping suggestions. PRA-based matchers can be created by using the underlying properties of the mappings in the PRA.

The recommendation process [more details in section 4.5] works independently, using the results from both sessions and recommending better matchers and filters for further computation of suggestions.

The framework architecture presented in figure 4.1 is an extension of the work shown in figure 2.3. The presented architecture uses the alignment strategies of SAMBO, adds the notions of computation and validation sessions, and separates the suggestions according to the user's actions. The alignment strategy defined for SAMBO in figure 2.3 makes it possible to reuse only accepted suggestions, whereas the new framework introduces the PRA and makes it possible to use the rejected suggestions as well. The recommendation process is entirely new in this system and works independently, in parallel with the sessions.

4.3. Computation Session

Figure 4.2 shows the flow of a computation session in the system. The computation session takes two source ontologies as input and applies matchers and filters as its alignment algorithms.

The preprocessor in this process is used to determine whether this is the first computation of suggestions or whether a PRA is available. A PRA becomes available from a validation session; when suggestions are computed for the first time, no PRA is available. The preprocessor is also used to find out whether there is any saved session data from previous work. If saved data is found, the user can load that session or start a new one.

The matchers implement strategies based on linguistic matching, structure-based matching, constraint-based approaches, instance-based strategies, strategies that use auxiliary information, or a combination of these. The matchers calculate similarity values between the terms from the different source ontologies. The suggestions are then generated by combining and filtering the results from one or more matchers. The user can select different matchers and can set a threshold value to obtain the results. Different matchers, combinations and filters produce different results and therefore different alignment suggestions. The suggestion list generated by the computation process is used as input to the validation session.
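The combination and filtering step described above can be illustrated with a minimal sketch. It assumes that the similarity values of the selected matchers are combined as a weighted average and that a pair becomes a suggestion when the combined value reaches the user-defined threshold. The class and method names (SuggestionSketch, combine, filter) are hypothetical and are not part of SAMBO; the actual combination rule used by the system may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the names below are illustrative and are not SAMBO's API.
class SuggestionSketch {

    /** A candidate mapping between one term from each source ontology. */
    static final class Suggestion {
        final String term1;
        final String term2;
        final double similarity;

        Suggestion(String term1, String term2, double similarity) {
            this.term1 = term1;
            this.term2 = term2;
            this.similarity = similarity;
        }
    }

    /** Weighted average of the per-matcher similarity values (assumed combination rule). */
    static double combine(double[] similarities, double[] weights) {
        if (similarities.length != weights.length) {
            throw new IllegalArgumentException("one weight is required per matcher");
        }
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            weightedSum += weights[i] * similarities[i];
            weightTotal += weights[i];
        }
        return weightTotal == 0.0 ? 0.0 : weightedSum / weightTotal;
    }

    /** Keeps only the candidates whose combined similarity reaches the threshold. */
    static List<Suggestion> filter(List<Suggestion> candidates, double threshold) {
        List<Suggestion> kept = new ArrayList<>();
        for (Suggestion candidate : candidates) {
            if (candidate.similarity >= threshold) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

With two matchers of equal weight, a pair is suggested only if the mean of the two similarity values reaches the threshold, so raising the threshold shortens the suggestion list that is passed on to the validation session.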

Figure 4.2 Computation Session.

4.4. Validation Session

Figure 4.3 shows the flow of a validation session. The validation process takes the suggestion list generated by the computation process as input. These suggestions are presented to the user, who acts on each suggestion by accepting or rejecting it. The acceptance or rejection of a suggestion may influence other suggestions. The conflict checker algorithm is used to detect unclassifiable concepts and can be used to remove redundancy on user request. These algorithms are applied only to accepted suggestions, and the result is called the Partial Reference Alignment (PRA). The rejected suggestions and the PRA can be used in the recommendation process and in the computation session for re-computing the suggestions.

The user can save the session at any time; in that case the system stores the user information along with the matchers and filters applied in the process and the list of generated suggestions. The next time the user loads the session, all previously stored information is restored and is available for further use.
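The bookkeeping of a validation session can be sketched as follows. This is only an illustration under stated assumptions: the class ValidationSketch and its methods are hypothetical, suggestions are represented as plain strings, and the only "conflict" handled is a stand-in rule that a pair cannot be both accepted and rejected. The conflict checker of the actual system, which detects unclassifiable concepts, is not reproduced here.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: illustrates how user decisions split suggestions into a PRA
// and a list of rejected suggestions. It is not the SAMBO implementation.
class ValidationSketch {

    private final Set<String> pra = new LinkedHashSet<>();
    private final Set<String> rejected = new LinkedHashSet<>();

    /** Builds a simple textual key for a suggested pair of terms. */
    private static String key(String term1, String term2) {
        return term1 + " = " + term2;
    }

    /** Accepting a pair adds it to the PRA; a set ignores repeated accepts. */
    void accept(String term1, String term2) {
        String pair = key(term1, term2);
        rejected.remove(pair); // stand-in rule: a pair is never both accepted and rejected
        pra.add(pair);
    }

    /** Rejecting a pair records it so that later computations can filter it out. */
    void reject(String term1, String term2) {
        String pair = key(term1, term2);
        pra.remove(pair);
        rejected.add(pair);
    }

    Set<String> partialReferenceAlignment() {
        return Collections.unmodifiableSet(pra);
    }

    Set<String> rejectedSuggestions() {
        return Collections.unmodifiableSet(rejected);
    }
}
```

Keeping the rejected suggestions in their own collection is what allows both lists to be handed to a later computation session or to the recommendation process.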

Figure 4.3 Validation Session.

4.4. Recommendation Process

Figure 4.4 shows the flow of the recommendation process. The recommendation process is part of the future work for this system; we only introduce the process here and have not implemented it in this thesis work. The process will work independently alongside the computation and validation sessions. The user will only be able to see and use its output.

The process will run multiple computation sessions with different combinations of matchers and filters, resulting in multiple suggestion lists, and will compare these suggestion lists with the rejected suggestions and the PRA generated in validation sessions. After this comparison, the process will recommend matcher and filter combinations to the user in order to get better results. The process will generate an XML file containing the recommended settings. The computation session will read that file and show the recommendations to the user before starting the computation; the user may use these settings or define his/her own.
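Since the recommendation process is not implemented in this work, the following is only a sketch of one way the comparison could be carried out. It assumes a simple scoring rule that is not taken from the thesis: a strategy gains one point for every suggestion that is in the PRA and loses one point for every suggestion that the user has rejected. All names (RecommendationSketch, score, recommend) are hypothetical.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of comparing suggestion lists against the PRA and the
// rejected suggestions. The scoring rule is an assumption made for illustration.
class RecommendationSketch {

    /** +1 for each suggestion confirmed by the PRA, -1 for each rejected one. */
    static int score(Set<String> suggestions, Set<String> pra, Set<String> rejected) {
        int score = 0;
        for (String suggestion : suggestions) {
            if (pra.contains(suggestion)) {
                score++;
            } else if (rejected.contains(suggestion)) {
                score--;
            }
        }
        return score;
    }

    /** Returns the name of the best-scoring strategy, or null if none was given. */
    static String recommend(Map<String, Set<String>> suggestionsByStrategy,
                            Set<String> pra, Set<String> rejected) {
        String best = null;
        int bestScore = Integer.MIN_VALUE;
        for (Map.Entry<String, Set<String>> entry : suggestionsByStrategy.entrySet()) {
            int current = score(entry.getValue(), pra, rejected);
            if (current > bestScore) {
                bestScore = current;
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

Suggestions that the user has not yet validated do not affect the score in this sketch, so the ranking only reflects knowledge that was actually obtained in validation sessions.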

Figure 4.4 Recommendation Process.

4.5. The Flow of the System

Figure 4.5 shows the sequence diagram for the system. The user selects the ontology files from an ontology source, and the system uploads them. The user can then select the matchers, combination and filters from their respective sources. After the computation process is started, the system applies the selected matchers, combination and filters to the uploaded ontologies in order to compute suggestions. All computed suggestions are presented to the user for further action. After the computation has finished, the user validates the suggestions, accepting or rejecting each one. The rejected suggestions and the Partial Reference Alignment (PRA) are available for further use in the recommendation process and, together with the recommended settings, for subsequent computations on the same ontologies. In the end, the alignment results are shown to the user.

Figure 4.5 The sequence diagram.

5. Implementation

5.1. Technology

To implement the session-based functionality, we have used the open source Java technologies JSP and Servlet. Figure 5.1 shows the general architecture of the JSP/Servlet framework. We use XML to store all the information for a session at the path specified by the user.

Figure 5.1: JSP Model Architecture [22].

JSP and Servlets can be used together for rapid development of platform-independent applications that offer enhanced performance, separate the business logic from the user interface design, and can be extended into enterprise applications [23]. SAMBO, the base system for this thesis work, is implemented in Java, which is an additional benefit of using this technology for the implementation.

5.2. Information Storage

In this thesis work, we store all information and session data in XML format at a user-specified path on the system. Traditional database systems such as SQL Server or MySQL could be used, but we preferred XML because we store the session data on the user's machine: the user can store the generated files anywhere on the system and no installation is required. We let users specify where to store the data. We discourage storing session data on the server because it may occupy a lot of space. Our system uses different XML files based on its needs. All the files are described below.

5.2.1 User Information

The system uses the users.xml file shown in figure 5.2 to store the user name and password along with the data path for session storage for the specific user. This file is kept on the server. The system verifies the user name and password against this file.

Figure 5.2: users.xml.

5.2.2 Session Information

The system always tries to store the session data when the user clicks the exit button. To store the session information, which contains the user name, the session id and other information used for the stored session, the system creates an XML file named after the user running the session, as shown in figure 5.3.

Figure 5.3: tester.xml.

This file contains the user name, the session type, the ontology files, the selected color values for both files, the threshold value, and the matchers and weight values that were in use at the time the session was stored. When the user starts the system and enters a user name and password, the system checks, based on the XML file discussed above, whether any stored session information is available. If the system finds such information, the screen given in figure 5.4 is shown, and at this point the user can continue the previous session or start a new session with the same or newly selected ontologies.

Figure 5.4 User options.

If the user clicks the load session button, the system loads all the stored information, makes the previously stored session available for further use, and returns the user to the point reached in the last session. In the case of a new session, the stored information remains saved on the system and the user uploads new or the same ontology files for further processing. All the files generated to store session information and session data are removed when the user clicks the finish button after finalizing the mapping suggestions.

5.2.3 Session Data

At the same time, two other files are created: one containing the suggestions list shown in figure 5.5 and one containing the history list shown in figure 5.6. The suggestions list contains all the suggestions that have not yet been processed by the user, and the history list contains the details of all the user's actions on the suggestions. These files contain suggestion pairs; the system reads the files and stores all the pair values for processing. The pairs from SuggestionsList.xml are loaded into the unprocessed vector list and the pairs from HistoryList.xml are loaded into the processed vector list.

Figure 5.5: tester_SuggestionsList.xml.

The history list given in figure 5.6 also stores the action taken on each suggestion pair, together with the newly assigned name and the user's comments. If the user has skipped a suggestion, its name is stored as null.

Figure 5.6: tester_HistoryList.xml.

The names of both files are prefixed with the user name, followed by the descriptive file name.

The recommendation process will recommend the computation settings for relations in the form of the XML shown in figure 5.7. The process will store a separate file containing the computation recommendations for relations. The recommendations include the threshold value, matcher names and weight values.

Figure 5.7 RelationRecommendations.xml.

Figure 5.8 shows how the system displays the information provided in RelationRecommendations.xml. At this point the user can start the computation with the recommended settings or define his/her own.

Figure 5.8 Screenshot for allowing the user to use computation recommendations for aligning relations.

The recommendation process will also recommend the computation settings for aligning concepts in the form of the XML shown in figure 5.9. The process will store a separate file containing the computation recommendations for concepts. The recommendations include the threshold value, matcher names and weight values.

Figure 5.9 ConceptRecommendations.xml.

Figure 5.10 shows how the system displays the information provided in ConceptRecommendations.xml. At this point the user can start the computation with the recommended settings or define his/her own.

The system also stores some temporary files at different steps. One of them is the temporary XML file tester_temp.xml, which contains all the information shown in tester.xml. This file is generated when the user completes the computation session but has not yet started the validation session. At that point the interface has moved to the validation session, so the information about the computation session must be stored in order to reload the session at the entry point of the validation session. The other temporary file is a history file for relations, which serves the same purpose. When the user clicks the exit button, these files are replaced with the actual session data files.
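The storing and reloading of session data in XML can be sketched with the standard Java XML APIs. The sketch below is an assumption-laden illustration: the class SessionStoreSketch, its methods, and the element and attribute names (suggestions, pair, term1, term2) are hypothetical, because the actual layout of the files in figures 5.5 and 5.6 is not reproduced in the text. Only the file naming pattern, user name followed by _SuggestionsList.xml, is taken from the description above.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical sketch: element and attribute names are assumptions, not the
// layout used by the system described in this thesis.
class SessionStoreSketch {

    /** Writes the unprocessed suggestion pairs to an XML file. */
    static void save(List<String[]> pairs, File file) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("suggestions");
            doc.appendChild(root);
            for (String[] pair : pairs) {
                Element element = doc.createElement("pair");
                element.setAttribute("term1", pair[0]);
                element.setAttribute("term2", pair[1]);
                root.appendChild(element);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.transform(new DOMSource(doc), new StreamResult(file));
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("could not store session data", e);
        }
    }

    /** Reads the suggestion pairs back in the order in which they were stored. */
    static List<String[]> load(File file) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
            NodeList nodes = doc.getElementsByTagName("pair");
            List<String[]> pairs = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                pairs.add(new String[] {element.getAttribute("term1"), element.getAttribute("term2")});
            }
            return pairs;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not load session data", e);
        }
    }

    /** A saved session is assumed to exist if the user's suggestions file exists. */
    static boolean hasSavedSession(File dataDir, String userName) {
        return new File(dataDir, userName + "_SuggestionsList.xml").isFile();
    }
}
```

Because the files are plain XML in a user-chosen directory, checking for a saved session reduces to a file existence test, which matches the design choice of avoiding a database installation.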

Figure 5.10 Screenshot for allowing the user to use computation recommendations for aligning concepts.

5.3. Implementation

We have defined very few new classes. Instead, we have implemented new functions in the existing classes of SAMBO to provide the required functionality, and have defined publicly used variables to support the implementation. The description of all implemented functions is given in section 9.1, appendix A, and the description of the newly defined classes and public variables is given in section 9.2, appendix B, in the appendices section at the end.

6. Testing the System

6.1. Description

Testing is required in order to verify the working and performance of a system. We tested the system with different ontologies in order to verify that our proposed solution provides session-based functionality for aligning large ontologies.

6.2. Procedure

The testing procedure includes several test cases containing different types of ontologies along with different matchers, weights and threshold values.

6.2.1 Test Cases

We present a few sample test cases with combinations of different ontologies. Table 6.1 shows the pair of ontologies used in every test case along with the type of ontology.

Test Case   Ontology 1         Ontology 2         Description
1           ear_MA.owl         ear_MeSH.owl       MA: Mouse Anatomy; MeSH: Medical Subject Headings
2           Behavior_GO.owl    behavior_SO.owl    GO: Gene Ontology; SO: Sequence Ontology
3           defense_GO.owl     defense_SO.owl     GO: Gene Ontology; SO: Sequence Ontology
4           dd_anatomy.owl     dd_anatomy2.owl    Dictyostelium Discoideum Anatomy [24]
5           eye_MA.owl         eye_MeSH.owl       MA: Mouse Anatomy; MeSH: Medical Subject Headings
6           nose_MA.owl        nose_MeSH.owl      MA: Mouse Anatomy; MeSH: Medical Subject Headings

Table 6.1 Test cases.

We used these test cases to check whether the system provides a session-based alignment approach and generates a PRA for an iterative process. On storing an active session, the system has to generate XML files containing the session information and data along with the user information, as described in detail in section 5.2 under information storage.

6.2.2 Matchers, Weight and Threshold

We used all possible combinations of matchers to generate the suggestions from the test case ontologies defined in table 6.1, and then saved and loaded the session successfully. We used a fixed weight value of 1.0 for all matchers, but also applied different threshold values. Here we include only the most commonly used threshold value, 0.6. Table 6.2 shows the available matchers applied to the ontology pairs along with the weight and threshold values.

To Align    Matcher Name   Weight Value   Threshold
Relations   TermBasic      1.0            0.6
Concepts    TermBasic      1.0            0.6
            TermWN         1.0

Table 6.2 Matchers and Threshold applied on Test cases.

6.3. Results

When the user clicks the exit button during an active validation session, the system stores the session information at the path specified for the user in users.xml. On saving the session, the system creates several files containing the required session data. The first file the system creates is tester.xml, shown in figure 5.3, which contains the general information about the user session. Secondly, the system generates the suggestions list shown in figure 5.5, containing all the suggestions that still require the user's action, and the history list shown in figure 5.6, containing all the suggestions that have been processed by the user.

When the user logs in, the system checks the data path given in users.xml, shown in figure 5.2, for stored session information. If a stored session is available, the system shows the screen given in figure 5.4, asking the user whether to load the previously stored session or to start a new session. If the user chooses to load the stored session, the system takes the user to the point where the session was saved.

The system has also been tested with inputs from the recommendation process, shown in figure 5.7; the resulting screen output is given in figure 5.8. The XML files shown were generated using test case 2. All the test cases generate the same kind of files but with different data.

7. Conclusion and Future Work

7.1. Conclusion

In this thesis work we have extended the research in [4] and [6] by developing a session-based system for aligning large ontologies. We have described and used SAMBO as the base system for our work and have enhanced its functionality to align large ontologies with a session-based approach. We have introduced the notion of computation and validation sessions for aligning relations and concepts. The validation session starts after the completion of the computation session. The information obtained in the sessions can be reused in an iterative alignment process. We have also proposed an integrated recommendation process that will run in parallel with the computation and validation sessions. This process will run multiple computation sessions based on different alignment strategies and will then evaluate them and suggest the most suitable alignment strategy to the user. The user will have the option to use the recommended strategy or to define his/her own alignment strategy.

7.2. Future Work

We have proposed a recommendation process in this thesis work; its implementation is part of the future work for this system. The recommendation process will apply different alignment strategies based on different matchers, combinations and filters, and will then recommend the best strategy by evaluating the results produced. At the moment the system supports multiple session storage for aligning the same two ontologies for a single user, but in the future there is a need for multiple session storage for aligning multiple pairs of large ontologies for a single user.

8. Reference List

[1] What Is The Semantic Web? At webopedia. See link: http://www.webopedia.com/DidYouKnow/Internet/2007/Semantic_Web.asp Link visited on August 17, 2010 at 1045 AM.
[2] Semantic Web Organization, see link: www.semanticweb.org Link visited on July 01, 2010 at 1055 AM.
[3] NCBO BioPortal. http://bioportal.bioontology.org/ Link visited on August 17, 2010 at 1115 AM.
[4] Lambrix P, Tan H, SAMBO - A System for Aligning and Merging Biomedical Ontologies, Journal of Web Semantics, Special issue on Semantic Web for the Life Sciences, 4(3):196-206, 2006.
[5] Tan H, Lambrix P, A method for recommending ontology alignment strategies, Proceedings of the 6th International Semantic Web Conference, LNCS 4825, 494-507, Busan, Korea, 2007. © Springer-Verlag.
[6] Lambrix P, Liu Q, Using partial reference alignments to align ontologies, Proceedings of the 6th European Semantic Web Conference - ESWC09, LNCS 5554, 188-202, Heraklion, Greece, 2009. © Springer-Verlag.
[7] Tim Berners-Lee, James Hendler and Ora Lassila, The Semantic Web, Scientific American Magazine, May 2001.
[8] Neches R., Fikes R., Finin T., Gruber T., Senator T., and Swartout W., Enabling technology for knowledge engineering, AI Magazine 12(3):26-56, 1991.
[9] Tom Gruber, Ontology, in the Encyclopedia of Database Systems, Ling Liu and M. Tamer Ozsu (Eds.), Springer-Verlag, 2009. See also http://tomgruber.org/writing/ontology-definition-2007.htm.
[10] A review document on Ontology provided by IQlue, a division of siOnet Ltd. See link: www.iqlue.com/Ontology.pdf Link visited on July 01, 2010 at 1130 AM.
[11] Robert Stevens, Carole A. Goble and Sean Bechhofer, Knowledge-based representation of Bioinformatics, Oxford Journals 2000. See also http://bib.oxfordjournals.org/cgi/reprint/1/4/398.
[12] OWL Web Ontology Language http://www.w3.org/TR/owl-features/. Link visited on September 09, 2010 at 1235 AM.
[13] The Gene Ontology, see link http://www.geneontology.org/ Link visited on August 09, 2010 at 1413 PM.
[14] Dominique Estival, Chris Nowak and Andrew Zschorn, Towards Ontology-Based Natural Language Processing, Human Systems Integration Group, Defense Science and Technology Organization. See link http://acl.ldc.upenn.edu/W/W04/W04-0609.pdf.
[15] WordNet: A lexical database for English, see link http://wordnet.princeton.edu/ Link visited on August 09, 2010 at 1725 PM.

[16] Natalya F. Noy and Deborah L. McGuinness, Ontology Development 101: A Guide to Creating Your First Ontology. See link http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html.
[17] Research in Ontology Alignment. See link http://faculty.washington.edu/gennari/OntoAlign.html Link visited on August 09, 2010 at 1800 PM.
[18] Lambrix P, Tan H, Jakoniene V, Strömbäck L, Biological Ontologies, chapter 4 in Baker, Cheung (eds), Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences, 85-99, Springer, 2007. ISBN-10: 0-387-48436-1, ISBN-13: 978-0-387-48436-5.
[19] OntoWeb Consortium (2002) Deliverables 1.3 (A survey on Ontology tools) and 1.4 (A survey on methodology for developing, maintaining, evaluating and reengineering Ontologies). See also: www.ontoweb.org.
[20] Related projects at the ontology alignment source. See link http://www.atl.external.lmco.com/projects/ontology/ Link visited on August 09, 2010 at 1235 PM.
[21] Lambrix P, Tan H, A Tool for Evaluating Ontology Alignment Strategies, Journal on Data Semantics, LNCS 4380, VIII: 182-202, 2007. © Springer-Verlag.
[22] Servlet and JSP pages best practices. See link: http://java.sun.com/developer/technicalArticles/javaserverpages/servlets_jsp/ Link visited on July 01, 2010 at 1140 AM.
[23] Java Server Pages Overview at http://java.sun.com/products/jsp/overview.html Link visited on July 01, 2010 at 1205 AM.
[24] The Open Biological and Biomedical Ontologies at http://obofoundry.org/cgi-bin/detail.cgi?dictyostelium_discoideum_anatomy Link visited on September 14, 2010 at 1330 PM.

(64) Linköping University Department of Computer and Information Science. 9. Appendices. 9.1. Appendix A: Implemented Functions Description. Class or Page Bame Private Functions Public Functions Private Variables Public Variables Package Bame Description. Class or Page Bame Private Functions Public Functions. Public Variables Package Bame Description. Class or Page Bame Private Functions Public Functions Public Variables Package Bame Description. Class or Page Bame Private Functions Public Functions. startSession.jsp. This JSP page has been created to facilitate the user to start a new session in case he/she have an option either load a session or start a new session.. LockSessionServlet.java createXmlTree(); getHistoryInfo(); getHistoryXML();. This Servlet is created to Lock the current active session. The instance of this call has been called every time the user clicks on exit button to check if there is any information to store or not.. LoadSessionServlet.java setRequestedAttributes(); displayClassStarterForm();. This Servlet is used to load the session information and process the requests accordingly. Many of the functions are same as defined in LoadFileServlet.java. The functionality of this class in case of loading a session to some extent is same as the functionality of MainServlet.java in normal process. The newly added functions are given above.. SessionManager.java loadSuggestionsFromXML();. - 34 -.

(65) Linköping University Department of Computer and Information Science. loadProcessedSuggestionsFromXML(); getSuggestionsXML(); getHistoryXML(); loadRecommendationsFromXML(); Public Variables Package Bame Description. Function Bame Function Scope Input Parameters Output Parameter (type) Member of Class Bame with Package Description. Function Bame Function Scope Input Parameters Output Parameter (type) Member of Class Bame with Package Description. Function Bame Function Scope Input Parameters Output Parameter (type) Member of Class Bame with Package Description. se.liu.ida.sambo.session This class is performing the functionality to store xml files and to read stored session data from different XML files and load the required information in different global variables for further processing.. checkSavedSession() Synchronized Boolean Login.jsp This function is used to check if there is any saved session available. This function just check the existence of xml file named as user name (eg. Tester.xml) and returns true or false. If any file exists then loadSession() function will be called otherwise createFileUploadForm() function will be called.. LoadSession() synchronized String email, String filePath, HttpServletRequest req String str; Login.jsp This function is used to load the saved information from XML file with user name (e.g tester.xml) into system. This function will read values from xml and will assign those values to different variable for further functioning.. createRecommendationForm() Public SettingsInfo settings; int step; String startform; se.liu.ida.sambo.ui.web.FormHandler This function is used to create recommendation form and returns html form code in string. The submission of this form. - 35 -.

(66) Linköping University Department of Computer and Information Science. will call Main Servlet. This function checks whether there are any recommendations available or not; if available then this calls another function createRecommendationMatcherForm() by passing step value to this. For checking recommendations this function only check for the xml files (e.g ComputationRecommendations.xml or ValidationRecommendations.xml) if xml files exist then recommendations are available otherwise no recommendations.. Function Bame Function Scope Input Parameters Output Parameter (type) Member of Class Bame with Package Description. Function Bame Function Scope Input Parameters Output Parameter (type) Member of Class Bame with Package Description. Function Bame Function Scope Input Parameters Output Parameter (type) Member of Class Bame. createRecommendationMatcherForm() Private int step; String selectStr; se.liu.ida.sambo.ui.web.FormHandler This function creates a form based on the recommended settings provided by recommendation process as xml file (e.g ComputationRecommendations.xml or ValidationRecommendations.xml). This function calls another function loadRecommendationsFromXML() to load the recommended settings from xml files depending upon for which step and this function passes the step value to that function and in return gets the values in different variables and use them to create form.. getAllParameters() Private HttpServletRequest req; MainServlet This function is extracting all the required values from Servlet Request parameter and stores them in different variables for further use in the system globally.. getSuggestionsXML() Public String filename; void se.liu.ida.sambo.Merger.MergeManager;. - 36 -.

Description: This function creates the XML file (e.g. tester_SuggestionsList.xml) containing all suggestions that have not yet been processed or that still require an action. The filename parameter holds the name of the newly generated XML file together with its storage path.

Function Name: loadSuggestions()
Function Scope: Public
Input Parameters: int step; double[] weights; double threshold
Output Parameter (type): Vector generalSuggestionVector
Member of Class Name with Package: se.liu.ida.sambo.Merger.MergeManager
Description: Loads the suggestions that were not processed in the previous session, based on the step, the weights used and the threshold value.

Function Name: loadProcessedSuggestions()
Function Scope: Public
Input Parameters: int step, double[] weights, double threshold
Output Parameter (type): Void
Member of Class Name with Package: se.liu.ida.sambo.Merger.MergeManager
Description: Loads the suggestions that were already processed in the previous session, based on the step, the weights used and the threshold value.

Function Name: reFinalizeSlotSuggestions()
Function Scope: Public
Input Parameters: none
Output Parameter (type): Void
Member of Class Name with Package: se.liu.ida.sambo.Merger.MergeManager
Description: Calls historyStack.removeAllElements() to clear the values in the list.

Function Name: loadSlotSugs()
Function Scope: Public
Input Parameters: double[] weight, double threshold
Output Parameter (type): Vector
Member of Class Name with Package: se.liu.ida.sambo.algos.matching.MatchingAlgos
Description: Calls slotSimValues.loadPairList() to load the unprocessed pair list and returns the vector list to the loadSuggestions() function in the MergeManager class.

Function Name: loadProcessedSlotSugs()
Function Scope: Public
Input Parameters: double[] weight, double threshold
Output Parameter (type): Vector
Member of Class Name with Package: se.liu.ida.sambo.algos.matching.MatchingAlgos
Description: Calls slotSimValues.loadProcessedPairList() to load the processed pair list and returns the vector list to the loadProcessedSuggestions() function in the MergeManager class.

Function Name: loadClassSugs()
Function Scope: Public
Input Parameters: double[] weight, double threshold
Output Parameter (type): Vector
Member of Class Name with Package: se.liu.ida.sambo.algos.matching.MatchingAlgos
Description: Calls classSimValues.loadPairList() to load the unprocessed pair list and returns the vector list to the loadSuggestions() function in the MergeManager class.

Function Name: loadProcessedClassSugs()
Function Scope: Public
Input Parameters: double[] weight, double threshold
Output Parameter (type): Vector
Member of Class Name with Package: se.liu.ida.sambo.algos.matching.MatchingAlgos
Description: Calls classSimValues.loadProcessedPairList() to load the processed pair list and returns the vector list to the loadProcessedSuggestions() function in the MergeManager class.

Function Name: loadPairList()
Function Scope: Public
Input Parameters: double[] weight, double threshold
Output Parameter (type): Vector list
Member of Class Name with Package: se.liu.ida.sambo.algos.matching.algos.SimValueConstructor
Description: Loads the unprocessed suggestion pairs into the list vector. The function compares the values with the suggestions vector loaded from XML, keeps only the matching pairs, and returns the list to loadClassSugs() or loadSlotSugs(), depending on which function called it.

Function Name: loadProcessedPairList()
Function Scope: Public
Input Parameters: double[] weight, double threshold
Output Parameter (type): Vector list
Member of Class Name with Package: se.liu.ida.sambo.algos.matching.algos.SimValueConstructor
Description: Loads the processed suggestion pairs into the list vector. The function compares the values with the processed suggestions vector loaded from XML, keeps only the matching pairs, and returns the list to loadProcessedClassSugs() or loadProcessedSlotSugs(), depending on which function called it.

Function Name: getHistoryXML()
Function Scope: Public
Input Parameters: String filename
Output Parameter (type): Void
Member of Class Name with Package: se.liu.ida.sambo.session.SessionManager
Description: Creates the XML file containing all processed suggestions by reading the history track. The function calls the getCurrentHistory() function of the MergeManager class and then stores all the information in an XML file, which is generated with the provided filename at the given path.
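The filtering that loadPairList() and loadProcessedPairList() are described as performing can be sketched as follows. This is a minimal illustration under stated assumptions, not the SAMBO implementation: the class PairListSketch, the Pair type, the "id1|id2" key format, the normalised weighted sum and the ">=" threshold comparison are all hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: names and data layout are assumptions, not SAMBO's API. */
class PairListSketch {

    /** A candidate mapping between a concept of ontology 1 and one of ontology 2. */
    static final class Pair {
        final String id1;
        final String id2;
        final double[] simValues; // one similarity value per matcher (assumed layout)

        Pair(String id1, String id2, double[] simValues) {
            this.id1 = id1;
            this.id2 = id2;
            this.simValues = simValues;
        }

        /** Key used to look the pair up among the pairs read from XML (assumed format). */
        String key() {
            return id1 + "|" + id2;
        }
    }

    /** Weighted sum of the matcher values, normalised by the total weight (assumed combination). */
    static double combine(double[] simValues, double[] weights) {
        double sum = 0.0;
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * simValues[i];
            total += weights[i];
        }
        return total == 0.0 ? 0.0 : sum / total;
    }

    /** Keeps the candidates that were stored in the XML file and reach the threshold. */
    static List<Pair> loadPairList(List<Pair> candidates, Set<String> keysFromXml,
                                   double[] weights, double threshold) {
        List<Pair> result = new ArrayList<>();
        for (Pair p : candidates) {
            if (keysFromXml.contains(p.key()) && combine(p.simValues, weights) >= threshold) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Pair> candidates = new ArrayList<>();
        candidates.add(new Pair("nose", "Nose", new double[] {0.9, 0.7}));
        candidates.add(new Pair("ear", "Eye", new double[] {0.2, 0.4}));
        Set<String> keysFromXml = new HashSet<>(Arrays.asList("nose|Nose", "ear|Eye"));
        for (Pair p : loadPairList(candidates, keysFromXml, new double[] {1.0, 1.0}, 0.6)) {
            System.out.println(p.key());
        }
    }
}
```

The same routine would serve the processed and the unprocessed case; only the set of keys read from XML differs.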

Function Name: loadSuggestionsFromXML()
Function Scope: Public
Input Parameters: String filePath
Output Parameter (type): Void
Member of Class Name with Package: se.liu.ida.sambo.session.SessionManager
Description: Reads the stored suggestions from the file at the given path and loads them all into a globally declared vector.

Function Name: loadProcessedSuggestionsFromXML()
Function Scope: Public
Input Parameters: String filePath
Output Parameter (type): Void
Member of Class Name with Package: se.liu.ida.sambo.session.SessionManager
Description: Reads the stored suggestions from the file at the given path and loads them all into a globally declared vector.

Function Name: loadRecommendationsFromXML()
Function Scope: Public
Input Parameters: int step
Output Parameter (type): Void
Member of Class Name with Package: se.liu.ida.sambo.session.SessionManager
Description: Loads the recommendations from the XML file for the given step. The function checks the recommendation XML file according to the provided step value.

Function Name: createXmlTree()
Function Scope: Public
Input Parameters: String filename
Output Parameter (type): Void
Member of Class Name with Package: LockSessionServlet.java
Description: Creates the XML file named after the user name provided in the filename variable, along with the path. The file stores information such as the matchers used, matcher values, weight values, threshold value and step value, along with the user name.

Function Name: getHistoryXML()
Function Scope: Public
Input Parameters: String filename, MergeManager merge, SettingsInfo settings
Output Parameter (type): Void
Member of Class Name with Package: LockSessionServlet.java
Description: Stores all the processed suggestions in an XML file. The function in turn calls getHistoryInfo().

Function Name: getHistoryInfo()
Function Scope: Public
Input Parameters: History history, SettingsInfo settings, BufferedWriter bufferedWriter
Output Parameter (type): Void
Member of Class Name with Package: LockSessionServlet.java
Description: Writes the attributes of the processed pairs, such as name, comments and new name, and returns to the calling function.

Function Name: setRequestedAttributes()
Function Scope: Public
Input Parameters: int step, HttpServletRequest request
Output Parameter (type): Void
Member of Class Name with Package: LoadSessionServlet.java
Description: Sets all the required parameters on the HTTP servlet request based on the step value.

Function Name: displayClassStarterForm()
Function Scope: Public
Input Parameters: PrintWriter out, SettingsInfo settings
Output Parameter (type): Void
Member of Class Name with Package: LoadSessionServlet.java
Description: Shows the start form for the class step.
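The write and read sides described above (getHistoryInfo() writing the attributes of processed pairs through a BufferedWriter, and loadProcessedSuggestionsFromXML() reading them back into a vector) can be sketched as a small round trip. This is a sketch under assumptions, not the SAMBO file format: the class HistoryXmlSketch, the element name "processed", the attribute names and the action values "merge" and "reject" are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Vector;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative only: element and attribute names are assumptions, not SAMBO's file format. */
class HistoryXmlSketch {

    /** Escapes the characters that must not appear verbatim in an XML attribute value. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Writes one processed pair, in the spirit of getHistoryInfo(). */
    static void writePair(BufferedWriter out, String pair, String action,
                          String newName, String comment) throws IOException {
        out.write("  <processed pair=\"" + escape(pair) + "\" action=\"" + escape(action)
                + "\" name=\"" + escape(newName) + "\" comment=\"" + escape(comment) + "\"/>");
        out.newLine();
    }

    /** Reads the pairs back into a Vector, in the spirit of loadProcessedSuggestionsFromXML(). */
    static Vector<String[]> readPairs(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("processed");
            Vector<String[]> result = new Vector<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                result.add(new String[] {e.getAttribute("pair"), e.getAttribute("action"),
                        e.getAttribute("name"), e.getAttribute("comment")});
            }
            return result;
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse history XML", ex);
        }
    }

    /** Builds a small history document in memory instead of writing to a user data path. */
    static String buildExample() {
        StringWriter buffer = new StringWriter();
        try (BufferedWriter out = new BufferedWriter(buffer)) {
            out.write("<history>");
            out.newLine();
            writePair(out, "nose|Nose", "merge", "nose", "same organ");
            writePair(out, "ear|Auditory <organ>", "reject", "", "");
            out.write("</history>");
            out.newLine();
        } catch (IOException ex) {
            throw new IllegalStateException(ex);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        for (String[] row : readPairs(buildExample())) {
            System.out.println(String.join(" / ", row));
        }
    }
}
```

Escaping on the write side matters because concept names and user comments may contain characters such as "<" or "&", which would otherwise make the stored session file unreadable when the session is loaded again.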

9.2. Appendix B: Implemented Classes and Public Variables

Class Name: Commons.java
Package Name: se.liu.ida.sambo.session

List of public parameters

No  Name                            Type                                   Description
1   colorOnt1                       java.lang.String                       Color value for Ontology 1
2   colorOnt2                       java.lang.String                       Color value for Ontology 2
3   Ctime                           java.lang.String                       Current time
4   currentPosition                 int                                    The current page position of the user
5   DATA_PATH                       java.lang.String                       File storage path for the user
6   hasProcessStarted               boolean                                Whether the process has started
7   isFinalized                     int                                    Whether the user has finalized
8   isFinished                      boolean                                Whether the user has finished
9   isLoadedSession                 boolean                                Whether this is a loaded session or a new session
10  LATime                          java.lang.String                       Last accessed time
11  Matchers_Available              java.lang.String[]                     Array of available matchers
12  OWL_1                           java.lang.String                       Ontology file 1
13  OWL_2                           java.lang.String                       Ontology file 2
14  RecommendedThresholdValue       java.lang.String                       Recommended threshold value
15  Session_Type                    java.lang.String                       The stage the system is currently in
16  STEP_VALUE                      java.lang.String                       The step the system is currently at
17  Strings                         java.lang.Object[]                     Array of objects used to store temporary values at different stages
18  strProcessedSuggestionsAction   java.lang.Object[]                     Stores the actions of processed suggestions
19  strProcessedSuggestionsComment  java.lang.Object[]                     Stores the comments of processed suggestions
20  strProcessedSuggestionsName     java.lang.Object[]                     Stores the newly assigned names of processed suggestions
21  strProcessedSuggestionsPair     java.lang.Object[]                     Stores the pairs of processed suggestions
22  strRecommendedMatchers          java.util.ArrayList<java.lang.String>  ArrayList of recommended matchers
23  strRecommendedWeight            java.util.ArrayList<java.lang.String>  ArrayList of recommended weight values
24  THRESHOLD_VALUE                 java.lang.String                       Threshold value used
25  UsedMatchersList                java.util.ArrayList<java.lang.String>  ArrayList storing the matchers used in the session
25  UsedWeightValuesList            java.util.ArrayList<java.lang.String>  ArrayList storing the weight values used in the session
26  USER_NAME                       java.lang.String                       User name of the currently logged-in user
27  vList                           java.util.Vector                       Vector used for temporary storage of list values
28  XMLProcessedSuggestionVector    java.util.Vector                       Vector storing the processed suggestion list from the XML file
29  XMLSuggestionVector             java.util.Vector                       Vector storing the suggestion list from the XML file
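Because the public variables listed above hold the weight values and the threshold as strings, while the load functions of Appendix A take double[] weights and a double threshold, a conversion step is needed somewhere. The following is a minimal sketch of such a step, assuming the strings are plain decimal numbers. CommonsSketch is a cut-down, hypothetical stand-in for Commons: declaring the fields static, the initial values and both helper methods are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a cut-down stand-in for Commons, not the real class. */
class CommonsSketch {

    // Field names and types follow the list above; making them static is an assumption.
    public static String THRESHOLD_VALUE = "0.6";
    public static ArrayList<String> UsedMatchersList = new ArrayList<>();
    public static ArrayList<String> UsedWeightValuesList = new ArrayList<>();

    /** Converts stored weight strings into the double[] form taken by the load functions. */
    static double[] weightsAsDoubles(List<String> weightValues) {
        double[] weights = new double[weightValues.size()];
        for (int i = 0; i < weights.length; i++) {
            weights[i] = Double.parseDouble(weightValues.get(i).trim());
        }
        return weights;
    }

    /** Converts the stored threshold string into a double. */
    static double thresholdAsDouble(String thresholdValue) {
        return Double.parseDouble(thresholdValue.trim());
    }

    public static void main(String[] args) {
        UsedMatchersList.add("TermWB");
        UsedWeightValuesList.add("1.0");
        double[] weights = weightsAsDoubles(UsedWeightValuesList);
        double threshold = thresholdAsDouble(THRESHOLD_VALUE);
        System.out.println(UsedMatchersList + " " + Arrays.toString(weights) + " " + threshold);
    }
}
```

Double.parseDouble throws a NumberFormatException for malformed input, so a session file with a corrupted weight would fail at load time rather than silently yield a wrong weight.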

9.3. Appendix C: Setup Guide for the System

1. Install and configure Apache Tomcat
   1.1 The current system has been tested under Apache Tomcat 6.0.26 and 6.0.18.
       Note: Documentation and binaries for Tomcat are available at http://tomcat.apache.org
   1.2 After installation, locate the “Connector” with the setting port="8080" in <Tomcat home>/conf/server.xml and change the port value (e.g. to 8086).
       Note: This resolves the port conflict with the DIG Reasoner.
   1.3 Copy xercesImpl.jar and xml-apis.jar from <Project Folder>\web\WEB-INF\lib to <Tomcat home>\common\endorsed
       Notes:
       a) This resolves the Xerces conflict problem in jakarta-tomcat-5.5.x
       b) The folder “endorsed” may need to be created manually at <Tomcat home> if it does not exist.

2. Install SAMBO
   2.1 Create a directory “sambo” under <Tomcat home>\webapps\
   2.2 Copy the contents of “<Project Folder>/build/web” to the “sambo” directory.

3. Configure SAMBO
   3.1 To use the matcher TermWB, install WordNet and specify its location.
       a) Install WordNet 2.1 for Windows.
          Note: Available at http://wordnet.princeton.edu/wordnet/download/
       b) Open the file wordnet.xml under <Tomcat home>/webapps/sambo/config and set "dictionary_path" to the location of the WordNet dictionary files (e.g. <param name="dictionary_path" value="Y:\WordNet-2.1\dict"/>).
   3.2 To reason about the ontology, run a DIG Reasoner and specify its address in SAMBO.
       a) Start Racer.exe in the <Project Folder>.
          Note: We run this under an academic license.
       b) The address of the reasoner is shown in the window that opens (e.g. http://130.236.182.207:8080/).

4. Run SAMBO
   4.1 (Re)start the web server.
       a) Open e.g. "http://130.236.184.146:8086/sambo/" or "http://localhost:8086/sambo/" in your web browser.
   4.2 Enter your data path against your user name in users.xml, under the path “<Project Folder>/build/web/xml/”, so that the session data files can be stored.
   4.3 When logging in, use the user account “tester” with the password “tester”.
