Linköping Studies in Science and Technology

Dissertation No. 1035

Integration of Biological Data

by

Vaida Jakonienė

Department of Computer and Information Science

Linköpings universitet

SE-581 83 Linköping, Sweden


Data integration is an important procedure underlying many research tasks in the life sciences, as often multiple data sources have to be accessed to collect the relevant data. The data sources vary in content, data format, and access methods, which often vastly complicates the data retrieval process. As a result, the task of retrieving data requires a great deal of effort and expertise on the part of the user. To alleviate these difficulties, various information integration systems have been proposed in the area. However, a number of issues remain unsolved and new integration solutions are needed. The work presented in this thesis considers data integration at three different levels. 1) Integration of biological data sources deals with integrating multiple data sources from an information integration system point of view. We study properties of biological data sources and existing integration systems. Based on the study, we formulate requirements for systems integrating biological data sources. Then, we define a query language that supports queries commonly used by biologists. Also, we propose a high-level architecture for an information integration system that meets a selected set of requirements and that supports the specified query language. 2) Integration of ontologies deals with finding overlapping information between ontologies. We develop and evaluate algorithms that use life science literature and take the structure of the ontologies into account. 3) Grouping of biological data entries deals with organizing data entries into groups based on the computation of similarity values between the data entries. We propose a method that covers the main steps and components involved in similarity-based grouping procedures. The applicability of the method is illustrated by a number of test cases. Further, we develop an environment that supports comparison and evaluation of different grouping strategies.

The work is supported by the implementation of: 1) a prototype for a system integrating biological data sources, called BioTRIFU, 2) algorithms for ontology alignment, and 3) an environment for evaluating strategies for similarity-based grouping of biological data, called KitEGA.


Many people have supported my graduate work and made this PhD thesis possible.

I am grateful to my supervisor, Associate Professor Patrick Lambrix, for his support and guidance during this work. His constructive comments and our many conversations brought insight and helped to shape this thesis. His encouragement, patience, and devotion as a teacher helped me to grow as a researcher. I am glad that I had the opportunity to work with him.

I would like to express my appreciation to Professor Nahid Shahmehri for providing valuable comments, pointing out important aspects of the research world and giving support during this work.

The members of IISLAB (Laboratory for Intelligent Information Systems) created a stimulating and supportive working environment. I am thankful for their friendship over all these years. Specially, I want to mention my student colleagues: Shanai Ardi, Ioan Chisalita, Claudiu Duma, Almut Herzog, Dennis Maciuszek, He Tan, Eduard Turcan and Cécile Åberg.

This work would have been much harder without the support of my family, relatives and friends. I would especially like to express my gratitude to my Mum and Dad for caring so much about me, and for welcoming me with such warmth when I returned home to Lithuania. I would also like to thank my sister for being such a generous and joyful person. I am very lucky to have her. The friends I made in Linköping made my stay in Sweden much more enjoyable. In particular, I want to thank my dearest friends Akvile, Aleksandra and Joe. I greatly value their company and our conversations.

This research work was funded by CUGS (the national graduate school in computer science). I also acknowledge the financial support of the EU Network of Excellence REWERSE (Sixth Framework Programme project 506779).

Vaida Jakoniene
Linköping, September 2006


This thesis contains revised versions of the following papers.

1. Lambrix P, Jakoniene V. Towards transparent access to multiple biological databanks. Proceedings of the First Asia-Pacific Bioinformatics Conference, pp 53-60, Adelaide, Australia, 2003.

2. Jakoniene V, Lambrix P. Information integration systems for biological data sources: requirements and opportunities. Submitted.

3. Jakoniene V, Lambrix P. Ontology-based integration for bioinformatics. Proceedings of the VLDB Workshop on Ontologies-based techniques for DataBases and Information Systems - ODBIS 2005, pp 55-58, Trondheim, Norway, 2005.

4. Tan H, Jakoniene V, Lambrix P, Aberg J, Shahmehri N. Alignment of Biomedical Ontologies using Life Science Literature. Proceedings of the International Workshop on Knowledge Discovery in Life Science Literature, pp 1-17, Singapore, 2006. LNBI 3886.

5. Jakoniene V, Rundqvist D, Lambrix P. A method for similarity-based grouping of biological data. Proceedings of the 3rd International Workshop on Data Integration in the Life Sciences - DILS06, pp 136-151, Hinxton, UK, 2006. LNBI 4047.

6. Jakoniene V, Lambrix P. A Tool for Evaluating Strategies for Grouping of Biological Data. Submitted.


Related Papers

The following are related research articles not included in the thesis.

1. Doms A, Jakoniene V, Lambrix P, Schroeder M, Wächter T. Ontologies and Text Mining as a Basis for a Semantic Web for the Life Sciences. Reasoning Web, Second International Summer School, Springer-Verlag, pp 164-183, 2006. LNCS 4126.

2. Jakoniene V. A Study in Integrating Multiple Biological Data Sources. Licentiate thesis No 1149, Linköpings universitet, Sweden, 2005.

3. Lambrix P, Tan H, Jakoniene V, Strömbäck L. Biological Ontologies. Chapter in Baker CJO, Cheung KH (eds) Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences, Springer, 2006. To appear.

4. Strömbäck L, Jakoniene V, Tan H, Lambrix P. Representing, storing and accessing molecular interaction data: a review of models and tools. Briefings in Bioinformatics, 2006. Invited contribution. To appear.

Other

1. Backofen R, Badea M, Barahona P, Berndtsson M, Burger A, Dawelbait G, Doms A, Fages F, Hotaran A, Jakoniene V, Krippahl L, Lambrix P, McLeod K, Nutt W, Olsson B, Schroeder M, Schroiff A, Soliman S, Tan H, Tilivea D, Will S. Requirements and specification of use cases. REWERSE Deliverable A2-D3, 2005.

2. Backofen R, Badea M, Barahona P, Burger A, Dawelbait G, Doms A, Fages F, Hotaran A, Jakoniene V, Krippahl L, Lambrix P, McLeod K, Möller S, Nutt W, Olsson B, Schroeder M, Soliman S, Tan H, Tilivea D, Will S. Usage of bioinformatics tools and identification of information sources. REWERSE Deliverable A2-D2, 2005.

3. Jakoniene V, Nilsson R. Abstract Book of the Fourth Swedish Bioinformatics Workshop for PhD students and PostDocs, Linköping, Sweden, 2003.


Introduction
Motivation
Problem Statement
Contributions
Paper Summaries
Related Work
Future Work
References


1 Motivation

Researchers in areas such as medicine, agriculture and environmental sciences intensively use the available biological data to answer different research questions or to solve various tasks [CGG03]. One of the main goals is to understand how various organisms function as biological systems. To achieve this goal, it is important to explore functions and interactions of genome-encoded components. This type of knowledge may be used for different purposes. For instance, it is used to identify genes responsible for a disease, to develop drugs enabling treatment of diseases and to predict organisms' responses to a drug.

The significance of these areas, the worldwide interest and the available tools and techniques caused the generation of an enormous amount of biological data, such as DNA and protein sequences, gene regulatory and protein interaction networks, and secondary and tertiary structures of molecules. This data is spread over a large number of autonomous data sources that are often publicly available on the Web. For instance, 858 data sources are listed in the 2006 Database Issue of the Nucleic Acids Research [NAR] journal. As the data sources are developed and supported independently by different groups and organizations, they are highly heterogeneous in various aspects. For example, the data sources vary in the type of the stored data, the data format, and access methods. Further, there is a terminology discrepancy at the schema and data levels. In addition to data sources, a large number of bio-ontologies describing domain knowledge are publicly available in the area [LTJ06]. For instance, OBO [OBO], an umbrella web address for ontologies covering the genomics and proteomics domains, lists 29 orthogonal ontologies. Some of the ontologies have reached the status of de facto standard and are used extensively to annotate the data sources.

Data integration is an important procedure underlying many research tasks in the life sciences, as often multiple data sources have to be accessed to collect the relevant data. For instance, to find publications describing a given disease that relates to a certain type of sequences may require analysis of data sources for publications, diseases and sequences together with some other data sources combining these types of information [LMN04]. To support health care applications by using results in functional genomics, the integration of clinical data and genomic data is important [MIN04].


steps are performed to acquire the data: data sources that contain relevant data are selected, queries over each data source are formulated and decisions are made on how to combine the results. To find relevant data sources, the user has to be acquainted with the content of different data sources. To formulate a query and decide on how to execute the query, the user has to be familiar with the ways the data sources support data retrieval and how data at different sources relate to each other. To execute the query, the user has to know the location of the data sources that are spread over the Web, the different query languages and data formats. During query execution the user may need to translate the data between different formats and combine the results. A mistake in any of these steps may either result in inefficient query execution or not finding results. The process is also time consuming since a large amount of data is usually processed. Data retrieval may take a long time, e.g. when tools are used to acquire the results. As biological data sources change often and data sources appear and disappear, the user has to be aware of these changes.

To alleviate these difficulties various information integration solutions have been proposed. Specialized integration solutions focus on solving a single task based on a set of relevant data sources. In contrast, general purpose information integration systems aim to support a broad range of tasks and integration of various data sources. Such systems may provide a common interface through which a user accesses multiple data sources. In this case the location and different query languages of the data sources are hidden from the user. Other types of information integration systems even hide the integrated data sources from the user. During query processing, these systems handle also the selection of data sources that are relevant to the query. However, new integration solutions are needed to better support life science researchers in their tasks. A number of open issues remain in the available integration solutions. For instance, it may be difficult to integrate new data sources into the existing systems or to reuse the systems for new tasks. Furthermore, solutions are lacking for managing incomplete and incorrect data, and for handling semantic heterogeneity. For solving some of the problems specialized solutions have to be developed while in other cases developments in other areas could be adapted.
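As a rough illustration of the common-interface idea described above, the following Python sketch shows a mediator that selects the relevant sources for a request and combines their answers, so that the caller never addresses a source directly. Everything in it is hypothetical: the `Source` and `Mediator` classes, the two in-memory sources and their records are invented for illustration and do not describe BioTRIFU or any existing integration system.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical wrapper around one autonomous data source."""
    name: str
    provides: set                      # attribute names this source can return
    records: list = field(default_factory=list)

    def query(self, **conditions):
        # Return the records whose fields match all given conditions.
        return [r for r in self.records
                if all(r.get(k) == v for k, v in conditions.items())]


class Mediator:
    """Single entry point that selects sources and combines their results."""

    def __init__(self, sources):
        self.sources = sources

    def select_sources(self, attribute):
        # Source selection: keep only the sources that expose the attribute.
        return [s for s in self.sources if attribute in s.provides]

    def lookup(self, key, value, wanted):
        # Query every relevant source and merge the answers into one record.
        merged = {key: value}
        for source in self.select_sources(wanted):
            for record in source.query(**{key: value}):
                merged.setdefault(wanted, []).append(record[wanted])
        return merged


# Two invented sources that share the "gene" attribute as a cross-reference.
sequences = Source("seq_db", {"gene", "sequence"},
                   [{"gene": "g1", "sequence": "ATGC"}])
diseases = Source("disease_db", {"gene", "disease"},
                  [{"gene": "g1", "disease": "d1"},
                   {"gene": "g2", "disease": "d2"}])

mediator = Mediator([sequences, diseases])
disease_result = mediator.lookup("gene", "g1", "disease")
sequence_result = mediator.lookup("gene", "g1", "sequence")
```

The caller names only the attribute it wants. Which source holds it, and how that source is queried, stays inside the mediator, which is the property the paragraph above attributes to systems that hide the integrated data sources from the user.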

This thesis focuses on data integration at three different levels. This includes integration of biological data sources, integration of ontologies and integration or grouping of biological data entries. Integration of biological


when they want to use biological data sources to find relevant information for their research and analyzes ways of dealing with these problems in combination from an information integration system point of view. Further, two specific tasks in integrating biological data are dealt with. Integration of ontologies deals with finding overlapping information between ontologies. This includes finding relationships, called alignments, between the related terms in the ontologies. Grouping of biological data entries deals with organizing data entries into groups based on the computation of similarity values between the data entries. Grouping of data entries is an abstraction of the problem of finding entries that represent the same entity in different data sources that is a basic operation for integrating the data entries.

2 Problem Statement

The work presented in this thesis aims to develop approaches and techniques that alleviate the challenges met when using and integrating biological data, and in particular, the heterogeneity present at different levels in the data and data sources. The thesis focuses on the identification and analysis of the available knowledge about data and data sources, and the development of mechanisms that use the available knowledge for integration of biological data. To achieve these goals, we focus on the following tasks in the thesis.

2.1 Integration of biological data sources

In this thesis we deal with a few aspects in the context of integrating biological data sources: requirements and query languages for information integration systems, and the use of ontologies for integrating the data sources. Despite the fact that a number of information integration solutions are proposed in the life sciences, not so much research has been performed on the requirements for such systems. Such a study of requirements is needed as the complexity of the life sciences, the tasks to be solved, the style of the scientific research and the properties of the available data sources pose special requirements for information integration systems in the area. Further, the difference in focus of the existing information integration systems together with different design and development choices led to the fact that often systems support a unique query language. The variety of the available query languages makes it difficult to select between the query languages and to


important to know a subset of query language operators that should be present in any query language for integrating biological data sources, for instance, to support the development of new integration solutions. In addition, during the recent years some solutions were proposed for using ontologies in information integration systems. However, this is still done in a limited way and only a small part of the possible ontology-based knowledge is currently used.

In this thesis we focus on:

• Study of requirements for systems providing integrated access to biological data sources with focus on systems providing virtual integration of data sources, i.e. preserving autonomy of data sources.

• Specification of a query language that allows formulation of different types of queries commonly used by biologists.

• Specification of a high-level architecture for an information integration system that meets a selected set of requirements and that supports the specified query language.

• Design and development of a prototype for the information integration system. The system should conform to the high-level architecture and enable deeper exploration of issues related to query processing over multiple biological data sources.

• Identify types of ontological knowledge publicly available in the area of life sciences and study how this knowledge could be used to enhance current integration approaches.

2.2 Integration of ontologies

The task of aligning ontologies is not well explored and is considered to be one of the major issues in the life sciences [CGG03]. A number of alignment strategies are proposed, but further research and development of new strategies are needed [LT06a, LT06b]. For instance, not much work has been done on ontology alignment using life science literature as a resource for finding alignments. Also not many strategies use information about the structure of the ontologies.

In this thesis we focus on:


• Study how the structure of ontologies could be used in ontology alignment.

2.3 Grouping of biological data entries

Many tools for analyzing biological data use some form of grouping and are used in, for instance, data integration, data cleaning, prediction of protein functionality, and correlation of genes based on microarray data. A number of aspects influence the quality of the grouping results: the data sources, the grouping attributes and the algorithms implementing the grouping procedure. Many methods exist, but it is often not clear which methods perform best for which grouping tasks. The study of the properties, and the evaluation and the comparison of the different aspects that influence the quality of the grouping results, would give us valuable insight in how the grouping procedures could be used in the best way. It would also lead to recommendations on how to improve the current procedures and develop new procedures. To be able to perform such studies and evaluations we need environments that allow us to compare and evaluate different grouping strategies.

In this thesis we focus on:

• Specification of a method that covers the main steps and components that should be included in environments.

• Design and development of a prototype for an environment supporting the evaluation of similarity-based grouping procedures. The environment should be based on the defined method.

3 Contributions

The main contributions of the thesis are the following:

Integration of biological data sources

• Study of biological data sources. The results are presented in paper 1 and 2. Paper 2 extends the work done in paper 1.

• Identification of requirements for information integration systems for biological data sources. Paper 2 presents and discusses the requirements.

• Study of current information integration systems for biological data sources with respect to the identified requirements. The work is included in paper 2.

• Proposal for a query language and architecture for the BioTRIFU (The Right Information For yoU in Bioinformatics) system. The contributions appear in paper 1.

• As a feasibility study and to get an overview of issues related to query processing over multiple biological data sources, a subset of the defined query language and the ideas included in the architecture definition were implemented in a prototype. The prototype supports the main steps and components needed to integrate two data sources that can be accessed at different locations. For details we refer to [Jak05].

• Identification of ontological knowledge and its use in information integration systems for biological data sources. Paper 3 discusses the results.

• Proposal of an ontology-based approach for information integration systems for biological data sources. The approach is presented in paper 3.

Integration of ontologies

• Development and evaluations of algorithms for ontology alignment. The algorithms use life science literature and take the structure of the ontologies into account. The contributions are described in paper 4. The ontology alignment algorithms were implemented and incorporated into the SAMBO system [LT06a].

Grouping of biological data entries

• Proposal of a method for similarity-based grouping of biological data. The method is introduced in paper 5.

• As a feasibility study, two grouping tasks were implemented and analyzed through a number of test cases.

• Development and implementation of KitEGA, an environment for evaluating strategies for similarity-based grouping of biological data. The environment is based on the proposed method. The tool and its use are presented in paper 6.

• The current implementation of KitEGA supports the specification of test cases through the use of plug-ins and user interfaces, and provides a number of user interfaces supporting analysis of the grouping results.

4 Paper Summaries

In this section we give short summaries of the six papers included in this thesis. Papers 1, 2 and 3 deal with integration of biological data sources, with paper 3 focusing on ontology-based integration. Paper 4 deals with integration of ontologies. Papers 5 and 6 deal with grouping of biological data entries.

Paper 1: Towards transparent access to multiple biological databanks

In paper 1 we discuss common problems met by the users of biological data sources. The discussion includes a study of current biological data sources. Based on the observations, the paper proposes a base query language that contains operators that should be present in any query language for biological data sources. Further, the paper presents an architecture for a system supporting such a language and enabling transparent and integrated access to biological data sources.

Paper 2: Information integration systems for biological data sources: requirements and opportunities

In paper 2 requirements for information integration systems in the area of bioinformatics are identified. This paper extends the study of problems and requirements identified in paper 1. First, we study biological data sources and identify their properties that make querying multiple biological data sources a difficult task. Then, we formulate requirements for information integration systems for biological data sources. We also discuss how well current information integration systems satisfy these requirements and identify opportunities for future research.

Paper 3: Ontology-based integration for bioinformatics

In paper 3 we argue that the current approaches for integrating biological data sources should be enhanced by ontological knowledge. We identify


(ontologies, ontology alignments, annotations, mappings between data values and ontological terms) and propose an approach to use this knowledge to support integrated access to multiple biological data sources. We also show that current ontology-based integration approaches only cover parts of our approach.

Paper 4: Alignment of biomedical ontologies using life science literature

In paper 4 we propose strategies for aligning ontologies based on life science literature. We propose a basic algorithm as well as extensions that take the structure of the ontologies into account. We evaluate the strategies and compare them with strategies implemented in the alignment system SAMBO. We also evaluate the combination of the proposed strategies and the SAMBO strategies.

Paper 5: A method for similarity-based grouping of biological data

In paper 5 a method for similarity-based grouping is proposed. As the main steps the method contains specification of grouping rules, pairwise grouping between entries, actual grouping of similar entries, and evaluation and analysis of the results. Often, different strategies can be used in the different steps. The method enables exploration of the influence of the choices and supports evaluation of the results with respect to given classifications. The grouping method is illustrated by test cases based on different strategies and classifications. The results show the complexity of the similarity-based grouping tasks and give deeper insights in the selected grouping tasks, the analyzed data source, and the influence of different strategies on the results.
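The four steps named in the summary of paper 5 can be sketched in a few lines of Python. The concrete choices below are assumptions made only for illustration and are not taken from the thesis: Jaccard similarity over keyword sets as the grouping rule, a fixed threshold for the pairwise step, connected components for the actual grouping, and pair-based precision and recall against a given classification for the evaluation. The entries and class labels are invented.

```python
from itertools import combinations


def jaccard(a, b):
    """Step 1 (grouping rule): similarity of two entries given as keyword sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def pairwise(entries, similarity, threshold):
    """Step 2: keep the pairs of entries whose similarity reaches the threshold."""
    return [(x, y) for x, y in combinations(sorted(entries), 2)
            if similarity(entries[x], entries[y]) >= threshold]


def group(entries, pairs):
    """Step 3: connected components over the similar pairs (union-find)."""
    parent = {e: e for e in entries}

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]   # path halving
            e = parent[e]
        return e

    for x, y in pairs:
        parent[find(x)] = find(y)
    groups = {}
    for e in entries:
        groups.setdefault(find(e), set()).add(e)
    return sorted(sorted(g) for g in groups.values())


def pair_precision_recall(groups, classes):
    """Step 4: compare co-grouped pairs with pairs sharing a class label."""
    predicted = {p for g in groups for p in combinations(sorted(g), 2)}
    expected = {(x, y) for x, y in combinations(sorted(classes), 2)
                if classes[x] == classes[y]}
    hits = len(predicted & expected)
    precision = hits / len(predicted) if predicted else 1.0
    recall = hits / len(expected) if expected else 1.0
    return precision, recall


# Invented entries described by keyword sets, and an invented classification.
entries = {
    "e1": {"kinase", "atp", "binding"},
    "e2": {"kinase", "atp", "transfer"},
    "e3": {"membrane", "transport"},
    "e4": {"membrane", "transport", "ion"},
}
classes = {"e1": "A", "e2": "A", "e3": "B", "e4": "B"}

pairs = pairwise(entries, jaccard, threshold=0.5)
groups = group(entries, pairs)
precision, recall = pair_precision_recall(groups, classes)
```

Swapping the similarity function, the threshold or the grouping step changes the strategy while the evaluation stays fixed, which is the kind of comparison the method is meant to support.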

Paper 6: A Tool for evaluating strategies for grouping of biological data

In paper 6 we present KitEGA, an environment supporting the evaluation of grouping strategies. Based on the method presented in paper 5, we propose a framework for comparative evaluation of strategies for grouping data based on the method, and present its current implementation. Further, we illustrate the use of KitEGA by comparing grouping strategies for


5 Related Work

5.1 Integration of biological data sources

Requirements for general purpose information integration systems for biological data sources on the Web were discussed in [DOB95], [Kar96], [Won02] and [HK04]. The first two papers were written a decade ago. Since then, the area of life sciences has evolved fast: many more data sources and tools are publicly available and new tasks have to be solved. While some of the earlier defined requirements for information integration systems are still valid in the changed environment, other requirements need to be reconsidered and new requirements need to be specified. The more recent paper [Won02] argues for a general purpose information integration system that supports core functionality needed for information integration in life sciences. Therefore, the defined requirements do not cover some of the issues specific to the area. The authors of [HK04] point out a few high level requirements for information systems emphasizing the need to automate a maximum number of tasks while minimizing the amount of time and interactions for the user. The requirements provided in [HK04] are in line with the requirements specified in paper 2. In paper 2 the requirements are specified at a more detailed level by looking at different information integration aspects and focusing on systems providing virtual integration of data sources.

Within the area of life sciences several integration approaches have been proposed and systems have been implemented. This includes systems based on database technology, i.e. virtual and materialized (data warehouses) integration approaches. Also, systems based on the Semantic Web, web services, grid and agents technologies are developed. In this thesis we focused on issues related to virtual integration. For an overview of such systems see paper 2. For solving specialized tasks, the use of warehouses is a widely adopted integration solution (e.g. [TRM05]). During the recent years Semantic Web technologies are being used for resolving scalability, heterogeneity and reusability problems in the life sciences. In these approaches biological data and knowledge is represented using Semantic Web languages, e.g. XML, RDF and OWL [Muk05]. A number of studies are conducted to explore integrated use of data represented in these formats, e.g. [CYS05] and [SLD06]. Also, the use of ontologies is proposed to resolve semantic heterogeneity problems and to support knowledge discovery based on biological data [Gar05]. Further, work is ongoing in applying web services and grid


are example projects based on these technologies. Also, agent technology is shown to be useful for meeting integration challenges in the life sciences. The authors in [KBB04] argue that advanced communication supported by agent technology can complement the Semantic Web and grid technologies.

Some of the available information integration systems use ontology-based technologies to support querying (e.g. BACIIS [MWL03], KIND [LGM03], SEMEDA [KPL03] and TAMBIS [GSN01]). A common feature is that the integrated schemas used in these systems are seen as ontologies. In contrast, in the approach described in paper 3, we expect ontologies to be agreed upon and shared by many users [Lam04]. As in our approach, the integrated schemas include domain knowledge and information on data structures at the data sources. All the systems use the maintained ontology to describe the content of data sources. Though it is not explicitly stated, cross-references between data sources are probably used to join the retrieved data items. KIND uses two ontologies describing static and process knowledge, respectively. The ontologies combine domain knowledge from neuroanatomy and neurophysiology. In SEMEDA controlled vocabularies can be used to specify semantics of data type values. Also, data source content descriptions can be refined with integrated schema terms. Ontological annotations and mappings between ontology terms are not taken into account in any of the systems.

5.2 Integration of ontologies

Different strategies can be used to perform alignment of ontologies. [LT06b] describes a general strategy for aligning two ontologies. One of the main component types is a matcher responsible for computing similarities between the terms from the different source ontologies. The matchers can implement strategies based on linguistic matching, structure-based strategies, constraint-based approaches, instance-based strategies, strategies that use auxiliary information or a combination of these. By using different matchers and combining and filtering the results in different ways we obtain different alignment strategies. Tools for ontology alignment are discussed in [LT06a].

Some ontology alignment and merging systems provide alignment strategies using literature, such as ArtGen [MW02], FCA-Merge [SM01] and OntoMapper [PPF02]. Also, there are systems that implement alignment


algorithms [...] existence of previously aligned concepts. For instance, Anchor-PROMPT [NM01] determines the similarity of concepts by the frequency of their appearance along the paths between previously aligned concepts. The paths may be composed of any kind of relations. Also SAMBO as described in [LT05] provides such a component where the similarity between concepts is augmented based on their location in the is-a hierarchy relative to already aligned concepts. In contrast, the methods proposed in this thesis do not require previously aligned concepts.

OntoMapper implements the most similar approach to the strategies described in paper 4. OntoMapper provides an ontology alignment algorithm using Bayesian learning. A set of documents (abstracts of technical papers taken from ACM's digital library and Citeseer) is assigned to each concept in the ontologies. Two raw similarity scores matrices for the ontologies are computed by the Rainbow text classifier. The similarity between the concepts is calculated based on these two matrices using the Bayesian method. When analyzing structure of the ontologies, OntoMapper does not require previously aligned concepts and takes the documents from the sub-concepts into account when computing the similarity between two concepts. However, as this is hard-coded in the method, it is not clear how the structure of the ontologies influences the result of the computation.

In contrast to most other approaches, [CTL06] uses the structural information not to compute similarity between ontological terms, but as a method for filtering wrong results generated by matchers. The approach gives good results when many initial suggestions are available and the time for filtering is often only a small fraction of the time for the similarity computation.
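The general strategy outlined at the start of this section (matchers that score term pairs, followed by combination and filtering of the scores) can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: the two matchers, the weighted-average combination and the single threshold filter are simple stand-ins chosen for illustration, the term lists are invented, and none of this reproduces the algorithms of SAMBO, OntoMapper or paper 4.

```python
from difflib import SequenceMatcher


def string_matcher(term_a, term_b):
    """Linguistic matcher: character-level similarity of the term names."""
    return SequenceMatcher(None, term_a.lower(), term_b.lower()).ratio()


def token_matcher(term_a, term_b):
    """Second linguistic matcher: overlap of the words in the term names."""
    a, b = set(term_a.lower().split()), set(term_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def align(terms_a, terms_b, matchers, threshold):
    """Combine weighted matcher scores and keep pairs reaching the threshold."""
    total = sum(weight for _, weight in matchers)
    suggestions = []
    for a in terms_a:
        for b in terms_b:
            # Combination: weighted average of all matcher scores.
            score = sum(w * m(a, b) for m, w in matchers) / total
            # Filtering: discard pairs below the threshold.
            if score >= threshold:
                suggestions.append((a, b, round(score, 2)))
    return sorted(suggestions, key=lambda s: -s[2])


# Invented term lists standing in for two source ontologies.
ontology_a = ["nasal cavity", "heart"]
ontology_b = ["Nasal Cavity", "cardiac muscle"]

suggestions = align(ontology_a, ontology_b,
                    matchers=[(string_matcher, 1.0), (token_matcher, 1.0)],
                    threshold=0.8)
```

Changing the matcher list, the weights or the threshold yields a different alignment strategy. A structure-based or literature-based matcher would plug in as another function with the same two-argument signature.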

5.3 Grouping of biological data entries

There are two kinds of related work: evaluations of grouping algorithms and tools for supporting evaluation of grouping algorithms.

A number of evaluations of different kinds of grouping algorithms have been performed. For instance, regarding clustering of gene expression data [YHR01] proposes a measure to estimate the predictive power of a clustering algorithm and compares two partitional and three hierarchical clustering algorithms based on this measure. [DD03] proposes three validation strategies and compares six algorithms. Also [GSS03] proposes a new validation measure and compares four clustering methods. Five biclustering methods for


evaluations is the fact that they focus on cluster validation for the evaluation and comparison of algorithms. They use synthetic and real data sources. Some of the papers also aim to propose new validation measures. Further, in all these evaluations, most of the evaluated algorithms needed to be re-implemented for the purpose of the evaluations.

[CRF03] presents the SecondString Toolkit for name-matching methods which could be used, for instance, in duplicate detection. Several distance functions for strings are implemented. The algorithms are compared on a dataset regarding non-interpolated average precision.

A system that goes some way into providing an environment for clustering and validation is the Machaon Cluster Validation Environment [BAC05]. This system is intended for clustering of microarray data and evaluating the quality of the obtained clusters. The system focuses on cluster validation for new data sets and therefore uses internal measures based on compactness and isolation. The system implements several clustering algorithms, metrics (distance), and internal measures [BA03]. The user can choose among these to run a cluster task on a data set. The results are shown as a tree. The highest level nodes represent the chosen cluster algorithms with particular parameter selection. The next level represents the results of applying different validity measures to the clusters generated by the algorithm.

The framework and system (KitEGA) that we propose in papers 5 and 6 aims to go one step further. KitEGA is a platform for evaluating and comparing similarity-based grouping strategies. Evaluators can plug in their own algorithms related to the grouping strategies and the evaluation measures, as well as their own data sets. KitEGA provides then the support for running the algorithms, and summarizing and analyzing the results.

6 Future Work

6.1 Integration of biological data sources

As we observed in section 5.1, the focus of the research on integrating data in the life sciences is reorienting from the use of classical database approaches to the use of web and Semantic Web technologies. [Muk05] mentions challenges to make the best use of the new technologies. First, most of the biological data and knowledge should be available in the Semantic Web. To achieve this, tools supporting automatic extraction of biological data from

Semantic Web are research prototypes. Further studies are needed on how to extend these prototypes into systems supporting real-world applications for effective retrieval of information and discovery of hidden knowledge on the Semantic Web. For instance, to guarantee scalability, inference engines available for querying the Semantic Web and graph theory based algorithms used to explore associations between objects on the Semantic Web may have to be reconsidered.

Paper 2 enumerates other challenges for information integration systems for the life sciences. To allow users to view and specify different types of information, more powerful modules for supporting interaction between the users and information integration systems are needed. Also, the need for further research on how to resolve semantic heterogeneity is emphasized. For instance, the available approaches, like the ontology-based data integration approach proposed in paper 4, could be tested in the context of the real Semantic Web. Also, paper 2 states the need for tools supporting the development and maintenance of information integration systems. Such tools are essential to cope with the scale and dynamics of the life sciences.

6.2 Integration of ontologies

Alignment and merging of ontologies is an important research topic, and new systems and strategies for ontology alignment should be developed. More studies are needed that explore which strategies work well for which types of ontologies, and a system such as KitAMO [LT06c] can provide a good environment to perform these studies. In the future we will see an increase in available alignments between ontologies. This will provide a type of ontological information that can be used in, for instance, data integration as discussed in paper 3. Further, there are efforts to promote interoperability of ontologies, such as the OBO Foundry, where it is required that the ontologies use relations which are unambiguously defined following the pattern of definitions defined in the OBO Relation Ontology [SCK05]. The results of such efforts will provide information that should be taken into account during the alignment process.

There are a number of issues related to the algorithms in paper 4 that would be interesting to further investigate. A limitation of our algorithms is that abstracts of research articles are only classified to one concept. We want to extend our strategies by allowing abstracts to be classified to 0, 1 or more

Regarding the structure, the ontologies in the current experiments are reasonably simple taxonomies. We want to investigate whether the structure-based strategies lead to similar results for other types of ontologies. Further, our matchers could be enhanced to use synonyms and domain knowledge.

6.3 Grouping of biological data entries

Similarity-based grouping of data entries is not a trivial task. In order to find the most suitable grouping strategies for given tasks, tools are needed to support the evaluation and comparison of different grouping procedures. An example of such a system is KitEGA (paper 6). We intend to extend the current KitEGA implementation in several ways. We will extend the system to fully comply with our framework. Further, we will provide a number of libraries for components that are common. This could include, for instance, different evaluation measures or grouping methods. We will also use KitEGA for studies in data integration.

References

[BA03] Bolshakova N, Azuaje F. Cluster validation techniques for genome expression data. Signal Processing, 83:825-833, 2003.

[BAC05] Bolshakova N, Azuaje F, Cunningham P. An integrated tool for microarray data clustering and cluster validity assessment. Bioinformatics, 21(4):451-455, 2005.

[CGG03] Collins F, Green E, Guttmacher A, Guyer M. A Vision for the Future of Genomics Research. Nature, 422:835-847, 2003.

[CRF03] Cohen W, Ravikumar P, Fienberg S. A comparison of string metrics for matching names and records. Proceedings of the KDD Workshop on Data Cleaning and Object Consolidation, 2003.

[CTL06] Chen B, Tan H, Lambrix P. Structure-based filtering for ontology alignment. Proceedings of the IEEE WETICE Workshop on Semantic Technologies in Collaborative Applications, 2006.

[CYS05] Cheung KH, Yip KY, Smith A, Deknikker R, Masiar A, Gerstein M. YeastHub: a semantic web use case for integrating data in the life

[DD03] Datta S, Datta S. Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics, 19(4):459-466, 2003.

[DOB95] Davidson S, Overton C, Buneman P. Challenges in Integrating Biological Data Sources. Journal of Computational Biology, 2(4):557-572, 1995.

[Gar05] Gardner SP. Ontologies and semantic data integration. Drug Discovery Today, 10(14):1001-1007, 2005.

[GSN01] Goble CA, Stevens R, Ng G, Bechhofer S, Paton N, Baker P, Peim M, Brass A. Transparent access to multiple bioinformatics information sources. IBM Systems Journal, 40(2), 2001.

[GSS03] Gat-Viks I, Sharan R, Shamir R. Scoring clustering solutions by their biological relevance. Bioinformatics, 19(18):2381-2389, 2003.

[HK04] Hernandez T, Kambhampati S. Integration of biological sources: Current systems and challenges. ACM SIGMOD Record, 33(3):51-60, 2004.

[Jak05] Jakoniene V. A Study in Integrating Multiple Biological Data Sources. Licentiate thesis No 1149, Linköpings universitet, Sweden, 2005.

[Kar96] Karp P. A strategy for database interoperation. Journal of Computational Biology, 2(4):573-586, 1996.

[KBB04] Karasavvas KA, Baldock R, Burger A. Bioinformatics integration and agent technology. Journal of Biomedical Informatics, 37(3):205-219, 2004.

[KPL03] Köhler J, Philippi S, Lange M. SEMEDA: ontology based semantic integration of biological databases. Bioinformatics, 19(18):2420-2427, 2003.

[Lam04] Lambrix P. Ontologies in Bioinformatics and Systems Biology. Chapter 8 in Dubitzky W, Azuaje F (eds) Artificial Intelligence Methods and Tools for Systems Biology, Springer, pp 129-146, 2004.

Critchlow T (eds) Bioinformatics: Managing Scientific Data, Morgan Kaufmann Publishers, pp 335-370, 2003.

[LMN04] Lacroix Z, Murthy H, Naumann F, Raschid L. Links and Paths through Life Science Data Sources. Proceedings of the International Workshop on Data Integration in the Life Sciences, pp 203-211, 2004. LNCS 2994.

[LT05] Lambrix P, Tan H. A Framework for Aligning Ontologies. Proceedings of the Workshop on Principles and Practice of Semantic Web Reasoning, pp 17-31, 2005. LNCS 3703.

[LT06a] Lambrix P, Tan H. SAMBO - A System for Aligning and Merging Biomedical Ontologies. Journal of Web Semantics, Special issue on Semantic Web for the Life Sciences, 2006.

[LT06b] Lambrix P, Tan H. Ontology alignment and merging. Chapter in Burger A, Davidson D, Baldock R (eds) Anatomy Ontologies for Bioinformatics: Principles and Practice, Springer, 2006. To appear.

[LT06c] Lambrix P, Tan H. A Tool for Evaluating Ontology Alignment Strategies. Journal on Data Semantics, VIII, 2006. To appear.

[LTJ06] Lambrix P, Tan H, Jakoniene V, Strömbäck L. Biological Ontologies. Chapter in Baker CJO, Cheung KH (eds) Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences, Springer, 2006. To appear.

[MIN04] Martin-Sanchez F, Iakovidis I, Norager S, Maojo V, de Groen P, Van der Lei J, Jones T, Abraham-Fuchs K, Apweiler R, Babic A, Baud R, Breton V, Cinquin P, Doupi P, Dugas M, Eils R, Engelbrecht R, Ghazal P, Jehenson P, Kulikowski C, Lampe K, De Moor G, Orphanoudakis S, Rossing N, Sarachan B, Sousa A, Spekowius G, Thireos G, Zahlmann G, Zvarova J, Hermosilla I, Vicente F. Synergy between medical informatics and bioinformatics: facilitating genomic medicine for future health care. Journal of Biomedical Informatics, 37:30-42, 2004.

[Muk05] Mukherjea S. Information retrieval and knowledge discovery utilising a biomedical Semantic Web. Briefings in Bioinformatics, 6(3):252-62,

[MW02] Mitra P, Wiederhold G. Resolving terminological heterogeneity in ontologies. Proceedings of the ECAI Workshop on Ontologies and Semantic Interoperability, 2002.

[MWL03] Miled ZB, Webster YW, Liu Y, Li N. An Ontology for Semantic Integration of Life Science Web Databases. International Journal of Cooperative Information Systems, 12(2):275-294, 2003.

[NAR] NAR. Nucleic Acids Research. http://nar.oupjournals.org

[NM01] Noy N, Musen M. Anchor-PROMPT: Using Non-Local Context for Semantic Matching. Proceedings of the IJCAI Workshop on Ontologies and Information Sharing, pp 63-70, 2001.

[OBO] OBO. Open Biomedical Ontologies. http://obo.sourceforge.net/

[PBZ06] Prelić A, Bleuler S, Zimmermann Ph, Wille A, Bühlmann P, Gruissem W, Hennig L, Thiele L, Zitzler E. A systematic comparison and evaluation of biclustering methods for gene expression. Bioinformatics, 22(9):1122-1129, 2006.

[PPF02] Prasad S, Peng Y, Finin T. Using Explicit Information To Map Between Two Ontologies. Proceedings of the AAMAS Workshop on Ontologies in Agent Systems, 2002.

[SCK05] Smith B, Ceusters W, Klagges B, Köhler J, Kumar A, Lomax J, Mungall CJ, Neuhaus F, Rector A, Rosse C. Relations in Biomedical Ontologies. Genome Biology, 6(5):R46, 2005.

[SLD06] Stephens S, LaVigna D, DiLascio M, Luciano J. Aggregations of Bioinformatics Data Using Semantic Web Technology. Journal of Web Semantics, 4(3), 2006.

[SM01] Stumme G, Mädche A. FCA-Merge: Bottom-up merging of ontologies. Proceedings of the International Joint Conferences on Artificial Intelligence, pp 225-230, 2001.

[SRG03] Stevens RD, Robinson AJ, Goble CA. MyGrid: personalised bioinformatics on the information grid. Bioinformatics, 19(1):i302-i304, 2003.

[TRM05] Trißl S, Rother K, Müller H, Steinke T, Koch I, Preissner R, Frömmel C, Leser U. Columba: An Integrated Database of Proteins,

[VS05] Vyas H, Summers R. Interoperability of bioinformatics resources. VINE: The journal of information and knowledge management systems, 35(3):132-139, 2005.

[WL02] Wilkinson MD, Links M. BioMOBY: an open source biological web services proposal. Briefings in Bioinformatics, 3(4):331-41, 2002.

[Won02] Wong L. Technologies for integrating Biological Data. Briefings in Bioinformatics, 3(4):389-404, 2002.

Department of Computer and Information Science

Linköpings universitet

Dissertations

Linköping Studies in Science and Technology

No 1035 Vaida Jakoniene: Integration of Biological Data, 2006, ISBN 91-85523-28-3.

