Scalability and Semantic Sustainability in Electronic Health Record Systems



Linköping studies in science and technology. Dissertations. No. 1499

Erik Sundvall Linköping University

2013



Author: Erik Sundvall

Supervisors: Ph.D. Håkan Örman, Prof. Magnus Borga and the late Prof. Hans Åhlfeldt

Institution: Department of Biomedical Engineering, Linköping University

PhD defence: 2013-02-15, Linköping, Sweden. Opponent: Prof. Dipak Kalra

ISBN: 978-91-7519-699-2

Linköping studies in science and technology. Dissertations, No. 1499, ISSN 0345-7524

Front cover lighting and photo: Thor Balkhed, bildamedia.se

Back cover: Visualization of word frequencies in included papers.

Cover design: Maria & Erik Sundvall, kindly assisted by Martin Pettersson at LiU-Tryck

Make your own word frequency visualizations at www.imt.liu.se/~erisu/2013/phd/cloud/

Errata and resources related to this thesis are linked from www.imt.liu.se/~erisu/2013/phd/

Printed by LiU-‐Tryck, Linköping 2013.

Thank you, LiU-Tryck, for providing the flexibility needed to match reality.


Abstract

This work is a small contribution to the greater goal of making software systems used in healthcare more useful and sustainable. To come closer to that goal, health record data will need to be more computable and easier to exchange between systems.

Interoperability refers to getting systems to work together, and semantics concerns the study of meanings. If semantic interoperability is achieved, then information entered in one information system is usable in other systems and reusable for many purposes.

Scalability refers to the extent to which a system can gracefully grow by adding more resources. Sustainability refers more to how best to use the limited resources available. Both aspects are important.

The main focus and aim of the thesis is to increase knowledge about how to support scalability and semantic sustainability. It reports explorations of how to apply aspects of the above to Electronic Health Record (EHR) systems, associated infrastructure, data structures, terminology systems, user interfaces and their mutual boundaries.

Using terminology systems is one way to improve computability and comparability of data. Modern complex ontologies and terminology systems can contain hundreds of thousands of concepts that can have many kinds of relationships to multiple other concepts. This makes visualization challenging. Many visualization approaches designed to show the local neighbourhood of a single concept node do not scale well to larger sets of nodes. The interactive TermViz approach described in this thesis is designed to help users navigate and comprehend the context of several nodes simultaneously. Two applications are presented where TermViz aids management of the boundary between EHR data structures and the terminology system SNOMED CT.

The amount of time available from people skilled in health informatics is limited. Adequate methods and tools are required to develop, maintain and reuse health-IT solutions in a sustainable way. Modelling on multiple levels, with a fixed reference model and a separate layer of flexible, reusable ‘archetypes’ for domain-specific data structures, is an approach with that aim, used in openEHR and the ISO 13606 standard.

This approach, including learning, implementing and managing it, is explored from different angles in this thesis. An architecture applying Representational State Transfer (REST) to archetype-based EHR systems, in order to address scalability, is presented. Combined with archetyping, this architecture also aims at enabling a sustainable way of continuously evolving multi-vendor EHR solutions. An experimental open source implementation of it, aimed at learning and prototyping, is also presented.

Manually changing the database structures used for storage every time new versions of archetypes and associated data structures are needed is likely not a sustainable activity. Thus storage systems that can handle change with minimal manual intervention are desirable. Initial explorations of performance and scalability in such systems are also reported.

Graphical user interfaces focused on EHR navigation, time perspectives and highlighting of EHR content are also presented – illustrating what can be done with computable health record data and the presented approaches.

Desirable aspects of semantic sustainability have been discussed, including sustainable use of limited resources (such as the available time of skilled people) and reduction of unnecessary risks. A semantic sustainability perspective should be inspired and informed by research in complex systems theory, and should also include striving to be highly aware of when and where technical debt is being built up. Semantic sustainability is a shared responsibility.

The combined results presented contribute to increasing knowledge about ways to support scalability and semantic sustainability in the context of electronic health record systems. Supporting tools, architectures and approaches are additional contributions.

Short summary (translated from Swedish):

Sustainable long-term development of semantics and technology in patient record systems

The ultimate aim of this thesis is to make information systems used in health care, especially patient records, more useful and easier to work with. If the systems were easier to develop and maintain, more resources could be spent on adding new and more user-friendly functions.

If record systems and computer programs can ‘understand’ what different things in the record are and mean, they can be of greater help, for example by showing better patient overviews and contributing decision support. One part of making record content understandable and manageable for computers is to use terminology systems such as ICD-10 and SNOMED CT. Another important part is the data structures into which text, measurements, codes from terminology systems etc. are entered. Most record systems have some kind of templates as data structures. The openEHR project has developed a way of sharing specifications of data structures between different record systems, making it easier to share and reuse both the structures and the record data entered into them. These specifications are called ‘archetypes’, and the archetype method is also described in the ISO 13606 standard.

If two different record systems use the same data structure, for example by means of the same archetypes, they can exchange patient data with each other (they achieve so-called semantic interoperability). The concept ‘semantic sustainability’ is defined in the thesis as an approach that is broader than semantic interoperability. It aims to enable long-term sustainable development of semantics (meaning) in record systems by managing risks and resources wisely. The approach is based on research and experience from systems development and the management of complex systems, and is intended to support decision makers as well as those who develop and maintain record systems, related systems and structures.

For computer systems to be able to grow with increased use without ending up in performance dead ends, certain design principles for scalability should be followed.

The thesis presents a system architecture based on such principles and on the archetype method. This architecture makes it possible to build systems with subsystems from several different vendors. The scalability of some storage solutions is also reported.

Finally, prototypes of user interfaces for patient overviews and record reading are presented.


Dedication

This thesis is dedicated to the wonderful souls that have supported and believed in me.

My mother always supported my desire to understand and explore the world around me. She lived long enough to see her first grandchild, but cancer took her away before she could see me graduate as a master of science. She would have loved to be with us this day.

Professor Hans Åhlfeldt supervised my PhD studies from the start until 2010, when cancer took him away too. He believed in me already when I was a master's thesis student, and he welcomed me back to the university as a PhD student after my time in the ‘.com bubble’. He encouraged me, trusted my judgment and often also let me explore the non-obvious paths that I found interesting. Many of the works in this thesis are based on that freedom to explore my own ideas and would not have come into existence without it. Hans did not rush important things that needed time; he provided me a safe haven and allowed my ideas, and me, to develop, grow and mature at a sustainable pace. He allowed me to spend time on things that we together believed could be beneficial contributions to the world. You should have been here today. I miss you!

My wonderful wife Maria and our fantastic children, Benjamin, Jonathan and Samuel – you are always there to inspire and support me! Right now I want to thank you especially for also supporting me through the ridiculously stressful final study and thesis writing. I love you!

Professor Hans Åhlfeldt

Photographed by Martin Eneling

I want to thank all helpful co-authors of publications, collaboration partners and colleagues through the years. I also want to thank those who have helped with thesis supervision, design, proofreading, printing and other support.

You all contributed in different ways to making this thesis possible!


Contents

Abstract ... 3

Dedication ... 5

Contents ... 6

1. Introduction ... 9

1.1. Delimitations ... 9

1.2. Electronic Health Records ... 9

1.3. Getting people to agree with each other and with information systems ... 10

1.3.1. Why bother about semantic interoperability? ... 10

1.3.2. Reason #1 - Shared development can save resources ... 11

1.3.3. Reason #2 - Otherwise safe conversions can not always be guaranteed ... 14

1.4. Research Questions ... 16

1.4.1. Semantic Sustainability ... 17

1.4.2. Technical scalability ... 18

1.4.3. Scaling learning, innovation, implementation and dissemination ... 18

2. Background ... 19

2.1. Information visualization ... 19

2.1.2. Visualizations of temporal data ... 20

2.1.3. Information foraging and information scent ... 21

2.2. Terminology systems and ontologies ... 22

2.2.1. Post-coordination ... 24

2.3. openEHR and archetypes ... 25

2.4. Architecture for distributed networked systems ... 29

2.4.1. Scalability - the ability to scale ... 30

2.4.2. Sharding ... 30

2.4.3. The CAP theorem ... 30

2.4.4. Distributed database systems and MapReduce ... 31

2.4.5. Representational State Transfer (REST), URIs, and HTTP ... 31

2.4.6. Performance, caching, and reducing number of requests ... 32

2.4.7. Solutions and design patterns complementing REST ... 32

2.5. Boundaries between systems ... 33

2.6. Prototyping and end user innovation ... 33

3. Summary of included publications ... 35

3.1. Visualizing and binding terminology systems to EHR information models (Papers I, II and III) ... 35

3.1.1. First TermViz design ... 35

3.1.2. Semantic Zooming and Details on Demand ... 36

3.1.3. Integrating TermViz ... 37

3.1.4. Termviz 2.0 ... 37

3.1.5. TermViz.js ... 39

3.1.6. Layout Algorithms ... 40

3.1.7. Modification and innovation by end users ... 40

3.2. EHR architecture (Paper IV) ... 42

3.2.1. Learning, rapid prototyping and end user innovation ... 42

3.2.2. Design overview ... 43

3.2.3. Results and Discussion ... 50

3.2.4. Conclusions ... 53

3.3. Storing and retrieving archetype-based EHR content (Paper V) ... 54

3.4. Learning, understanding, and prototyping ... 56

3.5. Approaches to learning openEHR (Paper VI) ... 57


3.5.1. Mail survey ... 57

3.5.2. Approaching openEHR as a beginner ... 58

3.5.3. Is It Unnecessarily Complicated? ... 59

3.5.4. Conclusions and Suggestions for Learning Environments ... 59

3.6. Summaries, overview and navigation in EHRs ... 60

3.7. EHR Navigation using Google Earth (Paper VII) ... 61

3.7.1. Design description ... 62

3.7.2. Facets and aggregation ... 63

3.7.3. Time as a fourth dimension ... 63

3.7.4. Intended use ... 64

3.7.5. Discussion and Conclusion ... 64

3.8. Highlighting in EHRs: a pilot study (‘Paper VIII’) ... 65

3.8.1. Experimental setting and background ... 66

3.8.2. Methods and measurements ... 69

3.8.3. Results ... 70

3.8.4. Discussion ... 72

3.8.5. Replication and modification possibilities ... 74

3.8.6. Conclusions ... 74

4. General discussion ... 75

4.1. Contributions and scientific value ... 75

4.2. Complexity and systems of systems ... 78

4.3. Semantic sustainability and technical debt ... 79

4.3.1. Technical debt in archetype management ... 82

4.4. Responsibility ... 83

5. Summary and Conclusions ... 85

6. Future work ... 87

7. Bibliography / References ... 89

Included Papers ... 93


1. Introduction

This thesis can be seen as a journey that starts with several seemingly separate threads that along the way join each other to form a complex weave that reflects the interdisciplinary nature of the research. The ‘classical’ medical informatics focused threads are:

• Semantic interoperability and information reuse

• Terminology systems and ontologies

• Electronic Health Record data structures and infrastructure

These are interwoven with more general threads related to information technology, development, usability and design:

• Information visualization, information navigation and human cognition

• Prototyping and enabling end user innovation

• Scalable technical infrastructures and sustainable, maintainable processes

Some of the expressions in the lists above may be unfamiliar to some readers, but don’t worry: the introduction and background chapters aim to explain them and their context.

1.1. Delimitations

Even though the thesis and its application area span many fields, the coverage of some related areas has been deliberately limited:

• The thesis and the associated prototypes do not focus on security issues such as authentication and access control.

• Regarding graphical user interfaces for patient summaries, the designs and studies have focused on ‘single-patient’ use cases rather than population-focused cases such as epidemiology.

1.2. Electronic Health Records

Patient records are a mixture of many kinds of information: notes by clinicians and other care providers, and test and laboratory results. They are used for supporting patient care, research and education. They also assist communication and management, and are legal records of medical actions. Records in computerized form can be called, for example, electronic patient record (EPR), electronic medical record (EMR), computer-based patient record (CPR) or, the term used in this thesis, Electronic Health Record (EHR). [Ginneken 1997]

One reason for computerizing records is to reduce some of the problems with purely paper-based records:

• Practical problems include, for example, that paper records can only be in one place at a time, can be lost, and that handwriting can be hard to read.

• Computerized control and validation of input and use of standardized structuring may reduce problems with incomplete or variable content.


• To reuse or analyse information in paper records, they often first need to be manually transcribed – a costly, time-consuming process that may also introduce errors.
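As a minimal illustration of how computerized input validation can catch incomplete or implausible content, the sketch below checks an entry against simple per-field constraints. The field names, the required-field rules and the plausible weight range are hypothetical assumptions made for this example, not taken from any particular EHR system or standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Constraint:
    """Hypothetical per-field constraint for a structured record entry."""
    required: bool = True
    min_value: Optional[float] = None
    max_value: Optional[float] = None


# Hypothetical structure for a body weight entry; the limits are illustrative only.
BODY_WEIGHT_ENTRY = {
    "patient_id": Constraint(),
    "weight_kg": Constraint(min_value=0.2, max_value=500.0),
    "comment": Constraint(required=False),
}


def validate(entry: dict, constraints: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry passed.
    Fields not listed in `constraints` are ignored in this sketch."""
    problems = []
    for field, rule in constraints.items():
        value = entry.get(field)
        if value is None:
            if rule.required:
                problems.append(f"missing required field: {field}")
            continue
        if rule.min_value is not None and value < rule.min_value:
            problems.append(f"{field} below {rule.min_value}")
        if rule.max_value is not None and value > rule.max_value:
            problems.append(f"{field} above {rule.max_value}")
    return problems


print(validate({"patient_id": "19121212-1212", "weight_kg": 3.3}, BODY_WEIGHT_ENTRY))
# → []
print(validate({"weight_kg": 3300}, BODY_WEIGHT_ENTRY))
# → ['missing required field: patient_id', 'weight_kg above 500.0']
```

The second call shows the intended benefit: a missing patient identifier and a weight apparently entered in grams instead of kilograms are both flagged at entry time rather than discovered later.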

Additional benefits from EHRs can be gained if the EHR system can be made to ‘understand’, or rather process and compute, the entered data. Programs can then be made to give active reminders, warnings or advice [Ginneken 1997]. Search functions and user interfaces providing summaries to give an overview of a record are other examples of what can be provided.

Detailed and structured EHR content can be analysed and used to produce statistics and new medical knowledge. An interesting, somewhat controversial, example is the ‘quiet revolution’ in comparative effectiveness research supported by EHRs. Randomised controlled trials provide a rigorous but often costly and time-consuming way to gain knowledge. An alternative is to perform observational studies based on analysing large amounts of existing EHRs. By using techniques like pairwise patient matching, statistically valid conclusions can be drawn. Another benefit is that knowledge can also be gained about groups that would not be included in other clinical trials because they are pregnant, too young or too sick. [Begley 2011]
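The idea of pairwise patient matching can be illustrated with a deliberately simplified sketch: each treated patient is greedily paired with the not yet used control patient of the same sex whose age is closest, within a maximum age difference (a ‘caliper’). The patients, the matching variables and the 5-year caliper are assumptions for illustration only. Real observational studies typically match on many more variables and use more careful statistical methods, and this sketch is not the method of any study cited here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Patient:
    pid: str
    sex: str  # "F" or "M" in this toy example
    age: int


def match_pairs(treated: list, controls: list, caliper: int = 5) -> list:
    """Greedy 1:1 matching: same sex, nearest age, at most `caliper` years apart.

    Each control is used at most once; treated patients without an
    acceptable control are left unmatched.
    """
    available = list(controls)
    pairs = []
    for t in treated:
        best: Optional[Patient] = None
        for c in available:
            if c.sex != t.sex or abs(c.age - t.age) > caliper:
                continue
            if best is None or abs(c.age - t.age) < abs(best.age - t.age):
                best = c
        if best is not None:
            available.remove(best)
            pairs.append((t, best))
    return pairs


treated = [Patient("t1", "F", 34), Patient("t2", "M", 71), Patient("t3", "F", 90)]
controls = [Patient("c1", "F", 36), Patient("c2", "M", 68), Patient("c3", "F", 50)]
for t, c in match_pairs(treated, controls):
    print(t.pid, "matched with", c.pid)
```

Here t3 finds no control within 5 years of age and is left unmatched. Greedy matching is simple, but its result depends on the order in which patients are processed.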

1.3. Getting people to agree with each other and with information systems

The terms semantic interoperability, information structures (or data structures) and terminology systems and their importance are briefly explained in the following subsections. The background chapter then goes into further detail.

1.3.1. Why bother about semantic interoperability?

Interoperability refers to getting systems to work together and semantics concerns the study of meanings. Semantic interoperability between two information systems can, among other things, mean that information entered in one system can be used by the other system and its users just as well as if the information originated from the same system. The European SemanticHEALTH report [semHealth2009] details and exemplifies different interoperability levels in useful ways.

A partial answer to the question ‘Why bother about semantic interoperability?’ is ‘Because computers are only good at some things!’. In the next two sections we’ll look in detail at two reasons why semantic interoperability is desirable: 1. shared development and 2. safe conversions.


1.3.2. Reason #1 - Shared development can save resources

[Figure 1 image. Legend: assistant instructions (computer software); patient data; assistant (computer); storage.]

Figure 1. Computers assisting humans with information processing

Computers can be seen as assistants, but assistants that do not truly understand the meaning of human language.

• The assistants (computers) can follow formal instructions (computer software) exactly

• They can count, compare, sort, copy, move, rearrange, fetch and store data

• But they cannot guess or estimate, they cannot take initiatives, and they don’t understand your language or the real meaning of text. If they have very cleverly designed detailed instructions (software), it may appear as if they do ‘understand’ some meaning, but that is an illusion.

If we structure parts of the data in a consistent way, then instructions can be written so that helpful assistance (search, statistics, decision support etc.) can be based on the data. One way to make some kinds of data efficiently processable is to use terminology systems and ontologies to define meaning – this is further described in section 2.2.
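A small sketch of why consistent structure matters: when each entry carries a code from a terminology system, generic ‘instructions’ such as a search or a simple statistic can be written once and applied to every record that follows the same structure, even though the free-text notes vary. The entry layout is a hypothetical assumption; the ICD-10 codes E11 (type 2 diabetes mellitus) and J45 (asthma) serve only as examples.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosisEntry:
    """Hypothetical structured entry: a terminology code plus free text."""
    patient_id: str
    icd10_code: str  # ICD-10 code, e.g. "E11" or "J45"
    note: str        # free text for humans; ignored by the functions below


entries = [
    DiagnosisEntry("p1", "E11", "Long-standing diabetes"),
    DiagnosisEntry("p2", "J45", "Asthma since childhood"),
    DiagnosisEntry("p3", "E11", "T2DM, diet-controlled"),
    DiagnosisEntry("p3", "J45", "Mild asthma"),
]


def patients_with(code_prefix: str, data: list) -> set:
    """Search: which patients have a diagnosis code starting with code_prefix?"""
    return {e.patient_id for e in data if e.icd10_code.startswith(code_prefix)}


def code_counts(data: list) -> Counter:
    """Statistics: how many entries use each code?"""
    return Counter(e.icd10_code for e in data)


print(sorted(patients_with("E11", entries)))  # ['p1', 'p3']
print(code_counts(entries)["J45"])            # 2
```

Note that the notes for p1 and p3 describe the same condition in different words; only the shared code lets the search find both.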

[Figure 2 image: a greeting and a health record entry rendered in unreadable glyphs; only structured fields such as Patient ID 19121212-1212, Date & Time 2010 Sep 28, 13:47, the main diagnosis ICD code A00.1 and a medication code remain recognizable.]

Figure 2. To a computer, a text-based health record entry that is not formally structured is, from many computing perspectives, like hieroglyphs are to most humans. But parts of the data can be structured in ways that computers can process efficiently.

A crucial point in this thesis is the attempt to discuss what can scale and be sustained reasonably. This includes trying to avoid routes that are likely to lead to dead ends because of costs, lack of human resources or technical limitations. In the above illustration with computers as assistants the following applies:

• It is inexpensive to copy existing instructions within and between locations and organisations. It is also fairly cheap and easy to employ more assistants (in reality corresponding to getting more or stronger computers).

• However, it is error-prone and expensive to create new detailed instructions in the assistants’ formal (programming) languages. Such software development requires cooperation within and between teams skilled in both software development and healthcare processes and knowledge.

The ‘instructions’ to computers in healthcare include Electronic Health Record (EHR) systems and associated decision support rules, epidemiological queries, functions that generate patient summaries etc.


A way to save development resources is to reuse the same ‘instructions’ in several places, but to do that efficiently requires humans to agree on the structure and meaning (semantics) of information pieces – what we call semantic interoperability. A method of defining and sharing structure and meaning is to use computable detailed clinical models – for example using ‘archetype’-based approaches that are further described in section 2.3.

Agreement is also needed for reliable statistics and efficient epidemiological studies.

This does not necessarily mean that you need identical graphical interfaces or paper forms, but ‘assistant readable’ identification of information pieces is needed ‘under the hood’ in the systems. (This will be described as identifiers and paths in later chapters.) There will of course be an awful lot of forms, instructions etc. and several versions of them as new needs are discovered and knowledge increases. Changes, improvements and long-term maintenance may be even harder than the initial creation of structures. Thus successful proposed solutions must handle change and multiple versions well.
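The idea of ‘assistant readable’ identification under the hood can be sketched as follows: two systems label the same value differently on screen, but both store it under the same identifier and path, so one shared ‘instruction’ can read it in either system. The archetype identifier follows the general openEHR naming pattern, but the path, its at-codes and the dictionary-based storage are hypothetical simplifications; in openEHR such paths are resolved against structured reference model objects rather than looked up as flat keys. Keeping a version (.v1) in the identifier is one way of letting several versions coexist.

```python
from typing import Optional

# Hypothetical key: (archetype ID, path). The ID follows the openEHR naming
# pattern; the path and its at-codes are invented for this illustration.
WEIGHT_V1 = (
    "openEHR-EHR-OBSERVATION.body_weight.v1",
    "/data[at0001]/events[at0002]/data[at0003]/items[at0004]/value",
)

# Two systems with different screen labels but the same identifiers "under the hood".
system_a = {"ui_label": "Weight (kg)", "store": {WEIGHT_V1: 72.5}}
system_b = {"ui_label": "Body weight / kg", "store": {WEIGHT_V1: 80.1}}


def read_value(system: dict, key: tuple) -> Optional[float]:
    """Shared 'instruction': fetch a value by identifier and path, ignoring the
    on-screen label. Returns None if the system has no value for that key."""
    return system["store"].get(key)


for s in (system_a, system_b):
    print(s["ui_label"], "->", read_value(s, WEIGHT_V1))
```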

The impact of lacking semantic interoperability on reuse costs was illustrated by Hripcsak et al. [Hripcsak 93], who for good reasons included the words ‘Desperately seeking data’ in their description of problems when linking a knowledge-based system (KBS) to a clinical database. The KBS provided alerts, interpretations, research screening, and quality assurance functions. The system used Arden Syntax Medical Logic Modules (MLMs) that were intended to make medical rules (examples of assistant instructions in our illustration above) reusable between different EHR systems. The MLMs contained both reusable medical knowledge-based logic rule sections and location-specific data slot sections used to find and access actual data in the local databases.

Figure 3. If the entered and transmitted patient data (encircled with dots in the figure) has well-defined meaning and agreed structure, then instructions can be reused. This is an essential feature of semantic interoperability.


Defining and maintaining KBS-database links consumed a lot more resources than the medical logic in the MLMs in terms of coding, maintenance, and performance.
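The split between reusable logic and location-specific data access can be sketched in Python (not Arden Syntax) as one shared rule function plus one data-access function per site. The rule, the potassium threshold and both database layouts are hypothetical. The point is structural: the logic is written once, while every new site, and every change to a local database, requires another data-access function, mirroring where the experience reported above found most of the effort.

```python
from typing import Callable, Optional


def hyperkalemia_alert(potassium_mmol_l: Optional[float]) -> Optional[str]:
    """Reusable logic, identical at every site.
    Hypothetical rule; the 5.5 mmol/L threshold is illustrative, not clinical guidance."""
    if potassium_mmol_l is not None and potassium_mmol_l > 5.5:
        return f"Alert: potassium {potassium_mmol_l} mmol/L"
    return None


# Location-specific data access ("data slots"): each site must write and maintain
# its own version. Both database layouts are invented for this illustration, and
# results are assumed to be stored in chronological order.
def site_a_latest_potassium(db: dict, patient: str) -> Optional[float]:
    rows = db["lab_results"].get(patient, [])
    values = [r["value"] for r in rows if r["test"] == "K"]
    return values[-1] if values else None


def site_b_latest_potassium(db: dict, patient: str) -> Optional[float]:
    series = db["chem"].get("POTASSIUM_SERUM", {}).get(patient, [])
    return series[-1] if series else None


def run(rule: Callable, fetch: Callable, db: dict, patient: str) -> Optional[str]:
    """Combine the shared rule with a site-specific fetch function."""
    return rule(fetch(db, patient))


db_a = {"lab_results": {"p1": [{"test": "K", "value": 4.1}, {"test": "K", "value": 6.0}]}}
db_b = {"chem": {"POTASSIUM_SERUM": {"p1": [5.0]}}}
print(run(hyperkalemia_alert, site_a_latest_potassium, db_a, "p1"))  # Alert: potassium 6.0 mmol/L
print(run(hyperkalemia_alert, site_b_latest_potassium, db_b, "p1"))  # None
```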

Experiences like ‘Desperately seeking data’ and others [Ahmadian 2011] make it very interesting to also look at ways of standardizing the way data is stored, in addition to standardizing ‘instructions’ like MLM-based decision support rules.

Similar issues arise repeatedly when trying to reuse data for other kinds of decision making [Mawilmada 2012] and for computer instructions, for example when creating summaries or drawing a graphical EHR overview on a screen.

1.3.3. Reason #2 - Otherwise safe conversions can not always be guaranteed

When two differently designed systems need to exchange information they need to agree on some things in order to succeed. One way is to agree on a common message format (the arrow marked MSG, in the centre of Figure 4) and create export and import algorithms (rules and software programs) that for example convert from the format of system A to the agreed message format, and then in system B an import algorithm that converts the message to the internal format of B. This has been an approach used in many applications of the HL7 standard and is also one of the target use cases for the ISO 13606 standard [Kalra 2006].
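The message-based path (MSG) can be sketched as two small converters around an agreed message. The internal formats and the message layout below are hypothetical, loosely modelled on the body weight example in Table 1. Note that the exporter encodes a semantic decision (the ‘at birth’ qualifier is not transmitted), which is exactly the kind of choice that has to be revisited whenever either system or the message format changes.

```python
# Hypothetical internal record format of system A.
record_in_a = {"weight_at_birth_g": 3300}


def export_from_a(rec: dict) -> dict:
    """System A -> agreed message: grams become kilograms.
    Design decision in this sketch: the 'at birth' qualifier is not transmitted."""
    return {"type": "body_weight", "value": rec["weight_at_birth_g"] / 1000, "unit": "kg"}


def import_to_b(msg: dict) -> dict:
    """Agreed message -> hypothetical internal record format of system B."""
    if msg["type"] != "body_weight" or msg["unit"] != "kg":
        raise ValueError(f"unsupported message: {msg}")
    return {"Weight": f"{msg['value']} kg"}


message = export_from_a(record_in_a)
print(message)               # {'type': 'body_weight', 'value': 3.3, 'unit': 'kg'}
print(import_to_b(message))  # {'Weight': '3.3 kg'}
```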

Another approach is to aim at agreeing on the semantics and structure inside the systems so that conversion needs are reduced (arrows marked SYS in Figure 4).

[Figure 4 image: EHR system A and EHR system B connected by export/import; an arrow marked MSG in the centre and arrows marked SYS on either side.]

Figure 4. Considering conversion feasibility is important when specifying what to agree on. Focusing on agreed message formats (MSG above) has been common, but when more kinds of data are shared, the ‘export/import’ converters get increasingly complex to implement and maintain. An alternative approach is to agree on semantics and structure inside the systems (SYS above).

If information exchange is wanted frequently, then automated conversions between the systems are desired. It must then be possible to write a conversion algorithm (implemented as software). Consider the following examples in Table 1.


Table 1. Different kinds of data conversions. Some can be done by software, others cannot.

Type 1. Same kind of information, but captured in different ways

Resolvable by computer systems. For many such non-changing patterns and data structures it is possible to implement automated export and import mechanisms.

Example: Body weight

A: Weight at birth: 3300 g

B: Weight: 3.3 kg

Type 2. Same kind of information, but captured in different ways

Resolvable by a medically competent human, but not by computer systems

Example: Medical history in two different systems

A:

• Chief Complaint

• History of the present illness

• Past medical history

• Family diseases

• Social history

• Substance use (tobacco, alcohol, drugs)

• Diet

• Exercise

B:

• Chief Complaint

• Medical History

• Social History

Type 3. Same kind of information, but captured in different ways

Not resolvable even by a medically competent human (but often useful for a human anyway)

Example: Aggregations using different intervals (cigarettes/week)

A: 0, 1-5, 5-10, 11-15, 16-30, 31-50, 51-100, 101+

B: 0, 1-3, 4-7, 8-14, 15-28, 29-56, 57+

Type 4. Different kinds of information, or missing information

Not resolvable even by a medically competent human (not reusable for certain purposes)

Example: Substance use

A:

• Alcohol yes/no

• Tobacco yes/no

B:

• Cigarettes yes/no

• Snuff (snus) yes/no
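Why Type 3 differences resist automation can be made concrete with a small check over the two interval schemes in Table 1: a category in A can only be converted if it falls entirely inside a single category in B. The bins are copied from the table, with open-ended categories represented as having no upper bound; the code itself is an illustrative sketch.

```python
from typing import Optional

# A category of cigarettes/week: inclusive bounds, None = no upper bound.
Bin = tuple[int, Optional[int]]

SCHEME_A: list[Bin] = [(0, 0), (1, 5), (5, 10), (11, 15), (16, 30),
                       (31, 50), (51, 100), (101, None)]
SCHEME_B: list[Bin] = [(0, 0), (1, 3), (4, 7), (8, 14), (15, 28),
                       (29, 56), (57, None)]


def contains(outer: Bin, inner: Bin) -> bool:
    """True if every value in `inner` also lies in `outer`."""
    inf = float("inf")
    outer_hi = inf if outer[1] is None else outer[1]
    inner_hi = inf if inner[1] is None else inner[1]
    return outer[0] <= inner[0] and inner_hi <= outer_hi


def convert(category: Bin, target: list[Bin]) -> Optional[Bin]:
    """Return the single target category covering `category`, else None."""
    covering = [b for b in target if contains(b, category)]
    return covering[0] if len(covering) == 1 else None


for cat in SCHEME_A:
    print(cat, "->", convert(cat, SCHEME_B))
```

Only three of A's eight categories can be converted, and one of those only with a loss of precision (31-50 becomes 29-56). For the others, a stored value such as ‘1-5’ could correspond to either ‘1-3’ or ‘4-7’ in B, and neither software nor a human can decide which from the stored value alone.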

If the differences in the data to be exchanged are mostly of type 1 above, then it is possible, and in a short time perspective can also be most resource-efficient, to just focus on a common message format instead of trying to align the inner semantics of the systems. Constructing export/import conversion software will consume resources in the beginning and later when systems or message formats are updated (as in the ‘Desperately seeking data’ problem described in section 1.3.2).

Less common exchange needs that are not possible to convert automatically can then instead be converted to, and transferred as, a text document that is then reinterpreted and entered in suitable form into the receiving system by a human.


Interesting scalability questions arise if the information structures to be exchanged increase in number of types or in complexity. How much human effort can be put into conversion activities? What is possible for an organization? What is possible for a network of multiple organizations? Even if there are enough financial incentives and resources to put into new conversion and cross-mapping projects, will there be enough skilled people in all the organizations to keep up the pace?
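A back-of-the-envelope count shows why these questions matter. Assume, purely for illustration, that under the message-based approach each of n systems needs one exporter and one importer for each of t agreed message types; the numbers plugged in below are invented, not measured:

```latex
C_{\mathrm{MSG}} = 2\,n\,t, \qquad n = 10,\; t = 50 \;\Rightarrow\; C_{\mathrm{MSG}} = 2 \cdot 10 \cdot 50 = 1000
```

Under this assumption, each revision of one message type may touch up to 2n converters, and the total number of converters to build and maintain grows with both the number of systems and the number of information types.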

Since this thesis discusses scalability, the main focus regarding semantic interoperability has been on systems that are designed to support agreement regarding the semantics and structure inside the systems (arrows marked SYS in Figure 4). During the thesis work the most research-accessible approach has been openEHR, which can be used to build entire EHR systems and configure their semantics using ‘archetypes’; see section 2.3 for details.

1.4. Research Questions

This thesis work started by focusing on the question: ‘How can patient information in an electronic health record (EHR) be displayed best in order to gain a quick overview of a patient's history and other important facts?’ However, it soon became clear that even if the answer to that question were found, it would be a huge challenge to get those results implemented at a broad scale unless other aspects were considered first. The focus was thus broadened to different aspects of scalability and sustainability.

Promising approaches and examples of overviews and patient summaries can be found in isolated systems and research projects. But it is harder to find any obvious path to spreading the methods and implementations efficiently so that they can be used more widely in healthcare. Ideally, one would like to spread the use of effective information technology in the same way that one wants to spread knowledge about effective medical treatments. This thesis has been driven by a desire to show how overviews and patient summaries, and their further dissemination, can be made to work ‘for real’ beyond single implementations or prototypes. Time and resources have not yet sufficed to get all the way, but promising approaches have been explored that may accelerate the research and dissemination of visualizations and graphical user interfaces for clinical systems based on shared data structure definitions.

You have probably already noticed that a recurring theme in this thesis is scalability questions of different kinds, sometimes subtly implied and sometimes more explicit. As scientists we should think about the future and be interested in what happens if an approach or paradigm is extended or extrapolated far into the future. We sometimes deliberately try to overload or break experimental system setups in order to find the limits before corresponding disasters occur in vital deployed systems.

Some may look upon this as a kind of semi-mad science in the style of Tom Dickson, who puts different things in a powerful kitchen blender in the Internet video series ‘Will it Blend?’. How different medical informatics components can ‘blend’ and be combined to form composite systems has been described by Rector [Rector 2004, Rector 2001] and others, and is further detailed in section 2.4. This thesis partly builds on that work but instead focuses on the questions ‘Will it scale?’ and ‘Is it sustainable?’

Scalability refers to how gracefully a system can grow by adding more resources [Henderson 2006]. Sustainability refers more to how to best use available resources. Both aspects are important.

1.4.1. Semantic Sustainability

I propose that we start¹ to define and use the term ‘semantic sustainability’, when appropriate, in order to better highlight important aspects of medical informatics that have even wider implications than ‘semantic interoperability’.

Sustainable systems should be ‘capable of being sustained’² over time. Systems need to remain maintainable as needs change, knowledge increases and technical systems evolve. Some important aspects to consider in a definition of semantic sustainability are:

• Not wasting development resources, and thus not risking running out of limited resources, such as appropriately skilled people or financial support, that should be prioritized for more important tasks. Shared development, as mentioned in semantic interoperability reason #1 earlier, is one way to reduce such waste.

• Not generating avoidable risks, for example those caused by non-computable conversions as described in semantic interoperability reason #2 earlier.

• Minimizing problems for future systems and generations. This includes aiming for versioning-capable systems that can evolve gracefully while still maintaining computable semantics of previously entered data.

• Systems should support data reuse and reduce wasteful use of their users’ time. A typical problem is when non-interoperable structures force users to enter the same data redundantly into several systems.

• The sustainable ‘semantic ecology’ should allow for a diversity of innovation sources and maintenance arrangements, avoiding becoming too dependent on too few providers of systems, development or innovation.

1 When first preparing the thesis I did not find the term ‘semantic sustainability’ applied to health informatics and decided to ‘adopt’ it. When doing a new Internet search right before publication, however, I found the term used on a slide in an epSOS presentation at http://www.epractice.eu/files/5.Thorp-Buhr.ppt. There it seems to refer mostly to semantic interoperability. Here I try to widen the perspective even further.

2 “Sustainable” in Merriam-Webster dictionary:

1: capable of being sustained

2: of, relating to, or being a method of harvesting or using a resource so that the resource is not depleted or permanently damaged