EHR Data Methodologies in Clinical Research:
Perspectives from the Field
Michael G. Kahn MD, PhD Professor, Pediatric Epidemiology
University of Colorado Michael.Kahn@ucdenver.edu
11 December 2014
Funding by: PCORI Contract ME-1303-5581 (Kahn), NCATS UL1TR001082 (Sokol), AHRQ R01HS022956 (Schilling)
Session 1: Semantic Harmonization; Definition; Content; Ontologies
Common Data Models for Sharing EHR data across Settings
Disclosures
Presentation based on EDM Forum commissioned paper:
A common data model is critical!
[Figure: sources labeled EHR-1, EHR-2, EHR-3, Local Data Warehouse, Clinical Registries, and Other; each site exposes a Limited Data Set with a Common Data Model and Common Terminology, reached through a Common Query.]
What is a data model & why should I care?
• A data model determines:
– What data elements can be stored
– What relationships between data can be represented
– Technical details: data type, allowed ranges, required versus optional (missingness)
• You should care because it determines:
– How easily data can be recorded, extracted & queried
– How much the model contributes to data quality
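The points above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module and a hypothetical two-table schema (the `patient` and `visit` tables, their columns, and the allowed ranges are assumptions for illustration, not taken from any of the models discussed here) to show how a data model fixes data types, allowed ranges, required versus optional fields, and relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces relationships only when asked
conn.executescript("""
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    -- required element with an allowed range (range is an illustrative assumption)
    birth_year INTEGER NOT NULL CHECK (birth_year BETWEEN 1900 AND 2014),
    -- optional element: NULL (missing) is permitted
    sex TEXT CHECK (sex IN ('F', 'M'))
);
CREATE TABLE visit (
    visit_id   INTEGER PRIMARY KEY,
    -- relationship: every visit must belong to an existing patient
    patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
    visit_date TEXT NOT NULL
);
""")

def try_insert(sql, params):
    """Return True if the model accepts the row, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

ok_patient = try_insert("INSERT INTO patient VALUES (?, ?, ?)", (1, 1975, None))
bad_range = try_insert("INSERT INTO patient VALUES (?, ?, ?)", (2, 1850, "F"))
missing_required = try_insert("INSERT INTO patient VALUES (?, ?, ?)", (3, None, "M"))
orphan_visit = try_insert("INSERT INTO visit VALUES (?, ?, ?)", (1, 99, "2014-12-11"))
```

In this sketch only the first row is accepted: a missing optional value is stored, while an out-of-range value, a missing required value, and a visit without a matching patient are all rejected by the model itself. That is one way a data model contributes to data quality.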
Visit-centric versus Patient-centric
Query: “For various age groups, how many medications were filled?”
Four-table join
Query: “For various age groups, what is the average number of prescriptions per visit?”
Two-table join
Three-table join + date comparisons
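A minimal sketch of why the same question costs more in one design than the other. The tables below are hypothetical and much simpler than the schemas on the slides (age groups are left out, and the slides do not say which join count belongs to which design). The assumption illustrated is that in a visit-centric design a prescription points at its visit, while in a patient-centric design it points at the patient and carries its own date, so attributing it to a visit needs a date comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visit (visit_id INTEGER PRIMARY KEY, patient_id INTEGER, visit_date TEXT);

-- Visit-centric (assumed layout): each prescription references its visit.
CREATE TABLE rx_visit_centric (rx_id INTEGER PRIMARY KEY, visit_id INTEGER);

-- Patient-centric (assumed layout): each prescription references the patient
-- and has its own date; there is no direct link to a visit.
CREATE TABLE rx_patient_centric (rx_id INTEGER PRIMARY KEY, patient_id INTEGER, rx_date TEXT);

INSERT INTO visit VALUES (1, 100, '2011-01-05'), (2, 100, '2011-03-10'), (3, 200, '2011-02-01');
INSERT INTO rx_visit_centric VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO rx_patient_centric VALUES
    (1, 100, '2011-01-05'), (2, 100, '2011-01-05'), (3, 100, '2011-03-10');
""")

# Visit-centric: join prescriptions to visits on the shared visit key.
avg_visit_centric = conn.execute("""
    SELECT AVG(n) FROM (
        SELECT v.visit_id, COUNT(r.rx_id) AS n
        FROM visit v
        LEFT JOIN rx_visit_centric r ON r.visit_id = v.visit_id
        GROUP BY v.visit_id
    )
""").fetchone()[0]

# Patient-centric: match on patient, then compare dates to decide which
# visit a prescription belongs to (same-day matching is an assumption).
avg_patient_centric = conn.execute("""
    SELECT AVG(n) FROM (
        SELECT v.visit_id, COUNT(r.rx_id) AS n
        FROM visit v
        LEFT JOIN rx_patient_centric r
               ON r.patient_id = v.patient_id AND r.rx_date = v.visit_date
        GROUP BY v.visit_id
    )
""").fetchone()[0]
```

Both queries give the same average on this toy data, but the patient-centric version depends on a date-matching rule that someone has to choose and defend. The visit-centric version gets the link from the model.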
SAFTINet Asthma Cohort Definition
• Adults (ages 18 and over) as of Jan 1, 2009 receiving care in selected sites who:
– Have had at least 2 visits separated by at least 30 days coded as 493.xx in the 18 months prior to July 1, 2011, OR
– Have a single diagnosis of 493.xx AND two filled prescriptions for an asthma maintenance medication separated by at least 30 days in the past 12 months.
• Exclusion criteria: Patients with other concomitant chronic lung disease
– Cystic fibrosis
– COPD, emphysema, chronic bronchitis
– Alpha-1-antitrypsin deficiency
– Pulmonary fibrosis
– Active TB
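The cohort definition above can be sketched as executable logic. This is an illustration under stated assumptions, not the SAFTINet implementation: the `Patient` record and its field names are hypothetical, the "past 12 months" window is assumed to end on July 1, 2011, the single diagnosis is assumed to fall in the same 18-month window as the visits, and the exclusion diagnoses are collapsed into one boolean flag because the slide gives no codes for them.

```python
from dataclasses import dataclass, field
from datetime import date

INDEX_DATE = date(2011, 7, 1)
VISIT_WINDOW_START = date(2010, 1, 1)        # 18 months before July 1, 2011
RX_WINDOW_START = date(2010, 7, 1)           # 12 months before (assumed anchor)
ADULT_BORN_ON_OR_BEFORE = date(1991, 1, 1)   # age 18 or over as of Jan 1, 2009

@dataclass
class Patient:  # hypothetical record layout
    birth_date: date
    asthma_visit_dates: list = field(default_factory=list)      # visits coded 493.xx
    maintenance_fill_dates: list = field(default_factory=list)  # filled maintenance prescriptions
    has_excluded_lung_disease: bool = False                     # CF, COPD, etc.

def in_window(dates, start, end):
    """Dates falling in the half-open interval [start, end)."""
    return [d for d in dates if start <= d < end]

def two_events_30_days_apart(dates):
    """At least two events whose earliest and latest are 30 or more days apart."""
    return len(dates) >= 2 and (max(dates) - min(dates)).days >= 30

def in_asthma_cohort(p):
    if p.birth_date > ADULT_BORN_ON_OR_BEFORE or p.has_excluded_lung_disease:
        return False
    visits = in_window(p.asthma_visit_dates, VISIT_WINDOW_START, INDEX_DATE)
    fills = in_window(p.maintenance_fill_dates, RX_WINDOW_START, INDEX_DATE)
    by_visits = two_events_30_days_apart(visits)
    by_dx_and_rx = len(visits) >= 1 and two_events_30_days_apart(fills)
    return by_visits or by_dx_and_rx
```

Each assumption named above is a decision the written definition leaves open. Two sites that resolve them differently will build different cohorts from the same text.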
Key questions for a data model
• From Jeff Brown regarding FDA Sentinel Initiative*:
1. What does the system need to do?
2. What data are needed to meet system needs?
3. Where will the data be stored?
4. Where will the data be analyzed?
5. Is a common data model needed, and if so, what will the model look like?
*Brown JS, Lane K, Moore K, Platt R. Defining and evaluating possible database models to implement the FDA Sentinel initiative. U.S. Food and Drug Administration; May 2009.
Eight dimensions of data models
Modified from Moody and Shanks*
Each dimension is listed with its original definition and its definition recast for CER.
• 1. Completeness
– Original definition: Does the data model contain all user requirements?
– Recast definition for CER: Can the data model store and retrieve data to meet investigator CER needs?
• 2. Integrity
– Original definition: Does the data model conform to the business rules and processes to guarantee data integrity and enforce policies?
– Recast definition for CER: Does the data model enforce meaningful data relationships and constraints that uphold the intent of the data’s original purpose, i.e., clinical care, billing?
• 3. Flexibility
– Original definition: Does the data model deal with business and/or regulatory change?
– Recast definition for CER: Can new data elements and relationships be added if project scope or regulatory rules (e.g., patient identification) change?
• 4. Understandability
– Original definition: Are the concepts and structures in the data model easily understood?
– Recast definition for CER: Do the concepts, structures and relationships make sense to investigators, data managers, and statisticians?
• 5. Correctness
– Original definition: Does the data model conform to the rules of the data modeling technique?
– Recast definition for CER: Does the model conform to good data modeling practices such as limited data storage redundancy?
• 6. Simplicity
– Original definition: Does the data model contain the minimum possible entities and relationships?
– Recast definition for CER: Are concepts represented as straightforwardly as possible? Are all data elements necessary?
• 7. Integration
– Original definition: Is the data model consistent with the rest of the organization’s data?
– Recast definition for CER: Do all of the various data domains, such as demographics, observations, labs and medications “hang together” in a consistent and logical fashion?
• 8. Implementability
– Original definition: Can the data model be implemented within existing time, budget, and technology constraints?
– Recast definition for CER: Can the data model be implemented and maintained by current and future partners given anticipated budgets, time, and technical constraints?
Major common data models
Each model is listed with its developing entity and initial purpose.
• Observational Medical Outcomes Project (OMOP)
– Developing entity: Foundation of the NIH, now Reagan-Udall Foundation
– Initial purpose: Comparative drug outcomes studies
• i2b2
– Developing entity: Partners Healthcare
– Initial purpose: Informatics framework for clinical and biological data integration. Widely used across NCATS CTSAs
• HMORN Virtual Data Warehouse (VDW)
– Developing entity: HMO Research Network
– Initial purpose: Distributed data warehouse to allow comparative studies across collaborating sites: HMORN, CRN, Oregon CTRI
• Mini-Sentinel
– Developing entity: FDA
– Initial purpose: Derivative of VDW focused on large-scale drug surveillance
• PCORnet
– Developing entity: PCORI
– Initial purpose: Derivative of Mini-Sentinel focused on PCOR research
A common data model is critical!
[Figure: sources labeled EHR, Local Data Warehouse, Clinical Registries, and Other; each site exposes a Limited Data Set with a Common Data Model and Common Terminology, reached through a Common Query.]
Crossing the CER chasm!!
Public domain tools to help “Cross the CER Chasm”
• Data profiling with OHDSI White Rabbit
• Data transformation documentation with OHDSI Rabbit in a Hat
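For readers unfamiliar with data profiling, the sketch below shows the kind of column-level summary a profiling step produces before source data are mapped to a common data model. It is a generic illustration in plain Python and does not use or imitate the White Rabbit interface; the function names, the sample rows, and the choice to treat empty strings as missing are all assumptions.

```python
from collections import Counter

def profile_column(values):
    """Summarize one column: row count, missingness, distinct values, top values."""
    non_missing = [v for v in values if v not in (None, "")]  # "" treated as missing (assumption)
    counts = Counter(non_missing)
    return {
        "n_rows": len(values),
        "n_missing": len(values) - len(non_missing),
        "n_distinct": len(counts),
        "most_common": counts.most_common(3),
    }

def profile_table(rows):
    """Profile every column of a table given as a list of dicts."""
    columns = list(rows[0].keys()) if rows else []
    return {c: profile_column([r.get(c) for r in rows]) for c in columns}

# Hypothetical source extract
rows = [
    {"sex": "F", "dx_code": "493.90"},
    {"sex": "M", "dx_code": "493.90"},
    {"sex": "",  "dx_code": "493.20"},
    {"sex": "F", "dx_code": None},
]
report = profile_table(rows)
```

A report like this shows which source columns are sparsely populated and which value sets need a terminology mapping, which is the information a transformation document has to account for.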
Ensuring Data Consistency/Comparability