14 - Session 4: Longitudinal and other temporal issues for long-term studies


(1)

Session 4: Longitudinal and Other Temporal Issues for Long-Term Studies

J. Michael Gaziano, MD, MPH

Scientific Director, MAVERIC, VA Boston Healthcare System

Chief, Division of Aging, Brigham and Women’s Hospital

Professor of Medicine, Harvard Medical School

December 11, 2014

(2)

Using Big Data in the VA

• VA Healthcare System

• Large-scale research programs nested in the clinical system

– Genetic Mega Cohort: Million Veteran Program

– Pragmatic Randomized Trial: HCTZ v. Chlorthalidone

• Using the longitudinal big data

• Summary and Lessons Learned

(3)

Nesting Population Research in the VA Healthcare System

• VA is an ideal setting for nested large-scale population research
– Stable and willing veteran population of 8 to 10 million
– Research infrastructure with diverse expertise
– Outstanding electronic medical record; fully integrated; data reaching back as far as 20 years

(4)

Million Veteran Program (MVP)

Enroll up to one million users of the VHA into an observational mega-cohort
o Blood collection for storage in biorepository for future research
o Collect health and lifestyle information
o Access to electronic medical record
o Ability to recontact participants

(5)

Distribution of MVP Sites

(6)

Pragmatic Trial of HCTZ v.

Flow diagram labels:
• Cohort Identification
• Centrally
• Phone Enrollment & Consent
• Randomize
• Intervention Delivered by mail
• Data Capture By EHR & CMS
• Study DB
• Analysis
• Clinical Decision Support

Care providers using EMR
Study team using traditional scientific tools

(7)
(8)

System Architecture

Diagram components:
• Access Authorization by Governance System
• Data sources: VA (Clinical Data); Non VA (NDI, CMS); Survey Data; Molecular data (Vendor, Molecular Lab)
• Consent Manager
• Data Warehouse
• Query Mart
• Query Portal
• Study Mart (three shown)
• Analysis Environment
• Researcher

(9)

Current State: Logistics for Data & Environment

Diagram labels:
• External to VINCI
• VINCI
• GenISIS Scientific env
• MVP Projs
• MAV-VINCI xfer zone
• MVP participant roster
• CDW filtered by MVP roster
• VINCI Researcher
• MAVERIC “high-level” VINCI user
• MAVERIC VINCI CDW copy
• VINCI CDW copy
• Support by VINCI services
• completed clinical data-set
• eg. MVP LOIs

(10)

Million Veteran Program (MVP) Data Universe

Center of diagram: MVP Participant
• VA - Clinical: VINCI, VIReC,
• Self-reported: MVP surveys
• Biospecimen
• Non-VA: NDI, CMS, etc.
• Molecular Data

(11)

VA Data Sources

• Corporate Data Warehouse Databases
• National Patient Care Databases
• Vital Status
• Decision Support System
• National Data Extract
• Beneficiary Identification Records Locator (BIRLS) death file
• New England VISN-1 Pharmacy files
• Outpatient Clinic File (OPC)
• Patient Treatment File (PTF)
• Inpatient and Outpatient Hospitalizations
• Clinic Inpatient and Outpatient Visits
• Diagnosis (ICD-9) codes
• Procedure (CPT) codes
• Pharmacy data and laboratory data
• Pharmacy Benefit Management (PBM) system database
• OEF/OIF and OND Roster
• VA Clinical Assessment Reporting and Tracking (CART)
• Veterans Affairs Surgical Quality Improvement Program (VASQIP)
• Veterans Affairs Central Cancer Registry (VACCR)

Side labels: Special Data Access w/ Data Steward; National Data Systems (NDS)

(12)

Other Data Sources

MVP Data
• Self-Reported Survey Data:
– Lifestyle Survey Data (Personal Information, Well-Being, Activity, Health, Military Experience, Dietary Intake, Medication, Habits)
– Baseline Survey (Health, Military Experiences, family medical history)
• Genetic Information
• Vital Status: Social Security Death Master Files, National Death Index, State Vital Statistic Registry

Non-VA Data
• National Death Index (NDI)
• Centers for Medicare and Medicaid Services (CMS)
• State Mortality Data

(13)

3-Tier 7-Step Phenotyping Process

Inputs: Prior Knowledge (Literature Search, Expert Consultation), Structured Data, Unstructured Data

Tier I Algorithm (T1A)
• Initial cohort from structured data (Likely cases, possible cases, likely non-cases)

Phenomic Database
• Data Processing Pipelines (NLP, data curation, extraction, augmentation, etc)

Tier II Algorithm (T2A)
• Refined Algorithm (Synthesize T1A and phenomic database to derive T2A)

From T2A to the Tier III Algorithm (T3A)
• Development of a probabilistic model
• Assignment of quantitative “caseness”
• Evaluate T2A
• Formulate T3A

Steps
Step 1: Define initial working algorithm (T1A)
Step 2: Create study cohort and apply T1A
Step 3: Create Annotation Data Set
Step 4: Create Phenomic Database through Data Processing Pipelines
Step 5: Derive T2A
Step 6: Evaluate T2A to formulate T3A
Step 7: Develop probabilistic model and assign caseness

Deposit resulting algorithms to a central Phenotype Library
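
The tiering idea can be illustrated with a minimal Python sketch. It is not the MAVERIC algorithm: the `Patient` record, the thresholds, the `note_mentions` field, and the logistic weights are all hypothetical, chosen only to show a rule-based Tier I label on structured data next to a quantitative "caseness" score that also uses evidence from unstructured data. The only external fact assumed is that ICD-9 codes beginning with 410 denote acute myocardial infarction.

```python
import math
from dataclasses import dataclass, field
from typing import List

MI_PREFIX = "410"  # ICD-9 410.x: acute myocardial infarction


@dataclass
class Patient:
    patient_id: str
    icd9_codes: List[str] = field(default_factory=list)  # structured data
    note_mentions: int = 0  # hypothetical count of NLP-extracted mentions


def count_mi_codes(patient: Patient) -> int:
    """Number of structured diagnosis codes that match the MI prefix."""
    return sum(1 for code in patient.icd9_codes if code.startswith(MI_PREFIX))


def tier1_label(patient: Patient) -> str:
    """Rule-based Tier I style label; the thresholds are illustrative."""
    hits = count_mi_codes(patient)
    if hits >= 2:
        return "likely case"
    if hits == 1:
        return "possible case"
    return "likely non-case"


def caseness(patient: Patient, w_codes: float = 1.2,
             w_notes: float = 0.8, bias: float = -2.0) -> float:
    """Toy logistic score in (0, 1); the weights are made up, not fitted."""
    z = bias + w_codes * count_mi_codes(patient) + w_notes * patient.note_mentions
    return 1.0 / (1.0 + math.exp(-z))


cohort = [
    Patient("A", ["410.11", "410.71"], note_mentions=3),
    Patient("B", ["410.90"], note_mentions=0),
    Patient("C", ["250.00"], note_mentions=0),
]
for p in cohort:
    print(p.patient_id, tier1_label(p), round(caseness(p), 3))
```

In practice the weights would be estimated against an annotated data set (Step 3) rather than set by hand, and the resulting score would be validated by chart review.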

(14)

MAVERIC Phenotyping Activities

Phenotypes
• Disease
– Myocardial infarction (MI)
– Stroke
– Unstable angina with revascularization
– Acute congestive heart failure
– Death from cardiovascular disease
– Vascular procedure
– Posttraumatic stress disorder (PTSD)
– Schizophrenia
– Bipolar disorder
– Traumatic brain injury
– Depression
– Vascular dementia
– Cognitive impairment
– Type 2 diabetes mellitus
• Laboratory values
– High-density lipoproteins (HDLs)
– Low-density lipoproteins (LDLs)
– Total cholesterol
– Albumin
– Serum creatinine
– Triglycerides
• Physical traits
– Blood pressure
• Demographics
– Smoking
– Alcohol consumption
– Race
– Combat exposure

Algorithm Development
• CPT codes
• ICD-9 codes
• Laboratory values
• Medications
• Natural Language Processing (NLP)

Validation Methods
• Chart review by content experts

(15)

List of Validated Phenotypes

A. Disease Exposures and Outcomes- Algorithm Generation
B. Characterization of Longitudinal Laboratory and Clinical Values
C. Non-Disease Exposures and Outcomes Algorithm Generation

(16)

A. Disease Exposures and Outcomes-Algorithm Generation

• Active Cancer
• Acute Kidney Injury
• Alcohol Abuse or Dependence
• Alzheimer’s and non-Alzheimer's dementias
• Anemia
• Anxiety disorders
• Bipolar disease/mania episodes
• BMI categorization
• Bradycardia
• Cerebrovascular Disease
• Chronic Kidney Disease
• Chronic kidney disease/End stage renal disorder
• Chronic liver disease, including Fatty Liver Disease and Cirrhosis
• Chronic Obstructive Pulmonary Disease
• Clostridium difficile
• Cognitive disorder due to late effects of cerebrovascular disease
• Community Acquired Pneumonias
• Congestive Heart Failure
• Coronary Artery Disease
• Coronary Heart Disease
• Crohn's Disease
• Depression
• Diabetes
• Drug Induced Liver Injury
• Erectile Dysfunction
• Falls in the Elderly
• Fractures
• Head and Neck cancer diagnoses and tumor staging (stage III and IV)
• Hepatitis C infection
• Hypertension

(17)

A. Disease Exposures and Outcomes- Algorithm Generation

• Hy's Law and Elevated Liver Function Tests
• Incontinence and Catheter Use
• Intentional and Unintentional Injuries/Poisoning
• Lower extremity peripheral vascular occlusive disease
• Major Bleeding Events- intracranial, GI, etc
• Metabolic syndrome
• MRSA infection
• Multiple myeloma and MGUS
• Myocardial Infarction
• Osteoporotic Fractures
• Peripheral Vascular Disease
• Personality Disorder
• Pneumonia
• Post-traumatic stress disorder
• Prostate Cancer
• Rheumatoid arthritis and severity index
• Revascularization
• Schizophrenia
• Substance Abuse Disorders
• Suicidality
• Systemic Lupus Erythematosus
• Thrombocytopenia
• Transient Ischemic Attack
• Traumatic brain injury

(18)

B) Characterization of longitudinal laboratory and clinical values, including but not limited to:

• Blood Pressure

• Pulse/Heart Rate

• Lipids (TC, HDL, LDL, non-HDL, Trigs)

• HbA1C, glucose

• Albumin

• Hb, PLT, HCT

• PCR for MRSA

• PSA

• Imaging Studies
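
One way to characterize a longitudinal value is to reduce each patient's repeated measurements to a few summary features. The sketch below is an illustration under stated assumptions, not the method used for MVP: `summarize_series` is a hypothetical helper that takes (date, value) pairs for one patient and one measurement and returns the count, first and last values, mean, follow-up span, and an ordinary least-squares slope per year.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple


def summarize_series(
    measurements: List[Tuple[date, float]]
) -> Optional[Dict[str, Optional[float]]]:
    """Summarize repeated measurements of one value for one patient."""
    if not measurements:
        return None
    ordered = sorted(measurements, key=lambda m: m[0])
    start = ordered[0][0]
    # Time since the first measurement, in years.
    years = [(d - start).days / 365.25 for d, _ in ordered]
    values = [v for _, v in ordered]
    n = len(values)
    mean_t = sum(years) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    slope = None  # undefined when all measurements share one date
    if sxx > 0:
        sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(years, values))
        slope = sxy / sxx
    return {
        "n": n,
        "first": values[0],
        "last": values[-1],
        "mean": mean_v,
        "span_years": years[-1],
        "slope_per_year": slope,
    }


systolic = [
    (date(2012, 1, 1), 138.0),
    (date(2010, 1, 1), 130.0),
    (date(2011, 1, 1), 134.0),
]
print(summarize_series(systolic))
```

Irregular visit spacing is handled because the slope is fit against elapsed time rather than visit number; a patient with a single measurement gets a slope of `None` instead of a misleading zero.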

(19)

C) Non-Disease Exposures and Outcomes Algorithm Generation

• Antidepressants
• Antiepileptic Drugs
• Antipsychotics
• Bare Metal Stent placement/Drug Eluting Stent Placement (# stents, stent revisions)
• CABG
• Chemotherapy dosing algorithms
• Erythropoiesis Stimulating Agent
• Medication Dosing For Erythropoietin Stimulating Agents
• MRIs (with and without contrast)
• NSAIDs
• Opioids
• PPI use and discontinuation
• Proton Pump Inhibitors
• Selective COX-2 Inhibitors

(21)

eMERGE Phenotyping Activities

Phenotypes
• Disease
– Atrial fibrillation
– Cataracts
– Crohn’s disease
– Dementia
– Diabetic retinopathy
– Drug induced liver injury
– Hypothyroidism
– Multiple sclerosis
– Peripheral arterial disease (PAD)
– Rheumatoid arthritis
– Severe early childhood obesity
– Type 2 diabetes mellitus
• Laboratory values
– Red blood cell indices
– White blood cell indices
– Lipids
– High-density lipoproteins (HDLs)
– Hemoglobin A1c
• Medication response
– Poor metabolizers of Clopidogrel
– Warfarin dose response
• Physical traits
– Normal ECG
– Height

Algorithm Development
• CPT codes
• ICD-9 codes
• Laboratory values
• Medications
• Natural Language Processing (NLP)

Validation Methods
• Multisite validation
• Chart review by content experts

(22)

CHARGE Consortium

Phenotypes
• Disease
– Myocardial infarction (MI)
– Stroke
– Transient ischemic attack (TIA)
– Heart failure
– Peripheral vascular disease (PVD)
– Dementia
– Diabetes
– Hypertension
– Atrial fibrillation
– Depression
• Laboratory values
– Fasting lipids
– Fasting glucose
– Glucose tolerance test
• Physical traits
– Blood pressure
– Height, weight
• Demographics
– Smoking

Algorithm Development
• None – phenotypes collected during prospective cohort studies

Validation
• Phenotype standardization across the 5 cohorts

(23)
(24)
(25)

MVP Recruitment and Enrollment

• Invitational Mailing/Appointment Mailing
– Invitation letter, Baseline Survey, MVP Brochure
– Appointment letter, Informed consent language
• Walk-in recruitment
• Study visit
– Informed consent/HIPAA, Blood collection
• Thank you Mailing

(26)

Blood Draw

• 4 ice packs must be in the freezer the day before bloods will be drawn
• After obtaining consent, scan barcode on EDTA blood tube to enter blood ID into blood collection form
• Draw blood filling the tube
• Rescan tube

(27)
(28)

Processing

(29)

VA Central Biorepository

(30)

MVP Biosample quality measurements

Sample Transit Time (hrs) from collection to storage, percent of total:
0 - 24: 1.01
24 - 48: 85.84
48 - 72: 12.01
72 - 96: 0.54
> 96: 0.42

Biosample Quality, percent of total:
Good: 93.37
Lipemic: 2.83
Underfilled: 2.46
Hemolyzed: 0.58
Clotted: 0.57
Lysed: 0.16
Other: 0.03

(31)

Current Lab Activities

• Receiving and Processing - 400-600 per day
• Shipping Samples for Sequencing and Genotyping:

Assay Type | Shipments to-date | Targeted Shipments
Whole Genome sequencing | 1886 | 1370 + 516
Whole Exome sequencing | 24260 | 24126
SNP Genotyping | 206,603 | ~200,000

(32)

MVP Recruitment to Date

Invitation mailings sent: 2.6 Million
Expressed interest by mail: 19.4% (11.2%/8.2%)
Opt-out: 13%
Completed Baseline Surveys: 456,000
Consented Veterans: 325,000
Specimens in Lab: 323,000
Unscheduled (proportion): 40%
Upcoming appointments: 11,000
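
The counts above can be turned into simple ratios for orientation. The short Python sketch below does only that arithmetic: the dictionary keys and the `ratio_pct` helper are hypothetical names, and the derived percentages do not appear on the slide. Because walk-in recruitment also exists, consented veterans are not strictly a subset of those who received a mailing, so the per-mailing figures are rough indicators rather than response rates.

```python
# Counts as reported on the slide.
recruitment = {
    "invitation_mailings": 2_600_000,
    "baseline_surveys": 456_000,
    "consented": 325_000,
    "specimens_in_lab": 323_000,
}


def ratio_pct(numerator: int, denominator: int) -> float:
    """Ratio expressed as a percentage, rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


surveys_per_mailing = ratio_pct(
    recruitment["baseline_surveys"], recruitment["invitation_mailings"])
consents_per_mailing = ratio_pct(
    recruitment["consented"], recruitment["invitation_mailings"])
specimens_per_consent = ratio_pct(
    recruitment["specimens_in_lab"], recruitment["consented"])

print(surveys_per_mailing, consents_per_mailing, specimens_per_consent)
```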

(33)
(34)

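Race

Bar chart: percent by race (Caucasian, African American, Asian, Native American) for four groups; the group labels are not legible in the extracted text.
• Group 1: 78%, 20%, 1%, 1%
• Group 2: 80%, 18%, 1%, 1%
• Group 3: 78%, 21%, 1%, 1%
• Group 4: 80%, 18%, 2%, 1%

Data Source: VA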

(35)

Planned Genomic Data Pipeline for Genotype data

• Data Generation (At Vendor): 1. Sample QC 2. Data preparation as per VA requirements
• Data Transmission (Tracking): 1. Sample send outs 2. Data transfers
• Data Ingestion, QC 1 (Disk): 1. Disk QC 2. Data uptake into GenISIS Storage Systems
• Data Indexing, QC 2 (Data): 1. Data QC 2. Indexing & Meta data extraction
• Data Storage: 1. Data Storage 2. Data integration with honest broker 3. Data harmonization
• Data Analysis: 1. Study marts 2. Data analysis

(36)

Planned Genomic Data Pipeline for Sequence Data

(37)

Using Big Data in the VA

• VA Healthcare System

• Large-scale research programs nested in the clinical system

– Genetic Mega Cohort: Million Veteran Program

– Pragmatic Randomized Trial: HCTZ v. Chlorthalidone

• Using the big data

– Biochemical pipelines

– Phenomic data

• Summary and Lessons for clinical care

(38)
(39)
(40)

CC: F/U Depression

The patient indicates that his symptoms have improved significantly, but not as much as he expected. He is still sleeping a lot (about 12 hours per day) and finds it hard to concentrate on looking for work. He denies suicidal ideation. His PHQ-9 score is 16 today.

The patient has a history of binge eating episodes. He is an emotional eater and often feels out of control, but continues to eat after job search disappointments. He often binges at night and has done this 3-4 times per week for the past several years.

Professional Diagnosis:
Axis I: Major Depression, partial response to meds; Binge eating disorder
Axis II: Deferred
Axis III:

(41)

VA Data Sources

• Corporate Data Warehouse Databases
• National Patient Care Databases
• Vital Status
• Decision Support System
• National Data Extract
• Beneficiary Identification Records Locator (BIRLS) death file
• New England VISN-1 Pharmacy files
• Pharmacy Benefit Management (PBM) system database
• Outpatient Clinic File (OPC)
• Patient Treatment File (PTF)
• Clinic Inpatient and Outpatient Visits
• Inpatient and Outpatient Hospitalizations
• Diagnosis (ICD-9) codes
• Procedure (CPT) codes
• Pharmacy data and laboratory data
• OEF/OIF and OND Roster
• VA Clinical Assessment Reporting and Tracking (CART)
• Veterans Affairs Surgical Quality Improvement Program (VASQIP)
• Veterans Affairs Central Cancer Registry (VACCR)

(42)

Other Data Sources

MVP Data
• Self-Reported Survey Data:
– Lifestyle Survey Data (Personal Information, Well-Being, Activity, Health, Military Experience, Dietary Intake, Medication, Habits)
– Baseline Survey (Health, Military Experiences, family medical history)

Non-VA Data
• National Death Index (NDI)
• Centers for Medicare and Medicaid Services (CMS)
• Social Security Death Master Files
• State Mortality Data
• Cancer Registries

(43)

Examples of Data Issues

• Types of data

– ICD codes

– Procedure codes

– Lab data

– Medication data

– Imaging data

(44)

Various Levels of Data Processing

Basic Cleaning
• Data quality and logic checks of raw data elements and values
• Example: logic checks on value ranges and types

Curation
• Data standardization and harmonization
• Example: laboratory data element naming convention

Simple Phenotyping
• Defining algorithms from prior knowledge, based on structured data elements
• Requires subject matter experts working together with EMR data experts

Complex Phenotyping
• Deriving complex algorithms combining both structured and unstructured databases
i. Further development and validation of complex phenotyping algorithms will be completed based on each funded project
ii. This is deeper phenotyping that requires processing the unstructured database by expert data programmers and analysts, using and possibly building a specific data mining pipeline.
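
The first two levels can be sketched in a few lines of Python. This is an assumption-laden illustration, not the MVP pipeline: the record layout, the `NAME_MAP` of local test names, and the `PLAUSIBLE_RANGES` limits are hypothetical and are not clinical reference values. Curation maps local names to one standard name; basic cleaning checks that the value is numeric and falls inside a plausible range.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical harmonization map from local test names to a standard name.
NAME_MAP = {
    "ldl": "LDL", "ldl-c": "LDL", "ldl chol": "LDL",
    "hdl": "HDL", "hdl-c": "HDL", "hdl chol": "HDL",
}

# Illustrative plausibility limits in mg/dL; not clinical thresholds.
PLAUSIBLE_RANGES = {"LDL": (10.0, 400.0), "HDL": (5.0, 200.0)}


def curate_name(raw_name: str) -> Optional[str]:
    """Curation: map a local test name to the standard element name."""
    return NAME_MAP.get(raw_name.strip().lower())


def clean_value(raw_value) -> Optional[float]:
    """Basic cleaning: type check; returns None when not numeric."""
    try:
        return float(raw_value)
    except (TypeError, ValueError):
        return None


def process(records: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split raw lab records into curated rows and rejected rows."""
    kept, rejected = [], []
    for rec in records:
        name = curate_name(rec["test_name"])
        value = clean_value(rec["value"])
        if name is None or value is None:
            rejected.append(rec)
            continue
        low, high = PLAUSIBLE_RANGES[name]
        if not (low <= value <= high):  # also rejects NaN
            rejected.append(rec)
            continue
        kept.append({"test": name, "value": value})
    return kept, rejected


raw = [
    {"test_name": "LDL-C", "value": "95"},
    {"test_name": "ldl chol", "value": "9999"},
    {"test_name": "HDL", "value": "pending"},
    {"test_name": "unknown panel", "value": "42"},
]
kept, rejected = process(raw)
print(len(kept), "kept;", len(rejected), "rejected")
```

Keeping the rejected rows rather than silently dropping them makes it possible to review why data were excluded, which matters when the research question defines the required level of quality.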

(46)

Summary and Lessons for Research

• Using data for Research
– Don’t boil the ocean
– Develop structured data model
– Computing environment
– Validate!
– Missing data is OK
– Research question defines level of quality
– Research lab results not necessarily from a certified lab

(47)

Summary and Lessons for Clinical care

• Using Big data for Clinical care
– Clinical question defines data quality
– Real time need
– Missing data
– Centralize processes (pros and cons)

(48)

Data zones: Data Sources, Landing Zone, Cleaning Zone, Curation Zone, Phenotyping Zone, Query mart, Study mart

Activities across the zones:
- VA & Non-VA Sources
- Access to Data Source
- Ideal Sources Identification
- Throughput
- Identity checks
- Assign MPI (DIVA ID)
- Honest Broker Integration
- Data Integrity checks
- Algorithms for basic data cleaning
- MVP backend Data Dictionary
- Data Validation
- Data Harmonization
- Deriving phenotype terms based on standards
- Phenotyping algorithm development
- Data and metadata associations
- Ontology development
- Data Dictionary/Meta data Manager
- Terminology based aggregate query
- Metadata driven study data request
- Access controlled Study specific data marts

(49)

Diagram labels:
• CDW/Vista
• VINCI CDW
• CMS
• NDI
• Access, Query, Operation
• Study Datamart
• GenISIS
• Bio-repository
• Genomic
• MVP Enclave
• Clinical
• Recruitment Operations

(51)

Sources of Data Most Commonly Used

Nationwide VA electronic medical record system (VISTA)

Inpatient & Outpatient clinic visit and hospitalization data 1997- present

Pharmacy & Lab data 2002- present

Mortality data (VA Benefits, Social Security Death Index, National Death Index)

Medicare data

Perioperative data (NSQIP)

Economic data (HERC)

Clinical note extraction

(52)

E & P Subcommittee Tasks

• Develop Strategy for Data Organization

• Define Data Access Issues

• Derive Process for Phenotyping

• Develop Phenotype Knowledge Base

• Catalogue Existing Phenotypes

(53)
(54)

MVP Nested Cohorts

Nested populations, from innermost to outermost:
• Specific Data mart
• MVP Enrollees
• MVP Respondents
• All veteran users

(55)
(56)

Handling different data sources

MVP
i. Baseline Questionnaire
ii. Lifestyle Questionnaire

VA
VIReC – http://vaww.virec.research.va.gov/Intro/Working-with-VA-Data.htm
i. CDW
ii. PBM Database
iii. VA/CMS
iv. BIRLS Death File
v. VHA Vital Status File (VSF)
vi. Special Data Registries and Databases (Examples: CART, NSQIP, Cancer Registry, etc.)
vii. Others

Non-VA
i. CMS
ii. NDI

(57)

Baseline Survey

Collects basic demographic, health, and lifestyle information including:

Health status

Military experience

Medical history

Healthcare utilization

Family pedigree

(58)

Thank You / Lifestyle Survey

• Thank you letter

• Lifestyle Survey

Depression scale/SF12V

Occupational history

Combat history

Exercise habits

Mental health

Environmental exposures

Dietary questionnaire

(59)

MVP Study Visit

• Study visit procedures
– Obtain informed consent and HIPAA
– Collect blood specimen
– Walk-ins given printed baseline form
– Given MVP pin
• Consent and HIPAA are faxed to a teleform server at West Haven
• Blood specimen is shipped to the VA Central Biorepository

(60)
(61)

BioBank Base Content - Revised
• Original Exome & Indel (238K)
• Pharmacogenomic/ADME (2K)
• eQTLs (23K)
• New Exome & Indels (26K)
• New LOF & Indels (70K)
• GWAS – Published (246K)
• GWAS – AA booster (50K)

MVP Modules Added
• HLA/KIR (9k)
• Psychiatric (26K)
• Other disease/condition (42K) (Immuno, Cardio, Cancer, Blood types, Diabetic, ApoE, Addiction, Nephrology, Obesity, Stroke, Asthma etc)
