Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jan 15, 2020
Open Peer Review Period: Jan 15, 2020 - Feb 17, 2020
Date Accepted: Mar 25, 2020

The final, peer-reviewed published version of this preprint can be found here:

Predicting Onset of Dementia Using Clinical Notes and Machine Learning: Case-Control Study

Hane CA, Nori VS, Crown WH, Sanghavi DM, Bleicher P

JMIR Med Inform 2020;8(6):e17819

DOI: 10.2196/17819

PMID: 32490841

PMCID: 7301255

Use of Clinical Notes and Machine Learning to Predict Onset of Dementia

  • Christopher A Hane
  • Vijay S Nori
  • William H Crown
  • Darshak M Sanghavi
  • Paul Bleicher

ABSTRACT

Background:

Clinical trials need efficient tools to assist in recruiting patients at risk of Alzheimer’s disease and related dementias (ADRD). Early detection can also help patients with financial planning for long-term care. Clinical notes are an important but underutilized source of information for machine learning models because of the cost of collection and the complexity of analysis.

Objective:

This study investigates whether de-identified clinical notes, collected from multiple hospital systems over 10 years, can augment retrospective machine learning models of ADRD risk.

Methods:

The models use 2 years of patient data to predict subsequent ADRD onset. Notes data are provided in a de-identified format consisting of specific terms and their sentiments. Terms in the notes are embedded into a 100-dimensional vector space to identify clusters of related terms and abbreviations, whose usage differs across hospital systems and individual clinicians.
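The clustering step can be pictured with a small example. This is a minimal illustration, not the study's pipeline: it assumes term embeddings already exist (for example from a word2vec-style model trained on the notes), and the toy 3-dimensional vectors, the term variants, the 0.9 similarity threshold, and the greedy single-pass grouping are all hypothetical choices. The study itself uses 100-dimensional vectors.

```python
import math

# Hypothetical toy embeddings: the study uses 100-dimensional vectors,
# but 3 dimensions keep the example readable.
EMBEDDINGS = {
    "memory loss": [0.90, 0.10, 0.00],
    "mem loss": [0.88, 0.12, 0.02],
    "forgetfulness": [0.85, 0.20, 0.05],
    "hypertension": [0.05, 0.95, 0.10],
    "htn": [0.07, 0.93, 0.12],
    "fracture": [0.10, 0.05, 0.99],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_terms(embeddings, threshold=0.9):
    """Greedy single-pass clustering (hypothetical choice): each term joins
    the first cluster whose seed vector is at least `threshold` similar,
    otherwise it starts a new cluster with itself as the seed."""
    clusters = []  # list of (seed_vector, [member terms])
    for term, vec in embeddings.items():
        for seed_vec, members in clusters:
            if cosine(seed_vec, vec) >= threshold:
                members.append(term)
                break
        else:
            clusters.append((vec, [term]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    for group in cluster_terms(EMBEDDINGS):
        print(group)
```

With these toy vectors, the abbreviation "htn" lands with "hypertension", the memory-related variants form one group, and "fracture" stays on its own. A greedy threshold is used here only for brevity; k-means or hierarchical clustering would be common alternatives.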

Results:

Adding notes improved the area under the receiver operating characteristic curve (AUC) from 0.85 to 0.94 and the positive predictive value (PPV) from 45% to 68% in the model at disease onset. Models with notes improved in both AUC and PPV for years 3-6 before onset, when notes volume was largest; results were mixed in years 7 and 8, which had the smallest cohorts.
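For readers less familiar with the two reported metrics, the sketch below computes them from a model's risk scores and true outcomes. It assumes binary labels (1 = ADRD onset) and a hypothetical 0.5 decision threshold for PPV; the AUC uses the rank-based (Mann-Whitney) formulation with ties counted as one half. The labels and scores are made up for illustration and are not study data.

```python
def ppv(labels, scores, threshold=0.5):
    """Positive predictive value: TP / (TP + FP) among cases flagged
    positive at the (hypothetical) score threshold."""
    flagged = [y for y, s in zip(labels, scores) if s >= threshold]
    if not flagged:
        return float("nan")  # undefined when no case is flagged
    return sum(flagged) / len(flagged)


def auc(labels, scores):
    """AUC as the probability that a random positive case outranks a
    random negative case (Mann-Whitney formulation; ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and risk scores, not study data.
    y = [1, 1, 1, 0, 0, 0, 0, 0]
    s = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
    print(f"AUC = {auc(y, s):.3f}, PPV = {ppv(y, s):.3f}")
    # prints: AUC = 0.933, PPV = 0.667
```

Note that AUC is threshold-free while PPV depends on the chosen cutoff and on how common ADRD onset is in the cohort, which is one reason the two metrics can move differently.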

Conclusions:

Although notes helped in the short term, the presence of ADRD symptom terms years before onset adds to evidence from other studies that clinicians undercode ADRD diagnoses. De-identified clinical notes increase the accuracy of risk models. Terms extracted by natural language processing from clinical notes at multiple hospital systems can be merged using post-processing techniques to improve model accuracy.
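One way to picture such a post-processing merge: once variant spellings and abbreviations have been grouped, each hospital system's term counts can be mapped onto shared cluster-level features before modeling. This is a sketch under stated assumptions; the term-to-cluster map, hospital names, and counts are hypothetical, and the study's actual post-processing may differ.

```python
from collections import Counter

# Hypothetical mapping from site-specific term variants to a shared
# cluster label (e.g. the output of an embedding-based clustering step).
TERM_TO_CLUSTER = {
    "memory loss": "memory_impairment",
    "mem loss": "memory_impairment",
    "forgetfulness": "memory_impairment",
    "hypertension": "hypertension",
    "htn": "hypertension",
}


def merge_term_counts(per_system_counts, term_to_cluster):
    """Sum term counts from several hospital systems into cluster-level
    counts; terms without a mapping keep their own name as the feature."""
    merged = Counter()
    for counts in per_system_counts.values():
        for term, n in counts.items():
            merged[term_to_cluster.get(term, term)] += n
    return merged


if __name__ == "__main__":
    # Hypothetical term counts for one patient's notes at two systems.
    notes = {
        "hospital_a": {"mem loss": 2, "htn": 1},
        "hospital_b": {"forgetfulness": 3, "hypertension": 2, "fracture": 1},
    }
    print(dict(merge_term_counts(notes, TERM_TO_CLUSTER)))
```

Mapping to cluster labels lets a model treat "htn" at one site and "hypertension" at another as the same signal, instead of splitting it across two sparse features.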


Citation

Please cite as:

Hane CA, Nori VS, Crown WH, Sanghavi DM, Bleicher P

Use of Clinical Notes and Machine Learning to Predict Onset of Dementia

JMIR Medical Informatics. 25/03/2020:17819 (forthcoming/in press)

DOI: 10.2196/17819

URL: https://preprints.jmir.org/preprint/17819

PMID: 32490841

PMCID: 7301255

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.