Deep learning approaches have demonstrated strong performance on a variety of prediction tasks based on Electronic Health Record (EHR) data. However, the characteristics of EHR data in clinical settings are known to change continually, owing to changes in diagnostic criteria, measurement devices, treatment protocols, and patient cohort attributes. These shifts in data distribution can considerably degrade the performance of deployed deep learning models, limiting the reliability of model inferences and deterring adoption in clinical practice. Here, we introduce an integrated framework that continually monitors model performance and seamlessly updates trained models when performance degradation is observed.
First, we leveraged statistical process control (SPC) tools to design a module that regularly monitors the performance of deep learning-based prediction models against observed ground truth, and we implemented a dashboard to visualize model performance as new data streamed in. Second, to address significant or sustained performance drops, we developed a Bayesian continual learning algorithm that adapts its neural representations to changes in the data distribution. Specifically, we used a Bayesian Long Short-Term Memory (BLSTM) backbone and developed an efficient means of continually updating model representations through a combination of architectural pruning, regularization, and replay strategies, while avoiding catastrophic forgetting. Finally, we integrated the performance monitoring and continual learning capabilities to demonstrate the full framework, focusing on an HbA1c prediction use case based on EHR data from the Singapore Diabetes Registry (2013-19, 22 sites).
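The abstract does not specify the exact SPC rules used by the monitoring module; as one plausible instantiation, the sketch below applies Shewhart-style control limits to per-batch R² computed against incoming ground truth, raising an alarm whenever the metric breaches the lower limit. The function names (`control_limits`, `monitor_stream`) and the 3-sigma rule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.metrics import r2_score

def control_limits(baseline_r2, k=3.0):
    """Shewhart-style control limits from per-batch R^2 on a stable baseline window."""
    mu = np.mean(baseline_r2)
    sigma = np.std(baseline_r2, ddof=1)
    return mu - k * sigma, mu + k * sigma  # (lower, upper)

def monitor_stream(batches, predict_fn, lower_limit):
    """Yield (batch_index, r2) alarms whenever performance breaches the lower limit."""
    for t, (x, y_true) in enumerate(batches):
        r2 = r2_score(y_true, predict_fn(x))
        if r2 < lower_limit:
            yield t, r2  # candidate trigger for a continual-learning update
```

The full BLSTM algorithm, with architectural pruning, Bayesian regularization, and replay, is beyond an abstract-length sketch. The minimal PyTorch stand-in below illustrates only the regularize-and-replay idea: it mixes replayed historical samples into each new batch and anchors updated weights to their pre-update values with an L2 penalty, a crude proxy for the Bayesian prior regularization described above. All names and hyperparameters here are hypothetical.

```python
import torch
from torch import nn

def continual_update(model, new_loader, replay_loader, lam=1e-3, lr=1e-4, epochs=1):
    """One hypothetical update round: fit new data plus replayed historical
    samples, with an L2 anchor to the pre-update weights to limit forgetting."""
    anchors = {n: p.detach().clone() for n, p in model.named_parameters()}
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for (x_new, y_new), (x_old, y_old) in zip(new_loader, replay_loader):
            x = torch.cat([x_new, x_old])  # mix new and replayed samples
            y = torch.cat([y_new, y_old])
            # assumes `model` maps a feature batch to HbA1c predictions shaped like y
            loss = mse(model(x), y)
            # penalize drift from the previous solution (proxy for a Bayesian prior)
            loss = loss + lam * sum((p - anchors[n]).pow(2).sum()
                                    for n, p in model.named_parameters())
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```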
In test scenarios with substantial drops in HbA1c prediction performance over time (2013-18), the continually adapted model achieved R² improvements of ~4% over the original model trained on data from prior time periods. Moreover, on a prospective 2019 test set, the continual learning model achieved R² improvements of ~8% over the original model trained on 2013-18 data. Our integrated performance monitoring and continual learning framework efficiently and seamlessly addresses performance drops caused by shifts in data distribution. As such, it could improve the reliability of predictive models deployed in real-world decision support tasks.