This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
Vendors in the health care industry produce diagnostic systems that, through a secured connection, allow them to monitor performance almost in real time. However, analyzing and interpreting large volumes of noisy quality control (QC) data remains challenging. As a result, some QC shifts may not be detected early enough by the vendor and instead first surface as customer complaints.
We hypothesized that a more proactive response could be designed by using the collected QC data more efficiently. The aim of this study was therefore to help prevent customer complaints by predicting them from the QC data collected by in vitro diagnostic systems.
QC data from five selected in vitro diagnostic assays were combined with the corresponding database of customer complaints over a period of 90 days. A subset of these data covering the last 45 days was also analyzed to assess how the length of the training period affects predictions. We defined a set of features used to train two classifiers, one based on decision trees and the other on adaptive boosting, and assessed model performance by cross-validation.
The cross-validations showed classification error rates close to zero for some assays with adaptive boosting when predicting the potential cause of customer complaints. Performance improved when the training period was shortened as the volume of complaints increased. Denoising filters that reduced the number of categories to predict further improved performance by simplifying the prediction problem.
This novel approach to predicting customer complaints based on QC data may allow the diagnostic industry, the expected end user of our approach, to proactively identify potential product quality issues and fix these before receiving customer complaints. This represents a new step in the direction of using big data toward product quality improvement.
Connected and so-called smart meters and other tools have transformed virtually every industry by enabling new functions and capabilities such as continuous monitoring, control, optimization, and autonomy [
However, the sheer amount of data generated by these connected systems is such that big data analytics are required [
As QC data are routinely used to monitor the performance of IVDs and identify signals that may indicate a performance change, a number of approaches have been developed. These range from panels of experts that submit monthly reports [
The objective here is therefore to integrate these two kinds of data, QC data and customer complaints, to be able to predict specific QC issues, while accounting for intrinsic issues pertaining to customer data. Indeed, customer complaint databases have at least three inherent limitations that need to be considered when designing a prediction tool. First, complaint databases may contain inaccurate, incomplete, untimely, or unverified information [
Here, based on a particular connected IVD analyzer, we show that integrating QC data with a database of customer complaints can be used to predict which type of issues customers complain about. We hypothesized that connected systems can be leveraged more effectively by mining the collected QC data, specifically by resorting to machine-learning algorithms. We show that it is possible to identify product issues more proactively, which makes it possible to act on these before they trigger a customer complaint. We further show that some filtering of the complaint data (denoising) improves the accuracy of issue prediction. This work represents a first step toward meeting the recent plan from the US Food and Drug Administration (FDA) to leverage big data to improve device performance and health care [
Data were collected using the e-Connectivity application's chemistry results from systems manufactured by Ortho Clinical Diagnostics (Raritan, New Jersey). This feature allows the manufacturer to pull information remotely from equipment installed at customer sites, which are themselves distributed throughout the world. The data retrieved in this study were generated by Ortho Clinical Diagnostics' VITROS analyzers of the 5,1 FS series, the 5600, 4600, 3600, or ECi/ECiQ Systems, which all log the same kind of information through e-Connectivity. Only QC data were extracted to avoid complications linked to patients' data (identifiability, variability, etc).
The e-Connectivity data contain information relative to the assay, serial numbers reflecting its origin, the measured concentrations, as well as some information relative to the analyzer itself (see
The corresponding customer complaint data were obtained by querying the product complaint database of the same manufacturer for the same time window as the QC data. Customer data contained information with respect to the assay for which an issue is reported, the call area (error code), and other information related to the assay (see
The only fields that are shared between QC and customer data are assay name, J numbers, and lot numbers (
To find predictors of customer complaints based on QC data, we need to define operational variables, which are called features. These features were defined by inspecting a typical log of the system (
List of the fields logged by e-Connectivity (which include quality control [QC] data) and by the customer complaint system. Corresponding abbreviations are shown.
Data and Abbreviation | Short description
Assay | Abbreviation of assay name (recoded here)
J Number | Unique identifier assigned to each analyzer placed
F Concentration | Concentration of solute (assay); QCa result
Units | Unit of measured concentration (mmol/L)
F Concentration (SI) | Concentration of solute (assay); QC result
Units SI | Unit of measured concentration (SI)
Reagent Lot Number | Reagent lot number
S Gen | Manufacturing generation number
S Lot | Manufacturing lot number
ERF Lot | Electrolyte reference fluid lot
IWF Lot | Immuno-wash fluid lot number
Control Lot Number | Performance verifier lot number
Cal Curve ID | Calibration curve ID
Result ID | Unique identifier (encrypted) of QC result
Sample Name | Unique identifier (encrypted) of sample name
Time Metering | Time stamp of concentration log through e-Connectivity
Total Dilution | Dilution factor
Operator Dilution | Operator requested dilution
Body Fluid | Fluid type (serum, plasma, or urine)
Create Audit Date | Time stamp of when complaint was placed
Call Subject | Same as assay in e-Connectivity
Call Area | Classification of concern or problem of the product or the analyzer-generated condition
Resolution | Term describing how the complaint was resolved
Complaint Number | Unique identifier of each complaint
Customer Number | Unique identifier of each customer
J Number | Analyzer serial number
Lot number | Reagent lot number
Region | Geographic region where complaint was placed
Call Status | Current call status of complaint (closed or open)
Problem description | Free-text field describing the complaint |
aQC: quality control.
Feature definitions based on a typical sample logged in e-Connectivity. Assay concentrations (here for assay A) are plotted as a function of time. Horizontal blue lines show the modes of the density of sample means (our estimated verifiers). Vertical gray lines show timing of maintenance activities (change of calibration curves, etc). The orange vertical line shows when the customer placed a call—for “accuracy high” (ACCH; indicates the measured concentration is suspected of being higher than the actual value) in this example. The concentration reading just before this call (“#1”) and 10 e-Connectivity logs before it (“#10”) are indicated in red. Our machine-learning (ML) algorithms (in red) aim at learning the signatures (in purple) of call areas (orange) from a training set, to be able to identify those call areas, before a customer complains.
Concentration readings departing from expected values can be thought of as the prime trigger of customer complaints. For QC data, however, their absolute value alone has no predictive value (as long as it is not outside of the biological range); we should therefore focus on departures from verifiers, which are known concentration readings produced during manufacturing. However, these verifiers are not logged by e-Connectivity and are only available as PDF files, which cannot be easily parsed. Customers may also choose to use QC material manufactured by a third party, which further complicates the retrieval of verifier information. As a workaround, we calculated mean concentrations by sample, estimated the kernel density of these sample means, and determined the location of all the modes (
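The mode-finding workaround can be sketched as follows. This is a minimal illustration, not the authors' implementation: function and parameter names are ours, synthetic sample means stand in for the real e-Connectivity data, and a simple grid search over a Gaussian kernel density estimate replaces whatever density estimator was actually used.

```python
import numpy as np
from scipy.stats import gaussian_kde

def estimate_verifier_modes(sample_means, grid_size=1000):
    """Estimate the modes of the density of per-sample mean
    concentrations; these modes stand in for the (unavailable)
    manufacturer verifier values."""
    kde = gaussian_kde(sample_means)
    grid = np.linspace(sample_means.min(), sample_means.max(), grid_size)
    density = kde(grid)
    # A mode is a grid point whose density exceeds both of its neighbors.
    is_mode = (density[1:-1] > density[:-2]) & (density[1:-1] > density[2:])
    return grid[1:-1][is_mode]

# Synthetic example: two QC levels around 1.0 and 3.0 mmol/L
# should yield two modes near those values.
rng = np.random.default_rng(0)
means = np.concatenate([rng.normal(1.0, 0.05, 200),
                        rng.normal(3.0, 0.05, 200)])
modes = estimate_verifier_modes(means)
```

Each QC reading can then be re-expressed as a departure from its nearest mode, which is the quantity the features below are built on.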
Customers may notice suboptimal performance of a machine, decide to try to resolve the issue on their own, and place a call for assistance only if they cannot. We therefore defined features based on the different maintenance events logged by the system (six in total): changes of S Gen, S Lot, ERF Lot, IWF Lot, Control Lot Number, and Cal Curve ID. We considered both the timing of the last event before the call and the number of such events before the call. This led us to define 12 additional features based on maintenance events, for a grand total of 19 features (
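For each of the six maintenance-event types, the two derived features can be computed as in the sketch below (names and time units are ours; times are illustrative offsets in days):

```python
def maintenance_features(event_times, cutoff):
    """For one maintenance-event type (e.g. an S Lot change), compute
    the two features described in the text: the number of events that
    occurred before the cutoff, and the time elapsed since the most
    recent one (NaN if no event occurred before the cutoff)."""
    prior = [t for t in event_times if t < cutoff]
    n_prior = len(prior)
    time_since_last = cutoff - max(prior) if prior else float("nan")
    return n_prior, time_since_last

# Events at days 1, 5, and 20; customer calls at day 12.
n, dt = maintenance_features([1.0, 5.0, 20.0], cutoff=12.0)
# n == 2 prior events, dt == 7.0 days since the last one
```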
Because the use of only “positive samples” (samples that led to a customer call) to train our algorithms would bias any prediction toward overpredicting calls, we also defined features for “negative samples.” These are QC samples that did not generate any customer complaints. If
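For a negative sample, there is no call time to serve as a cutoff, so one is drawn at random within the sample's logging period and the same features are then computed relative to it. A minimal sketch, with names and the uniform-draw assumption ours:

```python
import random

def draw_negative_cutoff(log_times, seed=0):
    """For a QC sample with no associated complaint, draw a cutoff
    time uniformly at random between its first and last
    e-Connectivity log, mimicking a hypothetical call time."""
    rng = random.Random(seed)
    t0, t1 = min(log_times), max(log_times)
    return t0 + rng.random() * (t1 - t0)

# The drawn cutoff always falls within the sample's logging period.
cutoff = draw_negative_cutoff([0.0, 10.0, 50.0, 90.0])
```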
These 20 features were used as predictors during the training of machine-learning algorithms, whose goal was to classify (predict) the qualitative nature of the problem represented by each call area. Two such algorithms were used here: a simple one, based on decision trees [
Decision trees represent one of the simplest types of classifiers, with Classification and Regression Trees (CART) being one of the most basic algorithms. We employed the algorithm implemented in the tree library [
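The training and repeated random-split cross-validation loop can be sketched as below. This uses scikit-learn stand-ins rather than the R packages named in the text (`DecisionTreeClassifier` for CART, `AdaBoostClassifier` for adaptive boosting); the synthetic data and all names are ours, and only 5 splits are run here instead of the 2500 used in the actual analyses.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

def cv_error_rates(X, y, n_splits=5, seed=0):
    """Repeated random-split cross-validation for the two classifiers;
    returns the mean misclassification rate of each."""
    errs = {"cart": [], "adaboost": []}
    for i in range(n_splits):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=seed + i, stratify=y)
        for name, clf in [
            ("cart", DecisionTreeClassifier(random_state=0)),
            ("adaboost", AdaBoostClassifier(n_estimators=100, random_state=0)),
        ]:
            clf.fit(X_tr, y_tr)
            errs[name].append(float(np.mean(clf.predict(X_te) != y_te)))
    return {k: float(np.mean(v)) for k, v in errs.items()}

# Synthetic, nearly separable data: both error rates should be low.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)
rates = cv_error_rates(X, y)
```

After fitting, `AdaBoostClassifier` exposes a `feature_importances_` attribute, which is the kind of per-feature importance measure discussed in the Results.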
Of the 99 call areas employed so far, some are not directly related to QC, and those related to QC might share some characteristics. Both issues can create noise, which can easily be filtered out of the data. We therefore created two filters, one that removes all non-QC related call areas (essentially, all error codes starting with a “Z” in
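The two denoising filters can be sketched as simple transformations of the complaint records. The record layout, the bin mapping, and the codes "ACCL" and "Z001" are hypothetical illustrations (only "ACCH" appears in the text); only the "Z"-prefix rule comes from the description above.

```python
def drop_non_qc(complaints):
    """Filter 1: discard complaints whose call-area code starts with
    'Z', ie, those not related to QC in this coding scheme."""
    return [c for c in complaints if not c["call_area"].startswith("Z")]

def bin_call_areas(complaints, binning):
    """Filter 2: collapse related call areas into broader bins,
    reducing the number of categories the classifier must predict;
    `binning` maps original call areas to bin labels (hypothetical)."""
    return [dict(c, call_area=binning.get(c["call_area"], c["call_area"]))
            for c in complaints]

complaints = [{"call_area": "ACCH"},   # accuracy high (from the text)
              {"call_area": "ACCL"},   # hypothetical related code
              {"call_area": "Z001"}]   # hypothetical non-QC code
qc_only = drop_non_qc(complaints)                      # drops Z001
binned = bin_call_areas(qc_only, {"ACCH": "ACC",
                                  "ACCL": "ACC"})      # merges the rest
```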
List of the features used in the predictive modeling. Note that a “cutoff” represents the time when a customer calls in the case of “positive samples” (when there is an actual complaint), or the time drawn at random in the case of “negative samples” (see Methods).
Feature name | Definition |
MostRecentConcentration | Assay concentration reading just before cutoff |
TwoMostRecentConcentrationMean | Mean concentration for the two readings before cutoff |
FiveMostRecentConcentrationMean | Mean concentration for the five readings before cutoff |
TenMostRecentConcentrationMean | Mean concentration for the ten readings before cutoff |
TwoMostRecentConcentrationSD | SD of concentration for the two readings before cutoff |
FiveMostRecentConcentrationSD | SD of concentration for the five readings before cutoff |
TenMostRecentConcentrationSD | SD of concentration for the ten readings before cutoff |
NbPriorSGenChange | Number of S Gen changes before cutoff (since start of QC sample) |
NbPriorSLotChange | Number of S Lot changes before cutoff |
NbPriorERFLotChange | Number of ERF Lot changes before cutoff |
NbPriorIWFLotChange | Number of IWF Lot changes before cutoff |
NbPriorContLotNumChange | Number of Control Lot Number changes before cutoff |
NbPriorCalCurveChange | Number of Calibration Curve changes before cutoff |
TimeSinceLastSGenChange | Time elapsed since last S Gen change before cutoff |
TimeSinceLastSLotChange | Time elapsed since last S Lot change before cutoff |
TimeSinceLastERFLotChange | Time elapsed since last ERF Lot change before cutoff |
TimeSinceLastIWFLotChange | Time elapsed since last IWF Lot change before cutoff |
TimeSinceLastContLotNumChange | Time elapsed since last Control Lot Number change before cutoff |
TimeSinceLastCalCurveChange | Time elapsed since last Calibration Curve change before cutoff |
TimeToComplain | Time elapsed since last e-Connectivity log before cutoff |
To predict, using only QC data, which call areas are used when a customer complains (
Adaptive boosting computes a measure of importance for each feature.
The previous results included time to complain as a feature; again, this is the time lag between the last QC reading by the system and the time when a customer placed a complaint (
Note, however, that this removal of the most important feature did not affect the relative importance of the other features: those involved in the timing of maintenance events and those describing the variability of concentrations (SDs) were still the most important predictors (
Empirical cumulative distribution function (ECDF) of customer complaints. The ECDF was plotted for the five assays considered. The horizontal gray bars represent the first, second, and third quartiles. Each assay is color-coded as shown (inset).
The results above suggest that the complaint rate may affect performance, but it is unclear whether longer training periods benefit our algorithms. To test this, we subset the 90-day data to its last 45 days. When all the features were used to train the algorithms, all classification error rates decreased (
In an attempt to denoise the customer data, we first removed non-QC related complaints and trained our classifiers on both the 90- and the 45-day datasets. This led to decreased error rates over all five assays (
In this case, where data are denoised both by binning and by considering only QC-related call areas, the most important feature for the classifier based on adaptive boosting remains TimeToComplain for both the 90- and the 45-day datasets (
Distribution of prediction error rates for the binned quality control (QC)–only data. Error rates are shown as derived from the cross-validation analyses, where the data were split 2500 times (see Methods). Results are shown for both classifiers, Classification and Regression Trees (CART; broken lines) and adaptive boosting (solid lines), over the five assays considered, for the 90-day data with all features (a) or with TimeToComplain removed (b), and likewise for the 45-day data with all features (c) or with TimeToComplain removed (d). Each assay is color-coded as shown.
Feature importance under adaptive boosting for the binned quality control (QC)–only data. Importance of the features is shown as radar charts over the five assays considered. Each assay is color-coded as shown. Top panels are for the whole 90-day datasets, whereas the bottom panels are for the 45-day datasets. Left panels include all features; right panels exclude TimeToComplain from the models.
Traditionally, failure prediction in industrial applications aims at predicting
One of our challenges here is that a complaint is only a symptom of an actual product issue. When an issue occurs, the customer may complain or not. The customer may wait for several incidences of the same issue before complaining, may be too busy to complain, or may stop complaining about a recurrent problem. It is also possible that a customer complains when there is no product-related issue. As a result, the complaint database that we used is intrinsically noisy, but (1) this database represents the best data available, and (2) the manufacturer’s goal is to improve customer satisfaction by being able to identify issues before (or even without) a complaint call being placed.
To achieve this goal, we resorted to machine learning. As in any machine-learning application—except maybe with some deep-learning applications as those trained directly on images at the pixel level [
Some additional questions and limitations remain, however. First, we extracted data for a period of 90 days and showed that the length of this period could affect performance. Indeed, shorter training periods seem to improve prediction performance when the complaint rate is high. If complaint volume does affect performance, the length of the period used for analytics should be optimized in real time. This point was not addressed here and will require further investigation, in particular to better understand the link between the volume of customer complaints for specific call areas, the features that become the most important, and how prediction performance is affected (
In the future, a more agnostic approach with respect to feature definition may be required: indeed, the features that are based on concentration readings all depend, to some extent, on the exact time when a customer complained. This time is unknown when performing real-time analytics. To circumvent this limitation, it might be better to implement a sliding window, defined over a time period
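The sliding-window idea can be sketched as follows. This is a sketch of the approach suggested above, not an implemented method: the parameter names are ours, and features would be recomputed inside each window so that no knowledge of a future complaint time is required.

```python
def sliding_windows(log_times, window, step):
    """Yield (start, end) pairs of a window of length `window` slid
    along the logging period, advancing by `step` each time; features
    would be recomputed within each window, removing any dependence
    on the (unknown) time of a future complaint."""
    t0, t1 = min(log_times), max(log_times)
    t = t0
    while t + window <= t1:
        yield (t, t + window)
        t += step

# A 90-day log scanned with 30-day windows advancing by 15 days.
wins = list(sliding_windows([0, 90], window=30, step=15))
# → [(0, 30), (15, 45), (30, 60), (45, 75), (60, 90)]
```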
Although the approach we described will require further validation and testing, the ultimate goal is to implement this kind of predictive tool into the global monitoring system of IVD analyzers to help manufacturers be more proactive in detecting quality issues of the various assays they market around the world. This may help them pinpoint where in the manufacturing process issues are likely to originate—eg, if only a particular lot number is globally generating the same call area, a manufacturing problem specific to this lot can be identified. As such, we might one day be able to develop
In the meantime, the US FDA is planning to use big data to guide regulatory decisions [
Detailed description of the e-Connectivity features.
Detailed description of the customer features.
Description of the 99 error codes reported by the analyzers over the five assays.
Distribution of mean concentration reading per sample for the same assay. For each sample in the e-Connectivity data, the mean of all concentration readings was taken, and their distribution over the entire e-Connectivity 90-day data set was plotted. This distribution is multimodal; modes were estimated and are shown as vertical red dotted lines.
Distribution of prediction error rates for the unfiltered customer data. Error rates are shown as derived from the cross-validation analyses, where the data were split 2500 times (see Methods). Results are shown for both classifiers, CART (broken lines) and adaptive boosting (solid lines), over the five assays considered, for the 90-day data with all features (a) or with TimeToComplain removed (b), and likewise for the 45-day data with all features (c) or with TimeToComplain removed (d). Each assay is color-coded as shown.
Distribution of call areas for each assay. Distributions are shown for the whole 90-day data sets (a) and the 45-day data set (b). Each assay is color-coded as shown. Non-QC related call areas were filtered out.
Feature importance under adaptive boosting for the unfiltered customer data. Importance of the features is shown as radar charts over the five assays considered. Each assay is color-coded as shown. Top panels are for the whole 90-day data sets, while the bottom panels are for the 45-day data sets. Left panels include all features; right panels exclude TimeToComplain from the models.
Distribution of prediction error rates for the QC-only customer data. Error rates are shown as derived from the cross-validation analyses, where the data were split 2500 times (see Methods). Results are shown for both classifiers, CART (broken lines) and adaptive boosting (solid lines), over the five assays considered, for the 90-day data with all features (a) or with TimeToComplain removed (b), and likewise for the 45-day data with all features (c) or with TimeToComplain removed (d). Each assay is color-coded as shown.
Feature importance under adaptive boosting for the QC-only customer data. Importance of the features is shown as radar charts over the five assays considered. Each assay is color-coded as shown. Top panels are for the whole 90-day data sets, while the bottom panels are for the 45-day data sets. Left panels include all features; right panels exclude TimeToComplain from the models.
Examples of confusion tables obtained during cross-validation on the 90-day data, filtered for quality control (QC)–only call areas (data not binned by QC level). Numbers on the diagonal show accurate predictions; false predictions are below the diagonal, whereas missed predictions are above.
Classification and Regression Trees
electronic health record
Food and Drug Administration
in vitro diagnostic
quality control
root cause investigation
The authors would like to thank the Center for Advanced Computing and Compute Ontario for providing them with computing time; Jennifer Paine, Ian Wells, and the Safety Risk Management and Surveillance (SRMS) team for their support; Tom Balland and Craig Ritson for their help accessing the data; and Greg Munro, Jeanette Owejan, and Mike Torpey for discussions, as well as two anonymous reviewers for providing them with constructive comments. This work was supported by the Natural Sciences and Engineering Research Council of Canada (SAB) and was part of Ortho Clinical Diagnostics SRMS project #2016-0825 (SAB, JK, LL, HL). This work was completed while SAB was being hosted by Yutaka Watanuki, at the University of Hokkaido in Hakodate, thanks to an Invitational Fellowship from the Japanese Society for the Promotion of Science.
None declared.