Accurately predicting refractive error in children is crucial for detecting amblyopia, which can lead to permanent visual impairment but is potentially curable if detected early. Various tools have been adopted to screen large numbers of patients for amblyopia risk more easily.
For efficient screening, easy access to screening tools and an accurate prediction algorithm are the most important factors. In this study, we developed an automated deep learning–based system to predict the range of refractive error in children (mean age 4.32 years, SD 1.87 years) using 305 eccentric photorefraction images captured with a smartphone.
Photorefraction images were divided into seven classes according to their spherical values as measured by cycloplegic refraction.
The trained deep learning model had an overall accuracy of 81.6%, with the following accuracies for each refractive error class: 80.0% for ≤−5.0 diopters (D), 77.8% for >−5.0 D and ≤−3.0 D, 82.0% for >−3.0 D and ≤−0.5 D, 83.3% for >−0.5 D and <+0.5 D, 82.8% for ≥+0.5 D and <+3.0 D, 79.3% for ≥+3.0 D and <+5.0 D, and 75.0% for ≥+5.0 D. These results indicate that our deep learning–based system performed with sufficient accuracy.
This study demonstrated the potential of precise smartphone-based prediction systems for refractive error using deep learning and further yielded a robust collection of pediatric photorefraction images.
Amblyopia is the most common cause of permanent visual impairment in children, and its worldwide prevalence is estimated to be approximately 1.6%-5% [
Cycloplegic retinoscopic refraction is the standard technique for measuring refractive error. However, this method has some limitations. It is difficult to get young children to cooperate during the procedure, and advanced clinical ophthalmologic training is required to perform the test (ie, it is user dependent) [
Previously, autorefractors were developed for faster and easier refraction in children. However, autorefraction presents several difficulties, including maintaining the proper position for testing and maintaining visual fixation on the target for a sufficient duration [
The purpose of this study was to develop an automated deep learning–based prediction system for refractive error using eccentric photorefraction images of pediatric patients captured by a smartphone. We trained our deep convolutional neural network with photorefraction images to identify various refractive error ranges. Thereafter, we evaluated the network's performance against conventional cycloplegic retinoscopic refraction.
This study was performed at a single center according to the tenets of the Declaration of Helsinki. The Institutional Review Board of Samsung Medical Center (Seoul, Republic of Korea) approved this study (SMC 2017-11-114).
Patients aged 6 months to 8 years who visited the outpatient clinic for a routine ocular examination were requested to participate in this study. Written informed consent was provided by parents prior to participation. All screening tests were conducted at Samsung Medical Center between June and September 2018. The exclusion criteria were diseases that could affect light reflection, such as congenital cataracts and corneal opacity, diseases involving visual pathways or extraocular muscles, a medical history of previous ophthalmic surgery (eg, strabismus, congenital cataract, and congenital glaucoma), limited cycloplegia, and poor cooperation during study activities.
A total of 305 photorefraction images (191 images from 101 girls and 114 images from 63 boys) were obtained (mean age 4.32 years, SD 1.87 years). All patients underwent a complete ophthalmologic examination, including visual acuity, motility evaluation, and anterior segment evaluation. Eccentric photorefraction images were obtained using a smartphone with a 16-megapixel camera (LGM-X800K; LG Electronics Inc, Seoul, Korea) at a 1-meter distance in front of the patient in a dark room (<15 lux). The smartphone was held directly in front of the child’s face, without angulation. All photorefraction images were acquired in the same setting (in a dark room and before the cycloplegic procedure). The smartphone’s built-in flash, located next to the camera lens, was used as the light source for eccentric photorefraction: light was refracted into the eye, reached the retinal surface, and was then magnified and reflected back. When optimal reflection was achieved, a characteristic crescent-shaped reflection appeared in the eye. A photograph of the crescent reflection was captured through LED control [
The acquired eccentric photorefraction images were divided into the following seven classes according to the spherical values measured by cycloplegic refraction: ≤−5.0 diopters (D), >−5.0 D and ≤−3.0 D, >−3.0 D and ≤−0.5 D, >−0.5 D and <+0.5 D, ≥+0.5 D and <+3.0 D, ≥+3.0 D and <+5.0 D, and ≥+5.0 D. The cutoff values of the seven classes were determined clinically. Among myopic refractive errors (negative values), −5.0 D, −3.0 D, and −0.5 D were considered the thresholds of high, moderate, and mild myopia, respectively. In other words, refractive errors ≤−5.0 D indicated high myopia, refractive errors >−5.0 D and ≤−3.0 D indicated moderate myopia, and refractive errors >−3.0 D and ≤−0.5 D indicated mild myopia. Similarly, among positive values, +0.5 D, +3.0 D, and +5.0 D were the thresholds of mild, moderate, and high hyperopia, respectively.
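For illustration, this seven-class binning reduces to a simple threshold function. A minimal sketch in Python (the integer class indices are our own illustrative labels, not part of the original study):

```python
def refraction_class(spherical_d: float) -> int:
    """Map a cycloplegic spherical value (in diopters) to one of the
    seven refractive error classes defined above."""
    if spherical_d <= -5.0:
        return 0  # high myopia: <=-5.0 D
    elif spherical_d <= -3.0:
        return 1  # moderate myopia: >-5.0 D and <=-3.0 D
    elif spherical_d <= -0.5:
        return 2  # mild myopia: >-3.0 D and <=-0.5 D
    elif spherical_d < 0.5:
        return 3  # >-0.5 D and <+0.5 D
    elif spherical_d < 3.0:
        return 4  # mild hyperopia: >=+0.5 D and <+3.0 D
    elif spherical_d < 5.0:
        return 5  # moderate hyperopia: >=+3.0 D and <+5.0 D
    else:
        return 6  # high hyperopia: >=+5.0 D
```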
Photorefraction images were processed for training our deep convolutional neural network. Initially, the images were cropped to capture the pupil. The images were resized to 224×224 pixels, and the pixel values were scaled from 0 to 1. To overcome an overfitting issue caused by an insufficiently sized training dataset, data augmentation was performed by altering brightness, saturation, hue, and contrast; adding Gaussian noise; and blurring images using Gaussian kernels. Thereafter, the image pixel values were normalized by subtracting the mean and dividing by the SD to ensure that each image had a similar data distribution and that training would converge faster.
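The following torchvision sketch illustrates this pipeline; the jitter ranges, blur kernel, noise scale, and normalization statistics are assumptions for illustration, since the paper reports the operations but not their parameters:

```python
import torch
from torchvision import transforms

# Illustrative training-time preprocessing; crop-to-pupil is assumed to
# have happened upstream, and all parameter values are placeholders.
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),                      # resize cropped pupil image
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2, hue=0.05),   # photometric augmentation
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 1.0)),  # Gaussian blurring
    transforms.ToTensor(),                              # scale pixel values to [0, 1]
    transforms.Lambda(lambda x: x + 0.01 * torch.randn_like(x)),  # Gaussian noise
    transforms.Normalize(mean=[0.485, 0.456, 0.406],    # subtract mean, divide by SD
                         std=[0.229, 0.224, 0.225]),    # (ImageNet stats as stand-in)
])
```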
For training, validation, and testing, we used a five-fold cross-validation approach to build a reliable deep learning model with a limited dataset. Initially, all the data were subdivided into five equal-sized folds with the same proportion of each class in every fold. Four of the five folds were used for training and validation (3.5 folds for training and 0.5 folds for validation), and one fold was used for testing. After five repetitions of this process, we were able to evaluate performance on the entire dataset, because the test folds were independent of each other, and we confirmed the stability of our model across the entire dataset using the confusion matrix.
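A minimal sketch of this splitting scheme, assuming `images` and `labels` are NumPy arrays holding the 305 photorefraction images and their seven-class targets:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for trainval_idx, test_idx in skf.split(images, labels):
    # Carve 0.5 fold (1/8 of the remaining four folds, ~31 images)
    # out for validation, preserving class proportions.
    train_idx, val_idx = train_test_split(
        trainval_idx, test_size=1 / 8,
        stratify=labels[trainval_idx], random_state=0)
    # ... train on train_idx, pick the best model on val_idx,
    # and report accuracy on the held-out test_idx
```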
We used a deep convolutional neural network to classify photorefraction images into the most probable refractive error class. Among the various types of convolutional neural networks, we chose the 18-layer Residual Network (ResNet-18) [
Because we did not have a sufficiently large training dataset, we performed transfer learning to capture low-level features, such as edge and color, without wasting image data [
Overview of the proposed deep convolutional neural network architecture. The photorefraction image input passes through 17 convolutional layers and one fully connected layer, and the network outputs the probability of each refractive error class given the image. We also generated a localization map highlighting the important regions from the final convolutional feature maps of layer i (i=1, 2, 3, or 4).
Structure of the basic block and the shortcut connection. The basic block consists of two 3×3 convolutional layers, two Batch Normalization layers, and a Rectified Linear Unit (ReLU) activation function. The shortcut connection adds the input vector of the basic block to the output of the basic block.
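A minimal PyTorch sketch of the identity-shortcut form of this block follows; the downsampling blocks in the configuration table below additionally apply a 1×1 convolution on the shortcut path:

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two 3x3 convolutions, each followed by batch normalization, with
    an identity shortcut added before the final ReLU activation."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                          # shortcut connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)      # add block input to block output
```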
Configuration of the deep convolutional network.
| Layer type, feature map | Filters | Kernel | Stride | Padding | Learning rate |
| --- | --- | --- | --- | --- | --- |
| Input, 224×224×3 | —a | — | — | — | 0.0 (freeze) |
| Convolution, 112×112×64 | 64 | 7×7×3 | 2 | 3 | 0.0 (freeze) |
| Batch normalization + ReLU, 112×112×64 | — | — | — | — | 0.0 (freeze) |
| Max pooling, 56×56×64 | 1 | 3×3 | 2 | 1 | 0.0 (freeze) |
| Layer 1, basic block 1 |  |  |  |  |  |
| Convolution, 56×56×64 | 64 | 3×3×64 | 1 | 1 | 0.0 (freeze) |
| Convolution, 56×56×64 | 64 | 3×3×64 | 1 | 1 | 0.0 (freeze) |
| Layer 1, basic block 2 |  |  |  |  |  |
| Convolution, 56×56×64 | 64 | 3×3×64 | 1 | 1 | 0.0 (freeze) |
| Convolution, 56×56×64 | 64 | 3×3×64 | 1 | 1 | 0.0 (freeze) |
| Layer 2, basic block 1 |  |  |  |  |  |
| Convolution, 28×28×128 | 128 | 3×3×64 | 2 | 1 | 1e-10 |
| Convolution, 28×28×128 | 128 | 3×3×128 | 1 | 1 | 1e-10 |
| Shortcut convolution, 28×28×128 | 128 | 1×1×64 | 2 | 0 | 1e-10 |
| Layer 2, basic block 2 |  |  |  |  |  |
| Convolution, 28×28×128 | 128 | 3×3×128 | 1 | 1 | 1e-10 |
| Convolution, 28×28×128 | 128 | 3×3×128 | 1 | 1 | 1e-10 |
| Layer 3, basic block 1 |  |  |  |  |  |
| Convolution, 14×14×256 | 256 | 3×3×128 | 2 | 1 | 1e-8 |
| Convolution, 14×14×256 | 256 | 3×3×256 | 1 | 1 | 1e-8 |
| Shortcut convolution, 14×14×256 | 256 | 1×1×128 | 2 | 0 | 1e-8 |
| Layer 3, basic block 2 |  |  |  |  |  |
| Convolution, 14×14×256 | 256 | 3×3×256 | 1 | 1 | 1e-8 |
| Convolution, 14×14×256 | 256 | 3×3×256 | 1 | 1 | 1e-8 |
| Layer 4, basic block 1 |  |  |  |  |  |
| Convolution, 7×7×512 | 512 | 3×3×256 | 2 | 1 | 1e-6 |
| Convolution, 7×7×512 | 512 | 3×3×512 | 1 | 1 | 1e-6 |
| Shortcut convolution, 7×7×512 | 512 | 1×1×256 | 2 | 0 | 1e-6 |
| Layer 4, basic block 2 |  |  |  |  |  |
| Convolution, 7×7×512 | 512 | 3×3×512 | 1 | 1 | 1e-6 |
| Convolution, 7×7×512 | 512 | 3×3×512 | 1 | 1 | 1e-6 |
| Average pooling, 1×1×512 | 1 | 7×7 | 7 | 0 | — |
| Fully connected, 1×7 | — | — | — | — | 1e-5 |
| Softmax, 1×7 | — | — | — | — | — |
aNot applicable.
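Read as a recipe, the table corresponds to layer-wise fine-tuning of an ImageNet-pretrained ResNet-18: the stem and layer 1 are frozen, and the learning rate increases toward the output. A minimal PyTorch sketch (the SGD optimizer and momentum value are our assumptions; the table specifies only the learning rates):

```python
import torch
from torchvision import models

net = models.resnet18(pretrained=True)           # ImageNet-pretrained weights
net.fc = torch.nn.Linear(net.fc.in_features, 7)  # seven refractive error classes

# Freeze the stem and layer 1, matching the "0.0 (freeze)" rows above.
for module in (net.conv1, net.bn1, net.layer1):
    for p in module.parameters():
        p.requires_grad = False

# Per-layer learning rates taken from the table; optimizer choice is assumed.
optimizer = torch.optim.SGD([
    {"params": net.layer2.parameters(), "lr": 1e-10},
    {"params": net.layer3.parameters(), "lr": 1e-8},
    {"params": net.layer4.parameters(), "lr": 1e-6},
    {"params": net.fc.parameters(),     "lr": 1e-5},
], lr=1e-5, momentum=0.9)
```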
A total of 305 photorefraction images from 191 girls and 114 boys were acquired. The mean age was 4.32 years (SD 1.87 years), and the median age was 4 years (range 0-8 years). The mean spherical equivalent was 0.13 D (SD 2.27 D; range −5.50 to 6.75 D), and the mean astigmatism was −1.50 D (SD 1.38 D; range −6.50 to 0 D), according to cycloplegic refraction.
According to cycloplegic refraction results, 25 photorefraction images had a refractive error ≤−5.0 D, 18 had an error >−5.0 D and ≤−3.0 D, 50 had an error >−3.0 D and ≤−0.5 D, 84 had an error >−0.5 D and <+0.5 D, 87 had an error ≥+0.5 D and <+3.0 D, 29 had an error ≥+3.0 D and <+5.0 D, and 12 had an error ≥+5.0 D.
Examples of photorefraction images from the seven different refractive error classes. A bright crescent appears in the pupillary reflex, and its size and shape indicate the diopter (D) value.
Dataset participant demographics.
| Characteristic | Value |
| --- | --- |
| Total images, n | 305 |
| Refractive error (cycloplegic refraction), n |  |
| ≤−5.0 Da | 25 |
| >−5.0 D and ≤−3.0 D | 18 |
| >−3.0 D and ≤−0.5 D | 50 |
| >−0.5 D and <+0.5 D | 84 |
| ≥+0.5 D and <+3.0 D | 87 |
| ≥+3.0 D and <+5.0 D | 29 |
| ≥+5.0 D | 12 |
| Images from girls, n (%) | 191 (62.6) |
| Age (years), mean (SD) | 4.32 (1.87) |
aD: diopters.
We used five-fold cross-validation to evaluate our network’s performance. Training, validation, and testing were independently iterated five times. In each iteration, there were 213 training images, 31 validation images, and 61 testing images. We chose the network with the highest validation accuracy once the training loss had saturated. Thereafter, we measured the classification accuracy of the network on the test fold. All five networks established in the training phase had an accuracy of more than 80% on their validation sets. Similarly, the accuracies on the five test folds were 83.6%, 80.3%, 82.0%, 78.7%, and 83.6% (
Results for five-fold cross-validation.
| Iterationa | Validation accuracy (%) (N=31) | Test accuracy (%) (N=61) |
| --- | --- | --- |
First iteration | 87.1 | 83.6 |
Second iteration | 80.6 | 80.3 |
Third iteration | 80.6 | 82.0 |
Fourth iteration | 83.9 | 78.7 |
Fifth iteration | 83.9 | 83.6 |
Average | 83.2 | 81.6 |
aIn each iteration, our network was trained on the 213 images not included in that iteration’s validation and test folds.
In the five-fold test, our network had the following accuracies: 80.0% for class ≤−5.0 D, 77.8% for class >−5.0 D and ≤−3.0 D, 82.0% for class >−3.0 D and ≤−0.5 D, 83.3% for class >−0.5 D and <+0.5 D, 82.8% for class ≥+0.5 D and <+3.0 D, 79.3% for class ≥+3.0 D and <+5.0 D, and 75.0% for class ≥+5.0 D (
In addition, our network’s predictions of refractive error remained stable across classes, as shown in the confusion matrix (
Performance of our deep convolutional neural network with the overall test dataset.
| Class | Images, n | Accuracy (%) |
| --- | --- | --- |
≤−5.0 Da | 25 | 80.0 |
>−5.0 D and ≤−3.0 D | 18 | 77.8 |
>−3.0 D and ≤−0.5 D | 50 | 82.0 |
>−0.5 D and <+0.5 D | 84 | 83.3 |
≥+0.5 D and <+3.0 D | 87 | 82.8 |
≥+3.0 D and <+5.0 D | 29 | 79.3 |
≥+5.0 D | 12 | 75.0 |
Total | 305 | 81.6 |
aD: diopter.
For performance comparison, we developed the following five baseline models and measured their performance: (1) pretrained VGG-11 [
Confusion matrix for refractive error classification of our deep convolutional neural network (rows: ground truth; columns: predicted class).

| Ground truth | ≤−5.0 Da | >−5.0 D and ≤−3.0 D | >−3.0 D and ≤−0.5 D | >−0.5 D and <+0.5 D | ≥+0.5 D and <+3.0 D | ≥+3.0 D and <+5.0 D | ≥+5.0 D | Accuracy (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
≤−5.0 D | 20b | 3 | 2 | 0 | 0 | 0 | 0 | 80.0 | |
>−5.0 D and ≤−3.0 D | 1 | 14b | 2 | 0 | 1 | 0 | 0 | 77.8 | |
>−3.0 D and ≤−0.5 D | 1 | 4 | 41b | 4 | 0 | 0 | 0 | 82.0 | |
>−0.5 D and <+0.5 D | 0 | 0 | 5 | 70b | 8 | 1 | 0 | 83.3 | |
≥+0.5 D and <+3.0 D | 0 | 0 | 1 | 10 | 72b | 4 | 0 | 82.8 | |
≥+3.0 D and <+5.0 D | 0 | 0 | 0 | 1 | 4 | 23b | 1 | 79.3 | |
≥+5.0 D | 0 | 0 | 0 | 0 | 1 | 2 | 9b | 75.0 | |
Overall accuracy (%) | —c | — | — | — | — | — | — | 81.6 |
aD: diopter.
bNumber of correct predictions of our deep convolutional neural network.
cNot applicable.
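As a small illustration, the matrix and per-class accuracies above can be reproduced from the pooled predictions with scikit-learn; `y_true` and `y_pred` are assumed to hold the ground-truth and predicted class indices for all 305 test images concatenated across the five folds:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true, y_pred, labels=np.arange(7))
per_class_acc = cm.diagonal() / cm.sum(axis=1)  # eg, 20/25 = 80.0% for <=-5.0 D
overall_acc = cm.diagonal().sum() / cm.sum()    # 249/305 = 81.6%
```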
Performance comparison of the proposed model and baseline models.
| Model | Accuracy (%) |
| --- | --- |
The proposed deep convolutional neural network | 81.6 |
Pretrained VGG-11 | 70.8 |
Pretrained SqueezeNet | 77.4 |
Support Vector Machine | 65.2 |
Random Forest | 62.9 |
Simple convolutional neural network | 70.8 |
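For the classical baselines, the paper does not state which input features the Support Vector Machine and Random Forest used; the minimal setup below, assumed purely for illustration, feeds them flattened preprocessed images:

```python
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# X_train, X_test: assumed arrays of preprocessed images; y_train, y_test:
# seven-class labels. Flattened pixels are an assumption, not the paper's choice.
X_tr = X_train.reshape(len(X_train), -1)
X_te = X_test.reshape(len(X_test), -1)

svm = SVC(kernel="rbf").fit(X_tr, y_train)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_train)
print("SVM accuracy:", svm.score(X_te, y_test))
print("RF accuracy:", rf.score(X_te, y_test))
```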
Additionally, we produced heatmaps using gradient-weighted class activation mapping (Grad-CAM) [
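A minimal sketch of the Grad-CAM computation follows (a generic illustration rather than the authors’ implementation; `model`, `image`, and `target_layer`, eg, `model.layer4` of ResNet-18, are assumed inputs):

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx):
    """Weight the target layer's feature maps by the spatially pooled
    gradients of the chosen class score, then upsample to image size."""
    feats, grads = [], []
    fh = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    bh = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.append(go[0]))
    model.eval()
    inp = image.detach().unsqueeze(0).requires_grad_(True)  # ensure a graph
    score = model(inp)[0, class_idx]       # score of the class to explain
    model.zero_grad()
    score.backward()
    fh.remove()
    bh.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)        # pooled gradients
    cam = F.relu((weights * feats[0]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[1:], mode="bilinear",
                        align_corners=False)
    return (cam / cam.max().clamp(min=1e-8)).squeeze()       # normalized heatmap
```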
Examples of photorefraction images correctly classified by the deep neural network. Panels (A), (B), and (C) were identified as ≥+0.5 D and <+3.0 D, ≥+3.0 D and <+5.0 D, and ≥+5.0 D, respectively. The first layers captured low-level features, such as edges and color. With deeper layers, the network focused on high-level features regarded as important for classification.
The primary purpose of refractive error screening is the early detection of a refractive error to allow interventions that can reduce the risk of amblyopia. Early detection and treatment of refractive error can lead to better visual outcomes and reduce the prevalence and severity of amblyopia in children [
Several studies have compared the accuracy of photoscreeners for detecting various amblyopia risk factors [
This study compared refractive error estimation from precycloplegic photorefraction images with cycloplegic refraction. The results showed consistent measurements between the two methods. Previous studies have reported questionable results for refractive error estimation with photorefractors [
This study has several limitations. First, manifest refraction was not performed in all subjects. Because the photorefraction tests were performed without a cycloplegic agent, useful information might have been obtained if enough noncycloplegic manifest refraction results had been available for comparison with the photorefraction data from the same patients. Second, the number of photorefraction images was relatively small, and the model could only predict a range of refractive errors (not a specific value). Third, all children involved in the study were Korean. Thus, a model trained on the eyes of Korean children may not be applicable to pediatric patients of other ethnicities [
In conclusion, this study showed that our deep learning–based system yielded accurate and precise refractive measurements. It further demonstrates the potential for developing simplified smartphone-based prediction systems for refractive error using deep learning with a large-scale collection of pediatric photorefraction images from patients of various ages and refractive errors.
AAPOS: American Academy of Pediatric Ophthalmology and Strabismus
SVM: Support Vector Machine
This research was supported by a National Research Foundation of Korea grant funded by the Government of Korea’s Ministry of Education (NRF-2018R1D1A1A02045884; Seoul, Korea), which was received by Dong Hui Lim, and a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI19C0577), which was received by Dong Hui Lim.
DHL designed the study. JC, YK, KYS, DHL, and KAP analyzed and interpreted the clinical data. JC and YK wrote the submitted manuscript draft. TYC, SYO, SHH, DHL, and KAP reviewed the design, the results, and the submitted draft. JC and YK contributed equally to the work as cofirst authors. DHL and KAP are the corresponding authors for this study. All authors read and approved the final manuscript.
None declared.