Published on in Vol 9, No 6 (2021): June

Preprints (earlier versions) of this paper are available at, first published .
Informing Developmental Milestone Achievement for Children With Autism: Machine Learning Approach

Original Paper

1R.B. Annis School of Engineering, University of Indianapolis, Indianapolis, IN, United States

2Ubicomp Lab, Department of Computer Science, Marquette University, Milwaukee, WI, United States

3College of Health and Human Sciences, Purdue University, West Lafayette, IN, United States

4Department of Mathematical and Statistical Sciences, Marquette University, Milwaukee, WI, United States

5Telepsychiatry Research and Innovation Network Ltd, Dhaka, Bangladesh

6Department of Computer Science, University of Toronto, Toronto, ON, Canada

*all authors contributed equally

Corresponding Author:

Masud Rabbani, BSc

Ubicomp Lab, Department of Computer Science

Marquette University

1422 W Kilbourn Ave


Milwaukee, WI, 53233-1784

United States

Phone: 1 4143267769


Background: Care for children with autism spectrum disorder (ASD) can be challenging for families and medical care systems. This is especially true in low- and middle-income countries such as Bangladesh. To improve family–practitioner communication and developmental monitoring of children with ASD, mCARE (Mobile-Based Care for Children with Autism Spectrum Disorder Using Remote Experience Sampling Method) was developed. Within this study, mCARE was used to track child milestone achievement and family sociodemographic assets to inform mCARE feasibility/scalability and family asset–informed practitioner recommendations.

Objective: The objectives of this paper are threefold. First, it documents how mCARE can be used to monitor child milestone achievement. Second, it demonstrates how advanced machine learning models can inform our understanding of milestone achievement in children with ASD. Third, it describes family/child sociodemographic factors that are associated with earlier milestone achievement in children with ASD (across 5 machine learning models).

Methods: Using mCARE-collected data, this study assessed milestone achievement in 300 children with ASD from Bangladesh. In this study, we used 4 supervised machine learning algorithms (decision tree, logistic regression, K-nearest neighbor [KNN], and artificial neural network [ANN]) and 1 unsupervised machine learning algorithm (K-means clustering) to build models of milestone achievement based on family/child sociodemographic details. For analyses, the sample was randomly divided in half to train the machine learning models and then their accuracy was estimated based on the other half of the sample. Each model was specified for the following milestones: Brushes teeth, Asks to use the toilet, Urinates in the toilet or potty, and Buttons large buttons.

Results: This study aimed to find a suitable machine learning algorithm for milestone prediction/achievement for children with ASD using family/child sociodemographic characteristics. For Brushes teeth, 3 of the supervised machine learning models (logistic regression, KNN, and ANN) reached an accuracy of 95.00%, making them the most robust predictors from sociodemographic data. For Asks to use toilet, 84.00% accuracy was achieved with the KNN and ANN models. For these models, the family sociodemographic predictors of “family expenditure” and “parents’ age” accounted for most of the model variability. The last 2 parameters, Urinates in toilet or potty and Buttons large buttons, had an accuracy of 91.00% and 76.00%, respectively, in ANN. Overall, the ANN had a higher accuracy (above ~80% on average) than the other algorithms for all the parameters. Across the models and milestones, “family expenditure,” “family size/type,” “living places,” and “parent’s age and occupation” were the most influential family/child sociodemographic factors.

Conclusions: mCARE was successfully deployed in a low- and middle-income country (ie, Bangladesh), providing parents and care practitioners a mechanism to share detailed information on child milestone achievement. Using advanced modeling techniques, this study demonstrates how family/child sociodemographic elements can inform child milestone achievement. Specifically, families with fewer sociodemographic resources reported later milestone attainment. Developmental science theories highlight how family/systems can directly influence child development, and this study provides a clear link between family resources and child developmental progress. Clinical implications for this work could include supporting the larger family system to improve child milestone achievement.

JMIR Med Inform 2021;9(6):e29242




Autism spectrum disorder (ASD) is a global problem [1] and a heterogeneous neurodevelopmental disorder [2]. In 1943, Kanner [3] first described this disorder in children’s behavior. In this neurodevelopmental disorder, children have social communication issues, repetitive behaviors, restrictive interests, and professional impairments throughout their lifespan [4,5]. In developed countries, 1%-1.5% of children have ASD [4], whereas in the United States, 1 out of 54 children has ASD [6,7]. Although it is the fastest growing developmental disorder, the number of individuals affected globally remains largely unknown [8]. In low- and middle-income countries, this rate is estimated to vary between 0.15% and 0.8%, whereas in a developing country such as Bangladesh this rate is reported to be 3% [9-11]. ASD symptoms gradually show up before 1 year of age, with nearly 80% of problems being identified by 2 years of age [12,13]. In particular, boys are affected by ASD 3 to 4 times more often than girls [14]. Unfortunately, nearly 46% of children with ASD do not receive the proper treatment following diagnosis [8].

Medically, early identification and diagnosis of ASD will improve positive functional outcomes in later life for these children [15-18]. As a result, in 2000, the American Academy of Neurology and Child Neurology recommended screening every child for ASD [14,19-21]. In other words, a reliable ASD diagnosis should be performed in children before 24 months of age [19], as this substantially improves the opportunities for recovery and also reduces the burden on caregivers (diagnostic odyssey) [16]. The major barriers to making improvements in ASD diagnosis and treatment are lack of proper knowledge about ASD, lack of motivation and patience of parents or caregivers, and delayed identification and diagnosis of ASD. Early identification and diagnosis help the care practitioners to make evidence-based decisions during intervention, which has positive long-term outcomes on the improvement of patients with ASD [5,22]. Physical therapy or exercise is much more important than medicine in the development of many patients with ASD, and in such cases early intervention can play an important role [23-26].

Besides the early identification and diagnosis, parents’ or caregivers’ demography, social or environmental demography, race, and ethnicity can play a vital role in the developmental process of children with ASD [15,27-31]. Concerning parents’ demography, educational level, occupation, family income and expenditures, number of siblings, and living area remain very important factors in the development of children with ASD [27-30]. Environmental factors such as the socioeconomic condition, neighborhood, and society’s attitudes toward children with ASD are very significant [12,13]. Although genes increasing the risk for ASD in children are mostly prenatal [32], demography of parents remains very important [33], as it can affect the improvement of patients with ASD. In this study, we will use the parents’, environmental, and social demography as a parameter to develop a machine learning model for predicting the improvement level of milestone parameters in children with ASD.

Based on the demography, machine learning models can predict the milestone parameters in children with ASD during their early intervention period. In this study, we used 10 important demographic features in 4 supervised machine learning models to predict the improvement level of “daily living skills.” In the “Decision Tree” [34] machine learning algorithm, we used the “Classification Trees” category to build the predictive model. To build a statistical model for our binary dependent variable, we deployed “Logistic Regression” with the sigmoid function [35,36] as the logistic function. We then deployed our preprocessed data sets in the K-nearest neighbor (KNN) algorithm using the “Euclidean distance” [37] to find the nearest neighbor. Finally, we used an artificial neural network (ANN) to build our last predictive model. In the ANN, we used “relu” as the hidden layer’s activation function and “sigmoid” as the output layer’s activation function.

Prior Work

In our previous work (Mobile-Based Care for Children with Autism Spectrum Disorder Using Remote Experience Sampling Method [mCARE]), we developed a mobile-based system to regularly monitor children with ASD with the help of caregivers in Bangladesh. In mCARE, we deployed a remote experience sampling method to monitor the milestone and behavioral parameters. These longitudinal data can be used in the intervention process, where the care practitioners can make evidence-based decisions based on the data. This tool was very effective in the development process of children with ASD; using this tool, the caregiver and care practitioner can observe the improvement level over a certain period on a graphical view. This tool not only assists the care practitioners but also motivates the caregivers. Besides this tool, there are some renowned applications and studies that assist with the ASD diagnosis process in different phases [2,15,19,38,39]. While most studies have been performed for the early identification or recognition of ASD [16,19,22,40-44], little work has been done so far on the prediction of improvement level of ASD parameters or the timeframe for a certain level of improvement, or on the factors that need to be improved. In this study, we established a relationship between the parents’ demography and the improvement in ASD milestone parameters by deploying a real data set of the mCARE system.

Goal of This Study

Demographic data such as family income, living place, facilities, parents’ age, education and occupation, family types, and number of siblings affect parental stress and psychology [45]. This parental and psychological stress impacts the mental development of children with ASD, especially “daily living skills” [46-48]. For this reason, cognitive behavioral therapy is very effective on the daily living skill development of children with ASD [49]. In this study, our main goal was to predict the improvement in an ASD milestone parameter (ie, daily living skills) using a machine learning algorithm based on demographic data of caregivers. To achieve our goal, first, we measure the improvement level of the milestone in children with ASD from the mCARE tools. Second, we deploy an mCARE data set in 4 supervised and 1 unsupervised machine learning algorithms to build the best prediction model for improvement in milestone parameters. Finally, we describe the importance of the caregiver-specific demography in predicting the improvement level of certain milestone parameters in children with ASD.


mCARE is a mobile-based app for monitoring the milestone and behavioral parameters of children with ASD regularly and remotely. This project was funded by the National Institutes of Health (NIH) [50] and has been implemented in Bangladesh for 2 years. For this study, we used data from the mCARE study, which was approved by the Institutional Review Board of Marquette University on July 9, 2020 (protocol number HR-1803022959). The mCARE study recruited 316 participants, of which we included 300 in this study. We deployed the remote experience sampling method to collect data on children with ASD, which were reported by their caregivers using a smartphone app or an SMS text message. This mobile-based app has significance in the mental health intervention process: by using the mCARE: Data Management Portal (mCARE: DMP), a caregiver can observe the longitudinal behavioral or milestone data graphically for a certain period. This feature helped the caregiver to make evidence-based decisions in the intervention process. In this study, we first measure the improvement level of the “test group” participants based on milestone parameters. Using the test group data set, we then build the machine learning–based prediction model for a specific milestone parameter. We use the test group patients’ demography for constructing the prediction model. Figure 1 summarizes the research design in a simple flowchart.

Figure 1. Outline of research design.

Data Collection and Selection Phase

Following approval from the Marquette University Institutional Review Board (Protocol number HR-1803022959), the mCARE project recruited a total of 300 children with ASD (aged 2-9 years) from Bangladesh. We incorporated diversity in terms of age, sex, ASD severity, and family socioeconomic resources. We divided the whole sample population into 2 groups: (1) the test group and (2) the control group. Patients in the test group received the intervention and were monitored regularly, whereas those in the control group were monitored over a certain period. Data from the control group and the test group were compared. This study took place in 4 major institutes of Bangladesh located in 2 geographical locations (Dhaka and Chittagong). We collaborated with 2 government organizations for ASD treatment and research, namely, The National Institute of Mental Health (NIMH) [51] and The Institute of Pediatric Neuro-disorder & Autism (IPNA) [52], to recruit 100 caregivers of children with ASD from each. The participants from each organization were divided into 2 groups: mCARE-APP (n=50) and mCARE-SMS (n=50). Each group was further divided equally into the test (n=25) and control (n=25) groups. Typically, in Bangladesh, families with low and high socioeconomic status receive treatment from public and private organizations, respectively. Therefore, to include participants from all socioeconomic classes, we included 2 private organizations, namely, Nishpap [53] and Autism Welfare Foundation (AWF) [54]. A total of 50 participants chosen from each of these schools were divided into the test group (n=25) and the control group (n=25) only for the mCARE-APP study group. The patient distribution among the 4 centers and the participant demography are presented in Tables 1 and 2, respectively.

Table 1. Patient distribution among the 4 centers.

| Serial | Center name | Test group (n=150) | Control group (n=150) |
| --- | --- | --- | --- |
| 1 | The National Institute of Mental Health (NIMH) | 50 | 50 |
| 2 | The Institute of Pediatric Neuro-disorder & Autism (IPNA) | 50 | 50 |
| 3 | Autism Welfare Foundation (AWF) | 25 | 25 |
| 4 | Nishpap Autism Foundation | 25 | 25 |

Table 2. Demographic information of participants in the test group (n=150).

| Demographics | mCARE: test group, n (%) |
| --- | --- |
| Age (years) | |
| 2-6 | 37 (24.7) |
| 6-9 | 113 (75.3) |
| Sex | |
| Male | 124 (82.7) |
| Female | 26 (17.3) |
| Education of children | |
| Never went to school | 34 (22.7) |
| Went to usual academic school but failed to continue study | 22 (14.7) |
| Went to specialized school but failed to continue study | 4 (2.7) |
| Currently he/she is going to usual academic school | 12 (8.0) |
| Currently he/she is going to specialized academic school | 78 (52.0) |
| Father’s education | |
| Primary | 29 (19.3) |
| Secondary | 23 (15.3) |
| Undergraduate | 23 (15.3) |
| Graduate | 29 (19.3) |
| Postgraduate | 46 (30.7) |
| Mother’s education | |
| Primary | 19 (12.7) |
| Secondary | 37 (24.7) |
| Undergraduate | 25 (16.7) |
| Graduate | 32 (21.3) |
| Postgraduate | 37 (24.7) |
| Student | 0 (0.0) |
| Unemployed | 4 (2.7) |
| Father’s occupation | |
| Service | 70 (46.7) |
| Business | 45 (30.0) |
| Cultivation | 1 (0.7) |
| Other | 7 (4.7) |
| Unemployed | 23 (15.3) |
| Mother’s occupation | |
| Student | 0 (0.0) |
| Unemployed | 0 (0.0) |
| Housewife | 124 (82.7) |
| Service | 17 (11.3) |
| Business | 4 (2.7) |
| Cultivation | 0 (0.0) |
| Maid | 1 (0.7) |
| Other | 1 (0.7) |
| Not applied | 3 (2.0) |
| Average family spending per month (in thousand Taka)a | |
| <15 K | 19 (12.7) |
| 15-30 K | 44 (29.3) |
| 30-50 K | 31 (20.7) |
| >50 K | 56 (37.3) |
| Family type | |
| Nuclear | 113 (75.3) |
| Extended | 37 (24.7) |
| Geographic location | |
| Urban | 120 (80.0) |
| Semiurban | 15 (10.0) |
| Rural | 15 (10.0) |
| Slum | 0 (0.0) |

aUS $1=84.77 Taka (as of March 18, 2021).

Demographic Information of the Participants in the “Test Group”

We collected demographic information about participants in the test group (n=150). In Table 2, we present in detail the demographic information of participants that took part in the mCARE study.

Measuring the Improvement Level Based on Milestone Parameters

In the mCARE project, there were 4 types of milestone for every test group patient. These were “daily living skills,” “communication,” “motor skills,” and “socialization.” Further, for every patient, based on his/her condition, the recruited care practitioner set different types of parameter from every milestone category. Table 3 lists the 4 types of parameters from each milestone group along with the participant numbers (n). Here the participant number (n) is different for different milestone parameters, as every participant did not have the same milestone parameter initially set by the care practitioner. At the beginning of this project, the care practitioners obtained the baseline information for every milestone parameter by screening the participant. Then, in the project timeline (2 years), the caregiver continuously updated the milestone parameter using the mCARE: APP or mCARE: SMS tool based on the child’s condition. At the end of the project, one can generate the participant’s end improvement level for different levels of their milestone parameters. By comparing the baseline milestone data with the end participant’s improvement data, we can calculate the improvement level (in percentage) for every milestone parameter (described in Table 3). In this table, besides the improvement level, we calculated the 95% CI for the validation of our results. As our sample size was 150, we used the Z value (1.96 for 95% CI) [55] for calculating the 95% CI using the following formula:

$$\bar{x} \pm Z \frac{S}{\sqrt{n}}$$

where $\bar{x}$ is the mean, Z is 1.96 (chosen from the Z-value table [55]), S is the SD, and n is the average sample number.
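As a worked illustration of this interval, the following minimal Python sketch evaluates the standard normal-approximation form, mean ± Z·S/√n. The function name and the input values are hypothetical and are not taken from Table 3.

```python
import math

def confidence_interval_95(mean, sd, n, z=1.96):
    """Return (lower, upper) bounds of mean +/- z * sd / sqrt(n)."""
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical inputs for illustration only (not values from the study).
lower, upper = confidence_interval_95(mean=70.0, sd=10.0, n=100)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

With these toy inputs the margin is 1.96 × 10 / 10 = 1.96, so the interval runs from 68.04 to 71.96.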

Table 3. Improvement level of the test group (mCARE) on their milestone parameters.

| Milestone type and parameter with total participants (n) | Improvement level, n (%) | 95% CI: average sample (n) | 95% CI: lower-upper bound | 95% CI: average improvement |
| --- | --- | --- | --- | --- |
| Daily living skills | | | | |
| Asks to use toilet (n=106) | 61 (57.5) | | | |
| Brushes teeth (n=140) | 113 (80.7) | | | |
| Buttons large buttons in front, in correct buttonholes (n=109) | 70 (64.2) | | | |
| Urinates in toilet or potty (n=113) | 84 (74.3) | | | |
| Communication | | | | |
| Listens to a story for at least 15 minutes (n=101) | 35 (34.7) | | | |
| Points to at least five body parts when asked (n=117) | 62 (52.9) | | | |
| Says month and day of birthday when asked (n=116) | 42 (36.2) | | | |
| Says own phone number when asked (n=23) | 12 (52.1) | | | |
| Motor skills | | | | |
| Draws circle freehand while looking at an example (n=136) | 100 (73.5) | | | |
| Glues or pastes 2 or more pieces together (n=130) | 87 (66.9) | | | |
| Jumps with both feet off the floor (n=104) | 65 (62.5) | | | |
| Runs smoothly without falling (n=119) | 82 (68.9) | | | |
| Socialization | | | | |
| Ends conversation appropriately (eg, “good bye” or “khoda hafez”) (n=14) | 4 (28.5) | | | |
| Keeps comfortable distance between self and others in social situations (n=130) | 76 (58.4) | | | |
| Talks with others about shared interests (eg, sports, TV shows, cartoons) (n=126) | 50 (39.6) | | | |
| Uses words to express emotions (eg, “I am happy” or “I am scared”) (n=110) | 24 (21.8) | | | |

Data Set Selection

In the mCARE study, among the 4 categories (Table 3) of milestone parameters, the “daily living skills” showed the highest improvement level. In this study, we selected this category for building the prediction model based on the participant’s demography. In this milestone type, there are 4 different parameters: Asks to use toilet, Brushes teeth, Buttons large buttons in front, in correct buttonholes, and Urinates in toilet or potty. We took the demographic information for every participant who had these milestone parameters and created 4 data sets. In each data set, there were 18 features regarding the participant’s demographic (Multimedia Appendix 1) and 1 value for the “end improvement level” for each participant (this is the label value that will be used in supervised machine learning). We titled each data set by the name of the milestone parameter; for example, Asks to use toilet, which has 106 instances; Brushes teeth, which has 140 instances; Buttons large buttons in front, in correct buttonholes, which has 109 instances, and Urinates in toilet or potty, which has 113 instances. In the following sections, we describe the different machine learning models based on these 4 data sets.

Data Preprocessing

Before building the prediction model, we preprocessed our data sets in 3 steps, which are described in the following sections.

Data Cleaning and Feature Extraction

In the data cleaning step, we observed some missing data, especially with regard to age and salary, in our data sets. We handled this by replacing each empty cell with the mean value of that column in the particular data set. In our data sets, out of 19 columns, only 6 had numerical values, whereas the others had string inputs. Therefore, we created dummy variables for every such column to convert the string inputs into numerical inputs. For example, we split the column “gender” into 2 subcolumns, namely, “male” and “female.” The “male” column was set to “1” if the original input was male and to “0” otherwise; the “female” column was set in the same way. This feature extraction converted the whole data set to a numeric type, but it increased the number of features from 19 to 48. Besides the feature extraction, we used the MinMaxScaler [56,57] to rescale all of our features to the 0 to 1 range, as this increases the performance of the machine learning algorithm [58].
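The three operations described above (mean imputation, dummy coding, and min-max scaling) can be sketched in plain Python as follows. This is a dependency-free illustration rather than the authors' code, which used the MinMaxScaler class; the helper names and the toy records are hypothetical.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def one_hot(values):
    """Create one 0/1 dummy column per distinct category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}

def min_max_scale(values):
    """Rescale values linearly to the 0 to 1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical toy records (not study data).
ages = [30, None, 40, 50]
genders = ["male", "female", "male", "male"]

ages_filled = impute_mean(ages)           # the missing age becomes 40.0
gender_dummies = one_hot(genders)         # two columns: "female" and "male"
ages_scaled = min_max_scale(ages_filled)  # smallest age -> 0.0, largest -> 1.0
```

Dummy coding is what inflates the column count: each string column with m categories becomes m binary columns, which is consistent with the growth from 19 to 48 features reported above.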

Feature Selection

To get the most important features, we first created an extended data set from the “daily living skills” parameter with 18 features. We used 3 different feature selection methods (univariate selection [59,60], feature importance [59,61-63], and correlation matrix with heatmap [59,64]) together with our domain knowledge to select the 10 most important features from the extended data set. From univariate selection [60] and feature importance [61-63], we obtained 10 important features with their scores from each approach (Multimedia Appendices 2 and 3, respectively). We also prepared a correlation matrix (Multimedia Appendix 4) with heatmap [64] for the features. After computing the most important features with their scores from the 3 feature selection methods, we selected the 10 most important features using these results and our domain knowledge. These features were “family expenditure,” “mother age,” “father age,” “going to specialized school,” “number of siblings,” “housewife-mother,” “father in service,” “living in urban,” “nuclear family,” and “mother education level (undergraduate).” After that, we again split the extended data set into 4 data sets (ie, Brushes teeth; Buttons large buttons in front, in correct buttonholes; Urinates in toilet or potty; and Asks to use toilet) using only these 10 features and the “end improvement level.” These feature-selected data sets are important for boosting the performance of the machine learning models.

Exploring the Relationship and Associations Underlying the Data Set by Unsupervised Machine Learning: K-Means Clustering

To understand the relation of the 10 selected features (described in the “Feature Selection” section) with the improvement level of “daily living skills” of children with ASD, we implemented K-means clustering [65] to create clusters. As our improvement level is either “0” or “1,” we describe the children’s improvement clusters by the “cluster centroid.” Figure 2 shows the 10 clusters for the 10 selected features in “daily living skills.” We selected the cluster number (k) by using the “elbow method” [66]. All elbow graphs are shown in Multimedia Appendix 5. We also validated the cluster number by the “Adjusted Rand Index” [67].
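To make the elbow method concrete, the sketch below runs a small one-dimensional K-means (Lloyd's algorithm) for several values of k and records the within-cluster sum of squares (inertia). It is a simplified illustration under stated assumptions: the data are hypothetical, the centroid initialization is a deterministic spread over the sorted values, and the study itself may have clustered differently.

```python
def kmeans_1d(points, k, iterations=100):
    """Lloyd's algorithm on 1-D data; returns (centroids, inertia)."""
    ordered = sorted(points)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster mean (keep it if the cluster is empty).
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    inertia = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, inertia

# Hypothetical monthly family expenditure values (thousand Taka), three visible groups.
expenditure = [10, 12, 14, 40, 42, 44, 80, 82, 84]
inertias = {k: kmeans_1d(expenditure, k)[1] for k in range(1, 6)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

On this toy input the inertia falls steeply up to k=3 and only marginally afterwards, which is the "elbow" one would read off the plot.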

From the cluster in Figure 2A, we can see that the improvement of children with ASD from high-income families is better than that of children from low-income families. Age of parents is an important factor in the development of children with ASD, as middle-aged mothers (Figure 2B) and older fathers (Figure 2C) can take better care of their children’s development. We also obtained similar types of clusters in Figure 2F and 2G, where occupation of parents plays a vital role in the development of their children with ASD. The number of siblings, living in the urban area, and family size (nuclear) are also important factors in our data set. From the clusters in Figure 2E, 2H, and 2I, we can see that small families with fewer siblings in the urban area can help children improve their “daily living skills.” Education of children with ASD, especially in specialized schools, and their parents’ education, especially the mother’s higher education, can also be helpful for their “daily living skills” development (Figure 2D and 2J).

From the explanation of the clusters in Figure 2, we can find the association between our selected features and the development of children with ASD. Further, using these data, we can validate our main findings, which are described in detail in the “Principal Findings” section.

Figure 2. Clusters for the selected features of “daily living skills” using the K-means algorithm. ARI: Adjusted Rand Index; ASD: autism spectrum disorder.

Building the Model by Machine Learning

We used 4 supervised machine learning algorithms (decision tree [68], logistic regression [36,69,70], KNN [71,72], and ANN [73-75]) to build the prediction models and compared the results to find the best machine learning algorithm for prediction on this kind of problem and data set. We used 4 data sets (described in the “Data Set Selection” section) for each algorithm. From every data set, we used 80% of the data for training and 20% for testing for all the algorithms. We validated our models by k-fold cross-validation (where k=5) [76,77] and took the average of the scores as the model’s accuracy. We describe the models based on different machine learning algorithms in the following sections.
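The fivefold cross-validation procedure described above can be sketched without any library as follows. The splitting helper, the majority-class baseline, and the toy labels are all hypothetical; they stand in for the actual classifiers and data sets only to show how the averaged score is formed.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_val_accuracy(features, labels, fit, predict, k=5):
    """Average held-out accuracy over k folds."""
    scores = []
    for held_out in k_fold_indices(len(labels), k):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k

# Hypothetical baseline "model": always predict the most common training label.
def fit_majority(train_x, train_y):
    return max(set(train_y), key=train_y.count)

def predict_majority(model, x):
    return model

labels = [1] * 16 + [0] * 4        # toy data: 80% of samples labelled "improved"
features = [[0.0]] * len(labels)   # features are ignored by this baseline
accuracy = cross_val_accuracy(features, labels, fit_majority, predict_majority)
```

On this toy data the baseline scores 0.8 without using any feature, which is why a cross-validated accuracy is easiest to interpret alongside the class balance of each data set.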

Supervised Machine Learning

Decision Tree

For implementation of the decision tree classification algorithm, we used the tree.DecisionTreeClassifier [78] from the sklearn library [79] of Python [80] to build models for the 4 distinct data sets. Among the 4 models, the highest accuracy (87.85%; average of fivefold cross-validation score) was obtained for the Brushes teeth data set. These models were implemented in Python’s Jupyter Notebook [81].

Logistic Regression

For implementation of the classification model, we used the LogisticRegression class [82] from the sklearn library [79] of Python [80] to build 4 predictive models from the “daily living skills” milestone parameter. We calculated the accuracy of the model based on the average fivefold cross-validation score, with accuracies for Brushes teeth, Asks to use toilet, Urinates in toilet or potty, and Buttons large buttons in front, in correct buttonholes being 95.00%, 77.35%, 84.98%, and 71.55%, respectively.

K-Nearest Neighbor

We implemented this model in Python Jupyter Notebook using the KNeighborsClassifier [83] from the sklearn library [79] of Python [80]. In this algorithm, the choice of the K-value is key to the model’s performance. For this reason, to relate the K-value to the testing accuracy, we plotted the accuracy against a range of K-values for every data set (Figure 3). From this graphical representation, we can pick a K-value that gives good accuracy for each data set. For example, from Figure 3A, we chose K=5 for the Brushes teeth data set and applied it in the KNeighborsClassifier [83], which yielded a model accuracy of 95.00% (average fivefold cross-validation score). For the other data sets, we similarly chose the K-value from the graphical representation in Figure 3 and obtained satisfactory accuracy (details of outcomes are described in the “Results” section).
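The K-value selection described above can be illustrated with a small, dependency-free KNN that uses the Euclidean distance and reports test accuracy for several K-values. The points and labels are hypothetical (one training label is deliberately noisy); the study itself used KNeighborsClassifier.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_x, train_y, query, k):
    """Majority label among the k training points closest to the query."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: euclidean(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy_for_k(train_x, train_y, test_x, test_y, k):
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_y)

# Hypothetical scaled features; the last training point carries a noisy label.
train_x = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15],
           [0.8, 0.9], [0.9, 0.8], [0.85, 0.85], [0.3, 0.3]]
train_y = [0, 0, 0, 1, 1, 1, 1]
test_x = [[0.25, 0.25], [0.8, 0.8]]
test_y = [0, 1]

scores = {k: accuracy_for_k(train_x, train_y, test_x, test_y, k) for k in (1, 3, 5)}
print(scores)
```

Here K=1 is misled by the single noisy neighbor, whereas K=3 and K=5 classify both test points correctly; plotting such scores against K is the kind of curve shown in Figure 3.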

Figure 3. Graphical representation for calculating the best K-value against the test accuracy for the data sets. KNN: K-nearest neighbor.
Artificial Neural Network

We have used the keras.Sequential [84] model from the TensorFlow [85] library to build the models. Figure 4 shows the confusion matrix for the 4 data sets using ANN. Table 4 shows the ANN model’s overall classification report for all data sets.
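For readers unfamiliar with this architecture, the following plain-Python sketch shows the forward pass of a network with a “relu” hidden layer and a “sigmoid” output unit, the two activation functions named in the Introduction. The layer sizes and weights are hypothetical; the number of layers, units, and training settings of the actual keras.Sequential models are not given in this section.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each unit."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(features, hidden_w, hidden_b, output_w, output_b):
    """Hidden layer with relu, then a single sigmoid output in (0, 1)."""
    hidden = dense(features, hidden_w, hidden_b, relu)
    return dense(hidden, output_w, output_b, sigmoid)[0]

# Hypothetical weights for a 2-input, 2-hidden-unit, 1-output network.
hidden_w = [[1.0, -1.0], [-1.0, 1.0]]
hidden_b = [0.0, 0.0]
output_w = [[2.0, -2.0]]
output_b = [0.0]

probability = forward([0.9, 0.1], hidden_w, hidden_b, output_w, output_b)
predicted_label = 1 if probability >= 0.5 else 0
```

The sigmoid output can be read as the estimated probability of the positive class, and thresholding it at 0.5 yields the binary label that a confusion matrix then tallies.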

Figure 4. Confusion Matrix for all the Datasets.
Table 4. The artificial neural network model’s overall classification report for all data sets.

| Data set and classification report | Precision | Recall | F1 score | Support |
| --- | --- | --- | --- | --- |
| Brushes teeth | | | | |
| Macro average | 0.50 | 0.48 | 0.49 | 42 |
| Weighted average | 1.00 | 0.95 | 0.98 | 42 |
| Urinates in toilet or potty | | | | |
| Macro average | 0.46 | 0.50 | 0.48 | 34 |
| Weighted average | 0.83 | 0.91 | 0.87 | 34 |
| Asks to use toilet | | | | |
| Macro average | 0.42 | 0.50 | 0.46 | 32 |
| Weighted average | 0.71 | 0.84 | 0.77 | 32 |
| Buttons large buttons in front, in correct buttonholes | | | | |
| Macro average | 0.88 | 0.56 | 0.53 | 33 |
| Weighted average | 0.82 | 0.76 | 0.68 | 33 |

aN/A: not applicable.
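The difference between the macro and weighted averages in a classification report can be illustrated with the short sketch below. The macro average gives every class equal weight, whereas the weighted average weights each class by its support. The labels and predictions are hypothetical and are not reconstructed from Figure 4.

```python
def recall_and_support(y_true, y_pred, label):
    """Recall for one class and the number of true samples in that class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    support = tp + fn
    return (tp / support if support else 0.0), support

def averaged_recall(y_true, y_pred):
    """Return (macro, weighted) recall over the classes present in y_true."""
    rows = [recall_and_support(y_true, y_pred, label)
            for label in sorted(set(y_true))]
    macro = sum(r for r, _ in rows) / len(rows)
    weighted = sum(r * s for r, s in rows) / sum(s for _, s in rows)
    return macro, weighted

# Hypothetical imbalanced test set: 38 positives, 4 negatives,
# and a classifier that predicts the positive class for everyone.
y_true = [1] * 38 + [0] * 4
y_pred = [1] * 42
macro_recall, weighted_recall = averaged_recall(y_true, y_pred)
print(round(macro_recall, 2), round(weighted_recall, 2))
```

In this toy case the weighted recall is about 0.90 while the macro recall is 0.50, because the minority class is never predicted; a large gap between the two averages is therefore a cue to inspect per-class performance.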

In this study, we implemented 4 supervised machine learning algorithms to build predictive models for the “daily living skill” milestone parameter of children with ASD based on their demography. A summary of the results for different machine learning algorithms for predicting this milestone parameter is presented in Table 5.

We validated the models’ results by the fivefold cross-validation score. From Table 5, we can conclude that, based on the demography, the Brushes teeth data set had the highest accuracy among the “daily living skills” data sets in all machine learning–based models. The ANN performed well among the machine learning algorithms studied. In conclusion, if an automated system is to be developed to predict the development of the “daily living skill” milestone parameter based on the demography, then, from this study’s outcome, we can recommend a system based on a machine learning algorithm, especially ANN.

We validated the performance of our classifiers by receiver operating characteristic–area under the curve (ROC–AUC) [86] scores (Table 6), where a score of 1 indicates a perfect classifier. Rice and Harris [87] suggested that, in applied psychology and prediction models of future behavior, ROC–AUC values of 0.70 or higher would be considered to have strong effects. The average ROC–AUC scores (from 4 parameters) of the decision tree, logistic regression, KNN, and ANN were 0.84, 0.86, 0.76, and 0.83, respectively (Table 6). The ROC curves of these classifiers are presented in Multimedia Appendices 6-9.
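As a reminder of what the ROC–AUC measures, the sketch below computes it through its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted probabilities for 3 positive and 2 negative samples.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
auc = roc_auc(y_true, scores)
print(round(auc, 2))
```

Five of the six positive/negative pairs are ordered correctly here, giving an AUC of about 0.83; a value of 0.5 corresponds to random ranking and 1 to a perfect ranking.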

Table 5. Summary of the accuracy of all prediction models based on demography for "daily living skills."

| Parameter types | Decision tree (fivefold cross-validation score) | Logistic regression (fivefold cross-validation score) | K-nearest neighbor (fivefold cross-validation score) | Artificial neural network |
| --- | --- | --- | --- | --- |
| Brushes teeth | 87.85% | 95.00% | 95.00% (K=5) | 95.00% |
| Asks to use toilet | 71.64% | 77.35% | 84.00% (K=13) | 84.00% |
| Urinates in toilet or potty | 72.52% | 84.98% | 85.02% (K=5) | 91.00% |
| Buttons large buttons in front, in correct buttonholes | 73.46% | 71.55% | 66.88% (K=5) | 76.00% |

Table 6. Summary of receiver operating characteristic–area under the curve for all prediction models based on demography for "daily living skills."
Parameter typesDecision treeLogistic regressionK-nearest neighborArtificial neural network
Brushes teeth0.680.910.650.80
Asks to use toilet0.950.770.770.76
Urinates in toilet or potty0.780.890.860.91
Buttons large buttons in front, in correct buttonholes0.940.860.750.84
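
To make the ROC–AUC figures concrete, the sketch below computes AUC as the fraction of positive/negative pairs that a classifier ranks correctly, then averages the per-parameter values transcribed from Table 6. The function names and the toy scores are assumptions for illustration; the study itself does not describe its implementation at this level of detail.

```python
from decimal import Decimal, ROUND_HALF_UP


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # A correctly ordered pair counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy scores (hypothetical): 3 of the 4 positive/negative pairs are ordered correctly.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)  # 0.75

# ROC-AUC values transcribed from Table 6, in row order.
table6 = {
    "Decision tree": ["0.68", "0.95", "0.78", "0.94"],
    "Logistic regression": ["0.91", "0.77", "0.89", "0.86"],
    "K-nearest neighbor": ["0.65", "0.77", "0.86", "0.75"],
    "Artificial neural network": ["0.80", "0.76", "0.91", "0.84"],
}


def mean_2dp(values):
    """Exact decimal mean, rounded half up to 2 decimal places."""
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


mean_auc = {model: mean_2dp(values) for model, values in table6.items()}
for model, value in mean_auc.items():
    print(model, value)
```

Decimal arithmetic is used so that values such as 0.8375 round half up without binary floating-point surprises. The resulting means (0.84, 0.86, 0.76, and 0.83) agree with the averages reported in the text.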

Discussion

Principal Findings

This study reports 3 main evidence-based findings on how children with ASD progress in the milestone categories in relation to their demographic characteristics.

Finding 1

Among the 4 major milestone categories, “daily living skills” had the highest improvement level. This suggests that caregivers and care practitioners place greater emphasis on developing the daily living skills of children with ASD so that they can live independently without requiring help from others.

Finding 2

The demographic characteristics of children with ASD are associated with the development of their milestone parameters. Figure 5 summarizes the demographic factors related to development in the “daily living skills” parameter, where “score_at_end”=1 denotes that the child reached the final improvement point. Family income or expenditure in the middle range (Figure 5A) was associated with better development. In addition, children from nuclear families (Figure 5B) with a small number of siblings (Figure 5H) living in urban areas (Figure 5J) showed a higher improvement rate. Parental age was also an important factor: in general, children of middle-aged parents (aged 25-45 years) showed better development (Figure 5C and 5D). Parental occupation and education also mattered: our results show that children whose mother works in the home (Figure 5E) and is well educated (Figure 5I) and whose father is employed (Figure 5F) achieved significant development. Lastly, gender was another significant demographic factor in our study, with male children showing far better development than female children (Figure 5G).
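
One way such per-category comparisons could be tabulated is sketched below: for each level of a demographic factor, count the share of children with “score_at_end”=1. The records and the field names `family_type` and `area` are hypothetical and exist only to make the example runnable; they are not the study data and do not reproduce its results.

```python
from collections import defaultdict


def improvement_rate_by(records, factor):
    """Share of records with score_at_end == 1 for each level of `factor`."""
    counts = defaultdict(lambda: [0, 0])  # level -> [improved, total]
    for record in records:
        counts[record[factor]][0] += record["score_at_end"]
        counts[record[factor]][1] += 1
    return {level: improved / total for level, (improved, total) in counts.items()}


# Hypothetical toy records (NOT the study data).
records = [
    {"family_type": "nuclear", "area": "urban", "score_at_end": 1},
    {"family_type": "nuclear", "area": "urban", "score_at_end": 1},
    {"family_type": "nuclear", "area": "rural", "score_at_end": 1},
    {"family_type": "nuclear", "area": "rural", "score_at_end": 0},
    {"family_type": "joint", "area": "urban", "score_at_end": 1},
    {"family_type": "joint", "area": "urban", "score_at_end": 0},
    {"family_type": "joint", "area": "rural", "score_at_end": 0},
    {"family_type": "joint", "area": "rural", "score_at_end": 0},
]

rates_by_family = improvement_rate_by(records, "family_type")
rates_by_area = improvement_rate_by(records, "area")
print(rates_by_family)
print(rates_by_area)
```

Rates like these describe association only; they do not by themselves show that a demographic factor causes better or worse milestone development.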

Figure 5. Summary of the importance of demographic factors in the milestone parameter development of children with ASD.

Finding 3

We implemented 4 supervised machine learning algorithms to predict the “daily living skills” improvement level of children with ASD based on their demographic characteristics. Among the 4 algorithms, the ANN performed best, with an average accuracy above 80% on the same data set used for the other algorithms. We therefore recommend the ANN for developing a demography-based prediction tool to support the intervention or treatment process for children with ASD.
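
The average accuracy per model can be checked directly from the values in Table 5. The sketch below is only an arithmetic check on the published figures; the variable and function names are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP

# Accuracy values (%) transcribed from Table 5, in row order.
table5 = {
    "Decision tree": ["87.85", "71.64", "72.52", "73.46"],
    "Logistic regression": ["95.00", "77.35", "84.98", "71.55"],
    "K-nearest neighbor": ["95.00", "84.00", "85.02", "66.88"],
    "Artificial neural network": ["95.00", "84.00", "91.00", "76.00"],
}


def mean_accuracy(values):
    """Exact decimal mean, rounded half up to 2 decimal places."""
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


means = {model: mean_accuracy(values) for model, values in table5.items()}
best_model = max(means, key=means.get)

for model, value in means.items():
    print(f"{model}: {value}%")
print("Highest mean accuracy:", best_model)
```

On these figures the ANN has the highest mean accuracy (86.50%), followed by K-nearest neighbor (82.73%), logistic regression (82.22%), and the decision tree (76.37%).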


Limitations

Although we achieved satisfactory results and report important findings in this study, our data set has some limitations. First, the data are scattered, which makes it challenging to find patterns for analysis; nevertheless, we achieved good accuracy with this data set. Collecting more data could help resolve this problem. Second, although some studies have been conducted in this area, real-world data sets remain very rare. Therefore, we could not compare our results and findings with those of other studies and data sets.

Comparison With Prior Work

Most mental health research on ASD concerns identification or recognition and symptom analysis [88]. In this study, we implemented machine learning models to predict the improvement level of children with ASD based on their demographic characteristics. A few related studies have been performed in this area, and they are described below.

Scheer et al [38] built a clinical model to predict proximal junctional kyphosis and proximal junctional failure. They used baseline demographic, radiographic, and surgical factors from 510 patients to build the prediction model. The model’s overall accuracy was 86.3%, which is of great significance for caregiving decision making, risk analysis, and risk prediction before surgery. To build this model, they used the decision tree machine learning algorithm with 5 different bootstrapped models. The work could have been strengthened by comparing more than 1 machine learning model for the prediction.

Tariq et al [2] performed another machine learning–based study to detect developmental delay in patients with autism, using home videos of Bangladeshi children to train and validate the model. The main objective of their study was to determine “risk scores” for autism. Using a 2-classification layer neural network, they achieved 85% accuracy in predicting developmental delay. This work is valuable not only for predicting developmental delay but also for the early, remote detection of autism. However, the authors trained the model on a US data set and achieved only low accuracy when applying it to the Bangladeshi data set; thus, the model did not account for cultural differences.

To evaluate the ADDM status of children, Maenner et al [39] developed a machine learning–based model using the words and phrases in children’s developmental evaluations. The model was built with a random forest classifier using the 2008 Georgia data set, which contains data on 1162 children. With 86.5% accuracy, the algorithm differentiated between children who do and do not meet ASD surveillance criteria. As with Scheer et al [38], this work would have been more in-depth had more than 1 machine learning algorithm been used to build the model.

Nowell et al [15] summarized in their review that patients’ demographic characteristics influence the identification of ASD. The main finding of their study was that “myriad demographic factors influence the diagnosis of ASD.” Their review indicates that demographic factors, including race, socioeconomic status, ethnicity, and parental education, are important in ASD diagnosis. However, most of the studies reviewed were based on children in the United States.

Numerous studies have been performed to detect ASD by both supervised [2,89-97] and unsupervised [98,99] machine learning methods. In these studies, supervised machine learning has mainly been used for the detection of ASD through behavioral or neuroimaging data, whereas unsupervised machine learning has been deployed for predicting ASD assessment. Among supervised methods, logistic regression, KNN, neural network, convolutional neural network, naive Bayes, support vector machine, and rule-based machine learning models have been used to detect ASD. Raj and Masood [89] deployed several supervised machine learning models with 3 nonclinical ASD data sets to predict and analyze ASD. Feature selection–based machine learning has been used to detect ASD with accuracy greater than 90% [90]. Tariq et al [2,91] used home videos of Bangladeshi children with ASD in supervised machine learning to detect their speech and language problems. Küpper et al [92] used clinical behavioral features in a support vector machine to detect ASD in adolescents and adults. In addition, rule-based classification [93] and approaches such as decision trees, random forest, and linear discriminant analysis [94-97] have been used to detect ASD. By contrast, unsupervised machine learning has been used for predicting ASD assessment or analyzing ASD problems in children [98,99].

Comparison With Our Study

Most prior work on children with ASD has concerned generalized development, whereas in this study we developed prediction models for specific milestone parameters of development in children with ASD. Unlike the previous studies described above, we validated the prediction results for each specific milestone parameter with more than 1 machine learning algorithm. Our study also used a data set from a single cultural and demographic context (Bangladesh) for both training and testing the models, which helps the models produce accurate results.


Conclusions

This study has 3 significant implications for the mental health development of children with ASD in low- and middle-income countries such as Bangladesh. First, we evaluated the improvement in milestone parameters of children with ASD in the mCARE project; “daily living skills” and “motor skills” improved significantly after the mCARE tools were deployed. Second, we developed 4 supervised machine learning models based on the demographic information of children with ASD to predict their “daily living skills” development. Comparing the accuracy of the algorithms, we conclude that the ANN with 1 hidden layer provides the most appropriate prediction of improvement in the “daily living skills” of children with ASD. Third, using the supervised and unsupervised algorithms, we identified important demographic characteristics that can affect the improvement level of children with ASD. Accurate prediction tools built on this study’s findings could advance mental health care, especially for the development of children with ASD.


Acknowledgments

This study has been partially supported by an NIH grant (1R21MH116726-01). The authors are thankful to 4 specialist autism health care centers and institutions in Bangladesh: The Institute of Pediatric Neuro-disorder & Autism (IPNA) Bangladesh, The National Institute of Mental Health (NIMH), Autism Welfare Foundation (AWF), and Nishpap Autism Foundation and their respective departments for their continuous support throughout this study.

Conflicts of Interest

None declared.

Multimedia Appendix 1

Data set features: 18 features about the participant’s demographic for this study.

DOCX File, 13 KB

Multimedia Appendix 2

Best 10 features from “daily living skill” by univariate selection method.

DOCX File, 13 KB

Multimedia Appendix 3

Top 10 Features from “Daily Living Skills” by Feature Importance method.

PNG File, 43 KB

Multimedia Appendix 4

Correlation Matrix with Heatmap for “Daily Living Skill” Dataset.

PNG File, 4390 KB

Multimedia Appendix 5

Cluster Analysis by “Elbow Method” for the Selected Features of “Daily Living Skills”.

PNG File, 299 KB

Multimedia Appendix 6

ROC curve for four parameters of “Daily Living Skills” by “Decision Tree” model.

PNG File, 205 KB

Multimedia Appendix 7

ROC curve for four parameters of “Daily Living Skills” by “Logistic Regression” model.

PNG File, 204 KB

Multimedia Appendix 8

ROC curve for four parameters of “Daily Living Skills” by “K-Nearest Neighbor” model.

PNG File, 200 KB

Multimedia Appendix 9

ROC curve for four parameters of “Daily Living Skills” by “Artificial Neural Network” model.

PNG File, 211 KB


References

  1. Wallace GL, Kenworthy L, Pugliese CE, Popal HS, White EI, Brodsky E, et al. Real-World Executive Functions in Adults with Autism Spectrum Disorder: Profiles of Impairment and Associations with Adaptive Functioning and Co-morbid Anxiety and Depression. J Autism Dev Disord 2016 Mar 16;46(3):1071-1083 [FREE Full text] [CrossRef] [Medline]
  2. Tariq Q, Fleming SL, Schwartz JN, Dunlap K, Corbin C, Washington P, et al. Detecting Developmental Delay and Autism Through Machine Learning Models Using Home Videos of Bangladeshi Children: Development and Validation Study. J Med Internet Res 2019 Apr 24;21(4):e13822 [FREE Full text] [CrossRef] [Medline]
  3. Kanner L. Autistic disturbances of affective contact. Acta Paedopsychiatr 1968;35(4):100-136. [Medline]
  4. Sealey L, Hughes B, Sriskanda A, Guest J, Gibson A, Johnson-Williams L, et al. Environmental factors in the development of autism spectrum disorders. Environ Int 2016 Mar;88:288-298. [CrossRef] [Medline]
  5. Hisle-Gorman E, Susi A, Stokes T, Gorman G, Erdie-Lalena C, Nylund CM. Prenatal, perinatal, and neonatal risk factors of autism spectrum disorder. Pediatr Res 2018 Aug;84(2):190-198. [CrossRef] [Medline]
  6. DiGuiseppi CG, Daniels JL, Fallin DM, Rosenberg SA, Schieve LA, Thomas KC, et al. Demographic profile of families and children in the Study to Explore Early Development (SEED): Case-control study of autism spectrum disorder. Disabil Health J 2016 Jul;9(3):544-551 [FREE Full text] [CrossRef] [Medline]
  7. Speaks A. Autism Statistics and Facts. autism speaks.   URL: [accessed 2021-03-18]
  8. Happé FG, Mansour H, Barrett P, Brown T, Abbott P, Charlton RA. Demographic and Cognitive Profile of Individuals Seeking a Diagnosis of Autism Spectrum Disorder in Adulthood. J Autism Dev Disord 2016 Nov;46(11):3469-3480. [CrossRef] [Medline]
  9. Cromer J. Autism: fastest-growing developmental disability. Autism: U.S. ARMY; 2018.   URL: [accessed 2021-03-17]
  10. Baio J, Wiggins L, Christensen DL, Maenner MJ, Daniels J, Warren Z, et al. Prevalence of Autism Spectrum Disorder Among Children Aged 8 Years - Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2014. MMWR Surveill Summ 2018 Apr 27;67(6):1-23 [FREE Full text] [CrossRef] [Medline]
  11. Hossain MD, Ahmed HU, Jalal Uddin MM, Chowdhury WA, Iqbal MS, Kabir RI, et al. Autism Spectrum disorders (ASD) in South Asia: a systematic review. BMC Psychiatry 2017 Aug 01;17(1):281 [FREE Full text] [CrossRef] [Medline]
  12. Baghdadli A, Pascal C, Grisi S, Aussilloux C. Risk factors for self-injurious behaviours among 222 young children with autistic disorders. J Intellect Disabil Res 2003 Nov;47(Pt 8):622-627. [CrossRef] [Medline]
  13. De Giacomo A, Fombonne E. Parental recognition of developmental abnormalities in autism. European Child & Adolescent Psychiatry 1998 Oct 12;7(3):131-136. [CrossRef]
  14. Fombonne E. The epidemiology of autism: a review. Psychol Med 1999 Jul;29(4):769-786. [CrossRef] [Medline]
  15. Nowell KP, Brewton CM, Allain E, Mire SS. The Influence of Demographic Factors on the Identification of Autism Spectrum Disorder: A Review and Call for Research. Rev J Autism Dev Disord 2015 Jul 10;2(3):300-309. [CrossRef]
  16. Zwaigenbaum L, Bryson S, Garon N. Early identification of autism spectrum disorders. Behav Brain Res 2013 Aug 15;251:133-146. [CrossRef] [Medline]
  17. Harris S, Handleman J. Age and IQ at intake as predictors of placement for young children with autism: a four- to six-year follow-up. J Autism Dev Disord 2000 Apr;30(2):137-142. [CrossRef] [Medline]
  18. Turner LM, Stone WL, Pozdol SL, Coonrod EE. Follow-up of children with autism spectrum disorders from age 2 to age 9. Autism 2006 May;10(3):243-265. [CrossRef] [Medline]
  19. Webb S, Jones E. Early Identification of Autism: Early Characteristics, Onset of Symptoms, and Diagnostic Stability. Infants Young Child 2009;22(2):100-118 [FREE Full text] [CrossRef] [Medline]
  20. Filipek PA, Accardo PJ, Ashwal S, Baranek GT, Cook EH, Dawson G, et al. Practice parameter: screening and diagnosis of autism: report of the Quality Standards Subcommittee of the American Academy of Neurology and the Child Neurology Society. Neurology 2000 Aug 22;55(4):468-479. [CrossRef] [Medline]
  21. Lord C, Volkmar F. Genetics of childhood disorders: XLII. Autism, part 1: Diagnosis and assessment in autistic spectrum disorders. J Am Acad Child Adolesc Psychiatry 2002 Sep;41(9):1134-1136. [CrossRef] [Medline]
  22. Zwaigenbaum L, Bauman ML, Stone WL, Yirmiya N, Estes A, Hansen RL, et al. Early Identification of Autism Spectrum Disorder: Recommendations for Practice and Research. Pediatrics 2015 Oct;136 Suppl 1:S10-S40. [CrossRef] [Medline]
  23. Bauman ML. Medical comorbidities in autism: challenges to diagnosis and treatment. Neurotherapeutics 2010 Jul;7(3):320-327 [FREE Full text] [CrossRef] [Medline]
  24. Sowa M, Meulenbroek R. Effects of physical exercise on Autism Spectrum Disorders: A meta-analysis. Research in Autism Spectrum Disorders 2012 Jan;6(1):46-57. [CrossRef]
  25. Courchesne E, Carper R, Akshoomoff N. Evidence of brain overgrowth in the first year of life in autism. JAMA 2003 Jul 16;290(3):337-344. [CrossRef] [Medline]
  26. Ursano R, Bell C, Eth S, Friedman M, Norwood A, Pfefferbaum B, Work Group on ASDPTSD, Steering Committee on Practice Guidelines. Practice guideline for the treatment of patients with acute stress disorder and posttraumatic stress disorder. Am J Psychiatry 2004 Nov;161(11 Suppl):3-31. [Medline]
  27. Dyches TT, Wilder LK, Sudweeks RR, Obiakor FE, Algozzine B. Multicultural Issues in Autism. J Autism Dev Disord 2004 Apr;34(2):211-222. [CrossRef]
  28. Mandell DS, Listerud J, Levy SE, Pinto-Martin JA. Race differences in the age at diagnosis among medicaid-eligible children with autism. J Am Acad Child Adolesc Psychiatry 2002 Dec;41(12):1447-1453. [CrossRef] [Medline]
  29. Ravindran N, Myers BJ. Cultural Influences on Perceptions of Health, Illness, and Disability: A Review and Focus on Autism. J Child Fam Stud 2011 May 12;21(2):311-319. [CrossRef]
  30. Thomas P, Zahorodny W, Peng B, Kim S, Jani N, Halperin W, et al. The association of autism diagnosis with socioeconomic status. Autism 2012 Mar;16(2):201-213. [CrossRef] [Medline]
  31. Sathyabama R. Clinical characteristics and demographic profile of children with Autism Spectrum Disorder (ASD) at child development clinic (CDC), Penang Hospital, Malaysia. Med J Malaysia 2019 Oct;74(5):372-376 [FREE Full text] [Medline]
  32. Courchesne E, Gazestani VH, Lewis NE. Prenatal Origins of ASD: The When, What, and How of ASD Development. Trends Neurosci 2020 May;43(5):326-342 [FREE Full text] [CrossRef] [Medline]
  33. Siller M, Reyes N, Hotez E, Hutman T, Sigman M. Longitudinal change in the use of services in autism spectrum disorder: understanding the role of child characteristics, family demographics, and parent cognitions. Autism 2014 May;18(4):433-446. [CrossRef] [Medline]
  34. Decision Trees for Classification: A Machine Learning Algorithm. Resources X. 2017 Sep 7.   URL: [accessed 2021-03-20]
  35. Molnar C. Logistic Regression. Interpretable Machine Learning.   URL: [accessed 2021-03-20]
  36. Logistic regression. Wikipedia.   URL: [accessed 2021-03-19]
  37. Euclidean distance. Wikipedia.   URL: [accessed 2021-03-20]
  38. Scheer J, Osorio J, Smith J, Schwab F, Lafage V, Hart R, International Spine Study Group. Development of Validated Computer-based Preoperative Predictive Model for Proximal Junction Failure (PJF) or Clinically Significant PJK With 86% Accuracy Based on 510 ASD Patients With 2-year Follow-up. Spine (Phila Pa 1976) 2016 Nov 15;41(22):E1328-E1335. [CrossRef] [Medline]
  39. Maenner MJ, Yeargin-Allsopp M, Van Naarden Braun K, Christensen DL, Schieve LA. Development of a Machine Learning Algorithm for the Surveillance of Autism Spectrum Disorder. PLoS One 2016;11(12):e0168224 [FREE Full text] [CrossRef] [Medline]
  40. Crais ER, Watson LR, Baranek GT, Reznick JS. Early identification of autism: how early can we go? Semin Speech Lang 2006 Aug;27(3):143-160. [CrossRef] [Medline]
  41. Chakrabarti S. Early identification of autism. Indian Pediatr 2009 May;46(5):412-414 [FREE Full text] [Medline]
  42. Barbaro J, Halder S. Early Identification of Autism Spectrum Disorder: Current Challenges and Future Global Directions. Curr Dev Disord Rep 2016 Feb 20;3(1):67-74. [CrossRef]
  43. Eaves LC, Ho HH. The very early identification of autism: outcome to age 4 1/2-5. J Autism Dev Disord 2004 Aug;34(4):367-378. [CrossRef] [Medline]
  44. Guthrie W, Swineford L, Nottke C, Wetherby A. Early diagnosis of autism spectrum disorder: stability and change in clinical diagnosis and symptom presentation. J Child Psychol Psychiatry 2013 May;54(5):582-590 [FREE Full text] [CrossRef] [Medline]
  45. Hsiao Y. Autism Spectrum Disorders: Family Demographics, Parental Stress, and Family Quality of Life. Journal of Policy and Practice in Intellectual Disabilities 2018 Mar 09;15(1):70-79. [CrossRef]
  46. Green SA, Carter AS. Predictors and course of daily living skills development in toddlers with autism spectrum disorders. J Autism Dev Disord 2014 Feb;44(2):256-263 [FREE Full text] [CrossRef] [Medline]
  47. Estes A, Munson J, Dawson G, Koehler E, Zhou X, Abbott R. Parenting stress and psychological functioning among mothers of preschool children with autism and developmental delay. Autism 2009 Jul;13(4):375-387 [FREE Full text] [CrossRef] [Medline]
  48. Estes A, Olson E, Sullivan K, Greenson J, Winter J, Dawson G, et al. Parenting-related stress and psychological distress in mothers of toddlers with autism spectrum disorders. Brain Dev 2013 Feb;35(2):133-138 [FREE Full text] [CrossRef] [Medline]
  49. Drahota A, Wood JJ, Sze KM, Van Dyke M. Effects of cognitive behavioral therapy on daily living skills in children with high-functioning autism and concurrent anxiety disorders. J Autism Dev Disord 2011 Mar;41(3):257-265 [FREE Full text] [CrossRef] [Medline]
  50. U.S. Department of Health and Human Services U. National Institutes of Health (NIH). NIH.   URL: [accessed 2021-03-18]
  51. Ministry of Health and Family Welfare (MoHFW). National Institute of Mental Health and Hospital. Facility Registry- Government of People's Republic of Bangladesh Ministry of Health and Family Welfare.   URL: [accessed 2021-03-18]
  52. Institute For Peadiatric Neurodisorder And Autism in BSMMU. (IPNA) IoPNA. 2019.   URL: [accessed 2021-03-18]
  53. Global T. Nishpap Autism Foundation. Onsite Training with Nishpap Autism Foundation in Chattogram. 2018.   URL: https:/​/www.​​globalblog/​onsite-training-with-nishpap-autism-foundation-in-chattogram-bangladesh/​ [accessed 2021-03-18]
  54. Awfbd. Working for the Brighter Future of Person with Autism. Autism Welfare Foundation.   URL: [accessed 2021-03-18]
  55. MathsIsFun. Confidence Intervals. Math Fun Advanced.   URL: [accessed 2021-03-18]
  56. Roy B. All about Feature Scaling. Towards Data Science.   URL: [accessed 2021-03-19]
  57. Brownlee J. How to Use StandardScaler and MinMaxScaler Transforms in Python. Machine Learning Mastery.   URL: [accessed 2021-03-19]
  58. Chong J. What Is Feature Scaling & Why Is it Important in Machine Learning?   URL: https://towardsdatascience.com/what-is-feature-scaling-why-is-it-important-in-machine-learning-2854ae877048 [accessed 2021-03-18]
  59. Shaikh R. Feature Selection Techniques in Machine Learning with Python. 2018.   URL: https://towardsdatascience.com/feature-selection-techniques-in-machine-learning-with-python-f24e7da3f36e [accessed 2021-03-18]
  60. Brewer JK, Hills JR. Univariate selection: The effects of size of correlation, degree of skew, and degree of restriction. Psychometrika 1969 Sep;34(3):347-361. [CrossRef]
  61. Altmann A, Toloşi L, Sander O, Lengauer T. Permutation importance: a corrected feature importance measure. Bioinformatics 2010 May 15;26(10):1340-1347. [CrossRef] [Medline]
  62. Zien A, Krämer N, Sonnenburg S, Rätsch G. The Feature Importance Ranking Measure. 2009 Sep 06 Presented at: Joint European Conference on Machine Learning and Knowledge Discovery in Databases; 2009; Berlin, Germany p. 694-709. [CrossRef]
  63. Hooker S, Erhan D, Kindermans P, Kim B. Evaluating feature importance estimates. 2019 Nov 05 Presented at: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019); 2019; Vancouver, BC, Canada.
  64. Gu Z, Eils R, Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 2016 Sep 15;32(18):2847-2849. [CrossRef] [Medline]
  65. Likas A, Vlassis N, J. Verbeek J. The global k-means clustering algorithm. Pattern Recognition 2003 Feb;36(2):451-461. [CrossRef]
  66. Elbow method (clustering). Wikipedia.   URL: [accessed 2021-03-21]
  67. Yeung KY, Ruzzo WL. Principal component analysis for clustering gene expression data. Bioinformatics 2001 Sep;17(9):763-774. [CrossRef] [Medline]
  68. Decision tree. Wikipedia.   URL: [accessed 2021-03-19]
  69. Wright R. Logistic regression. American Psychological Association 1995:2017-2244.
  70. Kleinbaum D, Dietz K, Gail M, Klein M, Klein M. Logistic Regression. New York, NY: Springer; 2010.
  71. Kramer O. K-Nearest Neighbors. Berlin, Germany: Springer; 2013:13-23.
  72. k-Nearest Neighbors Algorithm. Wikipedia.   URL: [accessed 2021-03-19]
  73. Wang S. Artificial Neural Network. Boston, MA: Springer; 2003:81-100.
  74. Artificial Neural Network. Wikipedia.   URL: [accessed 2021-03-19]
  75. Snijders T, Bosker R. Fundamentals of Artificial Neural Networks. London, UK: MIT press; 1999.
  76. Cross-Validation (statistics). Wikipedia.   URL: [accessed 2021-03-19]
  77. Rodriguez J, Perez A, Lozano J. Sensitivity Analysis of k-Fold Cross Validation in Prediction Error Estimation. IEEE Trans. Pattern Anal. Mach. Intell 2010 Mar;32(3):569-575. [CrossRef]
  78. A Decision Tree Classifier: Scikit Learn.   URL: [accessed 2021-03-20]
  79. Varoquaux G, Buitinck L, Louppe G, Grisel O, Pedregosa F, Mueller A. Scikit-learn. GetMobile: Mobile Comp. and Comm 2015 Jun;19(1):29-33. [CrossRef]
  80. Python Foundation. python.   URL: [accessed 2021-03-20]
  81. Jupyter.   URL: [accessed 2021-03-20]
  82. LogisticRegression.   URL: [accessed 2021-03-20]
  83. KNeighborsClassifier.   URL: [accessed 2021-03-20]
  84. Keras K. The Sequential Model. 2020.   URL: [accessed 2021-03-20]
  85. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C. TensorFlow: Large-scale machine learning on heterogeneous systems. arXiv 2016 Mar 14 [FREE Full text]
  86. Jin Huang, Ling C. Using AUC and accuracy in evaluating learning algorithms. IEEE Trans. Knowl. Data Eng 2005 Mar;17(3):299-310. [CrossRef]
  87. Rice ME, Harris GT. Comparing effect sizes in follow-up studies: ROC Area, Cohen's d, and r. Law Hum Behav 2005 Oct;29(5):615-620. [CrossRef] [Medline]
  88. Elsevier. Most Downloaded Research in Autism Spectrum Disorders Articles. 2021.   URL: [accessed 2021-03-18]
  89. Raj S, Masood S. Analysis and Detection of Autism Spectrum Disorder Using Machine Learning Techniques. Procedia Computer Science 2020;167:994-1004. [CrossRef]
  90. Kosmicki JA, Sochat V, Duda M, Wall DP. Searching for a minimal set of behaviors for autism detection through feature selection-based machine learning. Transl Psychiatry 2015 Mar 24;5(2):e514-e514 [FREE Full text] [CrossRef] [Medline]
  91. Tariq Q, Daniels J, Schwartz JN, Washington P, Kalantarian H, Wall DP. Mobile detection of autism through machine learning on home video: A development and prospective validation study. PLoS Med 2018 Nov 27;15(11):e1002705 [FREE Full text] [CrossRef] [Medline]
  92. Küpper C, Stroth S, Wolff N, Hauck F, Kliewer N, Schad-Hansjosten T, et al. Identifying predictive features of autism spectrum disorders in a clinical sample of adolescents and adults using machine learning. Sci Rep 2020 Mar 18;10(1):4805. [CrossRef] [Medline]
  93. Thabtah F, Peebles D. A new machine learning model based on induction of rules for autism detection. Health Informatics J 2020 Mar 29;26(1):264-286 [FREE Full text] [CrossRef] [Medline]
  94. Duda M, Ma R, Haber N, Wall DP. Use of machine learning for behavioral distinction of autism and ADHD. Transl Psychiatry 2016 Mar 09;6(2):e732 [FREE Full text] [CrossRef] [Medline]
  95. Hyde KK, Novack MN, LaHaye N, Parlett-Pelleriti C, Anden R, Dixon DR, et al. Applications of Supervised Machine Learning in Autism Spectrum Disorder Research: a Review. Rev J Autism Dev Disord 2019 Feb 19;6(2):128-146. [CrossRef]
  96. Thabtah F. Machine learning in autistic spectrum disorder behavioral research: A review and ways forward. Inform Health Soc Care 2019 Oct 13;44(3):278-297. [CrossRef] [Medline]
  97. Eslami T, Mirjalili V, Fong A, Laird AR, Saeed F. ASD-DiagNet: A Hybrid Learning Approach for Detection of Autism Spectrum Disorder Using fMRI Data. Front Neuroinform 2019 Nov 27;13:70 [FREE Full text] [CrossRef] [Medline]
  98. Pratap A, Kanimozhiselvi C. Predictive assessment of autism using unsupervised machine learning models. IJAIP 2014;6(2):113. [CrossRef]
  99. Stevens E, Dixon DR, Novack MN, Granpeesheh D, Smith T, Linstead E. Identification and analysis of behavioral phenotypes in autism spectrum disorder via unsupervised machine learning. Int J Med Inform 2019 Sep;129:29-36 [FREE Full text] [CrossRef] [Medline]

Abbreviations

ADDM: Autism and Developmental Disabilities Monitoring
ANN: artificial neural network
ASD: autism spectrum disorder
AUC: area under the curve
AWF: Autism Welfare Foundation
IPNA: Institute of Pediatric Neuro-disorder & Autism
KNN: K-nearest neighbor
NIH: National Institutes of Health
NIMH: National Institute of Mental Health
ROC: receiver operating characteristic

Edited by G Eysenbach; submitted 30.03.21; peer-reviewed by A Das, M Elbattah, V Jain; comments to author 22.04.21; revised version received 10.05.21; accepted 12.05.21; published 08.06.21


©Munirul M Haque, Masud Rabbani, Dipranjan Das Dipal, Md Ishrak Islam Zarif, Anik Iqbal, Amy Schwichtenberg, Naveen Bansal, Tanjir Rashid Soron, Syed Ishtiaque Ahmed, Sheikh Iqbal Ahamed. Originally published in JMIR Medical Informatics, 08.06.2021.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.