Published in Vol 9, No 11 (2021): November

Machine Learning Applications in Mental Health and Substance Use Research Among the LGBTQ2S+ Population: Scoping Review



1Centre for Addiction and Mental Health, Toronto, ON, Canada

2Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada

3Factor-Inwentash Faculty of Social Work, University of Toronto, Toronto, ON, Canada

4Sunnybrook Research Institute, University of Toronto, Toronto, ON, Canada

5Women’s College Research Institute, Toronto, ON, Canada

6Canadian Institutes of Health Research, Government of Canada, Ottawa, ON, Canada

7School of Pharmacy, Faculty of Science, University of Waterloo, Kitchener, ON, Canada

8Children’s Hospital of Eastern Ontario Research Institute, Ottawa, ON, Canada

Corresponding Author:

Anasua Kundu, MSc

Centre for Addiction and Mental Health

1000 Queen Street West

Toronto, ON, M6J 1H4


Phone: 1 6476326493


Background: A high risk of mental health or substance addiction issues among sexual and gender minority populations may have more nuanced characteristics that may not be easily discovered by traditional statistical methods.

Objective: This review aims to identify studies that used machine learning (ML) to investigate mental health or substance use concerns among the lesbian, gay, bisexual, transgender, queer or questioning, and two-spirit (LGBTQ2S+) population and to direct future research in this field.

Methods: The MEDLINE, Embase, PubMed, CINAHL Plus, PsycINFO, IEEE Xplore, and Summon databases were searched from November to December 2020. We included original studies that used ML to explore mental health or substance use among the LGBTQ2S+ population and excluded studies of genomics and pharmacokinetics. Two independent reviewers reviewed all papers and extracted data on general study findings, model development, and discussion of the study findings.

Results: We included 11 studies in this review, of which 82% (9/11) were on mental health and 18% (2/11) were on substance use concerns. All studies were published in 2019 or 2020, and most were conducted in the United States. Among mutually nonexclusive population categories, sexual minority men were the most commonly studied subgroup (5/11, 45%), whereas sexual minority women were studied the least (2/11, 18%). Studies were categorized into 3 major domains: web content analysis (6/11, 55%), prediction modeling (4/11, 36%), and imaging studies (1/11, 9%).

Conclusions: ML is a promising tool for capturing and analyzing hidden data on mental health and substance use concerns among the LGBTQ2S+ population. In addition to conducting more research on sexual minority women and on a wider range of mental health and substance use problems and outcomes, future research should explore newer environments, data sources, and intersections with various social determinants of health.

JMIR Med Inform 2021;9(11):e28962




Members of the lesbian, gay, bisexual, transgender, queer or questioning, and two-spirit (LGBTQ2S+) population experience significant mental health disparities and are at a higher risk of substance use problems compared with their heterosexual and cisgender peers [1-5]. A meta-analysis of 25 studies revealed that lesbian, gay, and bisexual individuals had a 2.47 times higher lifetime risk of attempting suicide, a 1.5 times higher risk of depression and anxiety disorders, and a 1.5 times higher risk of alcohol and other substance dependence over a 12-month period [2]. Statistics from the 2015 National Survey on Drug Use and Health in the United States showed that the sexual minority population has an increased likelihood of past year use of illicit drugs, marijuana, and opioids; current use of cigarettes and alcohol; and past year diagnosis of any mental illness compared with sexual majority groups [6]. Members of the LGBTQ2S+ population also use mental health services and substance use treatment more frequently than cisgender and heterosexual individuals [6,7].

There is a robust evidence base documenting sexual orientation and gender identity as social determinants of health, whereby members of the LGBTQ2S+ population experience stressors from stigma and from social and economic exclusion that contribute to increased mental health challenges and resultant coping strategies, including problematic substance use [8-10]. In addition, intersecting experiences of marginalization such as race, ethnicity, disability, and homelessness; lack of familial and peer support; various acts of bullying, harassment, and hate crimes; and experience of self-stigmatization, such as internalized homophobia, biphobia, and transphobia, contribute to further deterioration of mental health and substance use concerns [8,11-16].

With advances in technology, novel statistical methods, such as machine learning (ML), have emerged as promising means of analyzing a vast range of complex data in public health informatics [17,18]. ML uses computational power to identify or mine hidden data patterns and has been increasingly used for content analysis and as a predictive modeling technique [17]. These characteristics are particularly important for investigating mental health and substance use issues among the LGBTQ2S+ population, where social stigma and institutional barriers make sexual and gender identity disclosure difficult, rendering the data invisible [19-21].

There are 3 major types of ML: (1) supervised learning, (2) unsupervised learning, and (3) semisupervised learning. Supervised learning aims to learn from labeled data to predict the class of unlabeled input data or outcome variables [22]. Unsupervised learning does not require an outcome variable, thereby allowing the algorithm to freely detect and recognize hidden patterns with minimal human interference [22,23]. Semisupervised learning learns from both labeled and unlabeled data, where it can use readily available unlabeled data to improve supervised learning tasks when the labeled data are scarce or expensive [24]. A more advanced form of ML, deep learning, has gained popularity in health research in recent years and uses an artificial neural network model with multiple layers to hierarchically define and process data [25]. These ML methods provide the opportunity to understand data more thoroughly and effectively, as well as yield meaningful predictions beyond traditional statistical methods.
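The contrast between supervised and unsupervised learning can be illustrated with a minimal Python sketch. It uses synthetic one-dimensional values rather than data from any reviewed study, and the helper names (`fit_centroids`, `predict`, `two_means`) are hypothetical. The supervised part learns one centroid per known label and classifies a new value; the unsupervised part groups the same values into 2 clusters without seeing any labels.

```python
# Toy illustration only: synthetic 1-D values, not data from any reviewed study.
from statistics import mean


def fit_centroids(values, labels):
    """Supervised: learn one centroid (mean value) per known label."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: mean(members) for label, members in groups.items()}


def predict(centroids, value):
    """Assign the label whose centroid is closest to the new value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def two_means(values, iterations=10):
    """Unsupervised: 2-cluster k-means on unlabeled 1-D values.

    Assumes the values are not all identical, so both clusters stay non-empty.
    """
    low, high = min(values), max(values)  # simple deterministic initialization
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(near_low), mean(near_high)
    return sorted(near_low), sorted(near_high)


values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["low", "low", "low", "high", "high", "high"]

centroids = fit_centroids(values, labels)  # uses the labels
predicted = predict(centroids, 4.5)        # classifies an unseen value
clusters = two_means(values)               # never sees the labels
```

A semisupervised approach would combine the two ideas, for example by using cluster structure found in unlabeled values to refine centroids learned from a small labeled subset.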

Several reviews, including 3 recent systematic reviews, have been conducted to summarize the application of ML in substance use and mental health issues [23,26-28]. These systematic reviews have reported ML applications in 54 articles on mental health, 87 articles on suicidal behavior, and 17 articles on addiction research and reported good performance in predicting human behavior [23,26,28]. However, most of these reviews and studies focused on broad categories and the general population or patient records.


Although one scoping review has explored studies that predict population-specific health with ML [29], it did not identify ML applications among the LGBTQ2S+ population. There is a substantial gap in the literature, with no existing review focused on ML studies examining mental health and substance use among the LGBTQ2S+ population. We therefore conducted a scoping review to map the current status of ML studies in this field and to identify research gaps that can guide future research. Given the persistent mental health and problematic substance use concerns and disparities among the LGBTQ2S+ population, the findings from this review will provide useful insights to inform research and programs.

Objectives and Methodology Framework

This review aims to conduct a comprehensive search of studies using ML to investigate mental health or substance use among LGBTQ2S+ communities and to determine the scope of future research. We used the following 5-stage methodological framework developed by Arksey and O’Malley [30]: (1) identifying specific research questions; (2) identifying relevant studies through a comprehensive search of different sources; (3) study selection by applying inclusion and exclusion criteria; (4) data charting using custom-made data extraction forms; and (5) collating, summarizing, and reporting the results. We also used an extension of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines for scoping reviews [31] to present our findings, and the Joanna Briggs Institute proposed methodology of scoping reviews [32] to narrate the implications for future research. The review protocol was registered on the Open Science Framework [33] on December 17, 2020, to facilitate transparency and reproducibility of the study.

Identifying Research Questions

Initially, we identified a broad set of preliminary questions for this scoping review:

  • What is the volume of the literature that used machine learning analysis in the field of mental health and substance use among the LGBTQ2S+ population?
  • What are the fields of mental health and substance use among the LGBTQ2S+ population that have been studied by machine learning?
  • Which subgroups of the LGBTQ2S+ population have been investigated? Are there any specific subgroups that have been studied using machine learning analysis?
  • What types of machine learning methods (eg, supervised, unsupervised, semisupervised, and deep learning) and algorithms (eg, decision trees, random forest, logistic regression, and penalized regression) have been used to study LGBTQ2S+ mental health and substance use?
  • What are the real-world implications of these studies? Are there any knowledge gaps or untouched domains that should be addressed in future research?

Identifying Relevant Studies

To gather a large quantity of relevant literature, we followed previous review studies with similar objectives [27,29] and searched the following databases: MEDLINE (Ovid), Embase (Ovid), CINAHL Plus, APA PsycINFO (Ovid), PubMed, and IEEE Xplore. We also searched the Summon (ProQuest) database used by the University of Toronto Libraries, which searches across many other databases, journal packages, e-book collections, and other resources. Information technology databases such as IEEE Xplore were selected as a potential source of ML-related literature. Literature searches involved a combination of keywords (eg, mental health, mental disease, mental health service, substance abuse, ML, sexual and gender minorities, LGBT, lesbian, gay, men who have sex with men, bisexual, queer, two-spirit, intersex, and transgender) and medical subject headings, if applicable. A librarian was consulted regarding the keywords and search terms.
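As a rough illustration of how such keywords are typically combined, the following Python sketch joins synonyms with OR inside a concept block and joins concept blocks with AND. The grouping into 3 concepts, the subset of terms shown, and the function names are assumptions made for illustration; the actual search strategy is the one given in Multimedia Appendix 1, and database-specific syntax (field tags, subject headings) is omitted.

```python
def or_block(terms):
    """Join synonyms for one concept with OR, quoting each phrase."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*concept_blocks):
    """Require every concept by joining the OR blocks with AND."""
    return " AND ".join(or_block(block) for block in concept_blocks)


# Illustrative subset of the keywords listed above (assumed grouping).
health_terms = ["mental health", "substance abuse"]
method_terms = ["machine learning"]
population_terms = ["sexual and gender minorities", "LGBT", "transgender"]

query = build_query(health_terms, method_terms, population_terms)
print(query)
```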

Two reviewers (AK and RB) conducted the database search from November 25 to December 13, 2020, and imported all citations to the Covidence web platform, where duplicate papers were removed automatically. The databases were searched from the date of inception of the databases to the year 2020, with no filter in place for publication year. The bibliography lists of the included studies and review papers were reviewed on December 13, 2020, to identify any potential studies. The full Embase search strategy, representing an example of the search query applied to all other databases, is presented in Multimedia Appendix 1.

Study Selection

We included studies that used ML to investigate mental health or substance use behaviors of people within the LGBTQ2S+ population. Studies in which ML was used partially, but not for the main statistical analysis, were included in the review. We only included empirical investigations, thereby excluding editorials, opinion pieces, and reviews. We also excluded papers that used logistic regression analyses, not as an ML algorithm, but to determine LGBTQ2S+ identity status. In addition, studies for which full texts could not be retrieved with an institutional license, studies of genomics and pharmacokinetics, and studies that were not directly relevant to humans were excluded.

Two reviewers (AK and RB) independently screened each title and abstract based on the eligibility criteria and completed full-text screening of the remaining studies. Disagreements were resolved through discussions among the 3 reviewers (AK, RB, and MC) to yield a list of final included studies.

Data Charting

To facilitate data charting and reporting, individual reviewers (AK and RB) first reviewed all studies and extracted key phrases and concepts from each study. We based our data extraction items on features identified in a recent biomedical guideline for reporting ML studies [34]. Custom-made data extraction forms were developed from this guideline, which included major extraction categories such as general study characteristics (ie, author, year, country, target population, source of data, sample size, field of study, ML domains, ML methods, algorithms, and outcomes), key components of model development (ie, whether the studies discussed methods of feature selection, resampling, model performance metrics, and method of validation), and discussion of study findings (ie, importance ranking of features, intersectionality, and other procedures or features applied).

Collating, Summarizing, and Reporting Results

We presented descriptive statistics for the extracted data sets by calculating the total number and percentage of all studies in each category. To provide a visual overview of the range of data, we presented a bar chart that showed the frequency analysis of studies according to the field of study and a pie chart that demonstrated the proportion of studies in the major domains of ML. We used a narrative synthesis approach [35] to describe the findings of the studies in the different ML domains and explored relationships in the data. Finally, we discussed research gaps to facilitate future research.
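The count-and-percentage summary described above can be sketched in a few lines of Python. The example uses the ML domain counts reported in Table 1 (6, 4, and 1 of 11 studies); the function name `summarize` and the choice to round to the nearest whole percent are assumptions of this sketch.

```python
from collections import Counter


def summarize(categories, total):
    """Return {category: (count, percentage rounded to a whole number)}."""
    counts = Counter(categories)
    return {name: (n, round(100 * n / total)) for name, n in counts.items()}


# One entry per included study, using the domain counts reported in Table 1.
domains = (
    ["web content analysis"] * 6
    + ["prediction modeling"] * 4
    + ["imaging study"] * 1
)

summary = summarize(domains, total=len(domains))
for name, (n, pct) in summary.items():
    print(f"{name}: {n}/{len(domains)} ({pct}%)")
```

For characteristics where a study can fall into more than one category, the same function applies as long as `total` is set to the number of studies rather than the number of category entries, so the percentages may sum to more than 100.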

The initial search of databases yielded 2669 articles, of which 2489 remained after removing duplicates. We also searched the reference lists of potentially eligible articles and previous reviews but could not identify any additional studies that matched our inclusion criteria. After title and abstract screening, 21 articles were selected for full-text screening. Of these, we excluded 10 articles: 3 did not meet the target population criteria of the LGBTQ2S+ population (3/21, 14%), 1 had a full text that could not be retrieved (1/21, 5%), 4 were unrelated to ML (4/21, 19%), 1 was a duplicate article published in a conference proceeding (1/21, 5%), and 1 was a commentary (1/21, 5%). This resulted in 11 studies being included in the final review [36-46]. The detailed selection process of the articles is presented in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram (Figure 1).
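The selection counts above can be checked with simple arithmetic. The following Python sketch uses only the numbers stated in this paragraph; the variable names are illustrative.

```python
# Counts taken from the study selection paragraph above.
identified = 2669
after_deduplication = 2489
full_text_screened = 21

full_text_exclusions = {
    "did not meet LGBTQ2S+ target population criteria": 3,
    "full text could not be retrieved": 1,
    "unrelated to ML": 4,
    "duplicate published in a conference proceeding": 1,
    "commentary": 1,
}

duplicates_removed = identified - after_deduplication
excluded_at_title_abstract = after_deduplication - full_text_screened
included = full_text_screened - sum(full_text_exclusions.values())

print(duplicates_removed, excluded_at_title_abstract, included)
```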

Figure 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram documenting study exclusion. LGBTQ+: lesbian, gay, bisexual, transgender, queer, or questioning; ML: machine learning.

Study Characteristics

All 11 included studies [36-46] were published in 2019 or 2020 (Table 1). Most of the studies were carried out in the United States (7/11, 64%) [36,38,39,41-43,45]. Among the target population categories, which were not mutually exclusive, sexual minority men (gay, men who have sex with men, bisexual) were the most commonly studied subgroup (5/11, 45%) [37,40,42-44], followed by transgender people (3/11, 27%) [39,45,46] and LGBTQ+ people at large (3/11, 27%) [36,38,41], whereas sexual minority women (lesbian and bisexual) were the least commonly represented population (2/11, 18%) [43,45]. None of the studies included two-spirit persons as their target population (Table 1).

Table 1. Summary statistics of included studies (N=11) [36-46].^a

Characteristics | Number of studies, n (%)
Country
  United States | 7 (64)
  China | 2 (18)
  Sweden | 1 (9)
  Australia | 1 (9)
Years published
  2019 | 5 (45)
  2020 | 6 (55)
Field of study
  Mental health (n=9)
    Suicide or self-injury | 2 (18)
    Depression | 2 (18)
    Mood or affect processes | 3 (27)
    Minority stress | 1 (9)
    Gender incongruence | 1 (9)
  Substance use (n=2)
    Tobacco | 1 (9)
    Poppers or alkyl nitrites | 1 (9)
Target population^b
  Sexual minorities: male (gay, MSM^c, bisexual) | 5 (45)
  Sexual minorities: female (lesbian, bisexual) | 2 (18)
  Transgender or gender minorities | 3 (27)
  LGBT/LGBTQ+^d | 3 (27)
Domains of ML^e
  Web content analysis | 6 (55)
  Prediction modeling | 4 (36)
  Imaging study | 1 (9)
Type of ML
  Supervised | 9 (82)
  Unsupervised | 3 (27)
  Deep | 1 (9)
ML algorithms
  LDA^f | 3 (27)
  RF^g | 2 (18)
  SVM^h | 2 (18)
  CNN^i | 1 (9)
  MLP^j | 1 (9)
  NB^k | 1 (9)
  Penalized regression (LASSO^l, elastic net regularized regression, ridge regression) | 2 (18)
  Logistic regression | 1 (9)
  Boosting (XGBoost^m, AdaBoost^n, GBM^o) | 3 (27)
  Classification tree | 2 (18)
Feature selection
  Yes | 7 (64)
  No | 4 (36)
Discussed model performance
  Used performance metrics | 9 (82)
  Did not use performance metrics | 1 (9)
  Did not discuss performance | 1 (9)
Method of validation
  Hold-out | 2 (18)
  Cross-validation | 7 (64)
  External validation | 2 (18)
  Unspecified | 4 (36)

^a Multiple response options were possible for some study characteristics.
^b Categories are not mutually exclusive.
^c MSM: men who have sex with men.
^d LGBT/LGBTQ+: lesbian, gay, bisexual, and transgender/lesbian, gay, bisexual, transgender, queer, or questioning.
^e ML: machine learning.
^f LDA: latent Dirichlet allocation.
^g RF: random forest.
^h SVM: support vector machine.
^i CNN: convolutional neural network.
^j MLP: multilayered perceptron.
^k NB: Naive Bayes.
^l LASSO: least absolute shrinkage and selection operator.
^m XGBoost: eXtreme Gradient Boosting.
^n AdaBoost: Adaptive Boosting.
^o GBM: Generalized Boosted Model.

Most of the studies focused on mental health (9/11, 82%) [36-42,45,46], and only 18% (2/11) of the studies [43,44] focused on substance use concerns. The mental health studies examined issues such as depression, suicide, mood or affect processes, minority stress, and gender incongruence [36-42,45,46], whereas the substance use studies examined only tobacco and poppers or alkyl nitrites use [43,44]. No study examined mental health issues and substance use concerns among the LGBTQ2S+ population simultaneously (Table 1).

The studies were categorized into 3 major ML domains: web content analysis, prediction modeling, and imaging study. Over half of the studies (6/11, 55%) were identified as web content analysis [36-41], and 36% (4/11) were identified as prediction modeling [42-45]; 1 study (9%) was identified as an imaging study [46] (Table 1).

The most commonly used class of ML methods was supervised (9/11, 82%) [37-39,41-46], followed by unsupervised (3/11, 27%) [36,37,40] and deep learning (1/11, 9%; Table 1) [41]. The most frequently used ML algorithms were latent Dirichlet allocation (3/11, 27%) and boosting (3/11, 27%), followed by random forest, support vector machines, penalized regression (ie, least absolute shrinkage and selection operator, elastic net regularized regression, and ridge regression), classification tree, logistic regression, naive Bayes, multilayered perceptron, and convolutional neural network (Table 1).

Approximately two-thirds (7/11, 64%) of the studies [37,38,42-46] discussed their methods of feature selection, among which the median number of features used was 19. Most of the studies used cross-validation methods (7/11, 64%) [37-39,41,44-46], especially 10-fold cross-validation. Furthermore, 18% (2/11) of the articles used the hold-out method [39,41], 18% (2/11) used external validation [37,41], and 36% (4/11) of the articles [36,40,42,43] did not report how they validated their method. Most studies (9/11, 82%) [36-39,41-43,45,46] used at least one performance metric (eg, area under the receiver operating characteristic curve, precision-recall, or F1 score) to discuss model performance. Of the remaining 2 studies, one did not use any performance metric [44] and the other did not discuss model performance [40] (Table 1).
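To make the validation and metric terms concrete, the following Python sketch shows how k-fold cross-validation partitions samples into training and test folds, and how precision, recall, and the F1 score are computed from true and predicted labels. It is a generic illustration with made-up labels, not a reconstruction of any included study's pipeline; the function names are hypothetical, and in practice samples are usually shuffled (and often stratified) before folding.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread the remainder so that fold sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# 10-fold split of 25 hypothetical samples: each sample is tested exactly once.
folds = list(k_fold_indices(n_samples=25, k=10))

# Made-up labels for illustration: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

In a full cross-validation run, a model would be fitted on each training fold and scored on the matching test fold, with the metric averaged across folds. Hold-out validation corresponds to a single such split, and external validation scores the fitted model on data from a different population.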

Machine Learning Domains

Multimedia Appendix 2 summarizes the characteristics of the final 11 included studies [36-46] and Multimedia Appendix 3 [36-46] presents the ML methodology used in the studies.

The 6 studies (6/11, 55%) [36-41] in the web content analysis domain obtained their data from social media sources such as Twitter, Blued, Tumblr, Reddit, and LGBT Chat and Forums. The volume of data used ranged from 12,000 to 41 million web posts. Half of these studies used their data to analyze the mood or affect processes of the users related to their sexual and gender identities [39-41] (Multimedia Appendix 2).

Among the 4 studies in the prediction modeling domain, 50% (2/4) analyzed data on adult participants [42,44] and 50% (2/4) on adolescents [43,45]. Only 1 study used a public health data set of 28,811 participants [43]; the other studies used either cross-sectional or cohort data from longitudinal studies [42,44,45]. Half of the studies focused on mental health (depression and suicide) [42,45] and half on substance use behavior (cigarette, e-cigarette, and poppers use) [43,44] (Multimedia Appendix 2). Of the 4 studies, only 1 (25%) [45] ranked feature importance, and 2 (50%) [42,45] examined intersectionalities (Multimedia Appendix 3). One study investigated the intersection of income and other social and environmental stressors with racial or ethnic disparities and its impact on depressive symptomology among men who have sex with men [42], whereas the other focused on the intersection between various social and behavioral determinants of health (self-image, race, education, socioeconomic status, family support, friends, stigma, discrimination, etc) as risk factors of self-injurious behaviors among sexual and gender minority women [45].

One imaging trial study used clinical and functional magnetic resonance imaging data of 25 transgender adults to identify the relationship between pretherapy functional brain connectivity and posthormone therapy body congruence [46]. All 4 studies [42-45] of the prediction modeling domain and 1 imaging study [46] used the supervised method of ML, whereas studies in the web content analysis domain [36-41] used supervised (4/11, 36%), unsupervised (3/11, 27%), and deep learning (1/11, 9%) methods (Multimedia Appendix 3).

Principal Findings

Our results show that the application of ML to assess mental health and substance use behavior among the LGBTQ2S+ population is still new in health research, compared with the increasing use of ML techniques in other health research domains. Although there is continued criminalization and lack of LGBTQ2S+ rights protection in 67 United Nations member states at the end of 2020 [47], there appears to be an increasing acceptance of sexual and gender minority people in diverse contexts such as in North American countries and Western Europe [48]. However, very few of the included studies were conducted outside the United States (Table 1).

Only a few mental health problems were addressed across the few relevant ML studies conducted to date (Table 1). Although there is evidence of a higher prevalence of anxiety disorders, posttraumatic stress disorder, and various mood disorders (eg, mania and persistent depressive disorder) among the LGBTQ2S+ population compared with cisgender and heterosexual counterparts [4], no studies have been conducted on these issues. Compared with mental health issues, substance use problems among the LGBTQ2S+ population were almost untouched. Moreover, both of the included substance use related studies predicted the present use of substances [43,44], and no studies have examined future substance use, cessation, or substance use treatment-seeking behavior.

Underlying factors behind the low number of ML studies on mental health and substance use issues among the LGBTQ2S+ population may be sex and gender identity-related data invisibility and social and institutional bias [21,49]. Electronic health records have been used as a common and promising data source for ML techniques to predict population health in other research areas [27,29]. However, binary representation of sex and gender (ie, man or woman) in the electronic health records system makes some data unavailable for analysis by ML, which can underrepresent the actual problem [21,50,51]. Adopting inclusive gender, sex, and sexual orientation (GSSO) information practices that capture sexual and gender diversity has the potential to ensure data justice, alleviate unintentional bias, and reduce health inequity [49]. A good example of inclusive GSSO information practice could be the equity stratifiers proposed by the Canadian Institute for Health Information [52]. In the absence of such records, the included studies drew on other potential data sources for ML applications, such as social media, cross-sectional survey data, longitudinal cohorts, and administrative data sets (Multimedia Appendix 2).

Most studies were in the web content analysis domain, indicating social media to be a potentially useful epidemiological resource for collecting data on LGBTQ2S+ people and analyzing the data using ML (Multimedia Appendix 2). We observed that unsupervised ML has also been applied in these studies with data drawn from social media [36,37,40], thus holding the potential to support qualitative research by handling large textual data sets with its computational power. This is particularly useful in LGBTQ2S+ health research, given the stigma-related and structural barriers toward identity disclosure that may inhibit data collection through other methodologies [50,51,53,54]. The use of ML in these studies has shown potential for automated identification of at-risk individuals for crisis suicide prevention and intervention [36], depressive emotions [37], minority stressors [38], negative emotions [40], and mental health signals [41] among the LGBTQ2S+ community. In addition, the sequence of transgender identity disclosure identified in a study by Haimson et al [39] may guide resource allocation and provide support through gender transition. However, self-reported mental health problems on social media might not reflect clinical diagnoses or symptomologies.

Although there is evidence of the influence of intersections of various social and behavioral determinants of health on the increased prevalence of mental health and substance use concerns among the LGBTQ2S+ population [11-16], only 2 studies examined the intersection of sexual and gender identity with ethno-racial identities, and several social, economic, and behavioral factors (ie, income, social stigma, discrimination, and family support), and their impact on depression and self-injurious behaviors [42,45]. No such studies in our review explored intersectionality in the field of substance use. Identifying these intersections by leveraging ML techniques would have practical implications by determining risk and protective factors as well as informing strategies for promoting mental well-being and substance use prevention and intervention with and for LGBTQ2S+ people. In the context of various techniques used in intersectional research, both qualitative and quantitative, and recent trends in mixed methods research [55], ML can be a very useful tool for processing vast quantities of data, data mining and clustering, and classifying attribute relationships [56,57]. Apart from the partial dependency-based measures, newer techniques and methods [58,59] in ML have emerged for analyzing interaction effects and are more suitable for assessing intersectionality.

Following the current guidelines for reporting ML studies in biomedical research [34], we documented a range of explanatory findings seen in the included studies and found that most studies mentioned their performance metrics, method of feature selection, and method of validation of their model (Table 1 and Multimedia Appendix 3). However, only 27% (3/11) of the studies [37,38,45] approximated a relative importance score for individual features that reflected their overall contributions to the model (Multimedia Appendix 3). Feature importance scores are particularly valuable for predictive modeling studies, where the most important predictors can be targeted in future strategies. Another notable finding was that half (2/4) [42,43] of the predictive modeling studies did not report any method of validation, and none of them conducted external validation of the resulting model on a different population (Multimedia Appendix 3). Validation is an important aspect of the predictive modeling process, which increases the reproducibility and generalizability of the model [60]. Hence, future studies in this domain should follow existing guidelines to validate their models [34]. Moreover, half of the predictive modeling studies had small sample sizes (<1000) (Multimedia Appendix 2). Small data sets can affect the model performance [61]. Using large population-based data sets for future research can overcome this problem and fully leverage the benefits of ML.
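One common, model-agnostic way to obtain a relative importance score is permutation importance: shuffle one feature at a time and measure how much the model's accuracy drops. The Python sketch below illustrates the idea on made-up data with a hypothetical, already-fitted rule-based model; it is not necessarily the method used by the included studies, and all names in it are assumptions.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the true label."""
    return sum(model(row) == label for row, label in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_features, repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + (v,) + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / repeats)
    return importances


def model(row):
    """Hypothetical fitted model: uses feature 0 only and ignores feature 1."""
    return 1 if row[0] > 0.5 else 0


# Made-up data: the label depends on feature 0; feature 1 is noise.
rows = [(0.1, 0.9), (0.2, 0.1), (0.3, 0.8), (0.4, 0.2),
        (0.6, 0.7), (0.7, 0.3), (0.8, 0.6), (0.9, 0.4)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

importances = permutation_importance(model, rows, labels, n_features=2)
```

Here feature 0 receives a positive importance because shuffling it breaks the predictions, whereas feature 1 receives an importance of 0 because the model never uses it. Ranking features by such scores is what allows the most influential predictors to be identified.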

Compared with the other 2 domains, there was a significant gap in ML research using imaging data (ie, functional magnetic resonance imaging or electroencephalography) to examine mental health and substance use among the LGBTQ2S+ population (Table 1). Although a single identified imaging study [46] predicted cross-sex hormonal therapy responsiveness in the transgender population, which is useful for guiding and selecting candidates for therapy, the sample size was small, limiting the generalizability of the findings.

Future Research Directions

We detected significant research gaps in ML applications for mental health and substance use research among the LGBTQ2S+ population. First, future research should investigate other mental health issues (ie, anxiety disorders and mood disorders) and substance use behavior and problems (ie, alcohol, opioids, and illicit drugs) among the LGBTQ2S+ population. Second, the potential of ML applications in predicting substance use related outcomes (ie, cessation, overdose events, routes of administration, driving impairments, and other adverse reactions), mental health service access, and mental health-related outcomes (ie, disabilities, symptom management, suicide and suicide attempts, economic burden, and health care costs) should be explored.

Third, further research is needed on sexual minority women. The small number of studies included (Table 1) did not allow exploration of shared and different health needs and priorities between and within the LGBTQ2S+ population. Fourth, as the legal and societal contexts in which the LGBTQ2S+ population lives differ significantly between countries [48], more research should be conducted in countries outside the United States. Fifth, specific research initiatives targeted at investigating the intersection of sexual and gender minority identity with other social determinants of health (ie, race, ethnicity, citizenship, socioeconomic status, and housing condition) are necessary to better understand their potential for fostering risk and resilience regarding mental health and substance use. Finally, different data sources should be used in ML studies. Large population-level administrative data sets should be used for prediction modeling studies for the accurate application of ML models. In addition, with the advancement of technology and the digitalization of health care, electronic health records that capture LGBTQ2S+ status can be a potential data resource for ML studies with real-world clinical implications for LGBTQ2S+ people.

Strengths and Limitations

To the best of our knowledge, our review is the first of its kind to explore the use of ML applications in examining mental health and substance use among LGBTQ2S+ populations. We adopted a comprehensive search strategy, including searching various multidisciplinary peer-reviewed databases, to identify as many relevant articles as possible. The findings of our review need to be interpreted with consideration of one key limitation. Owing to the small number of studies, the highly heterogeneous characteristics of the included studies, and inconsistent reporting of model development and validation, we could not perform a critical appraisal of the studies and therefore could not comment meaningfully on the overall performance of the ML techniques. However, we followed the approaches of previous scoping reviews with similar objectives [27,29] and were interested in understanding the general topics or areas being investigated by ML in the field of mental health and substance use among the LGBTQ2S+ population (ie, most commonly used data sources, study countries, and study populations) and identifying research gaps to inform future research.

As more studies are published on this research topic in the future, a systematic review with critical appraisal of the relevant literature should be conducted as the next step in research. Researchers are attempting to expand established reporting guidelines to include items that accommodate ML studies, such as the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis statement specific for ML [62], the Artificial Intelligence extension for Consolidated Standards of Reporting Trials [63], and the Artificial Intelligence extension for Standard Protocol Items: Recommendations for Interventional Trials [63] guidelines. Once developed, these guidelines can be used as critical appraisal tools for studies that adopt ML-based data analysis. There is also an opportunity to incorporate fairness and equity considerations in the development of appraisal tools for ML studies. Preliminary research has already developed mathematical metrics to measure the fairness of an ML algorithm and to assess whether intersectionality is accounted for in the models [64].
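As a rough illustration of the kind of fairness metric mentioned above, the following Python sketch estimates an empirical differential fairness value (epsilon) for binary model predictions across intersectional groups, in the spirit of the definition in [64]: the predicted outcome rates of any two groups should differ by a factor of at most e^epsilon. This is a minimal sketch under stated assumptions, not the implementation of the cited authors. The function name, the use of tuples as group labels, and the additive smoothing constant alpha are all illustrative assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations


def empirical_differential_fairness(groups, predictions, alpha=1.0):
    """Estimate a smoothed epsilon for binary (0/1) predictions.

    groups: sequence of hashable intersectional labels, for example
        (sexual_identity, race) tuples (hypothetical encoding).
    predictions: sequence of 0/1 model outputs, aligned with groups.
    alpha: additive smoothing pseudo-count per outcome (an assumption
        made here so that no estimated rate is exactly 0 or 1).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive to avoid log(0)")

    # Count negative and positive predictions within each group.
    counts = defaultdict(lambda: [0, 0])
    for group, y in zip(groups, predictions):
        counts[group][y] += 1

    # Smoothed estimate of P(prediction = 1 | group).
    rates = {
        g: (c[1] + alpha) / (c[0] + c[1] + 2 * alpha)
        for g, c in counts.items()
    }

    # Epsilon is the largest absolute log-ratio of outcome probabilities
    # over all pairs of groups and over both outcomes (1 and 0).
    epsilon = 0.0
    for a, b in combinations(rates, 2):
        for pa, pb in ((rates[a], rates[b]), (1 - rates[a], 1 - rates[b])):
            epsilon = max(epsilon, abs(math.log(pa) - math.log(pb)))
    return epsilon
```

A value of 0 indicates that all groups receive positive predictions at the same smoothed rate, and larger values indicate greater disparity between at least one pair of groups. With small groups, the smoothing term strongly influences the estimate, so the result should be interpreted cautiously.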


Although ML applications have grown exponentially in other health research sectors, few studies have used these techniques in the field of mental health and substance use among the LGBTQ2S+ population. In addition to undertaking more research, future researchers should focus on applying ML algorithms with consideration for real-world implications through public health interventions and on adopting policies that aim to improve health equity.


The authors would like to thank Elena Springall, a librarian at the Gerstein Science Information Centre, University of Toronto, for her support in reviewing the database search strategies. The study was funded by the Canadian Institutes of Health Research, grant number 1000993. The funder had no role in the study design, collection, analysis, or interpretation of the data, writing the manuscript, or the decision to submit the paper for publication.

Authors' Contributions

MC contributed to the study design, obtained funding, and provided supervision. AK and RB conducted the database search, article screening, and data extraction. AK conducted the data analysis and primary drafting of the manuscript. All authors (AK, MC, RB, DG, RF, CHL, BB, CY, NM, and RS) contributed to the conceptualization, drafting, review, and approval of the manuscript for submission.

Conflicts of Interest

None declared.

Multimedia Appendix 1

Embase search query.

DOCX File, 13 KB

Multimedia Appendix 2

Summary of studies using machine learning analysis in mental health and substance use among lesbian, gay, bisexual, transgender, queer or questioning, and two-spirit population (N=11).

DOCX File, 17 KB

Multimedia Appendix 3

Summary of characteristics of machine learning methods used (N=11).

DOCX File, 18 KB

  1. Marshal MP, Friedman MS, Stall R, King KM, Miles J, Gold MA, et al. Sexual orientation and adolescent substance use: a meta-analysis and methodological review. Addiction 2008 Apr;103(4):546-556 [FREE Full text] [CrossRef] [Medline]
  2. King M, Semlyen J, Tai SS, Killaspy H, Osborn D, Popelyuk D, et al. A systematic review of mental disorder, suicide, and deliberate self harm in lesbian, gay and bisexual people. BMC Psychiatry 2008 Aug 18;8:70 [FREE Full text] [CrossRef] [Medline]
  3. Marshal MP, Dietz LJ, Friedman MS, Stall R, Smith HA, McGinley J, et al. Suicidality and depression disparities between sexual minority and heterosexual youth: a meta-analytic review. J Adolesc Health 2011 Aug;49(2):115-123 [FREE Full text] [CrossRef] [Medline]
  4. Institute of Medicine. The Health of Lesbian, Gay, Bisexual, and Transgender People: Building a Foundation for Better Understanding. Washington (DC): National Academies Press (US); 2011.
  5. National survey on LGBTQ youth mental health. The Trevor Project. 2019.   URL: https:/​/www.​​wp-content/​uploads/​2019/​06/​The-Trevor-Project-National-Survey-Results-2019.​pdf [accessed 2021-10-19]
  6. Medley G, Lipari R, Bose J, Cribb D, Kroutil L. Sexual orientation and estimates of adult substance use and mental health: results from the 2015 national survey on drug use and health. National Survey on Drug Use and Health. 2016.   URL: https:/​/www.​​data/​sites/​default/​files/​NSDUH-SexualOrientation-2015/​NSDUH-SexualOrientation-2015/​NSDUH-SexualOrientation-2015.​htm [accessed 2021-10-19]
  7. Abramovich A, de Oliveira C, Kiran T, Iwajomo T, Ross LE, Kurdyak P. Assessment of health conditions and health service use among transgender patients in Canada. JAMA Netw Open 2020 Aug 03;3(8):e2015036 [FREE Full text] [CrossRef] [Medline]
  8. Wilson C, Cariola L. LGBTQI+ youth and mental health: a systematic review of qualitative research. Adolescent Res Rev 2019 May 21;5(2):187-211 [FREE Full text] [CrossRef]
  9. Logie C. The case for the World Health Organization's commission on the social determinants of health to address sexual orientation. Am J Public Health 2012 Jul;102(7):1243-1246. [CrossRef] [Medline]
  10. Pega F, Veale JF. The case for the World Health Organization's Commission on Social Determinants of Health to address gender identity. Am J Public Health 2015 Mar;105(3):58-62. [CrossRef] [Medline]
  11. Meyer IH. Prejudice, social stress, and mental health in lesbian, gay, and bisexual populations: conceptual issues and research evidence. Psychol Bull 2003 Sep;129(5):674-697 [FREE Full text] [CrossRef] [Medline]
  12. Ryan C, Huebner D, Diaz RM, Sanchez J. Family rejection as a predictor of negative health outcomes in white and Latino lesbian, gay, and bisexual young adults. Pediatrics 2009 Jan;123(1):346-352. [CrossRef] [Medline]
  13. Burns MN, Ryan DT, Garofalo R, Newcomb ME, Mustanski B. Mental health disorders in young urban sexual minority men. J Adolesc Health 2015 Jan;56(1):52-58 [FREE Full text] [CrossRef] [Medline]
  14. Duncan DT, Hatzenbuehler ML. Lesbian, gay, bisexual, and transgender hate crimes and suicidality among a population-based sample of sexual-minority adolescents in Boston. Am J Public Health 2014 Feb;104(2):272-278. [CrossRef] [Medline]
  15. Kosciw J, Greytak E, Palmer N, Boesen M. The 2013 National School Climate Survey: the experiences of lesbian, gay, bisexual and transgender youth in our nation's schools. GLSEN. 2014.   URL: [accessed 2021-10-19]
  16. Choi S, Wilson B, Shelton J, Gates G. Serving our youth 2015: the needs and experiences of lesbian, gay, bisexual, transgender, and questioning youth experiencing homelessness. The Williams Institute with True Colors Fund. 2015.   URL: [accessed 2021-10-19]
  17. Jordan MI, Mitchell TM. Machine learning: trends, perspectives, and prospects. Science 2015 Jul 17;349(6245):255-260. [CrossRef] [Medline]
  18. Luo J, Wu M, Gopukumar D, Zhao Y. Big data application in biomedical research and health care: a literature review. Biomed Inform Insights 2016;8:1-10 [FREE Full text] [CrossRef] [Medline]
  19. Let's discuss stigma and discrimination around mental health and substance use problems. Canadian Mental Health Association, British Columbia Division. 2014.   URL: https:/​/www.​​sites/​default/​files/​stigma-and-discrimination-around-mental-health-and-substance-use-problems.​pdf [accessed 2021-10-19]
  20. Committee on the Science of Changing Behavioral Health Social Norms, Board on Behavioral, Cognitive, and Sensory Sciences, Division of Behavioral and Social Sciences and Education, National Academies of Sciences, Engineering, and Medicine. Ending Discrimination Against People with Mental and Substance Use Disorders: The Evidence for Stigma Change. Washington (DC): National Academies Press (US); 2016:1-170.
  21. Ruberg B, Ruelos S. Data for queer lives: how LGBTQ gender and sexuality identities challenge norms of demographics. Big Data Soc 2020 Jun 18;7(1):205395172093328. [CrossRef]
  22. Naqa IE, Murphy MJ. What is machine learning? In: Naqa IE, Li R, Murphy MJ, editors. Machine Learning in Radiation Oncology. Cham: Springer; 2015:3-11.
  23. Mak KK, Lee K, Park C. Applications of machine learning in addiction studies: a systematic review. Psychiatry Res 2019 May;275:53-60. [CrossRef] [Medline]
  24. Zhu X, Goldberg AB. Introduction to semi-supervised learning. Synth Lect Artif Intell Mach Learn 2009 Jan;3(1):1-130 [FREE Full text] [CrossRef]
  25. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015 May 28;521(7553):436-444. [CrossRef] [Medline]
  26. Thieme A, Belgrave D, Doherty G. Machine learning in mental health: a systematic review of the HCI literature to support the development of effective and implementable ML systems. ACM Trans Comput-Hum Interact 2020 Oct 05;27(5):1-53 [FREE Full text] [CrossRef]
  27. Shatte AB, Hutchinson DM, Teague SJ. Machine learning in mental health: a scoping review of methods and applications. Psychol Med 2019 Jul;49(9):1426-1448. [CrossRef] [Medline]
  28. Bernert R, Hilberg A, Melia R, Kim J, Shah N, Abnousi F. Artificial intelligence and suicide prevention: a systematic review of machine learning investigations. Int J Environ Res Public Health 2020 Aug 15;17(16):5929 [FREE Full text] [CrossRef] [Medline]
  29. Morgenstern JD, Buajitti E, O'Neill M, Piggott T, Goel V, Fridman D, et al. Predicting population health with machine learning: a scoping review. BMJ Open 2020 Oct 27;10(10):e037860 [FREE Full text] [CrossRef] [Medline]
  30. Arksey H, O'Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol Theory Pract 2005 Feb;8(1):19-32 [FREE Full text] [CrossRef]
  31. Tricco AC, Lillie E, Zarin W, O'Brien KK, Colquhoun H, Levac D, et al. PRISMA extension for Scoping Reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med 2018 Oct 02;169(7):467-473. [CrossRef] [Medline]
  32. Peters M, Godfrey C, McInerney P, Soares C, Khalil H, Parker D. The Joanna Briggs Institute Reviewers’ Manual 2015: methodology for JBI scoping reviews. Joanna Briggs Institute. 2015.   URL: [accessed 2021-10-19]
  33. Kundu A, Billington R, Chaiton M. Machine learning applications in mental health and substance use research among LGBTQ2S+ population: protocol for a scoping review. Open Sci Framework 2020:A. [CrossRef]
  34. Luo W, Phung D, Tran T, Gupta S, Rana S, Karmakar C, et al. Guidelines for developing and reporting machine learning predictive models in biomedical research: a multidisciplinary view. J Med Internet Res 2016 Dec 16;18(12):e323 [FREE Full text] [CrossRef] [Medline]
  35. Popay J, Roberts H, Sowden A, Petticrew M, Arai L, Rodgers M, et al. Guidance on the conduct of narrative synthesis in systematic reviews: a product from the ESRC Methods Programme - Version 1. Peninsula Medical School, Universities of Exeter and Plymouth. 2006.   URL: https:/​/www.​​media/​lancaster-university/​content-assets/​documents/​fhm/​dhr/​chir/​NSsynthesisguidanceVersion1-April2006.​pdf [accessed 2021-10-19]
  36. Liang C, Abbott D, Hong Y, Madadi M, White A. Clustering help-seeking behaviors in LGBT online communities: a prospective trial. In: Meiselwitz G, editor. Social Computing and Social Media. Design, Human Behavior and Analytics. Cham: Springer; 2019:345-355.
  37. Li Y, Cai M, Qin S, Lu X. Depressive emotion detection and behavior analysis of men who have sex with men social media. Front Psychiatry 2020;11:830 [FREE Full text] [CrossRef] [Medline]
  38. Saha K, Kim SC, Reddy MD, Carter AJ, Sharma E, Haimson OL, et al. The language of LGBTQ+ minority stress experiences on social media. Proc ACM Hum Comput Interact 2019 Nov;3(CSCW):89 [FREE Full text] [CrossRef] [Medline]
  39. Haimson OL, Veinot TC. Coming out to doctors, coming out to "Everyone": understanding the average sequence of transgender identity disclosures using social media data. Transgend Health 2020;5(3):158-165 [FREE Full text] [CrossRef] [Medline]
  40. Huang G, Cai M, Lu X. Inferring opinions and behavioral characteristics of gay men with large scale multilingual text from blued. Int J Environ Res Public Health 2019 Sep 26;16(19):3597 [FREE Full text] [CrossRef] [Medline]
  41. Zhao Y, Guo Y, He X, Wu Y, Yang X, Prosperi M, et al. Assessing mental health signals among sexual and gender minorities using Twitter data. Health Informatics J 2020 Jun;26(2):765-786 [FREE Full text] [CrossRef] [Medline]
  42. Barrett B, Abraham A, Dean L, Plankey M, Friedman M, Jacobson L, et al. Social inequalities contribute to racial/ethnic disparities in depressive symptomology among men who have sex with men. Soc Psychiatry Psychiatr Epidemiol 2021 Feb;56(2):259-272 [FREE Full text] [CrossRef] [Medline]
  43. Azagba S, Latham K, Shan L. Cigarette smoking, e-cigarette use, and sexual identity among high school students in the USA. Eur J Pediatr 2019 Sep;178(9):1343-1351. [CrossRef] [Medline]
  44. Demant D, Oviedo-Trespalacios O. Harmless? A hierarchical analysis of poppers use correlates among young gay and bisexual men. Drug Alcohol Rev 2019 Jul;38(5):465-472. [CrossRef] [Medline]
  45. Smith DM, Wang SB, Carter ML, Fox KR, Hooley JM. Longitudinal predictors of self-injurious thoughts and behaviors in sexual and gender minority adolescents. J Abnorm Psychol 2020 Jan;129(1):114-121. [CrossRef] [Medline]
  46. Moody T, Feusner J, Reggente N, Vanhoecke J, Holmberg M, Manzouri A, et al. Predicting outcomes of cross-sex hormone therapy in transgender individuals with gender incongruence based on pre-therapy resting-state brain connectivity. Neuroimage Clin 2021;29:102517 [FREE Full text] [CrossRef] [Medline]
  47. Mendos L, Botha K, Lelis R, Tan D, de la Peña E, Savelev I, et al. State-sponsored homophobia: global legislation overview update. ILGA, Geneva. 2020.   URL: https:/​/ilga.​org/​downloads/​ILGA_World_State_Sponsored_Homophobia_report_global_legislation_overview_update_December_2020.​pdf [accessed 2021-10-19]
  48. Poushter J, Kent N. The global divide on homosexuality persists, but increasing acceptance in many countries over past two decades. Pew Research Center. 2020.   URL: https:/​/www.​​global/​wp-content/​uploads/​sites/​2/​2020/​06/​PG_2020.​06.​25_Global-Views-Homosexuality_FINAL.​pdf [accessed 2021-10-19]
  49. Davison K, Queen R, Lau F, Antonio M. Culturally competent gender, sex, and sexual orientation information practices and electronic health records: rapid review. JMIR Med Inform 2021 Feb 11;9(2):e25467 [FREE Full text] [CrossRef] [Medline]
  50. Sokkary N, Awad H, Paulo D. Frequency of sexual orientation and gender identity documentation after electronic medical record modification. J Pediatr Adolesc Gynecol 2021 Jun;34(3):324-327 [FREE Full text] [CrossRef] [Medline]
  51. Lau F, Antonio M, Davison K, Queen R, Bryski K. An environmental scan of sex and gender in electronic health records: analysis of public information sources. J Med Internet Res 2020 Nov 11;22(11):e20050 [FREE Full text] [CrossRef] [Medline]
  52. Canadian Institute for Health Information. In Pursuit of Health Equity: Defining Stratifiers for Measuring Health Inequality - A Focus on Age, Sex, Gender, Income, Education and Geographic Location. Ottawa, ON: CIHI; 2018.
  53. Owen-Smith AA, Woodyatt C, Sineath RC, Hunkeler EM, Barnwell LT, Graham A, et al. Perceptions of barriers to and facilitators of participation in health research among transgender people. Transgend Health 2016;1(1):187-196 [FREE Full text] [CrossRef] [Medline]
  54. Lucassen M, Fleming T, Merry S. Tips for research recruitment: the views of sexual minority youth. J LGBT Youth 2017 Jan 13;14(1):16-30 [FREE Full text] [CrossRef]
  55. Hankivsky O, Grace D. Understanding and emphasizing difference and intersectionality in multimethod and mixed methods research. In: Hesse-Biber SN, Johnson RB, editors. The Oxford Handbook of Multimethod and Mixed Methods Research Inquiry. Oxford, United Kingdom: Oxford University Press; 2015.
  56. Pastrana JL, Reigal RE, Morales-Sánchez V, Morillo-Baro JP, de Mier RJ, Alves J, et al. Data mining in the mixed methods: application to the study of the psychological profiles of athletes. Front Psychol 2019;10:2675 [FREE Full text] [CrossRef] [Medline]
  57. Leavitt A. Human-centered data science: mixed methods and intersecting evidence, inference, and scalability. Annenberg School for Communication & Journalism, University of Southern California. 2016.   URL: [accessed 2021-10-19]
  58. Schiltz F, Masci C, Agasisti T, Horn D. Using regression tree ensembles to model interaction effects: a graphical approach. Appl Econ 2018 Jul 05;50(58):6341-6354 [FREE Full text] [CrossRef]
  59. Oh S. Feature interaction in terms of prediction performance. Appl Sci 2019 Nov 29;9(23):5191 [FREE Full text] [CrossRef]
  60. Han K, Song K, Choi BW. How to develop, validate, and compare clinical prediction models involving radiological parameters: study design and statistical methods. Korean J Radiol 2016;17(3):339-350 [FREE Full text] [CrossRef] [Medline]
  61. van der Ploeg T, Austin PC, Steyerberg EW. Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints. BMC Med Res Methodol 2014 Dec 22;14:137 [FREE Full text] [CrossRef] [Medline]
  62. Collins GS, Moons KG. Reporting of artificial intelligence prediction models. Lancet 2019 Apr;393(10181):1577-1579. [CrossRef]
  63. CONSORT-AI and SPIRIT-AI Steering Group. Reporting guidelines for clinical trials evaluating artificial intelligence interventions are needed. Nat Med 2019 Oct 24;25(10):1467-1468. [CrossRef] [Medline]
  64. Foulds J, Islam R, Keya K, Pan S. An intersectional definition of fairness. arXiv. 2019 Sep 10.   URL: http:/​/jfoulds.​​papers/​2020/​Foulds%20(2020)%20-%20An%20Intersectional%20Definition%20of%20Fairness%20(ICDE).​pdf [accessed 2019-10-19]

LGBTQ2S+: lesbian, gay, bisexual, transgender, queer or questioning, and two-spirit
ML: machine learning
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses

Edited by C Lovis; submitted 20.03.21; peer-reviewed by M Cai, K Davison; comments to author 10.07.21; revised version received 02.09.21; accepted 03.10.21; published 11.11.21


©Anasua Kundu, Michael Chaiton, Rebecca Billington, Daniel Grace, Rui Fu, Carmen Logie, Bruce Baskerville, Christina Yager, Nicholas Mitsakakis, Robert Schwartz. Originally published in JMIR Medical Informatics, 11.11.2021.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.