Background: Fever of unknown origin (FUO) is a group of diseases with heterogeneous complex causes that are misdiagnosed or have delayed diagnoses. Previous studies have focused mainly on the statistical analysis and research of the cases. The treatments are very different for the different categories of FUO. Therefore, how to intelligently diagnose FUO into one category is worth studying.
Objective: We aimed to fuse all of the medical data together to automatically predict the categories of the causes of FUO among patients using a machine learning method, which could help doctors diagnose FUO more accurately.
Methods: In this paper, we innovatively and manually built the FUO intelligent diagnosis (FID) model to help clinicians predict the category of the cause and improve the manual diagnostic precision. First, we classified FUO cases into four categories (infections, immune diseases, tumors, and others) according to the large numbers of different causes and treatment methods. Then, we cleaned the basic information data and clinical laboratory results and structured the electronic medical record (EMR) data using the bidirectional encoder representations from transformers (BERT) model. Next, we extracted the features based on the structured sample data and trained the FID model using LightGBM.
Results: Experiments were based on data from 2299 desensitized cases from Peking Union Medical College Hospital. From the extensive experiments, the precision of the FID model was 81.68% for top 1 classification diagnosis and 96.17% for top 2 classification diagnosis, which were superior to the precision of the comparative method.
Conclusions: The FID model showed excellent performance in FUO diagnosis and thus would be a potentially useful tool for clinicians to enhance the precision of FUO diagnosis and reduce the rate of misdiagnosis.
Fever is one of the most common symptoms in medicine . A febrile temperature may boost the immune system to fight disease [ ]. Prolonged fevers are usually complex to diagnose [ ]. A fever of unknown origin (FUO) has remained a challenging diagnostic problem in recent decades [ ].
As there are more than 200 causes of FUO , isolating the cause of an FUO is a great challenge for clinicians. Thus, many clinicians have been drawn to FUO research [ ]. In 1961, Petersdorf and Beeson [ ] defined FUO. There are usually 3 characteristics: (1) prolonged fever for more than 3 weeks, (2) recurrent fever with a temperature higher than 38.3℃, and (3) undiagnosed fever after a 1-week inpatient investigation [ , ]. The definition has been revised over time with regard to the classification and the duration of fever to be diagnosed [ , ]. The classification of FUO has been hotly debated in previous studies [ , ]. Usually, the categories of causes are infections, immune diseases, and tumors [ , ], and their treatment methods are considerably different, including anti-infection medication, hormones, and chemotherapy, respectively. Therefore, if the cause of an FUO is diagnosed to one category, regardless of the disease causing the FUO, the treatment direction can basically be determined, which would be meaningful for doctors. Infection is the most common cause of FUO [ ]. Chow and Robinson [ ] found that for more than one-half of children with FUO, the FUO was caused by an infection. Knockaert et al [ ] proposed that the wait-and-see strategy may be better for prolonged prognosis in adults. Therefore, FUO is worth studying and exploring. Previous studies on FUO have mainly analyzed real patients' cases. de Kleijn and von der Meer [ ] assessed 53 patients with FUO using a statistical analysis tool called PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) [ ]. de Kleijn et al [ ] analyzed 167 patients with FUO using fixed criteria. Similarly, Efstathiou et al [ ] discriminated FUO into infectious and noninfectious causes. While these research results might explain how these patients with FUO were diagnosed, it was difficult to automatically determine the method because the limited amount of data sometimes caused overfitting. Little research on FUO has been done using machine learning methods.
In this paper, we proposed an FUO intelligent diagnosis (FID) model to classify the causes of FUO into 4 categories based on clinical data. Extensive experiments showed good performance of the model. In summary, we made the following contributions: (1) we innovatively introduced an FUO intelligent classification diagnosis model called FID, which can automatically group FUO cases into one of the categories of causes, (2) our experiments were based on real, desensitized data from Peking Union Medical College Hospital and thus, the cases were real and the results were more valuable and credible when applied to a clinical setting, and (3) we conducted extensive experiments to evaluate the performance of the FID based on the gradient boosting methods LightGBM and XGBoost, which performed better on small data sets; the FID model achieved better performance using LightGBM.
In this paper, we proposed the FID model using LightGBM to intelligently diagnose patients with FUO into 1 of 4 causes using their basic information, clinical laboratory data, and electronic medical record (EMR) data. The structure of the FID model is shown in. First, we classified the causes of FUO into 4 categories: infections, immune diseases, tumors, and others. Then, we cleaned the basic information and laboratory data and structured the EMR data via 2 methods: the bidirectional encoder representations from transformers (BERT) model [ ] and Jieba [ ]. Next, we extracted the features and trained the model through extensive experiments. Finally, by comparing the experimental results, we evaluated the performance of the FID model.
This research was based on data from 2299 desensitized FUO cases from Peking Union Medical College Hospital from June 1, 2012, to March 31, 2018, and the data filtering process is shown in. The data contained basic information, laboratory results, and EMR data. One patient visit was taken as 1 sample; if a patient was admitted to the hospital twice, then the 2 visits were taken as 2 samples. There were 3723 total cases whose chief complaint included “fever.” First, we filtered out the 52 cases not eligible for FUO diagnosis, such as those with temperatures lower than 38.3°C. Then, we invited 3 doctors specializing in FUO diagnosis to help check the data and classifications. Two doctors divided the remaining cases into 4 categories: infections, immune diseases, tumors, and others. The third doctor checked the classifications, and if there were disagreements, the 3 doctors discussed the cases until they obtained a consistent classification. Based on the doctors’ suggestion, 1372 cases that had no confirmed diagnosis were filtered out. As a result, 2299 cases whose causes fit into 1 of the 4 categories remained for the experiments. Infections, immune diseases, tumors, and others accounted for 52.28% (1202/2299), 36.50% (839/2299), 7.83% (180/2299), and 3.39% (78/2299) of cases, respectively. The distribution of the data is shown in . We randomly divided 80% of the data into the training set and the remaining 20% into the validation set, and we used the validation set data to evaluate the performance of the model.
After obtaining the data marked by doctors, we needed to clean the basic information and laboratory data and obtain the structured EMR data. The EMR data were unstructured. To structure the EMR data, we used the BERT model. The BERT model was proposed by Devlin et al  in 2018, and it has greatly improved the text structuring process. We used the BERT-based pretrained model to process the text in this study. The input of BERT was all of the EMR text and the text in each line belonging to 1 patient. Each line in the output of BERT was a number vector of 768 dimensions. In addition, we compared the results based on BERT and Jieba. Using Jieba, we segmented all the text data and chose the top 100 text segmentations according to the counts of the words that occurred in all the text, such as “fever,” “infected,” and “lymph node.” In addition, stop words were filtered out manually. Then, for each of the 100 text segmentations, if it existed in the EMR text of 1 patient, we used 1 to represent the segmentation; otherwise, we used 0. Finally, the text of each patient was expressed by a vector of 100 dimensions.
The data were irregular after being extracted from the clinical system. Regarding the laboratory results, each item group consisted of too many cell items; therefore, laboratory data could be a thousand dimensions. However, there were some items that occurred only a few times, which were difficult for the model to learn. Therefore, we filtered out the items occurring less than 10 times. Finally, 214 items were left for the laboratory data. Then, we transformed all the values and units to be consistent. Regarding the synonyms, we fixed them to be the same. For example, “1 L” and “1000 ml” were fixed to “1000 mL.”
Feature engineering is the most important part of machine learning. The performance of a model is determined by feature engineering to a large extent. In this paper, there were 3 kinds of features: numerical features, categorical features, and text features.
Numerical features were objective data, such as age and heart rate. For the numerical features, we directly extracted the values as the features.
Categorical features were the classification indicators, such as gender and positive and negative symptoms. Categorical features were addressed using the LabelEncoder method  with numbers starting from 0, and then numbers were assigned to the features.
Text features, which could also contribute necessary information, were mainly from the EMR. They were structured in the previous step. Then, we merged the structured EMR features with all of the features. We used 2 methods to structure the text: using Jieba, the top 100 segmentations were selected as the features according to the counts of every word segment, and using BERT, all the text was structured as 768 features.
We used the LightGBM framework to train the model. LightGBM is a gradient boosting framework that is based on decision trees and has faster training efficiency, lower memory usage, and higher precision than other frameworks, as well as a large-scale data processing capability. During the training of LightGBM, there were many parameters that needed to be optimized, including the number of iterations, the learning rate, and the number of leaf nodes. During the experiments, we found that different parameters impacted the final results of the model to some extent. In this research, the parameters we used were a learning rate of 0.01 and number of leaves of 6 after the experiments. The output of the multiclassification model was the probability that each sample belonged to each category. Therefore, the classification with the largest prediction probability was generally regarded as the sample classification.
Based on different features, we built 3 models: FID(1) was based on the basic information and laboratory features; FID(2) was based on the basic information, laboratory data, and EMR data using Jieba; and FID(3) was based on the basic information, laboratory data, and EMR data using BERT. The results are shown in. The figure shows that the precision was further improved by increasing the number of features from the text data. After introducing the text features developed by BERT, we obtained the optimal model: FID(3). The precision was 81.68%. Experiments show that, in addition to structured data, unstructured text data also hides a lot of valuable information, which could improve the performance of the model.
In addition to the LightGBM algorithm, we also tested XGBoost, an algorithm with a similarly good performance as LightGBM. Both are gradient boosting algorithms. As shown in, the precision of LightGBM was higher than that of XGBoost with relatively fewer iterations. When the number of iterations increased, the precision of XGBoost exceeded that of LightGBM, but it was still lower than the best performance of LightGBM. Therefore, we used LightGBM as the training algorithm for the subsequent experiments in this study.
In fact, in many cases, it is difficult for a model to make a completely accurate decision. The greatest value of a model is that it can provide more suggestions for decision making. If the classification with the largest prediction probability is not accurate but the top 2 classifications are accurate, it could also provide considerable help for doctors. Here, we evaluated the precision of the 2 classifications with the largest prediction probabilities. In, we can see that the precision of the first 2 categories of the model was 96.17% and there were few mistakes.
In addition, we explored the patient distribution according to average age and gender, as shown in. We observed that for FUO caused by tumors, the average age of patients was the highest at 50.91 years. Regarding FUO caused by immune diseases, the gender distribution showed a large difference, with 34.93% (803/2299) of patients being male and 65.07% (1496/2299) of patients being female. For the other 3 categories of FUO, there were no obvious differences in gender.
|Patient demographics (N=2299)|
|Categories of cause of fever of unknown origin||Mean age (years)||Male, n (%)||Female, n (%)|
|Infections||47.79||1226 (53.33)||1073 (46.67)|
|Immune diseases||42.90||803 (34.93)||1496 (65.07)|
|Tumors||50.91||1201 (52.24)||1098 (47.76)|
|Others||38.81||1268 (55.15)||1031 (44.85)|
Usually, before doctors diagnose the cause of a patient’s FUO, they need to make an appointment to examine the patient.shows the top 10 laboratory measurements that were associated with an FUO cause diagnosis. Percentage of basophils was the measurement that showed the strongest correlation with the FUO cause diagnosis. The other 9 measurements were percentage of large unstained cells, age, fibrinogen level, thrombin time, alkaline phosphatase level, direct bilirubin level, blood sodium level, 24-hour urine volume, and lymphocyte count. The top 10 measurements could be provided to doctors as laboratory appointment decision support.
|Number||Top 10 laboratory measurements|
|1||Percentage of basophils|
|2||Percentage of large unstained cells|
|6||Alkaline phosphatase level|
|7||Direct bilirubin level|
|8||Blood sodium level|
|9||24-hour urine volume|
Intelligent diagnosis of the cause category of FUO is significant and practical. With the rapid development of information technology, big data has been the focus of many fields in recent years . Volume, variety, velocity, and value are the 4 “V” characteristics, although mining the deep value of data sets is the most important aspect for big data research. Similarly, in the medical field, large amounts of health care data are produced every day, such as EMRs, laboratory results, and images [ ]. Considerable amounts of precious information can be extracted and mined from medical data using the proper methods. Previously, experts manually identified and analyzed the meaning of health data [ ], which was time-consuming and difficult to identify specifically. Medical big data is increasingly more accepted by doctors because of its high efficiency and lower costs [ ]. Rodger [ ] mentioned that not only was a data extraction system needed, but medical big data applications such as those for clinical decision support were in urgent demand. Medical big data applications help make the medical process easier and friendlier for patients and relieve the pressure on clinicians. Among patients with FUO, the proportion of undiagnosed patients is approximately 20.5% [ ]. In particular, the treatment for FUO may be much different, even contrary, for different causes. Therefore, helping doctors discover the specific cause as soon as possible is meaningful.
We addressed the problem using more appropriate and superior methods. Currently, medical big data applications have explored many directions [, ]. The different kinds of methods used can be divided into 4 types: data mining, image recognition, natural language processing (NLP), and speech recognition. For example, intelligent diagnosis with the data excluding images [ , ] and intelligent early warnings [ ] are both data mining problems. For image recognition [ ], Simonyan and Zisserman [ ] examined very deep convolutional networks and achieved superior performance. NLP is mainly used in the structured analysis of EMRs [ , ]. In this research, NLP also played an important role in structuring the text data, and we processed the EMR data using BERT and Jieba. Speech recognition [ ], mainly addressed using recurrent neural networks, could help doctors transfer voice to text with high efficiency. Regarding medical big data models, there are unsupervised learning models and supervised learning models [ ], such as logistic regression, decision tree, deep learning, and others. Currently, supervised learning models are used more often because of the sensitivity of the data to medical knowledge. As most problems in medical big data are classification problems, decision trees and deep learning models could achieve better performance. Wu et al [ ] exploited the diagnosis of hypocellular myelodysplastic syndrome and aplastic anemia, and their experiments showed that the decision tree model outperformed the others in classification. Most importantly, deep learning methods require very large data sets [ ], usually millions of data sets. Since there were only 2299 cases in this study, gradient boosting methods were better for this research. Therefore, LightGBM and XGBoost were used to train the data. In addition, 1372 cases not clearly diagnosed were removed from the study, and this kind of case would exist in the real-world setting. Therefore, the precision might be lower in reality, and these cases should be taken into account in future work.
A machine learning method was innovatively introduced into FUO diagnosis. We presented the FUO intelligent diagnosis model called the FID, which was based on basic information, laboratory data, and EMR data from Peking Union Medical College Hospital. After cleaning the disordered data and structuring the text data using BERT, we conducted many experiments on the sample data and compared the performances from several angles. The results showed that the FID outperformed the comparative methods. As the treatments for FUO from different causes are very different, intelligently diagnosing an FUO into a category is meaningful. Our research was based on data from 1 hospital, and we intelligently diagnosed the FUO to 1 category of causes. In the future, we will focus on predicting the exact cause of an FUO using multicenter data. We would include all cases, including cases with no confirmed diagnosis, in our future research to better match real-world scenarios, which would probably improve the method more practical for the real clinical process.
HJ, YL, and WZ contributed to the study concept and design. CZ, YL, XZ, JZ, and NX contributed to the acquisition of the data set. HJ, YL, and WZ consulted on the analyses. All authors interpreted the results, contributed to the manuscript, and approved the final draft.
This study was supported by the National Key Research and Development Program of China (project 2018YFC0116905) and the CAMS Innovation Fund for Medical Sciences (CIFMS) project 2016-I2 M-2-004.
Conflicts of Interest
- Ifesinachi P. Mechanisms of fever in humans. International Journal of Microbiology and Immunology Research 2013:43 [FREE Full text]
- Evans SS, Repasky EA, Fisher DT. Fever and the thermal regulation of immunity: the immune system feels the heat. Nat Rev Immunol 2015 May 15;15(6):335-349. [CrossRef]
- Cunha BA, Lortholary O, Cunha CB. Fever of Unknown Origin: A Clinical Approach. The American Journal of Medicine 2015 Oct;128(10):1138.e1-1138.e15. [CrossRef]
- Keidar Z, Gurman-Balbir A, Gaitini D, Israel O. Fever of Unknown Origin: The Role of 18F-FDG PET/CT. Journal of Nuclear Medicine 2008 Nov 07;49(12):1980-1985. [CrossRef]
- Cunha CB. Prolonged and Perplexing Fevers in Antiquity: Malaria and Typhoid Fever. Infectious Disease Clinics of North America 2007 Dec;21(4):857-866. [CrossRef]
- Petersdorf RG, Beeson PB. Fever of unexplained origin: Report on 100 cases. Medicine 1961;40(1):1-30. [CrossRef] [Medline]
- Fusco FM, Pisapia R, Nardiello S, Cicala SD, Gaeta GB, Brancaccio G. Fever of unknown origin (FUO): which are the factors influencing the final diagnosis? A 2005-2015 systematic review. BMC Infect Dis 2019 Jul 22;19(1):653 [FREE Full text] [CrossRef] [Medline]
- Durack DT, Street AC. Fever of unknown origin--reexamined and redefined. Curr Clin Top Infect Dis 1991;11:35-51. [Medline]
- Knockaert DC, Vanderschueren S, Blockmans D. Fever of unknown origin in adults: 40 years on. J Intern Med 2003 Mar;253(3):263-275. [CrossRef]
- Pasic S, Minic A, Djuric P, Micic D, Kuzmanovic M, Sarjanovic L, et al. Fever of unknown origin in 185 paediatric patients: A single-centre experience. Acta Paediatrica 2006 Apr 1;95(4):463-466. [CrossRef]
- Zhou W, Tan X, Li Y, Tan W. Human Herpes Viruses Are Associated with Classic Fever of Unknown Origin (FUO) in Beijing Patients. PLoS ONE 2014 Jul 3;9(7):e101619. [CrossRef]
- Wright WF, Auwaerter PG. Fever and Fever of Unknown Origin: Review, Recent Advances, and Lingering DogmaC. Open Forum Infectious Diseases. US: Oxford University Press 2020;7(5). [CrossRef]
- Efstathiou SP, Pefanis AV, Tsiakou AG, Skeva II, Tsioulos DI, Achimastos AD, et al. Fever of unknown origin: Discrimination between infectious and non-infectious causes. European Journal of Internal Medicine 2010 Apr;21(2):137-143. [CrossRef]
- Burzo ML, Antonelli M, Pecorini G, Favuzzi AM, Landolfi R, Flex A. Fever of unknown origin and splenomegaly. Medicine 2017;96(50):e9197. [CrossRef]
- Chow A, Robinson JL. Fever of unknown origin in children: a systematic review. World J Pediatr 2010 Dec 30;7(1):5-10. [CrossRef]
- de Kleijn EM, van der Meer JW. Fever of unknown origin (FUO): report on 53 patients in a Dutch university hospital. Neth J Med 1995 Aug;47(2):54-60. [CrossRef] [Medline]
- Transparent reporting of systematic reviews and meta-analyses. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). URL: http://prisma-statement.org/PRISMAStatement/HistoryAndDevelopment [accessed 2020-11-24]
- de Kleijn EMHA, Vandenbroucke JP, van der Meer JWM. Fever of Unknown Origin (FUO): I. A prospective multicenter study of 167 patients with FUO, using fixed epidemiologic entry criteria. Medicine 1997;76(6):392-400. [CrossRef]
- Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv Preprint posted online May 24, 2019. [FREE Full text]
- Peng KH, Liou LH, Chang CS, Lee D. Predicting personality traits of Chinese users based on Facebook wall posts. 2015 Presented at: Proceedings of the 2015 24th Wireless and Optical Communication Conference (WOCC); October 23-24, 2015; Taipei, Taiwan p. 9-14. [CrossRef]
- Bisong E. Introduction to Scikit-learn. In: Building Machine Learning and Deep Learning Models on Google Cloud Platform. Berkeley, CA: Apress; 2019:215-229.
- Chen M, Mao S, Liu Y. Big Data: A Survey. Mobile Netw Appl 2014 Jan 22;19(2):171-209. [CrossRef]
- Lee CH, Yoon H. Medical big data: promise and challenges. Kidney Res Clin Pract 2017 Mar 31;36(1):3-11. [CrossRef]
- Siuly S, Zhang Y. Medical Big Data: Neurological Diseases Diagnosis Through Medical Data Analysis. Data Sci. Eng 2016 Jul 27;1(2):54-64. [CrossRef]
- Roski J, Bo-Linn GW, Andrews TA. Creating Value In Health Care Through Big Data: Opportunities And Policy Implications. Health Affairs 2014 Jul;33(7):1115-1122. [CrossRef]
- Rodger JA. Discovery of medical Big Data analytics: Improving the prediction of traumatic brain injury survival rates by data mining Patient Informatics Processing Software Hybrid Hadoop Hive. Informatics in Medicine Unlocked 2015;1:17-26. [CrossRef]
- Sun J, Reddy CK. Big data analytics for healthcare. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. 2013 Presented at: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining; 2013; Chicago, Illinois, USA p. 1525. [CrossRef]
- Manogaran G, Thota C, Lopez D, Vijayakumar V, Abbas KM, Sundarsekar R. Big data knowledge system in healthcare. In: Bhatt C, Dey N, Ashour AS, editors. Internet of things and big data technologies for next generation healthcare. Heidelberg, Germany: Springer; 2017:157.
- Jia F, Lei Y, Lin J, Zhou X, Lu N. Deep neural networks: A promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data. Mechanical Systems and Signal Processing 2016 May;72-73:303-315. [CrossRef]
- Avci E, Turkoglu I. An intelligent diagnosis system based on principle component analysis and ANFIS for the heart valve diseases. Expert Systems with Applications 2009 Mar;36(2):2873-2878. [CrossRef]
- Su H, Wen Z, Wu Z. Study on an Intelligent Inference Engine in Early-Warning System of Dam Health. Water Resour Manage 2011 Jan 15;25(6):1545-1563. [CrossRef]
- Simonyan K, Zisserman A. Very deep convolutional networks for large- scale image recognition. arXiv Preprint posted online April 10, 2015. [FREE Full text]
- Ohno-Machado L. Realizing the full potential of electronic health records: the role of natural language processing. J Am Med Inform Assoc 2011 Sep 01;18(5):539-539. [CrossRef]
- Graves A, Mohamed AR, Hinton G. Speech recognition with deep recurrent neural networks. In: IEEE. 2013 Presented at: 2013 IEEE international conference on acoustics, speech and signal processing; 2013; Vancouver, BC, Canada. [CrossRef]
- Jiang F, Jiang Y, Zhi H, Dong Y, Li H, Ma S, et al. Artificial intelligence in healthcare: past, present and future. Stroke Vasc Neurol 2017 Jun 21;2(4):230-243. [CrossRef]
- Wu J, Zhang L, Yin S, Wang H, Wang G, Yuan J. Differential Diagnosis Model of Hypocellular Myelodysplastic Syndrome and Aplastic Anemia Based on the Medical Big Data Platform. Complexity 2018 Nov 12;2018:1-12. [CrossRef]
- Kamilaris A, Prenafeta-Boldú FX. Deep learning in agriculture: A survey. Computers and Electronics in Agriculture 2018 Apr;147:70-90. [CrossRef]
|BERT: bidirectional encoder representations from transformers|
|EMR: electronic medical record|
|FID: fever of unknown origin intelligent diagnosis|
|FUO: fever of unknown origin|
|NLP: natural language processing|
|PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses|
Edited by G Eysenbach; submitted 16.09.20; peer-reviewed by B Qian, L Wang; comments to author 06.10.20; revised version received 18.10.20; accepted 28.10.20; published 30.11.20Copyright
©Huizhen Jiang, Yuanjie Li, Xuejun Zeng, Na Xu, Congpu Zhao, Jing Zhang, Weiguo Zhu. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 30.11.2020.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.