Predicting phenotypes of asthma and eczema with machine learning

BMC Medical Genomics

Table 2 Comparison of machine learning methods.

Outcome	Model	AUROC	Sensitivity (at 90% specificity)	Sensitivity (at 80% specificity)	Accuracy
Doctor's Diagnosed Eczema	Decision Tree*	0.57 (0.04)	0.15 (0.07)	0.29 (0.07)	0.78 (0.02)
	Random Forest	0.64 (0.03)	0.20 (0.06)	0.34 (0.07)	0.79 (0.02)
	Logistic Regression	0.59 (0.04)	0.18 (0.06)	0.31 (0.08)	0.78 (0.02)
	One Rule*	0.58 (0.06)	0.20 (0.11)	0.30 (0.15)	0.79 (0.02)
	AdaBoost	0.58 (0.04)	0.17 (0.06)	0.30 (0.07)	0.78 (0.02)
Current Asthma	Decision Tree*	0.72 (0.06)	0.39 (0.12)	0.54 (0.11)	0.85 (0.02)
	Random Forest	0.84 (0.03)	0.55 (0.09)	0.72 (0.08)	0.87 (0.02)
	Logistic Regression	0.79 (0.04)	0.45 (0.08)	0.63 (0.08)	0.86 (0.02)
	One Rule*	0.76 (0.06)	0.44 (0.09)	0.61 (0.11)	0.86 (0.02)
	AdaBoost	0.81 (0.04)	0.48 (0.09)	0.66 (0.07)	0.86 (0.02)
Current Wheeze	Decision Tree*	0.62 (0.06)	0.27 (0.10)	0.36 (0.11)	0.88 (0.02)
	Random Forest	0.76 (0.04)	0.47 (0.09)	0.60 (0.09)	0.89 (0.02)
	Logistic Regression	0.72 (0.04)	0.34 (0.08)	0.51 (0.08)	0.88 (0.02)
	One Rule*	0.69 (0.06)	0.33 (0.09)	0.49 (0.12)	0.88 (0.02)
	AdaBoost	0.73 (0.04)	0.32 (0.09)	0.50 (0.09)	0.88 (0.02)

Performance of the machine learning models for each outcome, using the full set of demographic, environmental, genetic (single nucleotide polymorphism), allergen sensitisation, and lung function variables. Results are means (standard deviations) estimated from out-of-bag distributions across 100 bootstrap runs.
* The difference in AUROC relative to the random forest is significantly different from zero at the 0.05 level. AUROC: area under the receiver operating characteristic curve.
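
The evaluation scheme described in the legend (out-of-bag estimates over repeated bootstrap runs, summarised as mean and standard deviation) can be illustrated with a minimal sketch. It is not the authors' pipeline: the synthetic data, the stand-in model `fit_mean_difference`, and all function names are assumptions made for illustration, and the rule used here to pick the threshold for "sensitivity at a given specificity" is one plausible convention among several.

```python
import random
import statistics


def auroc(labels, scores):
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(labels, scores, specificity):
    """Sensitivity at the lowest threshold whose specificity reaches the target."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    for threshold in sorted(set(scores)):
        # A sample is predicted positive when its score exceeds the threshold.
        if sum(n <= threshold for n in neg) / len(neg) >= specificity:
            return sum(p > threshold for p in pos) / len(pos)
    return 0.0


def fit_mean_difference(rows, labels):
    """Hypothetical stand-in model: project onto (positive mean - negative mean)."""
    def mean_vector(target):
        group = [r for r, y in zip(rows, labels) if y == target]
        return [sum(col) / len(group) for col in zip(*group)]
    weights = [p - q for p, q in zip(mean_vector(1), mean_vector(0))]
    return lambda row: sum(w * x for w, x in zip(weights, row))


def oob_bootstrap(rows, labels, fit, n_runs=100, seed=0):
    """Train on each bootstrap sample, score the left-out samples, summarise."""
    rng = random.Random(seed)
    n = len(rows)
    results = {"auroc": [], "sens_at_90": [], "sens_at_80": []}
    for _ in range(n_runs):
        in_bag = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        chosen = set(in_bag)
        out_bag = [i for i in range(n) if i not in chosen]
        y_in = [labels[i] for i in in_bag]
        y_out = [labels[i] for i in out_bag]
        if len(set(y_in)) < 2 or len(set(y_out)) < 2:
            continue  # both classes are needed to fit and to evaluate
        predict = fit([rows[i] for i in in_bag], y_in)
        s_out = [predict(rows[i]) for i in out_bag]
        results["auroc"].append(auroc(y_out, s_out))
        results["sens_at_90"].append(sensitivity_at_specificity(y_out, s_out, 0.90))
        results["sens_at_80"].append(sensitivity_at_specificity(y_out, s_out, 0.80))
    # Report mean and standard deviation across runs, as in the table.
    return {k: (statistics.mean(v), statistics.stdev(v)) for k, v in results.items()}


def make_toy_data(n=200, seed=1):
    """Synthetic two-feature data (assumption): one informative feature, one noise."""
    rng = random.Random(seed)
    labels = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
    rows = [[rng.gauss(1.0 if y else 0.0, 1.0), rng.gauss(0.0, 1.0)] for y in labels]
    return rows, labels


rows, labels = make_toy_data()
for name, (mean, sd) in oob_bootstrap(rows, labels, fit_mean_difference).items():
    print(f"{name}: {mean:.2f} ({sd:.2f})")
```

Each run evaluates only on samples that were not drawn into the bootstrap sample, so the reported spread reflects variation across resampled training sets rather than a single train/test split. Replacing `fit_mean_difference` with any function that returns a scoring callable would let the same loop compare several models on identical resamples.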

ISSN: 1755-8794