Optimizing Decision Tree Ensembles for Gene-Gene Interaction Detection
Publisher Description
In recent years, genome-wide association studies (GWAS) have been dedicated to unraveling the genetic etiology of complex diseases. It is widely accepted that most common diseases, such as neurodegenerative diseases (e.g., Alzheimer's and Parkinson's diseases), cardiovascular diseases, various cancers, diabetes, and osteoporosis, result from multiple genes, their interactions, environmental factors, and gene-by-environment interactions, and thus cannot be explained by a simple Mendelian inheritance model. Consequently, dissecting the gene-gene and/or gene-environment interactions involved in complex diseases and traits has become an active research topic in computational genomics. However, the high dimensionality of genotype data and the exponential growth of the search space with the order of the targeted interactions make most existing interaction detection strategies practically inapplicable. Because they can capture interactions among input variables in addition to nonlinear effects, decision trees and their ensembles have recently been shown to be effective strategies for detecting interactions in GWAS data. However, an individual decision tree (DT) suffers from some major limitations, most importantly high variance error, data fragmentation, and representational problems, which make it unreliable for stand-alone use in feature selection. Ensemble approaches have been proposed to increase the robustness of weak learners such as DTs by using multiple different and potentially complementary representations of the data. Some of the limitations of individual decision trees nevertheless persist at the ensemble level, which may impair their interaction detection performance.
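As a rough illustration of the search-space growth described above, the following sketch counts how many distinct SNP subsets an exhaustive search would have to examine at each interaction order. The panel size of 500,000 SNPs and the function name `interaction_candidates` are assumptions chosen for illustration; neither comes from the book.

```python
from math import comb


def interaction_candidates(n_snps: int, order: int) -> int:
    """Return the number of distinct SNP subsets of size `order`.

    An exhaustive search for interactions of a given order would have
    to score every one of these subsets.
    """
    return comb(n_snps, order)


# Assumed panel size, for illustration only (not taken from the book).
N_SNPS = 500_000

for order in (1, 2, 3, 4):
    n_sets = interaction_candidates(N_SNPS, order)
    print(f"order {order}: {n_sets:.3e} candidate SNP sets")
```

Under this assumption, each additional order multiplies the number of candidates by roughly the panel size divided by the new order, which is why exhaustive enumeration quickly becomes impractical beyond pairwise interactions.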