Optimal Algorithm for Metabolomics Classification and Feature Selection varies by Dataset

Determan Jr, Charles E. (2014) Optimal Algorithm for Metabolomics Classification and Feature Selection varies by Dataset. International Journal of Biology, 7 (1). pp. 100-115. ISSN 1916-9671

Abstract

Metabolomics, the systematic identification and quantification of all metabolites in a biological system, is increasingly applied to the identification of biomarkers for disease diagnosis, prognosis, and risk prediction. Applications of metabolomics extend across the health spectrum, including Alzheimer's, cancer, diabetes, and trauma. Despite the continued interest in metabolomics, numerous competing techniques exist for analyzing metabolomics datasets with the intent to classify group membership (e.g., Control or Treated). These include Partial Least Squares Discriminant Analysis, Support Vector Machines, Random Forest, Regularized Generalized Linear Models, and Prediction Analysis for Microarrays. Each classification algorithm depends on different assumptions and can lead to different conclusions. This project conducts an in-depth comparison of algorithm performance on both simulated and real datasets to determine which algorithms perform best for different dataset structures. Three simulated datasets were generated to validate algorithm performance and mimic 'real' metabolomics data: (1) an independent null dataset (no correlation, no discriminatory variables), (2) a correlated null dataset (no discriminatory variables), and (3) a correlated discriminatory dataset. The comparison is also applied to three open-access datasets: two Nuclear Magnetic Resonance (NMR) datasets and one Mass Spectrometry (MS) dataset. Performance was evaluated using the Robustness-Performance Trade-off (RPT), which balances model classification accuracy against feature selection stability. We also provide a free, open-source R Bioconductor package (OmicsMarkeR) that conducts the analyses herein.
The proposed work provides an important advancement in metabolomics analysis and helps alleviate the confusion of potentially paradoxical analyses, thereby leading to improved exploration of disease states and identification of clinically important biomarkers.
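
The abstract names the Robustness-Performance Trade-off but does not state how it is computed. The following is a minimal sketch, assuming the weighted harmonic-mean definition commonly used in the feature-selection literature, RPT = (beta^2 + 1) * stability * performance / (beta^2 * stability + performance), with both inputs scaled to [0, 1]. The function name `rpt` and the `beta` parameter are illustrative choices and are not taken from the article or from OmicsMarkeR.

```python
def rpt(stability: float, performance: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of feature-selection stability and accuracy.

    Assumed definition (not confirmed by the abstract):
        RPT = (beta^2 + 1) * s * p / (beta^2 * s + p)
    beta = 1 weights stability and performance equally.
    """
    if not (0.0 <= stability <= 1.0 and 0.0 <= performance <= 1.0):
        raise ValueError("stability and performance must lie in [0, 1]")
    if beta < 0:
        raise ValueError("beta must be non-negative")

    denominator = beta ** 2 * stability + performance
    if denominator == 0.0:
        # Both terms are zero: define the trade-off score as zero.
        return 0.0
    return (beta ** 2 + 1) * stability * performance / denominator


# Hypothetical scores for two classifiers on the same dataset:
# one accurate but unstable, one slightly less accurate but stable.
unstable = rpt(stability=0.30, performance=0.95)
stable = rpt(stability=0.80, performance=0.90)
print(f"unstable model RPT: {unstable:.3f}")
print(f"stable model RPT:   {stable:.3f}")
```

Because the score is a harmonic mean, a model with high accuracy but unstable feature selection is penalized heavily, which matches the abstract's stated aim of balancing the two criteria.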

Item Type: Article
Subjects: Academic Digital Library > Biological Science
Depositing User: Unnamed user with email info@academicdigitallibrary.org
Date Deposited: 23 May 2023 05:24
Last Modified: 17 Jan 2024 04:21
URI: http://publications.article4sub.com/id/eprint/1603
