Article Details

Feature Selection of High Dimensional Big Data of Gene Expression for Cancer Dataset | Original Article

Prem Kumar Chandrakar, A. K. Shrivas in Anusandhan (RNTUJ-AN) | Multidisciplinary Academic Research

ABSTRACT:

Feature selection is an essential data preprocessing technique for high-dimensional data classification tasks. Traditional dimensionality reduction approaches fall into two categories: Feature Extraction (FE) and Feature Selection (FS). Microarray technology can measure the expression levels of thousands of genes simultaneously in a single experiment. The major challenge in analyzing gene expression data, which has a large number of genes and a small number of samples, is to extract disease-related information from a massive amount of redundant data and noise. Analysis of gene expression is important in many fields of biological research for retrieving the required information. Over time, illness in general, and cancer in particular, has become increasingly complex to detect, analyze, and cure. Cancer is a deadly disease, and cancer research is one of the major areas of research in the medical field. Precisely predicting different tumor types is a great challenge, and accurate prediction is of great value in providing better treatment to patients. To achieve this, data mining algorithms are important tools and the most extensively used approach for identifying the important features of gene expression data, and they play an important role in gene classification. Gene expression profiles, which represent the state of a cell at the molecular level, have great potential as a medical diagnosis tool. Compared to the number of genes involved, however, available training data sets generally have a fairly small sample size for classification. These training data limitations pose a challenge to certain classification methodologies. Feature selection techniques can be used to extract the marker genes that influence classification accuracy by eliminating unwanted noisy and redundant genes. One of the major challenges is to discover how to extract useful information from huge datasets. Gene selection, which eliminates redundant and irrelevant genes, has been a key step in addressing this problem. This paper presents a variety of feature selection techniques that have been employed in microarray-data-based cancer classification, and presents recent advances in machine-learning-based gene expression data analysis with different feature selection algorithms.
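As a concrete illustration of the gene selection idea described in the abstract, the following is a minimal sketch of one simple filter-style method: ranking genes by a signal-to-noise score between two classes and keeping the top k. It is not the authors' method, and the abstract does not name a specific algorithm. The function names (`snr_scores`, `select_top_genes`), the toy expression matrix, and the two-class 0/1 labelling are all assumptions made for illustration; the sketch also assumes at least two samples per class.

```python
from statistics import mean, stdev


def snr_scores(X, y):
    """Score each gene (column of X) by |mean0 - mean1| / (sd0 + sd1).

    X is a list of samples, each a list of expression values.
    y holds the class label (0 or 1) of each sample.
    """
    n_genes = len(X[0])
    scores = []
    for j in range(n_genes):
        class0 = [row[j] for row, label in zip(X, y) if label == 0]
        class1 = [row[j] for row, label in zip(X, y) if label == 1]
        spread = stdev(class0) + stdev(class1)
        # A small epsilon avoids division by zero for constant genes.
        scores.append(abs(mean(class0) - mean(class1)) / (spread + 1e-12))
    return scores


def select_top_genes(X, y, k):
    """Return the column indices of the k highest-scoring genes, sorted."""
    scores = snr_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: 6 samples, 4 genes. Only gene 0 separates the classes well.
X = [
    [1.0, 5.0, 0.2, 3.1],
    [1.2, 5.1, 0.9, 3.0],
    [0.9, 4.9, 0.1, 3.2],
    [4.0, 5.0, 0.8, 3.1],
    [4.2, 5.2, 0.3, 2.9],
    [3.9, 4.8, 0.7, 3.0],
]
y = [0, 0, 0, 1, 1, 1]

selected = select_top_genes(X, y, k=2)
reduced = [[row[j] for j in selected] for row in X]  # data restricted to the kept genes
```

A filter of this kind scores each gene independently, so it removes uninformative genes but does not by itself detect redundancy between genes that carry the same signal; methods that address redundancy would need an additional step.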