Feature selection is an essential step in machine learning, used to reduce the number of input variables for predictive models. This reduces the computational load of model training and may also improve the comprehensibility of the resulting model. A variety of feature selection methods are available, and it is important to understand the differences between them. This article provides an overview of the main feature selection techniques and discusses the strengths and weaknesses of each. It is intended as a primer for machine learning engineers and data scientists who want to learn more about the various feature selection methods and their application to predictive modeling problems.
Feature selection methods seek to find the subset of features that best predicts a target variable in a dataset. They are often used to reduce the number of input variables for a predictive model in order to increase its speed and accuracy; removing non-informative or redundant predictors can also reduce overfitting and lower the overall error rate of a classifier. The most common feature selection methods are statistical. They evaluate the relationship between each feature and the output variable using a statistical test and then select the features with the strongest relationship to the target. These methods can be fast and effective, although the appropriate statistic depends on the types of the input and output variables (numerical versus categorical). Another category of feature selection methods is dimensionality reduction, which reduces the number of input variables by grouping similar attributes together and removing the redundant ones.
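As an illustration of the statistical approach, the sketch below scores each feature against a categorical target with the ANOVA F-test and keeps only the top-scoring ones. This is a minimal example using scikit-learn on synthetic data; the library, the dataset, and the choice of `k=5` are all assumptions for demonstration, not part of the experiments described in this article.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 20 numerical features, of which only 5 are informative.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

# Score each feature against the target with the ANOVA F-test
# (suited to numerical inputs and a categorical output),
# then keep the 5 highest-scoring features.
selector = SelectKBest(score_func=f_classif, k=5)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)  # (200, 5)
```

Had the output been categorical-on-categorical, a chi-squared score (`chi2`) would be the appropriate statistic instead, which is exactly the input/output-type dependence noted above.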
These methods can be useful when there are many features and few samples, as with written text or DNA microarray data. Lastly, there are search-based feature selection methods, which search directly for well-performing subsets of features. One such technique, targeted projection pursuit, finds a low-dimensional projection of the dataset and selects the features with the largest projected values in that space. This approach is particularly useful in classification tasks where the goal is to shrink the feature space and thus reduce the response time of the classifier. More broadly, feature selection techniques fall into three general classes: filter, wrapper, and embedded methods.
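To make the search-based idea concrete, the sketch below uses greedy forward selection, one common search strategy (not the projection-pursuit variant mentioned above): starting from an empty set, it repeatedly adds the feature that most improves cross-validated accuracy. The estimator, dataset, and stopping point of 3 features are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

# Greedy forward search: start with no features and repeatedly add the
# one that most improves cross-validated accuracy, until 3 are selected.
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=3,
                                direction="forward")
sfs.fit(X, y)

print(sfs.get_support())   # boolean mask over the 10 input features
print(sfs.transform(X).shape)  # (200, 3)
```

Because every candidate subset is evaluated by retraining the model, search-based selection is far more expensive than a statistical filter, which is the usual trade-off for its higher accuracy.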
Filter methods score input variables with model-agnostic statistical measures and pass the top-ranked subset to a classifier. Wrapper methods search over candidate feature subsets, using the performance of a learning algorithm to evaluate each one. Embedded methods build the feature selection process directly into the learning algorithm. I experimented with several feature selection algorithms on the Arrhythmia Heart Disease dataset from Kaggle. I achieved best accuracies of 70-75% with Random Forest and Naive Bayes classifiers. This dataset has 280 features, and most methods were able to select a useful subset of the top 20-35.
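An embedded method can be sketched with a Random Forest, whose impurity-based feature importances fall out of training itself and can then drive the selection. This is a synthetic stand-in, not the Arrhythmia experiment above; the dataset sizes, `max_features=10`, and the use of scikit-learn's `SelectFromModel` are all assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=300, n_features=50,
                           n_informative=8, random_state=0)

# Training the forest already produces per-feature importances, so the
# selection step is embedded in the learning algorithm itself.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# threshold=-inf disables the importance cutoff so that exactly the
# top-10 most important features are kept.
selector = SelectFromModel(forest, prefit=True,
                           max_features=10, threshold=-np.inf)
X_selected = selector.transform(X)

print(X_selected.shape)  # (300, 10)
```

On a wide dataset like the 280-feature Arrhythmia data, the same pattern (train once, rank by importance, keep the top 20-35 features) is one plausible way a subset of that size could be obtained.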