Navigation
  • Home
  • Recent
  • Most Active
  • Popular
  • Blog
  • Credits
  • RSS
  •   Interaction
  • Register
  • Statistics
  •   Help
  • Suggestions
  • Contact Us
  • How to Edit
  • Help



  • [Edit]


    Feature selection, also known as subset selection or variable selection, is a process commonly used in machine learning, wherein a subset of the features available from the data are selected for application of a learning algorithm. Feature selection is necessary either because it is computationally infeasible to use all available features, or because of problems of estimation when limited data samples (but a large number of features) are present. The latter problem is related to the so-called curse of dimensionality, which refers to the fact that the number of data samples required to estimate some arbitrary multivariate probability distribution increases exponentially as the number of dimensions in the data increases linearly.
    Simple feature selection algorithms are ad hoc, but there are also more methodical approaches. From a theoretical perspective, it can be shown that optimal feature selection for supervised learning problems requires an exhaustive search of all possible subsets of features of the chosen cardinality. If large numbers of features are available, this is impractical. For practical supervised learning algorithms, the search is for a satisfactory set of features instead of an optimal set. Many popular approaches are greedy hill climbing approaches. Such an approach evaluates a possible subset of features and then modifies that subset to see if an improved subset can be found. Evaluation of subsets can be done many ways - some metric is used to score the features, and possibly the combination of features. Since exhaustive search is generally impractical, at some stopping point, the subset of features with the highest scores by the metric will be selected. The stopping point varies by algorithm.

    Two popular metrics for classification problems are correlation and mutual information. These metrics are computed between a candidate feature (or set of features) and the desired output category.

    In statistics the most popular form of feature selection is called stepwise regression. It is a greedy algorithm that adds the best feature (or deletes the worst feature) at each round. The main control issue is deciding when to stop the algorithm. In machine learning, this would typically be done by cross validation. In statistics, some criteria would be optimized. However, this leads to the inherit problem of nesting. More robust methods have been explored, such as Branch and Bound and Piecewise Linear Networks.


        Feature selection
            Optimality criteria
            See also

    top

    Optimality criteria

    There are a variety of optimiality criteria that can be used for controlling feature selection. The oldest would be Collin Mallow's C-p statistic or Akaike information criterion (AIC). These both add variables if the t statistic is bigger than sqrt. Other criteria are, BIC (which uses sqrt), MDL (which asympotically uses sqrt but some argue this asymptotic isn't computed correctly), Bonnferroni / RIC (which use sqrt), and a variety of new criteria that are movitated by FDR (which use something close to sqrt).

    top

    See also
     
    Search more:
     

       
    Source Privacy License Download Contact Us Atlas
    Scientus.org Dictionary (Yet Another Wiki) RC : 1.39
    MIT OpenCourseWare
    This article is licensed under the GNU Free Documentation License [copyleft]. It uses material from the Wikipedia article "Feature selection". link