




    Bootstrap aggregating (bagging) is a meta-algorithm that improves the stability and accuracy of classification and regression models. It also reduces variance and helps to avoid overfitting. Although the method is usually applied to decision tree models, it can be used with any type of model. Bagging is a special case of the model averaging approach.
    Given a standard training set D of size N, bagging generates L new training sets D_i, each also of size N, by sampling examples from D uniformly and with replacement. Because the sampling is with replacement, some examples are likely to be repeated in each D_i. For large N, each D_i is expected to contain about 63.2% (that is, 1 - 1/e) of the distinct examples of D, the rest being duplicates. This kind of sample is known as a bootstrap sample. The L models are fitted to the L bootstrap samples and combined by averaging their outputs (for regression) or by voting (for classification). One notable point about bagging is that, because the method averages several predictors, it is not useful for improving linear models.
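
    The resampling and averaging steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the article: the names bootstrap_sample, unique_fraction, bagged_predict and fit_mean are hypothetical, and the base model is a deliberately trivial one that predicts the mean response. The first printed value should come out close to 1 - 1/e.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) examples from data uniformly, with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

def unique_fraction(n, n_samples, rng):
    """Average fraction of distinct original examples per bootstrap sample."""
    indices = list(range(n))
    total = 0.0
    for _ in range(n_samples):
        total += len(set(bootstrap_sample(indices, rng))) / n
    return total / n_samples

def bagged_predict(fit, data, x, n_models, rng):
    """Fit n_models on bootstrap samples and average their predictions at x."""
    models = [fit(bootstrap_sample(data, rng)) for _ in range(n_models)]
    return sum(model(x) for model in models) / n_models

def fit_mean(sample):
    """Hypothetical base model: always predicts the mean y of its sample."""
    mean_y = sum(y for _, y in sample) / len(sample)
    return lambda x: mean_y

rng = random.Random(0)

# Fraction of distinct examples in a bootstrap sample (N = 1000).
frac = unique_fraction(n=1000, n_samples=200, rng=rng)
print(round(frac, 3))

# Bagged regression prediction from 50 bootstrap models.
data = [(float(i), float(i)) for i in range(10)]
pred = bagged_predict(fit_mean, data, x=0.0, n_models=50, rng=rng)
print(round(pred, 3))
```

    For classification, the averaging line in bagged_predict would be replaced by a majority vote over the predicted labels.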


        Bootstrap aggregating
            Example: Ozone data
            History
            See also


    Example: Ozone data
    This example is rather artificial, but is intended to illustrate the principles.

    Rousseeuw and Leroy (1986) describe a data set concerning ozone levels. The data are available via the classic data sets page. All computations were performed in R.

    A scatter plot reveals an apparently non-linear relationship between temperature and ozone. One way to model the relationship is to use a loess smoother. Such a smoother requires that a span parameter be chosen. In this example, a span of 0.5 was used.

    One hundred bootstrap samples of the data were taken, and the loess smoother was fit to each sample. Predictions from these 100 smoothers were then made across the range of the data. The first 10 predicted smooth fits appear as grey lines in the figure below. These lines are visibly wiggly and overfit the data, a result of the span being too small.

    The red line on the plot below represents the mean of the 100 smoothers. Clearly, the mean is more stable and there is less overfit. This is the bagged predictor.
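
    The same procedure can be imitated with a small standard-library Python sketch. Several things here are assumptions rather than part of the original analysis, which was done in R on the ozone data: the data are synthetic (a noisy sine curve), a k-nearest-neighbour running mean stands in for the loess smoother, and the names knn_smoother and roughness are hypothetical. The sketch fits 100 smoothers to bootstrap samples, averages them on a grid, and compares how wiggly the bagged curve is with the individual fits.

```python
import math
import random

def knn_smoother(sample, k):
    """Return a predictor averaging y over the k nearest x values in sample."""
    def predict(x0):
        nearest = sorted(sample, key=lambda p: abs(p[0] - x0))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def roughness(curve):
    """Sum of squared second differences; larger means a wigglier curve."""
    return sum((curve[i - 1] - 2 * curve[i] + curve[i + 1]) ** 2
               for i in range(1, len(curve) - 1))

rng = random.Random(1)

# Synthetic stand-in data, NOT the ozone data set.
xs = [rng.uniform(0.0, 6.0) for _ in range(80)]
data = [(x, math.sin(x) + rng.gauss(0.0, 0.4)) for x in xs]
grid = [i * 0.1 for i in range(61)]

# Fit one smoother per bootstrap sample and predict on the grid.
fits = []
for _ in range(100):
    sample = [data[rng.randrange(len(data))] for _ in data]
    smoother = knn_smoother(sample, k=5)
    fits.append([smoother(g) for g in grid])

# The bagged predictor is the pointwise mean of the 100 fits.
bagged = [sum(f[j] for f in fits) / len(fits) for j in range(len(grid))]

mean_single = sum(roughness(f) for f in fits) / len(fits)
print("mean roughness of single fits:", mean_single)
print("roughness of bagged fit:", roughness(bagged))
```

    Because this roughness measure is a convex function of the curve, the bagged curve can never be rougher than the average individual fit; how much smoother it is depends on the data and on k.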




    History

    Bagging (bootstrap aggregating) was proposed by Leo Breiman in 1996 to improve classification by combining classifiers fitted to randomly generated training sets.


    This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Bootstrap aggregating".