Navigation
  • Home
  • Recent
  • Most Active
  • Popular
  • Blog
  • Credits
  • RSS
  •   Interaction
  • Register
  • Statistics
  •   Help
  • Suggestions
  • Contact Us
  • How to Edit
  • Help



  • [Edit]


    For alternate meanings see Perceptron (disambiguation).The perceptron is a type of artificial neural network invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt. It can be seen as the simplest kind of feedforward neural network: a linear classifier.



        Perceptron
            Definition
            Learning algorithm
            Variants
            History
            See also

    top

    Definition
    The perceptron is a kind of binary classifier that maps its input x (a real-valued vector in the simplest case) to an output value f(x) calculated as


    f(x) = langle w,x
    angle + b


    where w is a vector of real-valued weights and langle cdot,cdot
    angle denotes dot product. (We use the dot product as we are computing a weighted sum). b is the 'bias', a constant term that does not depend on any input value.

    The sign of f(x) is used to classify x as either a positive or a negative instance, in the case of a binary classification problem. The bias can be thought of as offsetting the activation function, or giving the output neuron a "base" level of activity. If b is negative, then the weighted combination of inputs must produce a positive value greater than -b in order to push the classifier neuron over the 0 threshold. Spatially, the bias alters the position (though not the orientation) of the decision boundary.

    Since the inputs are fed directly to the output unit via the weighted connections, the perceptron can be considered the simplest kind of feed-forward neural network.

    top

    Learning algorithm
    In order to describe the training procedure, let D_m = denote a training set of m examples
    where x_i denotes the input and d_i denotes the desired output for the input of the i-th example. We use y_i to refer to the output of the network presented with training example x_i. For convenience, we assume binary values for the desired outputs d_i, i.e.
    d_i=1 for positive examples and d_i=0 for negative ones.

    When the training set D_m is linearly separable, there exists a weight vector w such that y_i = langle w, x_i
    angle +b > 0 for all i,
    the perceptron can be trained by a simple one-line learning algorithm in which examples are presented iteratively and corrections to the weight vectors are made each time a mistake occurs (learning by examples). The correction to the weight vector when a mistake occurs is eta (d_i - y_i) x_i (with learning rate eta ). Novikoff (1962) proved that this algorithm converges after a finite number of iterations.

    However, if the training set is not linearly separable, the above online algorithm will never converge.

    top

    Variants






    The pocket algorithm with ratchet (Gallant, 1990) solves the stability problem of perceptron learning by keeping the best solution seen so far "in its pocket". The pocket algorithm then returns the solution in the pocket, rather than the last solution.

    The alpha-perceptron further utilised a preprocessing layer of fixed random weights, with thresholded output units. This enabled the perceptron to classify patterns, by projecting them into a binary space. In fact, for a projection space of sufficiently high dimension, patterns can become linearly separable.

    As an example, consider the case of having to classify data into two classes. Here is a small such dataset, consisting of two points coming from two Gaussian distributions.




    A linear classifier can only separate things with a hyperplane, so it's not possible to perfectly classify all the examples. On the other hand, we may project the data into a large number of dimensions. In this case a random matrix was used to project the data linearly to a 1000-dimensional space; then each resulting data point was transformed through the hyperbolic tangent function. A linear classifier can then separate the data, as shown in the third figure. However the data may still not be completely separable in this space, in which the perceptron algorithm would not converge. In the example shown, stochastic steepest gradient descent was used to adapt the parameters.

    It should be kept in mind, however, that the best classifier is not necessarily that which classifies all the training data perfectly. Indeed, if we had the prior constraint that the data come from equi-variant Gaussian distributions, the linear separation in the input space is optimal.

    Other training algorithms for linear classifiers are possible: see, e.g., support vector machine and logistic regression.


    top

    History
    Although the perceptron initially seemed promising, it was quickly proved that perceptrons could not be trained to recognise many classes of patterns. This led to the field of neural network research stagnating for many years, before it was recognised that a feedforward neural network with three or more layers (also called a multilayer perceptron) had far greater processing power than perceptrons with one layer (also called a single layer perceptron) or two.
    Single layer perceptrons are only capable of learning linearly separable patterns; in 1969 a famous monograph entitled Perceptrons by Marvin Minsky and Seymour Papert showed that it was impossible for these classes of network to learn an XOR function. They conjectured (incorrectly) that a similar result would hold for a perceptron with three or more layers. 3 years later Stephen Grossberg published a series of papers introducing networks capable of modelling differential, contrast-enhancing and XOR functions. (The papers were published in 1972 and 1973, see e.g.: Grossberg, Contour enhancement, short-term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics, 52 (1973), 213-257, online *). Nevertheless the often-cited Minsky/Papert text caused a significant decline in interest and funding of neural network research. It took ten more years until the neural network research experienced a resurgence in the 1980s. This text was reprinted in 1987 as "Perceptrons - Expanded Edition" where some errors in the original text are shown and corrected.

    More recently, interest in the perceptron learning algorithm has increased again after Freund and Schapire (1998) presented a voted formulation of the original algorithm (attaining large margin) and suggested that one can apply the kernel trick to it. The kernel-perceptron not only can handle nonlinearly separable data but can also go beyond vectors and classify instances having a relational representation (e.g. trees, graphs or sequences).

    top

    See also
     
    Search more:
     

       
    Source Privacy License Download Contact Us Atlas
    Scientus.org Dictionary (Yet Another Wiki) RC : 1.39
    This article is licensed under the GNU Free Documentation License [copyleft]. It uses material from the Wikipedia article "Perceptron". link