Perceptron: Artificial Neural Networks Explained

A perceptron is a signal transmission network consisting of sensory units (S units), association units
(A units), and output or response units (R units). The
receptor of the perceptron is analogous to the retina of
the eye and is made of an array of sensory elements
(photocells). Each S unit produces a binary output, depending on
whether or not it is excited. A randomly
selected set of retinal cells is connected to the next level
of the network, the A units. Each A unit behaves like
the basic building block described below,
where the +1 and -1 weights for the inputs to each A unit are randomly
assigned. The threshold for all A units is the same.
In 1957 the psychologist Frank Rosenblatt proposed
“The Perceptron: a perceiving and recognizing automaton” as a class of artificial nerve nets, embodying
aspects of the brain and receptors of biological
systems. Fig. 1 shows the network of the Mark 1
Perceptron. Later, Rosenblatt protested that the term
perceptron, originally intended as a generic name for a
variety of theoretical nerve nets, was actually associated with a very specific piece of hardware (Rosenblatt, 1962).
The basic building block
of a perceptron is an element that accepts a number of inputs
$x_i$, $i = 1, \ldots, N$, and computes a weighted sum of
these inputs where, for each input, its fixed weight $w_i$
can be only +1 or -1. The sum is then compared with a
threshold $\theta$, and an output $y$ is produced that is either 0
or 1, depending on whether or not the sum exceeds the
threshold. In other words,
$$y = \begin{cases} 1, & \text{if } \sum_{i=1}^{N} w_i x_i > \theta \\ 0, & \text{otherwise} \end{cases}$$
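The threshold element just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `threshold_unit` and the sample inputs are hypothetical and do not come from the source; the comparison is strict because the text says the sum must exceed the threshold.

```python
from typing import Sequence


def threshold_unit(x: Sequence[int], w: Sequence[int], theta: float) -> int:
    """Return 1 if the weighted sum of the inputs exceeds theta, else 0.

    Hypothetical helper: weights are restricted to +1 or -1, as in the
    basic building block described in the text.
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    if any(wi not in (1, -1) for wi in w):
        raise ValueError("each fixed weight must be +1 or -1")
    total = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if total > theta else 0


# Illustrative values only (not from the source).
print(threshold_unit([1, 0, 1], [1, -1, 1], theta=1))  # weighted sum is 2
print(threshold_unit([1, 1, 0], [1, -1, 1], theta=1))  # weighted sum is 0
```

With these sample values the first call yields 1 and the second yields 0, since only the first weighted sum exceeds the threshold.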
The binary output $y_k$ of the $k$th A unit ($k = 1, \ldots, m$) is
multiplied by a weight $a_k$, and a sum of all $m$ weighted
outputs is formed in a summation unit that is the same
as the basic building block
with all weights equal to 1.
Each weight $a_k$ is allowed to be positive, zero, or
negative, and may change independently of other
weights. The output of the perceptron is again binary,
depending on a threshold that is normally set at 0. The
binary values of the output are used to distinguish two
classes of patterns that may be presented to the retina of
a perceptron. The design of a perceptron to distinguish
between two given sets of patterns involves adjusting
the weights $a_k$, $k = 1, \ldots, m$, and the threshold $\theta$.
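The signal path from retina to response unit can be sketched as follows. This is a minimal illustration, not the Mark 1 design: the function names, the retina size, the number of A units, and the number of retinal cells wired to each A unit (`fan_in`) are all assumptions chosen for the example.

```python
import random
from typing import List, Sequence, Tuple

AUnit = Tuple[List[int], List[int]]  # (retinal cell indices, +1/-1 weights)


def make_a_units(n_s: int, m: int, fan_in: int, rng: random.Random) -> List[AUnit]:
    """Wire each of m A units to fan_in randomly chosen retinal cells."""
    units = []
    for _ in range(m):
        cells = rng.sample(range(n_s), fan_in)
        weights = [rng.choice((1, -1)) for _ in cells]
        units.append((cells, weights))
    return units


def a_outputs(s: Sequence[int], units: Sequence[AUnit], theta_a: float) -> List[int]:
    """Binary outputs y_k of the A units; all share the threshold theta_a."""
    return [
        1 if sum(w * s[c] for c, w in zip(cells, weights)) > theta_a else 0
        for cells, weights in units
    ]


def response(y: Sequence[int], a: Sequence[float], theta: float = 0.0) -> int:
    """R unit: 1 if the sum of a_k * y_k exceeds theta, else 0."""
    return 1 if sum(ak * yk for ak, yk in zip(a, y)) > theta else 0


rng = random.Random(0)                      # fixed seed for repeatability
units = make_a_units(n_s=16, m=8, fan_in=4, rng=rng)
retina = [rng.randint(0, 1) for _ in range(16)]
y = a_outputs(retina, units, theta_a=0)
print(y, response(y, a=[0.0] * 8))
```

Only the weights `a` and the threshold `theta` of the R unit would be adjusted during design; the random wiring of the A units stays fixed.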
Rosenblatt (1962) proposed a number of variations of
the following procedure for “training” perceptrons.
The set of given patterns of known classification is
presented sequentially to the retina, with the complete
set being repeated as often as needed.
Figure 1. Mark 1 Perceptron structure.
The output of the perceptron is monitored to determine whether a pattern is correctly classified. If not,
the weights are adjusted according to the following
“error correction” procedure: if the $n$th pattern was
misclassified, the new value $a_k(n+1)$ for the $k$th
weight is calculated as
$$a_k(n+1) = a_k(n) + y_k(n)\,\delta(n)$$
where $\delta(n)$ is 1 if the $n$th pattern is from class 1 and
$\delta(n)$ is -1 if the $n$th pattern is from class 2. No adjustment to the weight is made if a pattern is correctly
classified.
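The error correction procedure can be sketched as a short training loop. The sketch assumes that a misclassified pattern changes each weight by the A-unit output times +1 (class 1) or -1 (class 2), that the R-unit threshold stays fixed at 0, and that output 1 means class 1. The function names and the tiny data set are hypothetical; the first A unit is assumed to be always on so that a fixed threshold of 0 is not a restriction.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[int], int]  # (A-unit outputs y, class label 1 or 2)


def classify(a: Sequence[float], y: Sequence[int], theta: float = 0.0) -> int:
    """Return class 1 if the weighted sum exceeds theta, else class 2."""
    return 1 if sum(ak * yk for ak, yk in zip(a, y)) > theta else 2


def train(samples: Sequence[Sample], m: int, max_epochs: int = 100):
    """Present the patterns repeatedly; adjust weights only on errors."""
    a: List[float] = [0.0] * m
    for _ in range(max_epochs):
        errors = 0
        for y, cls in samples:
            if classify(a, y) != cls:
                delta = 1 if cls == 1 else -1        # +1 for class 1, -1 for class 2
                a = [ak + yk * delta for ak, yk in zip(a, y)]
                errors += 1
        if errors == 0:                              # a full pass with no mistakes
            return a, True
    return a, False


# Hypothetical, linearly separable A-unit outputs (first unit always on).
samples = [([1, 1, 0], 1), ([1, 1, 1], 1), ([1, 0, 0], 2), ([1, 0, 1], 2)]
weights, converged = train(samples, m=3)
print(weights, converged)
```

On this data the loop stops after the second pass, because the first pass makes two corrections and the second pass makes none.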
If there exists a set of weights such that all patterns
can be correctly classified, the pattern classes are said
to be linearly separable. It was conjectured by Rosenblatt that, when the pattern classes are linearly separable, the error correction “learning” procedure will
converge to a set of weights that correctly classifies all
the patterns. Many proofs of this perceptron convergence theorem were subsequently derived, the shortest by A. J. Novikoff. Subsequent contributions related
the simple perceptron to statistical linear discriminant
functions and related the error-correction learning
algorithm to gradient-descent procedures and to
stochastic approximation methods that were originally
developed for finding the zeros and extremes of
unknown regression functions (see, e.g., Kanal, 1962).
The simple perceptron described is a series-coupled
perceptron with feed-forward connections only from S
units to A units and A units to the single R unit. The
weights $a_k$, the only adaptive elements in this network,
are evaluated directly in terms of the output error. This
is sometimes referred to as a single-layer perceptron.
There is no layer of “hidden” elements, i.e., elements
for which the adjustment is only indirectly related to
the output error. A perceptron with one or more layers
of hidden elements is termed a multilayer perceptron.
Rosenblatt investigated cross-coupled perceptrons in
which connections join units of the same type, and also
investigated multilayer perceptrons
which have feedback paths from units located near
the output. For series-coupled perceptrons with multiple R units, Rosenblatt proposed a “back-propagating
error correction” procedure that used error from the
R units to propagate correction back to the sensory
end. But neither he nor others were able to demonstrate a convergent procedure for training multilayer
perceptrons.
Minsky and Papert (1969) proved various theorems
about simple perceptrons, some of which indicated
their limited pattern-classification and function-approximating capabilities. For example, they proved that the
single-layer perceptron could not implement the Exclusive OR logical function and
several other such predicates. Later, many who wrote
on Artificial Neural Networks (ANN) would blame this
book by Minsky and Papert for greatly dampening
interest and leading to a demise of funding for research
on ANNs. The section on “Alternate Realities” in Kanal
(1992) details why the blame is misplaced. As noted
there, by 1962 many researchers had moved on from
perceptron-type learning machines to statistical and
syntactic procedures for pattern recognition. The demise of funding for perceptron-type networks should
be blamed on the inadequate technology and training
algorithms available for multilayer
perceptrons and
the premature, overblown results promised to the funding agencies.
Minsky and Papert’s results did not apply to multilayer
perceptrons. Research on ANNs, biologically motivated automata, and adaptive systems continued in the
1970s in Europe, Japan, the Soviet Union, and the USA,
but without the frenzied excitement of previous years.
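The Exclusive OR limitation, and the fact that it does not carry over to networks with a hidden layer, can be illustrated with a small search. The grid search below is only a sanity check over a finite set of weights and thresholds, not a proof. The names and the particular two-layer construction (an OR unit and an AND unit feeding one output unit) are assumptions made for the example.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}


def unit(x, w, theta):
    """Threshold unit: 1 if the weighted sum exceeds theta, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > theta else 0


def realizable_on_grid(table):
    """Try integer weights in [-4, 4] and thresholds in steps of 0.5."""
    weights = range(-4, 5)
    thetas = [t / 2 for t in range(-8, 9)]
    for w1, w2, theta in product(weights, weights, thetas):
        if all(unit(x, (w1, w2), theta) == table[x] for x in INPUTS):
            return True
    return False


def two_layer_xor(x):
    """Hidden OR and AND units, combined by a single output unit."""
    h_or = unit(x, (1, 1), 0.5)
    h_and = unit(x, (1, 1), 1.5)
    return unit((h_or, h_and), (1, -1), 0.5)


print(realizable_on_grid(AND))               # True
print(realizable_on_grid(XOR))               # False
print([two_layer_xor(x) for x in INPUTS])    # [0, 1, 1, 0]
```

A single unit on the grid reproduces AND but never Exclusive OR, while one layer of hidden units is enough to compute it.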
In a 1974 Harvard University dissertation, Paul Werbos
presented a general convergence procedure for adaptively adjusting the weights of a differentiable nonlinear
system so as to learn a functional relationship between
the inputs and outputs of the system. The procedure
calculates the derivatives of some function of the
outputs with respect to all inputs and weights or
parameters of the system, working backwards from
outputs to inputs. However, this work by Werbos went
essentially unnoticed until a few years after Rumelhart, Hinton, and Williams independently popularized
a special case of the general method to adjust adaptively
the weights of a multilayer, feedforward perceptron for
pattern classification applications when learning samples are available. This algorithm, which adapts the
weights using gradient descent, is known as error backpropagation or just backpropagation. It propagates
derivatives from the output layer through each intermediate layer of the multilayer perceptron network.
The resurgence of work on multilayer perceptrons and
their applications in the 1980s is directly attributable to
this convergent backpropagation algorithm.
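The backward propagation of derivatives can be sketched for a very small network. The sketch is illustrative only and is not the formulation of Werbos or of Rumelhart, Hinton, and Williams: it assumes one hidden layer of two sigmoid units, a single sigmoid output, a squared-error function, and arbitrary starting weights. All names and numeric values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Return hidden activations h and the network output o."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    o = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, o


def loss(x, t, W1, b1, W2, b2):
    """Squared error 0.5 * (o - t)^2 for one training sample."""
    _, o = forward(x, W1, b1, W2, b2)
    return 0.5 * (o - t) ** 2


def backprop(x, t, W1, b1, W2, b2):
    """Derivatives of the loss, computed from the output back to the input."""
    h, o = forward(x, W1, b1, W2, b2)
    d_o = (o - t) * o * (1.0 - o)                      # output-layer error term
    gW2 = [d_o * hj for hj in h]
    gb2 = d_o
    d_h = [d_o * W2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]  # propagated back
    gW1 = [[d_h[j] * xi for xi in x] for j in range(len(h))]
    gb1 = d_h
    return gW1, gb1, gW2, gb2


def gradient_step(params, grads, lr):
    """One gradient-descent update: parameter minus lr times derivative."""
    (W1, b1, W2, b2), (gW1, gb1, gW2, gb2) = params, grads
    W1 = [[w - lr * g for w, g in zip(r, gr)] for r, gr in zip(W1, gW1)]
    b1 = [b - lr * g for b, g in zip(b1, gb1)]
    W2 = [w - lr * g for w, g in zip(W2, gW2)]
    return W1, b1, W2, b2 - lr * gb2


x, t = [1.0, 0.0], 1.0                                  # one illustrative sample
params = ([[0.5, -0.3], [0.2, 0.8]], [0.1, -0.1], [0.7, -0.4], 0.05)
before = loss(x, t, *params)
params = gradient_step(params, backprop(x, t, *params), lr=0.5)
print(before, loss(x, t, *params))                      # error before and after
```

The error term of each hidden unit is obtained from the error term of the output unit, which is the sense in which derivatives are propagated through the intermediate layer.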
It has been shown that multilayer feedforward networks with a sufficient number of intermediate or
“hidden” units between the input and output units
have a “universal approximation” property: they can
approximate nearly any function to any desired degree
of accuracy. It has also been shown by White that
backpropagation is essentially a special case of stochastic approximation, and once again neural network
learning procedures are being shown to be intimately
related to known statistical techniques (Bishop, 1995).
More on recent developments in backpropagation
algorithms and multilayer perceptrons may be found
in Werbos (1994), Chauvin and Rumelhart (1995), and
Mehrotra et al. (1997).
Laveen N. Kanal
The main purposes of the measurement and evaluation of computer systems are to:
1. Aid in the design of hardware and software.
2. Aid in the selection of a computer system.
3. Improve the performance of an existing system.
The first of these must use some type of model of
the system being designed. The latter two may use
actual measurements or models or some combination
of the two.
Figure 1. A computer system and its subsystems.
Measurement and evaluation of computer system
performance is difficult due to the complexity of the
internal structure of computer systems and because of
the difficulty of describing and predicting the workload.
As shown in Fig. 1, a computer system is composed of
subsystems, each of which can be viewed as a system
with its own workload and performance. Total system
performance is related to the performance of the
subsystems, although the relationship can be complex.
Computer system and subsystem performance measures fall into three categories: responsiveness,
throughput, and cost. The response time for interactive commands or the turnaround time for batch jobs
are typical measures of responsiveness. Throughput is
a measure of the computational work accomplished by
the system per unit time. There is, however, no generally acceptable definition of a unit of computational
work. Measures such as jobs per unit time or transactions per unit time become meaningful only when
the resource requirements of these tasks are described;
this is one aspect of the workload characterization
problem. The cost of a computer system is the monetary amount required to buy or lease the system.
Response and throughput characteristics have to be
evaluated in terms of the cost of the system.
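Two of these measures can be computed from a simple job log, as the sketch below illustrates. The log format (arrival time, completion time), the function names, and the sample values are assumptions made for the example. Throughput is counted here in jobs per unit time, which, as noted above, is meaningful only once the resource requirements of the jobs are described.

```python
from typing import List, Tuple

Job = Tuple[float, float]  # (arrival time, completion time), hypothetical format


def mean_response_time(jobs: List[Job]) -> float:
    """Average time between a job's arrival and its completion."""
    return sum(done - arrived for arrived, done in jobs) / len(jobs)


def throughput(jobs: List[Job], t_start: float, t_end: float) -> float:
    """Jobs completed per unit time within the window [t_start, t_end)."""
    completed = sum(1 for _, done in jobs if t_start <= done < t_end)
    return completed / (t_end - t_start)


# Illustrative log (not measured data).
log = [(0.0, 2.0), (1.0, 4.0), (3.0, 5.0), (6.0, 9.0)]
print(mean_response_time(log))       # 2.5
print(throughput(log, 0.0, 10.0))    # 0.4
```

Cost is omitted from the sketch because it is a property of the system rather than of the job log.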
It is necessary to characterize the load on a system in
order to make meaningful statements about its performance. One aspect of this problem is determining
which characteristics of the load largely determine the
performance measures of interest. Another is determining the values of the workload model parameters
for a particular performance study, which is particularly
difficult if the system is not yet operational. But even
with an operational system, the workload may vary
with time, and the workload characteristics measured
will depend on the measurement period chosen.