Fractals, Vol. 19, No. 1 (2011) 87–99
© World Scientific Publishing Company
DOI: 10.1142/S0218348X11005191
THE NEWTON–RAPHSON METHOD AND
ADAPTIVE ODE SOLVERS
HANS RUDOLF SCHNEEBELI
Kantonsschule
CH-5400 Baden, Switzerland
[email protected]
THOMAS P. WIHLER∗
Mathematics Institute, University of Bern
Sidlerstrasse 5, CH-3012 Bern, Switzerland
[email protected]
Received November 10, 2009
Revised October 21, 2010
Accepted November 1, 2010
Abstract
The Newton–Raphson method for solving nonlinear equations f(x) = 0 in Rⁿ is discussed
within the context of ordinary differential equations. This framework makes it possible to reformulate the scheme by means of an adaptive step size control procedure that aims at reducing
the chaotic behavior of the original method without losing the quadratic convergence close to
the roots. The performance of the modified scheme is illustrated with a few low-dimensional
examples.
Keywords: Newton–Raphson Method; Euler Method; Ordinary Differential Equations; Dynamical Systems; Chaotic Behavior; Step Size Control; Adaptive Time-Stepping Methods.
∗Corresponding author.
1. ROOT FINDING WITH THE NEWTON–RAPHSON METHOD — HOW EFFICIENT IS IT?
The numerical solution of nonlinear equations
    f(x) = 0,   f : Rⁿ → Rⁿ,   (1)
by the Newton–Raphson method (Newton method,
for short) is often accompanied by two somewhat incompatible effects: quadratic convergence
and chaotic behavior. The first of the two predicates concerns the local behavior of the algorithm. The second one touches upon some global
aspects. Indeed, typically (i.e. under certain conditions that often hold) the Newton method generates a sequence of approximations that converges
quadratically to a solution x of (1) for any initial values “sufficiently close” to x. Hence, from a
local point of view, the Newton method is quite
efficient. Evidently, however, the amount of work
needed to find suitable starting values for a corresponding iteration is a separate issue. In this context, Cayley1 posed a global question: What are
the attractors of the roots of a complex polynomial
when the Newton method is applied? Note that in
that case, the Newton method corresponds to an
iterated rational map. Here, the work of Julia2 (see
also Fatou3 ) showed the high complexity of fractal
boundaries (Julia sets) separating different attractors. In the 1980s, chaos and fractals became popular catchwords. Stunning pictures attracted the
interest of the public at large; see, e.g. Ref. 4.
Such pictures attempted to answer Cayley’s question empirically, sometimes under questionable conditions, using numerical methods based on floating
point operations and, of course, computer screens
with still-modest resolution. Did the graphics reveal
the mathematical truth? Smale’s paper5 paved the
way for deeper insight and new results.
One of many possible ways to assess the chaotic
behavior of the Newton method for the numerical
solution of (1) is to consider the associated Julia
set J_f. Several definitions of fractal dimension are
at our disposal in order to describe the complexity and extent of the boundaries separating various
attractors (cf., e.g. Ref. 6). In the present context
the box-counting dimension is a plausible choice.
Take, for example, the Julia set J_f corresponding
to f(z) = z³ − 1 in the complex plane C; cf. Fig. 1.
It is well known that J_f has a fractal dimension d
with 1 < d < 2. This agrees with our visual impression. Indeed, J_f occupies a significant part of the
plane and is not nearly as simple as a line segment.
Fig. 1 Julia set J_f ⊂ C and attractors for f(z) = z³ − 1 =
0; three different colors distinguish the three Wada basins
associated with the three solutions (each of which is marked
by a small circle).
In each open neighborhood of any point of J_f there
are points belonging to all the attractors (also called
Wada basins) of the three roots of f .
The Julia set is invariant under the action of
the Newton iteration, i.e. the scheme permutes the
points of J_f. Hence, initial values in the Julia set are
bound to stay there forever, at least in exact arithmetic. We are thus led to the following question:
How efficient is the Newton method if the attractors
are unknown and the search for initial values, for
which the iterates converge quadratically to some
root, is considered part of the problem?
For the current example, f(z) = z³ − 1, the Newton method iterates a rational function in C with
a pole of order 2 at the origin 0. Close to the pole,
unlimited accelerations occur and the dynamics of
the Newton method become uncontrolled. It seems
obvious that a sequence of points is unlikely to be
quickly convergent and impetuous at the same time,
and it is reasonable to assume that this observation is related to chaos, as seen, for example, in
Fig. 1. To sum up, moving starting points to a sufficiently small neighborhood of a zero of f might not
always be a well-conditioned procedure. For example, iterates may approach the area of quadratic
convergence at a low rate, or they may visit various attractors until they actually approach some
root. Hence, from a global point of view, the chaotic
behavior of the Newton method seems to reduce, to
a certain extent, the high efficiency of the scheme.
In this paper, following, e.g. Refs. 5, 7 and 8,
we shall identify the Newton method as the numerical discretization of a specific ordinary differential equation (ODE) by the explicit Euler method,
with a fixed step size h = 1. This ODE constitutes a continuous version of the Newton method.
Its associated dynamics — and those of its discretization — can be represented by a vector field.
This allows us to compare the continuous behavior with the numerical approximations. It seems
interesting to explore other numerical ODE solvers,
such as Runge–Kutta methods, which are more
sophisticated than Euler’s method (see, e.g. Ref. 9).
Surprisingly, however, numerical experiments show
that the convergence rates of the resulting root finding schemes are usually not competitive with that
of Euler’s method; incidentally, they also show some
considerable chaos. Hence, in this paper, we shall
suggest an alternative approach. Referring to ideas
from Ref. 10 (see also Refs. 11 and 12, for example), we apply an automatic step size control procedure to Euler’s method and thereby obtain a simple
adaptive Newton iteration scheme (see also Ref. 13
and the references therein for an extensive review
of the Newton method and adaptive algorithms; in
addition, we mention Refs. 6, 14 and 15 for different variations of the classical Newton scheme).
The goal of the modified method is to deal with
singularities in the iterations more carefully, and,
thus, to tame the chaotic behavior of the standard
Newton algorithm. In order to obtain good convergence properties, a step size of h = 1 is used
whenever possible. We will consider two examples
for which the Newton method produces fractal Julia
sets and will show that the refined algorithm operates in attractors with almost smooth boundaries.
Moreover, numerical tests demonstrate an improved
convergence rate not matched on average by the
classical Newton method.
We note that, compared to more sophisticated
numerical ODE solvers (like the ones used in, e.g.
Ref. 9), the step size control applied in this work
does not primarily emphasize following as closely
as possible the dynamics of the continuous flow
described by the ODE. In fact, it aims more at
providing a sort of funnel that retains a sequence
of approximations inside the same attractor and
leads it toward a neighborhood of the corresponding
zero in which quadratic convergence for the Newton
method is guaranteed. In addition, we remark that
89
the ODE of the continuous version of the Newton
method describes a flow that slows down exponentially when approaching a root. Hence, in order to
find a sufficiently accurate solution of (1) in a fast
way, it is beneficial to utilize a numerical method
that quickly leaps close to a zero irrespective of the
dynamics of the continuous flow. Euler’s method
with step size h = 1 is a good candidate: although
it might not simulate the continuous dynamics as
accurately as other more sophisticated numerical
ODE solvers, the quadratic convergence close to a
root results in an efficient advancement of the discrete flow.
Furthermore, let us remark that there is a
large application and research area where methods
related to the continuous version of the Newton
method are considered in the context of nonlinear
optimization. Some of these schemes count among
the most efficient ones available for special purposes. The efficiency results from avoiding the inversion of Jacobian matrices or from circumventing the
solution of large linear systems. Such issues, however, are outside the scope of this article. Indeed,
its main purpose is to show that new light can be
shed on a traditional root finder. The perspective
adopted here is to relate the discrete dynamical system given by the Newton method to a continuous
one defined by an ODE — in geometrical terms a
vector field. It is remarkable to realize that the Newton scheme transforms a vector field in such a way
that the field is normalized close to any regular fixed
point. This transform, however, may create new singularities in the transformed field. Looking at the
vector field in a global way will be crucial in finding
improved solvers retaining the celebrated efficiency
of the Newton method close to a regular solution.
The endeavor will be to tame the tendency of Newton iterations to switch between several attractors
of solutions in the presence of singularities caused
by the Newton method itself. This geometric insight
opens a way to derive solvers for nonlinear equations less prone to chaotic behavior than the original Newton method.
The paper is organized as follows. In Sec. 2, we
comment on the connection between the discrete
and continuous versions of the Newton method and
define a transformation of vector fields aptly called
the Newton–Raphson transform (NRT). Furthermore, we show that the Newton method can be
obtained as the numerical discretization of a specific ODE by the explicit Euler method. In Sec. 3,
the possible difference between local and global
root finding is addressed. Moreover, Sec. 4 presents
an adaptive step size algorithm for the Newton
method. Finally, Sec. 5 contains a summary and
some open questions.
2. THE NEWTON METHOD AND ODEs
We consider a function f : Rⁿ → Rⁿ whose zeros are
to be determined. In what follows, f is assumed to
be sufficiently smooth (twice continuously differentiable usually suffices). Let Df denote its Jacobian.
Now, consider the following iterative procedure to
find a zero x of f : Start with a value x0 that is
assumed to be “close enough” to x, then solve the
linearized version,
    f(x0) + Df(x0) · ∆x0 = 0,
of the equation f (x0 + ∆x0 ) = 0 for the increment ∆x0 ; a possibly improved approximation is
now given by x1 = x0 + ∆x0 . Repeat this procedure as long as necessary. This is the classical Newton method. Numerical experiments tend to show
quadratic convergence rates of the scheme, until the
preset tolerances are met or limitations of the computer’s floating point arithmetic cause the iterates
to end up in a finite cycle. In order to guarantee
existence and uniqueness of the increments in this
algorithm, we implicitly assume that the Jacobians
are regular in each step. Here we note that, although
this somewhat idealistic assumption may often be
satisfied, issues such as, e.g. the condition number
of the Jacobian need to be addressed carefully in
order to obtain robust numerical answers.
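For concreteness, the procedure just described can be sketched in a few lines of Python (an illustration added here, not the authors' code; the tolerance and iteration cap are arbitrary choices):

    import numpy as np

    def newton(f, Df, x0, tol=1e-8, max_iter=50):
        """Classical Newton iteration: solve Df(xn)*dx = -f(xn) and set x_{n+1} = xn + dx."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            dx = np.linalg.solve(Df(x), -f(x))  # linearized equation for the increment
            x = x + dx
            if np.linalg.norm(dx) < tol:        # stop once the increment is negligible
                break
        return x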
The above picture is perfectly adequate for the
local study of the algorithm. In order to deal with
global questions as well, we propose a different point
of view. This will be discussed in the following
sections.
2.1. Vector Fields — From Equations to ODEs
Any nonlinear map f : Rⁿ → Rⁿ gives rise to a vector field on the domain of f in the following way. At
point x attach the vector f (x) (to the tangent space
at x). This vector field defines a flow on the domain
of f . The fixed points of this flow are precisely the
zeros of f that we are looking for. The behavior of
the flow near a fixed point is described up to nonlinear terms by the Jacobian Df at the fixed point.
In the case that all real parts of the eigenvalues
of Df are negative, the fixed point is attractive. In
general, this favorable condition is, of course, not
fulfilled. Interestingly, however, it may typically be
retrieved by applying a suitable transformation to
the vector field; this will be discussed in the next
section. If the fixed point is attractive, we might use
the flow to push some initial value x0 closer and
closer to the fixed point. That is, we use the differential equation ẋ = f (x) and the initial condition
x(0) = x0 to solve f (x) = 0 approximately. This
motivates a numerical procedure whereby we avoid
formal solutions of the initial value problem and rely
on some suitable numerical ODE solvers. Plausibly,
this might be done by opening up a well-defined
toolkit (see, e.g. Ref. 12), traditionally applied to
ODEs, that can now be utilized to solve nonlinear equations as in (1). As usual, choosing an efficient tool will require some additional experience,
insights, and ideas.
2.2. The Newton–Raphson Transform
For simplicity, we assume that 0 is a regular value
of f : Rⁿ → Rⁿ. This means that the Jacobian Df
is nonsingular close to every zero of f . As mentioned before, we will now transform the vector field
defined by f in such a way that the new vector field
Nf only has fixed points of the most favorable kind.
This happens if, for all zeros x of f , the Jacobian
satisfies Df = −Id, with Id denoting the identity
map. The following transformation, which we call
the Newton–Raphson transform (NRT), does the
trick:
    N : f ↦ Nf = −(Df)⁻¹ · f.

Supposing that f is sufficiently smooth on an open domain D_f ⊆ Rⁿ, let us propose the following facts:
(i) Nf is a vector field whose domain D_Nf is the subset of D_f on which the Jacobian Df is invertible.
(ii) If f is linear, then Nf = −Id.
(iii) Suppose that x is an isolated zero of f; then the Taylor expansion of Nf at x is given by

    Nf(x + ∆x) = Nf(x) + DNf(x) · ∆x + O(‖∆x‖²),

where ‖ · ‖ signifies the Euclidean norm in Rⁿ. Using the product rule and the fact that f(x) = 0 results in

    D(−(Df)⁻¹ · f)(x) · ∆x = −∆x.

This shows that the NRT always standardizes the vector field in the neighborhood of a zero and thereby mimics the vector field of −Id at 0 (with respect to ∆x), up to nonlinear effects:

    Nf(x + ∆x) = −∆x + nonlinear terms in ∆x.   (2)

Incidentally, this result can be used to explain the high convergence rate of the Newton method. In addition, we see that

    ((x + ∆x) − x) · Nf(x + ∆x) ≈ ∆x · (−∆x) = −‖∆x‖² < 0.

Hence, the NRT at x + ∆x is a descent direction in the direction of the zero x.
(iv) The set

    S_f = {x ∈ D_f : Df(x) is singular}   (3)

is closed in the domain D_f. Thus, the domain D_Nf = D_f \ S_f is open. Therefore, since all zeros of f are regular by assumption, each one of them is located in an attracting neighborhood contained in D_Nf.
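Computationally, the NRT amounts to one linear solve per evaluation; a minimal sketch (an illustrative helper added here and reused in later snippets, not part of the original paper):

    import numpy as np

    def nrt(f, Df):
        """Return the Newton-Raphson transform Nf(x) = -Df(x)^{-1} f(x) as a callable vector field."""
        def Nf(x):
            # np.linalg.solve raises LinAlgError on the singular set S_f, where Df(x) is not invertible
            return np.linalg.solve(Df(x), -np.asarray(f(x), dtype=float))
        return Nf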
2.3. Euler Meets Newton

Let us focus on the ODE related to the new (transformed) vector field Nf. Note that all of its fixed points are surrounded by a local attractor with convenient properties. We consider the initial value problem

    ẋ(t) = Nf(x(t)),   x(0) = x0,   (4)

with an initial value x0 that is assumed to be sufficiently close to a root x of f. Recalling (2) with ∆x(t) = x(t) − x, and neglecting terms of higher order, this ODE may be written as

    ẋ(t) = Nf(x(t)) = Nf(x + ∆x(t)) ≈ −∆x(t) = x − x(t).   (5)

Letting ∆x0 = x0 − x, we find an explicit solution, ∆x(t) = exp(−t)∆x0. Hence, the linearized dynamics of the vector field Nf carries an initial point x0 exponentially fast toward a zero x of f. We remark, however, that it takes infinitely long to reach the root x, since the flow is slowing down in proportion to ‖x(t) − x‖, i.e. the distance between the actual position and the fixed point (if ∆x0 ≠ 0).

We will now take a look at the numerical solution of (4) by the explicit Euler method. To this end, fix a step size h > 0 and consider approximations x1, x2, x3, . . . of the exact solution x(t) at the points t = h, 2h, 3h, . . . . Then the derivative ẋ at any discretization point tn = nh, n ∈ N, can be approximated as follows:

    ẋ(tn) ≈ (x(tn + h) − x(tn))/h ≈ (xn+1 − xn)/h.

Hence, replacing the exact derivative in (4) with its approximation leads to

    (xn+1 − xn)/h ≈ Nf(xn),

and we are able to define the iterative scheme

    xn+1 = xn + h Nf(xn),   n = 0, 1, 2, . . . ,   (6)

with a given initial value x0. Using the step size h = 1, this is exactly the classical Newton method for the solution of the nonlinear equation (1).
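This identification is easy to check numerically; the following sketch (using the hypothetical nrt helper from Sec. 2.2 and a scalar test function of our choosing) confirms that one explicit Euler step with h = 1 coincides with a classical Newton update:

    import numpy as np

    f  = lambda x: np.array([x[0]**2 - 2.0])       # test function with root sqrt(2)
    Df = lambda x: np.array([[2.0 * x[0]]])
    Nf = nrt(f, Df)                                # NRT vector field from the sketch above

    x = np.array([1.0])
    x_euler  = x + 1.0 * Nf(x)                     # explicit Euler step with h = 1, cf. (6)
    x_newton = x - np.linalg.solve(Df(x), f(x))    # classical Newton update
    assert np.allclose(x_euler, x_newton)          # identical by construction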
3. LOCAL AND GLOBAL PERSPECTIVES
Consider the first update x1 = x0 + h Nf (x0 ) in the
iteration (6). Referring to (5), we have the (approximate) equality x1 − x ≈ x0 − x + h · (x − x0 ). Now
choose h = 1 (i.e. the traditional Newton method)
in order to move x0 to x in one step! This is an interpretation of the Newton method in the framework
of dynamical systems and numerical ODE solvers.
We see that a single Euler step of size h = 1 would suffice to reach the fixed point at x up to errors resulting from nonlinear terms in x0 − x.
Our analysis indicates that the Newton method
for solving f (x) = 0 will typically perform very well
close to a zero of f . But what happens if the iteration is started with an initial guess that is relatively
far away from a zero? Let us look at a simple example. Suppose we use the Newton method for solving
exp(x)−p = 0, with some given p > 1. The example
is chosen such that the NRT does not introduce new
singularities in the vector field. Indeed, the formula
for the iteration step,

    x_new = x − (exp(x) − p)/exp(x) = x − 1 + p exp(−x),

formally works for any x. The positive real axis is contained in the attractor of the unique solution x = ln p. We see, however, that for (positive) initial values far from the solution, a significant number of steps just moves about one unit to the left, since the term p exp(−x) is small for x ≫ 0.
Hence, the convergence is, at least initially, relatively slow. Of course, in this special case, good
initial values may be deduced by looking at the
floating point representation of x. For a general
purpose root finder such tricks are, however, not
available.
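The slow initial phase is easy to reproduce. In the following illustration (with an arbitrarily chosen p and starting value, added here for demonstration), the iterates decrease by roughly one unit per step until they enter the region of quadratic convergence near ln p:

    import math

    # Newton iteration for exp(x) - p = 0: x_new = x - 1 + p*exp(-x)
    p, x = 100.0, 50.0                  # root at ln(100) ~ 4.605; start far to the right
    for n in range(80):
        x_new = x - 1.0 + p * math.exp(-x)
        if abs(x_new - x) < 1e-12:      # converged
            break
        x = x_new
    print(n, x)  # roughly 45 steps of size ~1 precede the final quadratic phase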
The above example illustrates that step size control in the Newton algorithm might be quite useful, even in good-natured examples. Let us now turn
to more realistic examples for which the Jacobian
Df may become ill-conditioned or even singular.
To this end, let us consider again the singular
set S_f from (3). This is a potential source of trouble induced by the NRT. More precisely, close to
the points of Sf , at least one of the eigenvalues of the Jacobian matrix Df is 0 or nearly so,
and, hence, we expect very high accelerations in
the discrete dynamics of the Newton iteration in
the direction of the corresponding eigenvector. It
is therefore quite reasonable to suspect that such
instabilities are a potential cause of the sensitive
dependence of the Newton method on the initial
values (i.e. the chaotic behavior away from the
attractors).
It is well known that relaxation (also called line
search parametrization) in the Newton method (i.e.
the use of a suitable step size h < 1 in (6)) may
enhance the stability of the numerical approximations; however, it is clear that the quadratic convergence behavior cannot be expected. On the other
hand, as mentioned above, working with h = 1 globally may lead the Newton method to be chaotic,
i.e. numerically unstable (particularly if the starting value is not sufficiently close to a zero).
Traditionally, the Newton method is bound to
a local perspective. If a good starting point for
an iteration is taken for granted, then quadratic
convergence can typically be expected. What happens, however, if we adopt a global perspective,
i.e. if, for example, we perform a possibly slow
search for an initial value preceding the phase of
quadratic convergence? Would a suitably modified scheme be able to reduce the effort required
to find suitable initial values? Would this result
in smoothing out the boundaries of attractors of
the various roots? Here, recognizing the connection between the Newton method and numerical
ODE solvers allows for improved flexibility in the
choice of tools and approaches for solving equations
within a global framework. For example, using variable step sizes h in the explicit Euler scheme for the
solution of (4) could provide an interesting balance
between stability and fast convergence. Here, the
idea is to resolve the behavior of the ODE (4)
close to the singular points in S_f more accurately
and thereby to obtain better numerical approximations. In order to decide how to choose the
local step sizes, we formulate the following (natural)
requirements:
(i) use step size h = 1 whenever possible to retain
the quadratic convergence, in particular close
to a solution;
(ii) if high accelerations occur close to points in S_f,
reduce the step size; and
(iii) minimize the number of function evaluations.
Since the behavior of the solution x(·) of the continuous root solver (4) might be quite difficult to
predict in practice, it would be desirable to adjust
the local step sizes at run time, without the need
for a lot of a priori knowledge. For this purpose, an
adaptive algorithm can be used. Such a procedure
is able to extract local information on the solutions
and to control the step sizes automatically. This will
be discussed in detail in the next section.
4. ADAPTIVITY FOR THE NEWTON METHOD
We shall now use the link between the Newton
method and the explicit Euler scheme applied to the
ODE (4), i.e. the continuous version of the Newton
method. More precisely, we find an automatic step
size control algorithm for the Newton method by
looking at adaptive step size procedures for numerical ODE solvers. In this article, we shall refer to
an idea presented in Ref. 10 (see also Ref. 12, for
example).
4.1. Step Size Control
In order to design an adaptive Newton algorithm,
we will use a so-called error indicator. Basically, this
is a computable local upper bound on the error that
is able to tell us whether the error in the current
iteration is large or small; if the error is found to
be too large, the corresponding step is recomputed
with a smaller step size.
Let us recall Euler’s method (6) for the discretization of the continuous root finder (4):
    xn+1 = xn + hn Nf(xn),   (7)

with starting value x0.
Here, in contrast to the standard Newton scheme
(for which we choose h = hn = 1 in every step),
we shall keep the step size hn flexible. For notational simplicity, we consider the one-dimensional
case f : R → R only; replace Nf by a function g,
and drop the subscript n in the step size h. Furthermore, we fix a certain time tn = nh, n ∈ N,
and assume that the exact solution x(tn ) of (4) is
equal to its corresponding numerical approximation
denoted by xn . In practice, of course, this is typically not the case; however, assuming that the previous iterations in Euler’s method are sufficiently
accurate, it is reasonable that x(tn ) and xn are
“very close” to each other.
In order to obtain a computable error bound at tn+1 = tn + h, we shall compute two different numerical approximations xn+1 and x̃n+1 at tn+1 in such a way that the difference |xn+1 − x̃n+1| takes the role of an error indicator.
Applying a Taylor expansion and recalling (4), we find

    x(tn + h) = x(tn) + h ẋ(tn) + (h²/2) ẍ(tn) + O(h³)
              = xn + h g(xn) + (h²/2) g(xn)g′(xn) + O(h³).   (8)

Hence, using (7), we obtain the following error bound:

    x(tn + h) − xn+1 = (h²/2) g(xn)g′(xn) + O(h³).   (9)

Next, we shall compute an alternative (improved) approximation x̃n+1 of x(tn+1) by performing two steps of the explicit Euler method of size h/2. Let us first consider the numerical solution x̃n+1/2 corresponding to tn + h/2:

    x̃n+1/2 = xn + (h/2) g(xn).

Then an additional step of size h/2 results in the following numerical approximation of the exact solution x(tn+1):

    x̃n+1 = x̃n+1/2 + (h/2) g(x̃n+1/2)
          = xn + (h/2) g(xn) + (h/2) g(xn + (h/2) g(xn)).   (10)

Let us have a look at the error between x(tn+1) and x̃n+1. A Taylor expansion of the last term in (10) leads to

    x̃n+1 = xn + (h/2) g(xn) + (h/2)[g(xn) + (h/2) g′(xn)g(xn) + O(h²)]
          = xn + h g(xn) + (h²/4) g′(xn)g(xn) + O(h³).

Thus, recalling (8), we get

    x(tn + h) − x̃n+1 = (h²/4) g′(xn)g(xn) + O(h³).   (11)

Subtracting (11) from (9) twice gives

    x(tn + h) − xn+1 − 2(x(tn + h) − x̃n+1) = O(h³),

and, therefore, we obtain the following a posteriori error estimate:

    x(tn + h) − xn+1 = 2(x̃n+1 − xn+1) + O(h³).

We see that the expression 2(x̃n+1 − xn+1) is a computable quantity and (neglecting the O(h³) terms) can be used as an error indicator in each individual step. Although the celebrated quadratic convergence rate of the Newton method can only be expected to occur close to the root, adaptive step size control serves to safely direct raw approximations into a region of fast convergence.

We mention that, recalling (11), we have x(tn + h) − x̃n+1 = O(h²). This implies that x(tn + h) − xn+1 = O(h²), and since the initial value at tn was considered exact, it follows that the local truncation error (not the overall error) in Euler's method applied to (4) converges quadratically in h; see, e.g. Ref. 10 for details. Note that this refers to the local approximation properties in Euler's method rather than the convergence behavior of the Newton method as an iterative root finding scheme.
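The estimate is readily verified numerically. For the scalar example f(x) = x² − 2 one has g(x) = −(x² − 2)/(2x), and the flow of (4) is known in closed form (u = x² − 2 satisfies u̇ = −u, so x(t) = sqrt(2 − e⁻ᵗ) for x(0) = 1); the following sketch (with illustrative values of h and x0, added here) compares the true one-step error with the computable quantity 2(x̃n+1 − xn+1):

    import math

    g = lambda x: -(x * x - 2.0) / (2.0 * x)  # NRT field for f(x) = x^2 - 2

    x0, h = 1.0, 0.1
    x_exact = math.sqrt(2.0 - math.exp(-h))   # exact solution of (4) at t = h
    x_full  = x0 + h * g(x0)                  # one Euler step of size h, cf. (7)
    x_half  = x0 + 0.5 * h * g(x0)            # two steps of size h/2, cf. (10)
    x_tilde = x_half + 0.5 * h * g(x_half)

    # true error vs a posteriori estimate; the two agree up to O(h^3)
    print(x_exact - x_full, 2.0 * (x_tilde - x_full))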
We generalize the above result to higher dimensions and suggest the following adaptive step size
algorithm for the Newton method:
Algorithm 1 (Adaptive Newton–Raphson algorithm).
(i) Fix a tolerance τ > 0, and start the iteration (n = 0) with h0 = 1 and an approximate
value x0 .
(ii) In each iteration step n = 0, 1, 2, . . . , compute xn+1 and x̃n+1 from xn with the explicit Euler scheme (7), respectively (10). Then, if

    ‖x̃n+1 − xn+1‖ < τ/2,   (12)

set hn+1 = min(2hn, 1) and go to the next step n ← n + 1; otherwise, set hn ← hn/2 and start the current step again.
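A compact realization of Algorithm 1 might look as follows (an illustrative sketch: the stopping rule on ‖f(x)‖ and the cap on the number of steps are our additions, not part of the algorithm as stated):

    import numpy as np

    def adaptive_newton(f, Df, x0, tau=0.05, tol=1e-8, max_steps=200):
        """Sketch of Algorithm 1: explicit Euler for the ODE (4) with step doubling/halving."""
        Nf = lambda x: np.linalg.solve(Df(x), -f(x))
        x = np.asarray(x0, dtype=float)
        h = 1.0                                            # step (i): h0 = 1
        for _ in range(max_steps):
            v = Nf(x)
            x_full  = x + h * v                            # Euler step of size h, cf. (7)
            x_half  = x + 0.5 * h * v                      # two steps of size h/2, cf. (10)
            x_tilde = x_half + 0.5 * h * Nf(x_half)
            if np.linalg.norm(x_tilde - x_full) < 0.5 * tau:   # acceptance test (12)
                x, h = x_full, min(2.0 * h, 1.0)           # accept; enlarge h, capped at 1
                if np.linalg.norm(f(x)) < tol:             # assumed stopping criterion
                    break
            else:
                h = 0.5 * h                                # reject; halve h and retry the step
        return x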
Remark 1. We point out that we choose the
doubling of the step size for small errors, i.e.
hn+1 = min(2hn , 1), in such a way that h ≤ 1.
We thereby aim to obtain iterations that are mostly
of step size 1 and take advantage of the quadratic
convergence behavior of the Newton method, in particular close to the root. Note that there exist alternative criteria for the reduction of the step size h;
see, e.g. the Armijo–Goldstein rule (e.g. Chapter 6
of Ref. 16, and also Refs. 6 and 14), which is also
based on bisection of the line search parameter h
(multiplied by a suitable factor of typically 10⁻⁴)
and relies on a similar criterion as (12).
Remark 2. In addition, we note that the above
algorithm is mainly designed for zeros of multiplicity 1. In the general case, where zeros of possibly higher multiplicity p ≥ 2 need to be found,
an appropriate extension of the automatic step size
control procedure would be desirable. This is due to
the fact that, for zeros of multiplicity p, a step size
of h = p (close to the solution) is usually necessary
to ensure quadratic convergence.
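The effect is already visible for f(x) = x², which has a double zero at 0 (a two-line illustration added here):

    # For f(x) = x^2, Nf(x) = -x/2: a step with h = 1 only halves the iterate,
    # whereas h = p = 2 reaches the double zero in a single step.
    x = 1.0
    print(x + 1.0 * (-x / 2.0))   # 0.5 -- merely linear progress with h = 1
    print(x + 2.0 * (-x / 2.0))   # 0.0 -- one step with h = p = 2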
Remark 3. Intuitively, the choice of the tolerance parameter τ in the above adaptive algorithm depends on the vector field Nf governing the dynamics of the root finding procedure. At this experimental stage, however, a scheme which is able to compute τ in dependence of Nf is not provided.
Indeed, in the following experiments, Sec. 4.2, suitable values for τ have been chosen on the basis of
a systematic trial and error procedure similar to
bisection.
4.2. Examples
The following two examples are chosen in such a
way that some phenomena occurring in the Newton
method may be visualized. In both cases, we produce two kinds of plots. First, we show the vector
field corresponding to ẋ = f (x) and compare it to
its NRT. The goal of the visual comparison is to perceive the dynamics related to both ODEs. Note that
for the purpose of better presentation, the arrows in
the plots of Figs. 2–3 and 5–6 have been scaled by
suitable factors. Second, the attractors of solutions
for the Newton method are generated approximately by numerical experiments with the traditional Newton scheme and with step size control.
Fig. 2 Example 1: The original vector field (scaled by a factor of 1.5) and its direction field.
Fig. 3 Example 1: The NRT of the field (scaled by a factor of 10) and its direction field.

Fig. 4 Example 1: Numerical simulations for finding attractors for the Newton method and for the Newton method with step size control (τ = 0.05).
Example 1: Two real equations with singular set of real co-dimension 1

Here and in what follows, x1 and x2 denote real variables. We consider the system of real equations

    f(x1, x2) = (−x1² + x2 + 3, −x1 x2 − x1 + 4) = 0

in R². Its only real solution x = (2, 1) is an
attractive fixed point in the vector field associated
with ẋ = f (x); see Fig. 2. Note that vectors close
to x clearly show a curl. In Fig. 3, however, the
vectors point directly to x and the curl is removed
by the NRT. The parabola displays the set Sf
of co-dimension 1 where the Jacobian is singular.
Points close to Sf may move fast and the direction
of the field is bound to sudden changes there.

Fig. 5 Example 2: The vector field corresponding to f(z) = z³ − 1 (scaled by a factor of 1.5) and its direction field.

Fig. 6 Example 2: NRT of the vector field for f(z) = z³ − 1 (scaled by a factor of 5) and its direction field.

Moreover, in Fig. 4 we present the attractors of x for
the traditional and the adaptive Newton methods
(with τ = 0.05). Both pictures are based on sampling initial values on the same grid of 1001 × 1001
starting values in the domain [−3, 7] × [−4, 6]. The
coloring of the right part of the frames marks the
attractor of x. While the Newton method with fixed
step size h = 1 shows traces of empirical chaos, step
size control is able to tame this unstable behavior
considerably.
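For illustration, the system of Example 1 can be passed directly to the adaptive_newton sketch from Sec. 4 (the starting value below is an arbitrary point of the sampled domain):

    import numpy as np

    f  = lambda x: np.array([-x[0]**2 + x[1] + 3.0,
                             -x[0] * x[1] - x[0] + 4.0])
    Df = lambda x: np.array([[-2.0 * x[0],  1.0],
                             [-x[1] - 1.0, -x[0]]])

    x = adaptive_newton(f, Df, x0=[-2.0, 3.0], tau=0.05)
    print(x)  # should approach the unique real solution (2, 1)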
Example 2: Two real equations with singular set of real co-dimension 2

We consider the complex equation z³ − 1 = 0 in C in its real form in R² (i.e. by separating the real and
imaginary parts into a system of two equations);
see Fig. 5 for the corresponding vector field. Then
the NRT exhibits a unique singular point at the
origin. The example shows symmetry by rotation
of 2π/3 about 0. More precisely, there are three sectors starting at 0, each containing a solution on its
central line. These sectors are the attractors of the
system ẋ = Nf (x). Figure 1 shows the three attractors corresponding to the Newton method for the
third roots of unity in C. The Julia set separates the
attractors; we refer the reader to Ref. 8, for example, for further details on attractors for the Newton
method and fractals. Figure 6 presents the three
“exact” sectors, i.e. the attractors of the zeros in the
continuous version of the Newton method. The discrete dynamics clearly may jump between the three
sectors. The pole at 0 produces high accelerations,
as can be seen from the considerable differences in
the size and direction of the arrows near 0. Thanks
to step size control, the influence of the high accelerations is tamed. In fact, if a tentative step crosses
the boundary of the attractor near 0, the tolerance
τ is unlikely to be met and the step size is reduced.
In Fig. 7, we display the behavior of the classical
and the adaptive Newton methods (with τ = 0.1)
for the initial point x0 = (−0.5, 0.1); we see that
while the classical solution shows large displacements and thereby leaves the original attractor,
the iterates corresponding to the adaptive Newton method follow the exact solution of (4) (which
we approximate by a numerical reference solution,
i.e. (6) with h ≪ 1) quite closely and approach
the “correct” zero. Generally, step size control is
considered successful if it is able to reproduce the
boundaries between the attractors of the vector field Nf rather than the Julia set derived from the discrete dynamics. Referring to Fig. 8, our expectation is well met, at least up to the resolution of the simulation and the graphics device used.

Fig. 7 Example 2: Performance of the classical Newton method and the Newton method with adaptive step size control (with τ = 0.1) for the starting point x0 = (−0.5, 0.1).

Fig. 8 Example 2: Attractors for z³ − 1 = 0 by the Newton method with step size control (τ = 0.1).

The example generalizes to polynomial functions in the following way. Let q : C → C be a monic polynomial of degree p > 0, q(z) = z^p + c z^{p−1} + · · ·. Then its NRT is of the form

    Nq : z ↦ −(z/p + c/p² + r(z)),

with a rational function r, whose denominator has a degree strictly greater than its numerator. We see that the vector field of Nq may be decomposed into a far field dominating for |z| ≫ 0, where r(z) ≈ 0, and a near field ruled by the poles of r. The far field senses the global properties of q. In particular, starting at z0 far away from 0, a single Euler step with h = deg q = p brings us to

    z0 − p (z0/p + c/p² + r(z0)) ≈ −c/p = (1/p) Σ_{k=1}^{p} ζk,

where {ζk}_{k=1}^{p} is the set of the roots of q; i.e. we arrive at a point “close” to the mean value of the roots of q. Incidentally, for |z| ≫ 0, the term z^p in q is dominant and, thus, q resembles a polynomial with a zero of multiplicity p “relatively close” to the origin. This argument, in turn, would suggest that a step size of h = p could be an appropriate choice;
cf. Remark 2. We therefore notice that, knowing the formal representation q(z) = Σ_{k=0}^{p} ck z^k rather than its associated vector field Nq, we could directly start the Newton iteration with an initial value close to −c/p. A possible problem, however, is that −c/p could coincide with a pole of Nq, as for qa(z) = z^p − a with p > 1. Close to the pole, step size control of the ODE solver matters. A corresponding
algorithm has to find a viable compromise between
staying in the attractor of a root ζk and minimizing
the number of “small” steps until quadratic convergence toward ζk sets in.
Fig. 9 Example 2: Convergence graphs for the Newton method with and without adaptive step size control for the starting point x0 = (−0.5, 0.1).

Some statistical data

One of many possibilities to address the performance of a numerical root finder quantitatively is to look at its rate of convergence. To this end, for a given starting value x0 in the attractor of a zero x, consider the iterates x1, x2, . . . resulting from the numerical scheme, and introduce the error in the nth iteration:

    en = ‖x − xn‖.

Then the rate of convergence ρ is defined by

    en = C (en−1)^ρ,   n = 1, 2, . . . ,

or, equivalently,

    ln en = C̃ + ρ ln en−1,   n = 1, 2, . . . ,   (13)

where C, C̃ are suitable constants. Here, we assume ideally that ρ tends to a limit as n → ∞, i.e. the graph of (13) in a (ln en−1, ln en)-coordinate system is asymptotically a straight line. Note, however, that then, under sufficient regularity assumptions, the overwhelming majority of values {xn}n>n0, for some n0 ≥ 1, resulting from the Newton method will belong to a region of quadratic convergence close to x. In particular, this approach does not take into account the behavior of the iterates {xn}, n = 1, . . . , n0, i.e. before the zone of fast convergence, and is thus quite insensitive to the problems discussed here. In numerical experiments, the situation is less ideal. Clearly, from the practical point of view, we aim at computing a relatively small number of iterates, and the infinite tail close to the exact root remains untouched. In this case, the convergence rate ρ may only be measured empirically, and it is supposed that the values of ρ stabilize near a constant in a finite range of n ≤ n0. Henceforth, for the sake of clarity, we shall denote an empirically determined convergence rate by ρ̂. For the experiments in this paper, we determine ρ̂ by applying a least squares approximation to (13) (averaged over all computed iterations n ≥ 1) for the unknown parameters ρ and C̃.
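In code, determining ρ̂ amounts to a least squares fit of the slope in (13); a short sketch (with synthetic error data for illustration):

    import numpy as np

    def empirical_rate(errors):
        """Estimate the empirical rate by least squares on (13): ln en = C + rho*ln en-1."""
        e = np.asarray(errors, dtype=float)
        rho, _ = np.polyfit(np.log(e[:-1]), np.log(e[1:]), 1)  # slope = empirical rate
        return rho

    # errors of a quadratically convergent iteration, en ~ (en-1)^2:
    print(empirical_rate([1e-1, 1e-2, 1e-4, 1e-8]))  # ~ 2.0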
In Fig. 9 we plot the convergence graphs corresponding to Fig. 7, i.e. Example 2 with a starting
value x0 = (−0.5, 0.1). We clearly see the quadratic
convergence of the adaptive Newton method; the
Newton scheme with relaxation (h ≪ 1) converges,
as expected, only linearly.
In Table 1, we compare the performance of
the traditional Newton method with the adaptive
scheme for Examples 1 and 2. In both examples, the information is based on 10⁴ initial values which have been picked randomly in the coordinate range of the corresponding experiment. We list the percentage of convergent iterations, the average number of iterations necessary to obtain an absolute accuracy of at least 10⁻⁸, and the average convergence rate as introduced above. Here, an iteration is considered convergent if it approaches the
“correct” zero, i.e. the zero that is located in the same “exact” attractor as the initial value; the exact attractors are approximately determined using reference solutions (i.e. the Newton scheme with relaxation, h ≪ 1). The benefits of step size control are clearly visible. In both examples, we see a noticeable improvement in the average convergence rate. In particular, in Example 2, the convergence is nearly quadratic; in addition, almost all iterations converge and the number of iterations is reduced by approximately 50% compared to the traditional method.

Table 1   Performance Data for Examples 1 and 2.

                             Example 1 on [−3, 7]×[−4, 6]      Example 2 on [−5, 5]×[−5, 5]
                             Trad. Method   Adaptive Method    Trad. Method   Adaptive Method
Avg. Rate ρ̂                  1.66           1.89               1.73           1.91
Avg. Nr. of Iter.            20.24          20.85              17.14          9.13
% of Convergent Iterations   84.07%         89.30%             90.40%         99.64%
5. CONCLUSIONS
This paper blends various known ideas about
numerical solvers in an unusual manner. Solving
equations and solving ODEs, two seemingly different tasks, appear to be linked intimately — and we
might add solvers for extrema. Associating a vector field with any function f : Rⁿ → Rⁿ leads to
ODEs in a natural way. Looking at how the Newton
method acts on vector fields was key to understanding some of the dynamics of this solver.
In low-dimensional examples, step size control
seems to be very beneficial. What about the
high-dimensional cases that occur in practical applications? There, the priorities are completely different and additional issues arise. For example,
how do we reliably compute the Jacobian in a
reasonable amount of time? How do we deal with
ill-conditioned matrices? How do we limit the number of function evaluations? None of these questions
was addressed here. Further questions still come to
mind, such as whether Euler’s ODE solver with step
size control is able to avoid chaotic dynamics when
solving nonlinear equations f(x) = 0 in dimensions n > 2 or possibly n ≫ 2.
This article demonstrates that chaos does not
necessarily cast a cloud over numerical solvers for
nonlinear equations. Where has the chaos gone?
Maybe behind the size of the pixels on our screen,
or maybe to higher dimensions.
REFERENCES
1. A. Cayley, The Newton–Fourier imaginary problem,
Am. J. Math. 2 (1879) 97.
2. G. Julia, Mémoire sur l'itération des fonctions rationnelles, J. Math. Pures Appl. 8 (1918) 47–245.
3. P. Fatou, Sur les équations fonctionnelles (French),
Bull. Soc. Math. France 47 (1919) 161–271.
4. J. H. Curry, L. Garnett and D. Sullivan, On the
iteration of a rational function: computer experiments with Newton’s method, Comm. Math. Phys.
91 (1983) 267–277.
5. S. Smale, On the efficiency of algorithms of analysis,
Bull. Am. Math. Soc. 13 (1985) 87–121.
6. B. I. Epureanu and H. S. Greenside, Fractal basins
of attraction associated with a damped Newton’s
method, SIAM Rev. 40 (1998) 102–109.
7. J. W. Neuberger, Continuous Newton’s method for
polynomials, Math. Intel. 21 (1999) 18–23.
8. H.-O. Peitgen and P. H. Richter, The Beauty of
Fractals (Springer-Verlag, New York, 1986).
9. J. Jacobsen, O. Lewis and B. Tennis, Approximations of continuous Newton’s method: an extension
of Cayley’s problem, Electron. J. Differ. Eq. 15
(2007) 163–173.
10. A. Quarteroni, R. Sacco and F. Saleri, Numerical Mathematics, Texts Appl. Math., Vol. 37 (Springer-Verlag, New York, 2000).
11. P. Deuflhard and F. Bornemann, Scientific Computing with Ordinary Differential Equations, Texts
Appl. Math., Vol. 42 (Springer-Verlag, New York,
2002).
12. E. Hairer, S. P. Nørsett and G. Wanner, Solving Ordinary Differential Equations, Vol. I (Springer-Verlag, New York, 1993).
13. P. Deuflhard, Newton Methods for Nonlinear Problems, Springer Ser. Comput. Math., Vol. 35 (Springer-Verlag, Berlin, Heidelberg, 2004).
14. M. Drexler, I. J. Sobey and C. Bracher, On the Fractal Characteristics of a Stabilised Newton Method,
Tech. Report NA-95/26, Computing Laboratory,
Oxford University, 1995.
15. J. L. Varona, Graphic and numerical comparison
between iterative methods, Math. Intel. 24 (2002)
37–46.
16. J. E. Dennis, Jr. and R. B. Schnabel, Numerical
Methods for Unconstrained Optimization and Nonlinear Equations, Prentice Hall Ser. Comput. Math.
(Prentice–Hall, Englewood Cliffs, NJ, 1983).