Fractals, Vol. 19, No. 1 (2011) 87–99
© World Scientific Publishing Company
DOI: 10.1142/S0218348X11005191

THE NEWTON–RAPHSON METHOD AND ADAPTIVE ODE SOLVERS

HANS RUDOLF SCHNEEBELI
Kantonsschule, CH-5400 Baden, Switzerland
[email protected]

THOMAS P. WIHLER*
Mathematics Institute, University of Bern
Sidlerstrasse 5, CH-3012 Bern, Switzerland
[email protected]

Received November 10, 2009; Revised October 21, 2010; Accepted November 1, 2010

Abstract

The Newton–Raphson method for solving nonlinear equations f(x) = 0 in R^n is discussed within the context of ordinary differential equations. This framework makes it possible to reformulate the scheme by means of an adaptive step size control procedure that aims at reducing the chaotic behavior of the original method without losing the quadratic convergence close to the roots. The performance of the modified scheme is illustrated with a few low-dimensional examples.

Keywords: Newton–Raphson Method; Euler Method; Ordinary Differential Equations; Dynamical Systems; Chaotic Behavior; Step Size Control; Adaptive Time-Stepping Methods.

*Corresponding author.

1. ROOT FINDING WITH THE NEWTON–RAPHSON METHOD — HOW EFFICIENT IS IT?

The numerical solution of nonlinear equations

f(x) = 0,  f: R^n → R^n,  (1)

by the Newton–Raphson method (Newton method, for short) is often accompanied by two somewhat incompatible effects: quadratic convergence and chaotic behavior. The first concerns the local behavior of the algorithm; the second touches upon global aspects. Indeed, typically (i.e. under certain conditions that often hold) the Newton method generates a sequence of approximations that converges quadratically to a solution x* of (1) for any initial value "sufficiently close" to x*. Hence, from a local point of view, the Newton method is quite efficient. Evidently, however, the amount of work needed to find suitable starting values for a corresponding iteration is a separate issue.

In this context, Cayley^1 posed a global question: What are the attractors of the roots of a complex polynomial when the Newton method is applied? Note that in that case, the Newton method corresponds to an iterated rational map. Here, the work of Julia^2 (see also Fatou^3) showed the high complexity of fractal boundaries (Julia sets) separating different attractors. In the 1980s, chaos and fractals became popular catchwords. Stunning pictures attracted the interest of the public at large; see, e.g. Ref. 4. Such pictures attempted to answer Cayley's question empirically, sometimes under questionable conditions, using numerical methods based on floating point operations and, of course, computer screens with still-modest resolution. Did the graphics reveal the mathematical truth? Smale's paper^5 paved the way for deeper insight and new results. One of many possible ways to assess the chaotic behavior of the Newton method for the numerical solution of (1) is to consider the associated Julia set Λ_f.
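Pictures of this kind are straightforward to reproduce empirically. The following minimal sketch (our illustration, not the code behind the figures of this paper) labels each starting point on a grid by the root of z³ − 1 on which the Newton iteration settles; grid points that settle on no root within the given number of steps lie near the Julia set:

```python
import numpy as np

def newton_basins(n=400, max_iter=50, tol=1e-8):
    """Label grid points in [-2, 2]^2 by the root of z^3 - 1 they reach."""
    roots = np.exp(2j * np.pi * np.arange(3) / 3)   # the three cube roots of unity
    s = np.linspace(-2.0, 2.0, n)
    z = s[None, :] + 1j * s[:, None]                # complex grid of starting values
    for _ in range(max_iter):
        mask = np.abs(z) > 1e-12                    # avoid the pole at the origin
        z[mask] -= (z[mask]**3 - 1.0) / (3.0 * z[mask]**2)   # Newton step
    dist = np.abs(z[..., None] - roots[None, None, :])
    labels = np.argmin(dist, axis=-1)               # index of the nearest root
    labels[dist.min(axis=-1) > tol] = -1            # unresolved: near the Julia set
    return labels

labels = newton_basins()
print(np.bincount(labels.ravel() + 1))              # sizes of the sampled basins
```

Counting the labels gives a rough impression of the relative sizes of the three basins at the chosen resolution.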
Several definitions of fractal dimension are at our disposal to describe the complexity and extent of the boundaries separating the various attractors (cf., e.g. Ref. 6). In the present context the box-counting dimension is a plausible choice. Take, for example, the Julia set Λ_f corresponding to f(z) = z³ − 1 in the complex plane C; cf. Fig. 1. It is well known that Λ_f has a fractal dimension d with 1 < d < 2. This agrees with our visual impression: indeed, Λ_f occupies a significant part of the plane and is not nearly as simple as a line segment.

[Fig. 1. Julia set Λ_f ⊂ C and attractors for f(z) = z³ − 1 = 0; three different colors distinguish the three Wada basins associated with the three solutions (each of which is marked by a small circle).]

In each open neighborhood of any point of Λ_f there are points belonging to all the attractors (also called Wada basins) of the three roots of f. The Julia set is invariant under the action of the Newton iteration, i.e. the scheme permutes the points of Λ_f. Hence, initial values in the Julia set are bound to stay there forever, at least in exact arithmetic. We are thus led to the following question: How efficient is the Newton method if the attractors are unknown and the search for initial values, for which the iterates converge quadratically to some root, is considered part of the problem?

For the current example, f(z) = z³ − 1, the Newton method iterates a rational function in C with a pole of order 2 at the origin. Close to the pole, unlimited accelerations occur and the dynamics of the Newton method become uncontrolled. It seems obvious that a sequence of points is unlikely to be quickly convergent and impetuous at the same time, and it is reasonable to assume that this observation is related to chaos, as seen, for example, in Fig. 1. To sum up, moving starting points to a sufficiently small neighborhood of a zero of f might not always be a well-conditioned procedure. For example, iterates may approach the area of quadratic convergence at a low rate, or they may visit various attractors until they actually approach some root. Hence, from a global point of view, the chaotic behavior of the Newton method seems to reduce, to a certain extent, the high efficiency of the scheme.

In this paper, following, e.g. Refs. 5, 7 and 8, we shall identify the Newton method as the numerical discretization of a specific ordinary differential equation (ODE) by the explicit Euler method, with a fixed step size h = 1. This ODE constitutes a continuous version of the Newton method. Its associated dynamics — and those of its discretization — can be represented by a vector field. This allows us to compare the continuous behavior with the numerical approximations. It seems interesting to explore other numerical ODE solvers, such as Runge–Kutta methods, which are more sophisticated than Euler's method (see, e.g. Ref. 9). Surprisingly, however, numerical experiments show that the convergence rates of the resulting root finding schemes are usually not competitive with that of Euler's method; incidentally, they also exhibit considerable chaos. Hence, in this paper, we shall suggest an alternative approach.
Referring to ideas from Ref. 10 (see also Refs. 11 and 12, for example), we apply an automatic step size control procedure to Euler's method and thereby obtain a simple adaptive Newton iteration scheme (see also Ref. 13 and the references therein for an extensive review of the Newton method and adaptive algorithms; in addition, we mention Refs. 6, 14 and 15 for different variations of the classical Newton scheme). The goal of the modified method is to deal with singularities in the iterations more carefully and thus to tame the chaotic behavior of the standard Newton algorithm. In order to obtain good convergence properties, a step size of h = 1 is used whenever possible. We will consider two examples for which the Newton method produces fractal Julia sets and will show that the refined algorithm operates in attractors with almost smooth boundaries. Moreover, numerical tests demonstrate an improved convergence rate not matched on average by the classical Newton method.

We note that, compared to more sophisticated numerical ODE solvers (like the ones used in, e.g. Ref. 9), the step size control applied in this work does not primarily emphasize following the dynamics of the continuous flow described by the ODE as closely as possible. Rather, it aims at providing a sort of funnel that retains a sequence of approximations inside the same attractor and leads it toward a neighborhood of the corresponding zero in which quadratic convergence for the Newton method is guaranteed. In addition, we remark that the ODE of the continuous version of the Newton method describes a flow that slows down exponentially when approaching a root. Hence, in order to find a sufficiently accurate solution of (1) quickly, it is beneficial to utilize a numerical method that leaps close to a zero irrespective of the dynamics of the continuous flow. Euler's method with step size h = 1 is a good candidate: although it might not simulate the continuous dynamics as accurately as other, more sophisticated numerical ODE solvers, the quadratic convergence close to a root results in an efficient advancement of the discrete flow.

Furthermore, let us remark that there is a large application and research area in which methods related to the continuous version of the Newton method are considered in the context of nonlinear optimization. Some of these schemes count among the most efficient ones available for special purposes. Their efficiency results from avoiding the inversion of Jacobian matrices or from circumventing the solution of large linear systems. Such issues, however, are outside the scope of this article. Indeed, its main purpose is to show that new light can be shed on a traditional root finder. The perspective adopted here is to relate the discrete dynamical system given by the Newton method to a continuous one defined by an ODE — in geometrical terms, a vector field. It is remarkable that the Newton scheme transforms a vector field in such a way that the field is normalized close to any regular fixed point. This transform, however, may create new singularities in the transformed field. Looking at the vector field in a global way will be crucial in finding improved solvers that retain the celebrated efficiency of the Newton method close to a regular solution. The endeavor will be to tame the tendency of Newton iterations to switch between several attractors of solutions in the presence of singularities caused by the Newton method itself.
This geometric insight opens a way to derive solvers for nonlinear equations that are less prone to chaotic behavior than the original Newton method.

The paper is organized as follows. In Sec. 2, we comment on the connection between the discrete and continuous versions of the Newton method and define a transformation of vector fields aptly called the Newton–Raphson transform (NRT). Furthermore, we show that the Newton method can be obtained as the numerical discretization of a specific ODE by the explicit Euler method. In Sec. 3, the possible difference between local and global root finding is addressed. Moreover, Sec. 4 presents an adaptive step size algorithm for the Newton method. Finally, Sec. 5 contains a summary and some open questions.

2. THE NEWTON METHOD AND ODEs

We consider a function f: R^n → R^n whose zeros are to be determined. In what follows, f is assumed to be sufficiently smooth (twice continuously differentiable usually suffices). Let Df denote its Jacobian. Now, consider the following iterative procedure to find a zero x* of f: Start with a value x0 that is assumed to be "close enough" to x*, then solve the linearized version,

f(x0) + Df(x0) · ∆x0 = 0,

of the equation f(x0 + ∆x0) = 0 for the increment ∆x0; a possibly improved approximation is then given by x1 = x0 + ∆x0. Repeat this procedure as long as necessary. This is the classical Newton method. Numerical experiments tend to show quadratic convergence rates of the scheme until the preset tolerances are met or limitations of the computer's floating point arithmetic cause the iterates to end up in a finite cycle. In order to guarantee existence and uniqueness of the increments in this algorithm, we implicitly assume that the Jacobians are regular in each step. Here we note that, although this somewhat idealistic assumption may often be satisfied, issues such as the condition number of the Jacobian need to be addressed carefully in order to obtain robust numerical answers.

The above picture is perfectly adequate for the local study of the algorithm. In order to deal with global questions as well, we propose a different point of view. This will be discussed in the following sections.

2.1. Vector Fields — From Equations to ODEs

Any nonlinear map f: R^n → R^n gives rise to a vector field on the domain of f in the following way: at the point x, attach the vector f(x) (to the tangent space at x). This vector field defines a flow on the domain of f. The fixed points of this flow are precisely the zeros of f that we are looking for. The behavior of the flow near a fixed point is described, up to nonlinear terms, by the Jacobian Df at the fixed point. In the case that all eigenvalues of Df have negative real parts, the fixed point is attractive. In general, this favorable condition is, of course, not fulfilled. Interestingly, however, it may typically be retrieved by applying a suitable transformation to the vector field; this will be discussed in the next section. If the fixed point is attractive, we might use the flow to push some initial value x0 closer and closer to the fixed point. That is, we use the differential equation ẋ = f(x) with the initial condition x(0) = x0 to solve f(x) = 0 approximately.
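As a small illustration of this idea (a sketch under the assumption that the fixed point is attractive; the scalar example is our own choice, not taken from the paper), the flow of ẋ = f(x) can be followed with explicit Euler steps:

```python
import numpy as np

def euler_flow(f, x0, h=0.01, steps=2000):
    """Follow the flow of x' = f(x) with the explicit Euler method."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x + h * f(x)
    return x

# f(x) = -(x - 2) has the attractive fixed point x = 2, since Df = -1 < 0;
# the flow pulls any starting value toward the zero of f.  (Illustrative choice.)
f = lambda x: -(x - 2.0)
print(euler_flow(f, np.array([5.0])))   # -> approximately [2.]
```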
The flow idea motivates a numerical procedure: we avoid formal solutions of the initial value problem and rely instead on suitable numerical ODE solvers. Plausibly, this might be done by opening up a well-established toolkit (see, e.g. Ref. 12), traditionally applied to ODEs, that can now be utilized to solve nonlinear equations as in (1). As usual, choosing an efficient tool will require some additional experience, insight, and ideas.

2.2. The Newton–Raphson Transform

For simplicity, we assume that 0 is a regular value of f: R^n → R^n. This means that the Jacobian Df is nonsingular close to every zero of f. As mentioned before, we will now transform the vector field defined by f in such a way that the new vector field Nf only has fixed points of the most favorable kind. This happens if, for all zeros x* of f, the Jacobian satisfies Df = −Id, with Id denoting the identity map. The following transformation, which we call the Newton–Raphson transform (NRT), does the trick:

N: f ↦ Nf = −(Df)⁻¹ · f.

Supposing that f is sufficiently smooth on an open domain D_f ⊆ R^n, we note the following facts:

(i) Nf is a vector field whose domain D_{Nf} is the subset of D_f on which Df is invertible.

(ii) If f is linear, then Nf = −Id.

(iii) Suppose that x* is an isolated zero of f; then the Taylor expansion of Nf at x* is given by

Nf(x* + ∆x) = Nf(x*) + DNf(x*) · ∆x + O(‖∆x‖²),

where ‖·‖ signifies the Euclidean norm in R^n. Using the product rule and the fact that f(x*) = 0 results in

D(−(Df)⁻¹ · f)(x*) · ∆x = −∆x.

This shows that the NRT always standardizes the vector field in the neighborhood of a zero and thereby mimics the vector field of −Id at 0 (with respect to ∆x), up to nonlinear effects:

Nf(x* + ∆x) = −∆x + nonlinear terms in ∆x.  (2)

Incidentally, this result can be used to explain the high convergence rate of the Newton method. In addition, we see that

((x* + ∆x) − x*) · Nf(x* + ∆x) ≈ ∆x · (−∆x) = −‖∆x‖² < 0.

Hence, the NRT at x* + ∆x is a descent direction pointing toward the zero x*.

(iv) The set

Sf = {x ∈ D_f : Df(x) is singular}  (3)

is closed in the domain D_f. Thus, the domain D_{Nf} = D_f \ Sf is open. Therefore, since all zeros of f are regular by assumption, each one of them is located in an attracting neighborhood contained in D_{Nf}.
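In code, applying the NRT at a point amounts to solving one linear system; a minimal sketch in our notation (assuming a user-supplied Jacobian Df) reads:

```python
import numpy as np

def nrt(f, Df, x):
    """Newton-Raphson transform Nf(x) = -Df(x)^{-1} f(x).

    We solve the linear system instead of forming the inverse; on the
    singular set S_f this raises numpy.linalg.LinAlgError.
    """
    return -np.linalg.solve(Df(x), f(x))

# Fact (ii): for linear f(x) = A x - b, the transformed field is exactly
# -Id shifted to the zero A^{-1} b, so one full step x + Nf(x) lands on it.
A = np.array([[2.0, 1.0], [0.0, 3.0]])
b = np.array([1.0, 6.0])
f = lambda x: A @ x - b
Df = lambda x: A
x = np.array([4.0, 4.0])
print(x + nrt(f, Df, x))   # -> [-0.5  2. ], the zero of f
```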
2.3. Euler Meets Newton

Let us focus on the ODE related to the new (transformed) vector field Nf. Note that all of its fixed points are surrounded by a local attractor with convenient properties. We consider the initial value problem

ẋ(t) = Nf(x(t)),  x(0) = x0,  (4)

with an initial value x0 that is assumed to be sufficiently close to a root x* of f. Recalling (2) with ∆x(t) = x(t) − x*, and neglecting terms of higher order, this ODE may be written as

ẋ(t) = Nf(x(t)) = Nf(x* + ∆x(t)) ≈ −∆x(t) = x* − x(t).  (5)

Letting ∆x0 = x0 − x*, we find an explicit solution,

∆x(t) = exp(−t)∆x0.

Hence, the linearized dynamics of the vector field Nf carries an initial point x0 exponentially fast toward a zero x* of f. We remark, however, that it takes infinitely long to reach the root x*, since the flow slows down in proportion to ‖x(t) − x*‖, i.e. the distance between the current position and the fixed point (if ∆x0 ≠ 0).

We will now take a look at the numerical solution of (4) by the explicit Euler method. To this end, fix a step size h > 0 and consider approximations x1, x2, x3, ... of the exact solution x(t) at the points t = h, 2h, 3h, .... The derivative ẋ at any discretization point t_n = nh, n ∈ N, can be approximated as follows:

ẋ(t_n) ≈ (x(t_n + h) − x(t_n))/h ≈ (x_{n+1} − x_n)/h.

Hence, replacing the exact derivative in (4) with this approximation leads to

(x_{n+1} − x_n)/h ≈ Nf(x_n),

and we are able to define the iterative scheme

x_{n+1} = x_n + h Nf(x_n),  n = 0, 1, 2, ...,  (6)

with a given initial value x0. Using a step size h = 1, this is exactly the classical Newton method for the solution of the nonlinear equation (1).

3. LOCAL AND GLOBAL PERSPECTIVES

Consider the first update x1 = x0 + h Nf(x0) in the iteration (6). Referring to (5), we have the (approximate) equality

x1 − x* ≈ x0 − x* + h · (x* − x0).

Now choose h = 1 (i.e. the traditional Newton method) in order to move x0 to x* in one step! This is an interpretation of the Newton method in the framework of dynamical systems and numerical ODE solvers. We see that a single Euler step of size h = 1 would suffice to reach the fixed point at x*, up to errors resulting from nonlinear terms in x0 − x*.

Our analysis indicates that the Newton method for solving f(x) = 0 will typically perform very well close to a zero of f. But what happens if the iteration is started with an initial guess that is relatively far away from a zero? Let us look at a simple example. Suppose we use the Newton method for solving exp(x) − p = 0, with some given p > 1. The example is chosen such that the NRT does not introduce new singularities in the vector field. Indeed, the formula for the iteration step,

x_new = x − (exp(x) − p)/exp(x) = x − 1 + p exp(−x),

formally works for any x. The positive real axis is contained in the attractor of the unique solution x* = ln p. We see, however, that for (positive) initial values far from the solution, a significant number of steps each move just about one unit to the left, since the term p exp(−x) is small for x ≫ 0. Hence, the convergence is, at least initially, relatively slow. Of course, in this special case, good initial values may be deduced by looking at the floating point representation of x. For a general purpose root finder such tricks are, however, not available. The above example illustrates that step size control in the Newton algorithm might be quite useful, even in good-natured examples.
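The slow initial phase is easy to observe numerically; a small sketch (the values of p and x0 are illustrative):

```python
import math

def newton_exp(p, x0, tol=1e-12, max_steps=200):
    """Newton iteration for exp(x) - p = 0: x <- x - 1 + p*exp(-x)."""
    x, steps = x0, 0
    while abs(math.exp(x) - p) > tol and steps < max_steps:
        x = x - 1.0 + p * math.exp(-x)
        steps += 1
    return x, steps

# Far from the root ln(10) ~ 2.303, each early step moves about one unit
# to the left, so most of the iteration is spent marching toward the root.
print(newton_exp(p=10.0, x0=20.0))
```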
Let us now turn to more realistic examples, for which the Jacobian Df may become ill-conditioned or even singular. To this end, consider again the singular set Sf from (3). This is a potential source of trouble induced by the NRT. More precisely, close to the points of Sf, at least one of the eigenvalues of the Jacobian Df is zero or nearly so, and, hence, we expect very high accelerations in the discrete dynamics of the Newton iteration in the direction of the corresponding eigenvector. It is therefore quite reasonable to suspect that such instabilities are a potential cause of the sensitive dependence of the Newton method on the initial values (i.e. the chaotic behavior away from the attractors).

It is well known that relaxation (also called line search parametrization) in the Newton method, i.e. the use of a suitable step size h < 1 in (6), may enhance the stability of the numerical approximations; however, quadratic convergence can then no longer be expected. On the other hand, as mentioned above, working with h = 1 globally may cause the Newton method to behave chaotically, i.e. to be numerically unstable (particularly if the starting value is not sufficiently close to a zero).

Traditionally, the Newton method is bound to a local perspective. If a good starting point for an iteration is taken for granted, then quadratic convergence can typically be expected. What happens, however, if we adopt a global perspective, i.e. if, for example, we perform a possibly slow search for an initial value preceding the phase of quadratic convergence? Would a suitably modified scheme be able to reduce the effort required to find suitable initial values? Would this result in smoothing out the boundaries of the attractors of the various roots? Here, recognizing the connection between the Newton method and numerical ODE solvers allows for improved flexibility in the choice of tools and approaches for solving equations within a global framework. For example, using variable step sizes h in the explicit Euler scheme for the solution of (4) could provide an interesting balance between stability and fast convergence. The idea is to resolve the behavior of the ODE (4) close to the singular points in Sf more accurately and thereby to obtain better numerical approximations. In order to decide how to choose the local step sizes, we formulate the following (natural) requirements: (i) use step size h = 1 whenever possible to retain the quadratic convergence, in particular close to a solution; (ii) if high accelerations occur close to points in Sf, reduce the step size; and (iii) minimize the number of function evaluations.

Since the behavior of the solution x(·) of the continuous root solver (4) might be quite difficult to predict in practice, it is desirable to adjust the local step sizes at run time, without the need for a lot of a priori knowledge. For this purpose, an adaptive algorithm can be used. Such a procedure is able to extract local information on the solutions and to control the step sizes automatically. This will be discussed in detail in the next section.

4. ADAPTIVITY FOR THE NEWTON METHOD

We shall now use the link between the Newton method and the explicit Euler scheme applied to the ODE (4), i.e. the continuous version of the Newton method. More precisely, we obtain an automatic step size control algorithm for the Newton method by looking at adaptive step size procedures for numerical ODE solvers. In this article, we shall refer to an idea presented in Ref. 10 (see also Ref. 12, for example).

4.1. Step Size Control

In order to design an adaptive Newton algorithm, we will use a so-called error indicator. Basically, this is a computable local upper bound on the error that is able to tell us whether the error in the current iteration is large or small; if the error is found to be too large, the corresponding step is recomputed with a smaller step size. Let us recall Euler's method (6) for the discretization of the continuous root finder (4):

x_{n+1} = x_n + h_n Nf(x_n), with starting value x0.  (7)
Here, in contrast to the standard Newton scheme (for which we choose h = h_n = 1 in every step), we shall keep the step size h_n flexible. For notational simplicity, we consider the one-dimensional case f: R → R only; we replace Nf by a function g and drop the subscript n in the step size h. Furthermore, we fix a certain time t_n = nh, n ∈ N, and assume that the exact solution x(t_n) of (4) is equal to its corresponding numerical approximation x_n. In practice, of course, this is typically not the case; however, assuming that the previous iterations in Euler's method are sufficiently accurate, it is reasonable that x(t_n) and x_n are "very close" to each other.

In order to obtain a computable error bound at t_{n+1} = t_n + h, we shall compute two different numerical approximations x_{n+1} and x̂_{n+1} at t_{n+1} in such a way that the difference |x_{n+1} − x̂_{n+1}| takes the role of an error indicator. Applying a Taylor expansion and recalling (4), we find

x(t_n + h) = x(t_n) + h ẋ(t_n) + (h²/2) ẍ(t_n) + O(h³)
           = x_n + h g(x_n) + (h²/2) g(x_n) g′(x_n) + O(h³).  (8)

Hence, using (7), we obtain the following error bound:

x(t_n + h) − x_{n+1} = (h²/2) g(x_n) g′(x_n) + O(h³).  (9)

Next, we shall compute an alternative (improved) approximation x̂_{n+1} of x(t_{n+1}) by performing two steps of the explicit Euler method of size h/2. Let us first consider the numerical solution x̂_{n+1/2} corresponding to t_n + h/2:

x̂_{n+1/2} = x_n + (h/2) g(x_n).

Then an additional step of size h/2 results in the following numerical approximation of the exact solution x(t_{n+1}):

x̂_{n+1} = x̂_{n+1/2} + (h/2) g(x̂_{n+1/2}) = x_n + (h/2) g(x_n) + (h/2) g(x_n + (h/2) g(x_n)).  (10)

Let us have a look at the error between x(t_{n+1}) and x̂_{n+1}. A Taylor expansion of the last term in (10) leads to

x̂_{n+1} = x_n + (h/2) g(x_n) + (h/2) [g(x_n) + (h/2) g′(x_n) g(x_n) + O(h²)]
        = x_n + h g(x_n) + (h²/4) g′(x_n) g(x_n) + O(h³).

Thus, recalling (8), we get

x(t_n + h) − x̂_{n+1} = (h²/4) g′(x_n) g(x_n) + O(h³).  (11)

Subtracting (11) twice from (9) gives

x(t_n + h) − x_{n+1} − 2(x(t_n + h) − x̂_{n+1}) = O(h³),

and, therefore, we obtain the following a posteriori error estimate:

x(t_n + h) − x_{n+1} = 2(x̂_{n+1} − x_{n+1}) + O(h³).

We see that the expression 2(x̂_{n+1} − x_{n+1}) is a computable quantity and (neglecting the O(h³) terms) can be used as an error indicator in each individual step. Although the celebrated quadratic convergence rate of the Newton method can only be expected to occur close to the root, adaptive step size control serves to safely direct raw approximations into a region of fast convergence.

We mention that, recalling (11), we have x(t_n + h) − x̂_{n+1} = O(h²). This implies that x(t_n + h) − x_{n+1} = O(h²), and since the initial value at t_n was considered exact, it follows that the local truncation error (not the overall error) in Euler's method applied to (4) converges quadratically in h; see, e.g. Ref. 10 for details. Note that this refers to the local approximation properties in Euler's method rather than the convergence behavior of the Newton method as an iterative root finding scheme.
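In the one-dimensional notation above, the indicator costs one full Euler step plus two half steps per trial step; a possible sketch:

```python
def error_indicator(g, xn, h):
    """One full Euler step vs. two half steps for x' = g(x).

    Returns (x_full, x_half2, est) where
      x_full  = x_n + h*g(x_n)                      (one step of (7)),
      x_half2 = two consecutive steps of size h/2   (x-hat of (10)),
      est     = 2*|x_half2 - x_full|, the a posteriori error estimate.
    """
    x_full = xn + h * g(xn)
    x_mid = xn + 0.5 * h * g(xn)
    x_half2 = x_mid + 0.5 * h * g(x_mid)
    return x_full, x_half2, 2.0 * abs(x_half2 - x_full)

print(error_indicator(lambda x: -x, 1.0, 1.0))   # -> (0.0, 0.25, 0.5)
```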
We generalize the above result to higher dimensions and suggest the following adaptive step size algorithm for the Newton method:

Algorithm 1 (Adaptive Newton–Raphson algorithm).

(i) Fix a tolerance τ > 0, and start the iteration (n = 0) with h_0 = 1 and an approximate value x0.

(ii) In each iteration step n = 0, 1, 2, ..., compute x_{n+1} and x̂_{n+1} from x_n with the explicit Euler scheme (7) and (10), respectively. Then, if

‖x̂_{n+1} − x_{n+1}‖ < τ/2,  (12)

set h_{n+1} = min(2h_n, 1) and go to the next step n ← n + 1; otherwise, set h_n ← h_n/2 and start the current step again.

Remark 1. We point out that we cap the doubling of the step size for small errors, i.e. h_{n+1} = min(2h_n, 1), in such a way that h ≤ 1. We thereby aim to obtain iterations that are mostly of step size 1 and take advantage of the quadratic convergence behavior of the Newton method, in particular close to the root. Note that there exist alternative criteria for the reduction of the step size h; see, e.g. the Armijo–Goldstein rule (e.g. Chapter 6 of Ref. 16, and also Refs. 6 and 14), which is also based on bisection of the line search parameter h (multiplied by a suitable factor of typically 10⁻⁴) and relies on a criterion similar to (12).

Remark 2. In addition, we note that the above algorithm is mainly designed for zeros of multiplicity 1. In the general case, where zeros of possibly higher multiplicity p ≥ 2 need to be found, an appropriate extension of the automatic step size control procedure would be desirable. This is due to the fact that, for zeros of multiplicity p, a step size of h = p (close to the solution) is usually necessary to ensure quadratic convergence.

Remark 3. Intuitively, the choice of the tolerance parameter τ in the above adaptive algorithm depends on the vector field governing the dynamics of the root finding procedure. At this experimental stage, however, a scheme which is able to compute τ in dependence of the vector field is not provided. Indeed, in the experiments of Sec. 4.2, suitable values for τ have been chosen on the basis of a systematic trial and error procedure similar to bisection.
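A minimal realization of Algorithm 1 might look as follows (a sketch: upon acceptance we continue from the full step x_{n+1}, which is our reading of step (ii), and the stopping criterion on ‖f(x)‖ is our addition, not part of the algorithm above):

```python
import numpy as np

def adaptive_newton(f, Df, x0, tau=0.05, f_tol=1e-8, max_steps=500):
    """Adaptive Newton-Raphson method in the spirit of Algorithm 1."""
    nf = lambda x: -np.linalg.solve(Df(x), f(x))    # Newton-Raphson transform
    x, h = np.asarray(x0, dtype=float), 1.0
    for _ in range(max_steps):
        if np.linalg.norm(f(x)) < f_tol:            # our stopping criterion
            return x
        v = nf(x)
        x_full = x + h * v                          # one Euler step, eq. (7)
        x_mid = x + 0.5 * h * v
        x_half2 = x_mid + 0.5 * h * nf(x_mid)       # two half steps, eq. (10)
        if np.linalg.norm(x_half2 - x_full) < 0.5 * tau:   # test (12)
            x, h = x_full, min(2.0 * h, 1.0)        # accept; enlarge, but h <= 1
        else:
            h = 0.5 * h                             # reject; retry the step
    return x
```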
4.2. Examples

The following two examples are chosen in such a way that some phenomena occurring in the Newton method may be visualized. In both cases, we produce two kinds of plots. First, we show the vector field corresponding to ẋ = f(x) and compare it to its NRT. The goal of the visual comparison is to perceive the dynamics related to both ODEs. Note that, for the purpose of better presentation, the arrows in the plots of Figs. 2–3 and 5–6 have been scaled by suitable factors. Second, the attractors of solutions for the Newton method are generated approximately by numerical experiments with the traditional Newton scheme and with step size control.

[Fig. 2. Example 1: The original vector field (scaled by a factor of 1.5) and its direction field.]

[Fig. 3. Example 1: The NRT of the field (scaled by a factor of 10) and its direction field.]

[Fig. 4. Example 1: Numerical simulations for finding attractors for the Newton method and for the Newton method with step size control (τ = 0.05).]

Example 1: Two real equations with singular set of real co-dimension 1

Here and in what follows, x1 and x2 denote real variables. We consider the system of real equations

f(x1, x2) = (−x1² + x2 + 3, −x1 x2 − x1 + 4) = 0

in R². Its only real solution x* = (2, 1) is an attractive fixed point in the vector field associated with ẋ = f(x); see Fig. 2. Note that vectors close to x* clearly show a curl. In Fig. 3, however, the vectors point directly at x* and the curl is removed by the NRT. The parabola displays the set Sf of co-dimension 1 on which the Jacobian is singular. Points close to Sf may move fast, and the direction of the field is subject to sudden changes there. Moreover, in Fig. 4 we present the attractors of x* for the traditional and the adaptive Newton methods (with τ = 0.05). Both pictures are based on sampling initial values on the same grid of 1001 × 1001 starting values in the domain [−3, 7] × [−4, 6]. The coloring of the right part of the frames marks the attractor of x*. While the Newton method with fixed step size h = 1 shows traces of empirical chaos, step size control is able to tame this unstable behavior considerably.
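Example 1 can be fed directly to the adaptive_newton sketch above; in our encoding (the starting value is a hypothetical choice inside the attractor of x* = (2, 1)):

```python
import numpy as np

# Example 1: f(x1, x2) = (-x1^2 + x2 + 3, -x1*x2 - x1 + 4), zero at (2, 1);
# det Df = 2*x1^2 + x2 + 1 vanishes on the parabola x2 = -2*x1^2 - 1.
f = lambda x: np.array([-x[0]**2 + x[1] + 3.0,
                        -x[0]*x[1] - x[0] + 4.0])
Df = lambda x: np.array([[-2.0*x[0],      1.0],
                         [-x[1] - 1.0, -x[0]]])

# tau = 0.05 as in Fig. 4; expected to approach the zero (2, 1).
print(adaptive_newton(f, Df, np.array([4.0, 4.0]), tau=0.05))
```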
Example 2: Two real equations with singular set of real co-dimension 2

We consider the complex equation z³ − 1 = 0 in C in its real form in R² (i.e. by separating the real and imaginary parts into a system of two equations); see Fig. 5 for the corresponding vector field. The NRT then exhibits a unique singular point at the origin. The example is symmetric under rotation by 2π/3 about 0. More precisely, there are three sectors starting at 0, each containing a solution on its central line. These sectors are the attractors of the system ẋ = Nf(x). Figure 1 shows the three attractors corresponding to the Newton method for the third roots of unity in C. The Julia set separates the attractors; we refer the reader to Ref. 8, for example, for further details on attractors for the Newton method and fractals.

[Fig. 5. Example 2: The vector field corresponding to f(z) = z³ − 1 (scaled by a factor of 1.5) and its direction field.]

[Fig. 6. Example 2: NRT of the vector field for f(z) = z³ − 1 (scaled by a factor of 5) and its direction field.]

Figure 6 presents the three "exact" sectors, i.e. the attractors of the zeros in the continuous version of the Newton method. The discrete dynamics clearly may jump between the three sectors. The pole at 0 produces high accelerations, as can be seen from the considerable differences in the size and direction of the arrows near 0. Thanks to step size control, the influence of the high accelerations is tamed. In fact, if a tentative step crosses the boundary of the attractor near 0, the tolerance τ is unlikely to be met and the step size is reduced. In Fig. 7, we display the behavior of the classical and the adaptive Newton methods (with τ = 0.1) for the initial point x0 = (−0.5, 0.1). We see that, while the classical iteration shows large displacements and thereby leaves the original attractor, the iterates of the adaptive Newton method follow the exact solution of (4) (which we approximate by a numerical reference solution, i.e. (6) with h ≪ 1) quite closely and approach the "correct" zero. Generally, step size control is considered successful if it is able to reproduce the boundaries between the attractors of the vector field Nf rather than the Julia set derived from the discrete dynamics. Referring to Fig. 8, our expectation is well met, at least up to the resolution of the simulation and the graphics device used.

[Fig. 7. Example 2: Performance of the classical Newton method and the Newton method with adaptive step size control (with τ = 0.1) for the starting point x0 = (−0.5, 0.1).]

[Fig. 8. Example 2: Attractors for z³ − 1 = 0 by the Newton method with step size control (τ = 0.1).]

The example generalizes to polynomial functions in the following way. Let q: C → C be a monic polynomial of degree p > 0, q(z) = z^p + c z^{p−1} + ···. Then its NRT is of the form

Nq: z ↦ −(z/p + c/p² + r(z)),

with a rational function r whose denominator has a degree strictly greater than that of its numerator. We see that the vector field of Nq may be decomposed into a far field, dominating for |z| ≫ 0, where r(z) ≈ 0, and a near field ruled by the poles of r. The far field senses the global properties of q. In particular, starting at z0 far away from 0, a single Euler step with h = deg q = p brings us to

z0 − p(z0/p + c/p² + r(z0)) ≈ −c/p = (1/p) Σ_{k=1}^p ζ_k,

where {ζ_k}_{k=1}^p is the set of the roots of q; i.e. we arrive at a point "close" to the mean value of the roots of q. Incidentally, for |z| ≫ 0, the term z^p in q is dominant and, thus, q resembles a polynomial with a zero of multiplicity p "relatively close" to the origin. This argument, in turn, would suggest that a step size of h = p could be an appropriate choice; cf. Remark 2. We therefore notice that, knowing the formal representation q(z) = Σ_{k=0}^p c_k z^k rather than its associated vector field Nq, we could directly start the Newton iteration with an initial value close to −c/p. A possible problem, however, is that −c/p could coincide with a pole of Nq, as for q_a(z) = z^p − a with p > 1. Close to the pole, step size control of the ODE solver matters. A corresponding algorithm has to find a viable compromise between staying in the attractor of a root ζ_k and minimizing the number of "small" steps until quadratic convergence toward ζ_k sets in.
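The far-field behavior is easy to check numerically; in the following sketch (the polynomial and the starting point are arbitrary illustrative choices), a single Euler step of size p = deg q from a distant z0 lands near the mean of the roots:

```python
import numpy as np

# Monic cubic with roots 1, 2i, -1-i; the mean of the roots equals -c/p.
zeros = np.array([1.0, 2.0j, -1.0 - 1.0j])
q = np.poly1d(np.poly(zeros))           # coefficients, highest power first
dq = q.deriv()
p = len(zeros)

z0 = 50.0 + 40.0j                       # starting point far from all roots
z1 = z0 + p * (-q(z0) / dq(z0))         # one Euler step with h = deg q = p
print(z1, zeros.mean())                 # both close to the root mean i/3
```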
Some statistical data

One of many possibilities to address the performance of a numerical root finder quantitatively is to look at its rate of convergence. To this end, for a given starting value x0 in the attractor of a zero x*, consider the iterates x1, x2, ... resulting from the numerical scheme, and introduce the error in the nth iteration: e_n = ‖x* − x_n‖. Then the rate of convergence ρ is defined by

e_n = C e_{n−1}^ρ,  n = 1, 2, ...,

or, equivalently,

ln e_n = C̃ + ρ ln e_{n−1},  n = 1, 2, ...,  (13)

where C, C̃ are suitable constants. Here, we assume ideally that ρ tends to a limit as n → ∞, i.e. the graph of (13) in a (ln e_{n−1}, ln e_n)-coordinate system is asymptotically a straight line. Note, however, that then, under sufficient regularity assumptions, the overwhelming majority of values {x_n}_{n>n_0}, for some n_0 ≥ 1, resulting from the Newton method will belong to a region of quadratic convergence close to x*. In particular, this approach does not take into account the behavior of the iterates {x_n}_{n=1}^{n_0}, i.e. before the zone of fast convergence, and is thus quite insensitive to the problems discussed here.

In numerical experiments, the situation is less ideal. Clearly, from the practical point of view, we aim at computing a relatively small number of iterates, and the infinite tail close to the exact root remains untouched. In this case, the convergence rate may only be measured empirically, and it is supposed that its values stabilize near a constant in a finite range of n ≤ n_0. Henceforth, for the sake of clarity, we shall denote an empirically determined convergence rate by ρ̂. For the experiments in this paper, we determine ρ̂ by applying a least squares approximation to (13) (averaged over all computed iterations n ≥ 1) for the unknown parameters ρ and C̃.

[Fig. 9. Example 2: Convergence graphs for the Newton method with and without adaptive step size control for the starting point x0 = (−0.5, 0.1).]

In Fig. 9 we plot the convergence graphs corresponding to Fig. 7, i.e. Example 2 with starting value x0 = (−0.5, 0.1). We clearly see the quadratic convergence of the adaptive Newton method; the Newton scheme with relaxation (h ≪ 1) converges, as expected, only linearly.

In Table 1, we compare the performance of the traditional Newton method with that of the adaptive scheme for Examples 1 and 2. In both examples, the information is based on 10⁴ initial values picked randomly in the coordinate range of the corresponding experiment. We list the percentage of convergent iterations, the average number of iterations necessary to obtain an absolute accuracy of at least 10⁻⁸, and the average convergence rate as introduced above. Here, an iteration is considered convergent if it approaches the "correct" zero, i.e. the zero that is located in the same "exact" attractor as the initial value; the exact attractors are approximately determined using reference solutions (i.e. the Newton scheme with relaxation, h ≪ 1).

Table 1. Performance data for Examples 1 and 2.

                               Example 1 on [−3, 7] × [−4, 6]     Example 2 on [−5, 5] × [−5, 5]
                               Trad. Method   Adaptive Method     Trad. Method   Adaptive Method
Avg. rate ρ̂                    1.66           1.89                1.73           1.91
Avg. nr. of iterations         20.24          20.85               17.14          9.13
% of convergent iterations     84.07%         89.30%              90.40%         99.64%

The benefits of step size control are clearly visible. In both examples, we see a noticeable improvement in the average convergence rate. In particular, in Example 2, the convergence is nearly quadratic; in addition, almost all iterations converge and the number of iterations is reduced by approximately 50% compared to the traditional method.
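The empirical rates ρ̂ in Table 1 come from least squares fits of (13), as described above; a minimal sketch of such a fit (fed here with synthetic, exactly quadratically decaying errors):

```python
import numpy as np

def empirical_rate(errors):
    """Least squares fit of ln e_n = C + rho * ln e_{n-1}, cf. (13)."""
    e = np.asarray(errors, dtype=float)
    x, y = np.log(e[:-1]), np.log(e[1:])
    rho, _ = np.polyfit(x, y, 1)        # the slope is the empirical rate
    return rho

# Errors decaying exactly quadratically (e_n = e_{n-1}^2) give rho ~ 2.
print(empirical_rate([1e-1, 1e-2, 1e-4, 1e-8, 1e-16]))
```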
5. CONCLUSIONS

This paper blends various known ideas about numerical solvers in an unusual manner. Solving equations and solving ODEs, two seemingly different tasks, appear to be linked intimately — and we might add solvers for extrema. Associating a vector field with any function f: R^n → R^n leads to ODEs in a natural way. Looking at how the Newton method acts on vector fields was key to understanding some of the dynamics of this solver. In low-dimensional examples, step size control seems to be very beneficial. What about the high-dimensional cases that occur in practical applications? There, the priorities are completely different and additional issues arise. For example, how do we reliably compute the Jacobian in a reasonable amount of time? How do we deal with ill-conditioned matrices? How do we limit the number of function evaluations? None of these questions was addressed here. Further questions still come to mind, such as whether Euler's ODE solver with step size control is able to avoid chaotic dynamics when solving nonlinear equations f(x) = 0 in dimensions n > 2 or possibly n ≫ 2. This article demonstrates that chaos does not necessarily cast a cloud over numerical solvers for nonlinear equations. Where has the chaos gone? Maybe behind the size of the pixels on our screen, or maybe to higher dimensions.

REFERENCES

1. A. Cayley, The Newton–Fourier imaginary problem, Am. J. Math. 2 (1879) 97.
2. G. Julia, Mémoire sur l'itération des fonctions rationnelles, J. Math. Pures Appl. 8 (1918) 47–245.
3. P. Fatou, Sur les équations fonctionnelles, Bull. Soc. Math. France 47 (1919) 161–271.
4. J. H. Curry, L. Garnett and D. Sullivan, On the iteration of a rational function: computer experiments with Newton's method, Comm. Math. Phys. 91 (1983) 267–277.
5. S. Smale, On the efficiency of algorithms of analysis, Bull. Am. Math. Soc. 13 (1985) 87–121.
6. B. I. Epureanu and H. S. Greenside, Fractal basins of attraction associated with a damped Newton's method, SIAM Rev. 40 (1998) 102–109.
7. J. W. Neuberger, Continuous Newton's method for polynomials, Math. Intel. 21 (1999) 18–23.
8. H.-O. Peitgen and P. H. Richter, The Beauty of Fractals (Springer-Verlag, New York, 1986).
9. J. Jacobsen, O. Lewis and B. Tennis, Approximations of continuous Newton's method: an extension of Cayley's problem, Electron. J. Differ. Eq. 15 (2007) 163–173.
10. A. Quarteroni, R. Sacco and F. Saleri, Numerical Mathematics, Texts Appl. Math., Vol. 37 (Springer-Verlag, New York, 2000).
11. P. Deuflhard and F. Bornemann, Scientific Computing with Ordinary Differential Equations, Texts Appl. Math., Vol. 42 (Springer-Verlag, New York, 2002).
12. E. Hairer, S. P. Nørsett and G. Wanner, Solving Ordinary Differential Equations, Vol. I (Springer-Verlag, New York, 1993).
13. P. Deuflhard, Newton Methods for Nonlinear Problems, Springer Ser. Comput. Math., Vol. 35 (Springer-Verlag, Berlin, Heidelberg, 2004).
14. M. Drexler, I. J. Sobey and C. Bracher, On the Fractal Characteristics of a Stabilised Newton Method, Tech. Report NA-95/26, Computing Laboratory, Oxford University, 1995.
15. J. L. Varona, Graphic and numerical comparison between iterative methods, Math. Intel. 24 (2002) 37–46.
16. J. E. Dennis, Jr. and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice Hall Ser. Comput. Math. (Prentice-Hall, Englewood Cliffs, NJ, 1983).