Subido por Auto SMS

2188136 UNIDAD 1

Anuncio
Numerical Methods Preliminaries
Professor Ph.D. Henry Arguello Fuentes
Universidad Industrial de Santander
Colombia
2020
High Dimensional Signal Processing Group
www.hdspgroup.com
[email protected]
Professor Henry Arguello Fuentes
Numerical methods
2020
1 / 80
Outline
1
Introduction
2
Binary numbers
3
Error Analysis
Professor Henry Arguello Fuentes
Numerical methods
2020
2 / 80
Introduction: numerical methods applications
(a) Model the probable evolution of a
pathology
(b) Model and simulate the growth of a
tumor
(c) Microscopy super-resolution
(d) Thermal management
Professor Henry Arguello Fuentes
Numerical methods
2020
3 / 80
Contents
1
Introduction
2
Binary numbers
Base 2 numbers
Base 2 representation of the integer N
Sequences and Series
Binary Fractions
Binary shifting
Scientific Notation
Machine Numbers
3
Error Analysis
Professor Henry Arguello Fuentes
Numerical methods
2020
4 / 80
Base 2 numbers
Base 10 numbers: Expanded form of the number 1563
1563 = (1 × 103 ) + (5 × 102 ) + (6 × 101 ) + (3 × 100 ).
Professor Henry Arguello Fuentes
Numerical methods
2020
5 / 80
Base 2 numbers
Base 10 numbers: Expanded form of the number 1563
1563 = (1 × 103 ) + (5 × 102 ) + (6 × 101 ) + (3 × 100 ).
Let N denote a positive integer; then the digits a0 , a1 , ..., ak exist so that
N has the base 10 expansion
Base 10 expansion
N = (ak × 10k ) + (ak−1 × 10k−1 ) + · · · + (a1 × 101 ) + (a0 × 100 ),
(1)
Where the digits ak are chosen from 0, 1, ..., 8, 9.
Professor Henry Arguello Fuentes
Numerical methods
2020
5 / 80
Base 2 numbers
Base 2 numbers: Expanded form of the number 1563
1563 =(1 × 210 ) + (1 × 29 ) + (0 × 28 ) + (0 × 27 ) + (0 × 26 ) + (0 × 25 )+
(1 × 24 ) + (1 × 23 ) + (0 × 22 ) + (1 × 21 ) + (1 × 20 ).
So that:
1563 = 1024 + 512 + 16 + 8 + 2 + 1.
Professor Henry Arguello Fuentes
Numerical methods
2020
6 / 80
Base 2 numbers
Let N denote a positive integer; the digits b0 , b1 , ..., bJ exist so that N
has the base 2 expansion
Base 2 expansion
N = (bJ × 2J ) + (bJ−1 × 2J−1 ) + · · · + (b1 × 21 ) + (b0 × 20 ),
(2)
Where each digit bj is either a 0 or 1. Thus N is expressed in binary
notation as
N = bJ bJ−1 · · · b2 b1 b0two .
(3)
Professor Henry Arguello Fuentes
Numerical methods
2020
7 / 80
Contents
1
Introduction
2
Binary numbers
Base 2 numbers
Base 2 representation of the integer N
Sequences and Series
Binary Fractions
Binary shifting
Scientific Notation
Machine Numbers
3
Error Analysis
Professor Henry Arguello Fuentes
Numerical methods
2020
8 / 80
Base 2 representation of the integer N
Process: Generate sequences Qk and Rk of quotients and remainders,
respectively. End the process when Qk = 0, for some integer k = J.
Professor Henry Arguello Fuentes
Numerical methods
2020
9 / 80
Base 2 representation of the integer N
Process: Generate sequences Qk and Rk of quotients and remainders,
respectively. End the process when Qk = 0, for some integer k = J.
Example:
𝒌
1563
𝑸𝒌
𝑹𝒌
0
1563/2=
781
1
1
781/2=
390
1
2
390/2=
195
0
3
195/2=
97
1
4
97/2=
48
1
5
48/2=
24
0
6
24/2=
12
0
7
12/2=
6
0
8
6/2=
3
0
9
3/2=
1
1
10
1/2=
0
1
1
1 0 0 0 0 1 1 0
1
1
b10 b9 b8 b7 b6 b5 b4 b3 b2 b1 b0
Most Significant Bit -MSB
Professor Henry Arguello Fuentes
Numerical methods
Least Significant Bit - LSB
2020
9 / 80
Base 2 representation of the integer N
Exercise 1: Find the base 2 representation of 697
Professor Henry Arguello Fuentes
Numerical methods
2020
10 / 80
Base 2 representation of the integer N
Exercise 1: Find the base 2 representation of 697
Start by dividing the integer N from 2 to calculate Q0 and R0 .
697/2 = 348.5 → Q0 = 348 and R0 = 1
Professor Henry Arguello Fuentes
Numerical methods
2020
10 / 80
Base 2 representation of the integer N
Exercise 1: Find the base 2 representation of 697
Start by dividing the integer N from 2 to calculate Q0 and R0 .
697/2 = 348.5 → Q0 = 348 and R0 = 1
Continue the process until finding Qk = 0, for some integer k = J.
Qk = Qk−1 /2
𝒌
0
𝑸𝒌
𝑹𝒌
697/2= 348
𝟔𝟗𝟕
1
1
2
3
4
5
6
7
8
9
b9 b8 b7 b6 b5 b4 b3 b2 b 1 b0
Professor Henry Arguello Fuentes
Numerical methods
2020
10 / 80
Base 2 representation of the integer N
Solution
𝒌
𝑸𝒌
𝑹𝒌
0
697/2= 348
𝟔𝟗𝟕
1
1
348/2= 174
0
2
174/2=
87
0
3
87/2=
43
1
4
43/2=
21
1
5
21/2=
10
1
6
10/2=
5
0
7
5/2=
2
1
8
2/2=
1
0
9
1/2=
0
1
1 0 1 0 1 1 1 0 0 1
b9 b8 b7 b6 b5 b4 b3 b2 b 1 b0
Then, 69710 = 10101110012
Professor Henry Arguello Fuentes
Numerical methods
2020
11 / 80
Contents
1
Introduction
2
Binary numbers
Base 2 numbers
Base 2 representation of the integer N
Sequences and Series
Binary Fractions
Binary shifting
Scientific Notation
Machine Numbers
3
Error Analysis
Professor Henry Arguello Fuentes
Numerical methods
2020
12 / 80
Sequences and Series
Commonly, when you express a rational number in decimal form, you
require infinitely many digits.
1
= 0.3 , the symbol 3 means that the digit 3 is repeated
3
forever to form an infinite repeating decimal.
For example, in
But, the number
1
is the shorthand notation for the infinite series S
3
S = (3 × 10−1 ) + (3 × 10−2 ) + · · · + (3 × 10−∞ )
S=
∞
X
k=1
Professor Henry Arguello Fuentes
1
3(10)−k = .
3
Numerical methods
2020
13 / 80
Sequences and Series
Definition 1.
The infinite series S
S=
∞
X
crn = c + cr + cr2 + · · · + crn + · · · ,
(4)
n=0
where c 6= 0 and r 6= 0, is called a geometric series with ratio r.
Professor Henry Arguello Fuentes
Numerical methods
2020
14 / 80
Sequences and Series
Definition 1.
The infinite series S
S=
∞
X
crn = c + cr + cr2 + · · · + crn + · · · ,
(4)
n=0
where c 6= 0 and r 6= 0, is called a geometric series with ratio r.
Theorem 1. (Geometric Series)
The geometric series has the following properties:
If |r| < 1, then
∞
X
crn =
n=0
c
.
1−r
(5)
If |r| > 1, then the series diverges.
Professor Henry Arguello Fuentes
Numerical methods
2020
14 / 80
Sequences and Series
Example: The series S is given by
n
1
2
∞ X
∞
1
1
1
1
7
S = (7)
+ (7)
+ · · · + (7)
=
,
7
7
7
7
n=1
Professor Henry Arguello Fuentes
Numerical methods
2020
15 / 80
Sequences and Series
Example: The series S is given by
n
1
2
∞ X
∞
1
1
1
1
7
S = (7)
+ (7)
+ · · · + (7)
=
,
7
7
7
7
n=1
which is equal to − 7 +
n
∞
X
1
7
,
7
n=0
Professor Henry Arguello Fuentes
Numerical methods
2020
15 / 80
Sequences and Series
Example: The series S is given by
n
1
2
∞ X
∞
1
1
1
1
7
S = (7)
+ (7)
+ · · · + (7)
=
,
7
7
7
7
n=1
which is equal to − 7 +
n
∞
X
1
7
,
7
n=0
and acording with (5) S = −7 +
Then,
7
1
1−
7
=
7
= 1.16,
6
7
is the shorthand notation for the infinite series S
6
Professor Henry Arguello Fuentes
Numerical methods
2020
15 / 80
Contents
1
Introduction
2
Binary numbers
Base 2 numbers
Base 2 representation of the integer N
Sequences and Series
Binary Fractions
Binary shifting
Scientific Notation
Machine Numbers
3
Error Analysis
Professor Henry Arguello Fuentes
Numerical methods
2020
16 / 80
Binary Fractions
A binary fraction is a serie of sums with negative powers of 2, which is
used to express a real number R that lies in the range 0 < R < 1.
Professor Henry Arguello Fuentes
Numerical methods
2020
17 / 80
Binary Fractions
A binary fraction is a serie of sums with negative powers of 2, which is
used to express a real number R that lies in the range 0 < R < 1.
Binary fractions
R = (d1 × 2−1 ) + (d2 × 2−2 ) + · · · + (dn × 2−n ) + · · · ,
(6)
where dj ∈ 0, 1 and 0 < R < 1.
Binary fraction
Representation of R
R=
R = 0.d1 d2 · · · dn · · ·two
Professor Henry Arguello Fuentes
Numerical methods
P∞
j=1 dj (2)
−j
2020
17 / 80
Binary Fractions-Decimal to binary
Process: Generate sequences dk and Fk multiplying by two.
Professor Henry Arguello Fuentes
Numerical methods
2020
18 / 80
Binary Fractions-Decimal to binary
Process: Generate sequences dk and Fk multiplying by two.
Example:
d1 d2 d 3 d 4 d 5 d6 d7
0. 1 0 1 1 0 0 1
𝑗
0.7
𝐹𝑗
0.4
2 (0.4)(2) = 0.8 0
0.8
3 (0.8)(2) = 1.6 1
0.6
4 (0.6)(2) = 1.2 1
0.2
5 (0.2)(2) = 0.4 0
0.4
6 (0.4)(2) = 0.8 0
0.8
7 (0.8)(2) = 1.6 1
0.6
8 (0.6)(2) = 1.2 1
0.2
9 (0.2)(2) = 0.4 0
0.4
…
d9
0 …
𝑑𝑗 𝑓𝑟𝑎𝑐
1 (0.7)(2) = 1.4 1
Professor Henry Arguello Fuentes
d8
1
0.7  0.10110 2
Numerical methods
2020
18 / 80
Binary Fractions-Decimal to binary
Exercise 2: Calculate the binary fraction for 0.6.
Start by multiplying 0.6 by 2, to generate sequences dj and Fj
d1 d2 d3 d4 d5 d6 d7
d8
d9
…
𝑗
0.6
𝐹𝑗
𝑑𝑗 𝑓𝑟𝑎𝑐
1 (0.6)(2) = 1.2 1
0.2
2
3
4
5
6
7
8
…
9
Professor Henry Arguello Fuentes
Numerical methods
2020
19 / 80
Binary Fractions-Decimal to binary
Solution
d1 d2 d3 d4 d5 d6 d7
d8
d9
…
𝑗
0.6
𝐹𝑗
𝑑𝑗 𝑓𝑟𝑎𝑐
0.2
2 (0.2)(2) = 0.4 0
0.4
3 (0.4)(2) = 0.8 0
0.8
4 (0.8)(2) = 1.6 1
0.6
5 (0.6)(2) = 1.2 1
0.2
6 (0.2)(2) = 0.4 0
0.4
7 (0.4)(2) = 0.8 0
0.8
8 (0.8)(2) = 1.6 1
0.6
9 (0.6)(2) = 1.2 1
0.2
…
1 (0.6)(2) = 1.2 1
Professor Henry Arguello Fuentes
0.6 = 0. 1001
Numerical methods
2020
20 / 80
Binary Fractions-Binary to decimal
The base 10 rational number R10 associated to a base 2 binary fraction
R2 can be found using geometric series.
Professor Henry Arguello Fuentes
Numerical methods
2020
21 / 80
Binary Fractions-Binary to decimal
The base 10 rational number R10 associated to a base 2 binary fraction
R2 can be found using geometric series.
Example:
0.012 =(0 × 2−1 ) + (1 × 2−2 ) + (0 × 2−3 ) + (1 × 2−4 ) · · ·
the expression above is writted as
Professor Henry Arguello Fuentes
Numerical methods
2020
21 / 80
Binary Fractions-Binary to decimal
The base 10 rational number R10 associated to a base 2 binary fraction
R2 can be found using geometric series.
Example:
0.012 =(0 × 2−1 ) + (1 × 2−2 ) + (0 × 2−3 ) + (1 × 2−4 ) · · ·
the expression above is writted as
∞
∞
X
X
=
(2−2 )k = −1 +
(2−2 )k
k=1
= −1 +
k=0
1
1−
Professor Henry Arguello Fuentes
1
4
= −1 +
4
1
= .
3
3
Numerical methods
2020
21 / 80
Binary Fractions-Binary to decimal
The base 10 rational number R10 associated to a base 2 binary fraction
R2 can be found using geometric series.
Example:
0.012 =(0 × 2−1 ) + (1 × 2−2 ) + (0 × 2−3 ) + (1 × 2−4 ) · · ·
the expression above is writted as
∞
∞
X
X
=
(2−2 )k = −1 +
(2−2 )k
k=1
= −1 +
k=0
1
1−
then,
1
4
= −1 +
4
1
= .
3
3
1
is the 10 rational number associated to 0.012
3
Professor Henry Arguello Fuentes
Numerical methods
2020
21 / 80
Contents
1
Introduction
2
Binary numbers
Base 2 numbers
Base 2 representation of the integer N
Sequences and Series
Binary Fractions
Binary shifting
Scientific Notation
Machine Numbers
3
Error Analysis
Professor Henry Arguello Fuentes
Numerical methods
2020
22 / 80
Binary shifting
Let R be
(7)
R = 0.00000110002 .
Professor Henry Arguello Fuentes
Numerical methods
2020
23 / 80
Binary shifting
Let R be
(7)
R = 0.00000110002 .
Multiplying both sides of (7) by
25 = 32 will shift the binary
point 5 places to the right
32R = 0.110002 .
Professor Henry Arguello Fuentes
Multiplying both sides of (7) by
210 = 1024 will shift the binary
point 10 places to the right
1024R = 11000.110002 .
Numerical methods
2020
23 / 80
Binary shifting
Let R be
(7)
R = 0.00000110002 .
Multiplying both sides of (7) by
25 = 32 will shift the binary
point 5 places to the right
32R = 0.110002 .
Multiplying both sides of (7) by
210 = 1024 will shift the binary
point 10 places to the right
1024R = 11000.110002 .
Taking the difference 1024R − 32R = 11000.110002 − 0.110002 ,
we obtain 992R = 110002 ,
given that 110002 = 2410 we find that,
992R = 24, Therefore R =
Professor Henry Arguello Fuentes
Numerical methods
3
.
124 10
2020
23 / 80
Contents
1
Introduction
2
Binary numbers
Base 2 numbers
Base 2 representation of the integer N
Sequences and Series
Binary Fractions
Binary shifting
Scientific Notation
Machine Numbers
3
Error Analysis
Professor Henry Arguello Fuentes
Numerical methods
2020
24 / 80
Scientific Notation
The scientific notation is a standard way to present a real number. It is
obtained by properly shifting the decimal point.
Professor Henry Arguello Fuentes
Numerical methods
2020
25 / 80
Scientific Notation
The scientific notation is a standard way to present a real number. It is
obtained by properly shifting the decimal point.
Examples
0.0000747 = 7.47 × 10−5
31.4159265 = 3.14159265 × 10
9, 700, 000.000 = 9.7 × 109
The Avogadro’s constant used in chemistry = 6.02252 × 1023 .
The quantity 1K = 1.024 × 103 used in computer science.
Professor Henry Arguello Fuentes
Numerical methods
2020
25 / 80
Contents
1
Introduction
2
Binary numbers
Base 2 numbers
Base 2 representation of the integer N
Sequences and Series
Binary Fractions
Binary shifting
Scientific Notation
Machine Numbers
3
Error Analysis
Professor Henry Arguello Fuentes
Numerical methods
2020
26 / 80
Machine Numbers
Machine Numbers
A mathematical quantity x is stored in a computer as a binary approximation given by
x ≈ ±q × 2n .
(8)
The finite binary number q is the mantissa, where 1/2 ≤ q ≤ 1.
The integer n is the exponent.
Professor Henry Arguello Fuentes
Numerical methods
2020
27 / 80
Floating-point format
A real number is stored in a computer as a set of binary numbers
expressing:
The sign
The exponent
The mantissa
𝑆𝑖𝑔𝑛
𝐸𝑥𝑝𝑜𝑛𝑒𝑛𝑡
𝑀𝑎𝑛𝑡𝑖𝑠𝑠𝑎
The sign is always one bit where, S = 0 if, x > 0 and S = 1, if x < 0.
The amount of bits for the exponent and the mantissa depends on
the precision of the machine.
Professor Henry Arguello Fuentes
Numerical methods
2020
28 / 80
Floating-point format-IEEE 754 standard
Precision Total Sign Exponent Man4ssa Exponent bias Single 32 bits 1 bit 8 bits 23 bits 127 Double 64 bits 1 bit 11 bits 52 bits 1023 Professor Henry Arguello Fuentes
Numerical methods
2020
29 / 80
Floating-point format-IEEE 754 standard
Precision Total Sign Exponent Man4ssa Exponent bias Single 32 bits 1 bit 8 bits 23 bits 127 Double 64 bits 1 bit 11 bits 52 bits 1023 Note: Biasing is done because exponents have to be signed values to
be able to represent both tiny and huge values, but two’s complement.
Professor Henry Arguello Fuentes
Numerical methods
2020
29 / 80
Floating-point format-IEEE 754 standard
Precision Total Sign Exponent Man4ssa Exponent bias Single 32 bits 1 bit 8 bits 23 bits 127 Double 64 bits 1 bit 11 bits 52 bits 1023 Note: Biasing is done because exponents have to be signed values to
be able to represent both tiny and huge values, but two’s complement.
Then, the exponent is biased by adjusting its value.
Professor Henry Arguello Fuentes
Numerical methods
2020
29 / 80
Floating-point format-IEEE 754 standard
Precision Total Sign Exponent Man4ssa Exponent bias Single 32 bits 1 bit 8 bits 23 bits 127 Double 64 bits 1 bit 11 bits 52 bits 1023 Note: Biasing is done because exponents have to be signed values to
be able to represent both tiny and huge values, but two’s complement.
Then, the exponent is biased by adjusting its value.
The exponent bias is calculated as bias = 2exp−1 −1, where exp indicates
the amount of bits for the exponent.
Example:
if exp = 15 bits, then, bias = 215−1 − 1 = 16383
Professor Henry Arguello Fuentes
Numerical methods
2020
29 / 80
Floating-point format-IEEE 754 standard
Possible cases:
Sign (S) Exponent (E) Man0ssa (M) Value 0-­‐1 All 0 < E < All 1 M (-­‐1)S (1.M)(2E-­‐bias) 0 1 E=all 1 E=all 1 M=0 M=0 +∞ -­‐∞ 0-­‐1 0-­‐1 0-­‐1 E=all 1 E=all 0 E=all 0 M≠0 M=0 M≠0 NaN 0 (-­‐1)S (0.M)(21-­‐bias) Professor Henry Arguello Fuentes
Numerical methods
2020
30 / 80
Floating-point format
Example: Determine the floating point format to stored the number
59.187510 in a computer with 32 bits of precision.
Professor Henry Arguello Fuentes
Numerical methods
2020
31 / 80
Floating-point format
Example: Determine the floating point format to stored the number
59.187510 in a computer with 32 bits of precision.
1. Find the binary representation of the number 59.187510
59.187510
LBS
Integer part
Decimal part
5910
0.187510
59 2
1 29 2
1 14 2
2
0 7
1 3 2
1 1 2
MSB
1 0
0.1875×2 = 0.375 0
0.375×2 = 0.75 0
0.75×2 = 1.5
1
0.5×2 = 1.0
1
MSB
LBS
111011.00112
Professor Henry Arguello Fuentes
Numerical methods
2020
31 / 80
Floating-point format
2. Do the proper binary shifting
111011.00112 = 1.1101100112 × 25
Professor Henry Arguello Fuentes
Numerical methods
2020
32 / 80
Floating-point format
2. Do the proper binary shifting
111011.00112 = 1.1101100112 × 25
3. Calculate the bias
bias = 28−1 − 1 = 127
Professor Henry Arguello Fuentes
Numerical methods
2020
32 / 80
Floating-point format
2. Do the proper binary shifting
111011.00112 = 1.1101100112 × 25
3. Calculate the bias
bias = 28−1 − 1 = 127
4. Determine the mantissa
Mantissa = 1101100112
Professor Henry Arguello Fuentes
Numerical methods
2020
32 / 80
Floating-point format
2. Do the proper binary shifting
111011.00112 = 1.1101100112 × 25
3. Calculate the bias
bias = 28−1 − 1 = 127
4. Determine the mantissa
Mantissa = 1101100112
5. Determine the exponent
exp = 5 + bias = 5 + 127 = 13210 = 100001002
Professor Henry Arguello Fuentes
Numerical methods
2020
32 / 80
Floating-point format
2. Do the proper binary shifting
111011.00112 = 1.1101100112 × 25
3. Calculate the bias
bias = 28−1 − 1 = 127
4. Determine the mantissa
Mantissa = 1101100112
5. Determine the exponent
exp = 5 + bias = 5 + 127 = 13210 = 100001002
S E M 0 10000100 11011001100000000000000 Professor Henry Arguello Fuentes
Numerical methods
2020
32 / 80
Floating-point format
Example: Determine the floating point format to stored the number
132.2812510 in a computer with 32 bits of precision.
Professor Henry Arguello Fuentes
Numerical methods
2020
33 / 80
Floating-point format
Example: Determine the floating point format to stored the number
132.2812510 in a computer with 32 bits of precision.
1. Find the binary representation of the number 132.2812510
132.2812510
Decimal part
Integer part
LBS
132 2 13210
0 66 2
0 33 2
1 16 2
0 8 2
0 4 2
0 2 2
0
1 2
MSB
1
0
0.2812510
0.28125×2 = 0.5625 0
0.375×2 = 1.125 1
0.125×2 = 0.25
MSB
0
0.25×2 = 0.5
0
0.5×2 = 1.0
1
LBS
10000100.010012
Professor Henry Arguello Fuentes
Numerical methods
2020
33 / 80
Floating-point format
2. Do the proper binary shifting
10000100.010012 = 1.0000100010012 × 27
Professor Henry Arguello Fuentes
Numerical methods
2020
34 / 80
Floating-point format
2. Do the proper binary shifting
10000100.010012 = 1.0000100010012 × 27
3. Calculate the bias
bias = 28−1 − 1 = 127
Professor Henry Arguello Fuentes
Numerical methods
2020
34 / 80
Floating-point format
2. Do the proper binary shifting
10000100.010012 = 1.0000100010012 × 27
3. Calculate the bias
bias = 28−1 − 1 = 127
4. Determine the mantissa
Mantissa = 0000100010012
Professor Henry Arguello Fuentes
Numerical methods
2020
34 / 80
Floating-point format
2. Do the proper binary shifting
10000100.010012 = 1.0000100010012 × 27
3. Calculate the bias
bias = 28−1 − 1 = 127
4. Determine the mantissa
Mantissa = 0000100010012
5. Determine the exponent
exp = 7 + bias = 7 + 127 = 13410 = 100001102
Professor Henry Arguello Fuentes
Numerical methods
2020
34 / 80
Floating-point format
2. Do the proper binary shifting
10000100.010012 = 1.0000100010012 × 27
3. Calculate the bias
bias = 28−1 − 1 = 127
4. Determine the mantissa
Mantissa = 0000100010012
5. Determine the exponent
exp = 7 + bias = 7 + 127 = 13410 = 100001102
S E M 0 10000110 00001000100100000000000 Professor Henry Arguello Fuentes
Numerical methods
2020
34 / 80
Floating-point format
Example: Determine the floating point format to stored the number
59.187510 in a computer with 64 bits of precision.
Professor Henry Arguello Fuentes
Numerical methods
2020
35 / 80
Floating-point format
Example: Determine the floating point format to stored the number
59.187510 in a computer with 64 bits of precision.
1. Find the binary representation of the number 59.187510
59.187510
LBS
Integer part
Decimal part
5910
0.187510
59 2
1 29 2
1 14 2
2
0 7
1 3 2
1 1 2
MSB
1 0
0.1875×2 = 0.375 0
0.375×2 = 0.75 0
0.75×2 = 1.5
1
0.5×2 = 1.0
1
MSB
LBS
111011.00112
Professor Henry Arguello Fuentes
Numerical methods
2020
35 / 80
Floating-point format
2. Do the proper binary shifting
111011.00112 = 1.1101100112 × 25
Professor Henry Arguello Fuentes
Numerical methods
2020
36 / 80
Floating-point format
2. Do the proper binary shifting
111011.00112 = 1.1101100112 × 25
3. Calculate the bias
bias = 211−1 − 1 = 1023
Professor Henry Arguello Fuentes
Numerical methods
2020
36 / 80
Floating-point format
2. Do the proper binary shifting
111011.00112 = 1.1101100112 × 25
3. Calculate the bias
bias = 211−1 − 1 = 1023
4. Determine the mantissa
Mantissa = 1101100112
Professor Henry Arguello Fuentes
Numerical methods
2020
36 / 80
Floating-point format
2. Do the proper binary shifting
111011.00112 = 1.1101100112 × 25
3. Calculate the bias
bias = 211−1 − 1 = 1023
4. Determine the mantissa
Mantissa = 1101100112
5. Determine the exponent
exp = 5 + bias = 5 + 1023 = 102810 = 100000001002
Professor Henry Arguello Fuentes
Numerical methods
2020
36 / 80
Floating-point format
2. Do the proper binary shifting
111011.00112 = 1.1101100112 × 25
3. Calculate the bias
bias = 211−1 − 1 = 1023
4. Determine the mantissa
Mantissa = 1101100112
5. Determine the exponent
exp = 5 + bias = 5 + 1023 = 102810 = 100000001002
S
E
M
0
10000000100
1101100110000000000000000000000000000000000000000000
Professor Henry Arguello Fuentes
Numerical methods
2020
36 / 80
Floating-point format
The real value associated with a given 32 bit binary is calculated as
!
23
X
S
−i
value = (−1) 1 +
d(23−i) 2
× 2(E−127)
i=1
Where,
S = The sign
E = Exponent
127 = Bias
dj = Bits of the mantissa
Professor Henry Arguello Fuentes
Numerical methods
2020
37 / 80
Floating-point format
Exercise: Find the real value for the binary data:
S E M 0 01010010 01101000000100100000000 Professor Henry Arguello Fuentes
Numerical methods
2020
38 / 80
Floating-point format
Exercise: Find the real value for the binary data:
S E M 0 01010010 01101000000100100000000 value = (−1)S
1+
23
X
!
d(23−i) 2−i
× 2(E−127)
i=1
Professor Henry Arguello Fuentes
Numerical methods
2020
38 / 80
Floating-point format
Exercise: Find the real value for the binary data:
E S M 0 01010010 01101000000100100000000 value = (−1)S
1+
23
X
!
d(23−i) 2−i
× 2(E−127)
i=1
In this example:
S=0
P23
1 + i=1 d(23−i) 2−i = 1 + 2−2 + 2−3 + 2−5 + 2−12 + 2−15 = 1.4065246582
1
4
6
2(E−127) = 2((2 +2 +2 )−127) = 282−127 = 2−45
Professor Henry Arguello Fuentes
Numerical methods
2020
38 / 80
Floating-point format
Exercise: Find the real value for the binary data:
E S M 0 01010010 01101000000100100000000 value = (−1)S
1+
23
X
!
d(23−i) 2−i
× 2(E−127)
i=1
In this example:
S=0
P23
1 + i=1 d(23−i) 2−i = 1 + 2−2 + 2−3 + 2−5 + 2−12 + 2−15 = 1.4065246582
1
4
6
2(E−127) = 2((2 +2 +2 )−127) = 282−127 = 2−45
Thus
value = 1.4065246582 × 2−45
Professor Henry Arguello Fuentes
Numerical methods
2020
38 / 80
Floating-point format
Example: Find the real value for the binary data:
E S M 1 10000100 01000000000000000000000 Ÿ Ÿ 31 30 Ÿ Ÿ 23 22 Ÿ 0 In this example:
S=1
P23
1 + i=1 d(23−i) 2−i = 1 + 2−2 = 1.25
2(E−127) = 2(132−127) = 25
Professor Henry Arguello Fuentes
Numerical methods
2020
39 / 80
Floating-point format
Example: Find the real value for the binary data:
E S M 1 10000100 01000000000000000000000 Ÿ Ÿ 31 30 Ÿ Ÿ 23 22 Ÿ 0 In this example:
S=1
P23
1 + i=1 d(23−i) 2−i = 1 + 2−2 = 1.25
2(E−127) = 2(132−127) = 25
Thus
value = −1.25 × 25 = −40.
Professor Henry Arguello Fuentes
Numerical methods
2020
39 / 80
Contents
1
Introduction
2
Binary numbers
3
Error Analysis
Absolute and relative error
Truncation Error
Round-off Error
Loss of Significance
Order of Approximation
Propagation of Error
Professor Henry Arguello Fuentes
Numerical methods
2020
40 / 80
Absolute and relative error
Definition 2.
Suppose that b
p is an approximation to p. The absolute error is
Ep = |p − b
p|, and the relative error is Rp = |p − b
p|/|p|, provided that
p 6= 0.
Professor Henry Arguello Fuentes
Numerical methods
2020
41 / 80
Absolute and relative error
Definition 2.
Suppose that b
p is an approximation to p. The absolute error is
Ep = |p − b
p|, and the relative error is Rp = |p − b
p|/|p|, provided that
p 6= 0.
The absolute error is the difference between the true value and
the approximate value.
The relative error expresses the error as a percentage of the true
value.
Professor Henry Arguello Fuentes
Numerical methods
2020
41 / 80
Absolute and relative error
Example: Find the absolute and relative error in the following three
cases:
Professor Henry Arguello Fuentes
Numerical methods
2020
42 / 80
Absolute and relative error
Example: Find the absolute and relative error in the following three
cases:
Real |p|
Approximation p̂
Absolute Error Ep
Relative Error Rp
Professor Henry Arguello Fuentes
x = 3.141592
b
x = 3.14
Ex = |x − b
x|
= 0.001592
Rx = Ex /|x|
= 5.067×10−4
Numerical methods
y = 1, 000, 000
by = 999, 996
Ey = |y − by|
=4
Ry = Ey /|y|
= 0.000004
z = 0.000012
bz = 0.000009
Ez = |z − bz|
= 0.000003
Rz = Ez /|z|
= 0.25
2020
42 / 80
Absolute and relative error
Example: Find the absolute and relative error in the following three
cases:
Real |p|
Approximation p̂
Absolute Error Ep
Relative Error Rp
x = 3.141592
b
x = 3.14
Ex = |x − b
x|
= 0.001592
Rx = Ex /|x|
= 5.067×10−4
y = 1, 000, 000
by = 999, 996
Ey = |y − by|
=4
Ry = Ey /|y|
= 0.000004
z = 0.000012
bz = 0.000009
Ez = |z − bz|
= 0.000003
Rz = Ez /|z|
= 0.25
Observe that as |p| moves away from 1 (greater than or less than) the
relative error Rp is a better indicator than Ep of the accuracy of the approximation.
Professor Henry Arguello Fuentes
Numerical methods
2020
42 / 80
Absolute and relative error
Definition 3.
The number b
p is said to approximate p to d significant digits if d is the
largest nonnegative integer for which
b
|p − p|
101−d
<
.
|p|
2
Professor Henry Arguello Fuentes
Numerical methods
2020
43 / 80
Absolute and relative error
Example:
Let ŵ be the approximation for w = 2.1645, then
|2.1645 − 2.16|
= 2.07900 × 10−3
|2.1645|
Professor Henry Arguello Fuentes
Numerical methods
2020
44 / 80
Absolute and relative error
Example:
Let ŵ be the approximation for w = 2.1645, then
|2.1645 − 2.16|
= 2.07900 × 10−3
|2.1645|
1−0
if d = 0: 2.07900 × 10− 3 < 102
= 5 Xsatisfies. However, as we need
to find the largest integer d, we need to continue..
Professor Henry Arguello Fuentes
Numerical methods
2020
44 / 80
Absolute and relative error
Example:
Let ŵ be the approximation for w = 2.1645, then
|2.1645 − 2.16|
= 2.07900 × 10−3
|2.1645|
1−0
if d = 0: 2.07900 × 10− 3 < 102
= 5 Xsatisfies. However, as we need
to find the largest integer d, we need to continue..
if d = 1:
1−1
2.07900 × 10− 3 < 102
Professor Henry Arguello Fuentes
= 0.5 Xsatisfies
Numerical methods
2020
44 / 80
Absolute and relative error
Example:
Let ŵ be the approximation for w = 2.1645, then
|2.1645 − 2.16|
= 2.07900 × 10−3
|2.1645|
1−0
if d = 0: 2.07900 × 10− 3 < 102
= 5 Xsatisfies. However, as we need
to find the largest integer d, we need to continue..
if d = 1:
if d = 2:
1−1
2.07900 × 10− 3 < 102
2.07900 × 10− 3 <
Professor Henry Arguello Fuentes
101−2
2
= 0.5 Xsatisfies
= 0.05 Xsatisfies
Numerical methods
2020
44 / 80
Absolute and relative error
Example:
Let ŵ be the approximation for w = 2.1645, then
|2.1645 − 2.16|
= 2.07900 × 10−3
|2.1645|
1−0
if d = 0: 2.07900 × 10− 3 < 102
= 5 Xsatisfies. However, as we need
to find the largest integer d, we need to continue..
if d = 1:
if d = 2:
if d = 3:
1−1
2.07900 × 10− 3 < 102
= 0.5 Xsatisfies
101−2
2.07900 × 10− 3 < 2 = 0.05 Xsatisfies
1−3
2.07900 × 10− 3 < 102 = 0.005 Xsatisfies
Professor Henry Arguello Fuentes
Numerical methods
2020
44 / 80
Absolute and relative error
Example:
Let ŵ be the approximation for w = 2.1645, then
|2.1645 − 2.16|
= 2.07900 × 10−3
|2.1645|
1−0
if d = 0: 2.07900 × 10− 3 < 102
= 5 Xsatisfies. However, as we need
to find the largest integer d, we need to continue..
if d = 1:
1−1
2.07900 × 10− 3 < 102
= 0.5 Xsatisfies
101−2
if d = 2: 2.07900 × 10− 3 < 2 = 0.05 Xsatisfies
1−3
if d = 3: 2.07900 × 10− 3 < 102 = 0.005 Xsatisfies
1−4
if d = 4: 2.07900 × 10− 3 < 102 = 0.0005 X does not satisfy
Then, ŵ approximate w to 3 significant digits.
Professor Henry Arguello Fuentes
Numerical methods
2020
44 / 80
Absolute and relative error
Other examples:
If x = 3.141592 and b
x = 3.14, then |x − b
x|/|x| = 0.000507 < 10−2 /2.
Therefore, b
x approximates x to three significant digits.
If y = 1, 000, 000 and by = 999, 996, then
|y − by|/|y| = 0.000004 < 10−5 /2. Therefore, by approximates y to six
significant digits.
If z = 0.000012 and bz = 0.000009, then |z − bz|/|z| = 0.25 < 10−0 /2.
Therefore, bz approximates z to one significant digits.
Professor Henry Arguello Fuentes
Numerical methods
2020
45 / 80
Contents
1
Introduction
2
Binary numbers
3
Error Analysis
Absolute and relative error
Truncation Error
Round-off Error
Loss of Significance
Order of Approximation
Propagation of Error
Professor Henry Arguello Fuentes
Numerical methods
2020
46 / 80
Truncation Error
Truncation error refers to errors introduced when a more complicated
mathematical expression is "replaced" with a more elementary formula.
Professor Henry Arguello Fuentes
Numerical methods
2020
47 / 80
Truncation Error
Truncation error refers to errors introduced when a more complicated
mathematical expression is "replaced" with a more elementary formula.
For example, the infinite Taylor series
2
ex = 1 + x 2 +
x4 x6 x8
x2n
+
+
+ ··· +
+ ···
2!
3!
4!
n!
x6
x8
x4
+
+ .
might be replaced with just the first five terms 1 + x2 +
2!
3!
4!
Then a truncation error appears.
Professor Henry Arguello Fuentes
Numerical methods
2020
47 / 80
Truncation Error
R 1/2 2
Example: Given p = 0 ex dx = 0.544987104184. Determine the accu2
racy of the approximation obtained by replacing the integrand f (x) = ex
x4 x6 x8
with the truncated Taylor series P8 (x) = 1 + x2 +
+
+ .
2!
3!
4!
R 1/2
Determine 0 P8 (x)dx:
Professor Henry Arguello Fuentes
Numerical methods
2020
48 / 80
Truncation Error
R 1/2 2
Example: Given p = 0 ex dx = 0.544987104184. Determine the accu2
racy of the approximation obtained by replacing the integrand f (x) = ex
x4 x6 x8
with the truncated Taylor series P8 (x) = 1 + x2 +
+
+ .
2!
3!
4!
R 1/2
Determine 0 P8 (x)dx:
Z 1/2 0
1 + x2 +
x4
x6
x8
+
+
2!
3!
4!
x=1/2
x3
x5
x7
x9
x+
+
+
+
3
5(2!) 7(3!) 9(4!) x=0
1
1
1
1
1
+
+
+
= +
2 24 320 5376 110592
2109491
=
= 0.544986720817 = b
p
3870720
dx =
Since
|p − b
p|
101−6
= 7.03442 × 10−7 <
= 5 × 106
|p|
2
then, the approximation b
p agrees with the true value to 6 significant digits.
Professor Henry Arguello Fuentes
Numerical methods
2020
48 / 80
Contents
1
Introduction
2
Binary numbers
3
Error Analysis
Absolute and relative error
Truncation Error
Round-off Error
Loss of Significance
Order of Approximation
Propagation of Error
Professor Henry Arguello Fuentes
Numerical methods
2020
49 / 80
Round-off Error
The accuracy of the representation of a real number stored in a
computer is determined by the precision of the mantissa.
Professor Henry Arguello Fuentes
Numerical methods
2020
50 / 80
Round-off Error
The accuracy of the representation of a real number stored in a
computer is determined by the precision of the mantissa.
The error occurred due to the mantissa precision is the round-off
error.
Professor Henry Arguello Fuentes
Numerical methods
2020
50 / 80
Round-off Error
The accuracy of the representation of a real number stored in a
computer is determined by the precision of the mantissa.
The error occurred due to the mantissa precision is the round-off
error.
The actual number that is stored in the computer may be
chopping or rounding of the last digit.
Professor Henry Arguello Fuentes
Numerical methods
2020
50 / 80
Round-off Error
The accuracy of the representation of a real number stored in a
computer is determined by the precision of the mantissa.
The error occurred due to the mantissa precision is the round-off
error.
The actual number that is stored in the computer may be
chopping or rounding of the last digit.
The computer hardware works with a limited number of digits in
machine numbers, errors are introduced and propagated in
successive computations.
Professor Henry Arguello Fuentes
Numerical methods
2020
50 / 80
Chopping Off versus Rounding Off
Example:
Consider p expressed in normalized decimal form:
p = ±0.d1 d2 d3 · · · dk dk+1 · · · × 10n ,
where 1 ≤ d1 ≤ 9 and 0 ≤ dj ≤ 9 for j > 1.
Professor Henry Arguello Fuentes
Numerical methods
2020
51 / 80
Chopping Off versus Rounding Off
Example:
Consider p expressed in normalized decimal form:
p = ±0.d1 d2 d3 · · · dk dk+1 · · · × 10n ,
where 1 ≤ d1 ≤ 9 and 0 ≤ dj ≤ 9 for j > 1.
If k is the maximum number of decimal digits; then the real number p is
represented by flchop (p), which is given by
flchop (p) = ±0.d1 d2 d3 · · · dk × 10n ,
(9)
Where 1 ≤ d1 ≤ 9 and 0 ≤ dj ≤ 9 for 1 < j ≤ k. The number flchop (p) is
called the chopped floating-point representation of p.
Professor Henry Arguello Fuentes
Numerical methods
2020
51 / 80
Chopping Off versus Rounding Off
On the other hand, the rounded floating-point representation
flround (p) is given by
flround (p) = ±0.d1 d2 d3 · · · rk × 10n ,
(10)
where 1 ≤ d1 ≤ 9 and 0 ≤ dj ≤ 9 for 1 < j < k and the last digit, rk , is
obtained by rounding the number dk dk+1 dk+2 · · · to the nearest integer.
Professor Henry Arguello Fuentes
Numerical methods
2020
52 / 80
Chopping Off versus Rounding Off
Example:
22
= 3.142857142857142857... has the following
The real number p =
7
six-digit representations:
flchop (p) = 0.314285 × 101 ,
flround (p) = 0.314286 × 101 .
For common purposes the chopping and rounding would be written as
3.14285 and 3.14286, respectively.
Professor Henry Arguello Fuentes
Numerical methods
2020
53 / 80
Contents
1
Introduction
2
Binary numbers
3
Error Analysis
Absolute and relative error
Truncation Error
Round-off Error
Loss of Significance
Order of Approximation
Propagation of Error
Professor Henry Arguello Fuentes
Numerical methods
2020
54 / 80
Loss of Significance
Consider p = 3.14155926536 ans q = 3.1415957341, which are
nearly equal and both carry 11 decimal digits of precision.
Their difference is formed: p − q = −0, 0000030805. Since the first
six digits of p and q are the same, their difference p − q contains
only five decimal digits of precision.
This phenomenon is called loss of significance.
Professor Henry Arguello Fuentes
Numerical methods
2020
55 / 80
Loss of Significance
Example:
Compare the results of calculating f (500) and g(500) using six digits and round√
√
x
ing. Where, f (x) = x( x + 1 − x) and g(x) = √
√ .
x+1+ x
Professor Henry Arguello Fuentes
Numerical methods
2020
56 / 80
Loss of Significance
Example:
Compare the results of calculating f (500) and g(500) using six digits and round√
√
x
ing. Where, f (x) = x( x + 1 − x) and g(x) = √
√ .
x+1+ x
For the first function,
f (500) =500
√
501 −
√
500
500(22.3830 − 22.3607) = 500(0.0223) = 11.1500
Professor Henry Arguello Fuentes
Numerical methods
2020
56 / 80
Loss of Significance
Example:
Compare the results of calculating f (500) and g(500) using six digits and round√
√
x
ing. Where, f (x) = x( x + 1 − x) and g(x) = √
√ .
x+1+ x
For the first function,
f (500) =500
√
501 −
√
500
500(22.3830 − 22.3607) = 500(0.0223) = 11.1500
For g(x)
500
√
501 + 500
500
500
=
= 11.1748.
22.3830 + 22.3607
44.7437
g(500) = √
Professor Henry Arguello Fuentes
Numerical methods
2020
56 / 80
Loss of Significance
Example:
Compare the results of calculating f (500) and g(500) using six digits and round√
√
x
ing. Where, f (x) = x( x + 1 − x) and g(x) = √
√ .
x+1+ x
For the first function,
f (500) =500
√
501 −
√
500
500(22.3830 − 22.3607) = 500(0.0223) = 11.1500
For g(x)
500
√
501 + 500
500
500
=
= 11.1748.
22.3830 + 22.3607
44.7437
g(500) = √
The second function, g(x), is algebraically equivalent to f (x), but the answer,
g(500) = 11.1748, involves less error and it is the same as that obtained by
rounding the true 11.174755300747198... to six digits.
Professor Henry Arguello Fuentes
Numerical methods
2020
56 / 80
Loss of Significance
Example: Compare the results of calculating f (0.01) and P(0.01) using six
digits and rounding, where
1 x
x2
ex − 1 − x
P(x)
=
+
+
f (x) =
and
2 6 24
x2
The function P(x) is the Taylor polynomial of degree n = 2 for f (x) expanded
about x = 0.
Professor Henry Arguello Fuentes
Numerical methods
2020
57 / 80
Loss of Significance
Example: Compare the results of calculating f (0.01) and P(0.01) using six
digits and rounding, where
1 x
x2
ex − 1 − x
P(x)
=
+
+
f (x) =
and
2 6 24
x2
The function P(x) is the Taylor polynomial of degree n = 2 for f (x) expanded
about x = 0.
For the first function
f (0.01) =
Professor Henry Arguello Fuentes
e0.01 − 1 − 0.01
1.010050 − 1 − 0.01
=
= 0.5.
2
(0.01)
0.0001
Numerical methods
2020
57 / 80
Loss of Significance
Example: Compare the results of calculating f (0.01) and P(0.01) using six
digits and rounding, where
1 x
x2
ex − 1 − x
P(x)
=
+
+
f (x) =
and
2 6 24
x2
The function P(x) is the Taylor polynomial of degree n = 2 for f (x) expanded
about x = 0.
For the first function
f (0.01) =
e0.01 − 1 − 0.01
1.010050 − 1 − 0.01
=
= 0.5.
2
(0.01)
0.0001
For the second function
P(0.01) =
1 0.01 0.0001
+
+
= 0.5 + 0.001667 + 0.000004 = 0.501671.
2
6
24
Professor Henry Arguello Fuentes
Numerical methods
2020
57 / 80
Loss of Significance
Example: Compare the results of calculating f (0.01) and P(0.01) using six
digits and rounding, where
1 x
x2
ex − 1 − x
P(x)
=
+
+
f (x) =
and
2 6 24
x2
The function P(x) is the Taylor polynomial of degree n = 2 for f (x) expanded
about x = 0.
For the first function
f (0.01) =
e0.01 − 1 − 0.01
1.010050 − 1 − 0.01
=
= 0.5.
2
(0.01)
0.0001
For the second function
P(0.01) =
1 0.01 0.0001
+
+
= 0.5 + 0.001667 + 0.000004 = 0.501671.
2
6
24
The answer P(0.01) = 0.501671 contains less error and it is the same as that
obtained rounding the true answer 0.5016708416805... to six digits.
Professor Henry Arguello Fuentes
Numerical methods
2020
57 / 80
Contents
1
Introduction
2
Binary numbers
3
Error Analysis
Absolute and relative error
Truncation Error
Round-off Error
Loss of Significance
Order of Approximation
Propagation of Error
Professor Henry Arguello Fuentes
Numerical methods
2020
58 / 80
O(hn ) Order of Approximation
For functions
Definition 4.
The function f (h) is said to be big Oh of g(h), denoted f (h) = O(g(h)),
if there exist constants C and c such that:
|f (h)| ≤ C|g(h)|
Professor Henry Arguello Fuentes
whenever h ≥ c.
Numerical methods
(11)
2020
59 / 80
O(hn ) Order of Approximation
For functions
Definition 4.
The function f (h) is said to be big Oh of g(h), denoted f (h) = O(g(h)),
if there exist constants C and c such that:
|f (h)| ≤ C|g(h)|
whenever h ≥ c.
(11)
Example: Consider f (x) = x2 + 1 and g(x) = x3 .
Professor Henry Arguello Fuentes
Numerical methods
2020
59 / 80
O(hn ) Order of Approximation
For functions
Definition 4.
The function f (h) is said to be big Oh of g(h), denoted f (h) = O(g(h)),
if there exist constants C and c such that:
|f (h)| ≤ C|g(h)|
whenever h ≥ c.
(11)
Example: Consider f (x) = x2 + 1 and g(x) = x3 .
Since x2 ≤ x3 and 1 ≤ x3 for x ≥ 1
Professor Henry Arguello Fuentes
Numerical methods
2020
59 / 80
O(hn ) Order of Approximation
For functions
Definition 4.
The function f (h) is said to be big Oh of g(h), denoted f (h) = O(g(h)),
if there exist constants C and c such that:
|f (h)| ≤ C|g(h)|
whenever h ≥ c.
(11)
Example: Consider f (x) = x2 + 1 and g(x) = x3 .
Since x2 ≤ x3 and 1 ≤ x3 for x ≥ 1
it follows that x2 + 1 ≤ 2x3 for x ≥ 1.
Professor Henry Arguello Fuentes
Numerical methods
2020
59 / 80
O(hn ) Order of Approximation
For functions
Definition 4.
The function f (h) is said to be big Oh of g(h), denoted f (h) = O(g(h)),
if there exist constants C and c such that:
|f (h)| ≤ C|g(h)|
whenever h ≥ c.
(11)
Example: Consider f (x) = x2 + 1 and g(x) = x3 .
Since x2 ≤ x3 and 1 ≤ x3 for x ≥ 1
it follows that x2 + 1 ≤ 2x3 for x ≥ 1.
Therefore, f (x) = O(g(x)), whenever h ≥ 1.
The big Oh notation provides an useful way of describing the rate of
growth of a function in terms of the well-known elementary function (xn ,
x1/n , ax , loga (x), etc.).
Professor Henry Arguello Fuentes
Numerical methods
2020
59 / 80
O(hn ) Order of Approximation
For sequences
Definition 5.
Let xn = 1∞ and yn = 1∞ be two sequences. The sequence xn is said
to be of order big Oh of yn , denoted xn = O(yn ), if there exist constants
C and N such that
|xn | ≤ C|yn | whenever n ≥ N.
Professor Henry Arguello Fuentes
Numerical methods
(12)
2020
60 / 80
O(hn ) Order of Approximation
For sequences
Definition 5.
Let xn = 1∞ and yn = 1∞ be two sequences. The sequence xn is said
to be of order big Oh of yn , denoted xn = O(yn ), if there exist constants
C and N such that
|xn | ≤ C|yn | whenever n ≥ N.
(12)
Example:
n2 − 1
=O
n3
n2 − 1
n2
1
1
, since
≤
= whenever n ≥ 1.
3
3
n
n
n
n
Professor Henry Arguello Fuentes
Numerical methods
2020
60 / 80
O(hn ) Order of Approximation
Definition 6.
Assume that f (h) is approximated by the function p(h) and there exist a
real constant M > 0 and a positive integer n so that
|f (h) − p(h)|
≤ M for sufficiently small h.
hn
(13)
We say that p(h) approximates f (h) with order of approximation O(hn )
and write
f (h) = p(h) + O(hn )
(14)
When relation (13) is rewritten in the form |f (h) − p(h)| ≤ M|hn |, we see
that the notation O(hn ) stands in place of the error bound M|hn |.
Professor Henry Arguello Fuentes
Numerical methods
2020
61 / 80
O(hn ) Order of Approximation
Theorem 2. Order of approximation for basic operations
Assume that f (h) = p(h) + O(hn ), g(h) = q(h) + O(hm ), and
r = min(m, n). Then
f (h) + g(h) = p(h) + q(h) + O(hr ),
(15)
f (h)g(h) = p(h)q(h) + O(hr ),
(16)
and
p(h)
f (h)
=
+ O(hr ) provided that g(h) 6= 0 and q(h) 6= 0.
g(h)
q(h)
Professor Henry Arguello Fuentes
Numerical methods
2020
(17)
62 / 80
O(hn ) Order of Approximation
Theorem 3. (Taylor’s Theorem).
Assume f ∈ Cn+1 [a, b]. If both x0 and x = x0 + h lie in [a, b], then
f (x0 + h) =
n
X
f (k)(x0 )
k=0
k!
hk + O(hn+1 ).
(18)
Additional properties:
(i) O(hp ) + O(hp ) = O(hp ),
(ii) O(hp ) + O(hq ) = O(hr ), where r = min(p, q), and
(iii) O(hp )O(hq ) = O(hs ), where s = p + q.
Professor Henry Arguello Fuentes
Numerical methods
2020
63 / 80
O(hn ) Order of Approximation
Example:
Consider the Taylor polynomial expansions
eh = 1+h+
h2 h3
+ +O(h4 )
2! 3!
and
cos(h) = 1 −
h2 h4
+
+ O(h6 ).
2!
4!
Determine the order of approximation for their sum and product.
Professor Henry Arguello Fuentes
Numerical methods
2020
64 / 80
O(hn ) Order of Approximation
Example:
Consider the Taylor polynomial expansions
eh = 1+h+
h2 h3
+ +O(h4 )
2! 3!
and
cos(h) = 1 −
h2 h4
+
+ O(h6 ).
2!
4!
Determine the order of approximation for their sum and product.
For the sum we have
h2 h4
h2 h3
+
+ O(h4 ) + 1 −
+
+ O(h6 )
2!
3!
2!
4!
h3
h4
=2+h+
+ O(h4 ) +
+ O(h6 )
3!
4!
eh + cos(h) =1 + h +
Professor Henry Arguello Fuentes
Numerical methods
2020
64 / 80
O(hn ) Order of Approximation
Since O(h4 ) +
h4
= O(h4 ) and O(h4 ) + O(h6 ) = O(h4 ), this reduces to
4!
eh + cos(h) = 2 + h +
h3
+ O(h4 ),
3!
and the order of approximation is O(h4 ).
Professor Henry Arguello Fuentes
Numerical methods
2020
65 / 80
O(hn ) Order of Approximation
The product is treated similarly:
h3
h2
h4
h2
+
+ O(h4 )
1−
+
+ O(h6 )
eh cos(h) = 1 + h +
2!
3!
2!
4!
2
3
2
4
h
h
h
h
= 1+h+
+
1−
+
+
2!
3!
2!
4!
h2
h3
h2
h4
6
1+h+
+
O(h ) + 1 −
+
O(h4 ) + O(h4 )O(h6 )
2!
3!
2!
4!
=1 + h −
h3
5h4
h5
h6
h7
−
−
+
+
+ O(h6 ) + O(h4 ) + O(h4 )O(h6 ).
3
24
24 48 144
Professor Henry Arguello Fuentes
Numerical methods
2020
66 / 80
O(hn ) Order of Approximation
Since O(h4 )O(h6 ) = O(h10 ) and
−5h4
h5
h6
h7
−
+
+
+ O(h6 ) + O(h4 ) + O(h10 )
24
24 48 144
Since O(h6 ) + O(h4 ) + O(h10 ) = O(h4 ), the preceding equation is
simplified to yield
eh cos(h) = 1 + h +
h3
+ O(h4 ),
3
and the order of approximation is O(h4 ).
Professor Henry Arguello Fuentes
Numerical methods
2020
67 / 80
Order of Convergence of a Sequence
Convergence of a sequence
Definition 7.
Suppose that limn−→∞ xn = x and {rn }∞
n=1 is a sequence with
limn−→∞ rn = 0. We say that {xn }∞
converges
to x with the order
n=1
of convergence O(rn ), if there exists a constant K ≥ 0 such that
|xn − x|
≤ K for n sufficiently large.
|rn |
(19)
This is indicated by writing xn = x + O(rn ), or xn −→ x with order
of convergence O(rn )
Professor Henry Arguello Fuentes
Numerical methods
2020
68 / 80
Order of Convergence of a Sequence
Definition 7.
Example:
Let xn = cos(n)/n2 and rn = 1/n2 then,
limn−→∞ xn = 0
with a rate of convergence O(1/n2 ). This follows immediately from the
relation
|cos(n)/n2 |
= |cos(n) ≤ 1| for all n.
|1/n2 |
Professor Henry Arguello Fuentes
Numerical methods
2020
69 / 80
Contents
1
Introduction
2
Binary numbers
3
Error Analysis
Absolute and relative error
Truncation Error
Round-off Error
Loss of Significance
Order of Approximation
Propagation of Error
Professor Henry Arguello Fuentes
Numerical methods
2020
70 / 80
Propagation of Error
Addition consider two numbers p and q (the true values) with the
approximate values b
p and b
q, which contains errors p and q ,
respectively. Starting with p = b
p + p and q = b
q + q , the sum is
(20)
p + q = (b
p + p ) + (b
q + q ) = (b
p+b
q) + (p + q ).
Hence, for addition, the error in the sum is the sum of the errors in
the addends.
s = p + q .
Professor Henry Arguello Fuentes
Numerical methods
2020
71 / 80
Propagation of Error
The propagation of error in multiplication is more complicated. The
product is
(21)
pq = (b
p + p )(b
q + q ) = b
pb
q+b
pp + b
qp + p q .
Professor Henry Arguello Fuentes
Numerical methods
2020
72 / 80
Propagation of Error
The propagation of error in multiplication is more complicated. The
product is
(21)
pq = (b
p + p )(b
q + q ) = b
pb
q+b
pp + b
qp + p q .
Hence, if b
p and b
q are larger than 1 in absolute value, the terms b
pq and
b
qp show that there is a possibility of magnification of the original errors
p and q . Insights are gained if we look at the relative error. Rearrange
the terms in (21) to get
pq − b
pb
q=b
pq + b
qp + p q .
Professor Henry Arguello Fuentes
Numerical methods
(22)
2020
72 / 80
Propagation of Error
The propagation of error in multiplication is more complicated. The
product is
(21)
pq = (b
p + p )(b
q + q ) = b
pb
q+b
pp + b
qp + p q .
Hence, if b
p and b
q are larger than 1 in absolute value, the terms b
pq and
b
qp show that there is a possibility of magnification of the original errors
p and q . Insights are gained if we look at the relative error. Rearrange
the terms in (21) to get
pq − b
pb
q=b
pq + b
qp + p q .
(22)
Suppose that b
p 6= 0 and b
q 6= 0; then we can divide (22) by pq to obtain
the relative error in the product pq:
Rpq =
b
b
pq + b
qp + p q
pq b
qp p q
pq − b
pb
q
=
=
+
+
.
pq
pq
pq
pq
pq
Professor Henry Arguello Fuentes
Numerical methods
(23)
2020
72 / 80
Propagation of Error
Furthermore, suppose that b
p and b
q are good approximations for b
p and
b
q; then b
p/p ≈ 1, b
q/q ≈ 1, and Rp Rq = (p /p)(q /q) ≈ 0 (Rp and Rq are
the relative errors in the approximations b
p and b
q). Then making these
substitutions yields the simplified relationship
Rpq =
Professor Henry Arguello Fuentes
pq − b
pb
q
≈ q /q + p /p + 0 = Rq + Rp .
pq
Numerical methods
(24)
2020
73 / 80
Propagation of Error
Furthermore, suppose that b
p and b
q are good approximations for b
p and
b
q; then b
p/p ≈ 1, b
q/q ≈ 1, and Rp Rq = (p /p)(q /q) ≈ 0 (Rp and Rq are
the relative errors in the approximations b
p and b
q). Then making these
substitutions yields the simplified relationship
Rpq =
pq − b
pb
q
≈ q /q + p /p + 0 = Rq + Rp .
pq
(24)
This shows that the relative error in the product pq is approximately the
b and q
b.
sum of the relative errors in the approximations p
A quality that is desirable for any numerical process is that a small error
in the initial conditions will produce small changes in the final result.
An algorithm with this feature is called stable; otherwise, it is called
unstable.
Professor Henry Arguello Fuentes
Numerical methods
2020
73 / 80
Propagation of Error
Definition 8.
Suppose that represents an initial error and (n) represents the growth
of the error after n steps. If |(n)| ≈ n, the growth of error is said to be
linear. If |(n)| ≈ K n , the growth of error is called exponential. If
K > 1, the exponential error growns without bound as n −→ ∞, and if
0 < K < 1, the exponential error diminishes to zero as n −→ ∞.
Professor Henry Arguello Fuentes
Numerical methods
2020
74 / 80
Propagation of error
Example: Show that the following three schemes can be used with finiteprecision arithmetic to recursively generate the terms in the sequence {1/3n }∞
n=0 .
r0 = 1
and
rn =
1
rn−1
3
for n = 1, 2, · · · ,
(25)
p0 = 1, p1 =
1
,
3
and
pn =
4
1
pn−1 − pn−2
3
3
for n = 1, 2, · · · ,
(26)
q0 = 1, q1 =
1
,
3
and
qn =
10
qn−1 − qn−2
3
for n = 1, 2, · · · ,
(27)
Professor Henry Arguello Fuentes
Numerical methods
2020
75 / 80
Propagation of error
Formula (25) is obvious. In (26) the difference equation has the general solution pn = A(1/3n ) + B. This can be verified by direct substitution:
4
1
4
A
A
1
pn−1 − pn−2 =
+B −
+B
3
3
3 3n−1
3 3n−2
1
3
4 1
4
−
B = A n + B = pn
− n A−
=
n
3
3
3 3
3
Setting A = 1 and B = 0 will generate the sequence desired.
Professor Henry Arguello Fuentes
Numerical methods
2020
76 / 80
Propagation of error
Formula (25) is obvious. In (26) the difference equation has the general solution pn = A(1/3n ) + B. This can be verified by direct substitution:
4
1
4
A
A
1
pn−1 − pn−2 =
+B −
+B
3
3
3 3n−1
3 3n−2
1
3
4 1
4
−
B = A n + B = pn
− n A−
=
n
3
3
3 3
3
Setting A = 1 and B = 0 will generate the sequence desired. In (27) the
difference equation has the general solution qn = A(1/3n ) + B3n . This too
verified by substitution:
10
10
A
A
n−1
n−2
qn−1 − qn−2 =
+ B3
−
+ B3
3
3 3n−1
3n−2
10
9
1
=
− n A − (10 − 1)3n−1 B = A n + B3n = qn
3n
3
3
Professor Henry Arguello Fuentes
Numerical methods
2020
76 / 80
Propagation of error
Example:
Generate approximations to the sequences {xn } = 1/3n using hemes
r0 = 0.99996
and
1
rn = rn−1
3
p0 = 1, p1 = 0.33332,
and
4
1
pn = pn−1 − pn−2
3
3
for n = 1, 2, · · · ,
10
pn−1 − pn−2
3
for n = 1, 2, · · · ,
q0 = 1, q1 = 0.33332,
and
qn =
for n = 1, 2, · · · ,
(28)
(29)
(30)
In (28) the initial error in r0 is 0.00004, and in (29) and (30) the initial
errors in p1 and q1 are 0.000013. Investigate the propagation of error for
each scheme.
Professor Henry Arguello Fuentes
Numerical methods
2020
77 / 80
Propagation of error
Table: Sequence xn = 1/3n and the approximations rn , pn , and qn
n
0
1
2
3
4
5
6
7
8
9
10
xn
1.0000000000
0.3333333333
0.1111111111
0.0370370370
0.0123456790
0.0041152263
0.0013717421
0.0004572474
0.0001524158
0.0000508053
0.0000169351
Professor Henry Arguello Fuentes
rn
0.9999600000
0.3333200000
0.1111066667
0.0370355556
0.0123451852
0.0041150617
0.0013716872
0.0004572291
0.0001524097
0.0000508032
0.0000169344
pn
1.0000000000
0.3333200000
0.1110933333
0.0370177778
0.0123259259
0.0040953086
0.0013517695
0.0004372565
0.0001324188
0.0000308063
-0.0000030646
Numerical methods
qn
1.0000000000
0.3333200000
0.1110666667
0.0369022222
0.0119407407
0.0029002469
-0.0022732510
-0.0104777503
-0.0326525834
-0.0983641945
-0.2952280648
2020
78 / 80
Propagation of error
Table: Error sequences xn − rn , xn − pn , and xn − qn
n
0
1
2
3
4
5
6
7
8
9
10
xn − rn
0.0000400000
0.0000133333
0.0000044444
0.0000014815
0.0000004938
0.0000001646
0.0000000549
0.0000000183
0.0000000061
0.0000000020
0.0000000007
Professor Henry Arguello Fuentes
xn − pn
0.0000000000
0.0000133333
0.0000177778
0.0000192593
0.0000197531
0.0000199177
0.0000199726
0.0000199909
0.0000199970
0.0000199990
0.0000199997
Numerical methods
xn − qn
0.0000000000
0.0000133333
0.0000444444
0.0001348148
0.0004049383
0.0012149794
0.0036449931
0.0109349977
0.0328049992
0.0984149997
0.2952449999
2020
79 / 80
Propagation of error
−5
6
x 10
The error for {rn } is stable
and decreases in an exponential
manner.
n
x −r
n
4
2
The error {pn } is stable.
0
0
2
4
n
6
8
10
The error for {qn } is unstable and
grows at an exponential rate.
−5
2
x 10
xn−pn
1.5
1
0.5
0
0
2
4
6
8
10
6
8
10
n
0.4
Although the error for {pn } is stable,
the terms pn −→ 0 as n −→ ∞,
so that the error eventually dominates
and the terms past p8 have no significant digits.
xn−qn
0.3
0.2
0.1
0
0
2
4
n
Professor Henry Arguello Fuentes
Numerical methods
2020
80 / 80
Descargar