Numerical Methods Preliminaries Professor Ph.D. Henry Arguello Fuentes Universidad Industrial de Santander Colombia 2020 High Dimensional Signal Processing Group www.hdspgroup.com [email protected] Professor Henry Arguello Fuentes Numerical methods 2020 1 / 80 Outline 1 Introduction 2 Binary numbers 3 Error Analysis Professor Henry Arguello Fuentes Numerical methods 2020 2 / 80 Introduction: numerical methods applications (a) Model the probable evolution of a pathology (b) Model and simulate the growth of a tumor (c) Microscopy super-resolution (d) Thermal management Professor Henry Arguello Fuentes Numerical methods 2020 3 / 80 Contents 1 Introduction 2 Binary numbers Base 2 numbers Base 2 representation of the integer N Sequences and Series Binary Fractions Binary shifting Scientific Notation Machine Numbers 3 Error Analysis Professor Henry Arguello Fuentes Numerical methods 2020 4 / 80 Base 2 numbers Base 10 numbers: Expanded form of the number 1563 1563 = (1 × 103 ) + (5 × 102 ) + (6 × 101 ) + (3 × 100 ). Professor Henry Arguello Fuentes Numerical methods 2020 5 / 80 Base 2 numbers Base 10 numbers: Expanded form of the number 1563 1563 = (1 × 103 ) + (5 × 102 ) + (6 × 101 ) + (3 × 100 ). Let N denote a positive integer; then the digits a0 , a1 , ..., ak exist so that N has the base 10 expansion Base 10 expansion N = (ak × 10k ) + (ak−1 × 10k−1 ) + · · · + (a1 × 101 ) + (a0 × 100 ), (1) Where the digits ak are chosen from 0, 1, ..., 8, 9. Professor Henry Arguello Fuentes Numerical methods 2020 5 / 80 Base 2 numbers Base 2 numbers: Expanded form of the number 1563 1563 =(1 × 210 ) + (1 × 29 ) + (0 × 28 ) + (0 × 27 ) + (0 × 26 ) + (0 × 25 )+ (1 × 24 ) + (1 × 23 ) + (0 × 22 ) + (1 × 21 ) + (1 × 20 ). So that: 1563 = 1024 + 512 + 16 + 8 + 2 + 1. Professor Henry Arguello Fuentes Numerical methods 2020 6 / 80 Base 2 numbers Let N denote a positive integer; the digits b0 , b1 , ..., bJ exist so that N has the base 2 expansion Base 2 expansion N = (bJ × 2J ) + (bJ−1 × 2J−1 ) + · · · + (b1 × 21 ) + (b0 × 20 ), (2) Where each digit bj is either a 0 or 1. Thus N is expressed in binary notation as N = bJ bJ−1 · · · b2 b1 b0two . (3) Professor Henry Arguello Fuentes Numerical methods 2020 7 / 80 Contents 1 Introduction 2 Binary numbers Base 2 numbers Base 2 representation of the integer N Sequences and Series Binary Fractions Binary shifting Scientific Notation Machine Numbers 3 Error Analysis Professor Henry Arguello Fuentes Numerical methods 2020 8 / 80 Base 2 representation of the integer N Process: Generate sequences Qk and Rk of quotients and remainders, respectively. End the process when Qk = 0, for some integer k = J. Professor Henry Arguello Fuentes Numerical methods 2020 9 / 80 Base 2 representation of the integer N Process: Generate sequences Qk and Rk of quotients and remainders, respectively. End the process when Qk = 0, for some integer k = J. Example: 𝒌 1563 𝑸𝒌 𝑹𝒌 0 1563/2= 781 1 1 781/2= 390 1 2 390/2= 195 0 3 195/2= 97 1 4 97/2= 48 1 5 48/2= 24 0 6 24/2= 12 0 7 12/2= 6 0 8 6/2= 3 0 9 3/2= 1 1 10 1/2= 0 1 1 1 0 0 0 0 1 1 0 1 1 b10 b9 b8 b7 b6 b5 b4 b3 b2 b1 b0 Most Significant Bit -MSB Professor Henry Arguello Fuentes Numerical methods Least Significant Bit - LSB 2020 9 / 80 Base 2 representation of the integer N Exercise 1: Find the base 2 representation of 697 Professor Henry Arguello Fuentes Numerical methods 2020 10 / 80 Base 2 representation of the integer N Exercise 1: Find the base 2 representation of 697 Start by dividing the integer N from 2 to calculate Q0 and R0 . 697/2 = 348.5 → Q0 = 348 and R0 = 1 Professor Henry Arguello Fuentes Numerical methods 2020 10 / 80 Base 2 representation of the integer N Exercise 1: Find the base 2 representation of 697 Start by dividing the integer N from 2 to calculate Q0 and R0 . 697/2 = 348.5 → Q0 = 348 and R0 = 1 Continue the process until finding Qk = 0, for some integer k = J. Qk = Qk−1 /2 𝒌 0 𝑸𝒌 𝑹𝒌 697/2= 348 𝟔𝟗𝟕 1 1 2 3 4 5 6 7 8 9 b9 b8 b7 b6 b5 b4 b3 b2 b 1 b0 Professor Henry Arguello Fuentes Numerical methods 2020 10 / 80 Base 2 representation of the integer N Solution 𝒌 𝑸𝒌 𝑹𝒌 0 697/2= 348 𝟔𝟗𝟕 1 1 348/2= 174 0 2 174/2= 87 0 3 87/2= 43 1 4 43/2= 21 1 5 21/2= 10 1 6 10/2= 5 0 7 5/2= 2 1 8 2/2= 1 0 9 1/2= 0 1 1 0 1 0 1 1 1 0 0 1 b9 b8 b7 b6 b5 b4 b3 b2 b 1 b0 Then, 69710 = 10101110012 Professor Henry Arguello Fuentes Numerical methods 2020 11 / 80 Contents 1 Introduction 2 Binary numbers Base 2 numbers Base 2 representation of the integer N Sequences and Series Binary Fractions Binary shifting Scientific Notation Machine Numbers 3 Error Analysis Professor Henry Arguello Fuentes Numerical methods 2020 12 / 80 Sequences and Series Commonly, when you express a rational number in decimal form, you require infinitely many digits. 1 = 0.3 , the symbol 3 means that the digit 3 is repeated 3 forever to form an infinite repeating decimal. For example, in But, the number 1 is the shorthand notation for the infinite series S 3 S = (3 × 10−1 ) + (3 × 10−2 ) + · · · + (3 × 10−∞ ) S= ∞ X k=1 Professor Henry Arguello Fuentes 1 3(10)−k = . 3 Numerical methods 2020 13 / 80 Sequences and Series Definition 1. The infinite series S S= ∞ X crn = c + cr + cr2 + · · · + crn + · · · , (4) n=0 where c 6= 0 and r 6= 0, is called a geometric series with ratio r. Professor Henry Arguello Fuentes Numerical methods 2020 14 / 80 Sequences and Series Definition 1. The infinite series S S= ∞ X crn = c + cr + cr2 + · · · + crn + · · · , (4) n=0 where c 6= 0 and r 6= 0, is called a geometric series with ratio r. Theorem 1. (Geometric Series) The geometric series has the following properties: If |r| < 1, then ∞ X crn = n=0 c . 1−r (5) If |r| > 1, then the series diverges. Professor Henry Arguello Fuentes Numerical methods 2020 14 / 80 Sequences and Series Example: The series S is given by n 1 2 ∞ X ∞ 1 1 1 1 7 S = (7) + (7) + · · · + (7) = , 7 7 7 7 n=1 Professor Henry Arguello Fuentes Numerical methods 2020 15 / 80 Sequences and Series Example: The series S is given by n 1 2 ∞ X ∞ 1 1 1 1 7 S = (7) + (7) + · · · + (7) = , 7 7 7 7 n=1 which is equal to − 7 + n ∞ X 1 7 , 7 n=0 Professor Henry Arguello Fuentes Numerical methods 2020 15 / 80 Sequences and Series Example: The series S is given by n 1 2 ∞ X ∞ 1 1 1 1 7 S = (7) + (7) + · · · + (7) = , 7 7 7 7 n=1 which is equal to − 7 + n ∞ X 1 7 , 7 n=0 and acording with (5) S = −7 + Then, 7 1 1− 7 = 7 = 1.16, 6 7 is the shorthand notation for the infinite series S 6 Professor Henry Arguello Fuentes Numerical methods 2020 15 / 80 Contents 1 Introduction 2 Binary numbers Base 2 numbers Base 2 representation of the integer N Sequences and Series Binary Fractions Binary shifting Scientific Notation Machine Numbers 3 Error Analysis Professor Henry Arguello Fuentes Numerical methods 2020 16 / 80 Binary Fractions A binary fraction is a serie of sums with negative powers of 2, which is used to express a real number R that lies in the range 0 < R < 1. Professor Henry Arguello Fuentes Numerical methods 2020 17 / 80 Binary Fractions A binary fraction is a serie of sums with negative powers of 2, which is used to express a real number R that lies in the range 0 < R < 1. Binary fractions R = (d1 × 2−1 ) + (d2 × 2−2 ) + · · · + (dn × 2−n ) + · · · , (6) where dj ∈ 0, 1 and 0 < R < 1. Binary fraction Representation of R R= R = 0.d1 d2 · · · dn · · ·two Professor Henry Arguello Fuentes Numerical methods P∞ j=1 dj (2) −j 2020 17 / 80 Binary Fractions-Decimal to binary Process: Generate sequences dk and Fk multiplying by two. Professor Henry Arguello Fuentes Numerical methods 2020 18 / 80 Binary Fractions-Decimal to binary Process: Generate sequences dk and Fk multiplying by two. Example: d1 d2 d 3 d 4 d 5 d6 d7 0. 1 0 1 1 0 0 1 𝑗 0.7 𝐹𝑗 0.4 2 (0.4)(2) = 0.8 0 0.8 3 (0.8)(2) = 1.6 1 0.6 4 (0.6)(2) = 1.2 1 0.2 5 (0.2)(2) = 0.4 0 0.4 6 (0.4)(2) = 0.8 0 0.8 7 (0.8)(2) = 1.6 1 0.6 8 (0.6)(2) = 1.2 1 0.2 9 (0.2)(2) = 0.4 0 0.4 … d9 0 … 𝑑𝑗 𝑓𝑟𝑎𝑐 1 (0.7)(2) = 1.4 1 Professor Henry Arguello Fuentes d8 1 0.7 0.10110 2 Numerical methods 2020 18 / 80 Binary Fractions-Decimal to binary Exercise 2: Calculate the binary fraction for 0.6. Start by multiplying 0.6 by 2, to generate sequences dj and Fj d1 d2 d3 d4 d5 d6 d7 d8 d9 … 𝑗 0.6 𝐹𝑗 𝑑𝑗 𝑓𝑟𝑎𝑐 1 (0.6)(2) = 1.2 1 0.2 2 3 4 5 6 7 8 … 9 Professor Henry Arguello Fuentes Numerical methods 2020 19 / 80 Binary Fractions-Decimal to binary Solution d1 d2 d3 d4 d5 d6 d7 d8 d9 … 𝑗 0.6 𝐹𝑗 𝑑𝑗 𝑓𝑟𝑎𝑐 0.2 2 (0.2)(2) = 0.4 0 0.4 3 (0.4)(2) = 0.8 0 0.8 4 (0.8)(2) = 1.6 1 0.6 5 (0.6)(2) = 1.2 1 0.2 6 (0.2)(2) = 0.4 0 0.4 7 (0.4)(2) = 0.8 0 0.8 8 (0.8)(2) = 1.6 1 0.6 9 (0.6)(2) = 1.2 1 0.2 … 1 (0.6)(2) = 1.2 1 Professor Henry Arguello Fuentes 0.6 = 0. 1001 Numerical methods 2020 20 / 80 Binary Fractions-Binary to decimal The base 10 rational number R10 associated to a base 2 binary fraction R2 can be found using geometric series. Professor Henry Arguello Fuentes Numerical methods 2020 21 / 80 Binary Fractions-Binary to decimal The base 10 rational number R10 associated to a base 2 binary fraction R2 can be found using geometric series. Example: 0.012 =(0 × 2−1 ) + (1 × 2−2 ) + (0 × 2−3 ) + (1 × 2−4 ) · · · the expression above is writted as Professor Henry Arguello Fuentes Numerical methods 2020 21 / 80 Binary Fractions-Binary to decimal The base 10 rational number R10 associated to a base 2 binary fraction R2 can be found using geometric series. Example: 0.012 =(0 × 2−1 ) + (1 × 2−2 ) + (0 × 2−3 ) + (1 × 2−4 ) · · · the expression above is writted as ∞ ∞ X X = (2−2 )k = −1 + (2−2 )k k=1 = −1 + k=0 1 1− Professor Henry Arguello Fuentes 1 4 = −1 + 4 1 = . 3 3 Numerical methods 2020 21 / 80 Binary Fractions-Binary to decimal The base 10 rational number R10 associated to a base 2 binary fraction R2 can be found using geometric series. Example: 0.012 =(0 × 2−1 ) + (1 × 2−2 ) + (0 × 2−3 ) + (1 × 2−4 ) · · · the expression above is writted as ∞ ∞ X X = (2−2 )k = −1 + (2−2 )k k=1 = −1 + k=0 1 1− then, 1 4 = −1 + 4 1 = . 3 3 1 is the 10 rational number associated to 0.012 3 Professor Henry Arguello Fuentes Numerical methods 2020 21 / 80 Contents 1 Introduction 2 Binary numbers Base 2 numbers Base 2 representation of the integer N Sequences and Series Binary Fractions Binary shifting Scientific Notation Machine Numbers 3 Error Analysis Professor Henry Arguello Fuentes Numerical methods 2020 22 / 80 Binary shifting Let R be (7) R = 0.00000110002 . Professor Henry Arguello Fuentes Numerical methods 2020 23 / 80 Binary shifting Let R be (7) R = 0.00000110002 . Multiplying both sides of (7) by 25 = 32 will shift the binary point 5 places to the right 32R = 0.110002 . Professor Henry Arguello Fuentes Multiplying both sides of (7) by 210 = 1024 will shift the binary point 10 places to the right 1024R = 11000.110002 . Numerical methods 2020 23 / 80 Binary shifting Let R be (7) R = 0.00000110002 . Multiplying both sides of (7) by 25 = 32 will shift the binary point 5 places to the right 32R = 0.110002 . Multiplying both sides of (7) by 210 = 1024 will shift the binary point 10 places to the right 1024R = 11000.110002 . Taking the difference 1024R − 32R = 11000.110002 − 0.110002 , we obtain 992R = 110002 , given that 110002 = 2410 we find that, 992R = 24, Therefore R = Professor Henry Arguello Fuentes Numerical methods 3 . 124 10 2020 23 / 80 Contents 1 Introduction 2 Binary numbers Base 2 numbers Base 2 representation of the integer N Sequences and Series Binary Fractions Binary shifting Scientific Notation Machine Numbers 3 Error Analysis Professor Henry Arguello Fuentes Numerical methods 2020 24 / 80 Scientific Notation The scientific notation is a standard way to present a real number. It is obtained by properly shifting the decimal point. Professor Henry Arguello Fuentes Numerical methods 2020 25 / 80 Scientific Notation The scientific notation is a standard way to present a real number. It is obtained by properly shifting the decimal point. Examples 0.0000747 = 7.47 × 10−5 31.4159265 = 3.14159265 × 10 9, 700, 000.000 = 9.7 × 109 The Avogadro’s constant used in chemistry = 6.02252 × 1023 . The quantity 1K = 1.024 × 103 used in computer science. Professor Henry Arguello Fuentes Numerical methods 2020 25 / 80 Contents 1 Introduction 2 Binary numbers Base 2 numbers Base 2 representation of the integer N Sequences and Series Binary Fractions Binary shifting Scientific Notation Machine Numbers 3 Error Analysis Professor Henry Arguello Fuentes Numerical methods 2020 26 / 80 Machine Numbers Machine Numbers A mathematical quantity x is stored in a computer as a binary approximation given by x ≈ ±q × 2n . (8) The finite binary number q is the mantissa, where 1/2 ≤ q ≤ 1. The integer n is the exponent. Professor Henry Arguello Fuentes Numerical methods 2020 27 / 80 Floating-point format A real number is stored in a computer as a set of binary numbers expressing: The sign The exponent The mantissa 𝑆𝑖𝑔𝑛 𝐸𝑥𝑝𝑜𝑛𝑒𝑛𝑡 𝑀𝑎𝑛𝑡𝑖𝑠𝑠𝑎 The sign is always one bit where, S = 0 if, x > 0 and S = 1, if x < 0. The amount of bits for the exponent and the mantissa depends on the precision of the machine. Professor Henry Arguello Fuentes Numerical methods 2020 28 / 80 Floating-point format-IEEE 754 standard Precision Total Sign Exponent Man4ssa Exponent bias Single 32 bits 1 bit 8 bits 23 bits 127 Double 64 bits 1 bit 11 bits 52 bits 1023 Professor Henry Arguello Fuentes Numerical methods 2020 29 / 80 Floating-point format-IEEE 754 standard Precision Total Sign Exponent Man4ssa Exponent bias Single 32 bits 1 bit 8 bits 23 bits 127 Double 64 bits 1 bit 11 bits 52 bits 1023 Note: Biasing is done because exponents have to be signed values to be able to represent both tiny and huge values, but two’s complement. Professor Henry Arguello Fuentes Numerical methods 2020 29 / 80 Floating-point format-IEEE 754 standard Precision Total Sign Exponent Man4ssa Exponent bias Single 32 bits 1 bit 8 bits 23 bits 127 Double 64 bits 1 bit 11 bits 52 bits 1023 Note: Biasing is done because exponents have to be signed values to be able to represent both tiny and huge values, but two’s complement. Then, the exponent is biased by adjusting its value. Professor Henry Arguello Fuentes Numerical methods 2020 29 / 80 Floating-point format-IEEE 754 standard Precision Total Sign Exponent Man4ssa Exponent bias Single 32 bits 1 bit 8 bits 23 bits 127 Double 64 bits 1 bit 11 bits 52 bits 1023 Note: Biasing is done because exponents have to be signed values to be able to represent both tiny and huge values, but two’s complement. Then, the exponent is biased by adjusting its value. The exponent bias is calculated as bias = 2exp−1 −1, where exp indicates the amount of bits for the exponent. Example: if exp = 15 bits, then, bias = 215−1 − 1 = 16383 Professor Henry Arguello Fuentes Numerical methods 2020 29 / 80 Floating-point format-IEEE 754 standard Possible cases: Sign (S) Exponent (E) Man0ssa (M) Value 0-­‐1 All 0 < E < All 1 M (-­‐1)S (1.M)(2E-­‐bias) 0 1 E=all 1 E=all 1 M=0 M=0 +∞ -­‐∞ 0-­‐1 0-­‐1 0-­‐1 E=all 1 E=all 0 E=all 0 M≠0 M=0 M≠0 NaN 0 (-­‐1)S (0.M)(21-­‐bias) Professor Henry Arguello Fuentes Numerical methods 2020 30 / 80 Floating-point format Example: Determine the floating point format to stored the number 59.187510 in a computer with 32 bits of precision. Professor Henry Arguello Fuentes Numerical methods 2020 31 / 80 Floating-point format Example: Determine the floating point format to stored the number 59.187510 in a computer with 32 bits of precision. 1. Find the binary representation of the number 59.187510 59.187510 LBS Integer part Decimal part 5910 0.187510 59 2 1 29 2 1 14 2 2 0 7 1 3 2 1 1 2 MSB 1 0 0.1875×2 = 0.375 0 0.375×2 = 0.75 0 0.75×2 = 1.5 1 0.5×2 = 1.0 1 MSB LBS 111011.00112 Professor Henry Arguello Fuentes Numerical methods 2020 31 / 80 Floating-point format 2. Do the proper binary shifting 111011.00112 = 1.1101100112 × 25 Professor Henry Arguello Fuentes Numerical methods 2020 32 / 80 Floating-point format 2. Do the proper binary shifting 111011.00112 = 1.1101100112 × 25 3. Calculate the bias bias = 28−1 − 1 = 127 Professor Henry Arguello Fuentes Numerical methods 2020 32 / 80 Floating-point format 2. Do the proper binary shifting 111011.00112 = 1.1101100112 × 25 3. Calculate the bias bias = 28−1 − 1 = 127 4. Determine the mantissa Mantissa = 1101100112 Professor Henry Arguello Fuentes Numerical methods 2020 32 / 80 Floating-point format 2. Do the proper binary shifting 111011.00112 = 1.1101100112 × 25 3. Calculate the bias bias = 28−1 − 1 = 127 4. Determine the mantissa Mantissa = 1101100112 5. Determine the exponent exp = 5 + bias = 5 + 127 = 13210 = 100001002 Professor Henry Arguello Fuentes Numerical methods 2020 32 / 80 Floating-point format 2. Do the proper binary shifting 111011.00112 = 1.1101100112 × 25 3. Calculate the bias bias = 28−1 − 1 = 127 4. Determine the mantissa Mantissa = 1101100112 5. Determine the exponent exp = 5 + bias = 5 + 127 = 13210 = 100001002 S E M 0 10000100 11011001100000000000000 Professor Henry Arguello Fuentes Numerical methods 2020 32 / 80 Floating-point format Example: Determine the floating point format to stored the number 132.2812510 in a computer with 32 bits of precision. Professor Henry Arguello Fuentes Numerical methods 2020 33 / 80 Floating-point format Example: Determine the floating point format to stored the number 132.2812510 in a computer with 32 bits of precision. 1. Find the binary representation of the number 132.2812510 132.2812510 Decimal part Integer part LBS 132 2 13210 0 66 2 0 33 2 1 16 2 0 8 2 0 4 2 0 2 2 0 1 2 MSB 1 0 0.2812510 0.28125×2 = 0.5625 0 0.375×2 = 1.125 1 0.125×2 = 0.25 MSB 0 0.25×2 = 0.5 0 0.5×2 = 1.0 1 LBS 10000100.010012 Professor Henry Arguello Fuentes Numerical methods 2020 33 / 80 Floating-point format 2. Do the proper binary shifting 10000100.010012 = 1.0000100010012 × 27 Professor Henry Arguello Fuentes Numerical methods 2020 34 / 80 Floating-point format 2. Do the proper binary shifting 10000100.010012 = 1.0000100010012 × 27 3. Calculate the bias bias = 28−1 − 1 = 127 Professor Henry Arguello Fuentes Numerical methods 2020 34 / 80 Floating-point format 2. Do the proper binary shifting 10000100.010012 = 1.0000100010012 × 27 3. Calculate the bias bias = 28−1 − 1 = 127 4. Determine the mantissa Mantissa = 0000100010012 Professor Henry Arguello Fuentes Numerical methods 2020 34 / 80 Floating-point format 2. Do the proper binary shifting 10000100.010012 = 1.0000100010012 × 27 3. Calculate the bias bias = 28−1 − 1 = 127 4. Determine the mantissa Mantissa = 0000100010012 5. Determine the exponent exp = 7 + bias = 7 + 127 = 13410 = 100001102 Professor Henry Arguello Fuentes Numerical methods 2020 34 / 80 Floating-point format 2. Do the proper binary shifting 10000100.010012 = 1.0000100010012 × 27 3. Calculate the bias bias = 28−1 − 1 = 127 4. Determine the mantissa Mantissa = 0000100010012 5. Determine the exponent exp = 7 + bias = 7 + 127 = 13410 = 100001102 S E M 0 10000110 00001000100100000000000 Professor Henry Arguello Fuentes Numerical methods 2020 34 / 80 Floating-point format Example: Determine the floating point format to stored the number 59.187510 in a computer with 64 bits of precision. Professor Henry Arguello Fuentes Numerical methods 2020 35 / 80 Floating-point format Example: Determine the floating point format to stored the number 59.187510 in a computer with 64 bits of precision. 1. Find the binary representation of the number 59.187510 59.187510 LBS Integer part Decimal part 5910 0.187510 59 2 1 29 2 1 14 2 2 0 7 1 3 2 1 1 2 MSB 1 0 0.1875×2 = 0.375 0 0.375×2 = 0.75 0 0.75×2 = 1.5 1 0.5×2 = 1.0 1 MSB LBS 111011.00112 Professor Henry Arguello Fuentes Numerical methods 2020 35 / 80 Floating-point format 2. Do the proper binary shifting 111011.00112 = 1.1101100112 × 25 Professor Henry Arguello Fuentes Numerical methods 2020 36 / 80 Floating-point format 2. Do the proper binary shifting 111011.00112 = 1.1101100112 × 25 3. Calculate the bias bias = 211−1 − 1 = 1023 Professor Henry Arguello Fuentes Numerical methods 2020 36 / 80 Floating-point format 2. Do the proper binary shifting 111011.00112 = 1.1101100112 × 25 3. Calculate the bias bias = 211−1 − 1 = 1023 4. Determine the mantissa Mantissa = 1101100112 Professor Henry Arguello Fuentes Numerical methods 2020 36 / 80 Floating-point format 2. Do the proper binary shifting 111011.00112 = 1.1101100112 × 25 3. Calculate the bias bias = 211−1 − 1 = 1023 4. Determine the mantissa Mantissa = 1101100112 5. Determine the exponent exp = 5 + bias = 5 + 1023 = 102810 = 100000001002 Professor Henry Arguello Fuentes Numerical methods 2020 36 / 80 Floating-point format 2. Do the proper binary shifting 111011.00112 = 1.1101100112 × 25 3. Calculate the bias bias = 211−1 − 1 = 1023 4. Determine the mantissa Mantissa = 1101100112 5. Determine the exponent exp = 5 + bias = 5 + 1023 = 102810 = 100000001002 S E M 0 10000000100 1101100110000000000000000000000000000000000000000000 Professor Henry Arguello Fuentes Numerical methods 2020 36 / 80 Floating-point format The real value associated with a given 32 bit binary is calculated as ! 23 X S −i value = (−1) 1 + d(23−i) 2 × 2(E−127) i=1 Where, S = The sign E = Exponent 127 = Bias dj = Bits of the mantissa Professor Henry Arguello Fuentes Numerical methods 2020 37 / 80 Floating-point format Exercise: Find the real value for the binary data: S E M 0 01010010 01101000000100100000000 Professor Henry Arguello Fuentes Numerical methods 2020 38 / 80 Floating-point format Exercise: Find the real value for the binary data: S E M 0 01010010 01101000000100100000000 value = (−1)S 1+ 23 X ! d(23−i) 2−i × 2(E−127) i=1 Professor Henry Arguello Fuentes Numerical methods 2020 38 / 80 Floating-point format Exercise: Find the real value for the binary data: E S M 0 01010010 01101000000100100000000 value = (−1)S 1+ 23 X ! d(23−i) 2−i × 2(E−127) i=1 In this example: S=0 P23 1 + i=1 d(23−i) 2−i = 1 + 2−2 + 2−3 + 2−5 + 2−12 + 2−15 = 1.4065246582 1 4 6 2(E−127) = 2((2 +2 +2 )−127) = 282−127 = 2−45 Professor Henry Arguello Fuentes Numerical methods 2020 38 / 80 Floating-point format Exercise: Find the real value for the binary data: E S M 0 01010010 01101000000100100000000 value = (−1)S 1+ 23 X ! d(23−i) 2−i × 2(E−127) i=1 In this example: S=0 P23 1 + i=1 d(23−i) 2−i = 1 + 2−2 + 2−3 + 2−5 + 2−12 + 2−15 = 1.4065246582 1 4 6 2(E−127) = 2((2 +2 +2 )−127) = 282−127 = 2−45 Thus value = 1.4065246582 × 2−45 Professor Henry Arguello Fuentes Numerical methods 2020 38 / 80 Floating-point format Example: Find the real value for the binary data: E S M 1 10000100 01000000000000000000000 31 30 23 22 0 In this example: S=1 P23 1 + i=1 d(23−i) 2−i = 1 + 2−2 = 1.25 2(E−127) = 2(132−127) = 25 Professor Henry Arguello Fuentes Numerical methods 2020 39 / 80 Floating-point format Example: Find the real value for the binary data: E S M 1 10000100 01000000000000000000000 31 30 23 22 0 In this example: S=1 P23 1 + i=1 d(23−i) 2−i = 1 + 2−2 = 1.25 2(E−127) = 2(132−127) = 25 Thus value = −1.25 × 25 = −40. Professor Henry Arguello Fuentes Numerical methods 2020 39 / 80 Contents 1 Introduction 2 Binary numbers 3 Error Analysis Absolute and relative error Truncation Error Round-off Error Loss of Significance Order of Approximation Propagation of Error Professor Henry Arguello Fuentes Numerical methods 2020 40 / 80 Absolute and relative error Definition 2. Suppose that b p is an approximation to p. The absolute error is Ep = |p − b p|, and the relative error is Rp = |p − b p|/|p|, provided that p 6= 0. Professor Henry Arguello Fuentes Numerical methods 2020 41 / 80 Absolute and relative error Definition 2. Suppose that b p is an approximation to p. The absolute error is Ep = |p − b p|, and the relative error is Rp = |p − b p|/|p|, provided that p 6= 0. The absolute error is the difference between the true value and the approximate value. The relative error expresses the error as a percentage of the true value. Professor Henry Arguello Fuentes Numerical methods 2020 41 / 80 Absolute and relative error Example: Find the absolute and relative error in the following three cases: Professor Henry Arguello Fuentes Numerical methods 2020 42 / 80 Absolute and relative error Example: Find the absolute and relative error in the following three cases: Real |p| Approximation p̂ Absolute Error Ep Relative Error Rp Professor Henry Arguello Fuentes x = 3.141592 b x = 3.14 Ex = |x − b x| = 0.001592 Rx = Ex /|x| = 5.067×10−4 Numerical methods y = 1, 000, 000 by = 999, 996 Ey = |y − by| =4 Ry = Ey /|y| = 0.000004 z = 0.000012 bz = 0.000009 Ez = |z − bz| = 0.000003 Rz = Ez /|z| = 0.25 2020 42 / 80 Absolute and relative error Example: Find the absolute and relative error in the following three cases: Real |p| Approximation p̂ Absolute Error Ep Relative Error Rp x = 3.141592 b x = 3.14 Ex = |x − b x| = 0.001592 Rx = Ex /|x| = 5.067×10−4 y = 1, 000, 000 by = 999, 996 Ey = |y − by| =4 Ry = Ey /|y| = 0.000004 z = 0.000012 bz = 0.000009 Ez = |z − bz| = 0.000003 Rz = Ez /|z| = 0.25 Observe that as |p| moves away from 1 (greater than or less than) the relative error Rp is a better indicator than Ep of the accuracy of the approximation. Professor Henry Arguello Fuentes Numerical methods 2020 42 / 80 Absolute and relative error Definition 3. The number b p is said to approximate p to d significant digits if d is the largest nonnegative integer for which b |p − p| 101−d < . |p| 2 Professor Henry Arguello Fuentes Numerical methods 2020 43 / 80 Absolute and relative error Example: Let ŵ be the approximation for w = 2.1645, then |2.1645 − 2.16| = 2.07900 × 10−3 |2.1645| Professor Henry Arguello Fuentes Numerical methods 2020 44 / 80 Absolute and relative error Example: Let ŵ be the approximation for w = 2.1645, then |2.1645 − 2.16| = 2.07900 × 10−3 |2.1645| 1−0 if d = 0: 2.07900 × 10− 3 < 102 = 5 Xsatisfies. However, as we need to find the largest integer d, we need to continue.. Professor Henry Arguello Fuentes Numerical methods 2020 44 / 80 Absolute and relative error Example: Let ŵ be the approximation for w = 2.1645, then |2.1645 − 2.16| = 2.07900 × 10−3 |2.1645| 1−0 if d = 0: 2.07900 × 10− 3 < 102 = 5 Xsatisfies. However, as we need to find the largest integer d, we need to continue.. if d = 1: 1−1 2.07900 × 10− 3 < 102 Professor Henry Arguello Fuentes = 0.5 Xsatisfies Numerical methods 2020 44 / 80 Absolute and relative error Example: Let ŵ be the approximation for w = 2.1645, then |2.1645 − 2.16| = 2.07900 × 10−3 |2.1645| 1−0 if d = 0: 2.07900 × 10− 3 < 102 = 5 Xsatisfies. However, as we need to find the largest integer d, we need to continue.. if d = 1: if d = 2: 1−1 2.07900 × 10− 3 < 102 2.07900 × 10− 3 < Professor Henry Arguello Fuentes 101−2 2 = 0.5 Xsatisfies = 0.05 Xsatisfies Numerical methods 2020 44 / 80 Absolute and relative error Example: Let ŵ be the approximation for w = 2.1645, then |2.1645 − 2.16| = 2.07900 × 10−3 |2.1645| 1−0 if d = 0: 2.07900 × 10− 3 < 102 = 5 Xsatisfies. However, as we need to find the largest integer d, we need to continue.. if d = 1: if d = 2: if d = 3: 1−1 2.07900 × 10− 3 < 102 = 0.5 Xsatisfies 101−2 2.07900 × 10− 3 < 2 = 0.05 Xsatisfies 1−3 2.07900 × 10− 3 < 102 = 0.005 Xsatisfies Professor Henry Arguello Fuentes Numerical methods 2020 44 / 80 Absolute and relative error Example: Let ŵ be the approximation for w = 2.1645, then |2.1645 − 2.16| = 2.07900 × 10−3 |2.1645| 1−0 if d = 0: 2.07900 × 10− 3 < 102 = 5 Xsatisfies. However, as we need to find the largest integer d, we need to continue.. if d = 1: 1−1 2.07900 × 10− 3 < 102 = 0.5 Xsatisfies 101−2 if d = 2: 2.07900 × 10− 3 < 2 = 0.05 Xsatisfies 1−3 if d = 3: 2.07900 × 10− 3 < 102 = 0.005 Xsatisfies 1−4 if d = 4: 2.07900 × 10− 3 < 102 = 0.0005 X does not satisfy Then, ŵ approximate w to 3 significant digits. Professor Henry Arguello Fuentes Numerical methods 2020 44 / 80 Absolute and relative error Other examples: If x = 3.141592 and b x = 3.14, then |x − b x|/|x| = 0.000507 < 10−2 /2. Therefore, b x approximates x to three significant digits. If y = 1, 000, 000 and by = 999, 996, then |y − by|/|y| = 0.000004 < 10−5 /2. Therefore, by approximates y to six significant digits. If z = 0.000012 and bz = 0.000009, then |z − bz|/|z| = 0.25 < 10−0 /2. Therefore, bz approximates z to one significant digits. Professor Henry Arguello Fuentes Numerical methods 2020 45 / 80 Contents 1 Introduction 2 Binary numbers 3 Error Analysis Absolute and relative error Truncation Error Round-off Error Loss of Significance Order of Approximation Propagation of Error Professor Henry Arguello Fuentes Numerical methods 2020 46 / 80 Truncation Error Truncation error refers to errors introduced when a more complicated mathematical expression is "replaced" with a more elementary formula. Professor Henry Arguello Fuentes Numerical methods 2020 47 / 80 Truncation Error Truncation error refers to errors introduced when a more complicated mathematical expression is "replaced" with a more elementary formula. For example, the infinite Taylor series 2 ex = 1 + x 2 + x4 x6 x8 x2n + + + ··· + + ··· 2! 3! 4! n! x6 x8 x4 + + . might be replaced with just the first five terms 1 + x2 + 2! 3! 4! Then a truncation error appears. Professor Henry Arguello Fuentes Numerical methods 2020 47 / 80 Truncation Error R 1/2 2 Example: Given p = 0 ex dx = 0.544987104184. Determine the accu2 racy of the approximation obtained by replacing the integrand f (x) = ex x4 x6 x8 with the truncated Taylor series P8 (x) = 1 + x2 + + + . 2! 3! 4! R 1/2 Determine 0 P8 (x)dx: Professor Henry Arguello Fuentes Numerical methods 2020 48 / 80 Truncation Error R 1/2 2 Example: Given p = 0 ex dx = 0.544987104184. Determine the accu2 racy of the approximation obtained by replacing the integrand f (x) = ex x4 x6 x8 with the truncated Taylor series P8 (x) = 1 + x2 + + + . 2! 3! 4! R 1/2 Determine 0 P8 (x)dx: Z 1/2 0 1 + x2 + x4 x6 x8 + + 2! 3! 4! x=1/2 x3 x5 x7 x9 x+ + + + 3 5(2!) 7(3!) 9(4!) x=0 1 1 1 1 1 + + + = + 2 24 320 5376 110592 2109491 = = 0.544986720817 = b p 3870720 dx = Since |p − b p| 101−6 = 7.03442 × 10−7 < = 5 × 106 |p| 2 then, the approximation b p agrees with the true value to 6 significant digits. Professor Henry Arguello Fuentes Numerical methods 2020 48 / 80 Contents 1 Introduction 2 Binary numbers 3 Error Analysis Absolute and relative error Truncation Error Round-off Error Loss of Significance Order of Approximation Propagation of Error Professor Henry Arguello Fuentes Numerical methods 2020 49 / 80 Round-off Error The accuracy of the representation of a real number stored in a computer is determined by the precision of the mantissa. Professor Henry Arguello Fuentes Numerical methods 2020 50 / 80 Round-off Error The accuracy of the representation of a real number stored in a computer is determined by the precision of the mantissa. The error occurred due to the mantissa precision is the round-off error. Professor Henry Arguello Fuentes Numerical methods 2020 50 / 80 Round-off Error The accuracy of the representation of a real number stored in a computer is determined by the precision of the mantissa. The error occurred due to the mantissa precision is the round-off error. The actual number that is stored in the computer may be chopping or rounding of the last digit. Professor Henry Arguello Fuentes Numerical methods 2020 50 / 80 Round-off Error The accuracy of the representation of a real number stored in a computer is determined by the precision of the mantissa. The error occurred due to the mantissa precision is the round-off error. The actual number that is stored in the computer may be chopping or rounding of the last digit. The computer hardware works with a limited number of digits in machine numbers, errors are introduced and propagated in successive computations. Professor Henry Arguello Fuentes Numerical methods 2020 50 / 80 Chopping Off versus Rounding Off Example: Consider p expressed in normalized decimal form: p = ±0.d1 d2 d3 · · · dk dk+1 · · · × 10n , where 1 ≤ d1 ≤ 9 and 0 ≤ dj ≤ 9 for j > 1. Professor Henry Arguello Fuentes Numerical methods 2020 51 / 80 Chopping Off versus Rounding Off Example: Consider p expressed in normalized decimal form: p = ±0.d1 d2 d3 · · · dk dk+1 · · · × 10n , where 1 ≤ d1 ≤ 9 and 0 ≤ dj ≤ 9 for j > 1. If k is the maximum number of decimal digits; then the real number p is represented by flchop (p), which is given by flchop (p) = ±0.d1 d2 d3 · · · dk × 10n , (9) Where 1 ≤ d1 ≤ 9 and 0 ≤ dj ≤ 9 for 1 < j ≤ k. The number flchop (p) is called the chopped floating-point representation of p. Professor Henry Arguello Fuentes Numerical methods 2020 51 / 80 Chopping Off versus Rounding Off On the other hand, the rounded floating-point representation flround (p) is given by flround (p) = ±0.d1 d2 d3 · · · rk × 10n , (10) where 1 ≤ d1 ≤ 9 and 0 ≤ dj ≤ 9 for 1 < j < k and the last digit, rk , is obtained by rounding the number dk dk+1 dk+2 · · · to the nearest integer. Professor Henry Arguello Fuentes Numerical methods 2020 52 / 80 Chopping Off versus Rounding Off Example: 22 = 3.142857142857142857... has the following The real number p = 7 six-digit representations: flchop (p) = 0.314285 × 101 , flround (p) = 0.314286 × 101 . For common purposes the chopping and rounding would be written as 3.14285 and 3.14286, respectively. Professor Henry Arguello Fuentes Numerical methods 2020 53 / 80 Contents 1 Introduction 2 Binary numbers 3 Error Analysis Absolute and relative error Truncation Error Round-off Error Loss of Significance Order of Approximation Propagation of Error Professor Henry Arguello Fuentes Numerical methods 2020 54 / 80 Loss of Significance Consider p = 3.14155926536 ans q = 3.1415957341, which are nearly equal and both carry 11 decimal digits of precision. Their difference is formed: p − q = −0, 0000030805. Since the first six digits of p and q are the same, their difference p − q contains only five decimal digits of precision. This phenomenon is called loss of significance. Professor Henry Arguello Fuentes Numerical methods 2020 55 / 80 Loss of Significance Example: Compare the results of calculating f (500) and g(500) using six digits and round√ √ x ing. Where, f (x) = x( x + 1 − x) and g(x) = √ √ . x+1+ x Professor Henry Arguello Fuentes Numerical methods 2020 56 / 80 Loss of Significance Example: Compare the results of calculating f (500) and g(500) using six digits and round√ √ x ing. Where, f (x) = x( x + 1 − x) and g(x) = √ √ . x+1+ x For the first function, f (500) =500 √ 501 − √ 500 500(22.3830 − 22.3607) = 500(0.0223) = 11.1500 Professor Henry Arguello Fuentes Numerical methods 2020 56 / 80 Loss of Significance Example: Compare the results of calculating f (500) and g(500) using six digits and round√ √ x ing. Where, f (x) = x( x + 1 − x) and g(x) = √ √ . x+1+ x For the first function, f (500) =500 √ 501 − √ 500 500(22.3830 − 22.3607) = 500(0.0223) = 11.1500 For g(x) 500 √ 501 + 500 500 500 = = 11.1748. 22.3830 + 22.3607 44.7437 g(500) = √ Professor Henry Arguello Fuentes Numerical methods 2020 56 / 80 Loss of Significance Example: Compare the results of calculating f (500) and g(500) using six digits and round√ √ x ing. Where, f (x) = x( x + 1 − x) and g(x) = √ √ . x+1+ x For the first function, f (500) =500 √ 501 − √ 500 500(22.3830 − 22.3607) = 500(0.0223) = 11.1500 For g(x) 500 √ 501 + 500 500 500 = = 11.1748. 22.3830 + 22.3607 44.7437 g(500) = √ The second function, g(x), is algebraically equivalent to f (x), but the answer, g(500) = 11.1748, involves less error and it is the same as that obtained by rounding the true 11.174755300747198... to six digits. Professor Henry Arguello Fuentes Numerical methods 2020 56 / 80 Loss of Significance Example: Compare the results of calculating f (0.01) and P(0.01) using six digits and rounding, where 1 x x2 ex − 1 − x P(x) = + + f (x) = and 2 6 24 x2 The function P(x) is the Taylor polynomial of degree n = 2 for f (x) expanded about x = 0. Professor Henry Arguello Fuentes Numerical methods 2020 57 / 80 Loss of Significance Example: Compare the results of calculating f (0.01) and P(0.01) using six digits and rounding, where 1 x x2 ex − 1 − x P(x) = + + f (x) = and 2 6 24 x2 The function P(x) is the Taylor polynomial of degree n = 2 for f (x) expanded about x = 0. For the first function f (0.01) = Professor Henry Arguello Fuentes e0.01 − 1 − 0.01 1.010050 − 1 − 0.01 = = 0.5. 2 (0.01) 0.0001 Numerical methods 2020 57 / 80 Loss of Significance Example: Compare the results of calculating f (0.01) and P(0.01) using six digits and rounding, where 1 x x2 ex − 1 − x P(x) = + + f (x) = and 2 6 24 x2 The function P(x) is the Taylor polynomial of degree n = 2 for f (x) expanded about x = 0. For the first function f (0.01) = e0.01 − 1 − 0.01 1.010050 − 1 − 0.01 = = 0.5. 2 (0.01) 0.0001 For the second function P(0.01) = 1 0.01 0.0001 + + = 0.5 + 0.001667 + 0.000004 = 0.501671. 2 6 24 Professor Henry Arguello Fuentes Numerical methods 2020 57 / 80 Loss of Significance Example: Compare the results of calculating f (0.01) and P(0.01) using six digits and rounding, where 1 x x2 ex − 1 − x P(x) = + + f (x) = and 2 6 24 x2 The function P(x) is the Taylor polynomial of degree n = 2 for f (x) expanded about x = 0. For the first function f (0.01) = e0.01 − 1 − 0.01 1.010050 − 1 − 0.01 = = 0.5. 2 (0.01) 0.0001 For the second function P(0.01) = 1 0.01 0.0001 + + = 0.5 + 0.001667 + 0.000004 = 0.501671. 2 6 24 The answer P(0.01) = 0.501671 contains less error and it is the same as that obtained rounding the true answer 0.5016708416805... to six digits. Professor Henry Arguello Fuentes Numerical methods 2020 57 / 80 Contents 1 Introduction 2 Binary numbers 3 Error Analysis Absolute and relative error Truncation Error Round-off Error Loss of Significance Order of Approximation Propagation of Error Professor Henry Arguello Fuentes Numerical methods 2020 58 / 80 O(hn ) Order of Approximation For functions Definition 4. The function f (h) is said to be big Oh of g(h), denoted f (h) = O(g(h)), if there exist constants C and c such that: |f (h)| ≤ C|g(h)| Professor Henry Arguello Fuentes whenever h ≥ c. Numerical methods (11) 2020 59 / 80 O(hn ) Order of Approximation For functions Definition 4. The function f (h) is said to be big Oh of g(h), denoted f (h) = O(g(h)), if there exist constants C and c such that: |f (h)| ≤ C|g(h)| whenever h ≥ c. (11) Example: Consider f (x) = x2 + 1 and g(x) = x3 . Professor Henry Arguello Fuentes Numerical methods 2020 59 / 80 O(hn ) Order of Approximation For functions Definition 4. The function f (h) is said to be big Oh of g(h), denoted f (h) = O(g(h)), if there exist constants C and c such that: |f (h)| ≤ C|g(h)| whenever h ≥ c. (11) Example: Consider f (x) = x2 + 1 and g(x) = x3 . Since x2 ≤ x3 and 1 ≤ x3 for x ≥ 1 Professor Henry Arguello Fuentes Numerical methods 2020 59 / 80 O(hn ) Order of Approximation For functions Definition 4. The function f (h) is said to be big Oh of g(h), denoted f (h) = O(g(h)), if there exist constants C and c such that: |f (h)| ≤ C|g(h)| whenever h ≥ c. (11) Example: Consider f (x) = x2 + 1 and g(x) = x3 . Since x2 ≤ x3 and 1 ≤ x3 for x ≥ 1 it follows that x2 + 1 ≤ 2x3 for x ≥ 1. Professor Henry Arguello Fuentes Numerical methods 2020 59 / 80 O(hn ) Order of Approximation For functions Definition 4. The function f (h) is said to be big Oh of g(h), denoted f (h) = O(g(h)), if there exist constants C and c such that: |f (h)| ≤ C|g(h)| whenever h ≥ c. (11) Example: Consider f (x) = x2 + 1 and g(x) = x3 . Since x2 ≤ x3 and 1 ≤ x3 for x ≥ 1 it follows that x2 + 1 ≤ 2x3 for x ≥ 1. Therefore, f (x) = O(g(x)), whenever h ≥ 1. The big Oh notation provides an useful way of describing the rate of growth of a function in terms of the well-known elementary function (xn , x1/n , ax , loga (x), etc.). Professor Henry Arguello Fuentes Numerical methods 2020 59 / 80 O(hn ) Order of Approximation For sequences Definition 5. Let xn = 1∞ and yn = 1∞ be two sequences. The sequence xn is said to be of order big Oh of yn , denoted xn = O(yn ), if there exist constants C and N such that |xn | ≤ C|yn | whenever n ≥ N. Professor Henry Arguello Fuentes Numerical methods (12) 2020 60 / 80 O(hn ) Order of Approximation For sequences Definition 5. Let xn = 1∞ and yn = 1∞ be two sequences. The sequence xn is said to be of order big Oh of yn , denoted xn = O(yn ), if there exist constants C and N such that |xn | ≤ C|yn | whenever n ≥ N. (12) Example: n2 − 1 =O n3 n2 − 1 n2 1 1 , since ≤ = whenever n ≥ 1. 3 3 n n n n Professor Henry Arguello Fuentes Numerical methods 2020 60 / 80 O(hn ) Order of Approximation Definition 6. Assume that f (h) is approximated by the function p(h) and there exist a real constant M > 0 and a positive integer n so that |f (h) − p(h)| ≤ M for sufficiently small h. hn (13) We say that p(h) approximates f (h) with order of approximation O(hn ) and write f (h) = p(h) + O(hn ) (14) When relation (13) is rewritten in the form |f (h) − p(h)| ≤ M|hn |, we see that the notation O(hn ) stands in place of the error bound M|hn |. Professor Henry Arguello Fuentes Numerical methods 2020 61 / 80 O(hn ) Order of Approximation Theorem 2. Order of approximation for basic operations Assume that f (h) = p(h) + O(hn ), g(h) = q(h) + O(hm ), and r = min(m, n). Then f (h) + g(h) = p(h) + q(h) + O(hr ), (15) f (h)g(h) = p(h)q(h) + O(hr ), (16) and p(h) f (h) = + O(hr ) provided that g(h) 6= 0 and q(h) 6= 0. g(h) q(h) Professor Henry Arguello Fuentes Numerical methods 2020 (17) 62 / 80 O(hn ) Order of Approximation Theorem 3. (Taylor’s Theorem). Assume f ∈ Cn+1 [a, b]. If both x0 and x = x0 + h lie in [a, b], then f (x0 + h) = n X f (k)(x0 ) k=0 k! hk + O(hn+1 ). (18) Additional properties: (i) O(hp ) + O(hp ) = O(hp ), (ii) O(hp ) + O(hq ) = O(hr ), where r = min(p, q), and (iii) O(hp )O(hq ) = O(hs ), where s = p + q. Professor Henry Arguello Fuentes Numerical methods 2020 63 / 80 O(hn ) Order of Approximation Example: Consider the Taylor polynomial expansions eh = 1+h+ h2 h3 + +O(h4 ) 2! 3! and cos(h) = 1 − h2 h4 + + O(h6 ). 2! 4! Determine the order of approximation for their sum and product. Professor Henry Arguello Fuentes Numerical methods 2020 64 / 80 O(hn ) Order of Approximation Example: Consider the Taylor polynomial expansions eh = 1+h+ h2 h3 + +O(h4 ) 2! 3! and cos(h) = 1 − h2 h4 + + O(h6 ). 2! 4! Determine the order of approximation for their sum and product. For the sum we have h2 h4 h2 h3 + + O(h4 ) + 1 − + + O(h6 ) 2! 3! 2! 4! h3 h4 =2+h+ + O(h4 ) + + O(h6 ) 3! 4! eh + cos(h) =1 + h + Professor Henry Arguello Fuentes Numerical methods 2020 64 / 80 O(hn ) Order of Approximation Since O(h4 ) + h4 = O(h4 ) and O(h4 ) + O(h6 ) = O(h4 ), this reduces to 4! eh + cos(h) = 2 + h + h3 + O(h4 ), 3! and the order of approximation is O(h4 ). Professor Henry Arguello Fuentes Numerical methods 2020 65 / 80 O(hn ) Order of Approximation The product is treated similarly: h3 h2 h4 h2 + + O(h4 ) 1− + + O(h6 ) eh cos(h) = 1 + h + 2! 3! 2! 4! 2 3 2 4 h h h h = 1+h+ + 1− + + 2! 3! 2! 4! h2 h3 h2 h4 6 1+h+ + O(h ) + 1 − + O(h4 ) + O(h4 )O(h6 ) 2! 3! 2! 4! =1 + h − h3 5h4 h5 h6 h7 − − + + + O(h6 ) + O(h4 ) + O(h4 )O(h6 ). 3 24 24 48 144 Professor Henry Arguello Fuentes Numerical methods 2020 66 / 80 O(hn ) Order of Approximation Since O(h4 )O(h6 ) = O(h10 ) and −5h4 h5 h6 h7 − + + + O(h6 ) + O(h4 ) + O(h10 ) 24 24 48 144 Since O(h6 ) + O(h4 ) + O(h10 ) = O(h4 ), the preceding equation is simplified to yield eh cos(h) = 1 + h + h3 + O(h4 ), 3 and the order of approximation is O(h4 ). Professor Henry Arguello Fuentes Numerical methods 2020 67 / 80 Order of Convergence of a Sequence Convergence of a sequence Definition 7. Suppose that limn−→∞ xn = x and {rn }∞ n=1 is a sequence with limn−→∞ rn = 0. We say that {xn }∞ converges to x with the order n=1 of convergence O(rn ), if there exists a constant K ≥ 0 such that |xn − x| ≤ K for n sufficiently large. |rn | (19) This is indicated by writing xn = x + O(rn ), or xn −→ x with order of convergence O(rn ) Professor Henry Arguello Fuentes Numerical methods 2020 68 / 80 Order of Convergence of a Sequence Definition 7. Example: Let xn = cos(n)/n2 and rn = 1/n2 then, limn−→∞ xn = 0 with a rate of convergence O(1/n2 ). This follows immediately from the relation |cos(n)/n2 | = |cos(n) ≤ 1| for all n. |1/n2 | Professor Henry Arguello Fuentes Numerical methods 2020 69 / 80 Contents 1 Introduction 2 Binary numbers 3 Error Analysis Absolute and relative error Truncation Error Round-off Error Loss of Significance Order of Approximation Propagation of Error Professor Henry Arguello Fuentes Numerical methods 2020 70 / 80 Propagation of Error Addition consider two numbers p and q (the true values) with the approximate values b p and b q, which contains errors p and q , respectively. Starting with p = b p + p and q = b q + q , the sum is (20) p + q = (b p + p ) + (b q + q ) = (b p+b q) + (p + q ). Hence, for addition, the error in the sum is the sum of the errors in the addends. s = p + q . Professor Henry Arguello Fuentes Numerical methods 2020 71 / 80 Propagation of Error The propagation of error in multiplication is more complicated. The product is (21) pq = (b p + p )(b q + q ) = b pb q+b pp + b qp + p q . Professor Henry Arguello Fuentes Numerical methods 2020 72 / 80 Propagation of Error The propagation of error in multiplication is more complicated. The product is (21) pq = (b p + p )(b q + q ) = b pb q+b pp + b qp + p q . Hence, if b p and b q are larger than 1 in absolute value, the terms b pq and b qp show that there is a possibility of magnification of the original errors p and q . Insights are gained if we look at the relative error. Rearrange the terms in (21) to get pq − b pb q=b pq + b qp + p q . Professor Henry Arguello Fuentes Numerical methods (22) 2020 72 / 80 Propagation of Error The propagation of error in multiplication is more complicated. The product is (21) pq = (b p + p )(b q + q ) = b pb q+b pp + b qp + p q . Hence, if b p and b q are larger than 1 in absolute value, the terms b pq and b qp show that there is a possibility of magnification of the original errors p and q . Insights are gained if we look at the relative error. Rearrange the terms in (21) to get pq − b pb q=b pq + b qp + p q . (22) Suppose that b p 6= 0 and b q 6= 0; then we can divide (22) by pq to obtain the relative error in the product pq: Rpq = b b pq + b qp + p q pq b qp p q pq − b pb q = = + + . pq pq pq pq pq Professor Henry Arguello Fuentes Numerical methods (23) 2020 72 / 80 Propagation of Error Furthermore, suppose that b p and b q are good approximations for b p and b q; then b p/p ≈ 1, b q/q ≈ 1, and Rp Rq = (p /p)(q /q) ≈ 0 (Rp and Rq are the relative errors in the approximations b p and b q). Then making these substitutions yields the simplified relationship Rpq = Professor Henry Arguello Fuentes pq − b pb q ≈ q /q + p /p + 0 = Rq + Rp . pq Numerical methods (24) 2020 73 / 80 Propagation of Error Furthermore, suppose that b p and b q are good approximations for b p and b q; then b p/p ≈ 1, b q/q ≈ 1, and Rp Rq = (p /p)(q /q) ≈ 0 (Rp and Rq are the relative errors in the approximations b p and b q). Then making these substitutions yields the simplified relationship Rpq = pq − b pb q ≈ q /q + p /p + 0 = Rq + Rp . pq (24) This shows that the relative error in the product pq is approximately the b and q b. sum of the relative errors in the approximations p A quality that is desirable for any numerical process is that a small error in the initial conditions will produce small changes in the final result. An algorithm with this feature is called stable; otherwise, it is called unstable. Professor Henry Arguello Fuentes Numerical methods 2020 73 / 80 Propagation of Error Definition 8. Suppose that represents an initial error and (n) represents the growth of the error after n steps. If |(n)| ≈ n, the growth of error is said to be linear. If |(n)| ≈ K n , the growth of error is called exponential. If K > 1, the exponential error growns without bound as n −→ ∞, and if 0 < K < 1, the exponential error diminishes to zero as n −→ ∞. Professor Henry Arguello Fuentes Numerical methods 2020 74 / 80 Propagation of error Example: Show that the following three schemes can be used with finiteprecision arithmetic to recursively generate the terms in the sequence {1/3n }∞ n=0 . r0 = 1 and rn = 1 rn−1 3 for n = 1, 2, · · · , (25) p0 = 1, p1 = 1 , 3 and pn = 4 1 pn−1 − pn−2 3 3 for n = 1, 2, · · · , (26) q0 = 1, q1 = 1 , 3 and qn = 10 qn−1 − qn−2 3 for n = 1, 2, · · · , (27) Professor Henry Arguello Fuentes Numerical methods 2020 75 / 80 Propagation of error Formula (25) is obvious. In (26) the difference equation has the general solution pn = A(1/3n ) + B. This can be verified by direct substitution: 4 1 4 A A 1 pn−1 − pn−2 = +B − +B 3 3 3 3n−1 3 3n−2 1 3 4 1 4 − B = A n + B = pn − n A− = n 3 3 3 3 3 Setting A = 1 and B = 0 will generate the sequence desired. Professor Henry Arguello Fuentes Numerical methods 2020 76 / 80 Propagation of error Formula (25) is obvious. In (26) the difference equation has the general solution pn = A(1/3n ) + B. This can be verified by direct substitution: 4 1 4 A A 1 pn−1 − pn−2 = +B − +B 3 3 3 3n−1 3 3n−2 1 3 4 1 4 − B = A n + B = pn − n A− = n 3 3 3 3 3 Setting A = 1 and B = 0 will generate the sequence desired. In (27) the difference equation has the general solution qn = A(1/3n ) + B3n . This too verified by substitution: 10 10 A A n−1 n−2 qn−1 − qn−2 = + B3 − + B3 3 3 3n−1 3n−2 10 9 1 = − n A − (10 − 1)3n−1 B = A n + B3n = qn 3n 3 3 Professor Henry Arguello Fuentes Numerical methods 2020 76 / 80 Propagation of error Example: Generate approximations to the sequences {xn } = 1/3n using hemes r0 = 0.99996 and 1 rn = rn−1 3 p0 = 1, p1 = 0.33332, and 4 1 pn = pn−1 − pn−2 3 3 for n = 1, 2, · · · , 10 pn−1 − pn−2 3 for n = 1, 2, · · · , q0 = 1, q1 = 0.33332, and qn = for n = 1, 2, · · · , (28) (29) (30) In (28) the initial error in r0 is 0.00004, and in (29) and (30) the initial errors in p1 and q1 are 0.000013. Investigate the propagation of error for each scheme. Professor Henry Arguello Fuentes Numerical methods 2020 77 / 80 Propagation of error Table: Sequence xn = 1/3n and the approximations rn , pn , and qn n 0 1 2 3 4 5 6 7 8 9 10 xn 1.0000000000 0.3333333333 0.1111111111 0.0370370370 0.0123456790 0.0041152263 0.0013717421 0.0004572474 0.0001524158 0.0000508053 0.0000169351 Professor Henry Arguello Fuentes rn 0.9999600000 0.3333200000 0.1111066667 0.0370355556 0.0123451852 0.0041150617 0.0013716872 0.0004572291 0.0001524097 0.0000508032 0.0000169344 pn 1.0000000000 0.3333200000 0.1110933333 0.0370177778 0.0123259259 0.0040953086 0.0013517695 0.0004372565 0.0001324188 0.0000308063 -0.0000030646 Numerical methods qn 1.0000000000 0.3333200000 0.1110666667 0.0369022222 0.0119407407 0.0029002469 -0.0022732510 -0.0104777503 -0.0326525834 -0.0983641945 -0.2952280648 2020 78 / 80 Propagation of error Table: Error sequences xn − rn , xn − pn , and xn − qn n 0 1 2 3 4 5 6 7 8 9 10 xn − rn 0.0000400000 0.0000133333 0.0000044444 0.0000014815 0.0000004938 0.0000001646 0.0000000549 0.0000000183 0.0000000061 0.0000000020 0.0000000007 Professor Henry Arguello Fuentes xn − pn 0.0000000000 0.0000133333 0.0000177778 0.0000192593 0.0000197531 0.0000199177 0.0000199726 0.0000199909 0.0000199970 0.0000199990 0.0000199997 Numerical methods xn − qn 0.0000000000 0.0000133333 0.0000444444 0.0001348148 0.0004049383 0.0012149794 0.0036449931 0.0109349977 0.0328049992 0.0984149997 0.2952449999 2020 79 / 80 Propagation of error −5 6 x 10 The error for {rn } is stable and decreases in an exponential manner. n x −r n 4 2 The error {pn } is stable. 0 0 2 4 n 6 8 10 The error for {qn } is unstable and grows at an exponential rate. −5 2 x 10 xn−pn 1.5 1 0.5 0 0 2 4 6 8 10 6 8 10 n 0.4 Although the error for {pn } is stable, the terms pn −→ 0 as n −→ ∞, so that the error eventually dominates and the terms past p8 have no significant digits. xn−qn 0.3 0.2 0.1 0 0 2 4 n Professor Henry Arguello Fuentes Numerical methods 2020 80 / 80