Arithmetic and Logic Unit, First Part
Computer Architecture (Arquitectura de Computadoras)
Arturo Díaz Pérez
Centro de Investigación y de Estudios Avanzados del IPN
Laboratorio de Tecnologías de Información
[email protected]

Typical Operations
♦ Data movement
■ load (from memory), store (to memory)
■ memory-to-memory move
■ register-to-register move
■ input (from I/O device), output (to I/O device)
■ push, pop (to/from stack)
♦ Arithmetic
■ integer (binary + decimal) or FP
■ Add, Subtract, Multiply, Divide
♦ Shift
■ shift left/right, rotate left/right
♦ Logical
■ not, and, or, set, clear

Operands for ALU instructions
♦ ALU instructions combine operands (e.g. ADD)
♦ Number of explicit operands
■ Two - destination equals one source
■ Three - orthogonal

MIPS Addressing Modes/Instruction Formats
• All instructions are 32 bits wide
■ Register (direct): fields op, rs, rt, rd; the operand is a register
■ Immediate: fields op, rs, rt, immed; the operand is the immed field
■ Base+index: fields op, rs, rt, immed; the memory address is register + immed
■ PC-relative: fields op, rs, rt, immed; the memory address is PC + immed
• Register indirect?

MIPS: Register State
♦ 32 integer registers
■ $0 is hardwired to 0
■ $31 is the return address register
■ software convention for other registers
♦ 32 single-precision FP registers or 16 double-precision FP registers
♦ PC and other special registers

MIPS I Operation Overview
♦ Arithmetic and logical:
■ Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU
■ AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI
■ SLL, SRL, SRA, SLLV, SRLV, SRAV
♦ Memory access:
■ LB, LBU, LH, LHU, LW, LWL, LWR
■ SB, SH, SW, SWL, SWR

MIPS arithmetic instructions
Instruction          Example           Meaning                          Comments
add                  add $1,$2,$3      $1 = $2 + $3                     3 operands; exception possible
subtract             sub $1,$2,$3      $1 = $2 – $3                     3 operands; exception possible
add immediate        addi $1,$2,100    $1 = $2 + 100                    + constant; exception possible
add unsigned         addu $1,$2,$3     $1 = $2 + $3                     3 operands; no exceptions
subtract unsigned    subu $1,$2,$3     $1 = $2 – $3                     3 operands; no exceptions
add imm. unsign.     addiu $1,$2,100   $1 = $2 + 100                    + constant; no exceptions
multiply             mult $2,$3        Hi, Lo = $2 x $3                 64-bit signed product
multiply unsigned    multu $2,$3       Hi, Lo = $2 x $3                 64-bit unsigned product
divide               div $2,$3         Lo = $2 ÷ $3, Hi = $2 mod $3     Lo = quotient, Hi = remainder
divide unsigned      divu $2,$3        Lo = $2 ÷ $3, Hi = $2 mod $3     Unsigned quotient & remainder
Move from Hi         mfhi $1           $1 = Hi                          Used to get copy of Hi
Move from Lo         mflo $1           $1 = Lo                          Used to get copy of Lo
♦ Which add for address arithmetic? Which add for integers?

MIPS logical instructions
Instructions: and, or, xor, nor, and immediate, or immediate, xor immediate, shift left logical, shift right logical, shift right arithm. (each shift by constant or by variable)
Instruction            Example           Meaning            Comment
and                    and $1,$2,$3      $1 = $2 & $3       3 reg. operands; logical AND
or                     or $1,$2,$3       $1 = $2 | $3       3 reg. operands; logical OR
xor                    xor $1,$2,$3      $1 = $2 ⊕ $3       3 reg. operands; logical XOR
nor                    nor $1,$2,$3      $1 = ~($2 | $3)    3 reg. operands; logical NOR
and immediate          andi $1,$2,10     $1 = $2 & 10       Logical AND reg, constant
or immediate           ori $1,$2,10      $1 = $2 | 10       Logical OR reg, constant
xor immediate          xori $1,$2,10     $1 = $2 ⊕ 10       Logical XOR reg, constant
shift left logical     sll $1,$2,10      $1 = $2 << 10      Shift left by constant
shift right logical    srl $1,$2,10      $1 = $2 >> 10      Shift right by constant
shift right arithm.    sra $1,$2,10      $1 = $2 >> 10      Shift right (sign extend)
shift left logical     sllv $1,$2,$3     $1 = $2 << $3      Shift left by variable
shift right logical    srlv $1,$2,$3     $1 = $2 >> $3      Shift right by variable
shift right arithm.    srav $1,$2,$3     $1 = $2 >> $3      Shift right arith. by variable

Details of the MIPS instruction set
♦ Register zero always has the value zero (even if you try to write it).
♦ Branch/jump and link put the return address, PC+4 or PC+8, into the link register (R31) (depends on logical vs physical architecture).
♦ All instructions change all 32 bits of the destination register (including lui, lb, lh), and all read all 32 bits of their sources (add, sub, and, or, …).
♦ Immediate operands of arithmetic and logical instructions are extended as follows:
■ immediates of logical ops are zero-extended to 32 bits
■ immediates of arithmetic ops are sign-extended to 32 bits (including addiu)
♦ The data loaded by the byte and halfword load instructions are extended as follows:
■ lbu, lhu are zero-extended
■ lb, lh are sign-extended
♦ Overflow can occur in these arithmetic and logical instructions:
■ add, sub, addi
■ it cannot occur in addu, subu, addiu, and, or, xor, nor, shifts, mult, multu, div, divu

MIPS: Instruction Set Format
■ load/store architecture with 3 explicit operands (ALU ops)
■ fixed 32-bit instructions
■ 3 instruction formats
» R-Type
» I-Type
» J-Type
■ 6 instruction set
groups:
» load/store - data movement operations
» computational - arithmetic, logical, and shift operations
» jump/branch - including call and returns
» coprocessor - FP instructions
» coprocessor0 - memory management and exception handling
» special - accessing special registers, system calls, breakpoint instructions, etc.

R2000/3000 Instruction Formats
♦ R-type (register), e.g.
add $8, $17, $18      # $8 = $17 + $18
Bits       31-26    25-21   20-16   15-11   10-6    5-0
Field      OpCode   rs      rt      rd      shamt   funct
Example    0        17      18      8       0       32

R2000/3000 Instruction Formats
• I-type (immediate), e.g.
addi $8, $17, -44     # $8 = $17 - 44
lw $8, -44($17)       # $8 = M[$17 - 44]
beq $17, $8, label    # if ($8 == $17) go to label
Bits       31-26    25-21   20-16   15-0
Field      OpCode   rs      rt      immediate
Example    “op”     17      8       -44

R2000/3000 Instruction Formats
• J-type (jump), e.g.
jal label             # call label; $31 = $pc + 8
Bits       31-26    25-0
Field      OpCode   target
Example    3        target

5 Steps of DLX Datapath
♦ Stages: Instruction Fetch; Instruction Decode/Register Fetch; Execute/Addr. Calc.; Memory Access; Write Back
[Figure: DLX datapath showing the PC, the +4 adder and NPC, instruction memory and IR, the register file with outputs A and B, the 16-to-32-bit sign extend, multiplexers, the Zero? test, the ALU and ALU Output, data memory, LMD and SMD]

Useful Circuits for Interconnection
♦ Four common and useful MSI circuits are:
■ Decoder
■ Demultiplexer
■ Encoder
■ Multiplexer
♦ Block-level outlines of MSI circuits:
[Figure: block outlines. Decoder: code in, entity out. Encoder: entity in, code out. Mux: data inputs and select in, one data output. Demux: one data input and select in, data outputs]

Decoders
♦ Codes are frequently used to represent entities.
♦ These codes can be identified (or decoded) using a decoder. Given a code, identify the entity.
♦ A decoder converts binary information from n input lines to (a maximum of) 2^n output lines.
♦ Known as an n-to-m-line decoder, or simply an n:m or n×m decoder (m ≤ 2^n).
♦ May be used to generate 2^n (or fewer) minterms of n input variables.

Decoders
♦ Example: if codes 00, 01, 10, 11 are used to identify four light bulbs, we may use a 2-bit decoder:
[Figure: 2×4 decoder with 2-bit code inputs X, Y and outputs F0 to F3 driving Bulb 0 to Bulb 3]
This is a 2×4 decoder which selects an output line based on the 2-bit code supplied.
Truth table:
X  Y  |  F0  F1  F2  F3
0  0  |  1   0   0   0
0  1  |  0   1   0   0
1  0  |  0   0   1   0
1  1  |  0   0   0   1

Encoder
♦ Encoding is the converse of decoding.
♦ Given a set of input lines, where one has been selected, provide a code corresponding to that line.
♦ Contains 2^n (or fewer) input lines and n output lines.
♦ Implemented with OR gates.
♦ An example:
[Figure: 4-to-2 encoder; inputs F0 to F3 are selected via switches, outputs D1 D0 form the 2-bit code]

Encoder
Truth table:
F0  F1  F2  F3  |  D1  D0
1   0   0   0   |  0   0
0   1   0   0   |  0   1
0   0   1   0   |  1   0
0   0   0   1   |  1   1
The remaining 12 input combinations (no line selected, or more than one line selected) are don’t cares: D1 = X, D0 = X.

Encoder
♦ With the help of a K-map (and the don’t-care conditions), we obtain:
D0 = F1 + F3
D1 = F2 + F3
♦ These correspond to the circuit:
[Figure: simple 4-to-2 encoder built from two OR gates]

Demultiplexer
♦ Given an input line and a set of selection lines, the demultiplexer will direct data from the input to a selected output line.
♦ An example of a 1-to-4 demultiplexer, with data input D and select lines S1, S0:
Y0 = D.S1'.S0'
Y1 = D.S1'.S0
Y2 = D.S1.S0'
Y3 = D.S1.S0
S1  S0  |  Y0  Y1  Y2  Y3
0   0   |  D   0   0   0
0   1   |  0   D   0   0
1   0   |  0   0   D   0
1   1   |  0   0   0   D

Demultiplexer
♦ The demultiplexer is actually identical to a decoder with enable: S1 and S0 drive the inputs of a 2×4 decoder and the data line D drives its enable input E, so that
Y0 = D.S1'.S0'
Y1 = D.S1'.S0
Y2 = D.S1.S0'
Y3 = D.S1.S0
Exercise: Provide the truth table for the above demultiplexer.

Multiplexer
♦ A multiplexer is a device which has
(i) a number of input lines
(ii) a number of selection lines
(iii) one output line
♦ It steers one of 2^n inputs to a single output line, using n selection lines. Also known as a data selector.
[Figure: 2^n:1 multiplexer block with 2^n inputs, n select lines and one output]

Multiplexer
Truth table for a 4-to-1 multiplexer:
I0  I1  I2  I3  |  S1  S0  |  Y
d0  d1  d2  d3  |  0   0   |  d0
d0  d1  d2  d3  |  0   1   |  d1
d0  d1  d2  d3  |  1   0   |  d2
d0  d1  d2  d3  |  1   1   |  d3
Equivalently, in condensed form:
S1  S0  |  Y
0   0   |  I0
0   1   |  I1
1   0   |  I2
1   1   |  I3
[Figure: 4:1 MUX block with inputs I0 to I3, select lines S1 S0 and output Y]

Binary Representation
Bits: b31 b30 b29 b28 ... b3 b2 b1 b0
Value: b31 × 2^31 + b30 × 2^30 + b29 × 2^29 + b28 × 2^28 + ... + b2 × 2^2 + b1 × 2^1 + b0 × 2^0
(0000 0000 0000 0000 0000 0000 0000 0000)_2 = (0)_10
(0000 0000 0000 0000 0000 0000 0000 0001)_2 = (1)_10
(0000 0000 0000 0000 0000 0000 0000 0010)_2 = (2)_10
(0000 0000 0000 0000 0000 0000 0000 1011)_2 = (11)_10

Signed Numbers
♦ Sign+Magnitude
♦ For n-bit numbers, the most significant bit is reserved for the sign; the remaining bits hold the magnitude.
(0000 0000 0000 0000 0000 0000 0000 1011)_2 = (11)_10
(1000 0000 0000 0000 0000 0000 0000 1011)_2 = (-11)_10

Signed Numbers
♦ For n-bit numbers, the negation of B in two’s complement is 2^n - B (this is one of the alternative ways of negating a two’s-complement number).
-B = 2^n - B
  0000 0000 0000 0000 0000 0000 0000 1011   = (11)_10
  1111 1111 1111 1111 1111 1111 1111 0100   (every bit complemented)
+                                       1
  1111 1111 1111 1111 1111 1111 1111 0101   = (-11)_10

Signed Numbers
♦ For n-bit numbers, the negation of B in two’s complement is 2^n - B (this is one of the alternative ways of negating a two’s-complement number).
-B = 2^n - B
  1111 1111 1111 1111 1111 1111 1111 0101   = (-11)_10
  0000 0000 0000 0000 0000 0000 0000 1010   (every bit complemented)
+                                       1
  0000 0000 0000 0000 0000 0000 0000 1011   = (11)_10

Signed Number Systems
♦ Here are all the 4-bit numbers in the different systems.
♦ Positive numbers are the same in all three representations.
♦ Signed magnitude and one’s complement have two ways of representing 0. This makes things more complicated.
♦ Two’s complement has asymmetric ranges; there is one more negative number than positive numbers. Here, you can represent -8 but not +8.
♦ However, two’s complement is preferred because it has only one 0, and its addition algorithm is the simplest.
Decimal   S.M.   1’s comp.   2’s comp.
 7        0111   0111        0111
 6        0110   0110        0110
 5        0101   0101        0101
 4        0100   0100        0100
 3        0011   0011        0011
 2        0010   0010        0010
 1        0001   0001        0001
 0        0000   0000        0000
-0        1000   1111        n/a
-1        1001   1110        1111
-2        1010   1101        1110
-3        1011   1100        1101
-4        1100   1011        1100
-5        1101   1010        1011
-6        1110   1001        1010
-7        1111   1000        1001
-8        n/a    n/a         1000

Sign extension
♦ In everyday life, decimal numbers are assumed to have an infinite number of 0s in front of them. This helps in “lining up” numbers.
♦ To subtract 3 from 231, for instance, you can imagine:
  231
- 003
  228
♦ You need to be careful in extending signed binary numbers, because the leftmost bit is the sign and not part of the magnitude.
♦ If you just add 0s in front, you might accidentally change a negative number into a positive one!
♦ For example, going from 4-bit to 8-bit numbers:
■ 0101 (+5) should become 0000 0101 (+5).
■ But 1100 (-4) should become 1111 1100 (-4).
♦ The proper way to extend a signed binary number is to replicate the sign bit, so the sign is preserved.
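The sign-extension rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the helper names `sign_extend` and `to_signed` are hypothetical, and bit patterns are held in ordinary non-negative Python integers.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Widen a two's-complement bit pattern by replicating its sign bit."""
    if not 0 < from_bits <= to_bits:
        raise ValueError("need 0 < from_bits <= to_bits")
    value &= (1 << from_bits) - 1          # keep only the low from_bits bits
    if (value >> (from_bits - 1)) & 1:     # sign bit set: number is negative
        # Fill bit positions from_bits .. to_bits-1 with copies of the sign bit.
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def to_signed(pattern: int, bits: int) -> int:
    """Interpret a bit pattern of the given width as a two's-complement value."""
    pattern &= (1 << bits) - 1
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


print(format(sign_extend(0b0101, 4, 8), "08b"))  # 00000101
print(format(sign_extend(0b1100, 4, 8), "08b"))  # 11111100
```

Both slide examples keep their value after widening: 0101 stays +5 and 1100 stays -4 when read back with `to_signed`.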

Two’s complement addition
♦ Negating a two’s complement number takes a bit of work, but addition is much easier than with the other two systems.
♦ To find A + B, you just have to:
■ Do unsigned addition on A and B, including their sign bits.
■ Ignore any carry out.
♦ For example, to find 0111 + 1100, or (+7) + (-4):
■ First add 0111 + 1100 as unsigned numbers:
   0111
+  1100
  10011
■ Discard the carry out (1).
■ The answer is 0011 (+3).

Another two’s complement example
♦ Let’s try adding two negative numbers: 1101 + 1110, or (-3) + (-2) in decimal.
♦ Adding the numbers gives 11011:
   1101
+  1110
  11011
♦ Dropping the carry out (1) leaves us with the answer, 1011 (-5).

Why does this work?
♦ For n-bit numbers, the negation of B in two’s complement is 2^n - B (this is one of the alternative ways of negating a two’s-complement number).
A - B = A + (-B) = A + (2^n - B) = (A - B) + 2^n
♦ If A ≥ B, then (A - B) is non-negative, and 2^n represents a carry out of 1. Discarding this carry out is equivalent to subtracting 2^n, which leaves us with the desired result (A - B).
♦ If A < B, then (A - B) is negative, there is no carry out, and the sum is 2^n + (A - B) = 2^n - (B - A). This is the desired negative result, A - B = -(B - A), in two’s complement form.

Signed overflow
♦ With two’s complement and a 4-bit adder, for example, the largest representable decimal number is +7, and the smallest is -8.
♦ What if you try to compute 4 + 5, or (-4) + (-5)?
   0100   (+4)          1100   (-4)
+  0101   (+5)       +  1011   (-5)
  01001   (-7)         10111   (+7)
♦ We cannot just include the carry out to produce a five-digit result, as for unsigned addition. If we did, (-4) + (-5) would result in +23!
♦ Also, unlike the case with unsigned numbers, the carry out cannot be used to detect overflow.
■ In the example on the left, the carry out is 0 but there is overflow.
■ Conversely, there are situations where the carry out is 1 but there is no overflow.

Detecting signed overflow
♦ The easiest way to detect signed overflow is to look at all the sign bits.
   0100   (+4)          1100   (-4)
+  0101   (+5)       +  1011   (-5)
  01001   (-7)         10111   (+7)
♦ Overflow occurs only in the two situations above:
■ If you add two positive numbers and get a negative result.
■ If you add two negative numbers and get a positive result.
♦ Overflow cannot occur if you add a positive number to a negative number. Do you see why?

Refined Requirements
(1) Functional Specification
inputs: two 32-bit operands A, B; 4-bit mode m
outputs: 32-bit result S, 1-bit carry c, 1-bit overflow ovf
operations: add, addu, sub, subu, and, or, xor, nor, slt, sltu
(2) Block Diagram (powerview symbol, VHDL entity)
[Figure: ALU block with 32-bit inputs A and B, 4-bit mode input m, 32-bit output S, and 1-bit outputs c and ovf]

Gate-level Design: Half Adder
♦ Design procedure:
1) State the problem.
Example: Build a Half Adder to add two bits.
2) Determine and label the inputs and outputs of the circuit.
Example: Two inputs, X and Y, and two outputs, C and S, where the Half Adder computes (X + Y).
3) Draw the truth table.
X  Y  |  C  S
0  0  |  0  0
0  1  |  0  1
1  0  |  0  1
1  1  |  1  0

Gate-level Design: Half Adder
4) Obtain the simplified Boolean functions.
Example:
C = X.Y
S = X'.Y + X.Y' = X⊕Y
5) Draw the logic diagram.
[Figure: Half Adder; an XOR gate on X, Y produces S and an AND gate on X, Y produces C]

Gate-level Design: Full Adder
♦ Half-adder adds up only two bits.
♦ To add two binary numbers, we need to add 3 bits (including the carry).
♦ Example:
[Figure: column-by-column addition of two binary numbers, with rows for the carry, X, Y and the sum S]
♦ We need a Full Adder (so called because it can be made from two half-adders): inputs X, Y, Z; outputs S and C of (X + Y + Z).

Gate-level Design: Full Adder
Truth table:
X  Y  Z  |  C  S
0  0  0  |  0  0
0  0  1  |  0  1
0  1  0  |  0  1
0  1  1  |  1  0
1  0  0  |  0  1
1  0  1  |  1  0
1  1  0  |  1  0
1  1  1  |  1  1
Note:
Z - carry in (to the current position)
C - carry out (to the next position)
Using K-maps of C and S over X and YZ, the simplified SOP forms are:
C = X.Y + X.Z + Y.Z
S = X'.Y'.Z + X'.Y.Z' + X.Y'.Z' + X.Y.Z

Gate-level Design: Full Adder
Alternative formulae using algebraic manipulation:
C = X.Y + X.Z + Y.Z
  = X.Y + (X + Y).Z
  = X.Y + ((X⊕Y) + X.Y).Z
  = X.Y + (X⊕Y).Z + X.Y.Z
  = X.Y + (X⊕Y).Z
S = X'.Y'.Z + X'.Y.Z' + X.Y'.Z' + X.Y.Z
  = X'.(Y'.Z + Y.Z') + X.(Y'.Z' + Y.Z)
  = X'.(Y⊕Z) + X.(Y⊕Z)'
  = X⊕(Y⊕Z) or (X⊕Y)⊕Z

Gate-level Design: Full Adder
Circuit for the above formulae:
C = X.Y + (X⊕Y).Z
S = (X⊕Y)⊕Z
[Figure: gate-level full adder; the first stage forms (X⊕Y) and (X.Y), the second stage combines them with Z to give S and C]
Full Adder made from two Half-Adders (+ OR gate).

Gate-level (SSI) Design: Full Adder
Circuit for the above formulae:
C = X.Y + (X⊕Y).Z
S = (X⊕Y)⊕Z
Block diagrams:
[Figure: two Half Adder blocks; the first takes X, Y and produces Sum (X⊕Y) and Carry (X.Y); the second takes (X⊕Y) and Z and produces S; an OR gate combines the two carries into C]
Full Adder made from two Half-Adders (+ OR gate).
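The construction of a full adder from two half adders and an OR gate can be checked against the truth table with a short Python sketch. The function names `half_adder` and `full_adder` are illustrative assumptions; bits are modelled as the integers 0 and 1.

```python
def half_adder(x: int, y: int) -> tuple[int, int]:
    """Return (S, C) with S = X xor Y and C = X.Y."""
    return x ^ y, x & y


def full_adder(x: int, y: int, z: int) -> tuple[int, int]:
    """Full adder built from two half adders plus an OR gate.

    S = (X xor Y) xor Z
    C = X.Y + (X xor Y).Z
    """
    s1, c1 = half_adder(x, y)    # first half adder: X xor Y and X.Y
    s, c2 = half_adder(s1, z)    # second half adder adds the carry in Z
    return s, c1 | c2            # OR gate merges the two carries


# Print the truth table in the column order X Y Z C S used on the slides.
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            s, c = full_adder(x, y, z)
            print(x, y, z, c, s)
```

The two carries can never both be 1 (if X.Y = 1 then X⊕Y = 0), which is why a plain OR gate is enough to combine them.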

4-bit Parallel Adder
Consider a circuit that adds two 4-bit numbers and a carry-in to produce a 5-bit result:
[Figure: black-box view of a 4-bit parallel adder with inputs X4 X3 X2 X1, Y4 Y3 Y2 Y1 and C1, and outputs C5 and S4 S3 S2 S1]
A 5-bit result is sufficient because the largest result is:
(1111)_2 + (1111)_2 + (1)_2 = (11111)_2

4-bit Parallel Adder
The truth table for 9 inputs is very big, i.e. 2^9 = 512 entries:
X4X3X2X1   Y4Y3Y2Y1   C1  |  C5   S4S3S2S1
0 0 0 0    0 0 0 0    0   |  0    0 0 0 0
0 0 0 0    0 0 0 0    1   |  0    0 0 0 1
0 0 0 0    0 0 0 1    0   |  0    0 0 0 1
...        ...        ... |  ...  ...
0 1 0 1    1 1 0 1    1   |  1    0 0 1 1
...        ...        ... |  ...  ...
1 1 1 1    1 1 1 1    1   |  1    1 1 1 1
Simplification is very complicated.

4-bit Parallel Adder
♦ An alternative design is possible.
♦ The addition formula for each pair of bits (with carry in),
C_{i+1} S_i = X_i + Y_i + C_i
has the same function as a full adder:
C_{i+1} = X_i.Y_i + (X_i ⊕ Y_i).C_i
S_i = X_i ⊕ Y_i ⊕ C_i

4-bit Parallel Adder
Cascading 4 full adders via their carries, we get:
[Figure: four full adders (FA) in a chain; stage i takes X_i, Y_i and C_i and produces S_i and C_{i+1}; C1 is the carry input and C5 the final carry output]

Parallel Adders
♦ Note that the carry is propagated by cascading the carry from one full adder to the next.
♦ Called a Parallel Adder because the inputs are presented simultaneously (in parallel). Also called a Ripple-Carry Adder.

16-bit Parallel Adder
♦ Larger parallel adders can be built from smaller ones.
♦ Example: a 16-bit parallel adder can be constructed from four 4-bit parallel adders:
[Figure: a 16-bit parallel adder made of four 4-bit parallel adders; the adders take X4..X1/Y4..Y1, X8..X5/Y8..Y5, X12..X9/Y12..Y9 and X16..X13/Y16..Y13 and produce S4..S1, S8..S5, S12..S9 and S16..S13; the carries C5, C9 and C13 link the blocks, C1 is the carry in and C17 the carry out]

But What about Performance?
♦ The critical path of an n-bit ripple-carry adder is n × CP, where CP is the critical path of one 1-bit stage.
♦ Design trick: throw hardware at it.
[Figure: four 1-bit ALUs in a chain; stage i takes A_i, B_i and CarryIn_i and produces Result_i and CarryOut_i, which feeds CarryIn_{i+1}]

Calculation of Circuit Delays
In general, consider a logic gate with delay t.
If its inputs are stable at times t1, t2, ..., tn, respectively, then the earliest time at which the output will be stable is:
max(t1, t2, ..., tn) + t
To calculate the delays of all outputs of a combinational circuit, repeat the above rule for all gates.

Calculation of Circuit Delays
As a simple example, consider the full adder circuit where all inputs are available at time 0. (Assume each gate has delay t.)
■ X⊕Y is stable at max(0,0) + t = t
■ S = (X⊕Y)⊕Z is stable at max(t,0) + t = 2t
■ X.Y is stable at t, and (X⊕Y).Z is stable at 2t
■ C = X.Y + (X⊕Y).Z is stable at max(t,2t) + t = 3t
The outputs S and C therefore experience delays of 2t and 3t, respectively.

Calculation of Circuit Delays
More complex example: 4-bit parallel adder.
[Figure: four cascaded full adders; all X_i and Y_i inputs and the carry-in C1 are stable at time 0]

Calculation of Circuit Delays
Analyse the delay for the repeated block:
[Figure: full adder block with inputs X_i, Y_i, C_i and outputs S_i, C_{i+1}]
where X_i, Y_i are stable at 0t, while C_i is assumed to be stable at mt.
Performing the delay calculation gives:
■ X_i ⊕ Y_i is stable at max(0,0) + t = t
■ S_i is stable at max(t,mt) + t
■ X_i.Y_i is stable at t, and (X_i ⊕ Y_i).C_i is stable at max(t,mt) + t
■ C_{i+1} is stable at max(t,mt) + 2t

Calculation of Circuit Delays
Calculating:
When i=1, m=0: S1 = 2t and C2 = 3t.
When i=2, m=3: S2 = 4t and C3 = 5t.
When i=3, m=5: S3 = 6t and C4 = 7t.
When i=4, m=7: S4 = 8t and C5 = 9t.
In general, the outputs of an n-bit ripple-carry parallel adder have delay times:
S_n = ((n-1)*2+2)t
C_{n+1} = ((n-1)*2+3)t
The propagation delay of a ripple-carry parallel adder is proportional to the number of bits it handles.
Maximum delay: ((n-1)*2+3)t

Faster Circuits
Three ways of improving the speed of these circuits:
(i) Use better technology (e.g. ECL gates are faster than TTL gates), BUT
(a) faster technology is more expensive, needs more power, and allows a lower level of integration.
(b) there are physical limits (e.g. speed of light, size of atom).
(ii) Flatten gate-level designs into two-level circuits (sum-of-products/product-of-sums), BUT
(a) the designs become complicated for large circuits.
(b) product/sum terms need MANY inputs!
(iii) Use clever look-ahead techniques, BUT there are additional costs (hopefully reasonable).
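The ripple-carry delay figures derived before the list of speed-ups can be reproduced by applying the max(t1, ..., tn) + t rule stage by stage. The Python sketch below is an illustration under the slides' assumptions (every gate has delay t, all X_i, Y_i and the first carry-in are stable at time 0); the function name `ripple_delays` is hypothetical, and delays are expressed in units of t.

```python
def ripple_delays(n: int) -> list[tuple[int, int]]:
    """Return [(S_i delay, C_{i+1} delay)] for stages i = 1..n, in units of t."""
    delays = []
    m = 0                        # time at which the stage's carry-in is stable
    for _ in range(n):
        p = max(0, 0) + 1        # X_i xor Y_i
        s = max(p, m) + 1        # S_i = (X_i xor Y_i) xor C_i
        and_xy = max(0, 0) + 1   # X_i.Y_i
        and_pc = max(p, m) + 1   # (X_i xor Y_i).C_i
        c = max(and_xy, and_pc) + 1   # OR gate producing C_{i+1}
        delays.append((s, c))
        m = c                    # the carry ripples into the next stage
    return delays


for i, (s, c) in enumerate(ripple_delays(4), start=1):
    print(f"S{i} = {s}t, C{i + 1} = {c}t")
```

For n = 4 this yields 2t/3t, 4t/5t, 6t/7t and 8t/9t, matching the closed forms S_n = ((n-1)*2+2)t and C_{n+1} = ((n-1)*2+3)t.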

Look-Ahead Carry Adder
Consider the full adder:
[Figure: full adder with inputs X_i, Y_i, C_i, intermediate signals P_i, G_i and outputs S_i, C_{i+1}]
where the intermediate signals are labelled P_i and G_i, and defined as:
P_i = X_i ⊕ Y_i
G_i = X_i.Y_i
The outputs C_{i+1}, S_i, in terms of P_i, G_i, C_i, are:
S_i = P_i ⊕ C_i …(1)
C_{i+1} = G_i + P_i.C_i …(2)
If you look at equation (2),
G_i = X_i.Y_i is a carry generate signal
P_i = X_i ⊕ Y_i is a carry propagate signal

Look-Ahead Carry Adder
For a 4-bit ripple-carry adder, the equations to obtain the four carry signals are:
C_{i+1} = G_i + P_i.C_i
C_{i+2} = G_{i+1} + P_{i+1}.C_{i+1}
C_{i+3} = G_{i+2} + P_{i+2}.C_{i+2}
C_{i+4} = G_{i+3} + P_{i+3}.C_{i+3}
These formulae are deeply nested, as shown here for C_{i+2}:
[Figure: 4-level circuit for C_{i+2} = G_{i+1} + P_{i+1}.C_{i+1}, where C_{i+1} is itself built from C_i, P_i and G_i]

Look-Ahead Carry Adder
Nested formulae/gates cause the ripple-carry propagation delay.
We can reduce the delay by expanding and flattening the formulae for the carries. For example, for C_{i+2}:
C_{i+2} = G_{i+1} + P_{i+1}.C_{i+1}
        = G_{i+1} + P_{i+1}.(G_i + P_i.C_i)
        = G_{i+1} + P_{i+1}.G_i + P_{i+1}.P_i.C_i
[Figure: new, faster two-level circuit for C_{i+2} with inputs C_i, P_i, P_{i+1}, G_i, G_{i+1}]

Look-Ahead Carry Adder
Other carry signals can be flattened similarly.
C_{i+3} = G_{i+2} + P_{i+2}.C_{i+2}
        = G_{i+2} + P_{i+2}.(G_{i+1} + P_{i+1}.G_i + P_{i+1}.P_i.C_i)
        = G_{i+2} + P_{i+2}.G_{i+1} + P_{i+2}.P_{i+1}.G_i + P_{i+2}.P_{i+1}.P_i.C_i
C_{i+4} = G_{i+3} + P_{i+3}.C_{i+3}
        = G_{i+3} + P_{i+3}.(G_{i+2} + P_{i+2}.G_{i+1} + P_{i+2}.P_{i+1}.G_i + P_{i+2}.P_{i+1}.P_i.C_i)
        = G_{i+3} + P_{i+3}.G_{i+2} + P_{i+3}.P_{i+2}.G_{i+1} + P_{i+3}.P_{i+2}.P_{i+1}.G_i + P_{i+3}.P_{i+2}.P_{i+1}.P_i.C_i
Notice that the formulae get longer with higher carries.
Also, all carries are two-level “sum-of-products” expressions in terms of the generate signals (the Gs), the propagate signals (the Ps), and the first carry-in, C_i.
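The flattened carry formulae can be exercised with a small Python model. This is a sketch under stated assumptions rather than a description of any particular chip: the names `lookahead_carries` and `lookahead_add` are hypothetical, bit lists are given least significant bit first, and indices are 0-based (so `c0` plays the role of the first carry-in C_i on the slides).

```python
def lookahead_carries(x: list[int], y: list[int], c0: int) -> list[int]:
    """Return [C_1, ..., C_n] from the flattened two-level formulae.

    C_k = G_{k-1} + P_{k-1}.G_{k-2} + ... + P_{k-1}...P_1.G_0 + P_{k-1}...P_0.C_0
    """
    p = [a ^ b for a, b in zip(x, y)]   # propagate signals
    g = [a & b for a, b in zip(x, y)]   # generate signals
    carries = []
    for k in range(1, len(x) + 1):
        terms = []
        for j in range(k):              # product term that contains G_j
            term = g[j]
            for q in range(j + 1, k):
                term &= p[q]
            terms.append(term)
        term = c0                       # product term that contains C_0
        for q in range(k):
            term &= p[q]
        terms.append(term)
        carries.append(int(any(terms))) # OR of all product terms
    return carries


def lookahead_add(x: list[int], y: list[int], c0: int) -> tuple[list[int], int]:
    """Return (sum bits, carry out) using S_i = P_i xor C_i."""
    carries = lookahead_carries(x, y, c0)
    carry_in = [c0] + carries[:-1]      # carry entering each bit position
    s = [a ^ b ^ c for a, b, c in zip(x, y, carry_in)]
    return s, carries[-1]


# 0111 + 1100 with carry-in 0, bits listed LSB first.
print(lookahead_add([1, 1, 1, 0], [0, 0, 1, 1], 0))  # ([1, 1, 0, 0], 1)
```

Every carry is computed directly from the P, G and C_0 signals, with no carry depending on another carry, which is the property that removes the ripple delay at the cost of wider gates.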

Look-Ahead Carry Adder
We employ the lookahead formulae in this lookahead-carry adder circuit:
[Figure: lookahead-carry adder circuit]

Look-Ahead Carry Adder
The 74182 IC chip allows a faster lookahead adder to be built.
The maximum propagation delay is 4t (t to get the generate and propagate signals, 2t to get the carries and t for the sum signals), where t is the average gate delay.

Making a subtraction circuit
♦ We could build a subtraction circuit directly, similar to the way we made unsigned adders.
♦ However, by using two’s complement we can convert any subtraction problem into an addition problem. Algebraically,
A - B = A + (-B)
♦ So to subtract B from A, we can instead add the negation of B to A.
♦ This way we can re-use the unsigned adder hardware.

Why does this work?
♦ For n-bit numbers, the negation of B in two’s complement is 2^n - B (this is one of the alternative ways of negating a two’s-complement number).
A - B = A + (-B) = A + (2^n - B) = (A - B) + 2^n
♦ If A ≥ B, then (A - B) is non-negative, and 2^n represents a carry out of 1. Discarding this carry out is equivalent to subtracting 2^n, which leaves us with the desired result (A - B).
♦ If A < B, then (A - B) is negative, there is no carry out, and the sum is 2^n + (A - B) = 2^n - (B - A). This is the desired negative result, A - B = -(B - A), in two’s complement form.

A two’s complement subtraction circuit
♦ To find A - B with an adder, we’ll need to:
■ Complement each bit of B.
■ Set the adder’s carry in to 1.
♦ The net result is A + B’ + 1, where B’ + 1 is the two’s complement negation of B.
♦ A3, B3 and S3 here are actually sign bits.
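The recipe A + B’ + 1 can be tried out on bit patterns with a few lines of Python. This is a minimal sketch: the function name `subtract` and its default width of 4 bits are assumptions made for illustration, and the adder itself is modelled with ordinary integer addition.

```python
def subtract(a: int, b: int, n: int = 4) -> tuple[int, int]:
    """Compute A - B on n-bit patterns as A + B' + 1.

    Returns (n-bit result pattern, carry out of the adder).
    """
    mask = (1 << n) - 1
    b_complemented = ~b & mask             # complement each bit of B
    total = (a & mask) + b_complemented + 1   # carry in set to 1
    return total & mask, total >> n


print(subtract(0b0111, 0b0100))  # (3, 1): 7 - 4 = +3, carry out discarded
print(subtract(0b0100, 0b0111))  # (13, 0): 1101 is -3 in two's complement
```

The two calls correspond to the A ≥ B and A < B cases discussed above: in the first the carry out is 1 and is dropped, in the second there is no carry out and the result pattern is already the two’s complement form of the negative answer.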

Small differences
♦ The only differences between the adder and subtractor circuits are:
■ The subtractor has to complement B3 B2 B1 B0.
■ The subtractor sets the initial carry in to 1, instead of 0.
♦ It’s not too hard to make one circuit that does both addition and subtraction.

An adder-subtractor circuit
♦ XOR gates let us selectively complement the B input.
X ⊕ 0 = X
X ⊕ 1 = X’
♦ When Sub = 0, the XOR gates output B3 B2 B1 B0 and the carry in is 0. The adder output will be A + B + 0, or just A + B.
♦ When Sub = 1, the XOR gates output B3’ B2’ B1’ B0’ and the carry in is 1. Thus, the adder output will be a two’s complement subtraction, A - B.

Subtraction summary
♦ A good representation for negative numbers makes subtraction hardware much easier to design.
■ Two’s complement is used most often (although signed magnitude shows up sometimes, such as in floating-point systems).
■ Using two’s complement, we can build a subtractor with minor changes to the adder from last week.
■ We can also make a single circuit which can both add and subtract.
♦ Overflow is still a problem, but signed overflow is very different from unsigned overflow.
♦ Sign extension is needed to properly “lengthen” negative numbers.
♦ We will use most of the ideas we’ve seen so far to build an ALU, an important part of a processor.

Homework 4
♦ Computer Organization and Design: The Hardware/Software Interface. Third Edition. David A. Patterson and John L. Hennessy. Morgan Kaufmann Publishers. USA. 2005.
♦ Solve the following exercises:
♦ Chapter 1.
■ Exercises: 1.47, 1.48, 1.50, 1.51, 1.52, 1.53, 1.54
♦ Chapter 2.
■ Exercises: 2.6, 2.29, 2.30, 2.31, 2.32, 2.33, 2.37, 2.49, 2.51
♦ Send a PDF file. Due date: October 6th, 2008.