1-computer component-performance

COMPUTER COMPONENTS
Pham Thanh Giang
Institute of Information Technology
[email protected]
COMPONENTS OF A COMPUTER
Same components for all kinds of computer
• Desktop, server, embedded
Input/output includes
• User-interface devices
  • Display, keyboard, mouse
• Storage devices
  • Hard disk, CD/DVD, flash
• Network adapters
  • For communicating with other computers
HARDWARE & SOFTWARE
Hardware
All of the electronic and mechanical equipment in a
computer is called the hardware. Examples include:
• Motherboard
• Hard disk
• RAM
• Power supply
• Processor
• Case
• Monitor
• Keyboard
• Mouse
HARDWARE & SOFTWARE
Software
The term software is used to describe computer programs
that perform a task or tasks on a computer system.
Software can be grouped as follows:
• System software - Operating System etc.
• Utility programs - Antivirus etc.
• Application software - Word, SolidWorks etc.
PC COMPONENTS
Computer system - collection of electronic and
mechanical devices operating as a unit. The main parts
are:
1. System unit
2. Monitor
3. Keyboard
4. Mouse
5. Speakers
SYSTEM UNIT
The system unit is the main container for system devices. It
protects the delicate electronic and mechanical devices
from damage. Typical system unit devices include:
• Motherboard
• CPU (Processor)
• Memory
• Disk drives
• Ports - USB etc.
• Power supply
• Expansion cards - sound card, network card, graphics card etc.
PERIPHERALS
Peripherals are devices that connect to the system unit
using cables or wireless technologies. Typical peripherals
include:
• Monitor
• Keyboard
• Printer
• Plotter
• Scanner
• Speakers
PROCESSOR (CPU)
An integrated circuit (IC) supplied on a single silicon chip.
Its function is to control all of the computer's functions. The
main processor manufacturers are:
• AMD - Athlon and Turion (mobile)
• Intel - Pentium and Centrino (mobile)
[Figure: AMD processor]
COMPUTER PROGRAM
Computer program - a series of instructions. When a
program is run, the processor carries out these instructions
in an orderly fashion. Typical instructions include:
• Arithmetic - addition, subtraction etc
• Logical - comparing data and acting according to the
result
• Move - move data from place to place within the
computer system, e.g. from memory to the processor for
addition, or from memory to a printer or disk drive.
PROCESSOR SPEED
Processor speed - measured in megahertz (MHz) or
gigahertz (GHz). It is the speed of the system clock (clock
speed) within the processor, which controls how fast
instructions are executed:
• 1 MHz - 1 million clock ticks every second
• 1 GHz - 1 billion clock ticks every second
Latest trend - multi-core processors can have two, three or
four processor cores on a single chip.
RANDOM ACCESS MEMORY (RAM)
• Primary storage - main computer memory. Data and programs currently in use are held in RAM
• Volatile - contents of memory are lost if the computer is turned off
• Module - memory ICs on a circuit board
[Figure: memory module with its ICs labelled]
MEMORY
Memory is sold in modules:
• DIMMs (dual inline memory modules) for desktop computers
• SODIMMs (small outline dual inline memory modules) for notebook computers.
[Figures: DIMM module; SODIMM module]
MEMORY
DIMMs and SODIMMs are available in modules of
256MB, 512MB, 1GB, 2GB, 4GB, 8GB
The current technology is called DDR (double data rate)
and there are three types: DDR1, DDR2, DDR3
Any particular computer system is only compatible with
one type.
[Figure: memory module label showing module capacity, name, type and speed]
MOTHERBOARD
Mainboard or system board - the main circuit board for
the computer system. All devices in the computer system
will either be part of the motherboard or connected to it.
[Figure: motherboard showing memory sockets, processor socket, chipset, PCI slots, ports and graphics slot]
PROCESSOR SOCKET
Processor socket - different processors require different
sockets and a motherboard must be chosen to suit the
processor intended for use:
• Socket 478 - Intel Pentium 4
• Socket 775 - Intel Dual Core and Core 2 Duo
• Socket 754 - AMD Athlon
• Socket 939 - AMD Athlon 64
• Socket AM2 - AMD Athlon X2
CHIPSET
Chipset - controls data flow around the computer. It
consists of two chips:
• Northbridge - controls data flow between the memory and the
processor, and between the processor and the graphics card
• Southbridge - controls data flow to the devices (USB,
IDE, SATA, LAN and audio), the PCI slots and
onboard graphics
BUSES
Bus - a path through which data can be
sent to the different parts of the computer
system. Main buses:
• Front side bus - processor to northbridge
• Graphics bus - northbridge to graphics slot (PCI Express or AGP)
• Memory bus - northbridge to RAM (all memory)
• Internal bus - northbridge to southbridge
• PCI bus - southbridge to PCI slots and onboard graphics
• The southbridge also connects IDE, SATA, USB, LAN and audio
POWER SUPPLY
A computer power supply has a number of functions:
• Converts alternating current (AC) to direct current (DC)
• Transforms mains voltage (240 volts) to the voltages
required by the computer. The main voltages are:
  • 12 volts for the disk drives as they have motors
  • 3.3 and 5 volts for the circuit boards in the computer
POWER SUPPLY
• Uses advanced power management (APM) to allow the
computer to go into standby mode
• Some have a switch to toggle between 240 volt supplies
and 110 volt supplies.
• The main connections are:
1. Main connector - connects to the motherboard and supplies the 3.3 and 5 volt supply for the board.
2. SATA connector - connects SATA drives
3. Berg connector - connects floppy disk drives
4. Molex connector - connects IDE hard drives and optical drives.
PORTS
Computer ports are interfaces between peripheral
devices and the computer. They are mainly found at
the back of the computer but are often also built into
the front of the computer chassis for easy access.
[Figures: ports at the back of the computer; ports at the front of the computer]
COMPUTER PERFORMANCE
PERFORMANCE AND COST
Which computer is fastest?
Not so simple:
• Scientific simulation – FP performance
• Program development – integer performance
• Database workload – memory, I/O
MEASURING EXECUTION TIME
Elapsed time
• Total response time, including all aspects
  • Processing, I/O, OS overhead, idle time
• Determines system performance
CPU time
• Time spent processing a given job
  • Discounts I/O time, other jobs’ shares
• Comprises user CPU time and system CPU time
• Different programs are affected differently by CPU and system performance
DEFINING PERFORMANCE
What is important to whom?
Computer system user
• Minimize elapsed time for program: $t_{resp} = t_{end} - t_{start}$
• Called response time
Computer center manager
• Maximize completion rate = #jobs/second
• Called throughput
WHAT IS PERFORMANCE FOR US?
For computer architects
• CPU time = time spent running a program
Intuitively, bigger should be faster, so:
• Performance = 1/X, where X is a time: response time, CPU execution time, etc.
Elapsed time = CPU time + I/O wait
We will concentrate on CPU time
IMPROVE PERFORMANCE
Improve (a) response time or (b) throughput?
Faster CPU
• Helps both (a) and (b)
Add more CPUs
• Helps (b) and perhaps (a) due to less queueing
CPU CLOCKING
Operation of digital hardware governed by a
constant-rate clock
[Figure: clock timing diagram showing the clock period, clock cycles, data transfer and computation, and state update]
Clock period: duration of a clock cycle
• e.g., 250 ps = 0.25 ns = $250 \times 10^{-12}$ s
Clock frequency (rate): cycles per second
• e.g., 4.0 GHz = 1 / 0.25 ns = 4000 MHz = $4.0 \times 10^{9}$ Hz
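The reciprocal relationship between clock period and clock rate can be checked with a few lines of Python. This is a minimal sketch; the function names are chosen for illustration and do not come from the slides.

```python
def clock_rate_hz(period_s: float) -> float:
    """Clock rate in Hz for a given clock period in seconds."""
    return 1.0 / period_s


def clock_period_s(rate_hz: float) -> float:
    """Clock period in seconds for a given clock rate in Hz."""
    return 1.0 / rate_hz


# The slide's example: a 250 ps period corresponds to a 4.0 GHz clock.
rate = clock_rate_hz(250e-12)
print(f"{rate / 1e9:.1f} GHz")      # prints 4.0 GHz
print(f"{rate / 1e6:.0f} MHz")      # prints 4000 MHz
```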
CPU TIME
Performance improved by
• Reducing number of clock cycles
• Increasing clock rate
• Hardware designer must often trade off clock rate against cycle count
$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$
CPU TIME EXAMPLE
Computer A: 2GHz clock, 10s CPU time
Designing Computer B
• Aim for 6s CPU time
• Can do faster clock, but causes 1.2 × clock cycles
How fast must Computer B clock be?
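$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$
$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$$
$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$$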
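The same calculation can be reproduced step by step in Python. This is only a sketch of the arithmetic on this slide; the variable names are illustrative.

```python
# Computer A: 2 GHz clock, 10 s CPU time.
clock_rate_a = 2e9          # Hz
cpu_time_a = 10.0           # seconds

# Clock Cycles = CPU Time x Clock Rate
clock_cycles_a = cpu_time_a * clock_rate_a

# Computer B needs 1.2x as many cycles and must finish in 6 s.
clock_cycles_b = 1.2 * clock_cycles_a
cpu_time_b = 6.0            # seconds

# Clock Rate = Clock Cycles / CPU Time
clock_rate_b = clock_cycles_b / cpu_time_b

print(f"Clock cycles A: {clock_cycles_a:.0f}")
print(f"Required clock rate B: {clock_rate_b / 1e9:.1f} GHz")   # prints 4.0 GHz
```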
INSTRUCTION COUNT AND CPI
Instruction Count for a program
• Determined by program, ISA and compiler
Average cycles per instruction
• Determined by CPU hardware
• If different instructions have different CPI
  • Average CPI affected by instruction mix
$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$
$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$
CPI EXAMPLE
Computer A: Cycle Time = 250ps, CPI = 2.0
Computer B: Cycle Time = 500ps, CPI = 1.2
Same ISA
Which is faster, and by how much?
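$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$
$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$
A is faster…
$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$
…by this much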
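The comparison above can be sketched in Python. Because both computers run the same ISA, the instruction count I is identical and cancels in the ratio, so the sketch works with time per instruction only. The helper name is illustrative.

```python
def time_per_instruction(cpi: float, cycle_time_s: float) -> float:
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * cycle_time_s


time_a = time_per_instruction(cpi=2.0, cycle_time_s=250e-12)   # 500 ps
time_b = time_per_instruction(cpi=1.2, cycle_time_s=500e-12)   # 600 ps

ratio = time_b / time_a
faster = "A" if time_a < time_b else "B"
print(f"Computer {faster} is faster by a factor of {ratio:.2f}")
```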
CPI IN MORE DETAIL
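If different instruction types take different numbers of cycles
$$\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$$
• Weighted average CPI
$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$$
where $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of instruction type $i$.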
CPI EXAMPLE
Alternative compiled code sequences using instructions of type INT, FP, MEM

Type               INT   FP   MEM   Total
CPI for type        1     2    3
IC in Program 1     2     1    2      5
IC in Program 2     4     1    1      6

Program 1: IC = 5
• Clock Cycles = 2×1 + 1×2 + 2×3 = 10
• Avg. CPI = 10/5 = 2.0
Program 2: IC = 6
• Clock Cycles = 4×1 + 1×2 + 1×3 = 9
• Avg. CPI = 9/6 = 1.5
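The weighted-average CPI calculation for the two programs can be written as a short Python sketch. The dictionaries mirror the table on this slide; the function names are illustrative.

```python
# CPI per instruction type, as given in the table.
CPI_BY_TYPE = {"INT": 1, "FP": 2, "MEM": 3}


def clock_cycles(counts: dict) -> int:
    """Total cycles = sum over types of CPI_i x instruction count_i."""
    return sum(CPI_BY_TYPE[kind] * n for kind, n in counts.items())


def average_cpi(counts: dict) -> float:
    """Weighted average CPI = total cycles / total instruction count."""
    return clock_cycles(counts) / sum(counts.values())


program1 = {"INT": 2, "FP": 1, "MEM": 2}
program2 = {"INT": 4, "FP": 1, "MEM": 1}

for name, counts in (("Program 1", program1), ("Program 2", program2)):
    print(name, "cycles:", clock_cycles(counts), "avg CPI:", average_cpi(counts))
```

Program 2 executes more instructions but needs fewer cycles, because its mix favours the cheaper INT instructions.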
IRON LAW
PROCESSOR PERFORMANCE
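$$\text{Execution time} = \frac{\text{Time}}{\text{Program}} = \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{code size}} \times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{CPI}} \times \underbrace{\frac{\text{Time}}{\text{Cycle}}}_{\text{cycle time}}$$
Architecture (compiler designer) --> Implementation (processor designer) --> Realization (chip designer)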
IRON LAW
Instructions/Program
• Instructions executed, not static code size
• Determined by algorithm, compiler, ISA
Cycles/Instruction
• Determined by ISA and CPU organization
• Overlap among instructions reduces this term
Time/cycle
• Determined by technology, organization, clever circuit design
IRON LAW EXAMPLE
Machine A: clock 1ns, CPI 2.0, for program x
Machine B: clock 2ns, CPI 1.2, for program x
Which is faster and how much?
Time/Program = instr/program x cycles/instr x sec/cycle
Time(A) = N x 2.0 x 1 = 2N
Time(B) = N x 1.2 x 2 = 2.4N
Compare: Time(B)/Time(A) = 2.4N/2N = 1.2
So, Machine A is 20% faster than Machine B for
this program
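The Iron Law comparison can be sketched in Python. The instruction count N cancels in the ratio, so the value used below is an arbitrary placeholder and not a figure from the slides; the function name is also illustrative.

```python
def execution_time(instructions: float, cpi: float, cycle_time_s: float) -> float:
    """Iron Law: time/program = instr/program x cycles/instr x sec/cycle."""
    return instructions * cpi * cycle_time_s


N = 1_000_000  # hypothetical instruction count; it cancels in the ratio

time_a = execution_time(N, cpi=2.0, cycle_time_s=1e-9)   # 2N ns
time_b = execution_time(N, cpi=1.2, cycle_time_s=2e-9)   # 2.4N ns

ratio = time_b / time_a
print(f"Time(B)/Time(A) = {ratio:.2f}")   # prints 1.20
```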
IRON LAW EXAMPLE
Keep clock(A) @ 1ns and clock(B) @2ns
For equal performance, if CPI(B)=1.2, what is
CPI(A)?
Time(B)/Time(A) = 1 = (Nx2x1.2)/(Nx1xCPI(A))
CPI(A) = 2.4
IRON LAW EXAMPLE
Keep CPI(A)=2.0 and CPI(B)=1.2
For equal performance, if clock(B)=2ns, what
is clock(A)?
Time(B)/Time(A) = 1 = (N x 1.2 x 2)/(N x 2.0 x clock(A))
clock(A) = 1.2ns
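Both equal-performance questions come down to solving CPI(A) x clock(A) = CPI(B) x clock(B) for one unknown, since N is the same on both machines. A minimal sketch, with illustrative function names and clock periods in nanoseconds:

```python
def cpi_for_equal_time(cpi_other: float, clock_other_ns: float, clock_self_ns: float) -> float:
    """CPI this machine needs so that CPI x clock matches the other machine."""
    return cpi_other * clock_other_ns / clock_self_ns


def clock_for_equal_time(cpi_other: float, clock_other_ns: float, cpi_self: float) -> float:
    """Clock period (ns) this machine needs so that CPI x clock matches the other machine."""
    return cpi_other * clock_other_ns / cpi_self


# Clocks fixed at 1 ns (A) and 2 ns (B), CPI(B) = 1.2: what is CPI(A)?
cpi_a = cpi_for_equal_time(cpi_other=1.2, clock_other_ns=2.0, clock_self_ns=1.0)

# CPIs fixed at 2.0 (A) and 1.2 (B), clock(B) = 2 ns: what is clock(A)?
clock_a = clock_for_equal_time(cpi_other=1.2, clock_other_ns=2.0, cpi_self=2.0)

print(f"CPI(A) = {cpi_a:.1f}, clock(A) = {clock_a:.1f} ns")   # prints CPI(A) = 2.4, clock(A) = 1.2 ns
```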
IRON LAW
Example 1:
What is the execution time of a program that executes 3
billion instructions on a processor that spends 2 cycles
on each instruction and runs at 3GHz?
Example 2:
A program contains 50 billion instructions, composed
as follows:
• 10 billion branch instructions, CPI=4
• 15 billion load instructions, CPI=2
• 5 billion store instructions, CPI=3
• 20 billion integer-type instructions, CPI=1
Evaluate the execution time of the above program.
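One possible way to work through these two exercises is sketched below. Example 2 does not state a clock rate, so the sketch assumes the 3 GHz processor from Example 1; that value is an assumption and not part of the exercise. Function and variable names are illustrative.

```python
def execution_time(cycles: float, clock_rate_hz: float) -> float:
    """Execution time in seconds = total clock cycles / clock rate."""
    return cycles / clock_rate_hz


# Example 1: 3 billion instructions, 2 cycles each, 3 GHz clock.
time_1 = execution_time(cycles=3e9 * 2, clock_rate_hz=3e9)

# Example 2: instruction mix as (count, CPI) pairs.
mix = {
    "branch": (10e9, 4),
    "load": (15e9, 2),
    "store": (5e9, 3),
    "integer": (20e9, 1),
}
total_cycles = sum(count * cpi for count, cpi in mix.values())
total_instructions = sum(count for count, _ in mix.values())
average_cpi = total_cycles / total_instructions

ASSUMED_CLOCK_RATE_HZ = 3e9  # ASSUMPTION: not given in Example 2, reused from Example 1
time_2 = execution_time(total_cycles, ASSUMED_CLOCK_RATE_HZ)

print(f"Example 1: {time_1:.1f} s")
print(f"Example 2: {total_cycles:.3g} cycles, average CPI {average_cpi:.1f}, "
      f"{time_2:.1f} s at the assumed 3 GHz")
```

With a different clock rate, only the last division changes; the cycle count and average CPI depend on the instruction mix alone.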
SUMMARY
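Time and performance: Machine A is n times faster than Machine B
• If Time(B)/Time(A) = n
Iron Law: Performance = Time/program
$$\frac{\text{Time}}{\text{Program}} = \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{code size}} \times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{CPI}} \times \underbrace{\frac{\text{Time}}{\text{Cycle}}}_{\text{cycle time}}$$
Other Metrics: MIPS and MFLOPS
• Beware of peak and omitted details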
OTHER METRICS
MIPS and MFLOPS
• Million Instructions Per Second
• Million Floating Point Operations Per Second
MIPS
= instruction count/(execution time x 10^6)
= clock rate/(CPI x 10^6)
But MIPS has serious shortcomings
OTHER METRICS
MFLOPS = FP ops in program/(execution time x 10^6)
Assuming FP ops independent of compiler and ISA
• Often safe for numeric codes: matrix size determines # of FP ops/program
• However, not always safe:
  • Missing instructions (e.g. FP divide)
  • Optimizing compilers
Relative MIPS and normalized MFLOPS
• Adds to confusion
PROBLEMS WITH MIPS
E.g. without FP hardware, an FP op may take 50
single-cycle instructions
With FP hardware, only one 2-cycle instruction
Thus, adding FP hardware:
– CPI increases (why?): 50/50 => 2/1
– Instructions/program decreases (why?): 50 => 1
– Total execution time decreases: 50 cycles => 2 cycles
BUT, MIPS gets worse! MIPS = clock rate/(CPI x 10^6), so doubling
the CPI halves the MIPS rating (e.g. 50 MIPS => 25 MIPS)
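The effect can be checked numerically with the definition MIPS = instruction count/(execution time x 10^6). The sketch below assumes a 50 MHz clock purely for illustration; the slides do not give a clock rate, and the function names are also illustrative.

```python
CLOCK_RATE_HZ = 50e6  # ASSUMPTION: hypothetical 50 MHz clock, not from the slides


def execution_time(instructions: float, cpi: float) -> float:
    """Execution time in seconds for one FP operation."""
    return instructions * cpi / CLOCK_RATE_HZ


def mips(instructions: float, time_s: float) -> float:
    """MIPS = instruction count / (execution time x 10^6)."""
    return instructions / (time_s * 1e6)


# Without FP hardware: 50 single-cycle instructions per FP op.
t_soft = execution_time(instructions=50, cpi=1.0)
# With FP hardware: one 2-cycle instruction per FP op.
t_hard = execution_time(instructions=1, cpi=2.0)

print(f"Without FP hardware: {t_soft * 1e9:.0f} ns, {mips(50, t_soft):.0f} MIPS")
print(f"With FP hardware:    {t_hard * 1e9:.0f} ns, {mips(1, t_hard):.0f} MIPS")
```

Execution time drops by a factor of 25 while the MIPS rating is halved, which is why MIPS alone is a misleading metric here.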
PROBLEMS WITH MIPS
Ignores program
Usually used to quote peak performance
• Ideal conditions => guaranteed not to exceed!
When is MIPS ok?
• Same compiler, same ISA
• E.g. same binary running on different CPU types, such as AMD Jaguar and Intel Core i7
• Why? Instr/program is constant and can be factored out
AMDAHL’S LAW
Motivation for optimizing common case
Speedup = old time / new time = new rate / old rate
Let an optimization speed up fraction f of time by a factor of s
New_time = (1-f) x old_time + f x (old_time/s)
Speedup = old_time / new_time
Speedup = old_time / ((1-f) x old_time + f x (old_time/s))
$$\text{Speedup} = \frac{\text{old\_time}}{(1-f) \times \text{old\_time} + \frac{f}{s} \times \text{old\_time}} = \frac{1}{(1-f) + \frac{f}{s}}$$
AMDAHL’S LAW EXAMPLE
Your boss asks you to improve performance by:
• Improve the ALU used 95% of time by 10%
• Improve memory pipeline used 5% of time by 10x

f      s      Speedup
95%    1.10   1.094
5%     10     1.047
5%     ∞      1.052

$$\text{Speedup} = \frac{1}{(1-f) + \frac{f}{s}}$$
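The three rows of the table can be reproduced with a direct Python transcription of the speedup formula. This is a minimal sketch; the function name is illustrative, and `math.inf` stands in for the s = ∞ row.

```python
import math


def amdahl_speedup(f: float, s: float) -> float:
    """Amdahl's Law: overall speedup when fraction f of time is sped up by factor s."""
    return 1.0 / ((1.0 - f) + f / s)


cases = [
    (0.95, 1.10),       # ALU: used 95% of the time, improved by 10%
    (0.05, 10.0),       # memory pipeline: used 5% of the time, improved 10x
    (0.05, math.inf),   # memory pipeline made infinitely fast
]

for f, s in cases:
    print(f"f = {f:.2f}, s = {s}: speedup = {amdahl_speedup(f, s):.4f}")
```

The modest 10% improvement to the common case beats even an infinite improvement to the rare case.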
AMDAHL’S LAW: LIMIT
$$\lim_{s \to \infty} \frac{1}{(1-f) + \frac{f}{s}} = \frac{1}{1-f}$$
Make common case fast:
[Figure: plot of speedup (0 to 10) against f (0 to 1)]
AMDAHL’S LAW: LIMIT
• Consider uncommon case!
• If (1-f) is nontrivial
$$\lim_{s \to \infty} \frac{1}{(1-f) + \frac{f}{s}} = \frac{1}{1-f}$$
– Speedup is limited!
• Particularly true for exploiting parallelism in the large,
where large s is not cheap
– GPU with e.g. 1024 processors (shader cores)
– Parallel portion speeds up by s (1024x)
– Serial portion of code (1-f) limits speedup
E.g. 10% serial portion: speedup is at most 1/0.1 = 10x, even with 1000 cores
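How quickly the speedup saturates can be seen by evaluating Amdahl's Law for a growing number of cores. The sketch below assumes a 10% serial portion (f = 0.9) and that the parallel portion scales perfectly with the core count; the list of core counts is an arbitrary choice for illustration.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Amdahl's Law: overall speedup when fraction f of time is sped up by factor s."""
    return 1.0 / ((1.0 - f) + f / s)


PARALLEL_FRACTION = 0.9               # 10% of the run time is serial
limit = 1.0 / (1.0 - PARALLEL_FRACTION)

for cores in (1, 2, 8, 64, 1024):
    speedup = amdahl_speedup(PARALLEL_FRACTION, cores)
    print(f"{cores:5d} cores: speedup {speedup:6.2f} (limit {limit:.0f}x)")
```

Even with 1024 cores the speedup stays just under 10x, because the serial 10% of the work is not accelerated at all.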
POWER TRENDS
In CMOS IC technology
$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$
Power: ×30, voltage: 5V → 1V, frequency: ×1000
REDUCING POWER
Suppose a new CPU has
• 85% of capacitive load of old CPU
• 15% voltage and 15% frequency reduction
$$\frac{P_{new}}{P_{old}} = \frac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^2 \times F_{old} \times 0.85}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 = 0.52$$
The power wall
• We can’t reduce voltage further
• We can’t remove more heat
How else can we improve performance?
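The power ratio from this slide can be checked with a short Python sketch of the CMOS power relation (capacitive load x voltage squared x frequency). Only scale factors relative to the old CPU are used, so no absolute values are needed; the function name is illustrative.

```python
def relative_power(load_scale: float, voltage_scale: float, frequency_scale: float) -> float:
    """P_new / P_old when load, voltage and frequency are scaled by the given factors."""
    return load_scale * voltage_scale ** 2 * frequency_scale


# New CPU: 85% of the capacitive load, 15% lower voltage, 15% lower frequency.
ratio = relative_power(load_scale=0.85, voltage_scale=0.85, frequency_scale=0.85)
print(f"P_new / P_old = {ratio:.2f}")   # prints 0.52
```

Voltage contributes twice because it enters the relation squared, which is why voltage scaling was historically the main lever on power.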
CONCLUDING REMARKS
Cost/performance is improving
• Due to underlying technology development
Hierarchical layers of abstraction
• In both hardware and software
Instruction set architecture
• The hardware/software interface
Execution time: the best performance measure
Power is a limiting factor
• Use parallelism to improve performance