Memory Hierarchy
Arquitectura de Computadoras
Arturo Díaz Pérez
Centro de Investigación y de Estudios Avanzados del IPN
Laboratorio de Tecnologías de Información
[email protected]
The Big Picture: Where are We Now?
The Five Classic Components of a Computer:
■ Input, Output, Memory, Datapath, and Control (the last two form the Processor)

Today's Topics:
■ Locality and Memory Hierarchy
■ Memory Organization
Memory Trends
Memory Technology    Typical Access Time         $ per GB in 2004
SRAM                 0.5-5 ns                    $4,000-$10,000
DRAM                 50-70 ns                    $100-$200
Magnetic Disk        5,000,000-20,000,000 ns     $0.50-$2
Technology Trends
              Capacity           Speed (latency)
Logic:        2x in 3 years      2x in 3 years
DRAM:         4x in 3 years      2x in 10 years
Disk:         4x in 3 years      2x in 10 years

DRAM generations (capacity up roughly 1000:1 from 1980 to 1995, cycle time only about 2:1):

Year    Size      Cycle Time
1980    64 Kb     250 ns
1983    256 Kb    220 ns
1986    1 Mb      190 ns
1989    4 Mb      165 ns
1992    16 Mb     145 ns
1995    64 Mb     120 ns
2004    2 GB      50-70 ns
Who Cares About the Memory Hierarchy?
[Figure: Processor-DRAM memory gap (latency), 1980-2000. Processor performance
("Moore's Law") improves about 60% per year (2X every 1.5 years), while DRAM latency
improves only about 9% per year (2X every 10 years), so the processor-memory
performance gap grows about 50% per year.]
Current Microprocessor
Rely on caches to bridge the microprocessor-DRAM performance gap.
Cost of a full cache miss, in instructions executed:
■ 1st Alpha (7000): 340 ns / 5.0 ns =  68 clks x 2, or 136 instructions
■ 2nd Alpha (8400): 266 ns / 3.3 ns =  80 clks x 4, or 320 instructions
■ 3rd Alpha:        180 ns / 1.7 ns = 108 clks x 6, or 648 instructions
1/2X latency x 3X clock rate x 3X instr/clock ⇒ about 5X more instructions lost per miss
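The arithmetic above generalizes: a miss costs (miss latency / cycle time) processor clocks, and each lost clock costs the machine's instructions-per-clock. A minimal Python sketch using the Alpha numbers from the slide (the slide rounds the clock counts, so its figures differ slightly):

```python
# Cache-miss cost expressed in instructions executed:
#   clocks lost       = miss latency (ns) / cycle time (ns)
#   instructions lost = clocks lost * instructions issued per clock
def miss_cost_in_instructions(miss_latency_ns, cycle_time_ns, instr_per_clock):
    return (miss_latency_ns / cycle_time_ns) * instr_per_clock

# The three Alpha generations listed above
for name, lat, cyc, ipc in [("Alpha 7000", 340, 5.0, 2),
                            ("Alpha 8400", 266, 3.3, 4),
                            ("3rd Alpha",  180, 1.7, 6)]:
    print(f"{name}: ~{miss_cost_in_instructions(lat, cyc, ipc):.0f} instructions per miss")
```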
Impact on Performance
Suppose a processor executes at
■ Clock Rate = 200 MHz (5 ns per cycle)
■ CPI = 1.1
■ 50% arith/logic, 30% ld/st, 20% control
and suppose that 10% of memory operations incur a 50-cycle miss penalty.

CPI = ideal CPI + average stalls per instruction
    = 1.1 (cycles) + ( 0.30 (data mops/instr) x 0.10 (misses/data mop) x 50 (cycles/miss) )
    = 1.1 cycles + 1.5 cycles
    = 2.6

58% of the time the processor is stalled waiting for memory!
A 1% instruction miss rate would add an additional 0.5 cycles to the CPI!

[Chart: with the 1% instruction miss rate included, the CPI of 3.1 breaks down as
ideal CPI 1.1 (35%), data-miss stalls 1.5 (49%), instruction-miss stalls 0.5 (16%).]
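The same calculation as a minimal Python sketch (the numbers are those from the slide; the helper function is just for illustration):

```python
# CPI with memory stalls:
#   CPI = ideal CPI + sum over miss classes of (accesses/instr) * miss rate * miss penalty
def cpi_with_stalls(ideal_cpi, miss_classes):
    return ideal_cpi + sum(freq * rate * penalty for freq, rate, penalty in miss_classes)

data_misses = [(0.30, 0.10, 50)]           # 30% ld/st, 10% miss rate, 50-cycle penalty
cpi = cpi_with_stalls(1.1, data_misses)    # 1.1 + 1.5 = 2.6
print(f"CPI = {cpi:.1f}, stalled {1.5 / cpi:.0%} of the time")

# A 1% instruction miss rate (one fetch per instruction) adds another 0.5 cycles:
print(f"CPI = {cpi_with_stalls(1.1, data_misses + [(1.0, 0.01, 50)]):.1f}")   # 3.1
```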
The Goal: illusion of large, fast, cheap memory
Fact: large memories are slow; fast memories are small.
How do we create a memory that is large, cheap, and fast (most of the time)?
■ Hierarchy
■ Parallelism
An Expanded View of the Memory System
[Figure: the processor (control + datapath) is connected to a chain of memory levels.
Moving away from the processor, each level is slower, bigger, and cheaper per byte:
Speed: fastest -> slowest; Size: smallest -> biggest; Cost: highest -> lowest.]
Why hierarchy works
The Principle of Locality:
■ Programs access a relatively small portion of the address space at any instant of time.

[Figure: probability of reference plotted across the address space, 0 to 2^n - 1.]
Memory Hierarchy: How Does it Work?
■ Temporal Locality (locality in time): clustering in time. Items referenced in the
  immediate past have a high probability of being re-referenced in the immediate future.
  => Keep most recently accessed data items closer to the processor.
■ Spatial Locality (locality in space): clustering in space. Items located physically near
  an item referenced in the immediate past have a high probability of being referenced in
  the immediate future.
  => Move blocks consisting of contiguous words to the upper levels (see the sketch below).
[Figure: the processor exchanges individual data items with an upper-level memory holding
block X, which in turn exchanges whole blocks with a lower-level memory holding block Y.]
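As a concrete illustration of spatial locality (not from the slides), the traversal order of a 2D array determines whether consecutive accesses touch neighboring addresses. A minimal sketch assuming NumPy's default row-major layout:

```python
import numpy as np

# NumPy arrays are row-major ("C order") by default, so the elements of one row
# are contiguous in memory.  Summing row by row walks consecutive addresses
# (good spatial locality); summing column by column jumps a full row between
# consecutive accesses.
N = 2048
a = np.arange(N * N, dtype=np.float64).reshape(N, N)

def sum_row_major(a):
    total = 0.0
    for row in a:                # each row is a contiguous block of N doubles
        total += row.sum()
    return total

def sum_col_major(a):
    total = 0.0
    for j in range(a.shape[1]):  # consecutive elements of a column are N*8 bytes apart
        total += a[:, j].sum()
    return total

assert sum_row_major(a) == sum_col_major(a)   # same value, very different access pattern
```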
Visualizing Locality
The memory map [Hatfield and Gerald 1971]
Memory Hierarchy of a Modern Computer System
By taking advantage of the principle of locality:
■ Present the user with as much memory as is available in the cheapest technology.
■ Provide access at the speed offered by the fastest technology.

Level                                   Speed (ns)                      Size (bytes)
Processor (registers, on-chip cache)    1s                              100s
Second-level cache (SRAM)               10s                             Ks
Main memory (DRAM)                      100s                            Ms
Secondary storage (disk)                10,000,000s (10s of ms)         Gs
Tertiary storage (tape)                 10,000,000,000s (10s of sec)    Ts
How is the hierarchy managed?
Registers <-> Memory
■ by compiler (programmer?)
Cache <-> Memory
■ by the hardware
Memory <-> Disks
■ by the hardware and operating system (virtual memory)
■ by the programmer (files)
Memory Hierarchy: Terminology
Hit: the data appears in some block in the upper level (example: Block X)
■ Hit Rate: the fraction of memory accesses found in the upper level
■ Hit Time: time to access the upper level, which consists of
  RAM access time + time to determine hit/miss
Miss: the data must be retrieved from a block in the lower level (Block Y)
■ Miss Rate = 1 - (Hit Rate)
■ Miss Penalty: time to replace a block in the upper level +
  time to deliver the block to the processor
Hit Time << Miss Penalty
[Figure: the processor exchanges data with an upper-level memory (holding block X), which
exchanges blocks with a lower-level memory (holding block Y), as in the earlier diagram.]
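These terms combine into the standard average-memory-access-time formula, AMAT = Hit Time + Miss Rate x Miss Penalty (the formula is standard material, not stated explicitly on the slide). A minimal sketch with hypothetical numbers:

```python
# Average memory access time from the hit/miss terminology above:
#   AMAT = hit time + miss rate * miss penalty
def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    return hit_time_ns + miss_rate * miss_penalty_ns

# Hypothetical numbers: 1 ns hit time, 5% miss rate, 60 ns miss penalty.
print(amat(1.0, 0.05, 60.0), "ns")   # 4.0 ns
```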
Memory Hierarchy Technology
Random Access:
■ "Random" is good: access time is the same for all locations
■ DRAM: Dynamic Random Access Memory
  ● High density, low power, cheap, slow
  ● Dynamic: needs to be "refreshed" regularly
■ SRAM: Static Random Access Memory
  ● Low density, high power, expensive, fast
  ● Static: content lasts "forever" (until power is lost)
"Not-so-random" Access Technology:
■ Access time varies from location to location and from time to time
■ Examples: disk, CD-ROM
Sequential Access Technology: access time is linear in location (e.g., tape)
We will concentrate on random access technology:
■ Main memory: DRAMs; caches: SRAMs
Main Memory Background
Performance of Main Memory:
■ Latency: cache miss penalty
  ● Access Time: time between the request and the arrival of the word
  ● Cycle Time: time between requests
■ Bandwidth: I/O and large-block miss penalty (L2)
Main Memory is DRAM: Dynamic Random Access Memory
■ Dynamic: needs to be refreshed periodically (roughly every 8 ms)
■ Addresses divided into 2 halves (memory as a 2D matrix):
  ● RAS or Row Access Strobe
  ● CAS or Column Access Strobe
Caches use SRAM: Static Random Access Memory
■ No refresh (6 transistors/bit vs. 1 transistor)
Size: DRAM/SRAM ~ 4-8x
Cost / cycle time: SRAM/DRAM ~ 8-16x
Typical SRAM Organization
[Figure: a 16x4 SRAM block with four data inputs (Din 3..0), four data outputs (Dout 3..0),
a write enable (WrEn), and a 4-bit address (A0-A3).]
Typical SRAM Organization: 16-word x 4-bit
[Figure: internal organization of the 16-word x 4-bit SRAM. The 4-bit address (A0-A3) feeds
an address decoder that drives word lines Word 0 through Word 15; each word line selects a
row of four SRAM cells. Write drivers and prechargers place Din 3..0 on the bit lines when
WrEn is asserted, and sense amplifiers recover Dout 3..0 on reads.]
Classical DRAM Organization
[Figure: a square RAM cell array. The row address feeds a row decoder that drives the word
(row) select lines; the bit (data) lines run to a column selector and I/O circuits driven by
the column address. Each intersection of a word line and a bit line is a 1-T DRAM cell.]
Row and column address together:
■ select 1 bit at a time
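A minimal sketch (not from the slides) of how a flat bit address splits into the row and column halves delivered with RAS and CAS; the geometry is a hypothetical array of 1024 x 1024 cells:

```python
# Splitting a flat bit address into the row and column halves of a DRAM array.
# Hypothetical geometry: 1024 rows x 1024 columns, so 10 row-address bits
# (sent with RAS) and 10 column-address bits (sent with CAS).
ROW_BITS, COL_BITS = 10, 10

def split_address(addr):
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)   # upper half selects the word line
    col = addr & ((1 << COL_BITS) - 1)                 # lower half selects the bit line
    return row, col

print(split_address(0x555AA))   # -> (341, 426): one cell out of the 1 Mb array
```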
Main Memory Performance
[Figure: three ways to organize main memory; a rough comparison of their miss penalties
follows below.]
■ Simple: CPU, cache, bus, and memory are all the same width (32 bits)
■ Wide: CPU/Mux is 1 word; Mux/cache, bus, and memory are N words wide
  (Alpha: 64 bits and 256 bits)
■ Interleaved: CPU, cache, and bus are 1 word; memory consists of N modules
  (4 modules); the example is word interleaved
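A sketch (not from the slides) comparing miss penalties for a 4-word block under the three organizations, using hypothetical timings of 1 cycle to send the address, 15 cycles per memory access, and 1 cycle to transfer one word on the bus:

```python
# Miss penalty (in cycles) for a 4-word block under the three organizations.
# Hypothetical timings: 1 cycle to send the address, 15 cycles per memory access,
# 1 cycle to transfer one word on the bus.
ADDR, ACCESS, XFER, WORDS = 1, 15, 1, 4

simple      = ADDR + WORDS * (ACCESS + XFER)   # one word at a time             -> 65 cycles
wide        = ADDR + ACCESS + XFER             # whole block in one wide access -> 17 cycles
interleaved = ADDR + ACCESS + WORDS * XFER     # 4 modules access in parallel,
                                               # words cross the bus one by one -> 20 cycles
print(simple, wide, interleaved)
```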
Summary First Part
Two different types of locality:
■ Temporal Locality (locality in time): if an item is referenced, it will tend to be
  referenced again soon.
■ Spatial Locality (locality in space): if an item is referenced, items whose addresses
  are close by tend to be referenced soon.
By taking advantage of the principle of locality:
■ Present the user with as much memory as is available in the cheapest technology.
■ Provide access at the speed offered by the fastest technology.
DRAM is slow but cheap and dense:
■ A good choice for presenting the user with a BIG memory system.
SRAM is fast but expensive and not very dense:
■ A good choice for providing the user with FAST access times.