Lecture slides: Computer Architecture, Part V

Shared by: Codon_06 | Date: | File type: PPT | Pages: 68


Memory System Design is Part V of the lecture series "Computer Architecture." It covers fundamental topics such as main memory concepts, cache memory organization, and mass memory concepts. Read on to explore the contents of the material.


Text content: Lecture slides: Computer Architecture, Part V

  1. Part V: Memory System Design. (Mar. 2006, Computer Architecture, Memory System Design, Slide 1)
  2. About This Presentation. This presentation is intended to support the use of the textbook Computer Architecture: From Microprocessors to Supercomputers, Oxford University Press, 2005, ISBN 0-19-515455-X. It is updated regularly by the author as part of his teaching of the upper-division course ECE 154, Introduction to Computer Architecture, at the University of California, Santa Barbara. Instructors can use these slides freely in classroom teaching and for other educational purposes. Any other use is strictly prohibited. © Behrooz Parhami. First edition released July 2003; revised July 2004, July 2005, and Mar. 2006.
  3. V Memory System Design
Design problem – we want a memory unit that:
• Can keep up with the CPU's processing speed
• Has enough capacity for programs and data
• Is inexpensive, reliable, and energy-efficient
Topics in This Part: Chapter 17, Main Memory Concepts; Chapter 18, Cache Memory Organization; Chapter 19, Mass Memory Concepts; Chapter 20, Virtual Memory and Paging.
  4. 17 Main Memory Concepts
Technologies and organizations for the computer's main memory:
• SRAM (cache), DRAM (main), and flash (nonvolatile)
• Interleaving and pipelining to get around the "memory wall"
Topics in This Chapter: 17.1 Memory Structure and SRAM; 17.2 DRAM and Refresh Cycles; 17.3 Hitting the Memory Wall; 17.4 Interleaved and Pipelined Memory; 17.5 Nonvolatile Memory; 17.6 The Need for a Memory Hierarchy.
  5. 17.1 Memory Structure and SRAM
[Figure omitted: an array of flip-flop storage cells with an address decoder selecting rows 0 through 2^h − 1, plus data-in/data-out lines and the chip-select, write-enable, and output-enable control signals.]
Fig. 17.1 Conceptual inner structure of a 2^h × g SRAM chip and its shorthand representation.
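The behavior sketched in Fig. 17.1 can be modeled in a few lines. This is a toy functional model, not a timing-accurate one; the parameter names (cs, we, oe) are chosen here to mirror the chip-select, write-enable, and output-enable pins, and the class name is an assumption for illustration.

```python
# Toy model of a 2^h x g SRAM chip: h address bits select one of
# 2**h words, each g bits wide; control pins gate every access.
class SRAM:
    def __init__(self, h, g):
        self.g = g
        self.cells = [0] * (2 ** h)      # 2**h words of g bits each

    def access(self, addr, data_in=0, cs=1, we=0, oe=1):
        if not cs:
            return None                  # chip not selected: bus left alone
        if we:
            # store only the low g bits, as a g-bit-wide chip would
            self.cells[addr] = data_in & ((1 << self.g) - 1)
            return None                  # outputs disabled during a write
        return self.cells[addr] if oe else None

ram = SRAM(h=4, g=8)                     # a 16 x 8 SRAM
ram.access(3, data_in=0xAB, we=1)        # write 0xAB to address 3
assert ram.access(3) == 0xAB             # read it back
```

Note how a write returns nothing: with a shared data bus (Fig. 17.3), driving the outputs during a write would conflict with the incoming data.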
  6. Multiple-Chip SRAM
[Figure omitted: a 2 × 4 array of SRAM chips sharing an 18-bit address, with the MSB selecting one row of chips and each column of chips supplying one byte of the 32-bit data output.]
Fig. 17.2 Eight 128K × 8 SRAM chips forming a 256K × 32 memory unit.
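The chip-count arithmetic behind Fig. 17.2 generalizes: a deeper memory needs (target depth / chip depth) ranks of chips, and a wider word needs (target width / chip width) chips side by side in each rank. A minimal sketch (the helper name is an illustrative assumption):

```python
# How many small chips does it take to build a larger memory?
def chips_needed(chip_depth, chip_width, mem_depth, mem_width):
    ranks = mem_depth // chip_depth      # selected by extra address MSBs
    per_rank = mem_width // chip_width   # chips side by side per rank
    return ranks * per_rank

K = 1024
# Fig. 17.2: 256K x 32 from 128K x 8 chips -> 2 ranks x 4 chips = 8
assert chips_needed(128*K, 8, 256*K, 32) == 8
```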
  7. SRAM with Bidirectional Data Bus
[Figure omitted: SRAM chip with shared data-in/data-out pins and the output-enable, chip-select, and write-enable controls.]
Fig. 17.3 When the data input and output of an SRAM chip are shared or connected to a bidirectional data bus, output must be disabled during write operations.
  8. 17.2 DRAM and Refresh Cycles
DRAM vs. SRAM memory cell complexity:
[Figure omitted: (a) DRAM cell, a single pass transistor and capacitor on word and bit lines; (b) typical SRAM cell, a cross-coupled pair connected to complementary bit lines.]
Fig. 17.4 The single-transistor DRAM cell, which is considerably simpler than the SRAM cell, leads to dense, high-capacity DRAM memory chips.
  9. DRAM Refresh Cycles and Refresh Rate
[Figure omitted: voltage across a DRAM cell capacitor vs. time. A stored 1 decays toward the threshold voltage within tens of ms, so the cell must be refreshed before the charge is lost; a stored 0 needs no refresh.]
Fig. 17.5 Variations in the voltage across a DRAM cell capacitor after writing a 1 and subsequent refresh operations.
  10. Loss of Bandwidth to Refresh Cycles
Example 17.2: A 256 Mb DRAM chip is organized as a 32M × 8 memory externally and as a 16K × 16K array internally. Rows must be refreshed at least once every 50 ms to forestall data loss; refreshing a row takes 100 ns. What fraction of the total memory bandwidth is lost to refresh cycles?
[Figure omitted: square memory matrix with row decoder, row buffer, and column mux; the 25-bit address splits into 14 row bits and 11 column bits.]
Solution: Refreshing all 16K rows takes 16 × 1024 × 100 ns = 1.64 ms. A loss of 1.64 ms every 50 ms amounts to 1.64/50 = 3.3% of the total bandwidth.
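The arithmetic of Example 17.2 is worth packaging as a tiny helper so the numbers can be varied: total refresh time per period, divided by the refresh period. The function name and parameters are illustrative assumptions.

```python
# Fraction of memory bandwidth consumed by refresh:
# (rows * time per row refresh) / refresh period
def refresh_overhead(rows, t_row_ns, period_ms):
    busy_ms = rows * t_row_ns / 1e6      # convert ns to ms
    return busy_ms / period_ms

# Example 17.2: 16K rows, 100 ns each, every 50 ms -> about 3.3%
frac = refresh_overhead(rows=16 * 1024, t_row_ns=100, period_ms=50)
assert abs(frac - 0.0328) < 0.0005
```

Doubling the row count (a 32K × 32K internal array, say) would double the overhead unless rows can be refreshed faster or less often.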
  11. DRAM Packaging
[Figure omitted: pinout of a 24-pin dual in-line package (DIP).]
Legend: Ai = address bit i; CAS = column address strobe; Dj = data bit j; NC = no connection; OE = output enable; RAS = row address strobe; WE = write enable.
Fig. 17.6 Typical DRAM package housing a 16M × 4 memory.
  12. DRAM Evolution
[Figure omitted: memory size (1 MB to 1 TB) vs. calendar year (1980–2010) for computer classes from small PCs through workstations and servers to supercomputers, with the number of memory chips (1 to 1000) as a parameter.]
Fig. 17.7 Trends in DRAM main memory.
  13. 17.3 Hitting the Memory Wall
[Figure omitted: relative performance (1 to 10^6) of processor vs. memory over the calendar years 1980–2010, showing the processor curve pulling far ahead.]
Fig. 17.8 Memory density and capacity have grown along with CPU power and complexity, but memory speed has not kept pace.
  14. Bridging the CPU-Memory Speed Gap
Idea: retrieve more data from memory with each access.
[Figure omitted: a wide-access memory feeding a narrow processor bus through a buffer and multiplexer placed either (a) at the memory side or (b) at the processor side.]
Fig. 17.9 Two ways of using a wide-access memory to bridge the speed gap between the processor and memory.
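A back-of-the-envelope model suggests why the wide-access idea helps: one slow access fetches W words, which are then streamed over the narrow bus one per cycle, so the access latency is amortized over W words. This simple throughput model is an illustrative assumption, not a formula from the slides.

```python
# Sustained throughput (words per cycle) for sequential data when one
# access of `access_cycles` delivers `width_words` words, streamed out
# one per bus cycle through the multiplexer.
def words_per_cycle(access_cycles, width_words):
    return width_words / (access_cycles + width_words)

# With a 10-cycle access, fetching 4 words at a time roughly triples
# sequential throughput compared with fetching 1 word at a time.
assert words_per_cycle(10, 4) > 3 * words_per_cycle(10, 1)
```

The gain applies only to sequential data; interleaving (Fig. 17.11) is the more flexible alternative for independent accesses.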
  15. 17.4 Pipelined and Interleaved Memory
Memory latency may involve other supporting operations besides the physical access itself: virtual-to-physical address translation (Chap. 20) and tag comparison to determine cache hit/miss (Chap. 18).
Pipeline stages: address translation; row decoding & readout; column decoding & selection; tag comparison & validation.
Fig. 17.10 Pipelined cache memory.
  16. Memory Interleaving
[Figure omitted: four memory modules with accesses dispatched by the two LSBs of the address — module 0 serves addresses that are 0 mod 4, module 1 those that are 1 mod 4, and so on; because the bus cycle is shorter than the memory cycle, the modules operate in overlapped fashion.]
Fig. 17.11 Interleaved memory is more flexible than wide-access memory in that it can handle multiple independent accesses at once.
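The dispatch rule of Fig. 17.11 is just address arithmetic: with 4-way interleaving, the two least-significant address bits name the module, so consecutive addresses land in different modules and their memory cycles can overlap. A minimal sketch (function name assumed for illustration):

```python
# With `banks` modules (a power of 2), the module serving an address
# is given by the low log2(banks) bits, i.e., address mod banks.
def module_of(address, banks=4):
    return address % banks

# Consecutive addresses rotate through the four modules, so a
# sequential stream keeps all of them busy in overlapped fashion.
assert [module_of(a) for a in range(8)] == [0, 1, 2, 3, 0, 1, 2, 3]
```

A stride-4 access pattern, by contrast, hits the same module every time and gets no overlap, which is why interleaving helps most for unit-stride or independent accesses.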
  17. 17.5 Nonvolatile Memory (ROM, PROM, EPROM)
[Figure omitted: cell array with word lines, bit lines, and supply voltage; the example word contents shown are 1010, 1001, 0010, 1101.]
Fig. 17.12 Read-only memory organization, with the fixed contents shown on the right.
  18. Flash Memory
[Figure omitted: array of floating-gate MOS transistors with control gates, word lines, bit lines, and source lines; each transistor's source and drain are n+ regions in a p substrate.]
Fig. 17.13 EEPROM or flash memory organization. Each memory cell is built of a floating-gate MOS transistor.
  19. 17.6 The Need for a Memory Hierarchy
The widening speed gap between CPU and main memory:
• Processor operations take on the order of 1 ns
• Memory access requires 10s or even 100s of ns
Memory bandwidth limits the instruction execution rate:
• Each instruction executed involves at least one memory access
• Hence, a few to 100s of MIPS is the best that can be achieved
A fast buffer memory can help bridge the CPU-memory gap:
• The fastest memories are expensive and thus not very large
• A second (third?) intermediate cache level is thus often used
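The bridging effect of a fast buffer can be quantified with the standard average-memory-access-time (AMAT) model: the fast hit time plus the slow miss penalty weighted by the miss rate. This is a textbook-style sketch of the idea, not a formula taken from this slide.

```python
# Average memory access time with one level of fast buffer (cache):
# every access pays the hit time; misses additionally pay the penalty.
def amat(hit_time_ns, miss_penalty_ns, hit_rate):
    return hit_time_ns + (1 - hit_rate) * miss_penalty_ns

# A 1 ns cache in front of 100 ns main memory with a 95% hit rate
# yields about 6 ns on average, far closer to the 1 ns processor
# cycle than to the 100 ns raw memory latency.
assert abs(amat(1, 100, 0.95) - 6.0) < 1e-9
```

The same formula nests for a second cache level by using the level-2 AMAT as the level-1 miss penalty, which is why multi-level hierarchies pay off.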
  20. Typical Levels in a Hierarchical Memory
Level       | Capacity | Access latency | Cost per GB
Registers   | 100s B   | ns             | $Millions
Cache 1     | 10s KB   | a few ns       | $100s Ks
Cache 2     | MBs      | 10s ns         | $10s Ks
Main        | 100s MB  | 100s ns        | $1000s
(speed gap)
Secondary   | 10s GB   | 10s ms         | $10s
Tertiary    | TBs      | min+           | $1s
Fig. 17.14 Names and key characteristics of levels in a memory hierarchy.