Computer Organization and Design The Hardware Software Interface ARM Edition by David Patterson – Ebook PDF Instant Download/Delivery: 9780128018354, 0128018356
Full download of Computer Organization and Design The Hardware Software Interface ARM Edition available after payment.
Product details:
• ISBN 10: 0128018356
• ISBN 13: 9780128018354
• Author: David Patterson
The new ARM Edition of Computer Organization and Design features a subset of the ARMv8-A architecture, which is used to present the fundamentals of hardware technologies, assembly language, computer arithmetic, pipelining, memory hierarchies, and I/O.
With the post-PC era now upon us, Computer Organization and Design moves forward to explore this generational change with examples, exercises, and material highlighting the emergence of mobile computing and the Cloud. Updated content featuring tablet computers, Cloud infrastructure, and the ARM (mobile computing devices) and x86 (cloud computing) architectures is included.
An online companion Web site provides links to a free version of the DS-5 Community Edition (a free professional quality tool chain developed by ARM), as well as additional advanced content for further study, appendices, glossary, references, and recommended reading.
Computer Organization and Design The Hardware Software Interface ARM Edition Table of Contents:
1 Computer Abstractions and Technology
1.1 Introduction
Traditional Classes of Computing Applications and Their Characteristics
Welcome to the Post-PC Era
What You Can Learn in This Book
1.2 Eight Great Ideas in Computer Architecture
Design for Moore’s Law
Use Abstraction to Simplify Design
Make the Common Case Fast
Performance via Parallelism
Performance via Pipelining
Performance via Prediction
Hierarchy of Memories
Dependability via Redundancy
1.3 Below Your Program
From a High-Level Language to the Language of Hardware
1.4 Under the Covers
Through the Looking Glass
Touchscreen
Opening the Box
A Safe Place for Data
Communicating with Other Computers
1.5 Technologies for Building Processors and Memory
1.6 Performance
Defining Performance
Measuring Performance
CPU Performance and Its Factors
Instruction Performance
The Classic CPU Performance Equation
1.7 The Power Wall
1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors
1.9 Real Stuff: Benchmarking the Intel Core i7
SPEC CPU Benchmark
SPEC Power Benchmark
1.10 Fallacies and Pitfalls
1.11 Concluding Remarks
Road Map for This Book
1.12 Historical Perspective and Further Reading
The First Electronic Computers
Commercial Developments
Measuring Performance
The Quest for an Average Program
SPECulating about Performance
The Growth of Embedded Computing
A Half-Century of Progress
Further Reading
1.13 Exercises
2 Instructions: Language of the Computer
2.1 Introduction
2.2 Operations of the Computer Hardware
2.3 Operands of the Computer Hardware
Memory Operands
Constant or Immediate Operands
2.4 Signed and Unsigned Numbers
Summary
2.5 Representing Instructions in the Computer
LEGv8 Fields
2.6 Logical Operations
2.7 Instructions for Making Decisions
Loops
Bounds Check Shortcut
Case/Switch Statement
2.8 Supporting Procedures in Computer Hardware
Using More Registers
Nested Procedures
Allocating Space for New Data on the Stack
Allocating Space for New Data on the Heap
2.9 Communicating with People
Characters and Strings in Java
2.10 LEGv8 Addressing for Wide Immediates and Addresses
Wide Immediate Operands
Addressing in Branches
LEGv8 Addressing Mode Summary
Decoding Machine Language
2.11 Parallelism and Instructions: Synchronization
2.12 Translating and Starting a Program
Compiler
Assembler
Linker
Loader
Dynamically Linked Libraries
Starting a Java Program
2.13 A C Sort Example to Put It All Together
The Procedure swap
Register Allocation for swap
Code for the Body of the Procedure swap
The Full swap Procedure
The Procedure sort
Register Allocation for sort
Code for the Body of the Procedure sort
The Procedure Call in sort
Passing Parameters in sort
Preserving Registers in sort
The Full Procedure sort
2.14 Arrays versus Pointers
Array Version of Clear
Pointer Version of Clear
Comparing the Two Versions of Clear
2.15 Advanced Material: Compiling C and Interpreting Java
Compiling C
The Front End
High-Level Optimizations
Local and Global Optimizations
Global Code Optimizations
Implementing Local Optimizations
Implementing Global Optimizations
Register Allocation
Code Generation
Optimization Summary
Interpreting Java
Interpretation
Compiling for Java
Invoking Methods in Java
A Sort Example in Java
2.16 Real Stuff: MIPS Instructions
2.17 Real Stuff: ARMv7 (32-bit) Instructions
2.18 Real Stuff: x86 Instructions
Evolution of the Intel x86
x86 Registers and Data Addressing Modes
x86 Integer Operations
x86 Instruction Encoding
x86 Conclusion
2.19 Real Stuff: The Rest of the ARMv8 Instruction Set
Full ARMv8 Integer Arithmetic Logic Instructions
Full ARMv8 Integer Data Transfer Instructions
Full ARMv8 Branch Instructions
2.20 Fallacies and Pitfalls
2.21 Concluding Remarks
2.22 Historical Perspective and Further Reading
Accumulator Architectures
General-Purpose Register Architectures
Compact Code and Stack Architectures
High-Level-Language Computer Architectures
Reduced Instruction Set Computer Architectures
A Brief History of the ARMv7
A Brief History of the x86
A Brief History of Programming Languages
A Brief History of Compilers
Further Reading
2.23 Exercises
3 Arithmetic for Computers
3.1 Introduction
3.2 Addition and Subtraction
Summary
3.3 Multiplication
Sequential Version of the Multiplication Algorithm and Hardware
Signed Multiplication
Faster Multiplication
Multiply in LEGv8
Summary
3.4 Division
A Division Algorithm and Hardware
Signed Division
Faster Division
Divide in LEGv8
Summary
3.5 Floating Point
Floating-Point Representation
Exceptions and Interrupts
IEEE 754 Floating-Point Standard
Floating-Point Addition
Floating-Point Multiplication
Floating-Point Instructions in LEGv8
Accurate Arithmetic
Summary
3.6 Parallelism and Computer Arithmetic: Subword Parallelism
3.7 Real Stuff: Streaming SIMD Extensions and Advanced Vector Extensions in x86
3.8 Real Stuff: The Rest of the ARMv8 Arithmetic Instructions
Full ARMv8 Integer and Floating-point Arithmetic Instructions
Full ARMv8 SIMD Instructions
3.9 Going Faster: Subword Parallelism and Matrix Multiply
3.10 Fallacies and Pitfalls
3.11 Concluding Remarks
3.12 Historical Perspective and Further Reading
The First Dispute
Diversity versus Portability
A Backward Step
The People Who Built the Bombs
Making the World Safe for Floating Point, or Vice Versa
The First IEEE 754 Chips
IEEE 754 Today
Further Reading
3.13 Exercises
4 The Processor
4.1 Introduction
A Basic LEGv8 Implementation
An Overview of the Implementation
4.2 Logic Design Conventions
Clocking Methodology
4.3 Building a Datapath
Creating a Single Datapath
4.4 A Simple Implementation Scheme
The ALU Control
Designing the Main Control Unit
Operation of the Datapath
Finalizing Control
Why a Single-Cycle Implementation Is Not Used Today
4.5 An Overview of Pipelining
Designing Instruction Sets for Pipelining
Pipeline Hazards
Structural Hazard
Data Hazards
Control Hazards
Pipeline Overview Summary
4.6 Pipelined Datapath and Control
Graphically Representing Pipelines
Pipelined Control
4.7 Data Hazards: Forwarding versus Stalling
Data Hazards and Stalls
4.8 Control Hazards
Assume Branch Not Taken
Reducing the Delay of Branches
Dynamic Branch Prediction
Pipeline Summary
4.9 Exceptions
How Exceptions are Handled in the LEGv8 Architecture
Exceptions in a Pipelined Implementation
4.10 Parallelism via Instructions
The Concept of Speculation
Static Multiple Issue
An Example: Static Multiple Issue with the LEGv8 ISA
Dynamic Multiple-Issue Processors
Dynamic Pipeline Scheduling
Energy Efficiency and Advanced Pipelining
4.11 Real Stuff: The ARM Cortex-A53 and Intel Core i7 Pipelines
The ARM Cortex-A53
The Intel Core i7 920
Performance of the Intel Core i7 920
4.12 Going Faster: Instruction-Level Parallelism and Matrix Multiply
4.13 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations
Using Verilog for Behavioral Specification with Simulation for the Five-Stage Pipeline
Implementing Forwarding in Verilog
The Behavioral Verilog with Stall Detection
Implementing the Branch Hazard Logic in Verilog
Using Verilog for Behavioral Specification with Synthesis
More Illustrations of Instruction Execution on the Hardware
No Hazard Illustrations
More Examples
Forwarding Illustrations
Illustrating Pipelines with Stalls and Forwarding
4.14 Fallacies and Pitfalls
4.15 Concluding Remarks
4.16 Historical Perspective and Further Reading
Improving Pipelining Effectiveness and Adding Multiple Issue
Compiler Technology for Exploiting ILP
Further Reading
4.17 Exercises
5 Large and Fast: Exploiting Memory Hierarchy
5.1 Introduction
5.2 Memory Technologies
SRAM Technology
DRAM Technology
Flash Memory
Disk Memory
5.3 The Basics of Caches
Accessing a Cache
Handling Cache Misses
Handling Writes
An Example Cache: The Intrinsity FastMATH Processor
Summary
5.4 Measuring and Improving Cache Performance
Reducing Cache Misses by More Flexible Placement of Blocks
Locating a Block in the Cache
Choosing Which Block to Replace
Reducing the Miss Penalty Using Multilevel Caches
Software Optimization via Blocking
Summary
5.5 Dependable Memory Hierarchy
Defining Failure
The Hamming Single Error Correcting, Double Error Detecting Code (SEC/DED)
5.6 Virtual Machines
Requirements of a Virtual Machine Monitor
(Lack of) Instruction Set Architecture Support for Virtual Machines
Protection and Instruction Set Architecture
5.7 Virtual Memory
Placing a Page and Finding It Again
Page Faults
Virtual Memory for Large Virtual Addresses
What about Writes?
Making Address Translation Fast: the TLB
The Intrinsity FastMATH TLB
Integrating Virtual Memory, TLBs, and Caches
Implementing Protection with Virtual Memory
Handling TLB Misses and Page Faults
Summary
5.8 A Common Framework for Memory Hierarchy
Question 1: Where Can a Block Be Placed?
Question 2: How Is a Block Found?
Question 3: Which Block Should Be Replaced on a Cache Miss?
Question 4: What Happens on a Write?
The Three Cs: An Intuitive Model for Understanding the Behavior of Memory Hierarchies
5.9 Using a Finite-State Machine to Control a Simple Cache
A Simple Cache
Finite-State Machines
FSM for a Simple Cache Controller
5.10 Parallelism and Memory Hierarchy: Cache Coherence
Basic Schemes for Enforcing Coherence
Snooping Protocols
5.11 Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks
No Redundancy (RAID 0)
Mirroring (RAID 1)
Error Detecting and Correcting Code (RAID 2)
Bit-Interleaved Parity (RAID 3)
Block-Interleaved Parity (RAID 4)
Distributed Block-Interleaved Parity (RAID 5)
P + Q Redundancy (RAID 6)
RAID Summary
5.12 Advanced Material: Implementing Cache Controllers
SystemVerilog of a Simple Cache Controller
Basic Coherent Cache Implementation Techniques
An Example Cache Coherency Protocol
Implementing Snoopy Cache Coherence
5.13 Real Stuff: The ARM Cortex-A53 and Intel Core i7 Memory Hierarchies
Performance of the Cortex-A53 and Core i7 Memory Hierarchies
5.14 Real Stuff: The Rest of the ARMv8 System and Special Instructions
5.15 Going Faster: Cache Blocking and Matrix Multiply
5.16 Fallacies and Pitfalls
5.17 Concluding Remarks
5.18 Historical Perspective and Further Reading
The Development of Memory Hierarchies
Disk Storage
A Very Brief History of Flash Memory
A Brief History of Databases
RAID
Protection Mechanisms
A Brief History of Modern Operating Systems
Further Reading
5.19 Exercises
6 Parallel Processors from Client to Cloud
6.1 Introduction
6.2 The Difficulty of Creating Parallel Processing Programs
6.3 SISD, MIMD, SIMD, SPMD, and Vector
SIMD in x86: Multimedia Extensions
Vector
Vector versus Scalar
Vector versus Multimedia Extensions
6.4 Hardware Multithreading
6.5 Multicore and Other Shared Memory Multiprocessors
6.6 Introduction to Graphics Processing Units
An Introduction to the NVIDIA GPU Architecture
NVIDIA GPU Memory Structures
Putting GPUs into Perspective
6.7 Clusters, Warehouse Scale Computers, and Other Message-Passing Multiprocessors
Warehouse-Scale Computers
6.8 Introduction to Multiprocessor Network Topologies
Implementing Network Topologies
6.9 Communicating to the Outside World: Cluster Networking
The Role of the Operating System in Networking
Improving Network Performance
6.10 Multiprocessor Benchmarks and Performance Models
Performance Models
The Roofline Model
Comparing Two Generations of Opterons
6.11 Real Stuff: Benchmarking and Rooflines of the Intel Core i7 960 and the NVIDIA Tesla GPU
6.12 Going Faster: Multiple Processors and Matrix Multiply
6.13 Fallacies and Pitfalls
6.14 Concluding Remarks
6.15 Historical Perspective and Further Reading
SIMD Computers: Attractive Idea, Many Attempts, No Lasting Successes
Multimedia Extensions as SIMD Extensions to Instruction Sets
Other Early Experiments
Great Debates in Parallel Processing
More Recent Advances and Developments
The Development of Bus-Based Coherent Multiprocessors
Toward Large-Scale Multiprocessors
Clusters
Recent Trends in Large-Scale Multiprocessors
Looking Further
Further Reading
References
6.16 Exercises
Appendix A. The Basics of Logic Design
A.1 Introduction
A.2 Gates, Truth Tables, and Logic Equations
Truth Tables
Boolean Algebra
Gates
A.3 Combinational Logic
Decoders
Multiplexors
Two-Level Logic and PLAs
ROMs
Don’t Cares
Arrays of Logic Elements
A.4 Using a Hardware Description Language
Datatypes and Operators in Verilog
Structure of a Verilog Program
Representing Complex Combinational Logic in Verilog
A.5 Constructing a Basic Arithmetic Logic Unit
A 1-Bit ALU
A 64-Bit ALU
Tailoring the 64-Bit ALU to LEGv8
Defining the LEGv8 ALU in Verilog
A.6 Faster Addition: Carry Lookahead
Fast Carry Using “Infinite” Hardware
Fast Carry Using the First Level of Abstraction: Propagate and Generate
Fast Carry Using the Second Level of Abstraction
Summary
A.7 Clocks
A.8 Memory Elements: Flip-Flops, Latches, and Registers
Flip-Flops and Latches
Register Files
Specifying Sequential Logic in Verilog
A.9 Memory Elements: SRAMs and DRAMs
SRAMs
DRAMs
Error Correction
A.10 Finite-State Machines
A.11 Timing Methodologies
Level-Sensitive Timing
Asynchronous Inputs and Synchronizers
A.12 Field Programmable Devices
A.13 Concluding Remarks
Further Reading
A.14 Exercises
Index
Appendix B. Graphics and Computing GPUs
B.1 Introduction
A Brief History of GPU Evolution
GPU Graphics Trends
Heterogeneous System
GPU Evolves into Scalable Parallel Processor
Why CUDA and GPU Computing?
GPU Unifies Graphics and Computing
GPU Visual Computing Applications
B.2 GPU System Architectures
Heterogeneous CPU–GPU System Architecture
The Historical PC (circa 1990)
Game Consoles
GPU Interfaces and Drivers
Graphics Logical Pipeline
Mapping Graphics Pipeline to Unified GPU Processors
Basic Unified GPU Architecture
Processor Array
B.3 Programming GPUs
Programming Real-Time Graphics
Logical Graphics Pipeline
Graphics Shader Programs
Pixel Shader Example
Programming Parallel Computing Applications
Data Parallel Problem Decomposition
Scalable Parallel Programming with CUDA
The CUDA Paradigm
Restrictions
Implications for Architecture
B.4 Multithreaded Multiprocessor Architecture
Massive Multithreading
Multiprocessor Architecture
Single-Instruction Multiple-Thread (SIMT)
SIMT Warp Execution and Divergence
Managing Threads and Thread Blocks
Thread Instructions
Instruction Set Architecture (ISA)
Memory Access Instructions
Barrier Synchronization for Thread Communication
Streaming Processor (SP)
Special Function Unit (SFU)
Comparing with Other Multiprocessors
Multithreaded Multiprocessor Conclusion
B.5 Parallel Memory System
DRAM Considerations
Caches
MMU
Memory Spaces
Global Memory
Shared Memory
Local Memory
Constant Memory
Texture Memory
Surfaces
Load/Store Access
ROP
B.6 Floating-point Arithmetic
Supported Formats
Basic Arithmetic
Specialized Arithmetic
Texture Operations
Performance
Double Precision
B.7 Real Stuff: The NVIDIA GeForce 8800
Streaming Processor Array (SPA)
Texture/Processor Cluster (TPC)
Streaming Multiprocessor (SM)
Instruction Set
Streaming Processor (SP)
Special Function Unit (SFU)
Rasterization
Raster Operations Processor (ROP) and Memory System
Scalability
Performance
Dense Linear Algebra Performance
FFT Performance
Sorting Performance
B.8 Real Stuff: Mapping Applications to GPUs
Sparse Matrices
Caching in Shared Memory
Scan and Reduction
Radix Sort
N-Body Applications on a GPU
N-Body Mathematics
Optimization for GPU Execution
Using Shared Memory
Using Multiple Threads per Body
Performance Comparison
Results
B.9 Fallacies and Pitfalls
B.10 Concluding Remarks
Acknowledgments
B.11 Historical Perspective and Further Reading
Graphics Pipeline Evolution
Fixed-Function Graphics Pipelines
Evolution of Programmable Real-Time Graphics
Unified Graphics and Computing Processors
GPGPU: an Intermediate Step
GPU Computing
Scalable GPUs
Recent Developments
Future Trends
Further Reading
Appendix C. Mapping Control to Hardware
C.1 Introduction
C.2 Implementing Combinational Control Units
Mapping the ALU Control Function to Gates
Mapping the Main Control Function to Gates
C.3 Implementing Finite-State Machine Control
A ROM Implementation
A PLA Implementation
C.4 Implementing the Next-State Function with a Sequencer
Optimizing the Control Implementation
C.5 Translating a Microprogram to Hardware
Organizing the Control to Reduce the Logic
C.6 Concluding Remarks
C.7 Exercises
Appendix D. A Survey of RISC Architectures for Desktop, Server, and Embedded Computers
D.1 Introduction
D.2 Addressing Modes and Instruction Formats
D.3 Instructions: the MIPS Core Subset
MIPS Core Instructions
Compare and Conditional Branch
D.4 Instructions: Multimedia Extensions of the Desktop/Server RISCs
D.5 Instructions: Digital Signal-Processing Extensions of the Embedded RISCs
D.6 Instructions: Common Extensions to MIPS Core
D.7 Instructions Unique to MIPS-64
Nonaligned Data Transfers
Remaining Instructions
D.8 Instructions Unique to Alpha
Remaining Instructions
D.9 Instructions Unique to SPARC v9
Register Windows
Fast Traps
Support for LISP and Smalltalk
Overlapped Integer and Floating-Point Operations
Remaining Instructions
D.10 Instructions Unique to PowerPC
Branch Registers: Link and Counter
Remaining Instructions
D.11 Instructions Unique to PA-RISC 2.0
Nullification
A Cornucopia of Conditional Branches
Synthesized Multiply and Divide
Decimal Operations
Remaining Instructions
D.12 Instructions Unique to ARM
Remaining Instructions
D.13 Instructions Unique to Thumb
D.14 Instructions Unique to SuperH
D.15 Instructions Unique to M32R
D.16 Instructions Unique to MIPS-16
D.17 Concluding Remarks
Further Reading
Answers to Check Yourself
Glossary
Further Reading
Back Cover