The processor
Computer Organization & Design: The Hardware/Software Interface
2024/2/16

Chapter Four: The processor
  4.1 Introduction
  4.2 Logic Design Conventions
  4.3 Building a datapath
  4.4 A Simple Implementation Scheme
  4.5 An Overview of Pipelining
  4.6 Pipelined Datapath and Control
  4.7 Data Hazards: Forwarding versus Stalling
  4.8 Control Hazards
  4.9 Exceptions
  4.10 Parallelism and Advanced Instruction-Level Parallelism
  4.11 Real Stuff: the AMD Opteron X4 Pipeline

4.1 Introduction
  We'll look at an implementation of MIPS, simplified to contain only:
    - memory-reference instructions: lw, sw
    - arithmetic-logical instructions: add, sub, and, or, slt
    - control-flow instructions: beq, j
  An overview of the implementation
    - For every instruction, the first two steps are identical:
        - fetch the instruction from memory
        - decode and read the registers
    - The next steps depend on the instruction class:
        - memory-reference
        - arithmetic-logical
        - branches
  What are the steps? How many functional units?

Implementing MIPS
  We're ready to look at an implementation of the MIPS instruction set, simplified to contain only:
    - arithmetic-logic instructions: add, sub, and, or, slt
    - memory-reference instructions: lw, sw
    - control-flow instructions: beq, j

Implementing MIPS: the Fetch/Execute Cycle
  High-level abstract view of the fetch/execute implementation:
    - use the program counter (PC) to supply the instruction address
    - fetch the instruction from memory and increment the PC
    - use fields of the instruction to select the registers to read
    - execute depending on the instruction
    - repeat
  [Figure: An abstract view of the implementation of MIPS]

Overview: Processor Implementation Styles
  Single Cycle
    - perform each instruction in 1 clock cycle
    - the clock cycle must be long enough for the slowest instruction
    - disadvantage: every instruction is only as fast as the slowest instruction
  Multi-Cycle
    - break the fetch/execute cycle into multiple steps
    - perform 1 step in each clock cycle
    - advantage: each instruction uses only as many cycles as it needs
  Pipelined
    - execute each instruction in multiple steps
    - perform 1 step/instruction
      in each clock cycle
    - process multiple instructions in parallel (assembly line)

Functional Elements
  Two types of functional elements in the hardware:
    - elements that operate on data (called combinational elements)
    - elements that contain data (called state or sequential elements)

Combinational Elements
  - A combinational element works as an input-to-output function, e.g., the ALU
  - Combinational logic reads input data from one register and writes output data to another, or the same, register
      - the read/write happens in a single cycle
      - a combinational element cannot store data from one cycle to a future one
  [Figure: Combinational logic hardware units]

State Elements
  - State elements contain data in internal storage, e.g., registers and memory
  - All state elements together define the state of the machine
      - What does this mean? Think of shutting down and starting up again
  - Flipflops and latches are 1-bit state elements; equivalently, they are 1-bit memories
  - The output(s) of a flipflop or latch always depend on the bit value stored, i.e., its state, and can be called 1/0 or high/low or true/false
  - The input to a flipflop or latch can change its state depending on whether it is clocked or not

Set-Reset (SR-) latch (unclocked)
  [Figure: SR-latch built from nand gates; equivalently with nor gates]
  Think of Sbar as NOT S, the inverse of set (which sets Q to 1), and Rbar as NOT R, the inverse of reset.
  A set-reset latch made from two cross-coupled nand gates is a basic memory unit.
  When both Sbar and Rbar are 1, then either one of the following two states is stable:
    (a) Q = 1 & Qbar = 0
    (b) Q = 0 & Qbar = 1
  and the latch will continue in the current stable state.
  If Sbar changes to 0 (while Rbar remains at 1), then the latch is forced to the one possible
  stable state, (a).
  If Rbar changes to 0 (while Sbar remains at 1), the latch is forced to the one possible stable state, (b).
  So, the latch remembers which of Sbar or Rbar was last 0 during the time they are both 1.
  When both Sbar and Rbar are 0, the only stable state is Q = Qbar = 1. However, if after that both Sbar and Rbar return to 1, the latch must then jump non-deterministically to one of the stable states (a) or (b), which is undesirable behavior.

Synchronous Logic: Clocked Latches and Flipflops
  - Clocks are used in synchronous logic to determine when a state element is to be updated
      - in level-triggered clocking methodology, the state changes either only when the clock is high or only when it is low (technology-dependent)
      - in edge-triggered clocking methodology, either the rising edge or the falling edge is active (depending on technology), i.e., states change only on rising edges or only on falling edges
  - Latches are level-triggered
  - Flipflops are edge-triggered

Clocked SR-latch
  - State can change only when the clock is high
  - Potential problem: both inputs Sbar = 0 & Rbar = 0 will cause non-deterministic behavior

Clocked D-latch
  - State can change only when the clock is high
  - Only a single data input (compare SR-latch)
  - No problem with non-deterministic behavior
  [Figure: Timing diagram of D-latch]

Clocked D-flipflop
  - Negative edge-triggered
  - Made from three SR-latches

State Elements on the Datapath: Register File
  - Registers are implemented with arrays of D-flipflops
  [Figure: Register file with two read ports and one write port]

Register File: Built using D flip-flops
  [Figure: read path]

Register File
  [Figure: write path; labels: 32 bits, rd or rt (5 bits), write signals]

Single-cycle Implementation of MIPS
  - Our first implementation of MIPS will use a single long clock cycle for every instruction
  - Every instruction begins on one up (or down) clock edge and ends on the next up (or down) clock edge
  - This approach is not practical, as it is much slower than a multicycle implementation, where different instruction classes can take different numbers of
    cycles
      - in a single-cycle implementation, every instruction must take the same amount of time as the slowest instruction
      - in a multicycle implementation, this problem is avoided by allowing quicker instructions to use fewer cycles
  - Even though the single-cycle approach is not practical, it is simple and useful to understand first
  - Note: we shall implement jump at the very end

Datapath: Instruction Store/Fetch & PC Increment
  Three elements are used to store and fetch instructions and increment the PC
  [Figure: Datapath]

Animating the Datapath
  Instruction <- MEM[PC]
  PC <- PC + 4

Datapath: R-Type Instruction
  Two elements are used to implement R-type instructions
  [Figure: Datapath]

Animating the Datapath
  add rd, rs, rt
  R[rd] <- R[rs] + R[rt];

Datapath: Load/Store Instruction
  Two additional elements are used to implement load/stores
  [Figure: Datapath]

Animating the Datapath
  lw rt, offset(rs)
  R[rt] <- MEM[R[rs] + s_extend(offset)];

Animating the Datapath
  sw rt, offset(rs)
  MEM[R[rs] + sign_extend(offset)] <- R[rt]

Datapath: Branch Instruction
  [Figure: Datapath]
  No shift hardware is required: simply connect wires from input to output, each shifted left 2 bits

Animating the Datapath
  beq rs, rt, offset
  if (R[rs] = R[rt]) then PC <- PC + 4 + s_extend(offset << 2)

MIPS Datapath I: Single-Cycle
  Combining the datapaths for R-type instructions and load/stores using two multiplexors
    - ALU input is either a register (R-type) or the sign-extended lower half of the instruction (load/store)
    - Data written back is either from the ALU (R-type) or from memory (load)

Animating the Datapath: R-type Instruction
  add rd, rs, rt

Animating the Datapath: Load Instruction
  lw rt, offset(rs)

Animating the Datapath: Store Instruction
  sw rt, offset(rs)

MIPS Datapath II: Single-Cycle
  Adding instruction fetch
    - Separate instruction memory, as the instruction read and the data read occur in the same clock cycle
    - Separate adder, as ALU operations and the PC increment occur in the same clock cycle

MIPS Datapath III: Single-Cycle
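
The register transfers shown on the "Animating the Datapath" slides (instruction fetch, add, lw, sw, beq) can be tried out in software. The following is a minimal sketch under stated assumptions, not the hardware design itself: the names `Machine`, `step`, and `sign_extend` are hypothetical, instructions are stored as pre-decoded tuples rather than 32-bit encodings, and both memories are modelled as dictionaries keyed by byte address.

```python
def sign_extend(value, bits=16):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


class Machine:
    """Hypothetical model: one call to step() plays the role of one clock cycle."""

    def __init__(self, program):
        self.pc = 0
        self.reg = [0] * 32
        self.imem = program  # byte address -> pre-decoded instruction tuple
        self.dmem = {}       # byte address -> data word

    def step(self):
        instr = self.imem[self.pc]  # Instruction <- MEM[PC]
        next_pc = self.pc + 4       # PC <- PC + 4 (separate adder)
        op = instr[0]
        if op == "add":             # R[rd] <- R[rs] + R[rt]
            _, rd, rs, rt = instr
            self.reg[rd] = (self.reg[rs] + self.reg[rt]) & 0xFFFFFFFF
        elif op == "lw":            # R[rt] <- MEM[R[rs] + s_extend(offset)]
            _, rt, offset, rs = instr
            addr = (self.reg[rs] + sign_extend(offset)) & 0xFFFFFFFF
            self.reg[rt] = self.dmem.get(addr, 0)
        elif op == "sw":            # MEM[R[rs] + s_extend(offset)] <- R[rt]
            _, rt, offset, rs = instr
            addr = (self.reg[rs] + sign_extend(offset)) & 0xFFFFFFFF
            self.dmem[addr] = self.reg[rt]
        elif op == "beq":           # branch target is relative to PC + 4
            _, rs, rt, offset = instr
            if self.reg[rs] == self.reg[rt]:
                next_pc = self.pc + 4 + (sign_extend(offset) << 2)
        else:
            raise ValueError(f"unsupported instruction: {op}")
        self.pc = next_pc


# Usage: a four-instruction program laid out at byte addresses 0, 4, 8, 12.
program = {
    0: ("lw", 8, 0, 4),     # lw  r8, 0(r4)
    4: ("add", 9, 8, 8),    # add r9, r8, r8
    8: ("sw", 9, 4, 4),     # sw  r9, 4(r4)
    12: ("beq", 8, 8, -4),  # beq r8, r8, -4 (always taken, back to address 0)
}
m = Machine(program)
m.reg[4] = 100    # base address held in r4
m.dmem[100] = 21  # word to be loaded
for _ in range(4):
    m.step()
print(m.reg[8], m.reg[9], m.dmem[104], m.pc)  # 21 42 42 0
```

Each branch of `step` corresponds to one of the register-transfer lines on the slides; the real datapath instead computes all paths in parallel and uses multiplexors to select the result.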