HO CHI MINH NATIONAL UNIVERSITY
UNIVERSITY OF INFORMATION TECHNOLOGY
COMPUTER ENGINEERING FACULTY

VO HOANG NGUYEN TIN - 19522352
NGUYEN PHAN NHAT QUANG - 19522095

GRADUATION THESIS
IMPLEMENTATION OF MIPS 32-BIT SUPERSCALAR PROCESSOR ON FPGA

ENGINEER OF COMPUTER ENGINEERING FACULTY

TEACHER ADVISOR
PhD. NGUYEN HOAI NHAN

HO CHI MINH CITY, 2023

ACKNOWLEDGMENTS
First and foremost, we would like to express our sincere gratitude to our advisor, Dr. Nguyen Minh Son, for his dedicated guidance throughout the thesis process. He provided valuable assistance and equipment and invested considerable time in reviewing our work and giving feedback, which enabled us to complete the thesis successfully.
In addition, we would like to thank the faculty members, colleagues, and friends of the Computer Engineering Department and of the University of Information Technology - Vietnam National University Ho Chi Minh City for their dedicated teaching, for the invaluable knowledge they imparted, and for their support throughout our studies. The knowledge and experience gained from them will serve as a solid foundation and give us confidence as we embark on our future endeavors.
Lastly, we would like to express our deep appreciation to our parents and families for their unwavering support and encouragement throughout our education.
While carrying out this thesis, we inevitably encountered difficulties and made mistakes because of the limits of our specialized knowledge. We therefore sincerely hope that the professors and colleagues will understand and provide constructive feedback so that this thesis can be developed and refined in the future. Once again, we would like to express our heartfelt gratitude.
Ho Chi Minh City, June 29, 2023
The students
Vo Hoang Nguyen Tin
Nguyen Phan Nhat Quang

TABLE OF CONTENTS
Chapter 1. Compare CISC and RISC
Development history
Executable instructions in the thesis
DESIGNING MIPS32 PROCESSOR
Overview in processor design
Function blocks in processor
IP System cache
Program Counter block
Control Unit block
Register High/Low block
Sign Extend block
Forwarding Unit block
Hazard Unit block
Resolving pipeline conflicts
Processor pipeline architecture
Load Data hazard
DESIGNING MIPS32 PROCESSOR WITH AXI4 PROTOCOL
Package MIPS core with AXI4 protocol
Specific signals of the AXI4 protocol in the thesis
Overview simulation of five communication channels of AXI4
Additional function block
CDMA Control block
BLOCK DESIGN ON VIVADO
Overview of Block Design architecture
Architecture design overview
Xilinx IPs used in Block Design
IP Memory Interface Generator (MIG 7 Series)
IP Central Direct Memory Access (IP CDMA)
IP AXI BRAM Controller
SIMULATION AND EVALUATION
CONCLUSION AND DEVELOPMENT
Accumulated experiences
KIT FPGA XILINX VIRTEX-7 VC707
APPENDIX II.
GENERATE BITSTREAM STEPS

FIGURE CATALOG
Figure 1.1: Processing stages of processor
Figure 2.3: Program Counter Block
Figure 2.4: Control Unit Block
Figure 2.5: Register File Block
Figure 2.6: Register High/Low Block
Figure 2.8: Sign Extend Block
Figure 2.11: MemLoad Processing Block
Figure 2.12: Forwarding Unit Block
Figure 2.13: Hazard Unit Block
Figure 2.14: Processor pipeline architecture
Figure 2.15: Forwarding data in pipeline
Figure 2.16: Forwarding for memory write instruction
Figure 2.17: Hazard with data load instructions
Figure 2.18: Instruction to load data and calculate data
Figure 2.19: Branch instruction occurs in pipeline
Figure 2.20: Branch instruction is at position B in the instruction pair
Figure 2.21: The branch instruction needs data from the previous instruction
Figure 3.1: MIPS architecture after it is packaged into an IP
Figure 3.2: General architecture of the AXI4 protocol
Figure 3.3: A read transaction of AXI4 [8] protocol
Figure 3.4: A write transaction of AXI4 [8] protocol
Figure 3.5: Simulate read channel AXI4 of processor
Figure 3.6: Simulate write channel AXI4 of processor
Figure 3.7: Simulate write response channel AXI4 of processor
Figure 3.9: CDMA Controller Block
Figure 4.1: Architecture design overview of Block Design
Figure 4.2: Block Design on Vivado software
Figure 4.3: IP Memory Interface Generator
Figure 4.4: IP Central Direct Memory Access (CDMA)
Figure 4.5: IP Bram Controller and IP Block Memory Generator
Figure 5.2: Instruction in Initial
Figure 5.3: Read request to I-Cache
Figure 5.4: I-Cache requests a read transaction to load instructions
Figure 5.5: Write value from AXI bus into I-Cache
Figure 5.6: The execution process in processor [1]
Figure 5.7: Mult/Div block execute
Figure 5.8: The execution process in processor [2]
Figure 5.9: Write value into Data Memory
Figure 5.10: Write value on AXI4
Figure 5.11: The execution process in processor [3]
Figure 5.12: The execution process in processor [4]
Figure 5.13: The execution process in processor [5]
Figure 5.14: Register File and High/Low Register results [1]
Figure 5.16: Comparing the simulation result with MARS [1]
Figure 5.17: Instructions from Main 1
Figure 5.18: Instructions in Sub1-1
Figure 5.19: Calculation result of Sub1-1
Figure 5.20: Calculation result of Sub1-1 on MARS
Figure 5.21: Instructions in Sub1-2
Figure 5.22: Saved values in Data Memory on MARS
Figure 5.23: Saved values in Data Memory on Vivado [1]
Figure 5.24: Instructions in Sub1-3
Figure 5.25: Calculation result of the instruction in Sub1-3
Figure 5.26: Calculation result of the instructions in Sub1-3 on MARS
Figure 5.27: Instructions in Sub1-4
Figure 5.28: Calculation result of the instructions in Sub1-4
Figure 5.29: Calculation result of the instructions in Sub1-4 on MARS
Figure 5.30: Instructions in Main 2
Figure 5.31: Instructions in Sub2-1
Figure 5.32: Calculation result of the instructions in Sub2-1
Figure 5.33: Calculation result of the instructions on MARS
Figure 5.34: Instructions in Sub2-2
Figure 5.35: Calculation result of the instructions in Sub2-2
Figure 5.36: Calculation result of the instructions on MARS
Figure 5.37: Results saved in register file after finishing the testing program
Figure 5.38: The result on MARS
Figure 5.39: The values in Data Memory on Vivado
Figure 5.40: The values in Data Memory on MARS
Figure 5.41: Resource utilization
Figure 5.42: Resource utilization of each logic block in the processor
Figure 5.44: The clock speed in the block design
Figure 5.47: Power summary from the old design
Figure 5.48: Utilization summary from the old design
Figure PI.1: KIT FPGA VC707 [6] diagram
Figure PI.2: Structure of KIT FPGA VC707 [6]
Figure PII.1: Constraint file on Vivado
Figure PII.2: Modeling RTL code for Mux 2-to-1 64-bit on Vivado

TABLE CATALOG
Table 1.1: Register set in MIPS
Table 1.2: Other registers
Table 1.3: Instruction format structure R
Table 1.4: Instruction format structure I
Table 1.5: Instruction format structure J
Table 1.6: Groups of processing instructions
Table 2.1: Signals of block
Table 2.2: Signals of Program Counter Block
Table 2.3: Signals of Control Unit Block
Table 2.4: The bit encoding convention for ALUControl
Table 2.5: Branc Ins bit encoding convention
Table 2.6: Signals of Register File Block
Table 2.7: Signals of Register High/Low Block
Table 2.8: Signals of Comparator Block
Table 2.9: Signals of Sign Extend Block
Table 2.10: Signals of ALU Block
Table 2.11: Signals of Mult/Div block
Table 2.12: Signals of MemLoad processing block
Table 2.13: Signals of Forwarding Unit
Table 2.14: Signals of Hazard Unit Block
Table 2.15: Overview of conflict handling in pipeline
Table 3.1: Signals of AXI4 bus
Table 3.3: Signals of CDMA Controller Block
Table 5.1: Instruction computing result [1]
Table 5.2: Instruction computing result [2]
Table 5.3: Instruction computing result [3]
Table 5.4: Instruction computing result [4]
Table 5.5: Instruction computing result [5]
Table PI.1: Description of the location of components on the FPGA VC707 KIT

ABBREVIATION GLOSSARY
Abbreviation | Expanded form | Description
IP | Intellectual Property | Xilinx intellectual property core used in the thesis
IoT | Internet of Things | Connection of many electrical things together to build a system useful for life
AXI | Advanced eXtensible Interface | Advanced extensible interface protocol
CISC | Complex Instruction Set Computer | Computer architecture with a complex instruction set
RISC | Reduced Instruction Set Computer | Computer architecture with a simplified instruction set
ARM | Advanced RISC Machines | Processor architecture developed from RISC
MIPS | Microprocessor without Interlocked Pipeline Stages | RISC instruction set architecture developed by MIPS Technologies
RAM | Random Access Memory | Random access memory used in the thesis
CPU | Central Processing Unit | Central processing unit using the MIPS superscalar architecture
FPGA | Field Programmable Gate Array | Large-scale integrated circuits using user-programmable logical element array structures
ALU | Arithmetic Logic Unit | Arithmetic logic unit used in the thesis
PLL | Phase Locked Loop | Closed-loop frequency control system
RTL | Register Transfer Level | Register transfer level used to develop the MIPS core
CDMA | Central Direct Memory Access | Xilinx intellectual property that supports data transfer
MMCM | Mixed-Mode Clock Manager | Hybrid mode clock management controller

THESIS ABSTRACT
This thesis has two main parts: designing a local memory for the MIPS processor inherited from a previous thesis, and packaging the processor as an IP that follows the AXI4 communication standard. The first part studies the design of a superscalar MIPS processor. The processor has two ALU blocks that handle integer instructions (signed and unsigned) and an additional Multiply/Divide block for integer multiplication and division. In addition, the IP System Cache provided by Xilinx is used as the Instruction Memory (I-Cache) and Data Memory (D-Cache).
The second part of the thesis concerns packaging the MIPS processor. Most IPs in the Vivado software communicate with each other through the AXI4 or AXI3 bus. In this thesis, the MIPS processor is packaged following the AXI4 communication standard. The packaged processor is then connected to other IPs in the Vivado Block Design, which ties the two parts of the thesis together.
In the Block Design, the MIPS processor acts as a Master: it requests data reads from Slaves over the AXI4 bus, performs computations, and stores the values in the Data Memory. It can also send the computed values back to the Slaves or output them to the UART. Through these two parts, we hope to help spread awareness of the benefits of MIPS processors, to propose an approach for designing a MIPS processor in particular and other processors in general, and to contribute to the development of integrated circuits in Vietnam.

INTRODUCTION
In the 21st century, wireless communication technology is considered a leading trend of the IoT era and a driving force behind the development of many useful IoT applications.
In addition, rapid urbanization and the large number of IoT device users in modern life have created a tremendous demand for powerful processors. At the same time, the high cost of fabricating complex processors, together with the material and labor shortages caused by the COVID-19 pandemic, underscores how important processors are to IoT devices. This situation shows the need to optimize processors. One proposed solution is to design processors tailored specifically to IoT devices.
A processor with a simplified instruction set can reduce the number of logic gates in the design, resulting in lower power consumption for the processor. As a result, manufacturers can adjust the cost structure more reasonably, leading to reduced costs for end-users.
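As a rough illustration of why a simplified instruction set keeps decode logic small, the sketch below splits a 32-bit MIPS R-format instruction word into its fixed-position fields using only shifts and masks. It is written in Python purely for illustration and is not the RTL of this thesis; the function name `decode_r_type` and the dictionary layout are assumptions introduced here.

```python
def decode_r_type(word: int) -> dict:
    """Split a 32-bit MIPS R-format word into its six fixed-width fields.

    Hypothetical helper for illustration only; not taken from the thesis.
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("instruction word must fit in 32 bits")
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21, first source register
        "rt": (word >> 16) & 0x1F,      # bits 20..16, second source register
        "rd": (word >> 11) & 0x1F,      # bits 15..11, destination register
        "shamt": (word >> 6) & 0x1F,    # bits 10..6, shift amount
        "funct": word & 0x3F,           # bits 5..0, ALU function code
    }


if __name__ == "__main__":
    # 0x012A4020 encodes `add $t0, $t1, $t2` (rs = $t1 = 9, rt = $t2 = 10, rd = $t0 = 8).
    print(decode_r_type(0x012A4020))
```

Because every field sits at a fixed bit position, the hardware counterpart of this function is plain wiring rather than sequential parsing, which is one reason a fixed-format instruction set tends to need less decode logic.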