Case Study // 004

CHARIS RISC Processor Design — From Single-Cycle to Pipeline

Course: Computer Organization
Status: Completed (3 Phases)

Project Overview

This project, developed across the Spring 2024–2025 semester for the Computer Organization (HRY 302) course at the Technical University of Crete, involved the full design and implementation of a RISC processor in VHDL. The processor is based on the CHARIS (CHAnia Risc Instruction Set) architecture — a MIPS-variant instruction set designed specifically for this course.

The project progressed through three distinct phases: a single-cycle processor, a multi-cycle processor with FSM-based control, and finally a pipelined processor with data hazard handling. Each phase built upon the previous one, requiring increasingly sophisticated design decisions around datapath organization, control signal generation, and timing.

Architecture

32-bit RISC (CHARIS ISA)

32 registers (R0 hardwired to zero), 32-bit instruction width, two instruction formats (R-type and I-type).

Instruction Set

20 Instructions

ALU operations (add, sub, and, or, not, shifts, rotates), immediates (li, lui, addi, andi, ori), branches (b, beq, bne), and memory (lb, sb, lw, sw).

Memory

1024 x 32-bit

Separate instruction ROM and data RAM (1024 entries each), byte-addressable with 4-byte alignment. Generated via Xilinx Core Generator.
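As a rough illustration of how a byte address can map onto a 1024-entry word memory, the Python sketch below assumes that word accesses use address bits 11 down to 2 as the memory index. The function name and the exact bit slice are assumptions made for illustration; they are not taken from the project's VHDL.

```python
MEM_WORDS = 1024        # entries in each memory, as stated above
BYTES_PER_WORD = 4      # 32-bit words

def word_index(byte_addr: int) -> int:
    """Map a byte address to a word index (assumed: address bits 11..2)."""
    if byte_addr % BYTES_PER_WORD != 0:
        raise ValueError("word access must be 4-byte aligned")
    # Dropping the two low bits gives the word number; the modulo keeps
    # only the 10 index bits a 1024-entry memory can decode.
    return (byte_addr // BYTES_PER_WORD) % MEM_WORDS

if __name__ == "__main__":
    for addr in (0, 4, 4092):
        print(addr, "->", word_index(addr))
```

Under this assumption, addresses beyond 4092 wrap around to the start of the memory, which is what slicing off the upper address bits would do in hardware.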

HDL & Tools

VHDL / Xilinx ISE

Full structural and behavioral VHDL design with simulation and verification in the Xilinx ISE environment.

Phase 1 — Single-Cycle Processor

The first phase involved designing the fundamental building blocks of the processor: a 32-bit ALU supporting 11 operations (addition, subtraction, AND, OR, NOT, arithmetic/logical shifts, and rotates), a 32-register file with two read ports and one write port (with R0 hardwired to zero via a compare module), and the four main datapath stages (Instruction Fetch, Decode, Execute, and Memory) as standalone units.
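The behaviour of these building blocks can be approximated with a small Python model. This is only a sketch: the class and function names are hypothetical, the shift amount parameter is an assumption, and ignoring writes to R0 is one plausible reading of what the compare module does.

```python
MASK32 = 0xFFFFFFFF

class RegisterFile:
    """32 x 32-bit registers, two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, addr1: int, addr2: int):
        # Both read ports are served at once.
        return self.regs[addr1], self.regs[addr2]

    def write(self, addr: int, data: int, write_enable: bool = True):
        # Assumed behaviour: a write aimed at R0 is dropped, so R0 stays 0.
        if write_enable and addr != 0:
            self.regs[addr] = data & MASK32

def alu_add(a: int, b: int) -> int:
    """32-bit addition with wrap-around."""
    return (a + b) & MASK32

def rotate_left(value: int, amount: int = 1) -> int:
    """Rotate a 32-bit value left; bits shifted out re-enter on the right."""
    amount %= 32
    value &= MASK32
    return ((value << amount) | (value >> (32 - amount))) & MASK32

def shift_right_arithmetic(value: int, amount: int = 1) -> int:
    """Shift right while replicating the sign bit (bit 31)."""
    value &= MASK32
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> amount) & MASK32
```

The masking with `MASK32` stands in for the fixed 32-bit width that the hardware signals have by construction.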

The Instruction Fetch (IF) stage includes the PC register, instruction memory, a PC+4 incrementor, and a branch target adder with MUX selection. The Decode stage handles register file access, immediate extension (sign-extend or zero-fill with optional left shift), and write-back data selection. The Execute stage wraps the ALU with an input MUX for immediate operands. The Memory stage provides read/write access to a 1024x32 RAM. All four stages were then connected into a unified single-cycle datapath with a combinational control unit.
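The immediate extension and next-PC selection described above can be sketched in Python as follows. The function names are hypothetical, and two details are assumptions borrowed from MIPS-like conventions rather than from the project itself: the shift amounts used in the examples (16 bits for an upper-immediate load, 2 bits for a branch offset) and the branch target being computed relative to PC+4.

```python
MASK32 = 0xFFFFFFFF

def extend_immediate(imm16: int, sign_extend: bool, shift: int = 0) -> int:
    """Widen a 16-bit immediate to 32 bits, then optionally shift it left."""
    imm16 &= 0xFFFF
    if sign_extend and (imm16 & 0x8000):
        value = imm16 | 0xFFFF0000   # replicate the sign bit into bits 31..16
    else:
        value = imm16                # zero-fill the upper half
    return (value << shift) & MASK32

def next_pc(pc: int, branch_taken: bool, offset: int) -> int:
    """MUX between PC+4 and the branch target (assumed: PC+4+offset)."""
    pc_plus_4 = (pc + 4) & MASK32
    if branch_taken:
        return (pc_plus_4 + offset) & MASK32
    return pc_plus_4

if __name__ == "__main__":
    print(hex(extend_immediate(0xFFFF, sign_extend=True)))
    print(hex(extend_immediate(0xFFFF, sign_extend=False)))
    print(hex(extend_immediate(0x1234, sign_extend=False, shift=16)))
```

Because the arithmetic is done modulo 2^32, a sign-extended negative offset moves the PC backwards without any special handling.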

Phase 2 — Multi-Cycle Processor

The second phase converted the single-cycle design into a multi-cycle processor. Inter-stage registers were added to hold intermediate values between clock cycles, allowing each instruction to execute over multiple cycles rather than completing in one. This shortened the critical path, since each clock cycle only has to cover the logic of one stage rather than the whole datapath, and it enabled resource sharing between stages.

The control unit was redesigned as a Finite State Machine (FSM) that generates the appropriate control signals for each stage at each clock cycle. The FSM takes the instruction opcode (and optionally flags like Zero) as input and sequences through the required states for each instruction type — fetch, decode, execute, memory access, and write-back.
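The state sequencing can be pictured with a small Python model of the FSM. This is a sketch under stated assumptions: the instruction class names (`"alu"`, `"load"`, `"store"`, `"branch"`) are hypothetical, and the choice of which states each class visits follows the usual textbook multi-cycle design, not necessarily the project's actual FSM.

```python
from enum import Enum, auto

class State(Enum):
    FETCH = auto()
    DECODE = auto()
    EXECUTE = auto()
    MEMORY = auto()
    WRITE_BACK = auto()

def next_state(state: State, instr_class: str) -> State:
    """Return the state that follows `state` for the given instruction class."""
    if state is State.FETCH:
        return State.DECODE
    if state is State.DECODE:
        return State.EXECUTE
    if state is State.EXECUTE:
        if instr_class in ("load", "store"):
            return State.MEMORY
        if instr_class == "branch":
            return State.FETCH       # assumed: branches finish after execute
        return State.WRITE_BACK
    if state is State.MEMORY:
        # Only loads have a value to write back to the register file.
        return State.WRITE_BACK if instr_class == "load" else State.FETCH
    return State.FETCH               # WRITE_BACK always returns to FETCH

def state_sequence(instr_class: str) -> list:
    """List the states one instruction passes through, starting at FETCH."""
    seq = [State.FETCH]
    while True:
        nxt = next_state(seq[-1], instr_class)
        if nxt is State.FETCH:
            return seq
        seq.append(nxt)

if __name__ == "__main__":
    for cls in ("alu", "load", "store", "branch"):
        print(cls, [s.name for s in state_sequence(cls)])
```

In this model an instruction only spends as many cycles as its class needs, which is the main difference from a single-cycle design where every instruction pays for the longest path.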

Design Note:

A key challenge was ensuring that signals produced in one stage and needed in a later stage were properly registered. The NOP instruction ("00000000...") was handled by the control unit as a no-operation, with no control signals asserted.

Phase 3 — Pipelined Processor

The final phase transformed the multi-cycle processor into a pipelined architecture. Pipeline registers were inserted between all stages so that multiple instructions could be in flight simultaneously, one per stage. This significantly increases throughput: a new instruction enters the pipeline every clock cycle, so once the pipeline is full, one instruction can complete per cycle as long as no stalls occur.
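The throughput argument can be made concrete with a short cycle-count calculation in Python. The function names are hypothetical, and the figures in the example (five stages, five cycles per instruction in the multi-cycle design) are assumptions chosen for illustration, not measurements from the project.

```python
def cycles_multicycle(num_instr: int, cycles_per_instr: int) -> int:
    """Each instruction runs to completion before the next one starts."""
    return num_instr * cycles_per_instr

def cycles_pipelined(num_instr: int, stages: int, stall_cycles: int = 0) -> int:
    """The first instruction needs `stages` cycles to drain through the
    pipeline; every later one completes one cycle after its predecessor,
    plus any bubbles inserted for hazards."""
    if num_instr == 0:
        return 0
    return stages + (num_instr - 1) + stall_cycles

if __name__ == "__main__":
    n = 100
    print("multi-cycle:", cycles_multicycle(n, cycles_per_instr=5))
    print("pipelined:  ", cycles_pipelined(n, stages=5))
```

For 100 instructions this gives 500 cycles against 104, and the gap narrows as stall cycles are added.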

The primary challenge in this phase was handling data hazards: situations where an instruction depends on the result of a preceding instruction that hasn't completed yet. Two mechanisms were implemented to resolve these hazards: forwarding (routing a result from a later pipeline stage directly to the earlier stage that needs it, before it has been written back to the register file) and stalling (inserting pipeline bubbles when forwarding alone cannot resolve the dependency, such as load-use hazards, where the loaded value does not exist until the memory access has finished).
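The two mechanisms can be sketched in Python using the classic textbook conditions for a forwarding unit and a load-use hazard detector. Everything here is an assumption made for illustration: the pipeline register names (ID/EX, EX/MEM, MEM/WB), the `StageInfo` fields, and the function names are hypothetical, and the project's actual hazard logic may be organized differently.

```python
from dataclasses import dataclass

@dataclass
class StageInfo:
    """Control and destination info carried in one pipeline register."""
    reg_write: bool = False   # instruction will write the register file
    mem_read: bool = False    # instruction is a load
    rd: int = 0               # destination register number

def forward_select(src_reg: int, ex_mem: StageInfo, mem_wb: StageInfo) -> str:
    """Pick the source for an ALU operand: 'EX/MEM', 'MEM/WB', or 'REG'."""
    # The newest result wins, so EX/MEM is checked first. R0 is never
    # forwarded because it always reads as zero.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src_reg:
        return "EX/MEM"
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src_reg:
        return "MEM/WB"
    return "REG"

def load_use_stall(id_ex: StageInfo, rs: int, rt: int) -> bool:
    """True when the instruction in decode needs a value that a load in
    the execute stage has not fetched from memory yet."""
    return id_ex.mem_read and id_ex.rd != 0 and id_ex.rd in (rs, rt)

if __name__ == "__main__":
    # A load into r1 followed immediately by an add that reads r1.
    load_in_ex = StageInfo(reg_write=True, mem_read=True, rd=1)
    print("stall needed:", load_use_stall(load_in_ex, rs=1, rt=3))
```

After the one-cycle bubble, the loaded value sits in the MEM/WB register and can be forwarded from there, which is why a load-use dependency costs a stall rather than being unresolvable.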

The control unit was adapted to propagate control signals through the pipeline stages alongside the data, and additional hazard detection logic was added to manage forwarding paths and stall conditions. Correctness was verified by running test programs containing sequences of li, lw, sw, and add instructions with deliberate data dependencies.