Final Project: APOLLO-4G — 4-bit CPU¶

Project Overview¶
The APOLLO-4G is a complete 4-bit stored-program CPU designed from scratch using the Sky130A open-source 130nm CMOS technology and the full open-source ASIC toolchain. It computes the Fibonacci sequence using a program stored in an internal ROM and transmits each result serially via UART at 115200 baud.
The name pays tribute to two milestones of 1971: the Intel 4004, the world's first commercial microprocessor and itself a 4-bit CPU, and the Apollo 14 mission, which landed on the Moon that same year. Like the 4004, the APOLLO-4G fetches instructions from a ROM and processes 4-bit data. Unlike the 4004, it was designed using only free, open-source tools.

The Initial Idea¶
The idea for this project emerged during Session 4, when the course introduced the Intel 4004 as the first microprocessor and connected it to the concept of standard cells and layout. The question that came to mind was direct: if the 4004 was a 4-bit CPU designed in 1971, what would it take to design something equivalent today using open-source tools?
The Intel 4004 was fabricated in a 10 µm process with about 2,300 transistors and ran at 740 kHz. The APOLLO-4G targets a 130 nm process, runs at 50 MHz, and uses approximately 1,300 transistors: fewer, because the design is simpler, yet clocked roughly 68 times faster. The connection between these two chips, separated by 55 years of semiconductor history, became the conceptual foundation of the entire project.
| Parameter | Intel 4004 (1971) | APOLLO-4G (2026) |
|---|---|---|
| Technology | 10 µm | 130 nm (Sky130A) |
| Transistors | ~2,300 | ~1,300 |
| Clock | 740 kHz | 50 MHz |
| Data width | 4 bits | 4 bits |
| Program | ROM | ROM (Fibonacci) |
| Output | BCD display | UART serial |
Hardware Description¶
The APOLLO-4G is described in Verilog and organized into five modules. Each module was designed independently, verified with its own testbench, linted with Verilator, and then integrated into the top-level design. Python was used throughout to generate Verilog files programmatically, eliminating transcription errors and making iteration faster.
Architecture¶
alu.v — Arithmetic Logic Unit¶
The ALU is the computational core of the CPU. It performs five operations on two 4-bit inputs and produces a 4-bit result along with two status flags: zero (result equals zero) and carry (arithmetic overflow).
`timescale 1ns/1ps
module alu (
    input  wire [3:0] a,
    input  wire [3:0] b,
    input  wire [2:0] op,
    output reg  [3:0] result,
    output reg        zero,
    output reg        carry
);
    always @(*) begin
        carry = 1'b0;                            // default: logic operations never set carry
        case (op)
            3'b000:  {carry, result} = a + b;    // ADD: carry is the fifth bit of the sum
            3'b001:  {carry, result} = a - b;    // SUB: carry is set on borrow (a < b)
            3'b010:  result = a & b;             // AND
            3'b011:  result = a | b;             // OR
            3'b100:  result = a ^ b;             // XOR
            default: result = 4'b0;
        endcase
        zero = (result == 4'b0);
    end
endmodule
The ALU uses combinational logic only, so no clock is needed. This is intentional: a purely combinational module is easier to test because its outputs depend only on the current inputs, so a testbench can apply a vector and check the result without sequencing clock edges.
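The behavior of alu.v can be mirrored in a few lines of Python, which is convenient for generating expected values for a testbench. This is a minimal sketch rather than part of the project files: the function name `alu` and the tuple return order are assumptions made for illustration, and the 5-bit masking follows the `{carry, result}` concatenation in the Verilog.

```python
def alu(a, b, op):
    """Return (result, zero, carry) for 4-bit inputs, mirroring alu.v."""
    a &= 0xF
    b &= 0xF
    carry = 0
    if op == 0b000:                      # ADD: keep 5 bits, bit 4 is the carry
        wide = (a + b) & 0x1F
        result, carry = wide & 0xF, wide >> 4
    elif op == 0b001:                    # SUB: bit 4 is set when a < b (borrow)
        wide = (a - b) & 0x1F
        result, carry = wide & 0xF, wide >> 4
    elif op == 0b010:                    # AND
        result = a & b
    elif op == 0b011:                    # OR
        result = a | b
    elif op == 0b100:                    # XOR
        result = a ^ b
    else:                                # default branch of the case statement
        result = 0
    zero = int(result == 0)
    return result, zero, carry


print(alu(5, 3, 0b000))   # (8, 0, 0)
print(alu(8, 8, 0b000))   # (0, 1, 1): sum wraps to zero with carry set
print(alu(3, 5, 0b001))   # (14, 0, 1): borrow sets carry
```

Sweeping all 16 × 16 × 8 input combinations through a model like this gives an exhaustive reference table, which is practical only because the datapath is 4 bits wide.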
rom.v — Program Memory (Fibonacci)¶
The ROM stores the Fibonacci program as 8-bit instructions. Bits [7:5] hold the opcode and bits [4:0] hold the immediate operand, of which the control unit uses only the low four bits. The sequence is produced with a single accumulator register: instead of keeping two registers and summing them, the program adds a precomputed increment at each step (1, 0, 1, 1, 2, 3), so each ADD moves the accumulator from one Fibonacci number to the next.
| Opcode [7:5] | Mnemonic | Operation |
|---|---|---|
| 000 | LOAD | Load immediate into accumulator |
| 001 | ADD | Add immediate to accumulator |
| 010 | SUB | Subtract immediate from accumulator |
| 011 | OUT | Send accumulator via UART |
| 100 | HALT | Stop CPU execution |
`timescale 1ns/1ps
module rom (
    input  wire [3:0] addr,
    output reg  [7:0] data
);
    always @(*) begin
        case (addr)
            4'd0:    data = 8'b000_00000; // LOAD 0 -> acc = 0
            4'd1:    data = 8'b011_00000; // OUT    -> show F(0)=0
            4'd2:    data = 8'b001_00001; // ADD 1  -> acc = 1
            4'd3:    data = 8'b011_00000; // OUT    -> show F(1)=1
            4'd4:    data = 8'b001_00000; // ADD 0  -> acc = 1
            4'd5:    data = 8'b011_00000; // OUT    -> show F(2)=1
            4'd6:    data = 8'b001_00001; // ADD 1  -> acc = 2
            4'd7:    data = 8'b011_00000; // OUT    -> show F(3)=2
            4'd8:    data = 8'b001_00001; // ADD 1  -> acc = 3
            4'd9:    data = 8'b011_00000; // OUT    -> show F(4)=3
            4'd10:   data = 8'b001_00010; // ADD 2  -> acc = 5
            4'd11:   data = 8'b011_00000; // OUT    -> show F(5)=5
            4'd12:   data = 8'b001_00011; // ADD 3  -> acc = 8
            4'd13:   data = 8'b011_00000; // OUT    -> show F(6)=8
            4'd14:   data = 8'b100_00000; // HALT
            default: data = 8'b100_00000; // unused addresses read as HALT
        endcase
    end
endmodule
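To check the ROM contents independently of the Verilog, the program can be run through a small instruction-level model. The sketch below is an illustration, not one of the project files: the `run` function and the `ROM` list are hypothetical names, the model ignores pipeline timing, and it treats LOAD as a true load of the immediate (the architectural intent in the opcode table).

```python
LOAD, ADD, SUB, OUT, HALT = 0b000, 0b001, 0b010, 0b011, 0b100

# Same 15 words as rom.v: opcode in bits [7:5], immediate in bits [4:0].
ROM = [
    0b000_00000,  # LOAD 0
    0b011_00000,  # OUT
    0b001_00001,  # ADD 1
    0b011_00000,  # OUT
    0b001_00000,  # ADD 0
    0b011_00000,  # OUT
    0b001_00001,  # ADD 1
    0b011_00000,  # OUT
    0b001_00001,  # ADD 1
    0b011_00000,  # OUT
    0b001_00010,  # ADD 2
    0b011_00000,  # OUT
    0b001_00011,  # ADD 3
    0b011_00000,  # OUT
    0b100_00000,  # HALT
]


def run(rom, max_steps=64):
    """Execute the program and return the list of values emitted by OUT."""
    acc, pc, outputs = 0, 0, []
    for _ in range(max_steps):
        # Addresses past the program read as HALT, like the default case in rom.v.
        word = rom[pc] if pc < len(rom) else (HALT << 5)
        opcode, imm = word >> 5, word & 0xF   # only the low 4 immediate bits are used
        if opcode == LOAD:
            acc = imm
        elif opcode == ADD:
            acc = (acc + imm) & 0xF
        elif opcode == SUB:
            acc = (acc - imm) & 0xF
        elif opcode == OUT:
            outputs.append(acc)
        elif opcode == HALT:
            break
        pc = (pc + 1) & 0xF                   # 4-bit program counter
    return outputs


print(run(ROM))   # [0, 1, 1, 2, 3, 5, 8]
```

The next Fibonacci number, 13, still fits in 4 bits, but the 16-entry address space is already filled by the 15 instructions above.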
control_unit.v — Instruction Decoder¶
The control unit fetches each instruction from ROM, decodes the opcode, and generates the control signals that drive the rest of the CPU: which ALU operation to perform, whether to write the result to the accumulator register, whether to trigger a UART transmission, and whether to stop execution.
`timescale 1ns/1ps
module control_unit (
    input  wire       clk,
    input  wire       rst_n,
    input  wire [7:0] instruction,
    input  wire       zero,
    input  wire       carry,
    output reg  [3:0] pc,
    output reg  [2:0] alu_op,
    output reg  [3:0] alu_b,
    output reg        reg_we,
    output reg        out_en,
    output reg        halt
);
    localparam LOAD = 3'b000;
    localparam ADD  = 3'b001;
    localparam SUB  = 3'b010;
    localparam OUT  = 3'b011;
    localparam HALT = 3'b100;
    wire [2:0] opcode  = instruction[7:5];
    wire [4:0] operand = instruction[4:0];
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            pc     <= 4'b0; alu_op <= 3'b0; alu_b <= 4'b0;
            reg_we <= 1'b0; out_en <= 1'b0; halt  <= 1'b0;
        end else if (!halt) begin
            alu_b  <= operand[3:0];   // only the low 4 bits of the immediate are used
            out_en <= 1'b0;           // strobes are cleared by default each cycle
            reg_we <= 1'b0;
            case (opcode)
                // LOAD drives the ALU ADD opcode, so it computes acc + imm.
                // This equals a true load only while acc is 0, as it is after reset.
                LOAD:    begin alu_op <= 3'b000; reg_we <= 1'b1; pc <= pc + 1'b1; end
                ADD:     begin alu_op <= 3'b000; reg_we <= 1'b1; pc <= pc + 1'b1; end
                SUB:     begin alu_op <= 3'b001; reg_we <= 1'b1; pc <= pc + 1'b1; end
                OUT:     begin out_en <= 1'b1; pc <= pc + 1'b1; end
                HALT:    begin halt <= 1'b1; end   // pc stops advancing
                default: pc <= pc + 1'b1;          // undefined opcodes act as NOP
            endcase
        end
    end
endmodule
uart_tx.v — Serial UART Transmitter¶
The UART module transmits 8-bit data serially at 115200 baud. At a 50 MHz clock, each bit lasts 50,000,000 / 115,200 ≈ 434 clock cycles. The module loads a shift register with the 10-bit frame (1 start bit + 8 data bits + 1 stop bit) and shifts it out one bit at a time, least significant bit first.
`timescale 1ns/1ps
module uart_tx (
    input  wire       clk,
    input  wire       rst_n,
    input  wire       start,
    input  wire [7:0] data,
    output reg        tx,
    output reg        busy
);
    localparam CLKS_PER_BIT = 434; // 50 MHz / 115200 baud
    reg [9:0] shift_reg;
    reg [9:0] bit_cnt;             // counts clock cycles within one bit period
    reg [3:0] bit_idx;             // index of the next frame bit to drive
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            tx <= 1'b1; busy <= 1'b0;             // the line idles high
            shift_reg <= 10'h3FF; bit_cnt <= 0; bit_idx <= 0;
        end else if (!busy && start) begin        // start is ignored while busy
            shift_reg <= {1'b1, data, 1'b0};      // stop + data + start
            bit_cnt <= 0; bit_idx <= 0; busy <= 1'b1;
        end else if (busy) begin
            if (bit_cnt < CLKS_PER_BIT - 1)
                bit_cnt <= bit_cnt + 1;
            else begin
                bit_cnt <= 0;
                tx <= shift_reg[bit_idx];         // order: start, D0..D7, stop
                bit_idx <= bit_idx + 1;
                if (bit_idx == 9) busy <= 1'b0;
            end
        end
    end
endmodule
The UART serial frame for transmitting the value 5 (binary 00000101) looks like:
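| idle | start | D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7 | stop | idle |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |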
Each bit lasts 434 clock cycles (8.68 µs at 50 MHz), so one complete 10-bit frame takes 4,340 cycles, or approximately 86.8 µs.
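The divider and frame arithmetic above can be reproduced with a short script. This is a sketch for checking the numbers, not part of the design: the helper name `frame_bits` is hypothetical, and the divider is assumed to be the rounded ratio of clock to baud rate, which matches the `CLKS_PER_BIT = 434` constant in uart_tx.v.

```python
CLK_HZ = 50_000_000
BAUD = 115_200

clks_per_bit = round(CLK_HZ / BAUD)                     # integer divider used by the counter
actual_baud = CLK_HZ / clks_per_bit                     # baud rate actually produced
baud_error_pct = (actual_baud - BAUD) / BAUD * 100      # deviation from the nominal rate
bit_time_us = clks_per_bit / CLK_HZ * 1e6               # duration of one bit
frame_time_us = 10 * bit_time_us                        # start + 8 data + stop


def frame_bits(value):
    """Return the 10 line levels of an 8N1 frame, in transmission order."""
    data = [(value >> i) & 1 for i in range(8)]         # LSB first
    return [0] + data + [1]                             # start bit, data, stop bit


print(clks_per_bit)                  # 434
print(round(baud_error_pct, 4))      # well under the few percent a UART receiver tolerates
print(round(frame_time_us, 1))       # 86.8
print(frame_bits(5))                 # [0, 1, 0, 1, 0, 0, 0, 0, 0, 1]
```

Because 50 MHz is not an exact multiple of 115200, the divider is rounded, and the resulting baud rate is off by a small fraction of a percent.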
debounce.v — Button Debouncer¶
Mechanical buttons generate multiple transitions when pressed due to contact bounce. The debounce module waits for the signal to remain stable for 500,000 clock cycles (10 ms at 50 MHz) before accepting the new value, which filters out the bounce.
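Since debounce.v is not listed here, the following Python model only illustrates the general counter-based technique described above; it is an assumption about the approach, not a transcription of the module. The function name `debounce` and the small `threshold` used in the demonstration are hypothetical, chosen so the behavior is visible in a few samples.

```python
CLK_HZ = 50_000_000
STABLE_CYCLES = 500_000

stable_ms = STABLE_CYCLES / CLK_HZ * 1e3                 # stability window in milliseconds
counter_bits = (STABLE_CYCLES - 1).bit_length()          # bits needed to count 0..499,999


def debounce(samples, threshold):
    """Return the clean output for each noisy sample (one sample per clock)."""
    clean, count, out = 0, 0, []
    for s in samples:
        if s == clean:
            count = 0                  # input agrees with the output: restart the timer
        else:
            count += 1                 # input differs: count how long it stays different
            if count >= threshold:
                clean, count = s, 0    # stable long enough, accept the new level
        out.append(clean)
    return out


print(stable_ms)      # 10.0
print(counter_bits)   # 19
print(debounce([1, 0, 1, 1, 0, 1, 1, 1, 1], threshold=3))
# [0, 0, 0, 0, 0, 0, 0, 1, 1]
```

In the demonstration the input bounces twice before settling, and the output changes only after three consecutive samples at the new level.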
top.v — Complete CPU Integration¶
The top module connects all five modules into a functional CPU. The control unit reads instructions from ROM, drives the ALU, updates the accumulator register, and triggers the UART on each OUT instruction.
`timescale 1ns/1ps
module top (
    input  wire       clk,
    input  wire       rst_n,
    input  wire       btn_run,
    output wire       tx,
    output reg  [3:0] result,
    output reg        zero,
    output reg        carry,
    output reg        halt
);
    wire [7:0] rom_data;
    wire [3:0] pc, alu_b, alu_result;
    wire [2:0] alu_op;
    wire       reg_we, out_en, halt_sig;
    wire       alu_zero, alu_carry;
    reg  [3:0] reg_a;                     // accumulator
    /* verilator lint_off UNUSEDSIGNAL */
    wire btn_clean;                       // debounced button, currently unused by the CPU
    wire uart_busy;                       // UART busy flag, currently unused by the CPU
    /* verilator lint_on UNUSEDSIGNAL */
    debounce u_deb (.clk(clk), .rst_n(rst_n), .noisy_in(btn_run), .clean_out(btn_clean));
    rom u_rom (.addr(pc), .data(rom_data));
    control_unit u_cu (.clk(clk), .rst_n(rst_n), .instruction(rom_data),
                       .zero(alu_zero), .carry(alu_carry), .pc(pc),
                       .alu_op(alu_op), .alu_b(alu_b), .reg_we(reg_we),
                       .out_en(out_en), .halt(halt_sig));
    alu u_alu (.a(reg_a), .b(alu_b), .op(alu_op),
               .result(alu_result), .zero(alu_zero), .carry(alu_carry));
    // The UART payload is the raw 4-bit value zero-extended to a byte (not ASCII).
    uart_tx u_uart (.clk(clk), .rst_n(rst_n), .start(out_en),
                    .data({4'b0, alu_result}), .tx(tx), .busy(uart_busy));
    // Accumulator register: captures the ALU result when reg_we is high.
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) reg_a <= 4'b0;
        else if (reg_we) reg_a <= alu_result;
    end
    // Output registers: updated on each OUT instruction; halt follows the control unit.
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin result <= 4'b0; zero <= 1'b0; carry <= 1'b0; halt <= 1'b0; end
        else begin
            if (out_en) begin result <= alu_result; zero <= alu_zero; carry <= alu_carry; end
            halt <= halt_sig;
        end
    end
endmodule
Linter Verification¶
Before synthesis, the entire design was checked with Verilator to catch potential RTL issues early. This step was important because linter warnings often correspond to real hardware bugs: unused signals may indicate unconnected logic, width mismatches can cause silent data truncation, and unintended latches cause unpredictable behavior in real silicon.
verilator --lint-only -Wall alu.v debounce.v rom.v control_unit.v uart_tx.v top.v
Result:
- V e r i l a t i o n R e p o r t: Verilator 5.044 2026-01-01 rev v5.044
- Verilator: Walltime 0.009 s
0 warnings, 0 errors. ✅
During development, several warnings appeared and were fixed. The most common were UNUSEDSIGNAL, for intermediate wires that existed for clarity but were never read, and PINCONNECTEMPTY, for optional output ports left unconnected. These were resolved either by connecting the signals properly or by suppressing them with explicit lint directives when the disconnection was intentional.
Simulation — Fibonacci in GTKWave¶
The testbench simulates the complete CPU execution. It monitors the out_en signal of the control unit to detect each OUT instruction, reads the result, and displays it with the expected Fibonacci value.
iverilog -o apollo4g alu.v debounce.v rom.v control_unit.v uart_tx.v top.v top_tb.v
vvp apollo4g
Terminal output:
======================================
APOLLO-4G CPU - Grecia Bello
Fibonacci Sequence in 4 bits
Sky130A 130nm - 50 MHz
======================================
F(0) = 0
F(1) = 0
F(2) = 1
F(3) = 1
F(4) = 2
F(5) = 3
F(6) = 5
======================================
HALT - Fibonacci complete!
======================================
The one-cycle offset in the display is an expected behavior of synchronous hardware. The result register updates on the clock edge following the OUT instruction, so the testbench reads the previous value at the moment OUT fires. The actual computed values are correct: 0, 1, 1, 2, 3, 5, 8.
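The offset can be reproduced with a tiny model of a register that is sampled in the same cycle it is about to be updated. This is an illustrative sketch under the explanation given above; the function name `sample_before_edge` is hypothetical and the model abstracts away everything except the ordering of the read and the register update.

```python
# Accumulator value presented to the result register at each OUT instruction.
computed = [0, 1, 1, 2, 3, 5, 8]


def sample_before_edge(values, reset_value=0):
    """Return (values the testbench sees, final register contents)."""
    result_reg = reset_value
    seen = []
    for v in values:
        seen.append(result_reg)   # testbench reads while out_en is high
        result_reg = v            # register captures the new value at the clock edge
    return seen, result_reg


seen, final = sample_before_edge(computed)
print(seen)    # [0, 0, 1, 1, 2, 3, 5]
print(final)   # 8
```

The printed list matches the terminal output line by line, and the last computed value, 8, is held in the register after the final OUT. Sampling one clock cycle later in the testbench would remove the offset.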
Opening GTKWave to visualize the waveforms:
gtkwave apollo4g_tb.vcd &

The waveform confirms the correct execution. The result[3:0] signal steps through the Fibonacci values 0 → 1 → 2 → 3 → 5 → 8; F(1) and F(2) are both 1, so they appear as a single level. The halt signal goes HIGH after the last value, confirming the CPU stopped at the HALT instruction. The tx signal pulses for each UART transmission, and the zero flag activates briefly when the accumulator holds zero at the start.
Logic Synthesis¶
Synthesis was performed using Yosys. The synthesis script reads all six Verilog files (the five modules plus top.v), runs logic optimization, maps the result to Sky130A standard cells using dfflibmap and abc, and writes the gate-level netlist to synth.v.
yosys -s synth.tcl
The synthesis ran in two stages. In the first stage, Yosys produces a technology-independent netlist using generic cell types such as $_ANDNOT_ and $_DFF_PN0_. In the second stage, dfflibmap replaces flip-flops with sky130_fd_sc_hd__dfrtp_1 cells, and abc maps all combinational logic to real Sky130 standard cells.
Gate count after mapping:
=== design hierarchy ===
221 top
51 alu
32 control_unit
8 rom
109 uart_tx
221 cells
48 sky130_fd_sc_hd__dfrtp_1
37 sky130_fd_sc_hd__nand2_1
15 sky130_fd_sc_hd__mux2_1
14 sky130_fd_sc_hd__nor2_1
9 sky130_fd_sc_hd__a21oi_1
...
Found and reported 0 problems.

| Module | Cells | Function |
|---|---|---|
| uart_tx | 109 | Serial transmitter — most complex because it manages baud rate counting and the shift register |
| alu | 51 | Arithmetic and logic operations |
| control_unit | 32 | Instruction decoder and program counter |
| debounce | 21 | Button noise filter |
| rom | 8 | Program memory |
| Total | 221 | Complete APOLLO-4G CPU |
The check pass reported 0 problems and 0 latches. No unintended latches were inferred, which confirms the RTL was written correctly with complete assignments in every branch of every always block.
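As a quick consistency check, the per-module counts in the table can be summed and turned into shares of the total. The snippet below only restates the reported numbers; the dictionary name `cells` is an illustrative choice.

```python
# Cell counts per module, as reported in the table above.
cells = {"uart_tx": 109, "alu": 51, "control_unit": 32, "debounce": 21, "rom": 8}
flip_flops = 48                                   # dfrtp_1 cells in the synthesis report

total = sum(cells.values())
shares = {name: round(100 * n / total, 1) for name, n in cells.items()}
flop_share = round(100 * flip_flops / total, 1)   # percentage of cells that are flip-flops

print(total)              # 221
print(shares["uart_tx"])  # 49.3
print(flop_share)         # 21.7
```

The UART transmitter accounts for roughly half of all cells, which is consistent with its baud counter and shift register being the largest sequential structures in the design.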
Physical Implementation¶
Physical implementation was performed using LibreLane, which automates the complete RTL-to-GDS flow including power network generation, placement, clock tree synthesis, routing, DRC, LVS, and antenna checks.
Before using LibreLane, the flow was attempted manually with OpenROAD. This produced useful learning about the individual steps but encountered a blocking error: lpflow_inputiso1p_1 cells generated during synthesis have an internal power net called one_ that TritonRoute cannot route without a properly configured power distribution network. LibreLane handles this automatically.
cd /foss/designs/mini_cpu
librelane config.json
LibreLane completed all 78 stages. The configuration used:
{
    "DESIGN_NAME": "top",
    "VERILOG_FILES": ["alu.v", "debounce.v", "rom.v",
                      "control_unit.v", "uart_tx.v", "top.v"],
    "CLOCK_PORT": "clk",
    "CLOCK_PERIOD": 20.0,
    "PDK": "sky130A",
    "STD_CELL_LIBRARY": "sky130_fd_sc_hd"
}
Floorplan¶
The chip fits in a 160 µm × 100 µm tile, the standard 1-tile size for educational tapeouts in this course.
| Parameter | Value |
|---|---|
| Die area | 160 × 100 µm |
| Core area | 150 × 80 µm |
| Design area | 4,960 µm² |
| Core utilization | 45% |
| Standard cells placed | 221 |
| Clock tree depth | 3 levels |
| Clock sinks | 113 |
Timing report¶
Design area 4960 um^2 45% utilization.
Startpoint: u_uart/_166_ (rising edge-triggered flip-flop clocked by clk)
Endpoint: tx (output port clocked by clk)
Path Group: clk
Path Type: max
Delay Time Description
---------------------------------------------------------
0.00 0.00 clock clk (rise edge)
0.33 0.33 ^ clkbuf_0_clk/X (sky130_fd_sc_hd__clkbuf_1)
0.43 0.76 ^ clkbuf_3_6__f_clk/X (sky130_fd_sc_hd__clkbuf_1)
0.00 0.76 ^ u_uart/_166_/CLK (sky130_fd_sc_hd__dfstp_2)
0.63 1.39 ^ u_uart/_166_/Q (sky130_fd_sc_hd__dfstp_2)
0.00 1.39 ^ tx (out)
1.39 data arrival time
20.00 20.00 clock clk (rise edge)
-0.50 19.50 clock uncertainty
-5.00 14.50 output external delay
14.50 data required time
---------------------------------------------------------
13.11 slack (MET)
worst slack max 13.11
| Metric | Value |
|---|---|
| Worst slack | 13.11 ns MET ✅ |
| Clock period | 20 ns (50 MHz) |
| Critical path | UART flip-flop to tx output |
| Data arrival | 1.39 ns |
| Data required | 14.50 ns |
The design meets timing with a margin of 13.11 ns. The critical path runs from a UART transmitter flip-flop to the tx output pin, a very short path, which means the design could run significantly faster than 50 MHz if needed.
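The slack figure follows directly from the numbers in the report: the arrival time is the sum of the stage delays along the path, and the required time is the clock period minus the uncertainty and the external output delay. The snippet below reproduces that arithmetic; the variable names are illustrative and all values are copied from the report above.

```python
clock_period_ns = 20.00
clock_uncertainty_ns = 0.50
output_delay_ns = 5.00

# Incremental delays along the reported path: two clock buffers,
# the flip-flop clock pin, clock-to-Q, and the wire to the tx port.
stage_delays_ns = [0.33, 0.43, 0.00, 0.63, 0.00]

arrival_ns = round(sum(stage_delays_ns), 2)
required_ns = round(clock_period_ns - clock_uncertainty_ns - output_delay_ns, 2)
slack_ns = round(required_ns - arrival_ns, 2)

print(arrival_ns)    # 1.39
print(required_ns)   # 14.5
print(slack_ns)      # 13.11
```

Note that 5.5 ns of the 20 ns budget is consumed by the uncertainty and output-delay constraints rather than by logic in the design itself.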
Power analysis¶
Group Internal Switching Leakage Total (W) Share
Sequential 2.21e-04 3.58e-06 1.36e-09 2.25e-04 67.9%
Combinational 6.70e-06 8.13e-06 7.18e-10 1.48e-05 4.5%
Clock 3.13e-05 6.02e-05 4.45e-11 9.15e-05 27.6%
Total 2.59e-04 7.19e-05 2.12e-09 3.31e-04 100.0%
Total power: 0.33 mW at 50 MHz. For comparison, a standard LED requires approximately 60 mW to stay lit, so the APOLLO-4G consumes less than 1/180th of the power of a single LED.
Sequential logic dominates power consumption at 67.9%, which is expected because the UART shift register and baud counter flip-flops toggle continuously. Clock distribution accounts for 27.6%, also typical for a clocked digital design.
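The totals and shares in the power table can be recomputed from the three component columns. This sketch only re-adds the reported values; because the table entries are already rounded to three significant figures, the recomputed shares may differ from the reported ones in the last digit. The 60 mW LED figure is the one quoted above.

```python
# (internal, switching, leakage) power in watts, copied from the report.
power_w = {
    "Sequential":    (2.21e-04, 3.58e-06, 1.36e-09),
    "Combinational": (6.70e-06, 8.13e-06, 7.18e-10),
    "Clock":         (3.13e-05, 6.02e-05, 4.45e-11),
}

group_total_w = {group: sum(parts) for group, parts in power_w.items()}
total_w = sum(group_total_w.values())
share_pct = {group: 100 * p / total_w for group, p in group_total_w.items()}
led_ratio = 60e-3 / total_w            # how many times more power the LED draws

for group, pct in share_pct.items():
    print(f"{group:14s} {group_total_w[group] * 1e3:.4f} mW  {pct:.1f}%")
print(f"Total          {total_w * 1e3:.3f} mW")
print(f"LED / chip     {led_ratio:.0f}x")
```

Leakage is four to five orders of magnitude below the dynamic components, so at 50 MHz the total is effectively set by internal and switching power.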
Verification — DRC, LVS, and Antenna¶
LibreLane ran all physical verification checks automatically at the end of the flow. All checks passed.
Check for Routing DRC errors clear ✅
Check for Magic DRC errors clear ✅
Check for KLayout DRC errors clear ✅
Check for Magic Illegal Overlap errors clear ✅
Check for LVS errors clear ✅
Check for power grid violations clear ✅
Check for Setup violations clear ✅
Check for Hold violations clear ✅
DRC verifies that the physical layout respects all Sky130A manufacturing rules: minimum wire widths, minimum spacing, enclosure rules, and density requirements. LVS verifies that the circuit implemented in the layout is electrically equivalent to the synthesized netlist. Both checks must pass before a chip can be submitted for fabrication.
GDS in KLayout¶
The final GDS was opened in KLayout with the Sky130A technology loaded:
klayout -nn /foss/pdks/sky130A/libs.tech/klayout/tech/sky130A.lyt \
runs/RUN_2026-03-19_14-42-54/final/gds/top.gds &


The layout shows 221 standard cells placed in rows and fully routed across five metal layers. On the chip boundary, the I/O pins are visible: result[3:0], halt, btn_run, clk, tx, and the UART internal signals baud_cnt and shift_reg. The H-tree clock distribution can be traced from the center outward to all 113 clock sinks. The dense routing in the UART section contrasts with the simpler routing in the ROM and control unit, reflecting the difference in logic complexity between those modules.
Chip Documentation¶
Pin Assignments — QFN-16¶
         ┌─────────────┐
clk     ─┤1          16├─ GND
rst_n   ─┤2          15├─ VDD
btn_run ─┤3          14├─ result[3]
tx      ─┤4          13├─ result[2]
zero    ─┤5          12├─ result[1]
carry   ─┤6          11├─ result[0]
halt    ─┤7          10├─ NC
NC      ─┤8           9├─ NC
         └─────────────┘

| Pin | Direction | Description |
|---|---|---|
| clk | Input | System clock, 50 MHz |
| rst_n | Input | Active-low asynchronous reset |
| btn_run | Input | Start button, debounced internally |
| tx | Output | UART serial output |
| result[3:0] | Output | Current accumulator value (4-bit) |
| zero | Output | Status flag: result equals zero |
| carry | Output | Status flag: arithmetic overflow |
| halt | Output | CPU has executed the HALT instruction |
| VDD | Power | 1.8V supply |
| GND | Ground | Ground reference |
| NC | None | Not connected |
Interface Specifications¶
| Parameter | Value |
|---|---|
| Technology | Sky130A — 130nm CMOS |
| Supply voltage | 1.8V |
| Clock frequency | 50 MHz (20 ns period) |
| Reset | Active-low, asynchronous |
| UART baud rate | 115200 baud |
| UART frame format | 8N1 (8 data bits, no parity, 1 stop bit) |
| Input clock uncertainty | 0.5 ns |
| Input delay | 5 ns (relative to clock) |
| Output delay | 5 ns (relative to clock) |
| Worst setup slack | 13.11 ns MET |
| Total power | 0.33 mW at 50 MHz, 1.8V |
| Design area | 4,960 µm² |
| Cell count | 221 standard cells |
| Flip-flop count | 48 |
| Package | QFN-16 |
Package Selection¶
The APOLLO-4G has 11 signal pins (clk, rst_n, btn_run, tx, zero, carry, halt, and the four result bits) plus VDD and GND, totaling 13 connections. A QFN-16 (Quad Flat No-lead, 16 pins) was selected because it provides enough pins for all signals with a small margin for future expansion, its compact footprint is compatible with PCB assembly, and it is widely used in educational tapeout projects. The 3 unused pins are left as NC (no connect).
Verification Test Plan¶
This plan describes how the APOLLO-4G would be tested after fabrication and packaging.
Required Equipment¶
| Equipment | Purpose |
|---|---|
| Power supply (1.8V, 100mA) | Provide regulated supply |
| Multimeter | Measure current consumption |
| Oscilloscope (100 MHz minimum) | Verify clock and digital signals |
| Logic analyzer (8+ channels) | Capture result[3:0] sequence |
| USB-UART adapter (3.3V logic) | Read serial output |
| 50 MHz crystal oscillator | System clock source |
| 1.8V to 3.3V level shifter | Interface UART to PC |
Step 1 — Power-on check¶
Connect VDD = 1.8V and GND without applying clock. Measure current with the multimeter. Expected current is below 1 mA (leakage only, no dynamic switching). If current exceeds 10 mA, a short circuit is likely — disconnect power immediately and inspect solder joints.
Step 2 — Clock and reset¶
Apply a 50 MHz clock signal to the clk pin. Hold rst_n = 0 for a minimum of 100 ns, then release to rst_n = 1. Verify with the oscilloscope that the clock signal is clean and that result[3:0] equals 0000 immediately after reset release.
Step 3 — Fibonacci execution¶
After reset, the CPU starts executing automatically. Monitor result[3:0] with the logic analyzer. The expected sequence is:
| OUT instruction | result[3:0] | Fibonacci value |
|---|---|---|
| 1st OUT | 0000 | F(0) = 0 |
| 2nd OUT | 0001 | F(1) = 1 |
| 3rd OUT | 0001 | F(2) = 1 |
| 4th OUT | 0010 | F(3) = 2 |
| 5th OUT | 0011 | F(4) = 3 |
| 6th OUT | 0101 | F(5) = 5 |
| 7th OUT | 1000 | F(6) = 8 |
| HALT | halt = 1 | end of program |
Step 4 — UART serial output¶
Connect the tx pin to a USB-UART adapter through a 1.8V to 3.3V level shifter. Open a serial terminal at 115200 baud, 8N1. Expected output:
0
1
1
2
3
5
8

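Each value is sent when the CPU executes the corresponding OUT instruction. Two details of the RTL listed above affect what the terminal will show. First, the transmitter sends the raw accumulator value zero-extended to a byte (0x00, 0x01, and so on up to 0x08), not ASCII digits, so the terminal should be set to a hexadecimal view. Second, consecutive OUT instructions are only two clock cycles apart, while one frame occupies 4,340 cycles and uart_tx ignores start while busy; unless the control unit is changed to wait on busy, only the first byte should be expected on the wire. Compare the captured bytes with the simulated tx waveform to confirm correct behavior.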
Step 5 — Reset and repeat¶
Assert reset again and release. Verify that the CPU restarts correctly and repeats the Fibonacci sequence with the same values. Run at least five consecutive cycles to confirm deterministic behavior.
Step 6 — Frequency sweep¶
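Reduce the clock frequency from 50 MHz to 10 MHz and verify correct operation at all speeds. Then increase the clock from 50 MHz upward to find the maximum operating frequency. The timing analysis reports a slack of 13.11 ns at a 20 ns period, so the reported path needs at least 20 − 13.11 = 6.89 ns, which corresponds to roughly 145 MHz if the 5 ns output delay and 0.5 ns clock uncertainty are held fixed. This is an estimate from static timing analysis, so the measured limit may be lower.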
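One practical consequence for the sweep: CLKS_PER_BIT is a fixed constant (434) in uart_tx.v, so the baud rate on tx scales with the applied clock, and the serial terminal must be reconfigured at each frequency step. The helper below tabulates the expected rate; the function name `baud_at` and the list of frequencies are illustrative choices, not part of the test plan.

```python
CLKS_PER_BIT = 434   # fixed divider from uart_tx.v


def baud_at(clk_hz):
    """Baud rate produced on tx when the chip is clocked at clk_hz."""
    return clk_hz / CLKS_PER_BIT


for f_mhz in (10, 25, 50, 100):
    print(f"{f_mhz:4d} MHz -> {baud_at(f_mhz * 1_000_000):10.0f} baud")
```

At 10 MHz, for example, the output runs at about 23,041 baud, which is not a standard rate, so an adapter that accepts arbitrary baud rates or a logic analyzer with a UART decoder is the safer way to read tx during the sweep.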
What I Learned¶
This project showed that chip design is not a single discipline: it is the intersection of computer architecture, digital electronics, physical layout, verification methodology, and toolchain engineering.
The most unexpected challenge was not the RTL design itself, which followed naturally from the logic taught in the course, but the physical implementation. The lpflow power cell errors in OpenROAD required understanding how the power distribution network interacts with the routing engine, which is a level of detail that does not appear in the RTL. Switching to LibreLane resolved the problem by handling PDN generation automatically, and it also clarified why industrial ASIC flows use wrapper tools rather than calling OpenROAD directly.
The one-cycle offset in the simulation results was another valuable lesson. It revealed that synchronous hardware has a fundamentally different timing model than software: a non-blocking assignment in an always @(posedge clk) block takes effect at the clock edge, so anything that samples the register in that same cycle still sees the old value. This distinction is critical to getting correct simulation results and understanding real chip behavior.
Finally, designing this chip from Santa Cruz de la Sierra, Bolivia, using entirely free and open-source tools, is the most meaningful aspect of the project. The fact that the same flow that produced the APOLLO-4G could produce a much more complex chip (a real DSP, a custom microcontroller, an audio processor) demonstrates that geographic and economic barriers to silicon design no longer need to be absolute.
