A hardware-accelerated implementation of the Sobel edge detection algorithm in synthesizable Verilog. The design processes 24-bit RGB images using a 3×3 sliding window approach, computing horizontal and vertical intensity gradients in parallel before thresholding to produce a binary edge map.
- Overview
- Theory
- Architecture
- Project Structure
- Module Reference
- FSM State Machine
- Data Flow
- Prerequisites
- Usage
- Simulation Scripts
- Python Utilities
- Parameters
- Testbenches
The Sobel operator is a classical first-order image gradient method used for edge detection. This project implements the full pipeline in RTL Verilog, suitable for simulation and FPGA synthesis:
- A Python preprocessing script converts a PNG image into binary pixel data consumed by the simulator.
- The Verilog pipeline reads 3×3 sliding windows, computes horizontal (Gx) and vertical (Gy) gradients simultaneously, calculates the gradient magnitude via integer square root, applies adaptive thresholding, and writes the edge map to a binary output file.
- A Python postprocessing script reconstructs the edge-detected PNG from the simulator output.
Key design properties:
- Fully synthesizable RTL (no floating-point)
- Parameterizable data widths and image dimensions
- Dual-instance convolution for Gx/Gy parallelism
- Non-restoring integer square root
- Adaptive per-window thresholding
The Sobel operator applies two 3×3 convolution kernels to approximate the image gradient:
Horizontal kernel (Gx) — detects vertical edges:
┌──────────────────┐
│ -1 0 +1 │
│ -2 0 +2 │
│ -1 0 +1 │
└──────────────────┘
Vertical kernel (Gy) — detects horizontal edges:
┌──────────────────┐
│ -1 -2 -1 │
│ 0 0 0 │
│ +1 +2 +1 │
└──────────────────┘
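For a 3×3 pixel neighbourhood with rows labelled top, mid, bot and columns labelled left, mid, right, the implementation computes:
Gx = (top_right + 2·mid_right + bot_right) − (top_left + 2·mid_left + bot_left)
Gy = (top_left + 2·top_mid + top_right) − (bot_left + 2·bot_mid + bot_right)
Gy is computed as top row minus bottom row, the opposite sign to the kernel shown above. The magnitude is unaffected because Gy is squared.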
Magnitude = sqrt(Gx² + Gy²)
Multiplication by 2 is implemented as a single left-shift (<< 1), avoiding a multiplier.
After processing all windows in a block, the mean magnitude is computed. Pixels with magnitude above the mean are marked black (edge), all others are marked white (background).
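The thresholding rule can be sketched as a small Python reference model. This is a sketch, not the RTL: the function name and the use of floor division for the mean are assumptions, and the controller's own rounding may differ.

```python
BLACK = 0x000000  # edge pixel
WHITE = 0xFFFFFF  # background pixel


def threshold_by_mean(magnitudes):
    """Mark each magnitude BLACK if it exceeds the mean magnitude, else WHITE."""
    if not magnitudes:
        return []
    # Assumption: plain floor division; the RTL may round the mean differently.
    threshold = sum(magnitudes) // len(magnitudes)
    return [BLACK if m > threshold else WHITE for m in magnitudes]


if __name__ == "__main__":
    # Mean of [0, 10, 200, 30] is 60, so only the third value is an edge.
    print([f"{p:06X}" for p in threshold_by_mean([0, 10, 200, 30])])
```

A uniform block produces no edges under this rule, because no magnitude is strictly above the mean.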
                    ┌────────────────────────────────────────────────────┐
                    │                    sobel_top.v                     │
                    │                                                    │
output_pixels.txt   │  ┌─────────────┐  valid_data                       │
───────────────────►│  │             ├──────────────────┐                │
                    │  │             │ in_p{1a,2,1b}_x  │                │
                    │  │ controller  ├─────────────────►┤                │
                    │  │     .v      │ in_m{1a,2,1b}_x  │                │
                    │  │    (FSM)    │                  │                │
                    │  │             │ in_p{1a,2,1b}_y  ▼                │
                    │  │             ├─────────────────► sobel_inst_x    │ ──► Gx (29b)
binary_output.txt   │  │             │ in_m{1a,2,1b}_y                   │
◄───────────────────┤  │             │                   sobel_inst_y    │ ──► Gy (29b)
                    │  │             │◄───────────────── conv_ready      │
                    │  │             │◄───────────────── data_out        │
                    │  │             │                                   │
                    │  │             │ sqrt_num (60b)                    │
                    │  │             ├─────────────────► sqrt_inst       │
                    │  │             │◄───────────────── sqrt_sq (30b)   │
                    │  └─────────────┘                                   │
                    └────────────────────────────────────────────────────┘
Sobel_edge_detection_verilog/
│
├── Verilog/ # RTL source files
│ ├── sobel_top.v # Top-level integration module
│ ├── controller.v # FSM controller (file I/O, pipeline control)
│ ├── sobel_matrix_conv.v # Sobel convolution kernel (instantiated twice)
│ └── sqrt.v # Non-restoring integer square root
│
├── tb/ # Testbenches
│ ├── sobel_top_tb.v # Integration testbench
│ ├── sobel_matrix_conv_tb.v # Convolution unit testbench (9 test cases)
│ └── sqrt_tb.v # Square root module testbench
│
├── sim/ # Questa simulation scripts (TCL .do files)
│ ├── run_GUI.do # Interactive waveform simulation
│ ├── run_batch.do # Headless batch simulation
│ └── run_sqrt.do # Isolated sqrt module test
│
├── Python/ # Image I/O utilities
│ ├── to_binary2.py # PNG → binary sliding-window text file
│ └── to_image2.py # Binary text file → PNG reconstruction
│
├── sobel_top.md # Auto-generated sobel_top module reference
├── sobel_top.svg # sobel_top block diagram
├── controller.md # Auto-generated controller module reference
├── controller.svg # controller block diagram
├── fsm_controller_00.svg # FSM state diagram
└── README.md # This file
Role (`sobel_top.v`): Top-level wrapper. Instantiates all sub-modules and connects internal wires.
Ports:
| Port | Direction | Width | Description |
|---|---|---|---|
| `clk` | input | 1 | System clock |
| `reset` | input | 1 | Active-high synchronous reset |
| `done` | output | 1 | Asserted when processing complete |
Parameters:
| Parameter | Default | Description |
|---|---|---|
| `data_size` | 24 | Pixel width in bits (8R + 8G + 8B) |
| `window_count` | 9 | Number of 3×3 windows to process (set per image) |
| `conv_out_data_size` | 29 | Convolution result width (fixed) |
| `sqrt_in_data_size` | 60 | Input width to sqrt: holds Gx² + Gy² (fixed) |
| `sqrt_out_data_size` | 30 | Output width from sqrt (= sqrt_in_data_size / 2) |
Sub-module instantiations:
| Instance | Module | Purpose |
|---|---|---|
| `controller_inst` | `controller` | FSM, file I/O, pipeline control |
| `sobel_inst_x` | `sobel_matrix_conv` | Compute horizontal gradient Gx |
| `sobel_inst_y` | `sobel_matrix_conv` | Compute vertical gradient Gy |
| `sqrt_inst` | `sqrt` | Compute √(Gx² + Gy²) |
See `sobel_top.md` and `sobel_top.svg` for the auto-generated port diagram.
Role (`controller.v`): FSM-based controller. Drives the entire processing pipeline: it opens the input and output files, fills the sliding-window buffer, coordinates the convolution modules, collects square-root results, performs thresholding, and writes the output.
Ports:
| Port | Direction | Width | Description |
|---|---|---|---|
| `clk` | input | 1 | System clock |
| `reset` | input | 1 | Active-high reset |
| `valid_data` | output | 1 | Signals valid pixel data to convolution modules |
| `in_p1a_x` … `in_m1b_x` | output | `data_size` | Pixel inputs for Gx convolution (6 signals) |
| `in_p1a_y` … `in_m1b_y` | output | `data_size` | Pixel inputs for Gy convolution (6 signals) |
| `edge_data_in_x` | input | `conv_out_data_size` | Gx result from `sobel_inst_x` |
| `edge_data_in_y` | input | `conv_out_data_size` | Gy result from `sobel_inst_y` |
| `conv_ready_x` | input | 1 | Gx result valid flag |
| `conv_ready_y` | input | 1 | Gy result valid flag |
| `sqrt_num` | output | `sqrt_in_data_size` | Gx² + Gy², sent to `sqrt_inst` |
| `sqrt_sq` | input | `sqrt_out_data_size` | √(Gx² + Gy²), returned by `sqrt_inst` |
| `done` | output | 1 | High when all windows are processed |
Internal signals:
| Signal | Type | Description |
|---|---|---|
| `state` | `reg [3:0]` | Current FSM state |
| `buffer[0:8]` | `reg [data_size-1:0]` | 3×3 sliding window pixel buffer |
| `window_index` | `integer` | Index of the window currently being processed |
| `delayed_window_index` | `integer` | One-cycle delayed index for pipelined reads |
| `Ix[0:window_count-1]` | `reg [conv_out_data_size-1:0]` | Stored Gx values for all windows |
| `Iy[0:window_count-1]` | `reg [conv_out_data_size-1:0]` | Stored Gy values for all windows |
| `sq[0:window_count-1]` | `reg [sqrt_out_data_size-1:0]` | Stored magnitude values |
| `sum` | `reg [sqrt_out_data_size+window_count-1:0]` | Accumulated magnitude sum for thresholding |
| `Threshold` | `integer` | Mean magnitude = sum / window_count |
| `thrs[0:window_count-1]` | `reg [data_size-1:0]` | Thresholded output pixels (black or white) |
Helper functions:
| Function | Signature | Description |
|---|---|---|
| `abs` | `abs(input [conv_out_data_size-1:0] x)` | Two's-complement absolute value |
| `rounded_division` | `rounded_division(numerator, denominator)` | Integer division with rounding |
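
The two helpers can be modelled in Python as follows. This is a behavioural sketch under stated assumptions: the Python function names are illustrative, the default width of 29 mirrors `conv_out_data_size`, and round-half-up is only one plausible reading of "integer division with rounding", since the rounding mode used in `controller.v` is not stated here.

```python
def abs_twos_complement(x, width=29):
    """Absolute value of a `width`-bit two's-complement word."""
    mask = (1 << width) - 1
    x &= mask
    if x >> (width - 1):          # sign bit set: the word encodes a negative value
        return (~x + 1) & mask    # invert all bits and add one
    return x


def rounded_division(numerator, denominator):
    """Round-half-up division for non-negative operands (assumed rounding mode)."""
    return (numerator + denominator // 2) // denominator


if __name__ == "__main__":
    minus_18 = (-18) & ((1 << 29) - 1)   # 29-bit encoding of -18
    print(abs_twos_complement(minus_18), rounded_division(10, 4))
```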
See `controller.md`, `controller.svg`, and `fsm_controller_00.svg` for diagrams.
Role (`sobel_matrix_conv.v`): Computes one Sobel gradient (Gx or Gy) for a 3×3 window in a single clock cycle. Two instances run in parallel inside `sobel_top`.
The convolution formula maps directly to the kernel weights:
result = (in_p1a + 2·in_p2 + in_p1b) − (in_m1a + 2·in_m2 + in_m1b)
The p prefix denotes the positive row (+1/+2 weight), the m prefix the negative row (−1/−2 weight). Multiplication by 2 uses a left-shift.
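
As a quick check of the formula, here is a minimal Python model of one convolution instance. It is a sketch rather than a translation of the RTL: the function name is hypothetical, and the 29-bit two's-complement encoding of the result is an assumption based on the `conv_out_data_size` parameter.

```python
def sobel_conv(in_p1a, in_p2, in_p1b, in_m1a, in_m2, in_m1b, out_width=29):
    """Return (signed result, out_width-bit two's-complement encoding)."""
    positive = in_p1a + (in_p2 << 1) + in_p1b   # x2 as a left shift
    negative = in_m1a + (in_m2 << 1) + in_m1b
    result = positive - negative
    encoded = result & ((1 << out_width) - 1)   # what a 29-bit bus would carry
    return result, encoded


if __name__ == "__main__":
    # (9 + 2*6 + 8) - (10 + 2*0 + 5) = 14
    print(sobel_conv(9, 6, 8, 10, 0, 5))
    # (4 + 2*2 + 4) - (9 + 2*6 + 9) = -18
    print(sobel_conv(4, 2, 4, 9, 6, 9))
```

The signed value is what the formula describes. The encoded value shows why the consumer needs an absolute-value step before squaring.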
Ports:
| Port | Direction | Width | Description |
|---|---|---|---|
| `clk` | input | 1 | System clock |
| `valid_data` | input | 1 | Input data is valid this cycle |
| `in_p1a` | input | `data_size` | Positive-weight pixel A |
| `in_p2` | input | `data_size` | Positive-weight pixel B (×2) |
| `in_p1b` | input | `data_size` | Positive-weight pixel C |
| `in_m1a` | input | `data_size` | Negative-weight pixel A |
| `in_m2` | input | `data_size` | Negative-weight pixel B (×2) |
| `in_m1b` | input | `data_size` | Negative-weight pixel C |
| `conv_ready` | output | 1 | Output is valid on this cycle |
| `data_out` | output | `conv_out_data_size` | Signed convolution result |
Pixel mapping for Gx (horizontal gradient):
Window layout: Buffer index mapping:
┌───┬───┬───┐ buffer[0] buffer[1] buffer[2]
│ 0 │ 1 │ 2 │ → buffer[3] buffer[4] buffer[5]
├───┼───┼───┤ buffer[6] buffer[7] buffer[8]
│ 3 │ 4 │ 5 │
├───┼───┼───┤
│ 6 │ 7 │ 8 │
└───┴───┴───┘
Gx positive row: buffer[2], buffer[5], buffer[8] (right column)
Gx negative row: buffer[0], buffer[3], buffer[6] (left column)
Gy positive row: buffer[0], buffer[1], buffer[2] (top row)
Gy negative row: buffer[6], buffer[7], buffer[8] (bottom row)
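
The buffer mapping above can be exercised with a short Python sketch. The function name is hypothetical. Which outer pixel feeds the `a` input and which feeds the `b` input is an assumption, though it does not affect the result because both carry the same weight.

```python
def gradients_from_buffer(buffer):
    """Compute (Gx, Gy) from a 9-element row-major 3x3 window buffer."""
    if len(buffer) != 9:
        raise ValueError("expected exactly 9 pixels")
    # Gx: right column minus left column, middle row weighted x2.
    gx = (buffer[2] + (buffer[5] << 1) + buffer[8]) \
        - (buffer[0] + (buffer[3] << 1) + buffer[6])
    # Gy: top row minus bottom row, middle column weighted x2.
    gy = (buffer[0] + (buffer[1] << 1) + buffer[2]) \
        - (buffer[6] + (buffer[7] << 1) + buffer[8])
    return gx, gy


if __name__ == "__main__":
    vertical_edge = [0, 0, 10,
                     0, 0, 10,
                     0, 0, 10]
    print(gradients_from_buffer(vertical_edge))  # strong Gx, zero Gy
```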
Role (`sqrt.v`): Computes the integer square root using the non-restoring algorithm, a digit-by-digit binary method that processes 2 input bits per iteration without any multiplier. The module is fully combinational (`always @(*)`).
Ports:
| Port | Direction | Width | Description |
|---|---|---|---|
| `num` | input | `WIDTH` | Input value (default 32-bit) |
| `sqr` | output | `OUT_WIDTH` | Integer square root result |
Parameters:
| Parameter | Default | Description |
|---|---|---|
| `WIDTH` | 32 | Input bit width |
| `OUT_WIDTH` | 16 | Output bit width (= WIDTH / 2) |
In the top-level context: WIDTH = 60, OUT_WIDTH = 30.
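
For readers unfamiliar with the technique, the following Python sketch shows a textbook non-restoring integer square root. It illustrates the algorithm family named above and is not a transcription of `sqrt.v`: the function name, the variable names, and the decision to skip the final remainder correction (only the root is returned) are choices made for this example.

```python
def isqrt_non_restoring(num, width=60):
    """Floor square root of a `width`-bit unsigned integer, 2 bits per step."""
    if width % 2:
        raise ValueError("width must be even")
    if num < 0 or num >> width:
        raise ValueError("num does not fit in width bits")
    root = 0       # partial root, grows by one bit per iteration
    remainder = 0  # signed partial remainder, never restored when negative
    for i in range(width // 2 - 1, -1, -1):
        pair = (num >> (2 * i)) & 0b11  # next two input bits, MSB first
        if remainder >= 0:
            remainder = ((remainder << 2) | pair) - ((root << 2) | 1)
        else:
            remainder = ((remainder << 2) | pair) + ((root << 2) | 3)
        root = (root << 1) | (1 if remainder >= 0 else 0)
    return root


if __name__ == "__main__":
    for value in (0, 1, 8, 9, 24, 25, (1 << 60) - 1):
        print(value, isqrt_non_restoring(value))
```

Each iteration needs only shifts, an OR, and one add or subtract, which is why the method maps to hardware without a multiplier.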
The controller module implements a 7-state Mealy FSM:
reset
│
▼
┌─────────────┐
│ IDLE (0) │ Opens input_file & output_file
└──────┬───────┘
│ files OK
▼
┌─────────────┐
┌──│ READ (1) │ Reads 9 pixels (one 3×3 window) from input file
│ └──────┬───────┘
│ │ always
│ ▼
│ ┌─────────────┐
│ │ CONV (2) │ Sets valid_data=1; routes pixels to Gx/Gy convolution modules
│ │ │ Captures Ix[i], Iy[i] when conv_ready asserts
│ └──────┬───────┘
│ │ window_index < window_count-1
└─────────┘ (loop back to READ for next window)
│ window_index == window_count-1
▼
┌─────────────┐
│ SQRT (3) │ Iterates over all windows:
│ │ sqrt_num = |Ix[i]|² + |Iy[i]|²
│ │ Captures sq[i] = sqrt_sq; accumulates sum
└──────┬───────┘
│ all windows done
▼
┌─────────────┐
│ MAG (4) │ Computes Threshold = sum / window_count
│ │ thrs[i] = sq[i] > Threshold ? BLACK : WHITE
└──────┬───────┘
│ all windows done
▼
┌─────────────┐
│ WRTE (5) │ Writes each thrs[i] to output file as 24-bit binary
└──────┬───────┘
│ all windows done
▼
┌─────────────┐
│ DONE (6) │ Closes files; asserts done=1
└─────────────┘
See `fsm_controller_00.svg` for the rendered state diagram.
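
The transitions in the diagram can be summarised as a next-state function. The Python below is a sketch of the control flow only. The argument names (`files_ok`, `stage_done`) are hypothetical, and two behaviours are assumptions not confirmed by the diagram: staying in IDLE when the files fail to open, and DONE being terminal until reset.

```python
from enum import IntEnum


class State(IntEnum):
    IDLE = 0
    READ = 1
    CONV = 2
    SQRT = 3
    MAG = 4
    WRTE = 5
    DONE = 6


def next_state(state, window_index=0, window_count=9, files_ok=True, stage_done=False):
    """Return the successor state for the transitions shown in the diagram."""
    if state == State.IDLE:
        # Assumption: wait in IDLE until both files have opened.
        return State.READ if files_ok else State.IDLE
    if state == State.READ:
        return State.CONV  # unconditional
    if state == State.CONV:
        # Loop back to READ until the last window has been convolved.
        return State.READ if window_index < window_count - 1 else State.SQRT
    if state in (State.SQRT, State.MAG, State.WRTE):
        # Each of these stages iterates over all windows before advancing.
        return State(state + 1) if stage_done else state
    return State.DONE  # assumption: DONE is terminal until reset


if __name__ == "__main__":
    print(next_state(State.CONV, window_index=8, window_count=9).name)
```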
Input PNG
│
│ Python: to_binary2.py
▼
output_pixels.txt
(24-bit binary RGB per pixel,
9 pixels per 3×3 window block)
│
│ Verilog Simulation
▼
┌──────────────────────────────────────────────────────┐
│ For each 3×3 window: │
│ │
│ [READ] Load 9 pixels into buffer[0:8] │
│ │
│ [CONV] sobel_inst_x: Gx = right_col - left_col │
│ sobel_inst_y: Gy = top_row - bottom_row │
│ (both compute in the same clock cycle) │
│ │
│ [SQRT] sqrt_num = |Gx|² + |Gy|² │
│ sq[i] = √(sqrt_num) (30b) │
│ sum += sq[i] │
│ │
│ [MAG] Threshold = sum / window_count │
│ pixel[i] = sq[i] > Threshold │
│ ? 0x000000 (black / edge) │
│ : 0xFFFFFF (white / background)│
│ │
│ [WRTE] Write pixel[i] as 24-bit binary to file │
└──────────────────────────────────────────────────────┘
│
│ Python: to_image2.py
▼
binary_output.txt → output_image.png
| Tool / Library | Version | Purpose |
|---|---|---|
| Siemens Questa | 2021.1+ | Verilog simulation |
| Python | 3.8+ | Image pre/post processing |
| Pillow (PIL) | any recent | Image I/O in Python |
| NumPy | any recent | Pixel array manipulation |
Install Python dependencies:

    pip install pillow numpy

Place your source image (e.g. test.png) in the Python/ directory, then run:

    cd Python/
    python to_binary2.py

This generates output_pixels.txt containing the 3×3 sliding-window pixel data. Copy it to the directory from which you will run the simulation (i.e. alongside the Verilog files or the .do script working directory):

    cp Python/output_pixels.txt Verilog/

Note: The `window_count` parameter in the testbench must match the number of 3×3 windows in your image. For an image of size W×H, `window_count = (W-2) × (H-2)`. The default testbench value `49729` corresponds to a 225×225 image.
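
A small helper makes the `window_count` arithmetic easy to check before editing the testbench. The function is hypothetical and not part of the repository.

```python
def window_count(width, height):
    """Number of 3x3 sliding windows that fit in a width x height image."""
    if width < 3 or height < 3:
        raise ValueError("image must be at least 3x3")
    return (width - 2) * (height - 2)


if __name__ == "__main__":
    print(window_count(225, 225))  # value used by the default testbench
```

For a 225×225 image this gives 223 × 223 = 49729, matching the testbench default.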
Navigate to the Verilog/ directory and launch Questa with one of the simulation scripts:

Interactive GUI (waveform viewer):

    cd Verilog/
    vsim -do ../sim/run_GUI.do

Batch/headless mode:

    cd Verilog/
    vsim -do ../sim/run_batch.do

Square root module only:

    cd Verilog/
    vsim -do ../sim/run_sqrt.do

After simulation completes, binary_output.txt will be written to the working directory. Copy it to the Python/ directory and convert it back to an image:

    cp binary_output.txt Python/
    cd Python/
    python to_image2.py

The edge-detected image is saved as output_image.png.
Located in sim/:
| Script | Mode | Runtime | Description |
|---|---|---|---|
| `run_GUI.do` | Interactive | 5 ms | Opens Questa GUI, loads all signals into wave viewer |
| `run_batch.do` | Headless | 20 ms | Non-interactive run, suitable for scripted testing |
| `run_sqrt.do` | Headless | 100 ns | Isolated test of the sqrt module only |
All scripts compile source files in dependency order:

    vlog sobel_matrix_conv.v sqrt.v controller.v sobel_top.v sobel_top_tb.v

`to_binary2.py` converts an RGB PNG image into a text file of binary pixel data, structured as overlapping 3×3 sliding windows.
Output format (output_pixels.txt):
Line 1: <width> <height>
Lines 2–end: <24-bit binary RGB> (9 lines per window, row-major)
Each window block contains:
buffer[0] buffer[1] buffer[2] ← top row
buffer[3] buffer[4] buffer[5] ← middle row
buffer[6] buffer[7] buffer[8] ← bottom row
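
To make the file layout concrete, here is a dependency-free Python sketch that produces lines in the format described above from a nested list of RGB tuples. It is not the code of `to_binary2.py`: the function names are hypothetical, and the order in which windows are emitted (left to right, then top to bottom) is an assumption.

```python
def pixel_to_bits(rgb):
    """Encode an (R, G, B) tuple of 8-bit values as a 24-character bit string."""
    r, g, b = rgb
    return f"{r:08b}{g:08b}{b:08b}"


def sliding_window_lines(pixels):
    """Return the header line plus 9 pixel lines per 3x3 window."""
    height, width = len(pixels), len(pixels[0])
    lines = [f"{width} {height}"]
    # Assumption: windows are emitted left to right, then top to bottom.
    for y in range(height - 2):
        for x in range(width - 2):
            for dy in range(3):          # top, middle, bottom row
                for dx in range(3):      # buffer[3*dy + dx]
                    lines.append(pixel_to_bits(pixels[y + dy][x + dx]))
    return lines


if __name__ == "__main__":
    image = [[(row * 10, col * 10, 0) for col in range(4)] for row in range(3)]
    print(len(sliding_window_lines(image)))  # header + 2 windows * 9 pixels
```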
`to_image2.py` reads binary_output.txt produced by the Verilog simulation and reconstructs a PNG image. Each line is a 24-bit binary string decoded as R[7:0], G[7:0], B[7:0].
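
The decoding step can be sketched in a few lines of Python. The function name is hypothetical, and the strict validation is a choice made for this example rather than documented behaviour of `to_image2.py`.

```python
def bits_to_rgb(line):
    """Decode one 24-character binary line into an (R, G, B) tuple."""
    bits = line.strip()
    if len(bits) != 24 or set(bits) - {"0", "1"}:
        raise ValueError(f"expected 24 binary digits, got {bits!r}")
    # Bits 23..16 are red, 15..8 green, 7..0 blue.
    return int(bits[0:8], 2), int(bits[8:16], 2), int(bits[16:24], 2)


if __name__ == "__main__":
    print(bits_to_rgb("0" * 24))   # black (edge)
    print(bits_to_rgb("1" * 24))   # white (background)
```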
The table below lists all top-level parameters and where to change them for a new image size:
| Parameter | File | Default | Notes |
|---|---|---|---|
| `data_size` | `sobel_top.v` | 24 | Fixed for 24-bit RGB; do not change |
| `window_count` | `sobel_top_tb.v` | 49729 | Set to (W-2)×(H-2) for your image |
| `conv_out_data_size` | `sobel_top.v` | 29 | Fixed; sized for max Gx/Gy value |
| `sqrt_in_data_size` | `sobel_top.v` | 60 | Fixed; holds Gx²+Gy² (two 29-bit squares) |
| `sqrt_out_data_size` | `sobel_top.v` | 30 | Fixed; = sqrt_in_data_size / 2 |
| `WIDTH` | `sqrt.v` | 32 | Overridden to 60 by top-level instantiation |
| `OUT_WIDTH` | `sqrt.v` | 16 | Overridden to 30 by top-level instantiation |
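
The fixed widths can be sanity-checked with a short calculation. The sketch below assumes the convolution operates on the full 24-bit packed pixel value, as the `data_size`-wide ports suggest. The helper name is hypothetical.

```python
def signed_bits_needed(max_abs):
    """Bits required to hold any value in [-max_abs, +max_abs] in two's complement."""
    return max_abs.bit_length() + 1


DATA_SIZE = 24
max_pixel = (1 << DATA_SIZE) - 1

# Worst case: positive row all ones, negative row all zeros (weights 1 + 2 + 1).
max_abs_gradient = 4 * max_pixel
gradient_bits = signed_bits_needed(max_abs_gradient)

# Worst-case square-root input: both gradients at their maximum magnitude.
max_sqrt_input = 2 * max_abs_gradient ** 2
sqrt_input_bits = max_sqrt_input.bit_length()

if __name__ == "__main__":
    print(gradient_bits, "bits needed for Gx/Gy (29 provided)")
    print(sqrt_input_bits, "bits needed for Gx^2 + Gy^2 (60 provided)")
```

Under that assumption both widths in the table leave headroom over the worst case.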
Located in tb/:
`sobel_top_tb.v` instantiates the full sobel_top pipeline with window_count = 49729 (225×225 image). It applies a 2-cycle reset, then waits for the done signal before stopping simulation. It monitors clock and reset on every change.
`sobel_matrix_conv_tb.v` runs 9 directed test cases verifying the convolution arithmetic. Each test applies known pixel values and checks the signed output against the hand-calculated expected result.
| Test | Expected Output | Calculation |
|---|---|---|
| 0 | +14 | (9 + 12 + 8) − (10 + 0 + 5) = 14 |
| 1 | −18 | (4 + 4 + 4) − (9 + 12 + 9) = −18 |
| 2 | −22 | (0 + 4 + 3) − (9 + 12 + 8) = −22 |
| 3 | +10 | (6 + 16 + 5) − (0 + 10 + 7) = +10 |
| 4 | −15 | (2 + 8 + 4) − (6 + 18 + 5) = −15 |
| 5 | −16 | (2 + 6 + 3) − (6 + 16 + 5) = −16 |
| 6 | −1 | (8 + 10 + 8) − (5 + 14 + 8) = −1 |
| 7 | −12 | (4 + 8 + 5) − (9 + 10 + 10) = −12 |
| 8 | −17 | (3 + 6 + 0) − (8 + 10 + 8) = −17 |
`sqrt_tb.v` applies 9 integer inputs and displays the computed integer square root via $monitor. Useful for verifying the non-restoring algorithm across a range of values.