This project implements bilinear interpolation on an FPGA to resize a 100x100 RGB image to 60x60, 80x80, 120x120, or 140x140.
Since the PC and the FPGA need to communicate, a Finite State Machine is included to provide state control for the other modules. The components are:
On the FPGA side,
- UART: communicate with the PC side
- Finite State Machine (FSM): receive commands from UART and change state accordingly. Each state change will trigger the execution of other modules
- LEDs: state indicator for debugging
- Source memory A and B: two 100x100xRGB dual-port block memories for storing the input image, together providing 4 concurrent accesses to the source image
- Interpolation processor: pipelined interpolation processor
- Result memory: 140x140xRGB block memory for storing the result image
On the PC side,
- UI: interacts with the FPGA over UART
- UART
For quick development, the UART Verilog code is from http://www.nandland.com.
- Source memory
During bilinear interpolation, each pixel in the resulting image is interpolated from the nearest 4 pixels in the source image. To facilitate pipelined computation, two dual-port block RAMs of 100x100 bytes are set up for each RGB channel, so that all 4 pixels can be read concurrently.
When receiving the source image from UART, port A of both dual-port memories is used to write the incoming bytes concurrently.
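As a software reference for the 4-pixel interpolation described above, here is a minimal floating-point sketch for one colour channel. The function name `bilinear_sample` and the clamping at the right and bottom edges are assumptions made for illustration. The hardware design may handle borders differently.

```python
def bilinear_sample(channel, sx, sy):
    """Interpolate one channel at the fractional source position (sx, sy).

    channel is a list of rows (channel[y][x]); sx and sy must be >= 0.
    """
    height, width = len(channel), len(channel[0])
    x0, y0 = int(sx), int(sy)        # top-left neighbour
    x1 = min(x0 + 1, width - 1)      # assumed: clamp at the right edge
    y1 = min(y0 + 1, height - 1)     # assumed: clamp at the bottom edge
    fx, fy = sx - x0, sy - y0        # fractional offsets in [0, 1)

    # These four reads correspond to the 4 concurrent source memory accesses.
    p00, p10 = channel[y0][x0], channel[y0][x1]
    p01, p11 = channel[y1][x0], channel[y1][x1]

    top = p00 * (1 - fx) + p10 * fx        # blend along x on the upper row
    bottom = p01 * (1 - fx) + p11 * fx     # blend along x on the lower row
    return top * (1 - fy) + bottom * fy    # blend the two rows along y
```

Sampling the centre of a 2x2 block returns the average of its four pixels, which is a quick sanity check for any hardware result.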
- Result memory
Since the result memory receives one byte at a time, a single-port block RAM of 140x140 bytes is set up for each RGB channel.
- FSM implementation
The FSM needs to coordinate the following events:
- receive an image from the PC and store it to source memory
- receive image resizing scale factor from PC
- start interpolation processor
- send result image to PC
- clear result image memory
As a result, it includes the following 6 states (including idle) and transitions:
All state changes are initiated by a UART command from the PC, and the FSM returns to the idle state once the task is performed. The communication protocol is shown below:
Command from PC | FPGA action | FPGA response |
---|---|---|
8'd0 | reset | 8'd0 |
8'd1 + 8'dx | set scaling factor; x selects the factor (0: 0.6, 1: 0.8, 2: 1.2, 3: 1.4) | 8'd1 |
8'd2 + 8'dx + <3x100 bytes> | set one row of the input image; x is the row number; sample: [R0,G0,B0,R1,G1,B1,...] | 8'd2 (after receiving the row), 8'd6 (after receiving the whole image) |
8'd3 | initiate image processing | 8'd3 (upon receiving), 8'd4 (upon completion) |
8'd4 + 8'dx | read one row of the output image; x is the row number; sample: [R0,G0,B0,R1,G1,B1,...] | <3x140 bytes> |
8'd5 | clear result memory | 8'd5 |
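The byte layout of the commands in the table can be sketched on the PC side as follows. This is only an illustration in Python (the project's UI is written in Processing), and every name here (`set_scale_frame`, `write_row_frame`, `read_row_frame`, `split_row_response`) is hypothetical. The sketch only builds and parses byte strings. Sending them over a serial port is left out.

```python
# Command codes taken from the protocol table.
CMD_RESET, CMD_SET_SCALE, CMD_WRITE_ROW = 0, 1, 2
CMD_PROCESS, CMD_READ_ROW, CMD_CLEAR = 3, 4, 5

SCALE_CODES = {0.6: 0, 0.8: 1, 1.2: 2, 1.4: 3}  # factor -> x in "8'd1 + 8'dx"
SRC_WIDTH = 100                                  # pixels per source row


def set_scale_frame(scale):
    """Two-byte frame that selects one of the four supported factors."""
    return bytes([CMD_SET_SCALE, SCALE_CODES[scale]])


def write_row_frame(row_index, pixels):
    """Frame for one source row; pixels is a list of (R, G, B) tuples."""
    if len(pixels) != SRC_WIDTH:
        raise ValueError("a source row must contain exactly 100 pixels")
    payload = bytes(value for rgb in pixels for value in rgb)  # R0,G0,B0,R1,...
    return bytes([CMD_WRITE_ROW, row_index]) + payload


def read_row_frame(row_index):
    """Two-byte frame that requests one row of the result image."""
    return bytes([CMD_READ_ROW, row_index])


def split_row_response(data):
    """Turn an interleaved R,G,B byte string back into (R, G, B) tuples."""
    return [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]
```

Each row write is therefore 2 + 3x100 = 302 bytes on the wire, and each row read costs a 2-byte request.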
During actual computation, due to timing constraints, an 8-stage pipeline is used as shown below.
In the pipeline, dx[8:0] and dy[8:0] hold the coordinates of the destination pixel and range from 0 to the result image dimension. The pipeline terminates when dx and dy reach the end of the result image dimension.
Meanwhile, since floating-point operations are not supported, values involved in such operations are stored as integers scaled by 100 during computation. When the fractional part is required, the value is taken modulo 100.
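The scaled-integer arithmetic can be illustrated with a small Python sketch. The function names are hypothetical, and the coordinate mapping (source = destination / factor, truncated) is an assumption. The actual pipeline stages may compute the source position differently.

```python
SCALE = 100  # all fractional quantities are stored multiplied by 100


def source_position_x100(d, factor_x100):
    """Map a destination index to a source coordinate scaled by 100.

    Assumed mapping: source = d / factor, e.g. factor_x100 = 140 for 1.4.
    """
    return d * SCALE * SCALE // factor_x100


def bilinear_x100(p00, p10, p01, p11, sx100, sy100):
    """Blend four neighbouring pixels using integer arithmetic only."""
    fx = sx100 % SCALE                         # fractional part of x, 0..99
    fy = sy100 % SCALE                         # fractional part of y, 0..99
    top = p00 * (SCALE - fx) + p10 * fx        # scaled by 100
    bottom = p01 * (SCALE - fx) + p11 * fx     # scaled by 100
    value = top * (SCALE - fy) + bottom * fy   # scaled by 100 * 100
    return value // (SCALE * SCALE)            # back to the 0..255 range


sx100 = source_position_x100(1, 140)   # destination column 1 at factor 1.4
left_column = sx100 // SCALE           # integer part selects the source pixel
```

With this mapping, destination column 1 at factor 1.4 lands at source position 0.71, so the integer part is 0 and the fractional weight is 71.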
# | Purpose |
---|---|
0 | breathing probe |
1 | uart: tx |
2 | uart: rx |
3 | state: idle |
4 | state: set scale factor |
5 | state: set write row number |
6 | state: set write row |
7 | state: process |
8 | state: set send row number |
9 | state: send row |
- FPGA clock: 10MHz
- UART baud rate: 230400
- Theoretical time to send source image to FPGA:
  - Total bytes to transfer: 100x100x3 (image) + 2x100 (command) = 30200 bytes
  - Highest transfer rate: 230400 / 8 = 28800 bytes/s
  - Expected time: 1.05 seconds
  - Actual time: ~3 seconds
  - Possible reason: slow buffering and waiting for responses on the PC side, since the FPGA only passively receives the image
- Theoretical time to receive result image from FPGA:
  - Total bytes to transfer: 140x140x3 (image) + 2x140 (command) = 59080 bytes
  - Highest transfer rate: 230400 / 8 = 28800 bytes/s
  - Expected time: 2.05 seconds
  - Actual time: ~5 seconds
- Theoretical time to process an image (from 100x100 to 140x140):
  - Cycles needed: 140x140 + (8-1) = 19607
  - Per cycle time: 1/10MHz = 100ns
  - Expected time: 19607 x 100ns = 1960700ns ≈ 1.96ms ≈ 0.002s
  - Actual time: similar
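The estimates above can be reproduced with a few lines of Python. The helper names are hypothetical. The 10-bits-per-byte variant assumes 8N1 framing (one start bit and one stop bit per byte), which is an assumption about the UART configuration rather than something stated here.

```python
def uart_seconds(total_bytes, baud, bits_per_byte=8):
    """Time to move total_bytes over a UART at the given baud rate."""
    return total_bytes * bits_per_byte / baud


def pipeline_seconds(pixels, stages, clock_hz):
    """Time for a pipeline that accepts one pixel per clock cycle."""
    cycles = pixels + (stages - 1)   # the last pixel still has to drain
    return cycles / clock_hz


send_bytes = 100 * 100 * 3 + 2 * 100   # image payload + 2 command bytes per row

ideal = uart_seconds(send_bytes, 230400)        # about 1.05 s, as in the text
framed = uart_seconds(send_bytes, 230400, 10)   # about 1.31 s if 8N1 is assumed
process = pipeline_seconds(140 * 140, 8, 10_000_000)

print(f"send (8 bits/byte):  {ideal:.2f} s")
print(f"send (10 bits/byte): {framed:.2f} s")
print(f"process:             {process * 1000:.2f} ms")
```

If the link does use start and stop bits, the achievable rate drops to 23040 bytes/s, which would account for part of the gap between the expected and measured transfer times.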
An apple is interpolated from 100x100 to 140x140:
When the FPGA is overclocked from 10MHz to 100MHz (which violates the timing constraint), part of the resulting image is corrupted:
- /documentation: images and earlier documents
- /hw: Vivado 2020.2 Project
- /img: test images
- /ui: UI implemented by Processing
Task | Description | Progress |
---|---|---|
UI | load image, send command, send image, receive image | ✅ |
HW set up | | ✅ |
HW uart | can communicate with host | ✅ |
HW FSM | state control | ✅ |
HW image memory | source_image_[R\|G\|B], result_image_[R\|G\|B] | ✅ |
HW receive, copy, send image | test memory access capability from host | ✅ |
HW interpolate | design pipeline | ✅ |
HW interpolate | receive interpolate parameters | ✅ |
HW interpolate image | compute | ✅ |
Report and Slide | | ✅ |
Task | Description | Progress |
---|---|---|
check mem | mem has a read latency of 2 cycles, what's the impact? | |
UI image border | border currently overlaps with image | |
HW | break top.v into modules | ✅ |
UART | try a baud rate faster than 115200 | ✅ |
UI | one button finish all | |
UI | image hot reload | ✅ |
UI | support reset | |