TimVideo GSoC 2014: MJPEG Optimisation

Today I planned to complete the simulation of HDMI2USB, but the calibration_done signal has still not gone high (it has been 6 hours now). I tried changing the testbench and using some advanced options, as Joelw suggested, but nothing seems to work. So either I am doing something terribly wrong or the simulation is simply supposed to take a long time. In the end I decided to use pen and paper and try to understand the code. It took a lot of time, because VHDL is concurrent by nature and I had not fully understood how the Xilinx MCB works, but I think I have got it right. It turns out that the image buffer works fine.

The DDR2 read and write state machines are pretty complex, because memory controllers in general are complex. From what I could work out, there are three state machines in the image buffer of HDMI2USB: one reads from the RAM, the second writes into the RAM, and the third controls the read and write state machines.

The third state machine looks something like this:
1) Wait for start of frame
2) If start of frame is detected, start writing the frame to RAM until end of frame is detected (wr_img=1)
3) Once end of frame is detected, send the start command to the JPEG encoder and wait for the "JPEG is busy" signal (wr_img=0)
4) If "JPEG is busy" is detected, start reading from RAM until the entire frame has been read (rd_img=1)
5) Wait for the done signal from the encoder (rd_img=0)
6) Go back to step (1)
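
The six steps above can be written down as a small behavioural model. This is only a sketch in Python, not the VHDL from HDMI2USB: the state names, the `step` function and its input names (`sof`, `eof`, `jpeg_busy`, `frame_read`, `jpeg_done`) are made up for illustration, and the outputs are assumed to follow the next state.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_SOF = auto()        # step 1: wait for start of frame
    WRITE_FRAME = auto()     # step 2: frame is written to RAM (wr_img=1)
    WAIT_JPEG_BUSY = auto()  # step 3: encoder started, waiting for busy (wr_img=0)
    READ_FRAME = auto()      # step 4: frame is read from RAM (rd_img=1)
    WAIT_JPEG_DONE = auto()  # step 5: waiting for encoder done (rd_img=0)


def step(state, sof=False, eof=False, jpeg_busy=False,
         frame_read=False, jpeg_done=False):
    """Advance one tick; return (next_state, wr_img, rd_img, jpeg_start)."""
    jpeg_start = 0
    if state is State.WAIT_SOF:
        nxt = State.WRITE_FRAME if sof else state
    elif state is State.WRITE_FRAME:
        nxt = State.WAIT_JPEG_BUSY if eof else state
        jpeg_start = 1 if eof else 0  # start command goes out at end of frame
    elif state is State.WAIT_JPEG_BUSY:
        nxt = State.READ_FRAME if jpeg_busy else state
    elif state is State.READ_FRAME:
        nxt = State.WAIT_JPEG_DONE if frame_read else state
    else:  # State.WAIT_JPEG_DONE, step 6 returns to step 1
        nxt = State.WAIT_SOF if jpeg_done else state
    wr_img = 1 if nxt is State.WRITE_FRAME else 0
    rd_img = 1 if nxt is State.READ_FRAME else 0
    return nxt, wr_img, rd_img, jpeg_start
```

Written this way, it is easy to see that wr_img and rd_img are never high at the same time, so a frame is always fully written before any of it is read.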

I don't think the reads and writes can be pipelined, as DDR2 RAM does not allow simultaneous read and write.

The only optimisation I can see here is that, instead of waiting for the done signal (step 5) after the frame has been read (step 4), the state machine can already wait for the start of the next frame and start reading.

To understand the read and write state machines, I dug into the MIG data sheet.
The read state machine looks something like this:

1) RESET: Wait for calibration to be done.
2) read_cmd: if rd_img=1, put the read command and address into the command fifo.
3) Wait for the read data fifo of the RAM to fill up (64 words).
4) Once full, send the data into the JPEG buffer if the JPEG buffer is not full.
5) Once 64 words have been read, go to step (2)
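
The read loop above can be modelled roughly as follows. This is a simplified Python sketch and not the real MCB interface: `read_frame`, `jpeg_capacity` and the `drain` callback (which stands in for the encoder emptying its buffer) are hypothetical names, and filling the read data fifo is treated as instantaneous.

```python
from collections import deque

BURST_WORDS = 64  # words per read command, as in the steps above


def read_frame(ram, jpeg_buffer, jpeg_capacity, drain):
    """Copy `ram` (a list of words) into `jpeg_buffer` in 64-word bursts.

    `drain(jpeg_buffer)` must remove words from the buffer; it models the
    encoder consuming data. Returns the number of read commands issued.
    """
    commands = 0
    addr = 0
    while addr < len(ram):  # rd_img=1 while the frame is not fully read
        # step 2: put a read command and address into the command fifo
        commands += 1
        # step 3: the read data fifo fills with up to 64 words
        read_fifo = deque(ram[addr:addr + BURST_WORDS])
        # step 4: move words on only while the JPEG buffer has room
        while read_fifo:
            if len(jpeg_buffer) >= jpeg_capacity:
                drain(jpeg_buffer)  # buffer full: wait for the encoder
                continue
            jpeg_buffer.append(read_fifo.popleft())
        # step 5: the burst has been consumed, issue the next command
        addr += BURST_WORDS
    return commands
```

In this model the next read command is issued only after the previous burst has been completely handed to the JPEG buffer, which is the serialisation that a second read port would try to hide.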

Write state machine:
1) Wait for calibration
2) If wr_img=1 and there is something to be written, fill the write data fifo of the RAM (64 words)
3) Once full, push the write command and address to the command fifo
4) Wait for the write to complete.
5) Once done, go to step (2)
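
The write side can be sketched the same way. Again this is only an illustrative Python model with a made-up function name; in particular, flushing a partial final burst is my assumption and is not something taken from the HDMI2USB code.

```python
BURST_WORDS = 64  # words per write command, as in the steps above


def write_frame(pixels):
    """Return the (address, burst) write commands issued for one frame."""
    commands = []
    write_fifo = []
    addr = 0
    for word in pixels:  # wr_img=1 and there is data to write
        write_fifo.append(word)  # step 2: fill the write data fifo
        if len(write_fifo) == BURST_WORDS:
            # step 3: push the write command and address
            commands.append((addr, write_fifo))
            # step 4: the write completes (instantaneous in this model)
            addr += BURST_WORDS
            # step 5: start filling the fifo again
            write_fifo = []
    if write_fifo:
        # assumption: a partial last burst is flushed at end of frame
        commands.append((addr, write_fifo))
    return commands
```

For example, `write_frame(list(range(130)))` issues three commands at addresses 0, 64 and 128, the last one carrying only two words.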

The raw RGB data from the image selector is first buffered using fifos and then sent to RAM. This helps prevent loss of data but adds to the latency.

Everything seems to be legit. The only optimisation I can see is that two read ports could be used instead of one to pipeline the read cycles. Then, when one read data fifo has been completely read and there is still space in the JPEG buffer, data from the second read data fifo can be used. But since this is DDR2 RAM operating at 325 MHz, the read time from RAM to fifo should not be large, so using two ports won't change much.

Also, an inherent problem with the JPEG algorithm design is that 8 lines are required before encoding can start. Since the processing of frames is not pipelined, as seen above, and the system resets after each frame has been processed, for a resolution of 1024x768 there is a stall of 1024*8 = 8192 cycles in every frame.
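
To put that stall in proportion, here is a quick back-of-the-envelope check in Python. It assumes one pixel per clock cycle and ignores blanking, neither of which I have verified against the design; the constant names are mine.

```python
WIDTH, HEIGHT = 1024, 768
LINES_BEFORE_ENCODE = 8  # lines needed before the encoder can start

stall_cycles = WIDTH * LINES_BEFORE_ENCODE
frame_cycles = WIDTH * HEIGHT  # assumption: one pixel per clock, no blanking
stall_fraction = stall_cycles / frame_cycles

print(stall_cycles)             # 8192
print(f"{stall_fraction:.2%}")  # 1.04%
```

Under those assumptions the stall is 8 lines out of 768, so roughly 1% of the frame time.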

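Tomorrow (I mean today) I will try to test the bandwidth of USB. This article says the maximum throughput is 40 MBps. For 1024x768 frames at 30 fps with 24 bits per pixel, and assuming a compression ratio of 10, the minimum bandwidth should be (1024*768*24*30/10)/1024/1024 = 54. Since the 24 is bits per pixel, that figure is 54 Mbit/s, which is about 6.75 MB/s and would fit within 40 MBps if that number really is in megabytes. Am I missing something?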
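
The estimate above can be reproduced with a few lines of Python. This sketch keeps bits and bytes apart, since the factor 24 is bits per pixel; the constant names are mine, and the compression ratio of 10 is the same assumption as in the text.

```python
WIDTH, HEIGHT = 1024, 768
BITS_PER_PIXEL = 24
FPS = 30
COMPRESSION_RATIO = 10  # assumed, as in the estimate above

bits_per_second = WIDTH * HEIGHT * BITS_PER_PIXEL * FPS / COMPRESSION_RATIO
mbit_per_second = bits_per_second / 1024 / 1024
mbyte_per_second = mbit_per_second / 8

print(mbit_per_second)   # 54.0
print(mbyte_per_second)  # 6.75
```

So the 54 is in megabits per second; in megabytes per second the same stream needs 6.75.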

Friday, 20 June 2014