One big question people have about high-level synthesis (HLS) is whether or not it is ready for mainstream use. In other words, does it really work (yet)? HLS has a long history, starting with products like Synopsys’s Behavioral Compiler and Cadence’s Visual Architect, which never achieved any serious adoption. Then came a next generation: Synfora, Forte and Mentor’s Catapult. More recently still, AutoESL and Cadence’s C-to-Silicon have arrived.
I met Atul, CEO of AutoESL, last week, and he gave me a copy of an interesting report they had commissioned from Berkeley Design Technology (BDT), which set out to answer the question “does it work?”, at least for AutoESL’s product, AutoPilot. HLS is a competitive market: the companies in the space are constantly benchmarking at customers, and all of them are making some sales. So I think it is reasonable to take this report as a proxy for all the products in the space. Yes, I’m sure each product has its own strengths and weaknesses, and different products accept different input languages (for instance, Forte only accepts SystemC, Synfora only accepts C, and AutoESL accepts C, C++ and SystemC).
BDT ran two benchmarks. One was a video motion analysis algorithm and the other was a DQPSK (think wireless router) receiver. Both were synthesized with AutoPilot and then run through Xilinx’s toolchain to produce a working FPGA implementation.
The video algorithm was examined in two ways: first, how “small” a design could be achieved for a fixed workload of 60 frames per second; second, how high a frame rate could be achieved within the limits of the FPGA. The wireless receiver had a specified throughput of 18.75 megasamples/second and was synthesized to find the minimum resources needed to meet it.
For comparison, they also implemented the video algorithm on Texas Instruments TMS320 DSPs. The DSP costs roughly the same as the FPGA they were using, a Xilinx XC3SD3400A: both are in the mid-$20s.
At 60 frames per second, the video algorithm used 39% of the FPGA, whereas achieving the same result with DSPs required at least 12 of them working in parallel, obviously a much more costly and power-hungry solution. When they looked at how high a frame rate could be achieved, the AutoPilot/FPGA solution reached 183 frames per second, versus 5 frames per second for a single DSP (which is why 12 DSPs are needed to reach 60). The implementation effort for the two solutions was roughly the same. This is quite a big design: at its largest it uses about ¾ of the FPGA. AutoPilot read 1,600 lines of C and turned them into 38,000 lines of Verilog in 30 seconds.
For the wireless receiver, they also had a hand-written RTL implementation for comparison. AutoPilot fit the design into 5.6% of the FPGA, versus 5.9% for the hand-written implementation. I don’t think the difference is significant, and I think it is fair to say that AutoPilot is on a par with hand-coded RTL (at least for this example; YMMV). Using HLS also reduced the development effort by at least 50%.
BDT’s conclusion is that they “were impressed with the quality of results that AutoPilot was able to produce given that this has been a historic weakness for HLS tools in general.” The only real negative is that the toolchain is more expensive (since AutoESL doesn’t come bundled with your FPGA or your DSP).
It would, of course, be interesting to see the same reference designs put through other HLS tools, to know whether these results generalize. Still, it does look as if HLS can achieve results comparable with hand-written RTL, at least for this sort of DSP algorithm. But, to be fair to hand-coders, DSP algorithms like these, where throughput matters more than latency, are something of a sweet spot for HLS.
If you want to read the whole report, it’s here.