HIGH-THROUGHPUT, LOSSLESS DATA COMPRESSION ON FPGAS

    Abstract—Lossless compression is often applied before data is written to a storage medium or sent over a transmission medium. Compression saves storage space or transmission bandwidth; a decompression operation is performed when the data is subsequently read. Though this scheme has clear benefits, the execution time of compression and decompression is critical to its application in real-time systems. Software compression utilities are often slow, leading to degraded system performance. Hardware-based solutions, on the other hand, often require large amounts of resources and are not amenable to future algorithmic changes. In this article, we present a high-throughput, streaming, lossless compression algorithm and its efficient implementation on FPGAs. The proposed solution provides a peak throughput of 1 GB/s per engine, with a sustained overall measured throughput of 2.66 GB/s on a PCIe-based FPGA board with two compression and two decompression engines. This result represents an overall speedup of 13.6x over a reference software implementation. The proposed design is very lean and, with multiple engines running in parallel, provides a path to potential speedups of up to two orders of magnitude. In the current implementation, the achievable overall throughput is limited only by the available PCIe bus bandwidth.
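
The abstract does not state the throughput of the reference software implementation, but it can be estimated from the two figures it does give. The short calculation below is only a back-of-the-envelope sketch: it assumes the 13.6x speedup is measured against the same 2.66 GB/s overall figure, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
fpga_throughput_gb_s = 2.66   # sustained overall measured throughput
speedup = 13.6                # overall speedup over the reference software

# Assumption: the speedup is the ratio of the two overall throughputs,
# so the software baseline is the FPGA figure divided by the speedup.
implied_software_gb_s = fpga_throughput_gb_s / speedup

print(f"Implied software throughput: {implied_software_gb_s:.3f} GB/s")
```

Under that assumption the software baseline comes out a little under 0.2 GB/s.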


DESIGN AND CHARACTERIZATION OF PARALLEL PREFIX ADDERS USING FPGAS

    Abstract—Parallel-prefix adders (also known as carry-tree adders) are known to have the best performance in VLSI designs. However, this performance advantage does not translate directly into FPGA implementations due to constraints on logic block configurations and routing overhead. This paper investigates three types of carry-tree adders (the Kogge-Stone, sparse Kogge-Stone, and spanning tree adders) and compares them to the simple Ripple Carry Adder (RCA) and Carry Skip Adder (CSA). These designs, at various bit widths, were implemented on a Xilinx Spartan 3E FPGA, and delay measurements were made with a high-performance logic analyzer. Due to the presence of a fast carry chain, the RCA designs exhibit better delay performance up to 128 bits. The carry-tree adders are expected to have a speed advantage over the RCA as bit widths approach 256.
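
To make the structural difference between the two adder families concrete, the following is a minimal bit-level software sketch, not the paper's hardware design. It models a ripple-carry adder, whose carry passes through every bit position in turn, and a Kogge-Stone adder, which combines generate/propagate signals over about log2(width) prefix levels. The function names and the test width are illustrative assumptions.

```python
def ripple_carry_add(a, b, width):
    """Add two width-bit values one bit at a time; the carry ripples upward."""
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, carry


def kogge_stone_add(a, b, width):
    """Add two width-bit values using a Kogge-Stone parallel-prefix carry tree."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = a & b           # per-bit generate
    p = a ^ b           # per-bit propagate
    half_sum = p        # kept for the final sum stage
    dist = 1
    while dist < width:  # about log2(width) prefix levels
        # Combine each position with the group that ends dist bits below it.
        g = (g | (p & (g << dist))) & mask
        p = (p & (p << dist)) & mask
        dist <<= 1
    # g[i] is now the carry out of bit i; shift left to get the carry into bit i+1.
    carries_in = (g << 1) & mask
    carry_out = (g >> (width - 1)) & 1
    return half_sum ^ carries_in, carry_out


# Both models should agree with ordinary integer addition.
print(ripple_carry_add(200, 100, 8), kogge_stone_add(200, 100, 8))
```

In this model the Kogge-Stone loop runs three times for an 8-bit width, while the ripple-carry loop runs eight times. As the abstract notes, this logical-depth advantage does not automatically carry over to FPGAs, where the dedicated carry chain makes the ripple-carry structure fast.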
