HIGH-THROUGHPUT, LOSSLESS DATA COMPRESSION ON FPGAS
Abstract—Lossless compression is often used before writing
data to a storage medium or transmitting across a transmission
medium. Compression aids by saving storage space or
transmission bandwidth; a decompression operation is
performed when the data is subsequently read. Though this
scheme has clear benefits, the execution time of compression
and decompression is critical to its application in real-time
systems. Software compression utilities are often slow, leading
to degraded system performance. Hardware-based solutions,
on the other hand, often drive large resource requirements and
are not amenable to supporting future algorithmic changes. In
the current article, we present a high-throughput, streaming,
lossless compression algorithm and its efficient implementation
on FPGAs. The proposed solution provides a peak throughput
of 1 GB/s per engine, with a sustained overall measured
throughput of 2.66 GB/s on a PCIe-based FPGA board with
two compression and two decompression engines. This result
represents an overall speedup of 13.6x over a reference software
implementation. The proposed design is very lean, and, with
multiple engines running in parallel, provides a path to
potential speedups of up to two orders of magnitude. In the
current implementation, the achievable overall throughput is
limited only by the available PCIe bus bandwidth.
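
The figures quoted in this abstract can be related with a short back-of-envelope calculation. The sketch below is not from the paper: it assumes the 13.6x speedup is a ratio of throughputs, that engines scale linearly at their peak rate, and that the bus is not the bottleneck (the abstract states that PCIe bandwidth is in fact the current limit). All function and variable names are illustrative.

```python
import math

# Figures taken from the abstract.
SUSTAINED_GBPS = 2.66        # measured overall throughput, GB/s
SPEEDUP = 13.6               # speedup over the reference software
PEAK_PER_ENGINE_GBPS = 1.0   # peak throughput of one engine, GB/s


def implied_software_throughput(sustained_gbps, speedup):
    """Software throughput implied by the reported speedup (assumed to be a throughput ratio)."""
    return sustained_gbps / speedup


def engines_for_speedup(target_speedup, software_gbps, per_engine_gbps):
    """Engines needed for a target speedup, assuming ideal linear scaling at peak rate."""
    return math.ceil(target_speedup * software_gbps / per_engine_gbps)


software_gbps = implied_software_throughput(SUSTAINED_GBPS, SPEEDUP)
engines_100x = engines_for_speedup(100, software_gbps, PEAK_PER_ENGINE_GBPS)

print(f"implied software throughput: {software_gbps:.3f} GB/s")
print(f"engines for a 100x speedup (idealized): {engines_100x}")
```

Under these assumptions, a two-orders-of-magnitude speedup would need an aggregate rate far above what the two-plus-two engine configuration sustains, which is consistent with the abstract's remark that the bus, not the engine design, is the limiting factor.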

DESIGN AND CHARACTERIZATION OF PARALLEL PREFIX ADDERS USING FPGAS
Abstract—Parallel-prefix adders (also known as carry-tree
adders) are known to have the best performance in
VLSI designs. However, this performance advantage
does not translate directly into FPGA implementations
due to constraints on logic block configurations and
routing overhead. This paper investigates three types of
carry-tree adders (the Kogge-Stone, sparse Kogge-Stone,
and spanning tree adder) and compares them to the
simple Ripple Carry Adder (RCA) and Carry Skip
Adder (CSA). These designs of varied bit-widths were
implemented on a Xilinx Spartan 3E FPGA and delay
measurements were made with a high-performance logic
analyzer. Due to the presence of a fast carry-chain, the
RCA designs exhibit better delay performance up to 128
bits. The carry-tree adders are expected to have a speed
advantage over the RCA as bit widths approach 256.
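
To make the contrast between the two adder families concrete, here is a minimal bit-level software model, not the paper's hardware design. It assumes the textbook generate/propagate formulation of the Kogge-Stone adder; the function names are illustrative. The ripple-carry model takes one step per bit, while the Kogge-Stone model needs about log2(width) prefix levels. That is the source of the carry-tree advantage in VLSI, though on an FPGA the dedicated carry chain can make the ripple version faster in practice, as the abstract notes.

```python
def ripple_carry_add(a, b, width):
    """Reference model: the carry moves one bit position per iteration."""
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, carry


def kogge_stone_add(a, b, width):
    """Kogge-Stone model: returns (sum, carry_out, prefix_levels)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    propagate = a ^ b          # bit i passes an incoming carry along
    generate = a & b           # bit i creates a carry by itself
    half_sum = propagate       # kept for the final sum stage
    distance = 1
    levels = 0
    while distance < width:
        # Merge each group with the group `distance` bits below it.
        generate = (generate | (propagate & (generate << distance))) & mask
        propagate = (propagate & (propagate << distance)) & mask
        distance <<= 1
        levels += 1
    # generate[i] is now the carry out of bit i; shift to get carry into bit i+1.
    carries_in = (generate << 1) & mask
    total = half_sum ^ carries_in
    carry_out = (generate >> (width - 1)) & 1
    return total, carry_out, levels


if __name__ == "__main__":
    total, carry_out, levels = kogge_stone_add(200, 100, 8)
    print(total, carry_out, levels)
```

In this model a 128-bit addition uses 7 prefix levels compared with 128 ripple steps. The level count says nothing about routing cost, which is the overhead the abstract identifies on FPGAs.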