Software Generation for the Cell Broadband Engine (Cell BE)

Overview

We are extending the Spiral program generation system to generate fast code for the Cell BE. Our first targets are linear transforms, most importantly the discrete Fourier transform (DFT).
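To make "linear transform" concrete, the sketch below computes a DFT as a plain matrix-vector product y = DFT_n x, where DFT_n = [w^(kl)] and w = exp(-2*pi*i/n). The sign convention and the function names `dft_matrix` and `apply_transform` are assumptions made for illustration. This is the O(n^2) definition only, not the fast code that Spiral generates.

```python
import cmath

def dft_matrix(n):
    """Build the n x n DFT matrix [w^(k*l)] with w = exp(-2*pi*i/n) (assumed sign convention)."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (k * l) for l in range(n)] for k in range(n)]

def apply_transform(matrix, x):
    """Apply a linear transform given as a dense matrix: y = matrix * x."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]

# Example: a 4-point DFT of a small input vector.
x = [1, 2, 3, 4]
y = apply_transform(dft_matrix(4), x)
```

A fast Fourier transform computes the same result in O(n log n) operations by factoring the matrix into sparse structured factors.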

Background

The Cell Broadband Engine is a chip multiprocessor designed for high-density floating-point computation. As shown in the figure below, its design includes multiple SIMD vector cores called SPEs (synergistic processing elements) with large register files. Each SPE has its own local memory (local store), and transfers between main memory and the local stores must be handled explicitly by the programmer. These and other characteristics make the Cell BE difficult to program and make high performance hard to achieve.

The Cell BE is capable of a theoretical peak floating-point performance of 204.8 Gflop/s (single precision) using just the SPEs. The most affordable way of obtaining a Cell BE is to buy a PlayStation 3 (PS3). However, only 6 of the SPEs in the PS3 are accessible to the programmer.
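The peak figure can be reproduced with simple arithmetic. The sketch below assumes 8 SPEs clocked at 3.2 GHz, each issuing one 4-way single-precision SIMD fused multiply-add (2 flops per lane) per cycle. These parameters and the function name `peak_gflops` are assumptions for illustration and are not stated on this page.

```python
def peak_gflops(num_spes, clock_ghz=3.2, simd_lanes=4, flops_per_lane=2):
    """Theoretical peak in Gflop/s: SPEs x clock (GHz) x SIMD lanes x flops per lane per cycle.

    Defaults are assumed values: 4 single-precision lanes and one fused
    multiply-add (2 flops) per lane per cycle.
    """
    return num_spes * clock_ghz * simd_lanes * flops_per_lane

full_chip = peak_gflops(8)  # all 8 SPEs
ps3 = peak_gflops(6)        # the 6 SPEs accessible on a PS3
```

Under these assumptions, 8 SPEs give the 204.8 Gflop/s quoted above, and the 6 SPEs available on a PS3 give about 153.6 Gflop/s.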

Figure: Cell BE structure (source: xtech06.usefulinc.com)
Figure: Cell BE-based PS3

Benchmarks

Our experiments were conducted on Sony's PlayStation 3 (Cell processor at 3.2 GHz, 6 available SPEs) and the IBM Cell Blade QS20 (we used a single Cell processor with 8 SPEs). The plots show the performance of generated code for the 1D and 2D discrete Fourier transform (DFT) for various sizes and two input formats. The plots indicate where the input and output vectors are assumed to be resident: local stores (LS) or main memory. This is ongoing work.
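FFT performance is commonly reported in pseudo Gflop/s, which charges a nominal 5 n log2(n) operations to a 1D complex DFT of size n regardless of the actual operation count. This page does not state which metric its plots use, so the sketch below is an assumption, and the function name `pseudo_gflops` is hypothetical.

```python
import math

def pseudo_gflops(n, runtime_seconds):
    """Pseudo Gflop/s for a 1D complex DFT of size n (assumed convention: 5*n*log2(n) flops)."""
    nominal_flops = 5 * n * math.log2(n)
    return nominal_flops / runtime_seconds / 1e9

# Hypothetical runtime chosen for illustration only; not a measurement from this project.
example = pseudo_gflops(1024, 1e-6)
```

Because the operation count is nominal, this metric compares runtimes across implementations and does not measure the flops actually executed.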

Code

References

  1. Srinivas Chellappa, Franz Franchetti and Markus Püschel
    Computer Generation of Fast Fourier Transforms for the Cell Broadband Engine
    to appear in Proc. International Conference on Supercomputing (ICS), 2009
  2. Srinivas Chellappa, Franz Franchetti and Markus Püschel
    FFT Program Generation for the Cell BE
    Proc. International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA), 2008

All our work on Cell BE.

Copyrights to many of the above papers are held by the publishers. The attached PDF files are preprints. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder. Some links to papers above connect to IEEE Xplore with permission from IEEE, and viewers must follow all of IEEE's copyright policies.

More Information

Contact: Srinivas Chellappa (schellap@andrew.cmu, you have to add dot edu)