We are extending the Spiral program generation system to generate fast code for the Cell Broadband Engine (Cell BE). Our first targets are linear transforms, most importantly the discrete Fourier transform (DFT).
The Cell Broadband Engine is a chip multiprocessor designed for high-density floating-point computation. As shown in the figure below, its design includes multiple SIMD vector cores called SPEs (synergistic processing elements) with large register files. Each SPE has its own local memory (local store), and transfers between main memory and the local stores must be handled explicitly by the programmer. These and other characteristics make it difficult to program the Cell BE and to achieve high performance on it.
The Cell BE has a theoretical peak single-precision floating-point performance of 204.8 Gflop/s using just the SPEs. The most affordable way of obtaining a Cell BE is to buy a PlayStation 3 (PS3). However, only 6 SPEs in the PS3 are accessible to the programmer.
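The peak figure above can be reproduced with a short calculation. The sketch below assumes the usual accounting for the SPEs: 4 single-precision SIMD lanes per SPE and one fused multiply-add per lane per cycle, counted as 2 floating-point operations. The function name `peak_gflops` and its parameters are illustrative and not part of Spiral.

```python
def peak_gflops(num_spes, clock_mhz=3200, simd_lanes=4, flops_per_lane=2):
    """Theoretical peak in Gflop/s for a given number of SPEs.

    Assumptions (not stated in the text above): 4-way single-precision
    SIMD and one fused multiply-add (2 flops) per lane per cycle.
    """
    flops_per_cycle = simd_lanes * flops_per_lane
    # MHz * flops/cycle gives Mflop/s; divide by 1000 for Gflop/s.
    return num_spes * clock_mhz * flops_per_cycle / 1000


if __name__ == "__main__":
    print("8 SPEs:", peak_gflops(8), "Gflop/s")
    print("6 SPEs (PS3):", peak_gflops(6), "Gflop/s")
```

Under the same assumptions, the 6 SPEs available on the PS3 give a correspondingly lower peak, three quarters of the 8-SPE figure.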
Our experiments were conducted on Sony's PlayStation 3 (Cell processor at 3.2 GHz, 6 available SPEs) and the IBM Cell Blade QS20 (we used a single Cell processor with 8 SPEs). The plots show the performance of generated code for the 1D and 2D discrete Fourier transform (DFT) for various sizes and two input formats. The plots indicate where the input and output vectors are assumed to be resident: local stores (LS) or main memory. This is ongoing work.
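The unit used in the plots is not stated here. A common convention in FFT benchmarking is to report "pseudo Gflop/s", which charges every 1D DFT of size n a nominal 5 n log2(n) operations regardless of the algorithm actually used. The sketch below assumes that convention; the function name `pseudo_gflops` and the runtime in the usage example are hypothetical and are not measured results.

```python
import math


def pseudo_gflops(n, runtime_seconds):
    """Pseudo Gflop/s for a 1D DFT of size n.

    Assumes the conventional nominal operation count 5 * n * log2(n);
    this is an assumption, not a definition taken from the text above.
    """
    if n < 2 or runtime_seconds <= 0:
        raise ValueError("need n >= 2 and a positive runtime")
    nominal_flops = 5.0 * n * math.log2(n)
    return nominal_flops / runtime_seconds / 1e9


if __name__ == "__main__":
    # Hypothetical runtime, for illustration only.
    print(pseudo_gflops(1024, 2.0e-6))
```

Because the operation count is fixed by n, this metric is inversely proportional to runtime, so a higher value always means a faster transform of the same size.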
Contact: Srinivas Chellappa (firstname.lastname@example.org, you have to add dot edu)