Generating Hyper-Portable Future-Proof Computational Kernels with SPIRAL

(FA8750-16-2-0033 — DARPA BRASS)

Franz Franchetti (PI), Mike Franusich (SpiralGen, Inc. Co-PI), James C. Hoe (Co-PI), Tze Meng Low (Co-PI), José; M. F. Moura, David Padua (UIUC, Co-PI)

PhD Students: Richard Veras (PhD student, advisor: Franchetti), Thom Popovici (PhD student, advisor: Franchetti)
Engineers: Brian Duff, Jason Larkin

Overview

Software implementations and algorithms that solve a particular problem are inherently not future-proof as an algorithm appropriate for the problem and execution environment has to be instantiated into software that looks in many choices. However, the actual operation—the answer or a given input—remains invariant through execution environment changes and changes in required fidelity. Our layered approach to application specification leverages this insight: we capture the systematic process by which the specification of a desired operation is turned into an algorithm, and subsequently instantiated into an actual implementation.

We target a DARPA relevant class of problems that is captured through regular computation over multi-dimensional data. Important signal and image processing operations like synthetic aperture radar (SAR), and space-time adaptive processing (STAP) for target detection fall into this class and run on DARPA relevant high performance embedded computing (HPEC) platforms. In this class are also simulations and computer-aided design/engineering (CAD/CAE) problems on traditional HPC platforms and embedded application kernels. For these problems, we propose a layered approach to specifying 1) the desired operation, 2) the paradigm (environment) in which the operation is computed, and 3) the specific details of the paradigm, i.e. the ISA, cache sizes, and the type of multi-threading available. These three layers map to changes required in the operation, algorithms, and implementions respectively. An implication of separating the specification into these three distinct but inter-related layers is that there is a natural separation of concerns, allowing the same specificaion to handle different changes to the environment. For example, when porting the operation to an architecture with a different ISA, this only affects the regions of the code touched by the platform specification, whereas changing from a sequential to multicore environment requires changes to the choice of algorithm.

The lowering of abstractions from operation specification to algorithms, and subsequently, to implementation is performed through a systematic process of algebraic term rewriting. This preserves the correctness of the operation specification, while allowing different algorithm specifications to be formally derived from the original operation specification. In addition, these systematic rewrites also allow analysis to carry over from one algorithm to another. We provide hooks for BRASS TA2 performers to fully query and reconfigure applications specified and built from our specifications. Our enabling technologies are the SPIRAL system that has demonstrated performance portability across a wide variety of platforms for the domain of spectral methods, a calculus of loop invariants that allows one to derive algorithms with desired performance characteristics, and language/compiler technology from the Polaris group at UIUC.

Approach

Our approach centers on the ideas that future-proof code requires a layered specification of semantics, quality/parameter relationships, performance/quality/resource trade-offs, quality measures and fault tolerance, and a powerful code sythesis engine that turns this specification into reconfigurable, extensible high performance implementations specialized to their execution environment. We base our proposed system on the SPIRAL code generation and autotuning system and philosophy, augmented with lessons learned from FFTW, BLIS, ATLAS, HTA, PetaBricks, and modern as well as traditional optimizing compiler approaches. The key idea is to turn SPIRAL's extreme cross-platform compile-time portability into an extensible post-deployment reconfiguration capability. SPIRAL becomes akin to an installation/reconfiguration-time JIT that regenerates an FFTW-like implementation for a given specification and execution target. However, the implementation is not used as autotuning library but as a framework for BRASS TA2 systems to reconfigure the application based on execution environment changes and changes in user/mission requirements. Finally, we define a user-friendly specification language that borrows from HTA, and functional programming ideas to allow users to fully express application intent and trade-offs as well as execution environments. To provide an even easier on-ramp for legacy code, a subset of the functionality will be accessible through a delayed-evaluation interface using BLAS, LAPACK, FFTW and a subset of C as domain-specific specification language.

perfect2

Selected SPIRAL Publications (Underlying Technology)

  1. M. Püschel, F. Franchetti, Y. Voronenko
    Spiral
    Encyclopedia of Parallel Computing, D. A. Padua (Editor), pp. 1920-1933, Springer 2011.
  2. F. Franchetti, F. de Mesmay, D. S. McFarlin, M. Püschel
    Operator Language: A Program Generation Framework for Fast Kernels
    Proceedings of IFIP Working Conference on Domain Specific Languages (DSL WC), 2009.
    Best Paper Award
  3. D. S. McFarlin, V. Arbatov, F. Franchetti, M. Püschel
    Automatic SIMD Vectorization of Fast Fourier Transforms for the Larrabee and AVX Instruction Sets
    Proceedings of International Conference on Supercomputing (ICS), 2011.
  4. P. A. Milder, F. Franchetti, J. C. Hoe, M. Püschel
    Computer Generation of Hardware for Linear Digital Signal Processing Transforms
    ACM Transactions on Design Automation of Electronic Systems, 17(2), Article 15, 2012.
    ACM TODAES Best Paper Award 2014
  5. S. Chellappa, F. Franchetti, M. Püschel
    Computer Generation of Fast Fourier Transforms for the Cell Broadband Engine
    Proceedings of International Conference on Supercomputing (ICS), 2009.
  6. P. D'Alberto, F. Franchetti, P. A. Milder, A. Sandryhaila, J. C. Hoe, J. M. F. Moura, M. Püschel
    Generating FPGA Accelerated DFT Libraries
    Proceedings of Field-Programmable Custom Computing Machines (FCCM) 2007.
  7. F. Franchetti, Y. Voronenko, M. Püschel
    FFT Program Generation for Shared Memory: SMP and Multicore
    Proceedings Supercomputing 2006.
  8. F. Franchetti, M. Püschel
    Short Vector Code Generation and Adaptation for DSP Algorithms
    Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 2, pp. 537-540, 2003.
  9. F. Franchetti, Y. Voronenko, G. Almasi
    Automatic Generation of the HPC Challenges Global FFT Benchmark for BlueGene/P
    Proceedings of High Performance Computing for Computational Science (VECPAR) 2012.
  10. F. Franchetti, Y. Voronenko, S. Chellappa, J. M. F. Moura, M. Püschel
    Discrete Fourier Transform on Multicores: Algorithms and Automatic Implementation
    IEEE Signal Processing Magazine, Special Issue on "Signal Processing on Platforms with Multiple Cores", 2009.
  11. M. Püschel, J. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, N. Rizzolo
    SPIRAL: Code Generation for DSP Transforms
    Proceedings of the IEEE Special Issue on "Program Generation, Optimization, and Adaptation", Vol. 93, No. 2, pp. 232-275, 2005.

Copyrights to many of the above papers are held by the publishers. The attached PDF files are preprints. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder. Some links to papers above connect to IEEE Xplore with permission from IEEE, and viewers must follow all of IEEE's copyright policies.