We reimplemented a subset of the original Spiral system (as defined in [3,4]) using modern program language features [1]. In the process we make heavy use of embedded DSLs and staging utilizing the Lightweight Modular Staging framework [2] to achieve to following goals:
In the case of program generators its beneficial to selectively decide if code should be executed at compile time, or if it should be part of the final program. To avoid code duplication we can encode both behaviors by abstraction over the staging decision. Since LMS uses types to annotate staging, we can utilize parametric polymorphism to achieve this as depicted below:
def somefunction(x0: T) : T = { val x1: T = x0 * 2.0 x1 }
Where type T can now be either Double or Rep[Double] (the staged type). This allows us to write code that is universal for precomputation and as part of the final program.
We observe that moving the staging annotation in the code changes the code style:
Meta Program (Scala)  Resulting Intermediate Representation  Unparsed Result (C) 

def f3(x: Array[Rep[Double]], y: Array[Rep[Double]])= { for (i < 0 until 2) { y(2*i) = x(i*2) + y(i*2+1) y(2*i+1) = x(i*2)  y(i*2+1) } 
t0 = s0 + s1; t1 = s0  s1; t2 = s2 + s3; t2 = s2  s3; 

def f3(x: Rep[Array[Double]], y: Rep[Array[Double]])= { for (i < 0 until 2) { y(2*i) = x(i*2) + y(i*2+1) y(2*i+1) = x(i*2)  y(i* 2+1) } 
t0 = x[0]; t1 = x[1]; t2 = t0 + t1; y[0] = t2; t3 = t0  t1; y[1] = t3; t4 = x[0]; t5 = x[1]; t6 = t4 + x5; y[0] = t6; t7 = t4 – x5; y[3] = t7; 

def f3(x: Rep[Array[Double]] , y: Rep[Array[Double]])= { for (i < (0 until 2)): Rep[Range] { y(2*i) = x(i*2) + y(i*2+1) y(2*i+1) = x(i*2)  y(i*2+1) } 
for (int i=0;i < 2;i++) { t0 = x[i]; t1 = x[i+1]; t2 = t0 + t1; y[i] = t2; t3 = t0  t1; y[i+1] = t3; } 
We then extend the idea of selective staging to also abstract over all these code styles.
We modify the function from above to the form:
type NoRep[T] = T def f3[L[_],A[_],T](looptype: L, x: A[Array[T]], y: A[Array[T]])= { for ( i < (0 until 2)): L[Range] { y(2 * i) = x(i * 2) + y(i * 2 + 1) y(2 * i + 1) = x(i * 2)  y(i * 2 + 1) }
This allows us to instantiate all of the above code styles from a single function.
val x: Looped_SplitComplex< val y: Scalar_AVXUnrolled y = x // unrolling with scalarization, data format change and vectorizationThe actual implementation is quite involved and is shown in full detail in [1]
Contact: Georg Ofenbeck