
May 14, 2017, 6:23:29 PM

to Accelerate

Hi!


The book Parallel and Concurrent Programming in Haskell[1] discusses how to write Accelerate code that generates reusable CUDA kernels:

"When the program runs, the Accelerate library evaluates the expression passed to `run` to make a series of CUDA fragments (called kernels). Each kernel takes some arrays as inputs and produces arrays as outputs. In our example, each call to `step` will produce a kernel, and when we compose a sequence of `step` calls together, we get a series of kernels. Each kernel is a piece of CUDA code that has to be compiled and loaded onto the GPU; this can take a while, so Accelerate remembers the kernels it has seen before and tries to reuse them. Our goal with `step` is to make a kernel that will be reused. If we don't reuse the same kernel for each `step`, the overhead of compiling new kernels will ruin the performance."

My first question is: does this also apply to the LLVM.Native backend?

My second question concerns an example. I have written the following code (sorry for the complex signature and my newbie coding in general):

```haskell
gradientDescent :: forall e is os . (Prelude.Floating e, A.Floating e, Lift Exp e, e ~ Plain e)
                => e -> Sing is -> Sing os -> SomeNeuralNetwork e is os
                -> ([PList ('(1, os) ': '[]) (ValueAndDerivative e)] -> Acc (Scalar (ValueAndDerivative e)))
                -> [Acc (Vector e)] -> Acc (Vector e) -> [Acc (Vector e)]
gradientDescent eta sis sos nn f i p =
  let g  = gradient sis sos nn f i p
      p' = zipWith (updateParam (the $ unit $ constant eta)) p g
  in  p' : gradientDescent eta sis sos nn f i p'
  where
    updateParam :: Exp e -> Exp e -> Exp e -> Exp e
    updateParam eta p g = p - eta * g
```

The type signature of gradient is:

```haskell
gradient :: forall e is os . (Prelude.Floating e, A.Floating e, Lift Exp e, e ~ Plain e)
         => Sing is -> Sing os -> SomeNeuralNetwork e is os
         -> ([PList ('(1, os) ': '[]) (ValueAndDerivative e)] -> Acc (Scalar (ValueAndDerivative e)))
         -> [Acc (Vector e)] -> Acc (Vector e) -> Acc (Vector e)
```

So, my intention is that this produces an ever-increasing sequence of Accelerate programs that compute repeated iterations of the gradient descent algorithm. My question is: how can I make sure that the gradient code is reused, say, 10 times for 10 iterations, rather than one gigantic Accelerate program being generated for all 10 iterations? In particular, is the reuse of program fragments supposed to be reflected in the Graphviz file produced with `-ddump-dot` / `-ddump-simpl-dot`? Right now it is certainly one gigantic graph.

Thank you, best regards, Panos

[1] http://chimera.labs.oreilly.com/books/1230000000929/ch06.html#sec_par-accel-shortest-paths

May 14, 2017, 8:45:42 PM

to accelerat...@googlegroups.com

Hi Panos,

The general idea applies, but the actual caching was not part of the 1.0 release of the LLVM backends. That work is on this branch and almost complete, so expect it soon. You get a similar result if you can express your program in terms of `run1`, just not across separate executions of your program. The `-ddump-phases` debug flag will tell you how much time you are spending in compilation.

The graphviz output is actually a good place to look; each of the boxes on the graph corresponds to a kernel which will be compiled and executed (modulo a few operations, such as `reshape` and `#i aN` for some integers `i` and `N`, which don't execute anything and are constant time), and so those are the operations which caching is going to cover.

Hope that helps.

-Trev


May 18, 2017, 5:55:33 AM

to Accelerate

Hi Trev,

Thank you very much for your reply. I found out about `run1` a few hours after posting, so this was poor research on my part, sorry!

Best regards, Panos
