-
I have a 2D array for which I need to calculate the sum of a rolling cumulative product. I tested it on this array:
Because of the loop unrolling in func2, compilation time is very long (a few minutes), but XLA is better able to optimize the code: func2 is around 10-50x faster, depending on the array shape. However, if I increase the array shape to (100000, 400), func2 won't compile due to OOM. Is there another approach that is faster than func1?
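The definitions of `func1` and `func2` are not reproduced above; based on the accepted answer below, the per-row computation appears to be: for each start index `r`, accumulate the running product of `x[k] * y[r]` for `k = r..n-1` and sum the partial products. A minimal NumPy sketch of that interpretation (the name `rolling_cumprod_sum` is my own, and this is an inference, not the poster's actual code):

```python
import numpy as np

def rolling_cumprod_sum(x, y):
    """Naive O(n^2) reference: for each start index r, sum the
    running products prod_{k=r..c} (x[k] * y[r]) over c = r..n-1."""
    n = x.shape[0]
    out = np.zeros(n)
    for r in range(n):
        p = 1.0
        for c in range(r, n):
            p *= x[c] * y[r]
            out[r] += p
    return out

# Tiny hand-checkable case: x = [2, 3], y = [5, 7]
# row 0: 2*5 + (2*5)*(3*5) = 10 + 150 = 160
# row 1: 3*7 = 21
print(rolling_cumprod_sum(np.array([2.0, 3.0]), np.array([5.0, 7.0])))
```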
-
I would probably rewrite this in terms of vectorized operations. This computes the same results as your two functions, without any explicit iteration:

```python
import jax.numpy as jnp
from jax import jit, vmap

def func3(x, y):
    assert x.ndim == y.ndim == 1
    assert x.shape == y.shape
    i = jnp.arange(x.shape[0])
    mask = i[None, :] < i[:, None]
    cumprod = jnp.where(mask, 1, x[None, :] * y[:, None]).cumprod(1)
    return jnp.where(mask, 0, cumprod).sum(1)

res3 = jit(vmap(func3))(x, y)
```
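To see why the masking works: replacing the entries below each row's start index with 1 lets the row-wise `cumprod` effectively begin fresh at that index, and the second `where` zeroes out those same positions so they don't contribute to the sum. Here is a NumPy translation of the same trick, checked against a direct double loop (the loop is my own reading of the intended semantics, since `func1` isn't shown in the thread):

```python
import numpy as np

def func3_np(x, y):
    """NumPy version of the masked-cumprod trick for one row pair."""
    i = np.arange(x.shape[0])
    mask = i[None, :] < i[:, None]  # True strictly below the diagonal
    # Masked entries become 1 so they don't disturb the running product.
    cumprod = np.where(mask, 1.0, x[None, :] * y[:, None]).cumprod(axis=1)
    # Zero the same entries so they don't contribute to the sum.
    return np.where(mask, 0.0, cumprod).sum(axis=1)

def reference(x, y):
    """Direct O(n^2) double loop with the same semantics."""
    n = x.shape[0]
    out = np.zeros(n)
    for r in range(n):
        p = 1.0
        for c in range(r, n):
            p *= x[c] * y[r]
            out[r] += p
    return out

rng = np.random.default_rng(0)
x, y = rng.uniform(0.5, 1.5, size=(2, 50))
assert np.allclose(func3_np(x, y), reference(x, y))
```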
-
Thank you, @jakevdp. This is a very clever solution! For larger arrays, though, it tends to be slower than func1. I think the reason is the numerous multiplications by 1 that don't contribute to the final result but are still computationally expensive. Would it be possible to implement such a function as a custom operation, as described here:
It may be possible to do this more efficiently with a custom kernel, but keep in mind that the kinds of operations that are efficient on GPUs are exactly the kinds used in my solution (full-axis reductions over statically-sized arrays). You'd probably have to play some of the same tricks in your custom kernel that I did in `func3`, but I wouldn't be surprised if you could make it faster with some thought.