Add a fusion rewrite for `CAReduce`s with `Elemwise` inputs #1285

brandonwillard · 2022-11-04T22:37:41Z

This PR adds fusion rewrites for CAReduce nodes with Elemwise-derived inputs.

- Make the Python backend work for the `Composite` `Op`s generated by this fusion
- Do something about `CAReduceDtype`
  - It's a fairly redundant subclass that probably should be merged with `CAReduce` anyway.
- Add more/better tests
  - ~~E.g. test the axis parameter~~
- Consider only performing the rewrite when not using the Python backend (for performance reasons)
- ~~[ ] Support multiple inputs (optional)~~
  - This will require some refactoring of `CAReduce` or a new subclass and should be split off into its own issue/PR. See Fuse CAReduces with multi-input Elemwises #1307.

ricardoV94 · 2022-11-06T09:18:42Z

Should we only fuse when the unreduced output has a single client, and therefore is definitely never needed?
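
The condition raised here can be illustrated with a toy graph. This is a minimal sketch in plain Python, not Aesara's API: the `Node` tuple and the `build_clients` and `safe_to_fuse` helpers are hypothetical names, and the client map only loosely mirrors the idea of tracking which consumers each variable has.

```python
from collections import defaultdict
from typing import NamedTuple


class Node(NamedTuple):
    """A hypothetical graph node: op name, input variable names, output name."""

    op: str
    inputs: tuple
    output: str


def build_clients(nodes, graph_outputs):
    """Map each variable name to the list of things that consume it."""
    clients = defaultdict(list)
    for node in nodes:
        for inp in node.inputs:
            clients[inp].append(node.op)
    # A variable returned to the caller is also needed, so count it as a client.
    for out in graph_outputs:
        clients[out].append("output")
    return clients


def safe_to_fuse(elemwise_out, clients):
    """Fuse only when the reduction is the sole consumer of the unreduced value."""
    return len(clients[elemwise_out]) == 1


nodes = [
    Node("exp", ("x",), "e"),
    Node("sum", ("e",), "y"),
]

# `e` feeds only the sum, so the intermediate array is never needed on its own.
single = safe_to_fuse("e", build_clients(nodes, graph_outputs=["y"]))

# `e` is also a graph output, so it would still have to be computed after fusion.
shared = safe_to_fuse("e", build_clients(nodes, graph_outputs=["y", "e"]))
```

In the second case, fusing would compute `exp` twice: once inside the fused reduction and once for the output that is still required.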

codecov · 2022-11-06T15:55:29Z

Codecov Report

Merging #1285 (91f3438) into main (3ad936f) will increase coverage by 0.03%.
The diff coverage is 94.53%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1285      +/-   ##
==========================================
+ Coverage   74.12%   74.15%   +0.03%     
==========================================
  Files         174      174              
  Lines       48652    48706      +54     
  Branches    10366    10372       +6     
==========================================
+ Hits        36064    36119      +55     
- Misses      10299    10301       +2     
+ Partials     2289     2286       -3

| Impacted Files | Coverage Δ | |
|---|---|---|
| aesara/compile/function/pfunc.py | `84.18% <ø> (-0.24%)` | ⬇️ |
| aesara/compile/function/types.py | `79.16% <75.00%> (+0.16%)` | ⬆️ |
| aesara/tensor/elemwise.py | `88.07% <90.54%> (-0.52%)` | ⬇️ |
| aesara/tensor/rewriting/elemwise.py | `86.40% <94.44%> (+0.65%)` | ⬆️ |
| aesara/scalar/basic.py | `79.02% <95.16%> (+0.10%)` | ⬆️ |
| aesara/compile/mode.py | `84.47% <100.00%> (+1.22%)` | ⬆️ |
| aesara/tensor/math.py | `90.40% <100.00%> (+0.37%)` | ⬆️ |

brandonwillard · 2022-11-06T18:20:58Z

> Should we only fuse when the unreduced output has a single client, and therefore is definitely never needed?

Yeah, that and a few other things need/needed to be done before this stops being a draft. I just added it now, though—along with another fix.

brandonwillard · 2022-11-06T18:52:49Z

Some current results:

import numpy as np

import aesara
import aesara.tensor as at

from aesara.compile.mode import get_mode


fusion_mode = get_mode("FAST_RUN").including("local_careduce_fusion")
no_fusion_mode = get_mode("FAST_RUN").excluding("local_careduce_fusion")


x = at.matrix("x")
y = at.exp(x).sum(axis=1)

y_fn = aesara.function([x], y, mode=no_fusion_mode)

aesara.dprint(y_fn)
# Sum{axis=[1], acc_dtype=float64} [id A] 1
#  |Elemwise{exp,no_inplace} [id B] 0
#    |x [id C]

y_fusion_fn = aesara.function([x], y, mode=fusion_mode)

aesara.dprint(y_fusion_fn)
# CAReduce{Composite{(i0 + exp(i1))}}{axis=[1], acc_dtype=float64} [id A] 0
#  |x [id B]

rng = np.random.default_rng(23920)

x_small_val = rng.random((10, 10))
x_large_val = rng.random((5000, 2000))

%timeit y_fn(x_small_val)
# 6.58 µs ± 151 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

%timeit y_fn(x_large_val)
# 198 ms ± 16.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

res = y_fn(x_large_val)
exp_res = np.exp(x_large_val).sum(axis=1)
assert res.shape == exp_res.shape
assert np.allclose(res, exp_res)

%timeit y_fusion_fn(x_small_val)
# 6.25 µs ± 558 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

%timeit y_fusion_fn(x_large_val)
# 55.3 ms ± 826 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

res = y_fusion_fn(x_large_val)
assert res.shape == exp_res.shape
assert np.allclose(res, exp_res)
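
For intuition about what the fused `CAReduce{Composite{(i0 + exp(i1))}}` node computes, here is a pure NumPy/Python sketch. It is not the code Aesara generates, and the function names are hypothetical. The explicit loop only shows that the accumulator `i0` is updated with `exp(i1)` element by element, so the full `exp(x)` array is never materialized.

```python
import math

import numpy as np


def unfused_exp_sum(x):
    """Two steps: build the full exp(x) array, then reduce along axis 1."""
    tmp = np.exp(x)  # intermediate array with the same shape as x
    return tmp.sum(axis=1)


def fused_exp_sum(x):
    """One pass: fold each element into the accumulator as acc + exp(x_ij)."""
    out = np.zeros(x.shape[0], dtype=np.float64)
    for i in range(x.shape[0]):
        acc = 0.0  # plays the role of i0 in (i0 + exp(i1))
        for j in range(x.shape[1]):
            acc = acc + math.exp(x[i, j])  # x[i, j] plays the role of i1
        out[i] = acc
    return out


rng = np.random.default_rng(23920)
x_val = rng.random((10, 10))

fused = fused_exp_sum(x_val)
unfused = unfused_exp_sum(x_val)
```

A Python-level loop like this is much slower than NumPy's vectorized calls, which is presumably the concern behind restricting the rewrite when the Python backend is in use.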


brandonwillard marked this pull request as draft November 4, 2022 22:38

brandonwillard added the performance concern label Nov 4, 2022

brandonwillard linked an issue Nov 4, 2022 that may be closed by this pull request

Fuse CAReduces and Elemwises #1116

Closed

brandonwillard force-pushed the fuse-CAReduce-and-Elemwise branch 2 times, most recently from cbf33e4 to b681459 Compare November 4, 2022 22:56

brandonwillard added the graph rewriting label Nov 5, 2022

brandonwillard force-pushed the fuse-CAReduce-and-Elemwise branch 5 times, most recently from 914f7f6 to c371651 Compare November 6, 2022 05:18

brandonwillard force-pushed the fuse-CAReduce-and-Elemwise branch from c371651 to a9d8ca0 Compare November 6, 2022 14:50

brandonwillard force-pushed the fuse-CAReduce-and-Elemwise branch from a9d8ca0 to 9f9f2a0 Compare November 6, 2022 18:17

brandonwillard force-pushed the fuse-CAReduce-and-Elemwise branch from 9f9f2a0 to 01b8153 Compare November 11, 2022 23:55

brandonwillard self-assigned this Nov 20, 2022

brandonwillard force-pushed the fuse-CAReduce-and-Elemwise branch 2 times, most recently from 34ca8c3 to d977ee4 Compare November 21, 2022 01:53

brandonwillard mentioned this pull request Nov 21, 2022

Fuse CAReduces with multi-input Elemwises #1307

Open

brandonwillard marked this pull request as ready for review November 21, 2022 01:58

brandonwillard added 2 commits November 20, 2022 20:02

Remove unnecessary logger and dunder setting in pfunc

5b3a6c8

Merge CAReduce and CAReduceDtype

4a4b084

brandonwillard force-pushed the fuse-CAReduce-and-Elemwise branch from d977ee4 to d3830c4 Compare November 21, 2022 02:02

brandonwillard requested a review from rlouf November 21, 2022 05:22

brandonwillard force-pushed the fuse-CAReduce-and-Elemwise branch from d3830c4 to e9839a1 Compare November 21, 2022 18:21

brandonwillard added 2 commits November 21, 2022 19:21

Refactor Composite Op

57fba5c

- Lazily create and cache `FunctionGraph`s, the `Composite.perform` implementation, C code, and name values
- Use `fgraph_to_python` for `Composite.perform`
- Use the `HasInnerGraph` interface
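
The lazy create-and-cache pattern named in this commit message can be sketched with the standard library alone. This is an illustration under assumptions, not the refactored `Composite` itself: `LazyComposite`, `build_count`, and the string-based "compilation" step are hypothetical stand-ins.

```python
import math
from functools import cached_property


class LazyComposite:
    """Hypothetical op that defers expensive derived values until first use."""

    def __init__(self, expression):
        self.expression = expression
        self.build_count = 0  # counts how often the expensive step runs

    @cached_property
    def py_impl(self):
        """Build the Python implementation on first access, then reuse it."""
        self.build_count += 1
        # Stand-in for converting an inner graph into a Python callable.
        return eval("lambda i0, i1: " + self.expression, {"exp": math.exp})

    def perform(self, i0, i1):
        """Evaluate the scalar expression using the cached implementation."""
        return self.py_impl(i0, i1)


op = LazyComposite("i0 + exp(i1)")
built_before_use = op.build_count  # nothing is built at construction time

first = op.perform(0.0, 0.0)
second = op.perform(1.0, 0.0)
built_after_use = op.build_count  # the build step ran once, not twice
```

Deferring the build keeps construction cheap for ops that are created during rewriting but never executed, and caching avoids repeating the work on every call.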

Add get_target_language function and remove tests.compile.test_modes

2228ddc

brandonwillard added 2 commits November 21, 2022 19:21

Set the global mode during compilation

c0e4734

Add a fusion rewrite for CAReduces with Elemwise inputs

91f3438

brandonwillard force-pushed the fuse-CAReduce-and-Elemwise branch from e9839a1 to 91f3438 Compare November 22, 2022 01:21

brandonwillard merged commit ae20174 into aesara-devs:main Nov 22, 2022

brandonwillard deleted the fuse-CAReduce-and-Elemwise branch November 22, 2022 15:57

brandonwillard mentioned this pull request Dec 12, 2022

How should we handle scalar constants in Elemwise fusions? #1270

Open
