Limitations of forward mode AD #9328
-
My question might be more to do with general AD and JAX's implementation of it than with JAX itself, so I apologize in advance. Given the following function, how can one find its derivative with respect to the input array in forward-mode differentiation?

```python
def vdw_energy(x):
    r = (
        (x[0, 0] - x[1, 0]) ** 2
        + (x[0, 1] - x[1, 1]) ** 2
        + (x[0, 2] - x[1, 2]) ** 2
    )
    E = 4 * (1 / r) ** 12 - (1 / r) ** 6
    return E
```

I ask because, as per my understanding, to get the derivatives with respect to the individual coordinates above (each element of `x`) we need to "seed" the forward calculation with all elements being zero except the desired variable. But, as expected, in the above case that leads to an incorrect result, since it changes the actual distances between the particles. For example:

```python
import jax as jx
import jax.numpy as jnp

x = jnp.array([[10.0, 10.0, 10.0], [11.0, 10.0, 10.0]])
f_prime = jx.jacfwd(vdw_energy)(x)
# DeviceArray([[ 84.,   0.,   0.],
#              [-84.,   0.,   0.]], dtype=float32)

x = jnp.array([[10.0, 0.0, 0.0], [11.0, 10.0, 10.0]])  # "seeding" for the x[0, 0] component
f_prime = jx.jacfwd(vdw_energy)(x)
# DeviceArray([[-9.053339e-16, -9.053339e-15, -9.053339e-15],
#              [ 9.053339e-16,  9.053339e-15,  9.053339e-15]], dtype=float32)
```

How exactly is JAX arriving at the correct results in forward mode?

Background: I would like to get the same results using the Boost autodiff framework, which only supports forward mode and only returns aggregated gradients, but I am unsure how exactly to go about it. Therefore I would like to know how JAX does it in forward mode. I have also asked the same question on AI Stack Exchange, but there are no replies yet: https://ai.stackexchange.com/questions/34299/limitations-of-forward-mode-ad
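For reference, the reported value of 84 can be checked by hand: at this configuration r = 1, so dE/dr = -48 r^-13 + 6 r^-7 = -42 and dr/dx[0, 0] = 2 (x[0, 0] - x[1, 0]) = -2, giving dE/dx[0, 0] = 84. Below is a short sketch comparing this analytic gradient with `jacfwd`; the `analytic_grad` helper is added here purely for illustration.

```python
import jax
import jax.numpy as jnp

def vdw_energy(x):
    r = ((x[0, 0] - x[1, 0]) ** 2
         + (x[0, 1] - x[1, 1]) ** 2
         + (x[0, 2] - x[1, 2]) ** 2)
    return 4 * (1 / r) ** 12 - (1 / r) ** 6

def analytic_grad(x):
    # Chain rule by hand: E = 4 r^-12 - r^-6 with r = |x[0] - x[1]|^2.
    d = x[0] - x[1]
    r = jnp.dot(d, d)
    dE_dr = -48.0 * r ** -13 + 6.0 * r ** -7
    dr_dx0 = 2.0 * d                  # dr / dx[0, :]; dr / dx[1, :] is the negative
    return jnp.stack([dE_dr * dr_dx0, -dE_dr * dr_dx0])

x = jnp.array([[10.0, 10.0, 10.0], [11.0, 10.0, 10.0]])
print(jax.jacfwd(vdw_energy)(x))  # [[ 84.  0.  0.]  [-84.  0.  0.]]
print(analytic_grad(x))           # same values
```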
-
brief explanation:

In forward-mode AD, we define a Jacobian-vector-product function `jvp(f, x, v)` for each basic operator `f`, where `x` is the point at which to differentiate the function and `v` is the vector to multiply by. It satisfies `jvp(f, x, v) == J_f[x] v` for all `x` and `v`, where `J_f[x]` is the Jacobian matrix of `f` at the point `x`. For the function composition `h(x) = g(f(x))`, we can recursively obtain its Jacobian-vector-product function, namely `jvp(h, x, v) = jvp(g, f(x), jvp(f, x, v))`, since `J_h[x] v = J_g[f(x)] J_f[x] v`. Finally, we vectorize `jvp` with respect to `v` and get `jacfwd(f, x) = jvp_vectorized(f, x, I)`, where `I` is the identity matrix.

Edited: Actually, `jax.jvp` does compute the gradient and function value in a single pass (the…
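As an illustration of the explanation above (a minimal sketch, not a description of JAX internals), the snippet below reuses the `vdw_energy` function from the question. The point relevant to the original question is that the seed is the tangent `v` passed alongside the unchanged primal `x`, and `jacfwd` behaves like this `jvp` vectorized over the identity basis.

```python
import jax
import jax.numpy as jnp

def vdw_energy(x):
    r = ((x[0, 0] - x[1, 0]) ** 2
         + (x[0, 1] - x[1, 1]) ** 2
         + (x[0, 2] - x[1, 2]) ** 2)
    return 4 * (1 / r) ** 12 - (1 / r) ** 6

x = jnp.array([[10.0, 10.0, 10.0], [11.0, 10.0, 10.0]])

# One forward pass per coordinate: the primal x is left unchanged,
# only the tangent (the "seed") is a one-hot array selecting a coordinate.
# jax.jvp returns the function value and the directional derivative together.
v = jnp.zeros_like(x).at[0, 0].set(1.0)
value, dE_dx00 = jax.jvp(vdw_energy, (x,), (v,))
print(value, dE_dx00)        # 3.0  84.0

# jacfwd behaves like this jvp vectorized over all basis tangents.
basis = jnp.eye(x.size).reshape(-1, *x.shape)
jac = jax.vmap(lambda t: jax.jvp(vdw_energy, (x,), (t,))[1])(basis)
print(jac.reshape(x.shape))  # [[ 84.  0.  0.]  [-84.  0.  0.]]
```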
-
Just to confirm: if I understand it correctly, JAX forward mode does not use the conventional approach of pushing dual numbers through functions. Rather, it uses analytical expressions for the derivatives of elementary operations to compute the JVP directly. Am I correct?
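For comparison, here is a minimal hand-rolled dual-number sketch; the `Dual` class is purely illustrative and handles only the scalar operations needed below. Propagating (value, tangent) pairs with per-operation derivative rules computes exactly the Jacobian-vector product described above, so the "dual numbers" view and the "analytical JVP rule per primitive" view give the same result.

```python
import jax
import jax.numpy as jnp

class Dual:
    """Minimal dual number: a value together with a tangent (derivative seed)."""
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)
    def __sub__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val - other.val, self.tan - other.tan)
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)
    __rmul__ = __mul__
    def __pow__(self, n):             # integer exponents are enough here
        return Dual(self.val ** n, n * self.val ** (n - 1) * self.tan)
    def __rtruediv__(self, num):      # num / Dual, with num a plain number
        return Dual(num / self.val, -num * self.tan / self.val ** 2)

def vdw_energy(x):
    # x[i][j] indexing so this accepts both nested lists of Duals and a jnp array.
    r = ((x[0][0] - x[1][0]) ** 2
         + (x[0][1] - x[1][1]) ** 2
         + (x[0][2] - x[1][2]) ** 2)
    return 4 * (1 / r) ** 12 - (1 / r) ** 6

# Seed only the tangent of x[0][0]; the coordinate values themselves are unchanged.
x_dual = [[Dual(10.0, 1.0), Dual(10.0), Dual(10.0)],
          [Dual(11.0), Dual(10.0), Dual(10.0)]]
print(vdw_energy(x_dual).tan)             # 84.0

x = jnp.array([[10.0, 10.0, 10.0], [11.0, 10.0, 10.0]])
print(jax.jacfwd(vdw_energy)(x)[0, 0])    # 84.0
```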
-
Note: You can use …