How does JAX implement JVP? Two passes of reverse mode? #10157
-
Hi, I am new to JAX. Since TensorFlow is mainly built for deep learning, it uses reverse-mode automatic differentiation; to get forward-mode AD in TF, you have to run two passes of reverse mode. I want to know how JAX implements forward-mode AD (JVP). There is a document, Autodidax: JAX core from scratch, but I don't get the key ideas from it. If JAX uses a different approach to forward mode, what is its computational overhead compared with TF's two passes of reverse mode? And if JAX uses the same approach as TF, then maybe the differences between JAX and TF come down to NumPy-like APIs, the JVP API, asynchronous dispatch, and other features? Could anyone explain this a little? Thanks!
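To make the comparison concrete, here is a rough sketch of the two-passes-of-reverse-mode trick I mean, written with JAX's own `jax.vjp` (the helper name `jvp_via_double_vjp` is just illustrative, not a real API): taking the VJP of a VJP transposes the linear map back, recovering the JVP.

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) * x

def jvp_via_double_vjp(f, x, v):
    # First reverse pass: the VJP of f at x is the linear map c -> J(x)^T c.
    y, pullback = jax.vjp(f, x)

    def vjp_fun(c):
        (x_bar,) = pullback(c)
        return x_bar

    # Second reverse pass: the VJP of this linear map transposes it back to
    # t -> J(x) t, i.e. the JVP. Since vjp_fun is linear, the linearization
    # point (zeros here) does not matter.
    _, pullback2 = jax.vjp(vjp_fun, jnp.zeros_like(y))
    (tangent_out,) = pullback2(v)
    return y, tangent_out

x, v = jnp.array(2.0), jnp.array(1.0)
print(jvp_via_double_vjp(f, x, v)[1])   # should match...
print(jax.jvp(f, (x,), (v,))[1])        # ...JAX's native single-pass JVP
```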
-
JAX's forward mode is implemented with the dual-number approach, and reverse mode as forward mode followed by a transpose.
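For intuition, here is a minimal dual-number sketch of forward mode in plain Python. This is an illustration of the idea, not JAX's actual implementation, though JAX's forward-mode tracer likewise carries a (primal, tangent) pair through every primitive:

```python
import math

class Dual:
    """A value paired with its derivative (tangent) along one input direction."""
    def __init__(self, primal, tangent):
        self.primal = primal
        self.tangent = tangent

    def __add__(self, other):
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    def __mul__(self, other):
        # Product rule: (u * v)' = u'v + uv'
        return Dual(self.primal * other.primal,
                    self.tangent * other.primal + self.primal * other.tangent)

def sin(x):
    # Chain rule for sin: d(sin u) = cos(u) du
    return Dual(math.sin(x.primal), math.cos(x.primal) * x.tangent)

def f(x):
    return sin(x) * x

# Seed the input with tangent 1.0: one forward pass yields both f(x) and f'(x).
out = f(Dual(2.0, 1.0))
print(out.primal, out.tangent)  # sin(2)*2 and cos(2)*2 + sin(2)
```

Because the tangent is propagated alongside the primal, the derivative comes out of a single evaluation of f, with no second pass.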
-
A brief explanation: #9328 (comment)
The "recursion" |
> JAX's forward mode is implemented with the dual-number approach, and reverse mode as forward mode followed by a transpose.

I wouldn't expect JAX to have a larger tracing overhead, and there should be no numerical computing overhead compared with jitted TF.
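The "forward mode plus transpose" picture can be sketched with JAX's public APIs: `jax.linearize` produces the JVP as a linear function of the tangent, and `jax.linear_transpose` transposes it into a VJP.

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) * x

x = jnp.array(2.0)

# Forward mode: linearize f at x, yielding f(x) and the linear JVP map t -> J t.
y, f_jvp = jax.linearize(f, x)

# Transposing the linear JVP yields the VJP map c -> J^T c, which is exactly
# what reverse mode computes.
f_vjp = jax.linear_transpose(f_jvp, x)

(x_bar,) = f_vjp(jnp.array(1.0))
print(x_bar)            # should match...
print(jax.grad(f)(x))   # ...the gradient from jax.grad
```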