Function caching issue with broadcast #9869

pfitzseb · 2015-01-21T10:48:13Z

An issue with broadcasting came up in the Juno forums and I couldn't find anything relevant in the mailing list or here.

From a brief look at the code broadcast caches functions - this seems to lead to redefined methods being ignored when using broadcast again.

This means fun stuff like

julia> f(x) = x^2
f (generic function with 1 method)

julia> broadcast(f, [1 2 3])
1x3 Array{Int64,2}:
 1  4  9

julia> f(x) = x^3
f (generic function with 1 method)

julia> broadcast(f, [1 2 3])
1x3 Array{Int64,2}:
 1  4  9

I do hope that this is not the intended behaviour - should it be, this certainly needs to be added to the docs.

versioninfo:

Julia Version 0.3.5
Commit a05f87b* (2015-01-08 22:33 UTC)
Platform Info:
  System: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Xeon(R) CPU           W3530  @ 2.80GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Nehalem)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

(Doesn't look like the relevant code has changed on master though, so it should 'work' there too)

The text was updated successfully, but these errors were encountered:

timholy · 2015-01-21T12:14:01Z

This is basically #265. (Like many julia developers, I didn't even have to look that up, I know that issue number by heart.) If you can solve that, @StefanKarpinski owes you a bottle of bourbon 😄.

As you noted, broadcasting keeps track of functions in a Dict; functions are hashed by object_id, and so what you're seeing comes down to this:

julia> f(x) = x^2
f (generic function with 1 method)

julia> object_id(f)
0x6582bd6b207f44ad

julia> f(x) = x^3
f (generic function with 1 method)

julia> object_id(f)
0x6582bd6b207f44ad

To me it looks as if it just comes down to the pointer.

pfitzseb · 2015-01-21T14:16:02Z

Yeah, I kinda expected #265 to be involved (as per Mikes comment).

Just a thought that would solve this problem:
If functions were hashed by their function body (or something) and their pointer, so that

object_id(f) == object_id(g)    =>    f(x) == g(x)  # or probably hash(f) == ...

(not <=>, though!), this issue would be solved. Not sure if there's a way to implement this and not get horrible performance issues... But since the function has to be parsed anyways, it might be possible to precompute the hash at compile time and keep it referenced? Probably a bad idea...

timholy · 2015-01-21T14:37:43Z

Many functions have many methods available. Which body were you planning on choosing for f?

But I should add that this is probably a "baby version" of #265, and might make a good step towards fixing it. The thing that makes this case far, far easier is that you explicitly call broadcast again, so it gives one an opportunity to intervene.

In what would be only a slight modification of your idea, conceivably one could just have a "version number" associated with each generic function which would be incremented each time a method was (re)defined, and hashing could incorporate the version number.

pfitzseb · 2015-01-21T14:55:59Z

The naive answer is "all of them", I guess. ;)
Two functions f and g are equal if and only if their pointer is the same (that's what hash does right now) and all of their methods have the same body, so that

f(x...) == g(x...)

for all possible x.

This idea might be expanded to introduce function hashing based only on the body of its methods, so that (in the case of broadcast) a cached function could be used even though it has a different pointer. Although this sounds pretty brittle, now that I think of it.

But I do think that introducing a "version number" per function is more elegant (and easier and faster, probably).

timholy · 2015-01-22T11:08:52Z

@Varanas, just to be clear: currently I'm not planning to tackle this, so if you need it I encourage you to give it a whirl.

MikeInnes · 2015-01-22T17:43:21Z

@timholy Mind if I ask how the caching in broadcast works in general? i.e. is it just memoising function/input combinations, or something else? (I'm guessing it's a way to avoid anonymous function slowness?)

This issue seems strange to me because I've never seen #265 happen with functions as values – so I'm wondering if it's caused only by the caching, independently of that issue. (Of course, if you're sure it's #265's fault, I'll believe you ;)

timholy · 2015-01-22T18:13:49Z

It's a close analog of #265, not #265 literally. The best answer to your question is here. Any questions? 😄

Briefly, what's happening is that you compile a (hidden) method specialized for (1) the input function (it inlines the input function, so it's fast), (2) the number of inputs arrays, and (3) the dimensionalities of the inputs. You stash this compiled function in a Dict so you can reuse it later; otherwise you'd have to pay the cost of compilation each time you want to do some broadcasting, and that would be awful.

The reason why it's similar to #265 is because julia's global method table is a lot like that Dict inside broadcast. Once you inline one function into another (which map doesn't do, which is why you've not seen this before), then #265-like behaviors pop up.

MikeInnes · 2015-01-24T10:52:55Z

Ah ok – thanks for the explanation. broadcast.jl has changed a lot since I last looked at it, and I could've spent a while digging through that code :)

If I understand correctly it sounds like it is essentially #265, in the sense that fixing 265 – i.e. recompiling the hidden method when the function it depends on is updated – would be sufficient to fix it.

It's actually possible to check if the method has been updated already, though. For example:

@which(fft([1])).func
(anonymous function)

This object will change if the method is updated, so broadcast could in principle check for changes and invalidate the cache if there are any. If I get some time I may have a go at this.

carlobaldassi · 2015-01-25T05:22:20Z

It's actually possible to check if the method has been updated already, though.

Just be aware of the performance issues which led to the current code, see JuliaLang/LinearAlgebra.jl#89 and #6107.

pabloferz · 2016-12-23T05:30:56Z

This should be closed now that #17057 is merged.

tkelman closed this as completed Dec 23, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Function caching issue with broadcast #9869

Function caching issue with broadcast #9869

pfitzseb commented Jan 21, 2015

timholy commented Jan 21, 2015

pfitzseb commented Jan 21, 2015

timholy commented Jan 21, 2015

pfitzseb commented Jan 21, 2015

timholy commented Jan 22, 2015

MikeInnes commented Jan 22, 2015

timholy commented Jan 22, 2015

MikeInnes commented Jan 24, 2015

carlobaldassi commented Jan 25, 2015

pabloferz commented Dec 23, 2016

Function caching issue with broadcast #9869

Function caching issue with broadcast #9869

Comments

pfitzseb commented Jan 21, 2015

timholy commented Jan 21, 2015

pfitzseb commented Jan 21, 2015

timholy commented Jan 21, 2015

pfitzseb commented Jan 21, 2015

timholy commented Jan 22, 2015

MikeInnes commented Jan 22, 2015

timholy commented Jan 22, 2015

MikeInnes commented Jan 24, 2015

carlobaldassi commented Jan 25, 2015

pabloferz commented Dec 23, 2016