Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

broadcast should treat types as "scalar-like" #16966

Closed
davidanthoff opened this issue Jun 16, 2016 · 23 comments
Closed

broadcast should treat types as "scalar-like" #16966

davidanthoff opened this issue Jun 16, 2016 · 23 comments
Labels
broadcast Applying a function over a collection

Comments

@davidanthoff
Copy link
Contributor

Once #4883 is fixed, this would enable things like:

x=["1","2"]
y=parse.(Int64,x)
z=convert.(Float64,y)

These would be super handy when one is e.g. converting types of a DataFrame etc.

@tkelman
Copy link
Contributor

tkelman commented Jun 16, 2016

@TotalVerb
Copy link
Contributor

TotalVerb commented Jun 18, 2016

I don't like this behaviour at all. This will be terribly incompatible with map, which doesn't do this. Currently we throw an error, which is at least explicit. Think of the behaviour on collection-like scalars, such as String or Tuple. It would be horrible for length.(([1, 2], [3, 4], [5, 6])) to give 3, or even worse, [3]!

Why not something like

cell(x) = setindex!(Array(typeof(x)), x)

so we can explicitly do broadcast(cell(Int64), x)? cell is deprecated now so it can return by 0.6. It could be made into a type that behaves like the zero-dimensional array but is much faster.

@tkelman
Copy link
Contributor

tkelman commented Jun 18, 2016

I don't like using the cell name for that. Doesn't Ref already do that?

@TotalVerb
Copy link
Contributor

TotalVerb commented Jun 18, 2016

But Ref does not behave like an array type, and I'm not sure it's a good idea to overload its meaning that way.

@nalimilan
Copy link
Member

@TotalVerb map doesn't broadcast, so it's logical that it shouldn't work. I don't see the problem. And what's the relation with length.(([1, 2], [3, 4], [5, 6]))?

@TotalVerb
Copy link
Contributor

Isn't it better for that to throw an error than to return 3?

@nalimilan
Copy link
Member

Why would it return 3?

@TotalVerb
Copy link
Contributor

TotalVerb commented Jun 18, 2016

Because Tuple is a not an AbstractArray, so is "scalar-like", and will be auto-promoted to an array of tuples. And otherwise we would have to special-case Tuple as "vector-like". And same goes for String, or Set, or Dict, etc.

@nalimilan
Copy link
Member

But I don't think anybody argued that tuples should be treated as scalars.

@TotalVerb
Copy link
Contributor

TotalVerb commented Jun 18, 2016

Then what should be treated as a scalar? This is not clear cut. What is a scalar sometimes is not a scalar other times.

For instance, are strings scalars? What should broadcast(*, ["Hello", "Goodbye"], " World!") produce? Or are strings vectors? What about broadcast(+, 1, "Hello")?

Are sets scalars? What of broadcast(union, Set([1, 2, 3]), [Set([4]), Set([5])])? How about broadcast(x -> x^2, Set([1, 2, 3]))?

My point is that whether something is a scalar or not is not a property of the type. Any division of types into "scalar" and "tensor" is going to be inaccurate in some cases. In x + y both x and y are scalars, even if they are of type Vector. In x .+ y, they are both tensors, even if they are of type Number (since Number implements the tensor interface). And that is despite these being the same operation.

@nalimilan
Copy link
Member

Unfortunately, we don't have traits yet, but there's a section of the manual about collections, and everything listed there shouldn't be treated as a scalar: basically, everything that supports the iteration protocol or getindex. That clearly includes sets, tuples, dictionaries, and strings. String are a weird kind since I'd often like to treat them as scalars, but I think for consistency we need to consider them as collections everywhere.

@StefanKarpinski
Copy link
Member

There's a pretty solid argument to be made for making strings scalar and requiring chars(str) to iterate the characters of a string (similarly graphemes(str), etc.).

@davidanthoff
Copy link
Contributor Author

Could someone triage this? I.e. assign some labels and maybe a milestone to this? #16986 seems to solve this, but neither one is assigned any label/milestone, so probably both are being missed in any release planning process.

@tkelman
Copy link
Contributor

tkelman commented Jun 27, 2016

What label do you propose? I don't think this is release-blocking.

@davidanthoff
Copy link
Contributor Author

Definitely not release blocking, I'd say.

The current system of labels and milestones used in the repo is a bit of a mystery to me, so hard to tell what it should get assigned, but not having anything seems not ideal for something that is actively being worked on and therefor might go into a 0.5 release or not.

@stevengj
Copy link
Member

stevengj commented Jul 25, 2016

It would be nice to get this resolved for #16285, because in Julia 0.6 we were planning to make a .+ b etcetera be synonyms for (+).(a,b) and hence be a fusing broadcast operation. When I tried to implement this, one of the first problems that arose was the .== operator, because there is existing code that uses e.g. symbolarray .== symbol.

My first inclination is that broadcast should default to treating things as scalars, and only for certain exceptions like AbstractArray and perhaps Tuple should it iterate.

@stevengj
Copy link
Member

I don't think that we necessarily want map and broadcast to be consistent here. (If their behavior is identical, why even have both functions?) broadcast should be for "iterables with shapes" and should default to treating arguments as scalars, and map should be for "iterables without shapes" and should default to treating arguments as iterables.

@StefanKarpinski
Copy link
Member

That seems like a solid distinction to me and well expressed.

@ChrisRackauckas
Copy link
Member

Would it be crazy to propose an alternative to . as an operator for map?

@TotalVerb
Copy link
Contributor

I quite like using a binary operator for map in my code (usually a variant of the compose circle). I don't think we need more dot-like syntax.

@ChrisRackauckas
Copy link
Member

ChrisRackauckas commented Sep 16, 2016

I'm just thinking that slightly expanding this broadcasting idea to a few other operators could lead to some really slick notation not just for mapping, but also for doing embarrassingly parallel/multithreaded operations if combined with some ideas from #17887. Running out of characters that would look nice, but a version of A.*B.+sin.(C) which automatically fuses the operations and applies a parallel map would be the sickest feature ever. And that would all be possible if there was just a way to specify it being map, pmap, pbroadcast, etc. (or a macro could be used to specify the "apply" function that the . operator does. That could be a cheap fix that would be easy to implement in a package).

@stevengj
Copy link
Member

@ChrisRackauckas, I agree; see the discussion at the end of #16285. The trick is getting the details right.

@stevengj
Copy link
Member

Closed by #16986 (@pabloferz), I believe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
broadcast Applying a function over a collection
Projects
None yet
Development

No branches or pull requests

7 participants