Allow users to define "dot" vectorized operators. #14544

Closed
Ismael-VC opened this issue Jan 3, 2016 · 12 comments

Comments

@Ismael-VC
Contributor

I think it would be nice if users could define their own dot-vectorized operators; some examples are in (.in / .∈) and is (.is / .===). This is the more Julian thing to do, IMHO (think of identifiers with a bang, like foo!). It would also allow us to use these functions as infix operators, similar to the now vast array of symbols that we can choose from as infix operators for user-defined purposes.

Refs:

This would be an easy solution to users' needs (see the DataFramesMeta suggestion to use a macro within a macro and change the semantics of non-dot operators, when a function would be enough). It avoids having to implement such methods in Base, or to selectively add (and debate) such operators for just a few cases (most would never make it into Base). Instead, this proposal would generalize dot-vectorized operators into a common Julian thing to do.

cc @JeffBezanson @StefanKarpinski @nalimilan

@tkelman
Contributor

tkelman commented Jan 3, 2016

See #8450. There's more of a desire to get rid of dot-operators and vectorization in general than to add more of them.

@Ismael-VC
Contributor Author

@tkelman after checking #8450, I read this comment from Jeff: #8450 (comment)

Paying attention to the next two comments:

@JeffBezanson I'm not sure what's the intention of your comment. Has anybody suggested to get rid of the .* syntax? -- nalimilan


No; I'm focusing on f and g here. This is an example where you can't just add over x at the end of the line. -- JeffBezanson

I think that my proposal would make a lot of sense now, at least for the allowed Unicode operators, i.e. allow .∈ but not .in. #8450 is more than a year old and no consensus seems to be in sight; also, I can't tell from that issue whether map is already as fast as needed in order to proceed with #8450.

Fragment from the reference in my last comment:

Similar to Jeff's remark above, there is no question that e.g. +̂ is an infix operator, so the only things you can do are either to disallow it or to parse it sensibly, and there is no reason that I can see not to parse it sensibly (e.g. the precedence is obvious). Similarly, if we are going to allow < and .<, then it doesn't make sense to me to allow ≪ but not .≪ etcetera. -- stevengj

@tkelman
Contributor

tkelman commented Jan 3, 2016

#12406 (comment)

You should also watch Jeff's JuliaCon talk for context here. Faster map is imminent, ref #13412. Support in the parser for dot versions of more of the unicode infix operators that don't have existing definitions would probably be considered. Vectorized versions of operators that already operate on collections, not so much.

@Ismael-VC
Contributor Author

Would it be more appealing if the PR is reduced just to the changes in the parser, so that .∈ is treated as a valid operator syntax? Then it would still be possible to define .∈ in the package(s).

@JeffBezanson If #8450 would lead to some new syntax for an easy vectorization, it would be fantastic.
The current state is, sort of, ambivalent: the wide Unicode palette is made available for writing concise code, but in practice it results in more obscure and redundant lines (a DataFrame-inspired example below):

d[:is_selected] = map(id -> id ∈ sel_ids, d[:id]) & map(t -> t ∈ sel_types, d[:type])
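
For comparison, here is a minimal sketch of how a dotted ∈ would shorten that line. It assumes a Julia version (1.0 or later) in which any infix operator can be dotted and in which Ref is used to shield a collection from broadcasting; the arrays and sets are hypothetical stand-ins for the DataFrame columns, and .& is used for the element-wise "and".

```julia
# Hypothetical stand-ins for the DataFrame columns and the selection sets
ids       = [1, 2, 3, 4]
types     = [:a, :b, :a, :c]
sel_ids   = Set([1, 2, 3])
sel_types = Set([:a])

# Explicit form, as in the line quoted above (with element-wise .&)
verbose = map(id -> id ∈ sel_ids, ids) .& map(t -> t ∈ sel_types, types)

# Dotted form; Ref(...) makes each set behave as a scalar during broadcast
concise = (ids .∈ Ref(sel_ids)) .& (types .∈ Ref(sel_types))

@assert verbose == concise
```

The parentheses around each .∈ expression are needed because .& binds more tightly than .∈.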

@alyst got no response so I'd like to ask Jeff the same question again.

Yes, let's see if it's a real pain-point and at that point, it will be clearer what to do.

It is a real pain sometimes, IMHO, at least with the DataFrames example. The last comment is by Stefan from 2013 and DataFrames is from 2012; I didn't use DataFrames or Julia back then, but I can tell this has been a pain for quite a while now.

Support in the parser for dot versions of more of the Unicode infix operators that don't have existing definitions would probably be considered.

I'm talking about generality. If by "more" you mean every one of them, fine. If not, then I agree with Jeff not to special-case any more. Currently I am able to choose the meaning I want for any allowed Unicode symbol; I just want to be able to have dot operators for every one of those symbols, so I can also do whatever I want with them.

I'm not asking to define each one of them in Base nor any at all, but to be able to define them for ourselves in our packages as we see fit.

Vectorized versions of operators that already operate on collections, not so much.

I don't want this in Base, but I still want to be able to use it by myself or in packages.

So with #13412 are dot operators going to disappear? If not, I insist to reconsider this for the sake of consistency.

I'm not a big fan of this.

Now I'm curious about the opinions and needs of the other ±447 contributors (which should be/are considered equally?), not to mention non-contributors.

Some of them have already expressed contrary opinions to Jeff's of course. So there has been a need indeed for years now.

I'll go watch that video now, thanks.

@tkelman
Contributor

tkelman commented Jan 3, 2016

I can't find whether the proposed implementation #6929 (comment) was ever submitted as a PR.

Would it be more appealing if the PR is reduced just to the changes in the parser

No one responded to this so I think the answer is effectively "if you submit a PR (or make that change to the existing one) we'd consider it."

Consistency in terms of parser behavior is good in principle, if someone wants to implement the change. #13412 may change the performance justification for needing dot-operators as separate syntax, as they could be replaced by broadcast. But they likely won't be deprecated or anything right away, since people do seem to like the syntax - quoting Jeff from #8648 (comment), "You humans and your infix."

@alyst
Contributor

alyst commented Jan 3, 2016

@tkelman

No one responded to this so I think the answer is effectively "if you submit a PR (or make that change to the existing one) we'd consider it."

With all my real and sincere respect to the core developers of Julia and their fantastic work, I think responding with something like "That seems reasonable" in that thread would have made sense. Unless the guidelines for contributing explicitly mention "If your idea didn't get feedback, go ahead and implement it".

@tkelman
Contributor

tkelman commented Jan 3, 2016

People are busy and things fall off the radar, sorry. There are a lot of issues and PRs and only so many person-hours going into triage and review. I think adding a single new operator to the parser would be seen as a pretty minor change; adding more for consistency might get more feedback.

@nalimilan
Member

I think we need a general solution to the vectorization issue (#8450). Adding dot versions of each operator won't solve the problem of functions that are not operators.

@Ismael-VC
Contributor Author

I think that with hundreds of symbols to choose from, others surely can have more room for creativity. In your case I have suggested using \smallin, yet using a dotted version of in (.∈) would be better if available, because others would instantly know it works element-wise.

Also, it's possible to alias a function that is not an operator with an operator, as shown in #8450:

julia> foo(x) = x^2*3x
foo (generic function with 1 method)

julia> ⤇ = map    # \Mapsto
map (generic function with 39 methods)

julia> foo ⤇ 1:5
5-element Array{Int64,1}:
   3
  24
  81
 192
 375

We could define or alias a non-operator function with, e.g., ⟰, and also define .⟰ to be its element-wise version. I don't know what ⟰ is or what it could be used for without looking at its documentation, but after seeing .⟰ I would instantly know it's the element-wise version of ⟰.

@JeffBezanson
Member

I think this can be separated into 2 proposals:

  1. Allow identifiers to begin with .
  2. Allow every operator to be prefixed with ., yielding another operator with the same fixity and precedence.

My general, non-committal response is that I just don't think elementwise-ness is that important. I do think it's important enough for map and its cousins (comprehensions, broadcast) to have special syntax. But changing the lexical syntax of identifiers for this seems like overkill.

My preference is not to have any dot operators, but many of them just have to be there by tradition, like .*. Longstanding tradition and/or having many use cases overcomes my objection.

Yes, it's true that I'm fine with having tons of operators available to be defined. However, I think there needs to be some notion of what they might mean, or that they make sense in some way. If we agree to x .op y meaning broadcast(op, x, y), it could start to make sense. We could even make it syntax for that, similar to how x+=1 expands to x=x+1.
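
A minimal sketch of the equivalence being proposed, using an operator that already has a dotted form. The variable names are illustrative only, and the check compares results; it says nothing about how the parser lowers the expression.

```julia
x = [1, 2, 3]      # 3-element column vector
y = [10 20]        # 1×2 row matrix

# The dotted operator and the explicit broadcast call give the same 3×2 matrix
dotted   = x .* y
explicit = broadcast(*, x, y)
@assert dotted == explicit

# The update-operator analogy mentioned above: a += 1 behaves like a = a + 1
a = 1
a += 1
@assert a == 2
```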

It could be awkward if many, but not all, dot operators are defined in Base. Currently it's a small fixed set, summarized as "the simplest math operators". But if "dot operator" became a more generally-used concept, we'd have the problem of it being hard to guess which are defined.

Here's another way to look at it. There are close to zero good standard ways to augment identifier names, like the ! suffix. Dot prefixes might be possible too. But how many other things like this are there? Almost none. So why should we spend this exceedingly rare syntax on broadcast? It doesn't quite seem right to me for mutation and broadcasting to be the two concepts that get this level of syntactic privilege. I think it would be better to focus on picking good, composable syntax for map and broadcast.

@stevengj
Member

stevengj commented Aug 2, 2016

This is closed by #17393, which parses .⨳ as a binary operator for most operators ⨳. So, in 0.5 you can overload them as you wish. However, in 0.6 (due to #17623) you won't need to overload them: x .⨳ y will automatically call broadcast(⨳, x, y).
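
A small sketch of the 0.6-and-later behaviour described here, assuming a current Julia version. The meaning given to ⨳ is made up for illustration; Base does not define it.

```julia
# ⨳ has no definition in Base; this meaning is purely illustrative
⨳(a, b) = a * b + 1

xs = [1, 2, 3]
ys = [4, 5, 6]

# No separate method for .⨳ is written; the dotted call broadcasts ⨳ itself
@assert xs .⨳ ys == broadcast(⨳, xs, ys)
```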

@stevengj stevengj closed this as completed Aug 2, 2016