headless anonymous function (->) syntax #38713
Comments
After that long, inconclusive debate, I think I've also come to the conclusion that having an explicit marker is better. Headless |
To clarify, the question is which of these
Both could potentially make sense. My suggestion is to make it an error and force the user to either use a normal lambda or not use |
One solution is to say that inside of a headless anonymous function
There is precedent for this in Clojure:
and Mathematica:
|
that's exactly what I meant when saying
so yes, I'd clearly be in favor of making them bind the tightest.
for types the 2nd approach should work in most cases since |
regarding |
The other advantage of numbering is that it lets you re-use the same argument:
sort(xs, by= -> _.re + _.im/100) # x -> x.re + x.im/100
sort(xs, lt= -> _1.re < _2.re) # (x,y) -> x.re < y.re
Edit -- You could argue that the first line only saves one character, and the second 3 not 5. But what it does still save is the need to invent a name for the variable. Re-using the same symbol with different meanings in an expression seems confusing to me. Is it ever actually ambiguous? (The order of symbols in Edit' -- Not the most convincing example, but notice that 1,2,3 occur in order in the lowered version of this, as
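The numbered-slot idea can be tried out today with a macro, because `_1`, `_2`, ... are ordinary identifiers. The following is a minimal sketch only: the macro name `@numbered` and the helper `number_slots!` are hypothetical and are not part of Base or of this proposal. It assumes single-digit slots and does not handle nesting.

```julia
# Hypothetical helper: replace `_1`.. `_9` in `ex` by generated argument names,
# remembering which index maps to which name so that re-use works.
function number_slots!(slots::Dict{Int,Symbol}, ex)
    if ex isa Symbol
        m = match(r"^_([1-9])$", string(ex))
        m === nothing && return ex
        i = parse(Int, m.captures[1])
        return get!(() -> gensym("arg"), slots, i)
    elseif ex isa Expr
        return Expr(ex.head, map(a -> number_slots!(slots, a), ex.args)...)
    else
        return ex
    end
end

# Hypothetical macro: `@numbered body` builds a lambda whose arity is the
# highest slot index that occurs in `body`.
macro numbered(body)
    slots = Dict{Int,Symbol}()
    new_body = number_slots!(slots, body)
    n = isempty(slots) ? 0 : maximum(keys(slots))
    # Positions that are never mentioned still get a (unused) parameter.
    params = [get(slots, i, gensym("unused")) for i in 1:n]
    return esc(Expr(:->, Expr(:tuple, params...), new_body))
end

xs = [3 + 1im, 1 + 2im, 2 + 0im]
by_real = sort(xs, lt = @numbered(_1.re < _2.re))   # (x, y) -> x.re < y.re
key = @numbered _1.re + _1.im / 100                 # x -> x.re + x.im / 100
```

Here `_1` may appear any number of times, which is the re-use case described above; whether a bare `_` should alias `_1` is exactly the open question in this thread and is left out of the sketch.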
|
You can continue to use In Mathematica's and Clojure So @mcabbott's first example works as-is and the second example can also be written as sort(xs, lt= -> _.re < _2.re) # (x,y) -> x.re < y.re |
to be honest I'm absolutely against making the single underscore refer to the same argument. Because we'd lose the entire convenience for the multi argument cases only to save 1(!!) character in rare situations... |
the numbering approach won't be possible until 2.0 anyway since _2 etc are currently valid identifiers and thus that would be breaking AFAIU |
I think it wouldn't be breaking if Good points re: just using an ordinary lambda. I'm curious what fraction of anonymous functions would be made shorter by the "headless" type. It'd be hard to measure accurately, since anonymous functions are used a lot in interactive non-package code. |
while certainly true for most characters, the underscore already has the meaning of |
I'm in agreement with @rapus95 here: let's stick to the simple win with |
Oh, one more thing to consider: interaction with
collection |> -> map(lowercase, _) |> -> filter(in(words), _)
We could either introduce a new pipe syntax as a shorthand for this, or say that
collection |> map(lowercase, _) |> filter(in(words), _)
This does allow using another headless lambda for the filter/map operation, like this for example:
collection |> map(-> _ ^ 7, _) |> filter(-> _ % 3 == 0, _)
Might want to do something similar with |
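For comparison, this is roughly what the first pipeline looks like in current Julia with ordinary lambdas. It is a sketch with made-up data (`collection` and `words` are illustrative values, not taken from the thread); the parentheses around each lambda are only there to make the grouping explicit.

```julia
# Illustrative data (assumed for this example).
collection = ["Apple", "BANANA", "Cherry"]
words = ["apple", "cherry"]

# Each stage is a one-argument function, so it can be used with |> directly.
result = collection |>
    (xs -> map(lowercase, xs)) |>
    (xs -> filter(in(words), xs))
```

The headless form would drop the `xs` names and the parentheses, which is where most of the saving in the piped case comes from.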
I love that idea tbh because it would give us some part of #24990 for "free". The only thing that holds me back is that, in that case, applying the same rule to \circ probably makes sense as well and then it starts to feel like arbitrary special casing again... Btw, since the headless approach is currently primarily developed around multiple argument cases, it'd be very nice if we had a syntactical solution for splatting. Otherwise cases like EDIT: could we use |
Could syntax like |
I think argument 1 could be 1 underscore ( |
@bramtayl having every single underscore reference the same single argument doesn't scale well for the headless syntax. Read #38713 (comment) for why. |
Hmm, needing to use the same argument in multiple places happens all the time in querying though, I think. Consider processing a row of a table: you might need to reference several fields of the row. |
how many characters more would you need if you switch to using an ordinary lambda with a single letter variable name instead? (remember that you need the -> in any case) |
The extra |
a) but it doesn't scale into more different arguments |
It seems to me likely that wanting to make a two argument anonymous function will be much less common than wanting to reference an argument more than once |
@bramtayl do the maths yourself. It scales very badly (i.e. negatively) if you intend on using any argument other than the first more than once, or if you intend to use more than 3 arguments. _i for i being a single digit number would still be a better proposal for that case, both for scaling in number of uses per argument and number of arguments. But that can live in its own issue since it is orthogonal to the current proposal |
|
just include a lot of code samples which use reduce and similar functions. If you only include code that maps data it would be an unfair comparison. But this already shows what I'm talking about. We don't want a domain specific syntax feature in the general purpose language. And the queryverse definitely is domain specific. And it already has a macro for that exact case. Which doesn't seem like it made it outside of that domain. |
I think more fundamentally, the principle behind
(_, r) = divrem(1,2)
foo(_) = 3
foo(nothing)
struct Foo
    _::Int64
end
Foo(3)._
Note this usage is a counterexample to It seems (at least) roughly consistent with this principle that The use of
I agree that it would be more useful if each |
Hmm, well, a quick audit of non-single-argument-0-or-1-mention uses of
Multiple arguments
Reuses an argument
|
@goretkin that case is perfectly handled by interpreting it as 0-d and using @bramtayl would you be willing to translate these cases into both (or even better all 3 variants) i. e. |
Thanks for gathering these, @bramtayl. In the "multiple arguments" list, it looks like 12/28 don't follow the simple pattern of using every argument, exactly once, in order, and not as a type parameter. 2 of those simply drop trailing arguments,
Drop last:
simple_walk(compact, lifted_val, (pi, idx)->true)
(mod, t) -> (print(rpad(string(mod) * " ", $maxlen + 3, "─"));
Drop first or middle:
retry(http_get, check=(s,e)->e.status == "503")(url)
retry(read, check=(s,e)->isa(e, IOError))(io, 128; all=false)
(io, linestart, idx) -> (print(io, idx > 0 ? lpad(cst[idx], nd+1)
Shuffle:
sum(map((i, s, o)->s*(i-o), J, strides(x), Tuple(first(CartesianIndices(x)))))*elsize(x)
Re-use:
afoldl((ys, x) -> f(x) ? (ys..., x) : ys, (), xs...)
foldr((v, a) -> prepend!(a, v), iter, init=a)
(io::IO, indent::String, idx::Int) -> printer(io, indent, idx > 0 ? code.codelocs[idx] : typemin(Int32))
Type parameters:
dict_with_eltype((K, V) -> Dict{K, V}, kv, eltype(kv))
dict_with_eltype((K, V) -> IdDict{K, V}, kv, eltype(kv))
Base.dict_with_eltype((K, V) -> WeakKeyDict{K, V}, kv, eltype(kv))
Simple cases (every parameter used exactly once, in order) [Edit -- now with some brackets removed]:
(r,args) -> (r.x = f(args...))
(i,args) -> (itr.results[i]=itr.f(args...))
((p, q) -> p | ~q )
((p, q) -> ~p | q )
((p, q) -> ~xor(p, q))
((p, q) -> ~p & q)
((p, q) -> p & ~q)
map((rng, offset)->rng .+ offset, I.indices, Tuple(j))
foldl((x1,x2)->:($x1 || ($expr == $x2)), values[2:end]; init=:($expr == $(values[1])))
CartesianIndices(map((i,j) -> i:j, Tuple(I), Tuple(J)))
CartesianIndices(map((i,s,j) -> i:s:j, Tuple(I), Tuple(S), Tuple(J)))
map((isrc, idest)->first(isrc)-first(idest), indssrc, indsdest)
(x,y)->isless(x[2],y[2])
(x, y) -> lt(by(x), by(y))
(f, x) -> f(x)
(f, x) -> wait(Threads.@spawn f(x))
...
which could become (with each
-> (_.x = f(_...))
-> (itr.results[_]=itr.f(_...))
(-> _ | ~_ )
(-> ~_ | _ )
(-> ~xor(_, _))
(-> ~_ & _ )
(-> _ & ~_ )
map(->_ .+ _, I.indices, Tuple(j))
foldl(->:($_ || ($expr == $_)), values[2:end]; init=:($expr == $(values[1])))
CartesianIndices(map(-> _:_, Tuple(I), Tuple(J)))
CartesianIndices(map(-> _:_:_, Tuple(I), Tuple(S), Tuple(J)))
map(->first(_)-first(_), indssrc, indsdest)
->isless(_[2],_[2])
-> lt(by(_), by(_))
-> _(_)
-> wait(Threads.@spawn _(_))
Interesting how few from either list above would be clearer (IMO) without naming variables -- most are quite long & complicated. So another possibility to consider is that this headless I'm not sure that counting characters saved is a great measure, as the cases where you could save the most letters also seem like the ones complicated enough that you ought to be explicit. [Nor is counting how many cases in Base, really.] But using
|
As it seems I'm missing core aspects on what makes the proposal difficult to understand for the user. Maybe some can help me get it. I'm assuming an unbiased user that didn't follow all the different ideas of how these lambdas could work and thus won't get lost in all the different possible meanings that were suggested in the past, but instead only gets to know this through the following description: Added a new syntax for lambdas for which the argument list is skipped. It is tailored to different situations:
further examples:
|
From my perspective it is not that the proposal is difficult to understand per se. Julia is a powerful language, which comes with a certain amount of complexity, and users manage that just fine. I think the key issue is the gain in function vs the added complexity, and tradeoffs between various alternatives (the multislot and the single argument versions are of course mutually exclusive). Also, I think that a (Incidentally, I find it confusing to switch syntaxes in the middle of a proposal like this.) |
@aplavin no, if you look at the code you quoted they are suggesting that
@rapus95 I have no problem personally, with adding more handy syntax because I've already learned our current syntax, so this is just a small bite sized addition for me to learn. However, that's not the case for everyone, specifically new users. I think it's important to not consider new syntax in isolation, but to consider the entire pile of special syntax we already have in addition to the proposed new syntax. We should think about this from the perspective of beginners learning the language, not from the perspective of experienced users. Julia's syntax is already quite complicated, and adding new syntax rules will make learning our syntax even harder for new users. The more special syntax we have, the less willing we should be to add more special syntax on top of it. |
Actually, I just realized that we don't really need to solve #36547, and we can actually just replace this:
using MacroTools
@eval macro $(:_)(ex)
@gensym x
if ex isa Expr && ex.head == :tuple
pre_body, rest... = ex.args
else
pre_body = ex
rest = ()
end
body = MacroTools.postwalk(pre_body) do ex
ex == :_ ? x : ex
end
λ = :($x -> $body)
if length(rest) == 0
esc(λ)
else
esc(:(($λ, $(rest...))...))
end
end
Behold:
julia> map(@_ _[1], [[1,2,3], [4,5,6]])
2-element Vector{Int64}:
 1
 4
No parser changes required. |
Macros kind of work "outside in" but parsing kind of works "inside out". In this case, if this was done by the parser, |
Parsing also definitely works "outside in", and has to take the same care that a macro would have to take to attach |
Hmm, maybe I meant symbol resolution works inside out? |
This macro would simply work by macroexpanding any macros it finds inside itself. It'd be the same as the proposed syntax here unless I'm missing something. |
💯 I would go farther though. Less noise is better for everybody, not just somebody in their first week of learning Julia. |
Ok, but what if someone else writes a new macro that uses |
Someone could also quite easily write a macro that doesn't play well with this PR in the same way. |
Okay, I've made https://github.com/MasonProtter/SimpleUnderscores.jl, @bramtayl or anyone else interested in this syntax please feel free to poke around with it and see if it fails in any obvious ways. |
Whew, I missed that there's no dot between |
I personally don't like that the idea is to dedicate a syntax to replace x with an underscore. Because that's all, that your proposal could do. IMO there should be more benefit in this. And I'd like to have something that's compatible with DataFrames.jl and the higher order functions (especially able to create binary functions) |
I'm not sure I understand how the proposed Although I really would like to have concise lambdas, I find that anything that is not 100% obvious and transparent (why should two visually identical EDIT: Mmm, perhaps I indeed misunderstood, and |
My understanding is that it is the same as
But that isn't:
julia> let x = 1
           x -> x,x
       end
(var"#5#6"(), 1) |
I'll keep producing ideas, hoping that one day there'll be one that serves most of us.
->_+_ == x->x+x
1->_+_ == error
2->_+_ == (x, y)->x+y
3->_+_ == (x, y, z)->x+y
Then I'd have my binaries by prepending a 2 and also something that works with DataFrames.jl while you still have your syntax available |
But you proposed to replace I don't think this kind of syntax is necessary at all in Julia, agree with @tpapp and others above in that. If this macro behavior is indeed officially supported, it may become popularized and see more usage in packages.
I don't propose to add
DataFrames often choose unique special syntax that has no equivalent elsewhere in Julia. Design of that and other packages isn't set in stone, and can be influenced by Julia changes. For example, at some point they may decide to replace In higher-order data processing functions with regular Julian interface, "underscore = the only argument" is most often convenient. |
My problem with this family of proposals is the meandering brainstorming that they degenerate to. Discussion is fine, and people should of course comment if they want to make a point, but if that leads to major changes a new issue should be opened IMO. Reading a comment stream which has various proposals floating around without a resolution is very confusing, especially in a discussion that goes on for years. |
yes, as a complementary proposal to extend the original one for better synergy. Not as the only feature |
I think this syntax should be a macro for the following reasons:
|
I've had a somewhat similar macro in LightQuery for a few years and it didn't seem to catch on (I haven't really been maintaining it). It uses double underscore instead of |
Ah yes, I had forgotten about the one in LightQuery.jl. I think so far as I recall, people didn't like that you had to write map(@_(1 + _), v) instead of map(@_ 1 + _, v) since |
|
Indeed, not having to put this in the core language at this point would allow fuller exploration of this syntax without the usual constraints of having stuff in Julia. So this would be a great advantage. Thanks for making a package! |
This PR adds support for parsing `.a` as `x->x.a`. This kind of thing has come up multiple times in the past, but I'm currently finding myself doing a lot of work on nested structs where this operation is very common. In general, we've had the position that this kind of thing should be a special case of the short-currying syntax (e.g. #38713), but I actually think that might be a false constraint. In particular, `.a` is a bit of a worst case for the curry syntax. If there is no requirement for `.a` to be excessively short in an eventual underscore curry syntax, I think that could open more options.

That said, any syntax proposal of course needs to stand on its own, so let me motivate the cases where I think this plays:

A. Curried getfield

I think this is probably the most obvious and often requested. The syntax here is very useful for situations where higher order functions operate on collections of records:
1. `map(.a, vec)` and reductions for getting the fields of an object - also includes things like `sum(.price, items)`
2. Predicates like `sort(vecs, by=.x)` or `filter(!.deleted, entries)`
3. In pipelines `vecs |> .x |> sqrt |> sum`

I think that's mostly what people are thinking of, but the use case for this syntax is more general.

B. A syntax for lenses

Packages like Accessors.jl provide lens-like abstractions. Currently these are written as `lens = @optic _.a`. An example use of Accessors.jl is (from their documentation)
```
julia> modify(lowercase, (;a="AA", b="BB"), @optic _.a)
T("aa", "BB")
```
This PR can be thought of as providing lenses first class syntax, as in:
```
julia> modify(lowercase, (;a="AA", b="BB"), .a)
T("aa", "BB")
```

C. Symbol index generalization to hierarchical structures

We have a lot of packages in the ecosystem that support named axes of various forms (Canonical examples might be DataFrames and NamedArrays, but there's probably two dozen of these). Generally the way that this syntax works is that people use quoted symbols for indexing:
```
df[5, :col]
```
However, this breaks down when there is hierarchical composition involved. For example, for simulation models, you often build parameter sets and solutions out of hierarchies of simpler models. There's a couple of solutions that people have come up with for this problem:
1. Some packages parse out hierarchy from symbol names: `sol[:var"my.nested.hierachy.state"]`
2. Other packages have a global root object: `sol[○.my.nested.hierarchy.state]`
2a. A variant of this is using the object as its own root `sol[sol.my.nested.hierarchy.state]`
2b. Yet another variant is having the root object be context specific `sol[sys.my.nested.hierarchy.state]`
3. Yet other packages put symbolic names into the global namespaces `sol[my.nested.hierarchy.state]`

These solutions are all lacking. 1 requires string manipulation for composition, the various variants of 2 are ok, but there is no agreement among packages what the root object looks like or is spelled, and even so, it's an extra export and 3 pollutes the global namespaces. By using the same mechanism here, we essentially standardize the solution `2`, but make the root object implicit.
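The curried-getfield use cases in part A can be approximated in current Julia with a small closure-returning helper. This is a sketch only: `field` is a hypothetical name chosen for illustration, not something provided by Base or by the PR, and the data is made up.

```julia
# Hypothetical helper: `field(:a)` plays the role that `.a` would play
# under the proposed syntax, i.e. it returns `x -> x.a`.
field(name::Symbol) = x -> getproperty(x, name)

# Illustrative records (assumed for this example).
items = [(name = "pen", price = 2), (name = "ink", price = 5)]

names  = map(field(:name), items)        # proposed: map(.name, items)
total  = sum(field(:price), items)       # proposed: sum(.price, items)
sorted = sort(items, by = field(:price)) # proposed: sort(items, by=.price)
```

The helper covers the `map`, `sum`, and `by=` cases, but it does not give the lens or hierarchical-index behavior described in parts B and C.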
Since #24990 stalls on the question of what the right amount of tight capturing is
Idea
I want to propose a headless -> variant which has the same scoping mechanics as (args...)-> but automatically collects all not-yet-captured underscores into an argument list. EDIT: Nesting will follow the same rules as variable shadowing, that is, the underscore binds to the tightest headless -> it can find.
lfold((x,y)->x+2y, A)
lfold(->_+2_,A)
lfold((x,y)->sin(x)-cos(y), A)
lfold(->sin(_)-cos(_), A)
map(x->5x+2, A)
map(->5_+2,A)
map(x->f(x.a), A)
map(->f(_.a),A)
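The left-to-right collection rule can be emulated with a macro in current Julia. This is a minimal sketch under stated assumptions: `@headless` and `fill_slots!` are hypothetical names, each `_` becomes a new positional argument, and the "tightest headless ->" nesting rule is not implemented (an outer `@headless` would capture every underscore inside it).

```julia
# Hypothetical helper: replace every `_` in `ex` by a fresh symbol and
# record those symbols in `args`, in left-to-right order.
function fill_slots!(args::Vector{Symbol}, ex)
    if ex === :_
        s = gensym("slot")
        push!(args, s)
        return s
    elseif ex isa Expr
        return Expr(ex.head, map(a -> fill_slots!(args, a), ex.args)...)
    else
        return ex
    end
end

# Hypothetical macro standing in for the headless `->`.
macro headless(body)
    args = Symbol[]
    new_body = fill_slots!(args, body)
    return esc(Expr(:->, Expr(:tuple, args...), new_body))
end

A = [1, 2, 3]
weighted = foldl(@headless(_ + 2 * _), A)   # like foldl((x, y) -> x + 2y, A)
scaled   = map(@headless(5 * _ + 2), A)     # like map(x -> 5x + 2, A)
```

The sketch uses `foldl` from Base and writes the multiplications explicitly; it is meant only to show how underscores map to argument positions, not how the parser change would be implemented.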
Advantage(s)
In small anonymous functions underscores as variables can increase the readability since they stand out a lot more than ordinary letters. For multiple argument cases like anonymous functions for reduce/lfold it can even save a decent amount of characters.
Overall it reads very intuitively as start here and whatever arguments you get, just drop them into the slots from left to right
Sure, some more complex options like reordering ((x,y)->(y,x)), ellipsing ((x...)->x) and probably some other cases won't be possible but if everything would be possible in the headless variant we wouldn't have introduced the head in the first place.
Feasibility
-> and an _ as the right hand side (value side) error on 1.5 so that shouldn't be breaking.
2a) switch between both variants mentally
2b) reuse most of the current parser code and just extend it to collect/replace underscores
Compatibility with #24990
It shouldn't clash with the result of #24990 because that focuses more on very tight argument cases. And even if you are in a situation where the headless -> consumes an underscore from #24990 unintentionally, it's enough to just put 2 more characters (->) in the right place to make that underscore once again standalone.
Further Explorations
This proposal can optionally be combined with #53946.
Additionally, the following links to comments further down explore different ideas to stretch into, all adding their own value to different parts of the ecosystem.
Alternative explorations: #38713 (comment) #38713 (comment)