Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow slurping in any position #42902

Merged
merged 19 commits into from
Apr 8, 2022
Merged

allow slurping in any position #42902

merged 19 commits into from
Apr 8, 2022

Conversation

simeonschaub
Copy link
Member

@simeonschaub simeonschaub commented Nov 2, 2021

This extends the current slurping syntax by allowing the slurping to not
only occur at the end, but anywhere on the lhs. This allows syntax like
a, b..., c = x to work as expected.

The feature is implemented using a new function called split_rest
(definitely open to better names), which takes as arguments the
iterator, the number of trailing variables at the end as a Val and
possibly a previous iteration state. It then spits out a vector
containing all slurped arguments and a tuple with the n values that get
assigned to the rest of the variables. The plan would be to customize
this for different finite collection, so that the first argument won't
always be a vector, but that has not been implemented yet.

split_rest differs from rest of course in that it always needs to be
eager, since the trailing values need to be known immediately. This is
why the slurped part has to be a vector for most iterables, instead of a
lazy iterator as is the case for rest.

Mainly opening this to get some feedback on the proposed API here.

Closes #43413

@simeonschaub simeonschaub added feature Indicates new feature / enhancement requests compiler:lowering Syntax lowering (compiler front end, 2nd stage) labels Nov 2, 2021
@simeonschaub
Copy link
Member Author

Ok, I think this should be mostly working now. Before I add some tests, add some more overloads and finish this off, I would like some feedback on the proposed API here from triage:

  • Is this the behavior people would expect, i.e. always return a vector in the fallback case? (I do still mean to add some overloads for the more common collections.)
  • Is the use of Val here warranted? I didn't trust constant propagation quite enough to achieve type stability in more complex cases, but I know it has improved significantly lately, so maybe I'm just too pessimistic. Since the number of variables should always be static, I don't believe this use of Val is too problematic though.
  • I did say above that the second return value of split_rest would always be a tuple, but the current lowering implementation does not actually assume that, so it could be any iterator. In fact, maybe the generic fallback should not return a tuple but array instead? That would always result in a small additional allocation, but is probably preferable for iterators with non-concrete element types.
  • The current implementation tries to be smart in cases like a, b..., c = 1, 2, 3, 4 and treats it as a = 1; b..., = 2, 3; c = 4. This does mean it can be quite difficult to tell semantically whether destructuring is using rest, split_rest or neither like in this example. Since this only affects tuples for which those all should behave exactly the same, I don't really expect this to be an issue though,
  • What do people think of the name? I'm certainly not set on it.

@simeonschaub simeonschaub added the triage This should be discussed on a triage call label Nov 4, 2021
@Seelengrab
Copy link
Contributor

What do people think of the name? I'm certainly not set on it.

Maybe middle_slurp, since this allows for slurping in the middle of the LHS? split_rest sounds like splitting the rest of the iterator and to me doesn't really make it clear what it's for. Another idea: split_slurp_rest, to make it clear that it's the rest of the slurp that gets split.

The current implementation tries to be smart in cases like

Can this be smart for things that return a (Abstract)Vector on the RHS as well? I suspect this convenient syntax would be used for getting the first & last element of a vector at the same time, which (as I understand it) would be costly with the general iterator fallback.


Other than that, this looks nice!

@timholy
Copy link
Member

timholy commented Nov 4, 2021

I do like this. I suppose it's predictable the next request will be to support a, b::Symbol..., c::Int... = itr, so it might be worth thinking a bit about where this kind of functionality will require limits.

@simeonschaub
Copy link
Member Author

Can this be smart for things that return a (Abstract)Vector on the RHS as well? I suspect this convenient syntax would be used for getting the first & last element of a vector at the same time, which (as I understand it) would be costly with the general iterator fallback.

No, array literals do promotion, so we can't do that in this case. We can definitely add a specialized version of split_rest for arrays, although I don't think the current version will perform that badly.

I suppose it's predictable the next request will be to support a, b::Symbol..., c::Int... = itr, so it might be worth thinking a bit about where this kind of functionality will require limits.

I'm afraid it might already be a little late for that since we do already support type annotations for slurping in the last argument, where it annotates the type of the slurped iterator itself though, not the element type. In this PR, we get the same behavior since lowering always works recursively.

@Seelengrab
Copy link
Contributor

No, array literals do promotion, so we can't do that in this case. We can definitely add a specialized version of split_rest for arrays, although I don't think the current version will perform that badly.

Ah, I didn't even think about array literals, only about arrays returned from functions! That should still be a O(1) case instead of O(n) in theory, right? The question is to view or not to view the slurped part - I think the former would be better, to keep allocations low for people wanting to use the higher abstraction.

@simeonschaub
Copy link
Member Author

That should still be a O(1) case instead of O(n) in theory, right?

Yeah, probably.

The question is to view or not to view the slurped part

We already decided against using views in #37410, so we probably don't want to rehash that discussion.

@Seelengrab
Copy link
Contributor

Seelengrab commented Nov 4, 2021

We already decided against using views in #37410, so we probably don't want to rehash that discussion.

I may be reading the PR wrong or misunderstanding you, but I was thinking that e.g.

a, slurp..., b = f() # f returns a Vector{Int}

could result in Tuple{Int, SubArray{...}, Int}, which would destructure to the a, slurp, b part (the size is known statically after all, just like the indices required for the view in the slurped part). If I'm misunderstanding and this was exactly what was talked about, sorry for the noise!

@simeonschaub
Copy link
Member Author

could result in Tuple{Int, SubArray{...}, Int}, which would destructure to the a, slurp, b part

That tuple is never created, but semantically you can probably think about it this way. What I'm saying is that slurp should never just be a view as in this case, since it would be non-obvious that
mutating slurp can have undesired side effects. We discussed that on the triage call I believe for #37410.

@StefanKarpinski
Copy link
Member

Triage approves of the concept. @JeffBezanson would like to review the implementation.

This extends the current slurping syntax by allowing the slurping to not
only occur at the end, but anywhere on the lhs. This allows syntax like
`a, b..., c = x` to work as expected.

The feature is implemented using a new function called `split_rest`
(definitely open to better names), which takes as arguments the
iterator, the number of trailing variables at the end as a `Val` and
possibly a previous iteration state. It then spits out a vector
containing all slurped arguments and a tuple with the n values that get
assigned to the rest of the variables. The plan would be to customize
this for different finite collection, so that the first argument won't
always be a vector, but that has not been implemented yet.

`split_rest` differs from `rest` of course in that it always needs to be
eager, since the trailing values need to be known immediately. This is
why the slurped part has to be a vector for most iterables, instead of a
lazy iterator as is the case for `rest`.

Mainly opening this to get some feedback on the proposed API here.
base/array.jl Outdated Show resolved Hide resolved
@simeonschaub simeonschaub changed the title RFC: allow slurping in any position allow slurping in any position Jan 11, 2022
@simeonschaub simeonschaub added needs docs Documentation for this change is required needs news A NEWS entry is required for this change and removed triage This should be discussed on a triage call labels Jan 11, 2022
@simeonschaub simeonschaub marked this pull request as ready for review January 11, 2022 17:50
@simeonschaub simeonschaub added this to the 1.8 milestone Jan 11, 2022
@simeonschaub simeonschaub removed needs docs Documentation for this change is required needs news A NEWS entry is required for this change labels Jan 20, 2022
@simeonschaub
Copy link
Member Author

Bump

@vchuravy vchuravy removed this from the 1.8 milestone Feb 14, 2022
@JeffBezanson JeffBezanson self-assigned this Apr 6, 2022
base/strings/basic.jl Outdated Show resolved Hide resolved
@assert _check_length_split_rest(length(s), n)
end
last_n = SubString(s, nextind(s, i), lastind)
front = s[begin:i]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to use either indexing or SubString for both values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my reasoning was that the type of last_n doesn't really matter as long as it iterates correctly, so we might as well avoid the extra allocation. I guess we could make front a SubString as well, but that might be wasteful for small string if n is not much smaller than the string length, since we can't free the original string

@simeonschaub simeonschaub reopened this Apr 6, 2022
(list assigns)))
(n (length lhss-))
(st (gensy))
(end (list after))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, would be clearer to make this a mutable variable than a 1-element list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, it is also passed to destructure- 😭

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agree it's a little bit awkward, but couldn't think of something more elegant

@sprmnt21
Copy link

sprmnt21 commented Jul 20, 2022

I don't know if this is the right place to make the following remark / request, but this is where I saw (coming from here )that the new split_rest function has been defined.
I have had some (quite a few) occasions of wanting to split on a positions basis.
I think it would be useful in some cases to have a more general function, such as split_at_pos (itr, pos ...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:lowering Syntax lowering (compiler front end, 2nd stage) feature Indicates new feature / enhancement requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow ... in non-final assignment location
7 participants