Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow x.y .= z not to use getproperty(x, y) #36741

Closed
bkamins opened this issue Jul 20, 2020 · 19 comments · Fixed by #39473
Closed

Allow x.y .= z not to use getproperty(x, y) #36741

bkamins opened this issue Jul 20, 2020 · 19 comments · Fixed by #39473
Labels
broadcast Applying a function over a collection compiler:lowering Syntax lowering (compiler front end, 2nd stage)

Comments

@bkamins
Copy link
Member

bkamins commented Jul 20, 2020

When you write:

x[idx] .= z

it is processed as:

Base.materialize!(Base.dotview(x, idx), Base.broadcasted(Base.identity, z))

which allows packages to give a special meaning toBase.dotview(x, idx), which is important if they want to define types that are broadcasting aware (as it allows to specialize the behavior of dotview).

On the other hand:

x.field .= z

is processed as:

Base.materialize!(Base.getproperty(x, :field), Base.broadcasted(Base.identity, z))

which does not allow to introduce a special broadcasting behavior in this case, as we call just getproperty (this is as if we called getindex instead of dotview in the first example).

The solution would be to use some intermediate function (name is tentative) with a defualt definition:

maybegetproperty(x, y) = getproperty(x, y)

that could get a special implementation in packages if needed.


A concrete use case is in DataFrames.jl, which allows:

df[:, :newcol] .= 1

to add a new column to the df DataFrame, while:

df.newcol .= 1

errors if :newcol does not exist in df.


@tkf you had a more more general idea what could be done here. Can you please comment on this?

@mbauman mbauman added the broadcast Applying a function over a collection label Jul 20, 2020
@mbauman
Copy link
Member

mbauman commented Jul 20, 2020

Interesting. The reason-for-being for dotview wasn't package extension, but rather the sometimes-scalar sometimes-nonscalar aspects of indexing. That's why getproperty wasn't similarly special-cased. That said, I fully appreciate the motivation here.

@stev47
Copy link
Contributor

stev47 commented Jul 21, 2020

You can already specialize Base.getproperty to return some custom object upon which you can dispatch Base.materialize!, why wouldn't that work in your example?

@bkamins
Copy link
Member Author

bkamins commented Jul 21, 2020

@stev47 - as @mbauman commented: broadcasting allows x[...] to have a context dependent meaning on LHS (because it was required even in Base, otherwise dotview would not be needed). On the other hand, it was not needed in Base for x.y to have a context dependent meaning, but in general it can be needed in packages (and it is in DataFrames.jl).

In particular - have a look at the example above. df.x should error if there is no :x column in data frame df. On the other hand, we want df.x .= 1 to work and create a new column :x filled with as many 1s as is the value of size(df, 1).

@stev47
Copy link
Contributor

stev47 commented Jul 21, 2020

I see, so the proposal is to allow a.x .= b to be different from c = a.x; c .= b.
One doesn't usually associate field access with context dependent broadcasting semantics, so the user might be confused by this (in the array case the syntax is about indexing and well-known for mutable access). Your example makes sense though if you absolutely want to avoid something more explicit like e.g. a macro @new df.x .= 1.

@bkamins
Copy link
Member Author

bkamins commented Jul 21, 2020

the user might be confused by this

This is what we thought as developers of DataFrames.jl, but in practice users report that they are confused that df.x .= 1 does not work, when :x is missing in a data frame (actually this is the question that is one of the most frequently asked). This is especially confusing since df.x = [1,2,3] works and is a normal method to add a column to a data frame using setindex!.

@mbauman mbauman added the triage This should be discussed on a triage call label Jul 21, 2020
@mbauman
Copy link
Member

mbauman commented Jul 21, 2020

This seems simple enough to me and should be entirely nonbreaking. I think it's worth doing.

@CameronBieganek
Copy link
Contributor

I can't say I've mastered the ins and outs of indexing in DataFrames.jl, but this seems too magical to me. How can you do in-place assignment to a column that doesn't exist yet? If anything, I would argue that the better way to resolve the mentioned inconsistency is to make df[:, :newcol] .= 1 throw an error.

@bkamins
Copy link
Member Author

bkamins commented Jul 22, 2020

See JuliaData/DataFrames.jl#1961 for a long discussion on this. Initially the design disallowed it. The main reason for allowing it is that people find it natural in generic code to write df[:, col_name_as_variable] .= 1 and expect it to work without error without having to check if a column to with col_name_as_variable points exists or not.

In general, the main reason for request for df.newcol .= 1 is not to "resolve inconsistency", but rather that many people find pattern df.newcol .= 1 the most natural to use (even without thinking about df[:, :newcol] .= 1).

@JeffBezanson
Copy link
Member

From triage: we might want this to return a lazy object, where the materialize! method for that object creates the field value if it doesn't exist, does the broadcast, and then finally assigns the field. This lazy object would essentially be a "lens", and perhaps @tkf has an opinion.

@JeffBezanson
Copy link
Member

Triage is 👍 on the general idea.

@JeffBezanson JeffBezanson removed the triage This should be discussed on a triage call label Jul 23, 2020
@bkamins
Copy link
Member Author

bkamins commented Jul 23, 2020

Thank you!

we might want this to return a lazy object, where the materialize! method for that object creates the field value if it doesn't exist, does the broadcast, and then finally assigns the field.

This is exactly how DataFrames.jl works now.

The decision is if this should be the last call in a chain that is lazy or some more general mechanism - as this is what @tkf proposed.

@oxinabox
Copy link
Contributor

oxinabox commented Jan 30, 2021

Is the proposal that there would be a another function introduced that is applied to anything that would be the first argument to materialize!?

So right now we have:

julia> Meta.lower(Main, :( x.y .= 2 .* 3 ))
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope'
1 ─ %1 = Base.getproperty(x, :y)
│   %2 = Base.broadcasted(*, 2, 3)
│   %3 = Base.materialize!(%1, %2)
└──      return %3
))))

Would we instead have a function, calling it prepare_dest for now,
which would show up as:

julia> Meta.lower(Main, :( x.y .= 2 .* 3 ))
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope'
1 ─ %1 = Base.prepare_dest(Base.getproperty, x, :y))
|
│   %2 = Base.broadcasted(*, 2, 3)
│   %3 = Base.materialize!(%1, %2)
└──      return %3
))))

With the default implementationm

Base.prepare_dest(f, args...; kwargs...) = f(args...; kwargs...)

And Base would define

function Base.prepare_dest(::typeof(getindex), x, inds...)
    return Base.dotview(x, inds...)
end

and would delete the dotview code from the julia-syntax.scm

And DataFrames would define:
https://github.com/JuliaData/DataFrames.jl/blob/6d1347983b67e93d50df847d2d8546a4c9a8cfd4/src/other/broadcasting.jl#L111-L119

function Base.prepare_dest(::typeof(getproperty), df::DataFrame, name)
    return Base.dotview(df, !, name)  # returns a LazyNewColDataFrame
end

And people who wanted to do other nightmarish things would define things like
Base.prepare_dest(::typeof(+), x::Foo, y::Int)

@bkamins
Copy link
Member Author

bkamins commented Jan 31, 2021

My idea was the following. Currently we have:

julia> Meta.lower(Main, :(x[1:2] .= 2))
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope'
1 ─ %1 = 1:2
│   %2 = Base.dotview(x, %1)
│   %3 = Base.broadcasted(Base.identity, 2)
│   %4 = Base.materialize!(%2, %3)
└──      return %4
))))

but:

julia> Meta.lower(Main, :(x.y .= 2))
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope'
1 ─ %1 = Base.getproperty(x, :y)
│   %2 = Base.broadcasted(Base.identity, 2)
│   %3 = Base.materialize!(%1, %2)
└──      return %3
))))

and I was thinking that the simplest thing to do is to lower it to:

julia> Meta.lower(Main, :(x.y .= 2))
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope'
1 ─ %1 = Base.dotgetproperty(x, :y)
│   %2 = Base.broadcasted(Base.identity, 2)
│   %3 = Base.materialize!(%1, %2)
└──      return %3
))))

And provide the following definition in Base:

dotgetproperty(x, y) = getproperty(x, y)

while DataFrames.jl would define:

Base.dotgetproperty(x::AbstractDataFrame, y::Symbol) = Base.dotview(x, !, y)
Base.dotgetproperty(x::AbstractDataFrame, y::AbstractString) = Base.dotview(x, !, y)

I was hesitating to implement it myself as I think it needs a change in

(nuref `(call (top dotview) ,(caddr refex) ,@(cdddr refex))))
function you mention (a second path is needed for getproperty similar to the one we have now for references), and I do not have experience in changing the parser to try doing this.

@simeonschaub
Copy link
Member

This should not be too hard to implement, but since getproperty often critically relies on constant propagation, I am wondering whether it would perhaps be beneficial for dotgetproperty to return a function instead (by default just getproperty) and use that to get the destination for broadcasting. This would make it a bit inconsistent with dotview though and I am not sure how much of a difference this would actually make in real use cases.

@simeonschaub simeonschaub added the compiler:lowering Syntax lowering (compiler front end, 2nd stage) label Jan 31, 2021
@bkamins
Copy link
Member Author

bkamins commented Jan 31, 2021

How would adding dotgetproperty affect getproperty uses when constant propagation matters?

@simeonschaub
Copy link
Member

If we lower x.y .= 2 to call dotgetproperty(x, :y) instead of getproperty(x, :y) that means that the constant :y needs to get propagated through another level of function call. If constant prop doesn't happen and x has properties of different types, this will cause type instabilities. (This of course doesn't have any affect on other uses of getproperty though.)

To be clear, what I am proposing is lowering to:

julia> Meta.lower(Main, :(x.y .= 2))
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope'
1 ─ %1 = Base.dotgetproperty(x)
│   %2 = (%1)(x, :y)
│   %3 = Base.broadcasted(Base.identity, 2)
│   %4 = Base.materialize!(%2, %3)
└──      return %4
))))

Base would then define dotgetproperty(::Any) = getproperty and DataFrames could just define Base.dotgetproperty(::AbstractDataFrame) = (x, y) -> Base.dotview(x, !, y)

@bkamins
Copy link
Member Author

bkamins commented Jan 31, 2021

What you propose is also OK. I would say it is even better as having only one parameter of dotgetproperty as you propose minimizes the risks of method ambiguities in the future (as various packages might support different types of y that are allowed).

However, but let me ask the following.

If constant prop doesn't happen

I thought that in this case we were guaranteed that constant propagation would happen. Also I would assume that getproperty would just get inlined if we define dotgetproperty(x, y) = getproperty(x, y) so this should be transparent. But apparently this is not guaranteed to be the case, so it would be interesting to learn why.

Thank you!

@simeonschaub
Copy link
Member

I thought that in this case we were guaranteed that constant propagation would happen.

Constant propagation is always purely an optimization, so we generally don't make guarantees where it will happen. In this example, I'd expect this to be inlined and :y to be propagated most of the time, but I wouldn't be surprised if the compiler bailed out earlier under some circumstances. Other people here are likely more qualified to comment whether these concerns are justified though.

@bkamins
Copy link
Member Author

bkamins commented Feb 1, 2021

Anyway - I agree that your approach is better 👍. The question is someone who knows femtolisp well enough to correctly implement it does have time to make this PR. Thank you!

simeonschaub added a commit that referenced this issue Feb 1, 2021
Don't know if `dotgetproperty` is a good name for this, since as discussed in #36741, the way this works is a bit different from `dotview`. Right now, `dotproperty` returns a function which is used instead of `getproperty`, but not sure if that is a bit strange and we should just make it behave like `dotview` instead.

fixes #36741
simeonschaub added a commit that referenced this issue Feb 1, 2021
Don't know if `dotgetproperty` is a good name for this, since as discussed in #36741, the way this works is a bit different from `dotview`. Right now, `dotproperty` returns a function which is used instead of `getproperty`, but not sure if that is a bit strange and we should just make it behave like `dotview` instead.

fixes #36741
simeonschaub added a commit that referenced this issue Mar 12, 2021
* make broadcasting into getproperty extensible

Don't know if `dotgetproperty` is a good name for this, since as discussed in #36741, the way this works is a bit different from `dotview`. Right now, `dotproperty` returns a function which is used instead of `getproperty`, but not sure if that is a bit strange and we should just make it behave like `dotview` instead.

fixes #36741

* make dotgetproperty call getproperty directly

Co-authored-by: Jameson Nash <[email protected]>
ElOceanografo pushed a commit to ElOceanografo/julia that referenced this issue May 4, 2021
* make broadcasting into getproperty extensible

Don't know if `dotgetproperty` is a good name for this, since as discussed in JuliaLang#36741, the way this works is a bit different from `dotview`. Right now, `dotproperty` returns a function which is used instead of `getproperty`, but not sure if that is a bit strange and we should just make it behave like `dotview` instead.

fixes JuliaLang#36741

* make dotgetproperty call getproperty directly

Co-authored-by: Jameson Nash <[email protected]>
antoine-levitt pushed a commit to antoine-levitt/julia that referenced this issue May 9, 2021
* make broadcasting into getproperty extensible

Don't know if `dotgetproperty` is a good name for this, since as discussed in JuliaLang#36741, the way this works is a bit different from `dotview`. Right now, `dotproperty` returns a function which is used instead of `getproperty`, but not sure if that is a bit strange and we should just make it behave like `dotview` instead.

fixes JuliaLang#36741

* make dotgetproperty call getproperty directly

Co-authored-by: Jameson Nash <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
broadcast Applying a function over a collection compiler:lowering Syntax lowering (compiler front end, 2nd stage)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants