Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

?= "get or set" operator #2108

Open
StefanKarpinski opened this issue Jan 22, 2013 · 21 comments
Open

?= "get or set" operator #2108

StefanKarpinski opened this issue Jan 22, 2013 · 21 comments
Labels
feature Indicates new feature / enhancement requests speculative Whether the change will be implemented is speculative

Comments

@StefanKarpinski
Copy link
Member

Related to #1764, I'd like to propose a ?= operator to do get-or-set operations. This is primarily useful for associative collections like Dicts. Example:

d[k] ?= f()

which translates to

has(d,k) ? d[k] : (d[k]=f())

except that it avoids hashing the key three separate times.

This is more properly called the []?= operator, since the indexing operation is part of the entire syntax. However, the ?= operator could be used for conditionally assigning to undefined variables:

let
  local x
  if randbool()
    x = 1
  end
  x ?= 0
  @show x
end

in which the x ? = 0 is like isdefined(:x) ? x : (x=0) but works in local or global scopes. This is of much more dubious utility than d[k] ?= f() but seems like a natural analogue.

@johnmyleswhite
Copy link
Member

This is kind of awesome. But I worry that we'll end up with Perl syntax soon since what I would really want is d[k] +?= f().

@StefanKarpinski
Copy link
Member Author

That's too much and better done via #1529 by having a d = Dict(+) and calling update(d,k,f()) or just d[k] += f().

@toivoh
Copy link
Contributor

toivoh commented Jan 22, 2013

Interesting idea, could be nice. More or less acting as dedicated syntax
for @get!, right?

@johnmyleswhite: By analogy, would +?= add the rhs if the key is absent
(doesn't make sense, of course), or just return the stored value otherwise?
I think that plain += might actually be a more suitable name for the
operation to set the value to zero if absent, and then add in the rhs.
(Which could be a nice thing to have, or dangerous. Perhaps it is
DefaultDicts that are missing...)

@StefanKarpinski
Copy link
Member Author

Having d[k] += v mean has(d,k) ? (d[k] = d[k]+v) : (d[k] = v) and such is actually not a terrible idea.

@johnmyleswhite
Copy link
Member

Stefan, you just beat me to the point. But I don't think we totally agree.

As Stefan says, I would want d[k] +?= f() to expand to a more efficient equivalent of has(d, k) ? d[k] += d[k] + f() : d[k] = f().

Now it is tempting to make += do that, but I think it goes too far. There are obviously efficiency issues, but my real concern is that I would like to sure a += b is as close to a = a + b as possible. Making one of them throw an error and the other work seems confusing.

In this case, I agree that DefaultDict's are what we really need. And I am starting to think that Pandas series type is pretty awesome. I still don't know whether it is related to Avi's idea, equivalent to Avi's idea or whether it subsumes it.

@HarlanH
Copy link
Contributor

HarlanH commented Jan 22, 2013

I've been planning to write a little DictCount type with methods inc, dec, set, and clear. This is related to that, although not the same.

I guess I'd vote for methods rather than special syntax here, for readability, although the conciseness is appealing...

@johnmyleswhite
Copy link
Member

Why use DictCount with inc rather than DefaultDict(0) with +?

@HarlanH
Copy link
Contributor

HarlanH commented Jan 22, 2013

Methods like inc!(x, "cat") feel more readable to me, but yeah, the clever DefaultDict approaches could work too.

@johnmyleswhite
Copy link
Member

Really? To me this is much more readable:

d = DefaultDict{K, Int}(0)
d[k] += 5

I suppose my preference really stems from the fact that all of the operators I already know work exactly the same way for this new type if I assume that d[k] is always initialized to 0. In some way, I see the DefaultDict as the Dict version of zeros and ones.

@StefanKarpinski
Copy link
Member Author

@HarlanH: the original proposal of d[k] ?= f() requires syntax because f() is only evaluated if d does not contain the key k; this can only be done with a macro, not a method. This is precisely what @toivoh's pull request implements. The would also likely be an underlying getorset!(d,k,v) method that associative types would implement so that, e.g. the hash of k could be computed only once, but it would not have the conditional evaluation semantics that make this syntax so useful for things like caching expensive computed values (which is precisely what led me to propose this in the first place).

@HarlanH
Copy link
Contributor

HarlanH commented Jan 22, 2013

Makes sense! Thanks, Stefan.

On Tue, Jan 22, 2013 at 5:14 PM, Stefan Karpinski
[email protected]:

@HarlanH https://github.com/HarlanH: the original proposal of d[k] ?=
f() requires syntax because f() is only evaluated if d does not contain
the key k; this can only be done with a macro, not a method. This is
precisely what @toivoh https://github.com/toivoh's pull request
implements. The would also likely be an underlying getorset!(d,k,v)method that associative types would implement so that, e.g. the hash of
k could be computed only once, but it would not have the conditional
evaluation semantics that make this syntax so useful for things like
caching expensive computed values (which is precisely what led me to
propose this in the first place).


Reply to this email directly or view it on GitHubhttps://github.com//issues/2108#issuecomment-12569779.

@RichMorin
Copy link
Contributor

Ruby's handy and popular ||= idiom is a bit like this, but there are subtleties:

http://www.rubyinside.com/what-rubys-double-pipe-or-equals-really-does-5488.html

@StefanKarpinski
Copy link
Member Author

Yep, that's the inspiration. That particular syntax (which Ruby in turn got from Perl) doesn't quite work for Julia, where the || operator is much stricter, but it's similar in spirit.

@dbeach24
Copy link
Contributor

[Parts of] this proposal also seems extremely similar to Python's dict.setdefault, except that in this instance, the default value is a function to be evaluated.

How would you feel about these methods (or alternatively, macros), for avoiding multiple lookups?

  1. setdefault!(dict, key, default)

    sets dict[key] = default() IFF key is absent; always returns dict[key]

  2. update!(dict, key, updater)

    updates dict[key] = updater(dict[key]), error if key is not present

  3. setorupdate!(dict, key, updater, default)

    sets dict[key] = updater(haskey(dict, key) ? dict[key] : default())

I believe the first two can be written purely as special cases of the third.

@dbeach24
Copy link
Contributor

After banging out some other attempts at a solution for this (all of which involved creating multiple reference types), I may have stumbled on something a bit more elegant.

This idea assumes that the major motivation for all of these combined test/get/set operations is that we want to avoid re-computing a (possibly expensive) hash function for the key type. To do this, we create a special key type which is "pre-hashed", as follows:

immutable PreHashed{K}
    key::K
    hashval::Int
end
prehash(key) = PreHashed{typeof(key)}(key, hash(key))
Base.hash(ph::PreHashed) = ph.hashval
Base.convert{K}(::Type{K}, ph::PreHashed{K}) = ph.key

The idea now is that, using this type, we can perform test/get/set without re-computing the key's hash, like so:

# define the pre-hashed key...
key = prehash("MySomewhatLongishStringKey")

# and now...
haskey(d, key) ? d[key] : (d[key] = generate_default_value())

In the above code, hash("MySomewhatLongishStringKey") is computed only once.

Of course for this solution to work as-is, you would need to define the key type of your dictionary to be PreHashed{K}. However, one could imagine updating the methods of Dict{K,V} to allow PreHashed{K} as a stand-in for the key type. (A mildly annoying problem with the current dict methods is that they eagerly convert the key to the dict's key type, before calling ht_keyindex, etc.)

This certainly doesn't get rid of all the overhead in repeated calls to ht_keyindex and ht_keyindex2, but it could substantially reduce it. Thoughts?

(As an aside, keeping the hash value for each key with a dictionary can be a nice optimization in general. When the dictionary is resized, the hash function does not need to be re-evaluated for the existing keys. Python uses this trick. However, there may need to be an explicit rehash operation which ignores the stored hash values, in case somebody used a mutable value for the key and truly needs to "re-hash" all of the keys.)

@ScottPJones
Copy link
Contributor

This of course is assuming that hashing is what you are using to implement these data structures.
Case in point, the multi-dimensional associative sparse arrays that I worked on / implemented took a set of keys, and produced a byte string, which was then used as the key in a persistent KV store, or in-memory p-trie, or hashed and used in a hash table.
What happens if the hashing is changed dynamically (for example, if the table grows)?

@carnaval
Copy link
Contributor

#12157

@ScottPJones
Copy link
Contributor

Yes, I'd seen #12157, and liked the proposal!

@x-ji
Copy link

x-ji commented Dec 16, 2018

To be clear there is currently no way to do it yet right? If I want to achieve this I'd have to first check whether the entry exists, and if not create it before returning it?

@chethega
Copy link
Contributor

help?> get!
  get!(f::Function, collection, key)

  Return the value stored for the given key, or if no mapping for the key is
  present, store key => f(), and return f().

  This is intended to be called using do block syntax:

  get!(dict, key) do
      # default value calculated here
      time()
  end

@x-ji
Copy link

x-ji commented Dec 16, 2018

Thanks

@brenhinkeller brenhinkeller added the feature Indicates new feature / enhancement requests label Nov 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Indicates new feature / enhancement requests speculative Whether the change will be implemented is speculative
Projects
None yet
Development

No branches or pull requests