Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sets.transform(Set<F>, Converter<F, T>) or the equivalent - WontFix: Use ImmutableSet.copyOf(Collections2.transform(...)) #219

Closed
gissuebot opened this issue Oct 31, 2014 · 32 comments
Labels
status=will-not-fix type=enhancement Make an existing feature better

Comments

@gissuebot
Copy link

Original issue created by butkovic on 2009-08-13 at 11:52 AM


I'd like to use method:
Sets.transform(Set<F> fromSet, Function<? super F,? extends T> function)

But there is no such method. I would expect it would behave the way similar
to one in List.transform()

thanks.

Peter B.

@gissuebot
Copy link
Author

Original comment posted by jim.andreou on 2009-08-13 at 11:59 AM


How would you implement Set<T>.contains(Object) in the resulting set? (Or if that set
was not lazy, you wouldn't need this method). Well, you would need the inverse function
T --> F as well (which would imply that such an inverse function exists). So it's
lost more complicated than that, and I'm not sure that's the only issue.

@gissuebot
Copy link
Author

Original comment posted by butkovic on 2009-08-13 at 12:11 PM


I didn't notice it's implemented lazy for list, sorry, only now I checked the sources.
OK, that's probably too much complicated then.

@gissuebot
Copy link
Author

Original comment posted by jared.l.levy on 2009-08-13 at 01:57 PM


Sets.transform isn't feasible. The problem is that the function could map two distinct
elements in fromSet to the same value, which would violate the requirement that all
elements in the returned set must be unique.

Instead, you can pass a set to Collections2.transform(). If you need a set at output,
copy the collection generated by Collections2.transform() into a set.


Status: WontFix

@gissuebot
Copy link
Author

Original comment posted by jim.andreou on 2009-08-13 at 02:13 PM


To complement Jared response, he describes a situation where there wouldn't exist an
inverse function.

But we should note that if there was a "Bijection" interface, which defined both
directions of a function, such a transform method would be easily defined in terms of
that.

@gissuebot
Copy link
Author

gissuebot commented Oct 31, 2014

Original comment posted by joe.kearney%[email protected] on 2009-08-13 at 02:29 PM


I have a use case for this, where I'm trying to implement a Map. I want the keySet to
be a view over the EntrySet, not just a copy. In this (admittedly limited) scenario
we have the advantage of being able to guarantee that the transformed elements will
indeed be unique. The function here would simply be to pull out the key from the
entry. Of course, as Jim Andreou points out above, you'd need a little more
cleverness for contains(Object), remove etc, but these could be achieved by a
function that boils down to key->underlyingMap.getEntry(key).

The simplest solution is to extend AbstractSet and implement iterator() using
Iterators.transform, but then most operations degenerate to a linear trawl, and I
can't easily delegate back to the entrySet/underlying map to do the work.

What would you recommend to implement this? Just a fuller implementation of
AbstractSet? Would there be any scope for requiring the inverse function to be given?
You'd shift the burden for guaranteeing correctness to the implementation of the
functions, but would only need it to be valid over the domain of the contents of the set.

Set<F> transform(Set<F> fromSet, Function<? super F,? extends T> function, Function<?
extends T,? super F> inverse) // generics would need more thought

I realise that this would be a rather complex addition at this stage, but I'd be
interested to hear others' thoughts. My point is that though this won't work in
general, there is a class of cases where you can provide the required guarantees such
that it could be well-defined and indeed useful.

Thanks,
Joe

@gissuebot
Copy link
Author

Original comment posted by [email protected] on 2009-08-13 at 03:12 PM


Joe, what features do you need that AbstractMap.keySet() doesn't provide? A fast
remove()? The ability to easily use the keySet() implementation without extending
AbstractMap?

@gissuebot
Copy link
Author

gissuebot commented Oct 31, 2014

Original comment posted by joe.kearney%[email protected] on 2009-08-13 at 03:38 PM


Yes, sort of. I'm not using AbstractMap, but in any case that implementation of
keySet() just creates an AbstractSet over the keys in the entrySet iterator, so most
operations are still linear.

We're not using an AbstractMap for slightly obscure memory reasons, we're optimising
for a slightly unfortunate case we have where we see a vast number of small or empty
maps. The keySet and entrySet fields from AbstractMap turned out to be a noticeable
chunk of the memory used. It's an array-backed map akin to the LinearMap suggestion
in another bug report. The reason I'm still looking for better-than-linear
performance in the key-set methods is that we're using a binary search over the
(Comparable) keys. Maybe LogarithmicMap would have been better...

I don't think it's particularly complex to just delegate everything back to the
underlying map and translate the details, it would just be nice to have a utility to
do it for me. Then again, maybe this doesn't come up that often.

@gissuebot
Copy link
Author

Original comment posted by kevinb9n on 2009-11-04 at 11:04 PM


Reopening, because in the future we likely will have an invertible Function (we call
it Converter, for better or worse), and when we do, this method is worth considering.
It would simply have to document that your converter had darn well better be a strict
bijection (so no String<->Double or things like that!).


Status: Accepted
Labels: Milestone-Post1.0

@gissuebot
Copy link
Author

Original comment posted by [email protected] on 2010-02-12 at 05:06 PM


Upon further discussion, I'm coming to the same conclusion that I had a few years ago
when Sets.transform() was first requested: nearly every conceptually invertible
function is lossy; truly lossless (mathematically invertible) ones are vanishingly
rare. The result could be a broken Set. That doesn't mean we'll definitely never do
Sets.transform(), but its priority feels low for this reason.

@gissuebot
Copy link
Author

Original comment posted by [email protected] on 2010-04-23 at 08:27 PM


note: depends on bug 177.

@gissuebot
Copy link
Author

Original comment posted by [email protected] on 2010-07-30 at 03:53 AM


(No comment entered for this change.)


Labels: -Milestone-Post1.0

@gissuebot
Copy link
Author

Original comment posted by [email protected] on 2011-01-26 at 10:11 PM


(No comment entered for this change.)


Labels: Type-Enhancement

@gissuebot
Copy link
Author

Original comment posted by maliniakh on 2011-02-02 at 01:04 PM


@Kevinb: Developers may be aware that some sets are truly lossless, like converting db entities into their ids. If sets happen to be lossy, simply runtime exception could be thrown within Sets.transform() method. Sounds like a reasonable solution to me.

@gissuebot
Copy link
Author

Original comment posted by [email protected] on 2011-02-02 at 03:03 PM


We wouldn't be able to throw such an exception; we wouldn't know there was anything wrong. It would just produce a buggy Set. Again, we're not rejecting the idea and it will probably happen eventually.

@gissuebot
Copy link
Author

Original comment posted by finnw1 on 2011-02-02 at 04:42 PM


I think that's an example of where lossiness does not matter. If a DB entity has been deleted, yes regenerating it from its id will fail, but then the object must have been invalidated already (even if the fact was not known.)

@gissuebot
Copy link
Author

Original comment posted by [email protected] on 2011-02-17 at 04:30 PM


Now that Converter has some traction in Google-internal code, we decided to revisit this. I did a survey of callers of {Iterables,Lists,Collections2}.transform(..., someConverter.asFunction()), which is the closest analogue to the requested Sets.transform, to see whether they might benefit from the new method.

The result is that Sets.transform is unnecessary or bug-promoting for all of them. Here's a list of criteria that a caller would need to meet for the method to be helpful:

  • Their converter must be reversible in the mathematical sense. Maybe we're OK if "01" and "1" map to the same thing (though I wouldn't want to be the one to play fast and loose with user input in a webapp), but people love Converter for converting between protocol buffers and richer POJOs. Why's that a problem? Protocol buffers have a concept of "repeated" fields, which are represented as a List. However, they're commonly used for data that would go in a Set in the POJO. The result is that a protocol buffer containing [a,b] and a protocol buffer containing [b,a] would be considered unequal, but both would map to the same POJO. This is the sort of bug that could manifest itself years after the code was written, e.g., during a JDK update to HashSet's hashing algorithm, a change that the developer might be only dimly aware of.
  • Part of this is that the conversion probably shouldn't have any invalid inputs. If -1 is a valid Integer but not a valid UserId and my UserId constructor throws IllegalArgumentException if it's provided, Sets.transform can give me a set for which contains(-1) throws IllegalArgumentException. Ugh.
  • The conversion probably has to be fast, since it will likely be performed many times. The Converters I see in Google code generally aren't (e.g., protocol buffers again); as a result, many callers of transform() immediately call ImmutableSet.copyOf() on the result, or they iterate over it a single time and throw it away. ImmutableSet.copyOf() ensures that they won't be transforming their protocol buffers over and over during inner loops that call contains().
  • The conversion probably has to be performed to many elements. If I'm transforming just one or two, why not create the ImmutableSet copy, or why not suffer the O(n) contains() already available to users of Collections2.transform()?
  • The consumer of the set must not need to perform a defensive copy, as many constructors do. If I'm calling ImmutableSet.copyOf(), all I need as input is an Iterable, which I can get from Iterables.transform().
  • More generally, the caller needs to be planning to perform calls to contains() (or perhaps remove()/add(), but that seems unlikely), as these are the only operations that would be faster with Sets.transform() than with Collections2.transform().
  • Those contains() calls must be performed by someone other than the caller himself. If the caller himself wants to perform them, it's often just as easy to call set.contains(userId.toInteger()) n times instead of Sets.transform(set, userIdToInteger) once and set.contains(userId) n times. Of course there's a tipping point, but I didn't see it in my investigations.

I remain convinced that Sets.transform would be very cool -- I had a CL to implement it myself 3 years ago, one of my first Java-libraries CLs -- but the more research I do, the more convinced I become that it would cause more problems than it solves.

I'm re-closing the bug.

The Converter class itself will still likely see the light of day.


Status: WontFix

@gissuebot
Copy link
Author

Original comment posted by [email protected] on 2011-03-18 at 10:21 PM


The comments below were written without seeing Chris' comments above. I agree the contract required for the Sets#transform's bijection is tricky, but not as tricky as described. I think the confusion comes from defining a bijection between all of A and all of B, while we only require it between a Set<A> and Set<B>, as I explain below, which is nowhere near as hard to implement correctly as the former one.

(For me, the tricky part is this: Do we define a complete A<-->B bijection type, and reduce its requirements in Sets#transform, or build a constrained (Set<A> <--> Set<B>, i.e. something like "isDefinedAt(A)" method) bijection type to begin with, with exactly the right spec for Sets#transform? I would definitely choose the latter, which is as useful and general - it can mimick the former just by implementing "isDefinedAt(A)" to return true, if the implementor can truly upheld the contract)


Just a comment from my recent experience implementing something like a Sets#transform: the tricky part is to specify the bijection contract. I mean the exact semantics of the methods, not just the signatures.

For example:
public interface InvertibleFunction<A,B> extends Function<A,B> {
    InvertableFunction<B,A> reverse();
}

This, without any other specification, implies a bijection A <--> B, i.e. from every A to every B (oh, yes, A and B (infinite) sets should have the same cardinality too). This specification makes it the easiest to implement Sets#transform, but also the hardest to implement InvertibleFunction correctly.

At least from the perspective of Sets#transform, what is really needed is something far less, and far easier to implement: a bijection not from all A to all B, but one that needs only to be defined for those A's in a specific Set<A>. The image of that would be a Set<B>, and similarly, the inverse function would only be required to map only the B's in that Set<B> back to A's of Set<A>.

InvertibleFunction as given above doesn't make it easy to restrict ourselves to a smaller domain than the type parameter itself (Set<A> is just a subset of A).

Aside: scala functions have an "isDefinedAt(x: A)" method, which is precisely what I'm talking about. They don't define functions over all A, but only a subset of it (a predicate on elements of A, such as isDefinedAt, defines a set of A just as Set<A> does, modulo not providing an iterator).

But we have no isDefinedAt method, and we can't define one, and even if we could, interfaces with two methods are going to become much more costly than those with a single method when closures arrive, so I think we are pretty much stuck with the clumsier approach. The good news is that the clumsier approach is workable, but the specification of a potential Sets#transform would be forced to explain this subtlety, that not a complete bijection is required, but only one defined for the given Set<A> and its image Set<B>, for other inputs it would be free to return whatever.

@gissuebot
Copy link
Author

Original comment posted by [email protected] on 2011-03-18 at 10:38 PM


"Maybe we're OK if "01" and "1" map to the same thing" -- indeed, this is where nastiness begins. That the user can create a broken implementation in which all of these hold:

  1. Set<A> contains "01" and "1" (A = String)
  2. Set<B> contains Integer(1) (B = Integer)
  3. setA.size() == setB.size()
  4. ...setB.iterator() appears to contain duplicates when iterated :-/

Which is subtle, but as the fundamental CS tenet goes, 'garbage in, garbage out'...

@gissuebot
Copy link
Author

gissuebot commented Oct 31, 2014

Original comment posted by [email protected] on 2012-02-27 at 08:46 PM


The discussion on this issue is long and complex, with lots of good arguments for and against using Sets.transform in specific scenarios. However, the arguments around the necessity of a bijection seem to be a primary reason for not including it the library at all. This confuses me a bit because it seems like all that is really needed is an injection (aka a one-to-one (but not onto) function).

Analogous to how Set refines Collection purely in terms of semantics (i.e. Javadoc only), couldn't a new Injection interface refine Function semantically as f(a) = f(b) implies a = b? This provides a well-defined Set with a linear-time contains() (as provided by AbstractSet), which is good enough in many common cases. I've done this myself, so I could easily provide code if desired.

Adding a Bijection (which refines Injection) with an inverse function would allow the transformed set to delegate contains() to the base set, preserving its likely better than linear runtime. This could be provided as a Sets.transform overload. I think the main trick here is that type erasure doesn't allow contains(Object) to safely call inverse(F) (since you can't use instanceof with a type parameter). Of course, that test could be included in the Bijection interface, e.g. rangeContains(Object) or rangeCast(Object).

Is this line of thought reasonable, or am I missing something?

@gissuebot
Copy link
Author

Original comment posted by [email protected] on 2012-02-27 at 11:01 PM


Note that Sets#transform is already implemented with a bijection, feel free to peek in the code. (Used by something called "WellBehavedMap", if I recall correctly)

But yes, this could have been implemented this with just an injection, and a linear time Set<B>#contains. But that would be surprisingly slow, right? You do expect Set#contains to be relatively fast.

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2012-02-27 at 11:11 PM


I'm really leery of introducing a whole new Injection interface just for the sake of this one method. I'm convinced by Chris' argument earlier that Collections2.transform gives you something with the same performance guarantees, and if you really want a Set -- you probably really want the Set because you want a fast contains, so just copy it into an ImmutableSet.

@gissuebot
Copy link
Author

Original comment posted by [email protected] on 2012-02-27 at 11:30 PM


Unless, of course, you have to keep the transformed set in sync with the original, and you plan to do enough contains() checks to worth the cost to construct an ImmutableSet. (Just framing this a bit better)

@gissuebot
Copy link
Author

gissuebot commented Oct 31, 2014

Original comment posted by [email protected] on 2012-02-28 at 12:22 AM


@andreou: Heh, thanks, I never thought to look for an internal implementation. Not to you personally, but reflectively for all: it's a little disingenuous to argue against the need for/validity of something while using it privately, no? ;-)

@louis: No, I promise, I really don't want fast contains(). :-) I think there are many common usages that don't ever need to call contains() and primarily use size() and iterator(). However, Collection isn't always an option because of the need to interoperate with APIs that use Set because of its semantics (i.e. no duplicates), without expectations of the performance of contains(). Consider java.util.concurrent.CopyOnWriteArraySet as an example. Copying into ImmutableSet has worse performance in such cases, because each construction redundantly verifies the uniqueness constraint.

I really find the pushback surprising here, given that:
a) it's definitely useful/preferred in some cases (e.g. WellBehavedMap),
b) it's already implemented (and almost trivially, at that),
c) given an interface more specialized than Function (be it InvertibleFunction or Injection or other), it's pretty hard for someone to shoot himself in the foot.

@gissuebot
Copy link
Author

Original comment posted by [email protected] on 2012-02-28 at 12:31 AM


Trevor, it would be funny if you directed that to me personally; I was arguing in favor of this. :) (#17 sums up the story of writing this)

@gissuebot
Copy link
Author

Original comment posted by [email protected] on 2012-02-28 at 03:48 AM


Another way to convert your Collections2.transform Collection to a Set:

class ForwardingCollectionSet extends ForwardingCollection implements Set {
  protected Collection delegate() { return delegate; }
}

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2012-02-28 at 04:23 AM


...except you need to override hashCode and equals, but other than that, Chris is correct. (That is the easiest way to view a Collection known to be unique as a Set, and it is quite easy.)

My main objection is the increased API surface -- creating an interface just for the sake of one method seems a bit much.

@gissuebot
Copy link
Author

gissuebot commented Oct 31, 2014

Original comment posted by [email protected] on 2012-02-28 at 09:21 PM


That's not a terrible solution, but overriding hashCode and equals (correctly) seems like a subtlety that cries out to be done carefully in a library such as this, rather than being done ad hoc.

Louis, I understand that you need to worry about API surface in general, but that argument seems like an oversimplification in this case:

  public interface Injection<F, T> extends Function<F, T> {}

Other than Javadoc, that's it. It hardly seems like a maintenance problem, and I'm happy to submit a patch including the Javadoc.

Speaking of subtlety, I'd like to point out a slightly different argument in favor of this feature: Obviously people are reaching for Sets.transform. Finding it missing, they likely end up here. If ImmutableSet.copyOf solves their situation, they're on their way in 20 mintues. Otherwise, they spend a day trying to understand this thread (and possibly want to rehash it) and then kludge together a solution that likely misses a few of the subtleties. Distilling an overview of these issues into a single static method and empty interface (and accompanying Javadoc) seems worth even more than the actual code, even if it encourages them to use ImmutableSet in the end anyway.

@gissuebot
Copy link
Author

Original comment posted by wasserman.louis on 2012-02-28 at 09:25 PM


Let me put it another way: there's already a decent amount of momentum behind Converter, which you describe as Bijection. I'd be absolutely okay with waiting for Converter to get off the ground and then providing Sets.transform after that.

I'd like to hear opinions from other Guava team members, though.

@gissuebot
Copy link
Author

Original comment posted by [email protected] on 2012-02-28 at 09:36 PM


Indeed, my mistake makes the ForwardingCollection solution look bad, and it's not something I'd so much "recommend" as "fall back on if necessary."

The main problem with Sets.transform remains that it doesn't work right in the main cases that people seem to want it to work -- parsing numbers, for instance. Dimitris and I disagree over whether it's even a good fit for WellBehavedMap. (That use, too, had a subtle bug, fortunately caught by another reviewer.) Plus, ImmutableSet appeared to be an adequate solution for all the potential users we looked at. There will be users for whom this is not true, but overall I suspect that the method would cause enough bugs to outweight any improvements it can produce.

People are welcome to continue to discuss, but we have had this discussion repeatedly over the years, and I'm trying my best not to sink a few hours into it for the nth time.

I'll update the bug title to make the workaround clear.

@gissuebot
Copy link
Author

Original comment posted by [email protected] on 2012-02-28 at 09:45 PM


(No comment entered for this change.)

@gissuebot
Copy link
Author

Original comment posted by [email protected] on 2012-02-28 at 10:30 PM


"Dimitris and I disagree over whether it's even a good fit for WellBehavedMap"

This is the first time I learn about this.

There are two reasons why WellBehavedMap can't use the suggested "ImmutableSet.copyOf(Collections2.transform(...))"

  1. The raison d'etre of EnumMultiset, EnumBiMap, EnumHashBiMap is higher iteration performance; otherwise a user could get the same functionality out of HashMultiset and HashBiMap. (The source of the bug, after all, was EnumMap overdoing it with performance considerations over its entrySet()). Constructing an ImmutableSet(Collections2.transform(...)) of the whole structure in entrySet() of these structures would just defeat their point.

  2. entrySet() is supposed to be a live view anyway. So we wouldn't even hope to get away with doing a single copy of the entire structure - after every update, we would have to drop any constructed ImmutableSet, to recreate it the next time. Let alone the extra memory retention that this shadow copy of the whole structure implies.

Sets#transform is really the best fit for WellBehavedMap - if you disagree, please provide data to back up that opinion.

@gissuebot
Copy link
Author

Original comment posted by [email protected] on 2012-02-28 at 10:36 PM


Right -- I didn't mean that we should use ImmutableSet.copyOf(...) there, merely that I didn't think a general-purpose Sets.transform was necessary -- and, it seems, it actually distracted from the need for a live view. (Or maybe I'm just making excuses for not spotting the problem :))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status=will-not-fix type=enhancement Make an existing feature better
Projects
None yet
Development

No branches or pull requests

1 participant