Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enumerable#sum(&) fails with certain block arguments #15317

Open
rvprasad opened this issue Jan 1, 2025 · 13 comments
Open

Enumerable#sum(&) fails with certain block arguments #15317

rvprasad opened this issue Jan 1, 2025 · 13 comments
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. status:discussion topic:stdlib:collection

Comments

@rvprasad
Copy link

rvprasad commented Jan 1, 2025

Bug Report

Enumerable.sum(&) should succeed as long as the block argument transforms the elements into values into types that supports addition. This pattern would be common when summing heterogeneous enumerables such as arrays and tuples. However, it fails in the following code snippet.

d = {"a", "b"}
puts d.sum(&.to_s)

Expected output: ab (similar to the output of d.join)
Actual output:

In /usr/lib/crystal/enumerable.cr:1759:12

1759 | type.zero
^---
Error: undefined method 'zero' for String.class

Crystal version: 1.14.0

@rvprasad rvprasad added the kind:bug A bug in the code. Does not apply to documentation, specs, etc. label Jan 1, 2025
@rvprasad rvprasad changed the title Enumerable#sum(&) fails with heterogeneous enumerables Enumerable#sum(&) fails with certain block arguments Jan 1, 2025
rvprasad added a commit to rvprasad/crystal that referenced this issue Jan 10, 2025
@rvprasad
Copy link
Author

rvprasad commented Jan 10, 2025

To fix this issue, additive_identity needs to return an empty string (i.e. "zero" value) when T is String. Doing so changes the return type of additive_identity to (Int32 | String) and breaks the reduce call in sum(initial, &).

Instead, we could split the paths in sum(&) based on Reflect.type. Doing so, the reduce call in sum(initial, &) fails as (yield e) is typed as Int32 while the yield Enumerable.element_type(self) (in the context of sum(&)) is typed as String.

b499bb2 contains the above described changes. Running make std_spec will surface the issue.

@rvprasad
Copy link
Author

rvprasad commented Jan 11, 2025

Unlike sum(), sum(&) does not involve macro expansion to specialize it for different types of values because the return type of the block argument is unknown at the time of macro expansion. Introducing a dedicated code path to handle String in sum(&) changes the return type of sum(&) to a union type (e.g., (Int32 | String)) which breaks type checking of existing code.

Different variants of #sum may not be able to uniformly handle strings. Enumerable#join is the appropriate method to join strings. For these reasons, I suggest prohibiting the use of #sum with non-additive types as follows.

  1. Allow the use of sum(&) to fail at run time.
  2. Issue a compiler warning for uses of sum() with strings.

235f1bf contains the above suggested changes.

rvprasad added a commit to rvprasad/crystal that referenced this issue Jan 11, 2025
1) Flag the use of `sum(&)` with non-additive types at run-time.
2) Flag uses of `sum()` with strings with a compile-time warning.
   Eventually, change this warning into an error.
@straight-shoota
Copy link
Member

I don't think it makes sense to special-case Enumerable#sum for String. Remeber that this affects any type which implements + but not .additive_identity.

The purpose of .additive_identity is to avoid special casing the algorithm and instead delegate to the respective type.
So we could just implement String.additive_identity. We do the same for other non-numerical types such as Array.additive_identity and Time::Span.additive_identity.

@oprypin
Copy link
Member

oprypin commented Jan 11, 2025

Note that the fact that this doesn't work protects people from using a quadratic complexity operation with tons of allocations instead of the linear operation which is .join. We should think long and hard before considering String#additive_identity.

@rvprasad
Copy link
Author

@straight-shoota My first comment was about exploring the use of dedicated paths in #sum(&) as it is done in #sum(). My first commit b499bb2 surfaced the issues with this approach. The first paragraph of my second comment elaborates on more issues related to such dedicated paths.

As described in the latter half of my second comment, like @oprypin, I think that a better solution is to disallow the use of #sum to join enumerable of strings and encourage the use of #join. My second commit 235f1bf contains this solution.

With regards to disallowing the use of #sum to join strings, we could introduce it as an incremental change with a compiler warning as done in 235f1bf or as a breaking change via a compiler error as we did for #15269.

@rvprasad
Copy link
Author

Note: This comment is a "thinking out loud" comment.

Here are three reasons why I think we should not add .additive_identity to String.

  1. We will have to different yet identical ways -- join and sum -- to concatenate strings in an enumerable.
  2. Unlike adding numbers, joining strings is not commutative. This reasoning also applies against joining Arrays, Deques, and Slices via #sum. This reasoning does not apply against adding Numbers, Complexs, Sets, and Time::Spans.
  3. While joining/concatenating is a basic/natural operation on Array, Deque, Slice, and String, addition is not a basic/natural operation for these types. At best, I think addition can serve an alias to one of the basic/natural operations (e.g., prepend, append) on these types.

As I write this comment, I am thinking we could drop .additive_identity in Array, Deque, and Slice, and recommend the use of #flat_map or #reduce without any loss of functionality and expressibility). By doing so, we can reduce the API surface area by eliminating redundant interface elements. Disclaimer: I have not assessed the impact of such a change on the uses of [Array|Deque|Slice]#sum in the wild.

@straight-shoota
Copy link
Member

straight-shoota commented Jan 12, 2025

@rvprasad I do not follow what's the issue with adding String.additive_identity. Could you provide a concrete example that breaks with this method?
I'm pretty confident this should just fit right in, as do other existing implementations of this method.

  1. We will have to different yet identical ways -- join and sum -- to concatenate strings in an enumerable.

That's fine. They are different concepts and use different roads to reach the same result.
Each makes sense in its own right:

  • #join is a dedicated string concatenation for enumerable items.
  • #sum implements the generic concept of addition, which String happens to implement.

Please note that 3rd party code could define types that add with String, at which point #sum even might behave differently than #join when mixing String and a type that adds with it.

  1. Unlike adding numbers, joining strings is not commutative. This reasoning also applies against joining Arrays, Deques, and Slices via #sum. This reasoning does not apply against adding Numbers, Complexs, Sets, and Time::Spans.

I fail to see why missing commutativity should be a reason to disallow summing up. The semantics of the + operator do not require commutativity, and Enumerable#sum is just an application of that operator over an ordered collection. It's irrelevant that a different order of the collection would give a different result.

  1. While joining/concatenating is a basic/natural operation on Array, Deque, Slice, and String, addition is not a basic/natural operation for these types. At best, I think addition can serve an alias to one of the basic/natural operations (e.g., prepend, append) on these types.

The + operator on String and collection types is defined as concatenation. And we call that operator the addition operator because that's the original meaning. Adding two strings of course isn't an addition in the mathematical sense.

@oprypin Efficiency is a valid concern, however I don't think the API should stand in the way of applying a generic concept to some use case just because there's a different, more efficient implementation available. This limits the general expressiveness, and performance isn't always paramount. We can of course try to optimize implementations like sum of String. Or simply encourage using more efficient alternatives. But I see no reason to prevent applying an instance of a generic concept.

@oprypin
Copy link
Member

oprypin commented Jan 12, 2025

#sum implements the generic concept of addition, which String happens to implement.

String implements the concept of concatenation which happens to share the same operator + with another separate concept of addition, for reasons of historic familiarity.

We must not be blinded by this coincidental sharing of the + operator.

I fail to see why missing commutativity should be a reason to disallow summing up.

I fail to see why it wouldn't be a reason to disqualify this. I think it must always hold that array.shuffle.sum == array.shuffle.sum. What does a sum even mean otherwise?

Anyway, I 100% agree with @rvprasad #15317 (comment).

@straight-shoota
Copy link
Member

straight-shoota commented Jan 12, 2025

#sum implements the generic concept of addition, which String happens to implement.

String implements the concept of concatenation which happens to share the same operator + with another separate concept of addition, for reasons of historic familiarity.

Let me rephrase: #sum implements the generic concept of accumulating + operators (also referred to as the addition operator), which String happens to implement.

I think it must always hold that array.shuffle.sum == array.shuffle.sum. What does a sum even mean otherwise?

Commutativity doesn't even hold for numerical types when integer and floating point arithmetics mix.

[-1, 1, 0.3].sum(0) # => 0.3
[1, 0.3, -1].sum(0) # => 0.30000000000000004

@rvprasad
Copy link
Author

@straight-shoota I think there are two aspects to this issue: technical and non-technical.

On the technical front, I think adding String.additive_identity will suffice.

However, my concerns are on the non-technical (language/library design) front: why provide identical ways to accomplish a task?

  1. Such redundancies contribute to maintenance burden and possibly surprising breaking behaviors (as we have seen in this issue).
  2. Such redundancies can confuse users and affect productivity, e.g., should I use sum or join?, which of sum and join is more performant?
  3. Redundancy is useful is certain cases, e.g., frequency usage patterns like zipWithNext in Kotlin, similar but not identical ways, expressibility. So, while I see the use of redundancy of sum compared to reduce { |a,b| a + b } for additive types, I do not see the usefulness of the redundancy of sum for types like String that can be concatenated.
  4. In terms of semantics, + symbol is used to denote string concatenation in most languages. However, it is an overloaded use and, as @oprypin notes, it is not the same as addition.

Regarding the situation where third party code defines domain types that can be added with String, one could use map(&.to_s).join() if the intent is string concatenation. If the intent is some overloaded definition of addition, then I think a better approach would be define domain/wrapper type that lifts String into the target domain. Note: I know that design is subjective and contextual, and my intention here is not to preach a specific design approach.

Overall, I believe every language/library designs are subjective/opinionated. So, I think I am asking if such redundancy aligns with opinions/principles that inform the Crystal's design. That said, I am

On a lighter note, I am reminded of a quote from Jurassic Park: Your scientists were so preoccupied with whether they could, they didn’t stop to think if they should. :)

@rvprasad
Copy link
Author

rvprasad commented Jan 12, 2025

@straight-shoota Based on the below fragments, I believe the issue that you observed is due to the quirks of floating point arithmetic in Crystal (or in general).

puts -1 + 1 + 0.3  # => 0.3
puts 1 + 0.3 - 1   # => 0.30000000000000004
puts 0.3 - 1       # => -0.7
puts -1 + 0.3      # => -0.7
puts 0.3 - 1 + 1   # => 0.30000000000000004
puts -1 + 0.3 + 1  # => 0.30000000000000004

@oprypin
Copy link
Member

oprypin commented Jan 12, 2025

Commutativity doesn't even hold for numerical types when integer and floating point arithmetics mix.

That is only a technicality. Ideologically, addition of numbers is commutative and people mostly use it as if it is.
In that case OK, if floats are involved, then their shuffled sum should be almost equal. Doesn't change my point.


@rvprasad regarding the latest comment, @straight-shoota understands the cause of it but he underlines this aspect anyway.

@rvprasad
Copy link
Author

Do folks have any more thoughts on how to proceed with this issue? Thanks in advance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. status:discussion topic:stdlib:collection
Projects
None yet
Development

No branches or pull requests

4 participants