Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Seq.traverse & sequence functions #277

Merged
merged 10 commits into from
Sep 23, 2024

Conversation

1eyewonder
Copy link
Contributor

@1eyewonder 1eyewonder commented Sep 17, 2024

Proposed Changes

Currently we have one implementation for Seq.sequenceResultM however, I believe the Seq.cache is just writing the sequence into memory unnecessarily. If we use a Seq.fold operation, we can significantly reduce the memory overhead of applications which have excessively large quantities of data i.e. hundreds of thousands or more (assuming my rough draft benchmark is written correctly)

v1 - current implementation
v2 - proposed implementation

image

Currently, my function is different in that it doesn't exit the sequence early upon finding an error. If this is a concern for performance, we can always discuss having a memoize internal to the traverseXM' functions.

Types of changes

What types of changes does your code introduce to FsToolkit.ErrorHandling?

  • New feature (non-breaking change which adds functionality)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • Build and tests pass locally
  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added necessary documentation (if appropriate)

Further comments

I'm open for discussion as this was just something I was playing around with

@1eyewonder
Copy link
Contributor Author

Added more benchmarks to test making the function mutable for an early escape, however between GetEnumerator and Seq.fold the difference is fairly minimal

image

@1eyewonder 1eyewonder marked this pull request as ready for review September 18, 2024 17:14
@1eyewonder 1eyewonder marked this pull request as draft September 18, 2024 17:16
@1eyewonder 1eyewonder marked this pull request as ready for review September 18, 2024 17:17
@1eyewonder 1eyewonder changed the title POC - Adding Seq.traverse & sequence functions Adding Seq.traverse & sequence functions Sep 18, 2024
Copy link
Collaborator

@TheAngryByrd TheAngryByrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, my function is different in that it doesn't exit the sequence early upon finding an error. If this is a concern for performance

I think we do need to address not iterating the whole sequence as this might be a behavior people are relying on.

I have two tests below, one calling the old implementation and one calling the new. As you can see the new one iterates a whole list.

        ftestCase "sequenceResultMv1 with few invalid data with long data set"
        <| fun _ ->

            let mutable lastValue = null
            let mutable callCount = 0

            let longerData =
                seq {
                    for i in 1..1000 do
                        lastValue <- string i
                        yield string i
                }

            let tweets =
                seq {
                    ""
                    "Hello"
                    aLongerInvalidTweet
                    yield! longerData
                }

            let tryCreate tweet =
                callCount <-
                    callCount
                    + 1

                match tweet with
                | x when String.IsNullOrEmpty x -> Error "Tweet shouldn't be empty"
                | x when x.Length > 280 -> Error "Tweet shouldn't contain more than 280 characters"
                | x -> Ok(x)

            let actual = sequenceResultMv1 (Seq.map tryCreate tweets)

            Expect.equal callCount 1 "Should have called the function only 3 times"
            Expect.equal lastValue null ""

            Expect.equal
                actual
                (Error emptyTweetErrMsg)
                "traverse the sequence and return the first error"

        ftestCase "sequenceResultM with few invalid data with long data set"
        <| fun _ ->

            let mutable lastValue = null
            let mutable callCount = 0

            let longerData =
                seq {
                    for i in 1..1000 do
                        lastValue <- string i
                        yield string i
                }

            let tweets =
                seq {
                    ""
                    "Hello"
                    aLongerInvalidTweet
                    yield! longerData
                }

            let tryCreate tweet =
                callCount <-
                    callCount
                    + 1

                match tweet with
                | x when String.IsNullOrEmpty x -> Error "Tweet shouldn't be empty"
                | x when x.Length > 280 -> Error "Tweet shouldn't contain more than 280 characters"
                | x -> Ok(x)

            let actual = Seq.sequenceResultM (Seq.map tryCreate tweets)

            Expect.equal callCount 3 "Should have called the function only 3 times"
            Expect.equal lastValue null ""

            Expect.equal
                actual
                (Error emptyTweetErrMsg)
                "traverse the sequence and return the first error"

In terms of performance, this may look fine for lists of numbers but it does become a concern when it's items that take time to generate when iterating the sequence. Creating a seq { } with a Thread.Sleep for instance would show the impact of iterating a whole list.

"version": "0.2.0",
"configurations": [
{
"name": "benchmarks",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice ❤️

"version": "2.0.0",
"tasks": [
{
"label": "build release",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@1eyewonder
Copy link
Contributor Author

No, that makes sense! I can implement the mutable implementation and add the benchmarks accordingly 🫡

@1eyewonder
Copy link
Contributor Author

I think this is ready for re-review @TheAngryByrd. Here are the benchmarks I got. I went with inlining since it showed slightly faster results.

image

Comment on lines 18 to 32
let mutable state = state
let enumerator = xs.GetEnumerator()

while Result.isOk state
&& enumerator.MoveNext() do
match state, f enumerator.Current with
| Error e, _ -> state <- Error e
| Ok oks, Ok ok ->
state <-
Seq.singleton ok
|> Seq.append oks
|> Ok
| Ok _, Error e -> state <- Error e

state
Copy link

@brianrourkeboll brianrourkeboll Sep 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Should these technically not use use on the enumerator?

    use enumerator = xs.GetEnumerator()
  2. Using seq { yield ok; yield! oks } and calling Seq.rev at the end is significantly (3–4×) faster on its own than Seq.singleton + Seq.append. The difference is much greater when you actually iterate the resulting sequences, where using a sequence expression is up to ~333× as fast and allocates as little as 1⁄250th as much.

  3. Seq.singleton + Seq.append as used here, or seq { yield! oks; yield ok }, can result in a stack overflow on iteration if the input sequence is long enough and f is defined in another assembly.

Suggested change
let mutable state = state
let enumerator = xs.GetEnumerator()
while Result.isOk state
&& enumerator.MoveNext() do
match state, f enumerator.Current with
| Error e, _ -> state <- Error e
| Ok oks, Ok ok ->
state <-
Seq.singleton ok
|> Seq.append oks
|> Ok
| Ok _, Error e -> state <- Error e
state
match state with
| Error _ -> state
| Ok oks ->
use enumerator = xs.GetEnumerator()
let rec loop oks =
if enumerator.MoveNext() then
match f enumerator.Current with
| Ok ok -> loop (seq { yield ok; yield! oks })
| Error e -> Error e
else
Ok(Seq.rev oks)
loop oks

Benchmarks

Source: https://gist.github.com/brianrourkeboll/6febebc4849708071eb08511491019b1

| Method                | Mean           | Error         | StdDev        | Ratio | RatioSD | Gen0       | Gen1       | Gen2      | Allocated    | Alloc Ratio |
|---------------------- |---------------:|--------------:|--------------:|------:|--------:|-----------:|-----------:|----------:|-------------:|------------:|
| SeqFuncs              |      16.938 us |     0.2691 us |     0.2385 us |  1.00 |    0.02 |    10.1929 |     5.0354 |         - |    125.05 KB |        1.00 |
| SeqExpr               |       5.261 us |     0.0831 us |     0.0737 us |  0.31 |    0.01 |     4.4708 |     1.1749 |         - |     54.81 KB |        0.44 |
|                       |                |               |               |       |         |            |            |           |              |             |
| SeqFuncs_Million      | 336,470.000 us | 6,425.8669 us | 6,598.8922 us |  1.00 |    0.03 | 12500.0000 | 12000.0000 | 2500.0000 | 125000.84 KB |        1.00 |
| SeqExpr_Million       |  74,218.799 us | 1,441.9837 us | 1,716.5792 us |  0.22 |    0.01 |  5428.5714 |  5285.7143 | 1142.8571 |   54687.9 KB |        0.44 |
|                       |                |               |               |       |         |            |            |           |              |             |
| SeqFuncs_Iter_Million | 338,091.889 us | 4,229.8594 us | 3,749.6602 us |  1.00 |    0.02 | 12500.0000 | 12000.0000 | 2500.0000 | 125000.86 KB |        1.00 |
| SeqExpr_Iter_Million  |  73,353.986 us | 1,465.0880 us | 1,744.0832 us |  0.22 |    0.01 |  5428.5714 |  5285.7143 | 1142.8571 |  54687.92 KB |        0.44 |
|                       |                |               |               |       |         |            |            |           |              |             |
| SeqFuncs_Iter         |   9,771.981 us |   104.2526 us |    97.5180 us | 1.000 |    0.01 |  2578.1250 |   359.3750 |  140.6250 |  31718.95 KB |       1.000 |
| SeqExpr_Iter          |      26.320 us |     0.5006 us |     0.6148 us | 0.003 |    0.00 |     9.9182 |     1.5564 |         - |    121.75 KB |       0.004 |

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing this! I learned a lot with just a few comments.

  1. I never knew the enumerator incorporated IDisposable since I actually have never had a need for it until now 😅
  2. Stupid question but why does iterating after the sequence is made make such a difference between the function and expression? Is there any reading material you could by chance point to, if it exists?
  3. Since it sounds like stack overflow is inevitable here, do you think adding a xml comment warning about this would be the only thing we can do to help here? Or maybe you have some additional suggestions/ideas?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually ignore point 2, it just clicked for me 🤦

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it sounds like stack overflow is inevitable here

The version I posted (using seq { yield ok; yield! oks } and Seq.rev at the end) doesn't blow the stack; using seq { yield! oks; yield ok } instead (note the order of the yield! and yield), would, though.

@1eyewonder
Copy link
Contributor Author

Here are the updated benchmarks after making the changes @brianrourkeboll recommended. I'm hoping to not make it too cluttered, but I felt it was good information to keep historical record of if people come looking back at this later. I'm open to recommendations on styling/organization.

image

Copy link
Collaborator

@TheAngryByrd TheAngryByrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for this! Excellent work :)

@TheAngryByrd TheAngryByrd merged commit a48180d into demystifyfp:master Sep 23, 2024
25 checks passed
TheAngryByrd added a commit that referenced this pull request Sep 23, 2024
- [Adding Seq.traverse & sequence functions](#277) Credits @1eyewonder
TheAngryByrd added a commit that referenced this pull request Sep 23, 2024
- [Adding Seq.traverse & sequence functions](#277) Credits @1eyewonder
TheAngryByrd added a commit that referenced this pull request Sep 23, 2024
- [Adding Seq.traverse & sequence functions](#277) Credits @1eyewonder
@1eyewonder
Copy link
Contributor Author

Thank you both for the patience and reviews! 😄

@1eyewonder 1eyewonder mentioned this pull request Dec 15, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants