-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding Seq.traverse & sequence functions #277
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, my function is different in that it doesn't exit the sequence early upon finding an error. If this is a concern for performance
I think we do need to address not iterating the whole sequence as this might be a behavior people are relying on.
I have two tests below, one calling the old implementation and one calling the new. As you can see the new one iterates a whole list.
ftestCase "sequenceResultMv1 with few invalid data with long data set"
<| fun _ ->
let mutable lastValue = null
let mutable callCount = 0
let longerData =
seq {
for i in 1..1000 do
lastValue <- string i
yield string i
}
let tweets =
seq {
""
"Hello"
aLongerInvalidTweet
yield! longerData
}
let tryCreate tweet =
callCount <-
callCount
+ 1
match tweet with
| x when String.IsNullOrEmpty x -> Error "Tweet shouldn't be empty"
| x when x.Length > 280 -> Error "Tweet shouldn't contain more than 280 characters"
| x -> Ok(x)
let actual = sequenceResultMv1 (Seq.map tryCreate tweets)
Expect.equal callCount 1 "Should have called the function only 3 times"
Expect.equal lastValue null ""
Expect.equal
actual
(Error emptyTweetErrMsg)
"traverse the sequence and return the first error"
ftestCase "sequenceResultM with few invalid data with long data set"
<| fun _ ->
let mutable lastValue = null
let mutable callCount = 0
let longerData =
seq {
for i in 1..1000 do
lastValue <- string i
yield string i
}
let tweets =
seq {
""
"Hello"
aLongerInvalidTweet
yield! longerData
}
let tryCreate tweet =
callCount <-
callCount
+ 1
match tweet with
| x when String.IsNullOrEmpty x -> Error "Tweet shouldn't be empty"
| x when x.Length > 280 -> Error "Tweet shouldn't contain more than 280 characters"
| x -> Ok(x)
let actual = Seq.sequenceResultM (Seq.map tryCreate tweets)
Expect.equal callCount 3 "Should have called the function only 3 times"
Expect.equal lastValue null ""
Expect.equal
actual
(Error emptyTweetErrMsg)
"traverse the sequence and return the first error"
In terms of performance, this may look fine for lists of numbers but it does become a concern when it's items that take time to generate when iterating the sequence. Creating a seq { }
with a Thread.Sleep
for instance would show the impact of iterating a whole list.
"version": "0.2.0", | ||
"configurations": [ | ||
{ | ||
"name": "benchmarks", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice ❤️
"version": "2.0.0", | ||
"tasks": [ | ||
{ | ||
"label": "build release", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
No, that makes sense! I can implement the mutable implementation and add the benchmarks accordingly 🫡 |
I think this is ready for re-review @TheAngryByrd. Here are the benchmarks I got. I went with inlining since it showed slightly faster results. |
src/FsToolkit.ErrorHandling/Seq.fs
Outdated
let mutable state = state | ||
let enumerator = xs.GetEnumerator() | ||
|
||
while Result.isOk state | ||
&& enumerator.MoveNext() do | ||
match state, f enumerator.Current with | ||
| Error e, _ -> state <- Error e | ||
| Ok oks, Ok ok -> | ||
state <- | ||
Seq.singleton ok | ||
|> Seq.append oks | ||
|> Ok | ||
| Ok _, Error e -> state <- Error e | ||
|
||
state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Should these technically not use
use
on the enumerator?use enumerator = xs.GetEnumerator()
-
Using
seq { yield ok; yield! oks }
and callingSeq.rev
at the end is significantly (3–4×) faster on its own thanSeq.singleton
+Seq.append
. The difference is much greater when you actually iterate the resulting sequences, where using a sequence expression is up to ~333× as fast and allocates as little as 1⁄250th as much. -
Seq.singleton
+Seq.append
as used here, orseq { yield! oks; yield ok }
, can result in a stack overflow on iteration if the input sequence is long enough andf
is defined in another assembly.
let mutable state = state | |
let enumerator = xs.GetEnumerator() | |
while Result.isOk state | |
&& enumerator.MoveNext() do | |
match state, f enumerator.Current with | |
| Error e, _ -> state <- Error e | |
| Ok oks, Ok ok -> | |
state <- | |
Seq.singleton ok | |
|> Seq.append oks | |
|> Ok | |
| Ok _, Error e -> state <- Error e | |
state | |
match state with | |
| Error _ -> state | |
| Ok oks -> | |
use enumerator = xs.GetEnumerator() | |
let rec loop oks = | |
if enumerator.MoveNext() then | |
match f enumerator.Current with | |
| Ok ok -> loop (seq { yield ok; yield! oks }) | |
| Error e -> Error e | |
else | |
Ok(Seq.rev oks) | |
loop oks |
Benchmarks
Source: https://gist.github.com/brianrourkeboll/6febebc4849708071eb08511491019b1
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---------------------- |---------------:|--------------:|--------------:|------:|--------:|-----------:|-----------:|----------:|-------------:|------------:|
| SeqFuncs | 16.938 us | 0.2691 us | 0.2385 us | 1.00 | 0.02 | 10.1929 | 5.0354 | - | 125.05 KB | 1.00 |
| SeqExpr | 5.261 us | 0.0831 us | 0.0737 us | 0.31 | 0.01 | 4.4708 | 1.1749 | - | 54.81 KB | 0.44 |
| | | | | | | | | | | |
| SeqFuncs_Million | 336,470.000 us | 6,425.8669 us | 6,598.8922 us | 1.00 | 0.03 | 12500.0000 | 12000.0000 | 2500.0000 | 125000.84 KB | 1.00 |
| SeqExpr_Million | 74,218.799 us | 1,441.9837 us | 1,716.5792 us | 0.22 | 0.01 | 5428.5714 | 5285.7143 | 1142.8571 | 54687.9 KB | 0.44 |
| | | | | | | | | | | |
| SeqFuncs_Iter_Million | 338,091.889 us | 4,229.8594 us | 3,749.6602 us | 1.00 | 0.02 | 12500.0000 | 12000.0000 | 2500.0000 | 125000.86 KB | 1.00 |
| SeqExpr_Iter_Million | 73,353.986 us | 1,465.0880 us | 1,744.0832 us | 0.22 | 0.01 | 5428.5714 | 5285.7143 | 1142.8571 | 54687.92 KB | 0.44 |
| | | | | | | | | | | |
| SeqFuncs_Iter | 9,771.981 us | 104.2526 us | 97.5180 us | 1.000 | 0.01 | 2578.1250 | 359.3750 | 140.6250 | 31718.95 KB | 1.000 |
| SeqExpr_Iter | 26.320 us | 0.5006 us | 0.6148 us | 0.003 | 0.00 | 9.9182 | 1.5564 | - | 121.75 KB | 0.004 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing this! I learned a lot with just a few comments.
- I never knew the enumerator incorporated IDisposable since I actually have never had a need for it until now 😅
- Stupid question but why does iterating after the sequence is made make such a difference between the function and expression? Is there any reading material you could by chance point to, if it exists?
- Since it sounds like stack overflow is inevitable here, do you think adding a xml comment warning about this would be the only thing we can do to help here? Or maybe you have some additional suggestions/ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually ignore point 2, it just clicked for me 🤦
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it sounds like stack overflow is inevitable here
The version I posted (using seq { yield ok; yield! oks }
and Seq.rev
at the end) doesn't blow the stack; using seq { yield! oks; yield ok }
instead (note the order of the yield!
and yield
), would, though.
… reviewer suggestions
Here are the updated benchmarks after making the changes @brianrourkeboll recommended. I'm hoping to not make it too cluttered, but I felt it was good information to keep historical record of if people come looking back at this later. I'm open to recommendations on styling/organization. |
Paket Lock Diff ReportThis report was generated via Paket Lock Diff Additions - (1)
Removals - (6)
Version Upgrades - (15)
Version Downgrades - (0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for this! Excellent work :)
- [Adding Seq.traverse & sequence functions](#277) Credits @1eyewonder
- [Adding Seq.traverse & sequence functions](#277) Credits @1eyewonder
- [Adding Seq.traverse & sequence functions](#277) Credits @1eyewonder
Thank you both for the patience and reviews! 😄 |
Proposed Changes
Currently we have one implementation for
Seq.sequenceResultM
however, I believe theSeq.cache
is just writing the sequence into memory unnecessarily. If we use aSeq.fold
operation, we can significantly reduce the memory overhead of applications which have excessively large quantities of data i.e. hundreds of thousands or more (assuming my rough draft benchmark is written correctly)v1 - current implementation
v2 - proposed implementation
Currently, my function is different in that it doesn't exit the sequence early upon finding an error. If this is a concern for performance, we can always discuss having a
memoize
internal to thetraverseXM'
functions.Types of changes
What types of changes does your code introduce to FsToolkit.ErrorHandling?
Checklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.Further comments
I'm open for discussion as this was just something I was playing around with