Don't recompute IR children #9915
Conversation
Profiling benchmarks revealed that we spend a non-negligible amount of time building the IR preorder, which in turn recomputes the IR nodes' children. IR is immutable, so there is no need to recompute that data for every pass. Note that the change uses `lazy val` because benchmarks weren't showing much improvement when allocating children on creation (with `val`). We could probably improve the situation by using a poor version of `lazy val` that does not employ synchronization.
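As a rough illustration of that last idea, here is a minimal sketch (not the actual Enso code; the simplified `CallArgument` shape and the `childrenCache` field are hypothetical) of caching `children` without the monitor that `lazy val` acquires:

```scala
trait IR {
  def children: List[IR]
}

// Hypothetical, simplified node: the cache is a plain var, so there is no
// synchronization; under a race two threads may each compute the same
// immutable List once, which is harmless but avoids lazy val's locking.
final class CallArgument(name: Option[IR], value: IR) extends IR {
  private[this] var childrenCache: List[IR] = null

  override def children: List[IR] = {
    var cached = childrenCache
    if (cached == null) {
      cached = name.toList :+ value
      childrenCache = cached
    }
    cached
  }
}
```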
@@ -138,7 +138,7 @@ object CallArgument {
      |""".toSingleLine

  /** @inheritdoc */
- override def children: List[IR] = name.toList :+ value
+ override lazy val children: List[IR] = name.toList :+ value
Not related to the change, but it should've been `Vector` instead of a `List`.
Once I heard @4e6 saying we shouldn't use `List`, as it does an eager copy rather than a lazy iterator. With `List` the `preorder()` implementation:
- calls `children` on each node
- the `children` implementation constructs a `List`
- the `preorder` then concatenates that list to its own `List` representing the whole `IR` tree
Where can the slowdown be? Only in the construction of the `children` `List`, right? This PR tries to trade that CPU cost for increased memory consumption.
Wouldn't it be better to change `children` to follow a more efficient pattern and not allocate its own `List`?
`List` is a classic linked list (https://github.com/scala/scala/blob/e858de50d51e13d225ac5b35015aac54cd71e23e/src/library/scala/collection/immutable/List.scala#L655-L660). It is suitable for implementing control flow (where you can efficiently pattern-match on head and tail), and maybe for prepending elements (https://august.nagro.us/list-vs-vector.html), but that is not how we're using it.
`Vector` is a radix-balanced finger tree backed by arrays and should be more efficient in mixed use cases like ours (append + traverse), both CPU- and memory-wise (scala/scala#8534).
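To make the trade-off concrete, here is a minimal sketch (assuming a toy `IR` trait; this is not the Enso implementation) contrasting a `List`-building preorder with a `Vector`-building one:

```scala
trait IR { def children: List[IR] }

// List version: every node allocates its own children List, and flatMap then
// copies those elements again while assembling the combined result.
def preorderList(root: IR): List[IR] =
  root :: root.children.flatMap(preorderList)

// Vector version: Vector's array-backed radix tree keeps appends and bulk
// additions cheap, which fits the append + traverse pattern better.
def preorderVector(root: IR): Vector[IR] =
  root +: root.children.toVector.flatMap(preorderVector)
```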
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our use case is "Construction with Append + Traversals", where `Vector` is faster. Possibly even better with `children(addTo: Vector.Builder): Unit`.
Up to you guys, I don't want to be an expert on Scala data structures.
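A minimal sketch of that builder-passing idea (the names and the simplified `CallArgument` shape are hypothetical, not the actual Enso API) could look like this:

```scala
import scala.collection.immutable.VectorBuilder

trait IR {
  // Each node writes its children into a caller-supplied builder,
  // so no intermediate per-node collection is allocated.
  def children(addTo: VectorBuilder[IR]): Unit
}

// Simplified stand-in for the real CallArgument node.
final class CallArgument(name: Option[IR], value: IR) extends IR {
  override def children(addTo: VectorBuilder[IR]): Unit = {
    name.foreach(addTo += _)
    addTo += value
  }
}
```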
> I don't want to be an expert on Scala data structures.

I continue to believe the best might be to have a lazy iterator, like in the case of iterating all children of a directory - e.g. a queue where each `nextElement()` adds its children to the end of the queue.
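A rough sketch of that lazy traversal (assuming the existing `children: List[IR]` accessor; note that a queue like this yields breadth-first rather than preorder order):

```scala
import scala.collection.mutable

trait IR { def children: List[IR] }

// Nodes are handed out one at a time; next() enqueues the children of the
// node it returns, so the full traversal is never materialized as one List.
def lazyTraversal(root: IR): Iterator[IR] = new Iterator[IR] {
  private val queue = mutable.Queue[IR](root)
  override def hasNext: Boolean = queue.nonEmpty
  override def next(): IR = {
    val node = queue.dequeue()
    queue ++= node.children
    node
  }
}
```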
What's the effect on benchmarks?
I ran the benchmarks and locally I was seeing minor improvements, but on CI not so much. So I'm wondering if it is worth integrating. Unless I'm doing it wrong.
After some thorough profiling it turned out that this solution makes things... worse. I will try a different approach with mutable collections and see if this gets us anything better.
Pull Request Description
Profiling benchmarks revealed that we spend a non-negligible amount of time building the IR preorder, which in turn recomputes the IR nodes' children. IR is immutable, so there is no need to recompute that data for every pass.
Note that the change uses `lazy val` because benchmarks weren't showing much improvement when allocating children on creation (with `val`). We could probably improve the situation by using a poor version of `lazy val` that does not employ synchronization.
Important Notes
Pending benchmark results.
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
All code follows the Scala, Java, TypeScript, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.