Revert #14452 and make compile-time operations on stable arguments stable #15268
Conversation
@@ -165,6 +165,8 @@ object Types {
case _: SingletonType | NoPrefix => true
case tp: RefinedOrRecType => tp.parent.isStable
case tp: ExprType => tp.resultType.isStable
case AppliedType(tycon: TypeRef, args: List[Type])
Very nervous about performance implications here. `isStable` is a hotspot and what you do here seems to be about an order of magnitude more expensive than the rest. If we want to go down that route we probably should cache the value in `AppliedType`.
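For illustration, here is a minimal sketch of the caching idea, assuming a simple compute-once flag on the type node (the class and field names are hypothetical; the actual implementation, and whether the cache should be period-dependent, may differ):

```scala
// Sketch only: a hypothetical type node caching its stability.
// 0 = unknown, 1 = unstable, 2 = stable, so "not yet computed" is distinct from false.
abstract class TypeNodeSketch:
  private var stableCache: Byte = 0

  /** The (potentially expensive) computation, e.g. recursing into type arguments. */
  protected def computeIsStable: Boolean

  final def isStable: Boolean =
    if stableCache == 0 then
      stableCache = (if computeIsStable then 2 else 1).toByte
    stableCache == 2
```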
b6704d7 caches `isStable` in `AppliedType`. As `isStable` does not depend on type variables, can we just cache it once and for all? Or should it still depend on the period? All current tests pass with the first alternative.

In terms of performance, I couldn't see any difference with our current benchmarks, so I added a micro benchmark measuring specifically the performance of `isStable`: 9ca8891.
Before any change (`main`):
Benchmark (valName) Mode Cnt Score Error Units
IsStable.isStableBenchmark int thrpt 30 31573088.895 ± 446326.483 ops/s
IsStable.isStableBenchmark deepSumUnstable thrpt 30 31905715.432 ± 403879.002 ops/s
IsStable.isStableBenchmark deepSumStable thrpt 30 31632042.261 ± 329863.487 ops/s
With applied-types-aware `isStable` but without cache (9ca8891):
Benchmark (valName) Mode Cnt Score Error Units
IsStable.isStableBenchmark int thrpt 30 27618898.774 ± 351504.087 ops/s
IsStable.isStableBenchmark deepSumUnstable thrpt 30 2702127.175 ± 99159.508 ops/s
IsStable.isStableBenchmark deepSumStable thrpt 30 815868.091 ± 27206.935 ops/s
With applied-types-aware `isStable` cached (b6704d7):
Benchmark (valName) Mode Cnt Score Error Units
IsStable.isStableBenchmark int thrpt 30 31601394.759 ± 228033.535 ops/s
IsStable.isStableBenchmark deepSumUnstable thrpt 30 31393180.896 ± 336974.793 ops/s
IsStable.isStableBenchmark deepSumStable thrpt 30 31814767.854 ± 531967.869 ops/s
This confirms that the performance of `isStable` on deep type applications (balanced trees of depth 4) was indeed one to two orders of magnitude slower without the cache. By the way, don't we have exactly the same problem with intersections and unions? Why is it particularly a problem with applied types?
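To give a concrete picture of what such a "deep type application" looks like at the user level, here is a small sketch analogous to the benchmarked types (the benchmark's actual definitions and `valName`s are not shown in this extract):

```scala
import scala.compiletime.ops.int.+

final val one = 1

// A balanced tree of `+` applications (depth 2 here; the benchmark uses depth 4).
// Every leaf is a stable reference, so the whole applied type should be stable.
type DeepSumStable = (one.type + one.type) + (one.type + one.type)
```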
Performance test finished successfully: Visit https://dotty-bench.epfl.ch/15268/ to see the changes. Benchmarks are based on merging with main (6783853)
LGTM
performance test scheduled: 6 job(s) in queue, 1 running.
Note that an important consequence of making compile-time operations on stable arguments stable is that it enables using operation types where singletons are expected, as demonstrated by this test: `summon[t85.type + t86.type <:< Singleton]`. For now, however, this does not work: `summon[t85.type + t86.type <:< Singleton & Int]`. I plan to fix it in a separate PR.
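A small self-contained variant of that test (with hypothetical value names, assuming a compiler version that includes this change):

```scala
import scala.compiletime.ops.int.+

final val a = 1
final val b = 2

// With stable arguments, the operation type itself is stable, hence a Singleton.
val ev = summon[a.type + b.type <:< Singleton]

// The `Singleton & Int` variant is discussed further below in this thread.
```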
Performance test finished successfully: Visit https://dotty-bench.epfl.ch/15268/ to see the changes. Benchmarks are based on merging with main (78dbc59)
Force-pushed from 80d5136 to b762b35
test performance please

performance test scheduled: 1 job(s) in queue, 1 running.

Performance test finished successfully: Visit https://dotty-bench.epfl.ch/15268/ to see the changes. Benchmarks are based on merging with main (76a0b29)
The previous run of the compilation benchmarks shows no significant differences. Following are some more detailed micro benchmark results.

`isStable`: [graph] Each point is the throughput for a JMH iteration, relative to the median of the corresponding "Baseline" iterations. Note the logarithmic scale for the y-axis. Higher is better.

`normalized`: [graph] See legends in the first graph. Linear scales. Higher is better.

`simplified`: [graph] Similar observations as above. See legends in the first graph. Linear scales. Higher is better.
Force-pushed from b762b35 to 8b08215
@@ -4242,7 +4253,7 @@ object Types {
//   final val one = 1
//   type Two = one.type + one.type
//   ```
-  case tp: TermRef => tp.underlying
+  case tp: TypeProxy if tp.underlying.isStable => tp.underlying.fixForEvaluation
Reasoning here: there is no need to check that `tp` is stable, because it will be if its underlying type is stable. Could someone confirm whether this is correct?

Otherwise, we might want:

`case tp: TypeProxy if tp.isStable && tp.underlying.isStable => tp.underlying.fixForEvaluation`

or

`case tp: SingletonType if tp.isStable && tp.underlying.isStable => tp.underlying.fixForEvaluation`
I think at least morally this is correct.
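For context, here is a rough standalone sketch of the dereferencing idea behind `fixForEvaluation`, using simplified, hypothetical stand-ins for the compiler's type hierarchy (the real code operates on dotty's `Type` classes):

```scala
// Simplified stand-ins for the compiler's types (sketch only).
sealed trait Tpe:
  def isStable: Boolean

case class ConstantTpe(value: Int) extends Tpe:
  def isStable = true

case class TermRefTpe(underlying: Tpe) extends Tpe:
  // Simplification: treat a reference as stable when what it refers to is stable.
  def isStable = underlying.isStable

// Dereference a reference to its deepest underlying stable type, so that a type
// like `one.type + one.type` can be folded using the constants underneath.
def fixForEvaluation(tp: Tpe): Tpe = tp match
  case TermRefTpe(underlying) if underlying.isStable => fixForEvaluation(underlying)
  case _ => tp
```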
Force-pushed from d3a71ed to 9bfc2f5
After caching: [graph] The y-axis unit is operations per second. Note the logarithmic scale. Higher is better. I am not sure if these results are generally relevant or are only artifacts of these micro benchmarks.
test performance with #compiletime please

performance test scheduled: 3 job(s) in queue, 1 running.

Performance test finished successfully: Visit https://dotty-bench.epfl.ch/15268/ to see the changes. Benchmarks are based on merging with main (836ed97)
I think you mean overriding, not overloading. Also, it's not clear from the graph whether bigger numbers are better or worse, but usually overriding is faster.
Indeed, fixed.
Legend added. The unit of the y-axis is ops/sec; higher is better.
As discussed during today's dotty meeting:
Force-pushed from 9bfc2f5 to cba629b

Force-pushed from cba629b to 9205cc6
LGTM
It actually does work; I just didn't realize that `summon[t85.type + t86.type <:< (Singleton & Int)]` compiles.
Revert #14452
#14452 made compile-time operations covariant. The use-case was mainly to allow the compiler to remove skolems around operation arguments. That for example enables the following:
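(The original code sample is not preserved in this extract; below is a rough, hypothetical illustration of the kind of code skolem removal aims to enable. The `add` definition and literal values are placeholders, not the PR's example.)

```scala
import scala.compiletime.ops.int.+

// The result type is a compile-time operation on the argument types. When the
// arguments are literals, the call site introduces skolems for them; widening
// or removing those skolems is what lets the result type reduce to a literal.
def add(x: Int, y: Int): x.type + y.type =
  (x + y).asInstanceOf[x.type + y.type]

// Intended to typecheck, with the result type reducing to 3.
val three: 3 = add(1, 2)
```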
Unfortunately, we realized afterwards that this adds significant constraints on the normalization of these compile-time operations. To preserve transitivity of algorithmic subtyping, we need to make sure that `∀ T, U. T <: U → Norm[T] <: Norm[U]`, which is non-trivial to achieve when compile-time operations are covariant.

For instance, consider the following two situations (in the diagrams, an arrow from `A` to `B` means `A <: B`). In both cases, the problem is the red arrow:
- If `1 + 1` is normalized to `2`, then we should make sure that it is still a subtype of `1 + (2 | 3)`. This could be achieved by distributing additions and multiplications over unions (see the short illustration below), but we might not want to do that because it would make the runtime/space of normalization exponential.
- If `1 + (a: A)` is normalized to `(a: A) + 1`, how do we ensure that the latter is also a subtype of `1 + A`? We thought about a few possible ways to do that (for example, by approximating non-singleton types by their known atoms), but it seems that if we want to handle the general case, we can't avoid brutally trying different orders of operands to compare operations, which is also exponential in the worst case.

Conclusion: making compile-time operations covariant complicates the implementation of normalization because it creates new subtype edges that we would need to preserve when normalizing. Since we do not want to handle all these cases now, we revert #14452 to make the operations invariant again (reverted in 12a10f2).
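To make the distribution idea from the first point concrete, here is a small illustration in user-level code (hypothetical; this is not something the compiler actually does):

```scala
import scala.compiletime.ops.int.+

// Distributing + over the union:  1 + (2 | 3)  ~>  (1 + 2) | (1 + 3)  =  3 | 4.
// With several union alternatives per argument this multiplies out,
// hence the exponential blow-up mentioned above.
type Distributed = (1 + 2) | (1 + 3)

val x: Distributed = 3 // 3 is one of the alternatives, so this should typecheck
```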
Make compile-time operations on stable arguments stable
So we don't want compile-time operations to be covariant, but we still need:
In order to achieve this, we:

- modify `Type.isStable` so that compile-time operations with stable arguments are considered stable,
- add a `fixForEvaluation` method inside `AppliedType.tryCompiletimeConstantFold` so that arguments of compile-time operations are dereferenced to their deepest underlying stable type.

See `tests/neg/singleton-ops-int.scala` for the new tests this enables to pass.

Add micro benchmarks
To check the performance of common operations on types, 823d710 adds a new JMH suite that benchmarks the runtime of the common type operations `isStable`, `normalized`, `simplified`, `dealias`, `widen`, `atoms`, and `isProvisional` on some simple types (`AppliedType`, `TermRef`, etc.).

The suite can be run with:
The use-case of these benchmarks for this PR was mainly to test different caching strategies for `Type#isStable` and to make sure that they indeed result in a performance gain. See the results in the comments above for more details.
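As a rough illustration of the shape of such a JMH suite (not the actual code added in 823d710; the package, class name, and measured workload below are placeholders):

```scala
package dotty.tools.benchmarks.sketch

import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations.*

// Sketch of a JMH throughput benchmark. The real suite builds compiler `Type`
// instances in @Setup and measures operations such as `isStable` on them.
@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.Throughput))
@OutputTimeUnit(TimeUnit.SECONDS)
class IsStableSketch:
  // Selects which prepared value/type a given run measures, as in the results above.
  @Param(Array("int", "deepSumStable", "deepSumUnstable"))
  var valName: String = ""

  // Placeholder for the value under test; the real benchmark would build a
  // dotty `Type` corresponding to `valName` here.
  var subject: String = ""

  @Setup
  def setup(): Unit =
    subject = valName * 4

  @Benchmark
  def isStableBenchmark(): Boolean =
    // Placeholder workload standing in for `subject.isStable`.
    subject.nonEmpty
```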