This A/B plot seems to highlight extremely severe regressions in memory usage for three p2p-disk tests:
The bar charts, however, tell us a completely different story:
While it is true that these tests more than doubled their memory usage, double a negligible amount is still negligible.
Find a smart way to remove the false positive from the A/B test. We still want to measure RAM usage, in case something goes wrong and negligible use becomes non-negligible. Maybe decide that if both median measures are below a threshold (2 GiB/worker?) the bar should be suppressed.
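A minimal sketch of the threshold idea, to make the proposal concrete. The function name `relative_change`, the way samples are passed in, and the 2 GiB default are all assumptions for illustration and are not taken from the existing A/B plotting code. It returns `None` when both medians are below the threshold, which a plotting layer could treat as "suppress this bar":

```python
import math
from statistics import median

GIB = 2**30  # bytes per GiB


def relative_change(baseline, candidate, threshold=2 * GIB):
    """Relative change of the candidate median vs. the baseline median.

    Hypothetical helper: returns None when BOTH medians are below
    `threshold`, so that the caller can hide the bar. A regression from
    negligible to non-negligible usage is still reported, because then
    the candidate median is no longer below the threshold.
    """
    med_a = median(baseline)
    med_b = median(candidate)
    if med_a < threshold and med_b < threshold:
        return None  # both negligible: suppress the bar
    if med_a == 0:
        return math.inf  # avoid dividing by a zero baseline
    return (med_b - med_a) / med_a


# 0.1 GiB -> 0.25 GiB per worker: more than doubled, but still negligible
small = relative_change([0.1 * GIB] * 3, [0.25 * GIB] * 3)
# 4 GiB -> 6 GiB per worker: a 50% increase that should stay on the plot
large = relative_change([4 * GIB] * 3, [6 * GIB] * 3)
```

With this rule, a jump from 0.1 GiB to 3 GiB would still be shown, since only the baseline median is below the threshold.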
The recently added measures for
- Avg CPU (scheduler)
- Max tick (worker)
- Max tick (scheduler)
suffer from the same problem. We clearly don't care if average CPU changed from, e.g., 10% to 15%, or if the tick went up from 50 ms to 75 ms, but on the A/B plot both show up as 50% increases:
I believe you are talking about two different problems.
The first one, about p2p-disk, concerns changes that are large in relative terms but that you do not consider meaningful in absolute terms. I agree that this is misleading, but I also think it's not that bad, since every major difference should be double-checked anyway.
The second one, about the newly added metrics (tick duration, etc.), is rather a problem with our statistical evaluation. These quantities have a substantial measurement error that we are not accounting for. I never checked the math, but with such large variations there shouldn't be any signal.
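One way to check whether a difference stands out from the measurement noise is a permutation test. The sketch below is not what the A/B report currently does; the function name `permutation_p_value`, the choice of the difference of means as the test statistic, and the sample values are all assumptions for illustration:

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation test on the difference of means.

    Hypothetical helper: shuffles the pooled samples and counts how often
    a random split gives a difference at least as large as the observed
    one. A large p-value means the change is indistinguishable from noise.
    """
    rng = random.Random(seed)  # fixed seed for reproducible results
    observed = abs(mean(b) - mean(a))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_a:]) - mean(pooled[:n_a]))
        if diff >= observed:
            hits += 1
    # +1 correction so the estimated p-value is never exactly zero
    return (hits + 1) / (n_resamples + 1)


# Made-up max tick samples in ms: wide spread, runs overlap heavily
p_noisy = permutation_p_value([50, 90, 40, 120, 60], [75, 45, 110, 55, 95])
# Made-up samples with a tight spread and a clear 50 ms -> 75 ms shift
p_clear = permutation_p_value([50, 51, 49, 50, 52], [75, 76, 74, 77, 75])
```

A bar could then be drawn only when the p-value falls below some chosen cutoff, which would address the noise problem separately from the absolute-threshold problem above.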