A/B plot is misleading for p2p-disk memory usage and tick measures #1240

Open
crusaderky opened this issue Dec 19, 2023 · 1 comment

Comments

@crusaderky
Contributor

This A/B plot seems to highlight extremely severe regressions in memory usage for three p2p-disk tests:

image

The bar charts, however, tell us a completely different story:
image

While it is true that these tests more than doubled their memory usage, double a negligible amount is still negligible.

Find a smart way to remove the false positive from the A/B test. We still want to measure RAM usage, in case something goes wrong and negligible use becomes non-negligible. Maybe decide that if both median measures are below a threshold (2 GiB/worker?) the bar should be suppressed.
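A minimal sketch of the threshold idea, not taken from the benchmark code: the function name `relative_change` and its parameters are hypothetical, and the 2 GiB/worker cutoff is only the tentative value floated above. The bar is dropped only when both medians are below the threshold, so a run where negligible use becomes non-negligible is still reported.

```python
GIB = 2**30


def relative_change(median_a, median_b, threshold):
    """Relative change from A to B, or None if the bar should be suppressed.

    Hypothetical helper: the bar is suppressed only when BOTH medians are
    below the threshold, so a jump from negligible to non-negligible
    still shows up.
    """
    if median_a < threshold and median_b < threshold:
        return None
    if median_a == 0:
        # Avoid dividing by zero when the baseline measured nothing at all.
        return float("inf")
    return (median_b - median_a) / median_a


# Both medians negligible: the bar is suppressed even though usage more than doubled.
print(relative_change(0.1 * GIB, 0.25 * GIB, threshold=2 * GIB))  # None
# Both medians above the threshold: reported as usual.
print(relative_change(4 * GIB, 6 * GIB, threshold=2 * GIB))  # 0.5
# Negligible use became non-negligible: still reported.
print(relative_change(0.5 * GIB, 3 * GIB, threshold=2 * GIB))  # 5.0
```

The same shape would work for the CPU and tick measures with a per-measure threshold, though suitable cutoffs for those are not settled here.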

The recently added measures for

  • Avg CPU (scheduler)
  • Max tick (worker)
  • Max tick (scheduler)

suffer from the same problem. We clearly don't care if avg CPU changed, e.g., from 10% to 15%, or if the max tick went up from 50 ms to 75 ms, but in the A/B plot both show up as 50% increases:

image
image

@fjetter
Member

fjetter commented Dec 19, 2023

I believe you are talking about two different problems.

The first one, about p2p-disk, concerns absolute changes that you do not consider meaningful. I agree that this is misleading, but I also think it's not bad, since every major difference should be double-checked.

Regarding the newly added metrics (tick duration, etc.), this is rather a problem with our statistical evaluation. These quantities have a substantial measurement error that we are not accounting for. I never checked the math, but with such large variations there shouldn't be any signal.
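As an illustration of what accounting for the measurement error could look like, here is a sketch of a two-sided permutation test on the difference in means between the A and B samples. This is an assumption about one possible approach, not what the A/B evaluation currently does; `permutation_p_value` and the sample values are made up for the example.

```python
import random
import statistics


def permutation_p_value(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation test on the difference in means.

    Hypothetical helper: returns the fraction of random relabellings of the
    pooled samples whose mean difference is at least as large as the
    observed one. A large value means the observed change is
    indistinguishable from run-to-run noise.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(b) - statistics.mean(a))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(
            statistics.mean(pooled[len(a):]) - statistics.mean(pooled[:len(a)])
        )
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Made-up max tick samples in ms: large spread, small shift between A and B.
noisy_a = [50, 120, 40, 90, 60, 75]
noisy_b = [75, 55, 130, 45, 95, 65]
print(permutation_p_value(noisy_a, noisy_b))

# Made-up samples with little spread and a large shift.
clean_a = [50, 52, 48, 51, 49, 50, 53, 47]
clean_b = [150, 152, 148, 151, 149, 150, 153, 147]
print(permutation_p_value(clean_a, clean_b))
```

With the noisy samples the p-value comes out large, so the plot would have no basis to flag a change; with the tightly clustered samples it comes out very small. Whether the A/B runs collect enough repeats per test for this to be meaningful is an open question.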
