Remove estimations where score data is available for osu! difficulty calculations #27691

Finadoggie · 2024-03-22T06:47:17Z

This PR replaces estimations with score data where applicable for Lazer scores. More specifically, it…

Modifies estimations for slider breaks
Removes estimations for slider end drops

For removing slider break estimations:

HitResult.Miss includes both misses and slider head misses, accounting for two types of combo breaks
HitResult.LargeTickMiss includes dropped reverse arrows and dropped slider ticks
For Lazer scores, the contribution of large tick misses is deliberately capped so that a large number of them cannot nuke a score, since buzz sliders and high slider tick rates can easily produce many large tick misses
Scores with no large tick misses will have no estimated misses, removing an issue where lazer scores were unfairly punished for theoretical slider breaks that were demonstrably not present
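
The points above can be sketched roughly as follows. This is a minimal illustration, not the merged implementation: the class and method names are hypothetical, and the combo-based estimate (`fullComboThreshold / scoreMaxCombo`) is an assumption about how the existing estimation behaves. Only `fullComboThreshold = attributes.MaxCombo - countSliderEndsDropped` is taken from the diff quoted in the review comments of this PR.

```csharp
using System;

// Hypothetical helper for illustration only; not the code in OsuPerformanceCalculator.cs.
public static class EffectiveMissCountSketch
{
    public static double Calculate(int maxCombo, int scoreMaxCombo, int countMiss, int countLargeTickMiss, int countSliderEndsDropped)
    {
        // Dropped slider ends don't add combo but also don't break it,
        // so they lower the combo that a full-combo score would reach.
        double fullComboThreshold = maxCombo - countSliderEndsDropped;

        double estimate = countMiss;

        // Assumed estimation: a max combo below the threshold implies combo breaks.
        if (scoreMaxCombo < fullComboThreshold)
            estimate = fullComboThreshold / Math.Max(1.0, scoreMaxCombo);

        // Cap the estimate by the recorded combo-breaking judgements,
        // so a score without large tick misses gets no estimated misses.
        estimate = Math.Min(estimate, countMiss + countLargeTickMiss);

        // Never report fewer than the actual misses.
        return Math.Max(countMiss, estimate);
    }
}
```

Under these assumptions, a score with one miss and no large tick misses always evaluates to exactly 1, however low its max combo is. A score with many large tick misses is limited by the combo-based estimate rather than by the raw tick-miss count.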

For removing slider end drop estimations:

Slider end drops and large tick misses are used in place of estimated slider end drops
Calculations which use estimated slider end drops are otherwise left untouched
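
As a rough sketch of what this means for the slider nerf, the expressions below follow the non-classic branch of the diff quoted in the review comments of this PR. The wrapper class, the method signature and the guard against a zero `estimateDifficultSliders` are assumptions added to keep the example self-contained; how `estimateDifficultSliders` is derived is not shown in this thread, so it is passed in as a parameter.

```csharp
using System;

// Hypothetical wrapper for illustration only.
public static class SliderNerfSketch
{
    public static double SliderNerfFactor(double sliderFactor, double estimateDifficultSliders, int countSliderEndsDropped, int countSliderTickMiss)
    {
        // Assumption: with no difficult sliders there is nothing to nerf.
        if (estimateDifficultSliders <= 0)
            return 1.0;

        // Recorded slider end drops and tick misses replace the estimated count,
        // limited to the number of sliders considered difficult.
        double improperlyFollowed = Math.Min(countSliderEndsDropped + countSliderTickMiss, estimateDifficultSliders);

        // Same shape as the quoted sliderNerfFactor expression:
        // 1.0 when every difficult slider was followed, sliderFactor when none were.
        return (1 - sliderFactor) * Math.Pow(1 - improperlyFollowed / estimateDifficultSliders, 3) + sliderFactor;
    }
}
```

For example, with `sliderFactor = 0.9` and 30 difficult sliders, dropping 3 slider ends gives a factor of roughly 0.973, while dropping 30 or more gives 0.9.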

Notes:

Classic mod is not being considered as ppy has previously said that pp for classic mod should match stable

=== original post ===
(contains outdated info)

On non-CL lazer scores, extra slider data means certain estimations are no longer required.

For effective miss count, HitResult.Miss + HitResult.LargeTickMiss is used.

For sliderends dropped, HitResult.SmallTickMiss is used.

To my understanding, LargeTickMiss includes slider ticks and reverse arrows, the two parts that can break your combo, while SmallTickMiss is for sliderends. Please correct me if I am wrong.

Score data for non-CL scores includes sliderends dropped, meaning no need to estimate. CL scores are still estimated.

bdach · 2024-03-22T06:55:04Z

I am not sure the "available" score data in question can be used. We've previously expressed worry that this was going to spiral into two separate implementations of pp for classic and non-classic scores.

Finadoggie · 2024-03-22T08:28:31Z

My bad, I was not aware of these conversations. Could you drop a link to them so I can read through them?

bdach · 2024-03-22T09:16:44Z

#21938 would be a good starting point I guess

Natelytle · 2024-03-22T16:09:44Z

Stan's decision of splitting the calculator into 2 parts for classic and non-classic is a direction I personally dislike, and the way Fina is doing it here is the way I would approach it myself. I don't see any harm in using the new metrics, and PP spiraling into 2 calculators should only happen if per note judgements ever become available, and prove to give much better results than without them. These new metrics are easily estimable, and the existing estimations can just be made harsher if they prove to give an advantage to stable scores.

Givikap120 · 2024-03-22T16:54:26Z

Why is this only using additional data if slideracc is enabled?
Lazer scores with the CL mod also have additional data that should be used to get a more accurate result.

Flamiii · 2024-03-22T20:06:07Z

The implications of a change like this are massive. If I set scores with sliderbreaks, I will only ever be punished more on lazer than I would be if I played on stable. I play on lazer as my main client, but a change like this on its own would make me consider moving back to stable because I don't want to play with something like this that is just a clear disadvantage. You could try tuning the estimations for stable scores to be more harsh, but I doubt people would be happy with that and I'm honestly not sure if that's even possible. The complexity to balance this kind of change is so high that I don't think it's worth the trouble to begin with.

Finadoggie · 2024-03-22T20:13:52Z

@Flamiii you’re already at an objective disadvantage if you play on lazer. This arguably helps you since it means miss estimations can no longer assume you may have sliderbroke where you didn’t. Any non-cl 1 miss plays where the break was in the middle are buffed. Same for plays on hard slider maps where you broke but didn’t drop any sliderends.

Flamiii · 2024-03-22T20:51:13Z

If by "objective disadvantage" you mean slideracc, I play on lazer because I'm certain that will be taken into account in the future (#27063). The scenario I'm thinking of which happens quite a bit is the one where a player misses near the end of a map but also sliderbreaks 2 times afterwards. In stable, this play is counted as a 1 miss and the sliderbreaks only negatively impact your accuracy. With these changes in lazer, this play would be treated as if it was a 3 miss. As far as I know, there really isn't a way to adjust stable's estimation to make this scenario balanced because the relevant information just isn't there. There are some situations where you're punished a bit less, but the scenario I mentioned is extremely common and you would be punished way more for playing on lazer with these changes.

Edit: It's also worth noting that using HitResults.LargeTickMiss to increase the effective miss count means that missing buzz sliders or any fast sliders with many sliderticks will be incredibly punishing. For example, misaiming a buzz slider with 5 repeats and missing it entirely would increase the effective miss count by 6, which would basically kill any score.
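
To make the arithmetic concrete, here is a minimal sketch of the uncapped sum described in the original post. The class name is hypothetical, and it assumes the slider head is recorded as a `Miss` and each missed reverse arrow as one `LargeTickMiss`.

```csharp
using System;

// Hypothetical illustration of the uncapped approach: every combo-breaking
// judgement is counted as a full miss.
public static class NaiveMissCountSketch
{
    public static int Calculate(int countMiss, int countLargeTickMiss) => countMiss + countLargeTickMiss;
}
```

A fully missed buzz slider with 5 repeats would be `Calculate(1, 5)`, which is 6. A single miss followed by two slider breaks on ticks would be `Calculate(1, 2)`, which is 3.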

Rekunan · 2024-03-22T22:07:54Z

> there really isn't a way to adjust stable's estimation

There actually is a way, if you take a look at the function in question, we could, for example, lower fullComboThreshold to make it assume there were more misses than before, and if you look at the changes in this pr, this function is only used if CL with slideracc is on.

> misaiming a buzz slider with 5 repeats ... would basically kill any score

This is a valid point, however, I can assure you that future work will be done to account for this.

Flamiii · 2024-03-22T22:41:46Z

> There actually is a way, if you take a look at the function in question, we could, for example, lower fullComboThreshold to make it assume there were more misses than before, and if you look at the changes in this pr, this function is only used if CL with slideracc is on.

I see. As long as these changes are balanced properly, I have no issues with them.

Finadoggie · 2024-03-22T23:45:26Z

shoot I hit the wrong button

Finadoggie · 2024-03-22T23:49:08Z

> The scenario I'm thinking of which happens quite a bit is the one where a player misses near the end of a map but also sliderbreaks 2 times afterwards.

Ok but again this is already a thing. Sliderbreaks by missing the slider head are already counted as misses in lazer, so this scenario can already happen without any pp changes.

> It's also worth noting that using HitResults.LargeTickMiss to increase the effective miss count means that missing buzz sliders or any fast sliders with many sliderticks will be incredibly punishing. For example, misaiming a buzz slider with 5 repeats and missing it entirely would increase the effective miss count by 6, which would basically kill any score.

I know, I'm more apprehensive about that part and wanted to spark discussion for that specifically.
Also buzz sliders are a lot more forgiving in lazer anyways so a late tap leading to complete score annihilation isn't something I'm worried about.

The crux of the matter is, some scores will be punished by being unable to slip through the cracks. That's inevitable. The gain of scores no longer being undeservingly punished from estimations counteracts this and, at least in my personal opinion, is a fair trade off.

@Flamiii

After letting the comments @Flamiii left brew for a while, I realized they were very much right about the buzz slider thing. As such, I've implemented a quick and dirty untested fix that will hopefully have zero unintended side-effects :) I don't see this as a permanent or final solution yet. There's definitely some potential issues/inaccuracies that could arise with maps like Notch Hell or IOException's Black Rover, but afaik this implementation would not cause any issues that stable doesn't already have.

peppy · 2024-10-21T05:52:11Z

!diffcalc

Finadoggie · 2024-10-21T05:58:39Z

Updated the original post to hopefully make the goals of the PR more clear

Finadoggie · 2024-10-21T07:14:02Z

@smoogipoo multiple people have told me that a unit test is no longer required since I've cleaned up and simplified everything, do you still want me to write one?

smoogipoo · 2024-10-21T08:05:59Z

This is honestly harder to read now than before, now that it has 4 separate places individually dealing with "usingClassicSliderAccuracy".

minisbett · 2024-10-21T08:15:58Z

> This is honestly harder to read now than before, now that it has 4 separate places individually dealing with "usingClassicSliderAccuracy".

I honestly think it's fine, at least to me the usingClassicSliderAccuracy checks seem intuitive, even if used in multiple places. Do you think separating calculateEffectiveMisscount for slider-acc and non-slider-acc makes sense?

minisbett · 2024-10-21T10:08:25Z

osu.Game.Rulesets.Osu/Difficulty/OsuPerformanceCalculator.cs

+                {
+                    double fullComboThreshold = attributes.MaxCombo - countSliderEndsDropped;


Suggested change

                 {
+                    // Consider that full combo is maximum combo minus dropped slider tails since they don't contribute to combo but also don't break it
                     double fullComboThreshold = attributes.MaxCombo - countSliderEndsDropped;

A comment to stay consistent with the other if-branch would be nice

Repeating the same comment is redundant

fair, just wanted to mention it in case it is desired

minisbett · 2024-10-21T10:09:37Z

osu.Game.Rulesets.Osu/Difficulty/OsuPerformanceCalculator.cs

+        /// <summary>
+        /// Missed slider ticks that includes missed reverse arrows. Will only be correct on non-classic scores
+        /// </summary>
+        private int countSliderTickMiss;


I feel like renaming this to countLargeTickMiss makes it a lot clearer what it is because, despite the comment, reading it anywhere else in the code could cause confusion... I don't see why it shouldn't be named 1:1 after the hit result it represents.

"LargeTick" does not represent anything on its own unless you're aware of what it actually contains

I feel like it's still a more meaningful term since it matches the hit result and also the "Large Tick Misses" field I added to osu-tools GUI and soon to CLI.

minisbett · 2024-10-21T10:23:21Z

osu.Game.Rulesets.Osu/Difficulty/OsuPerformanceCalculator.cs

+        /// <summary>
+        /// Amount of missed slider tails that don't break combo. Will only be correct on non-classic scores
+        /// </summary>
+        private int countSliderEndsDropped;


Since slider-related code uses the term "tail" (and the hit result does too), I'd suggest renaming this to countSliderTailMiss or countSliderTailsDropped. I'd prefer to consider "miss" the opposite of hit even for slider tails but calling them "dropped" is arguable

minisbett · 2024-10-21T10:23:36Z

osu.Game.Rulesets.Osu/Difficulty/OsuPerformanceCalculator.cs

+                double estimateImproperlyFollowedDifficultSliders;
+
+                if (usingClassicSliderAccuracy)
+                {
+                    // When the score is considered classic (regardless if it was made on old client or not) we consider all missing combo to be dropped difficult sliders
+                    int maximumPossibleDroppedSliders = totalImperfectHits;
+                    estimateImproperlyFollowedDifficultSliders = Math.Clamp(Math.Min(maximumPossibleDroppedSliders, attributes.MaxCombo - scoreMaxCombo), 0, estimateDifficultSliders);
+                }
+                else
+                {
+                    // We add tick misses here since they too mean that the player didn't follow the slider properly
+                    // We however aren't adding misses here because missing slider heads has a harsh penalty by itself and doesn't mean that the rest of the slider wasn't followed properly
+                    estimateImproperlyFollowedDifficultSliders = Math.Min(countSliderEndsDropped + countSliderTickMiss, estimateDifficultSliders);
+                }
+
+                double sliderNerfFactor = (1 - attributes.SliderFactor) * Math.Pow(1 - estimateImproperlyFollowedDifficultSliders / estimateDifficultSliders, 3) + attributes.SliderFactor;


Suggested change

-                double estimateImproperlyFollowedDifficultSliders;
+                double improperlyFollowedDifficultSliders;

                 if (usingClassicSliderAccuracy)
                 {
                     // When the score is considered classic (regardless if it was made on old client or not) we consider all missing combo to be dropped difficult sliders
                     int maximumPossibleDroppedSliders = totalImperfectHits;
-                    estimateImproperlyFollowedDifficultSliders = Math.Clamp(Math.Min(maximumPossibleDroppedSliders, attributes.MaxCombo - scoreMaxCombo), 0, estimateDifficultSliders);
+                    improperlyFollowedDifficultSliders = Math.Clamp(Math.Min(maximumPossibleDroppedSliders, attributes.MaxCombo - scoreMaxCombo), 0, estimateDifficultSliders);
                 }
                 else
                 {
                     // We add tick misses here since they too mean that the player didn't follow the slider properly
                     // We however aren't adding misses here because missing slider heads has a harsh penalty by itself and doesn't mean that the rest of the slider wasn't followed properly
-                    estimateImproperlyFollowedDifficultSliders = Math.Min(countSliderEndsDropped + countSliderTickMiss, estimateDifficultSliders);
+                    improperlyFollowedDifficultSliders = Math.Min(countSliderEndsDropped + countSliderTickMiss, estimateDifficultSliders);
                 }

-                double sliderNerfFactor = (1 - attributes.SliderFactor) * Math.Pow(1 - estimateImproperlyFollowedDifficultSliders / estimateDifficultSliders, 3) + attributes.SliderFactor;
+                double sliderNerfFactor = (1 - attributes.SliderFactor) * Math.Pow(1 - improperlyFollowedDifficultSliders / estimateDifficultSliders, 3) + attributes.SliderFactor;

The variable name is quite long, and since the value is not always estimated, the "estimate" prefix doesn't add much meaning beyond making the code more straining to read.

I prefer longer variable names that encapsulate what it actually is instead of hiding the fact that it can be estimated (and will be for billions of scores)

hmm idk, I feel like there isn't much difference between "estimated" when it can or cannot be vs. not saying it when it can or cannot be

github-actions · 2024-10-21T10:34:44Z

Target: #27691
Spreadsheet: https://docs.google.com/spreadsheets/d/1vh1XOA0udwR48ppRILVAC6TIsDkfr5Y4WKKIzYpIu5c/edit

tsunyoku · 2024-10-21T10:38:07Z

we'll need a new sheet following recent commits, 98800fe will have resulted in value changes even before StanR's refactors

smoogipoo · 2024-10-22T02:14:22Z

!diffcalc

github-actions · 2024-10-22T10:12:10Z

Target: #27691
Spreadsheet: https://docs.google.com/spreadsheets/d/1pwt7d5ZcHz4uTQQREgCLQ06EqL5nhcxmyMF7ZcczqVg/edit

smoogipoo · 2024-10-22T10:14:52Z

@ppy/osu-pp-committee please check above sheet!

stanriders · 2024-10-22T10:18:10Z

@smoogipoo all good!

apollo-dw

one more for luck :]

smoogipoo

Reads fine to me now

Finadoggie and others added 8 commits March 21, 2024 19:02

Make length bonus account for sliders, use proper misscount for classic

941c048

Use actual sliderends dropped instead of estimating

4db6f28

Score data for non-CL scores includes sliderends dropped, meaning no need to estimate. CL scores are still estimated.

Revert "Make length bonus account for sliders, use proper misscount f…

3dafdc0

…or classic" This reverts commit 941c048.

Use miss count for effective miss count

8408455

No need to estimate misses for non-CL scores.

Merge pull request #1 from Finadoggie/miss-count-fix

12afa8d

Miss count fix

Merge branch 'estimation-removal' into dropped-tail-fix

eb30b4a

Merge pull request #2 from Finadoggie/dropped-tail-fix

c9e3c10

Use actual sliderends dropped instead of estimating

Update OsuPerformanceCalculator.cs

b0d20e6

pull-request-size bot added the size/S label Mar 22, 2024

Add slider ticks and reverse arrows to effective misscount

6fe478c

Very much open to discussion on if these should be weighed differently

bdach added the area:difficulty label Mar 22, 2024

Finadoggie closed this Mar 22, 2024

Finadoggie reopened this Mar 22, 2024

Finadoggie and others added 4 commits March 23, 2024 14:27

Merge branch 'master' into estimation-removal

4f5f0e5

Use sliderend data for all non-legacy scores

58bc184

As per suggestion by givikap, I was not aware that non-legacy cl scores stored this data

Merge branch 'estimation-removal' of https://github.com/Finadoggie/osu …

c24f99e

…into estimation-removal

Finadoggie marked this pull request as draft April 11, 2024 17:28

Only clamp estimated miss count with relevant statistics

5907c2a

minisbett suggested changes Oct 21, 2024

Fix variables being used before being assigned

98800fe

slightly miffed by the lack of build errors but oh well

smoogipoo mentioned this pull request Oct 21, 2024

Change convert-to-ternary warning to hint #30373

Merged

Refactor and add comments

bcb9970

pull-request-size bot added size/M and removed size/S labels Oct 21, 2024

Fix effectiveMissCount being calculated wrong

acf282d

peppy added the next release label Oct 21, 2024

minisbett suggested changes Oct 21, 2024

smoogipoo mentioned this pull request Oct 21, 2024

Score processing command updates for upcoming (and future PP deploys) ppy/osu-queue-score-statistics#284

Merged

tsunyoku approved these changes Oct 22, 2024

stanriders approved these changes Oct 22, 2024

apollo-dw approved these changes Oct 22, 2024

smoogipoo approved these changes Oct 22, 2024

smoogipoo merged commit 71eb712 into ppy:master Oct 22, 2024
9 of 13 checks passed

tsunyoku mentioned this pull request Oct 22, 2024

Fix Slideraim penalty for missing slider ends #29991

Closed

Finadoggie deleted the estimation-removal branch October 24, 2024 23:09

Rekunan mentioned this pull request Nov 8, 2024

Clamp estimateImproperlyFollowedDifficultSliders for lazer scores #30544

Merged
