Metrics support for sweeping #148
I think it is a great addition.
Yeah, maybe something like this: https://ax.dev/tutorials/tune_cnn.html. But I do imagine that any sort of sweeping requires us to be able to a) select the target metric and b) compare two runs to see if the metric …
@maximsch2 what do you think about something like this:

```python
from copy import deepcopy

import torch
from torchmetrics import MeanSquaredError  # assuming the torchmetrics MSE metric


# Registry mapping a metric class to its optimization behaviour: whether it
# should be minimized, how to compare two values, the initial "worst" value,
# and (optionally) which output to compare for multi-output metrics.
_REGISTER = {}


def register(metric, minimize, index=None):
    if minimize:
        compare_fn = torch.less
        init_val = torch.tensor(float("inf"))
    else:
        compare_fn = torch.greater
        init_val = -torch.tensor(float("inf"))
    _REGISTER[metric] = (minimize, compare_fn, init_val, index)


register(MeanSquaredError, True)


class MetricCompare:
    """Wrapper that tracks whether the wrapped metric has improved."""

    def __init__(self, metric):
        self.base_metric = metric
        minimize, compare_fn, init_val, index = _REGISTER[type(metric)]
        self._minimize = minimize
        self._compare_fn = compare_fn
        self._index = index
        self._init_val = init_val
        self._new_val = deepcopy(init_val)
        self._old_val = deepcopy(init_val)

    def update(self, *args, **kwargs):
        self.base_metric.update(*args, **kwargs)

    def compute(self):
        self._old_val = self._new_val
        val = self.base_metric.compute()
        self._new_val = val.detach()
        return val

    def reset(self):
        self.base_metric.reset()
        self._new_val = deepcopy(self._init_val)
        self._old_val = deepcopy(self._init_val)

    @property
    def has_improved(self):
        if self._index is None:
            return self._compare_fn(self._new_val, self._old_val)
        return self._compare_fn(self._new_val[self._index], self._old_val[self._index])

    @property
    def minimize(self):
        return self._minimize

    @property
    def maximize(self):
        return not self.minimize


metric = MetricCompare(MeanSquaredError())
metric.update(torch.randn(100), torch.randn(100))
val = metric.compute()
print(metric.has_improved)
```

This is basically a wrapper for metrics that adds additional properties that can tell if the metric should be minimized/maximized, and after …
Usually sweeps will be run in a distributed fashion (e.g. schedule runs with different hyperparams separately, compute metric values, pick the one with the best metric), so … Thinking about it a bit more, just providing a way to convert a metric to an optimization value might be enough (with a semantics of whether we are increasing or decreasing it). Another example of a package for hyperparameter optimization that also takes an objective: http://hyperopt.github.io/hyperopt/
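A minimal sketch, assuming hyperopt, of what "converting a metric to an optimization value" could look like; the `metric_to_objective` helper and the toy objective below are illustrative, not an API proposed in this issue:

```python
import torch
from hyperopt import fmin, hp, tpe
from torchmetrics import MeanSquaredError


def metric_to_objective(value, minimize):
    # hyperopt's fmin always minimizes, so flip the sign for metrics
    # that are meant to be maximized.
    return float(value) if minimize else -float(value)


def objective(params):
    # Stand-in for a real training run: score random predictions scaled
    # by the sampled hyperparameter with a torchmetrics metric.
    metric = MeanSquaredError()
    metric.update(params["scale"] * torch.randn(100), torch.randn(100))
    return metric_to_objective(metric.compute(), minimize=True)


best = fmin(objective, space={"scale": hp.uniform("scale", 0.1, 2.0)},
            algo=tpe.suggest, max_evals=20)
print(best)
```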
I'd like to see this implemented as well. We're using PL + Optuna (+ Hydra's plugin_sweeper_optuna) and running into the same problem, esp. when a metric of a model is configurable. I think the approach with a property … While the solutions with wrappers work, I think it'd be good if PL somehow standardized this, so that other HP optimization libraries can integrate with it.
Okay, then let's settle on adding a property to each metric.
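A rough sketch of how such a per-metric property could feed an HP optimization library such as Optuna; the `higher_is_better` attribute is exactly the property being proposed here, so it is attached by hand in this snippet rather than coming from the library:

```python
import optuna
from torchmetrics import MeanSquaredError

metric = MeanSquaredError()
# Proposed per-metric property; set manually here, since standardizing it
# is what this issue asks for.
metric.higher_is_better = False

study = optuna.create_study(
    direction="maximize" if metric.higher_is_better else "minimize"
)
# study.optimize(objective, n_trials=50)  # objective would return metric.compute()
```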
add …
I.e. Optuna lets you define a tuple …
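Presumably this refers to Optuna's multi-objective studies, where a tuple of directions is declared and the objective returns a matching tuple of values; a small sketch under that assumption, with placeholder values standing in for real metric outputs:

```python
import optuna

# One direction per value returned by the objective.
study = optuna.create_study(directions=["maximize", "maximize"])


def objective(trial):
    threshold = trial.suggest_float("threshold", 0.0, 1.0)
    # Placeholders standing in for, e.g., precision and recall at the
    # suggested threshold.
    precision = 1.0 - threshold / 2
    recall = threshold
    return precision, recall


study.optimize(objective, n_trials=10)
```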
I'd say we don't care for the first iteration and just leave these as None. And we cannot decide anyway on the Pareto-optimal front. … And you probably meant a multi-dim metric's output, not multi-dim optimization, right? Can we say that for the first draft, this feature works only for metrics that …
For multi-output metrics we need the ability to extract the value that is actually being optimized over. E.g. some metrics can return a value and the corresponding threshold (e.g. recall@precision=90%), and we only want to optimize over the actual value.
@maximsch2 @breznak @SkafteNicki how is it going here? Do we have a resolution on what to do?
I think we got stuck on the more advanced cases (e.g. metrics that return more values, as above). While I see it's important to design it well so it works for all use cases in the future, I think we should find an MVP that we can easily implement, otherwise this will likely get stuck. In practice, what we're running into is that this would ideally be a coordinated "API" for …
Could you elaborate on this example, please, @maximsch2? From what I understand, the metric returns multiple values for several thresholds. But wouldn't the direction still be the same for all of them? (recall -> max?)
@breznak since …
I think what @maximsch2 is referring to is that some metrics return multiple values, e.g. `precision, recall, thresholds = pr_curve(pred, target)`, where I basically want to optimize the precision/recall but not the threshold values.
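A possible way to express "which output to optimize" for such a multi-output metric, reusing the optional `index` idea from the registry snippet above; the `optimization_value` helper and the placeholder tensors are purely illustrative:

```python
import torch


def optimization_value(metric_output, index=None, reduce=torch.mean):
    # Pick the output the sweep should optimize over and reduce it to a
    # scalar; the other outputs (e.g. the thresholds) are ignored.
    value = metric_output if index is None else metric_output[index]
    return reduce(value)


# Stand-ins for the tensors a precision-recall-curve style metric returns.
precision = torch.tensor([0.9, 0.8, 0.7])
recall = torch.tensor([0.5, 0.7, 0.9])
thresholds = torch.tensor([0.3, 0.5, 0.7])

# Optimize over recall (index 1), ignoring the thresholds.
print(optimization_value((precision, recall, thresholds), index=1))
```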
Good to know, thanks! Then it should be easier.
How about adding a "tell us what the (1) optimization criterion is for you" to the metric, then?
I'm actually thinking that maybe let's defer the multi-output metrics to later, as long as we can support those in …
I'm for starting small, but doing it rather soon.
🚀 Feature

We would like to have tighter integration of metrics and sweeping. This requires a few features:

- `higher_is_better` (e.g. are we trying to minimize or maximize the metric in a sweep)

Alternatives

An alternative implementation would be for each metric to have `is_better(left: TMetricResult, right: TMetricResult)`, where `TMetricResult` is whatever `compute` returns.

If we don't have it, people will have to write wrappers around the metrics to support this functionality in sweepers.
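A rough sketch of what the two options could look like on a hypothetical metric class; both `higher_is_better` and `is_better` are the proposals from this issue, written out by hand here rather than relying on existing library support:

```python
import torch
from torchmetrics import Metric


class MeanError(Metric):
    # Proposed attribute: sweepers read this to pick the optimization direction.
    higher_is_better = False

    def __init__(self):
        super().__init__()
        self.add_state("sum_error", default=torch.tensor(0.0), dist_reduce_fx="sum")
        self.add_state("total", default=torch.tensor(0), dist_reduce_fx="sum")

    def update(self, preds, target):
        self.sum_error += torch.abs(preds - target).sum()
        self.total += target.numel()

    def compute(self):
        return self.sum_error / self.total

    # Alternative proposal: compare two computed results directly.
    def is_better(self, left, right):
        return left > right if self.higher_is_better else left < right
```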