
[Question] Why is true negative represented by 'n' in the classification matrix? #62

Closed
neomatrix369 opened this issue Oct 13, 2020 · 12 comments
Labels
documentation (Improvements or additions to documentation)

Comments

@neomatrix369
Contributor

Describe the bug

In the confusion matrix:

Class                           n          tp          fn          fp      recall        prec          f1
Iris-versicolor                16          15           1           0       0.938       1.000       0.968
Iris-virginica                 15          15           0           1       1.000       0.938       0.968
Iris-setosa                    14          14           0           0       1.000       1.000       1.000
Total                          45          44           1           1

The title/label for True negative is shown as n instead of tn

Expected behaviour

Most documentation on confusion matrices that I have seen so far represents it as tn.

It might cause doubts for those who are aware of the standard representations, especially since the dependent metrics like recall, precision, f1, accuracy, etc. are built from these base metrics (and true negatives are one of them).

@neomatrix369 neomatrix369 added the bug (Something isn't working) label Oct 13, 2020
@neomatrix369 neomatrix369 changed the title from "[Question] Why is true negative represented by 'n' in the confusion matrix" to "[Question] Why is true negative represented by 'n' in the confusion matrix?" Oct 13, 2020
@Craigacp
Member

Craigacp commented Oct 13, 2020

That's not the confusion matrix. N is the total number of that class in the test set. It doesn't show the true negatives because in multiclass settings it would give a misleading-looking number. True negatives aren't found in a confusion matrix anyway; confusion matrices have the possible labels as both axes.

True negatives aren't used by precision, recall, f1 or accuracy.
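
For reference, the standard per-class definitions (none of which involve tn) are:

\mathrm{precision} = \frac{tp}{tp + fp}, \qquad \mathrm{recall} = \frac{tp}{tp + fn}, \qquad F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}

You can check these against the table above, e.g. for Iris-versicolor: recall = 15/16 = 0.938, precision = 15/15 = 1.000, and f1 = 2 * 0.938 * 1.000 / 1.938 = 0.968.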

@neomatrix369
Contributor Author

That's not the confusion matrix. N is the total number of that class in the test set. It doesn't show the true negatives because in multiclass settings it would give a misleading-looking number. True negatives aren't found in a confusion matrix anyway; confusion matrices have the possible labels as both axes.

True negatives aren't used by precision, recall, f1 or accuracy.

Thanks for the explanation.

In that case, it would be good to have a legend describing what each of the classification metrics means, i.e. n = number of classes, tp = true positive, etc.

I think the current layout (without a legend) would confuse others, as it has confused me.

@neomatrix369
Contributor Author

neomatrix369 commented Oct 13, 2020

How are the accuracy calculations in Tribuo different from those explained here: https://en.wikipedia.org/wiki/Precision_and_recall?

[screenshot of the binary accuracy formula from the Wikipedia article]

@neomatrix369 neomatrix369 changed the title from "[Question] Why is true negative represented by 'n' in the confusion matrix?" to "[Question] Why is true negative represented by 'n' in the classification matrix?" Oct 13, 2020
@neomatrix369
Contributor Author

On an unrelated note, false negatives are a useful metric, but I guess the other 3 metrics already provide the data indirectly - what are your thoughts on this?

@neomatrix369
Contributor Author

Another question:

Class                           n          tp          fn          fp     
Iris-versicolor                16          16           0           1
Iris-virginica                 15          14           1           0 
Iris-setosa                    14          14           0           0
Total                          45          44           1           1

If n = the number of observations per class, and tp, fn, and fp are breakdowns of it, then should n = tp + fn + fp for each row in the above table?

@Craigacp
Member

That's not the confusion matrix. N is the total number of that class in the test set. It doesn't show the true negatives because in multiclass settings it would give a misleading-looking number. True negatives aren't found in a confusion matrix anyway; confusion matrices have the possible labels as both axes.
True negatives aren't used by precision, recall, f1 or accuracy.

Thanks for the explanation.

In that case, it would be good to have a legend describing what each of the classification metrics means, i.e. n = number of classes, tp = true positive, etc.

I think the current layout (without a legend) would confuse others, as it has confused me.

This is the output of the toFormattedString method. We could add another method that emits a legend String, or modify the toFormattedString output. I'd prefer the former (or some kind of documentation change) as pretty soon the legend would be irrelevant noise to anyone using Tribuo for any length of time. What kind of other information would you want in it?

How are the accuracy calculations in Tribuo different from those explained here: https://en.wikipedia.org/wiki/Precision_and_recall?

[screenshot of the binary accuracy formula from the Wikipedia article]

That's the binary classification accuracy. Tribuo treats every classification problem as if it's multiclass, and in multiclass problems accuracy is the sum of the true positives divided by the total number of test examples. We decided early on not to allow any special casing for binary problems, as it made it difficult to do things like moving from two-class sentiment (positive/negative) to three-class sentiment (positive/negative/neutral), because all the code paths would change.
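
To make that concrete with the table in the original report, the Total row gives the numbers directly:

\mathrm{accuracy} = \frac{\sum_{c} tp_c}{N_{\mathrm{test}}} = \frac{15 + 15 + 14}{45} = \frac{44}{45} \approx 0.978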

The methods & lambdas Tribuo uses to calculate the various metrics are here and here.

On an unrelated note, false negatives are a useful metric, but I guess the other 3 metrics already provide the data indirectly - what are your thoughts on this?

We provide the false negatives row-wise as it's hard to break down where the misclassifications are without them. However, in that case it's probably best to print the confusion matrix and look at it directly.

This particular formatted string is the output we use as it's what our data science team wanted to show in their reports, and it's easy to pull out the relevant information without it taking up too much space. All the metrics we've discussed are calculated and can be accessed on the Evaluation object, including the true negatives, so others are welcome to generate their own reporting output. If there are metrics we aren't calculating we're happy to take PRs to add them to the LabelEvaluation, but I think it's a slightly higher bar to get them into the toFormattedString output as every additional metric increases the clutter.
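
As a rough sketch of that (the accessor names here are the ones on ClassifierEvaluation/LabelEvaluation, but treat this as illustrative and check the javadoc rather than as an exact recipe):

import org.tribuo.Dataset;
import org.tribuo.Model;
import org.tribuo.classification.Label;
import org.tribuo.classification.evaluation.LabelEvaluation;
import org.tribuo.classification.evaluation.LabelEvaluator;

public class CustomReport {
    public static void print(Model<Label> model, Dataset<Label> testData) {
        LabelEvaluation evaluation = new LabelEvaluator().evaluate(model, testData);
        Label versicolor = new Label("Iris-versicolor");
        // Per-label counts, including the true negatives that toFormattedString doesn't print.
        double tp = evaluation.tp(versicolor);
        double fn = evaluation.fn(versicolor);
        double fp = evaluation.fp(versicolor);
        double tn = evaluation.tn(versicolor);
        // Derived per-label metrics and the overall accuracy.
        double precision = evaluation.precision(versicolor);
        double recall = evaluation.recall(versicolor);
        double f1 = evaluation.f1(versicolor);
        double accuracy = evaluation.accuracy();
        System.out.printf("Iris-versicolor: tp=%.0f fn=%.0f fp=%.0f tn=%.0f prec=%.3f recall=%.3f f1=%.3f%n",
                tp, fn, fp, tn, precision, recall, f1);
        System.out.println("accuracy = " + accuracy);
        // The full confusion matrix is also available if you want to inspect the misclassifications.
        System.out.println(evaluation.getConfusionMatrix().toString());
    }
}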

Another question:

Class                           n          tp          fn          fp     
Iris-versicolor                16          16           0           1
Iris-virginica                 15          14           1           0 
Iris-setosa                    14          14           0           0
Total                          45          44           1           1

If n = the number of observations per class, and tp, fn, and fp are breakdowns of it, then should n = tp + fn + fp for each row in the above table?

N = tp + fn. It's the total number of elements of that class, so it's the true positives (i.e. the things we correctly predicted as that class) plus the false negatives (i.e. the things we incorrectly predicted as members of another class), which is the denominator of the recall.
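
You can see that in the table you quoted:

Iris-versicolor:  16 = 16 + 0
Iris-virginica:   15 = 14 + 1
Iris-setosa:      14 = 14 + 0

whereas n = tp + fn + fp would give 17 for Iris-versicolor, because the extra false positive is an example whose true class is one of the other two.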

@neomatrix369
Contributor Author

This is the output of the toFormattedString method. We could add another method that emits a legend String, or modify the toFormattedString output. I'd prefer the former (or some kind of documentation change) as pretty soon the legend would be irrelevant noise to anyone using Tribuo for any length of time. What kind of other information would you want in it?

So here are my suggestions, since so many things will change with the library as you get feedback from us:

  • add a note in the classification tutorial notebook about the things you mentioned above, just like the regression tutorial, which has some nice illustrations of the metrics
  • make all (other) relevant docs point to this tutorial, as such notebooks are the best form of documentation
  • change toFormattedString so that it has a somewhat expanded version of the acronyms, i.e. n could be num of classes, fp could be false positive, etc.

@neomatrix369
Contributor Author

neomatrix369 commented Oct 18, 2020

That's the binary classification accuracy. Tribuo treats every classification problem as if it's multiclass, and in multiclass problems accuracy is the sum of the true positives divided by the total number of test examples.

👍 This is good information; in fact, many other resources do NOT make such details clear. So I would suggest that in the toFormattedString() of all classification-related classes you specify (if known) the type of classification at hand - binary or multiclass - so that clarity is added to the output.

For example, something like this would help:

Type: multiclass classification
Class                           n          tp          fn          fp      recall        prec          f1
Iris-versicolor                16          15           1           0       0.938       1.000       0.968
Iris-virginica                 15          15           0           1       1.000       0.938       0.968
Iris-setosa                    14          14           0           0       1.000       1.000       1.000
Total                          45          44           1           1

and in case of binary:

Type: binary classification
.
.
.

I'd suggest this even though it may appear obvious, given that we are printing all the classes in the output.

@neomatrix369
Contributor Author

This printing idea you mention could give rise to PrinterClasses or OutputClasses that you could expose and have users extend. Such output classes could end up generating markdown or graphics/visuals as well - but that's a different feature request and a discussion of its own.

@Craigacp
Member

This is the output of the toFormattedString method. We could add another method that emits a legend String, or modify the toFormattedString output. I'd prefer the former (or some kind of documentation change) as pretty soon the legend would be irrelevant noise to anyone using Tribuo for any length of time. What kind of other information would you want in it?

So here are my suggestions, since so many things will change with the library as you get feedback from us:

  • add a note in the classification tutorial notebook about the things you mentioned above, just like the regression tutorial, which has some nice illustrations of the metrics

Sure we can expand the classification tutorial with an evaluation metrics section. We had that in an earlier version and it got cut to try and compress the tutorial a little (because it's already too long for the first thing that we point people at), but it could easily go back in.

  • make all (other) relevant docs point to this tutorial, as such notebooks are the best form of documentation

I'm not clear that pointing at the notebooks for all doc-related needs is ideal, but we could link to Wikipedia or some other reference for a more detailed treatment of basic ML topics (either from the notebooks or from other pages in the docs). Our aim for the moment is that Tribuo's docs teach people about using Tribuo, and not necessarily teach them all of the ML knowledge that they might need to use Tribuo. The latter is a much larger documentation effort than the former, and we don't have anyone on our team to devote to such a task.

  • change toFormattedString so that it has a somewhat expanded version of the acronyms, i.e. n could be num of classes, fp could be false positive, etc.

We could expand the acronyms; I'll see what would fit, as I don't want the tables to get much wider. For the record, n is the number of test examples with that ground truth label.
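
If it helps in the meantime, a one-line legend for the current columns would read roughly:

n = test examples with that true label, tp = true positives, fn = false negatives, fp = false positives, recall = tp/(tp+fn), prec = tp/(tp+fp), f1 = harmonic mean of prec and recall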

That's the binary classification accuracy. Tribuo treats every classification problem as if it's multiclass, and in multiclass problems accuracy is the sum of the true positives divided by the total number of test examples.

👍 This is good information; in fact, many other resources do NOT make such details clear. So I would suggest that in the toFormattedString() of all classification-related classes you specify (if known) the type of classification at hand - binary or multiclass - so that clarity is added to the output.

For example, something like this would help:

Type: multiclass classification
Class                           n          tp          fn          fp      recall        prec          f1
Iris-versicolor                16          15           1           0       0.938       1.000       0.968
Iris-virginica                 15          15           0           1       1.000       0.938       0.968
Iris-setosa                    14          14           0           0       1.000       1.000       1.000
Total                          45          44           1           1

and in case of binary:

Type: binary classification
.
.
.

I'd suggest this even though it may appear obvious, given that we are printing all the classes in the output.

Well, the two formulations are equivalent (as in the binary case a true negative is just a true "positive" for the negative class, and in a binary task n = tp + fp + fn + tn so adding up all the true "positives" and dividing by the total number of examples is an identical calculation), and Tribuo does exactly the same calculations in the binary and multiclass cases, so I'm not clear what the benefit is of printing out that it's multiclass or binary.
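
Spelling that out for a binary task with positive class p and negative class q:

\mathrm{accuracy} = \frac{tp + tn}{tp + tn + fp + fn} = \frac{tp_{p} + tp_{q}}{N_{\mathrm{test}}}

because the true negatives of the positive class are exactly the true positives of the negative class, and tp + tn + fp + fn is the total number of test examples.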

This printing idea you mention could give rise to PrinterClasses or OutputClasses that you could expose and have users extend. Such output classes could end up generating markdown or graphics/visuals as well - but that's a different feature request and a discussion of its own.

Sure, that's definitely something we could consider. At the moment we have the formatted string output, and a html string output (which exists because we found it useful to paste into wiki pages). Refactoring the evaluations to take some kind of output printer class would be interesting, though it would require a bit of designing to make sufficiently flexible and useful. As I said earlier, we don't actually print all the metrics that are available on LabelEvaluation in the formatted string (and some metrics are conditional on the evaluated model producing probabilistic outputs), so it would need to have a method for the user to select what they are interested in, and then possibly some complex logic to format it nicely. At the moment users can query the LabelEvaluation for whatever metrics they want and build their own formatted output, and this is how we expect most people will use it because the metrics we chose to print might not be the best for their specific use case.
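
To illustrate the sort of shape such a refactor might take (these interface and class names are hypothetical and don't exist in Tribuo today; the evaluation accessors are the ones discussed above):

import java.util.List;
import org.tribuo.classification.Label;
import org.tribuo.classification.evaluation.LabelEvaluation;

// Hypothetical pluggable renderer - purely a sketch of the idea.
interface EvaluationRenderer {
    String render(LabelEvaluation evaluation);
}

// One possible implementation emitting a markdown table for a chosen set of labels.
class MarkdownRenderer implements EvaluationRenderer {
    private final List<Label> labels;

    MarkdownRenderer(List<Label> labels) {
        this.labels = labels;
    }

    @Override
    public String render(LabelEvaluation evaluation) {
        StringBuilder sb = new StringBuilder("| Class | tp | fn | fp | recall | prec | f1 |\n");
        sb.append("|---|---|---|---|---|---|---|\n");
        for (Label label : labels) {
            sb.append(String.format("| %s | %.0f | %.0f | %.0f | %.3f | %.3f | %.3f |%n",
                    label.getLabel(),
                    evaluation.tp(label), evaluation.fn(label), evaluation.fp(label),
                    evaluation.recall(label), evaluation.precision(label), evaluation.f1(label)));
        }
        return sb.toString();
    }
}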

@Craigacp Craigacp added the documentation (Improvements or additions to documentation) label and removed the bug (Something isn't working) label Oct 19, 2020
@Craigacp
Member

In the 4.1 release we updated the classification tutorial to discuss the formatted output and generally improved the docs. I'm going to close this issue and make a separate one to track the addition of formatter/printer classes for evaluations.

@nezda
Contributor

nezda commented May 27, 2021

Feels related to #121
