Multitarget - same metrics for several different algorithms #71

Open
thiagonazareth opened this issue Apr 6, 2021 · 4 comments

@thiagonazareth

Good evening, and apologies for my English; I am using a translator.
I am using MEKA for my master's thesis, which applies machine learning to predict student retention in higher education, and I came across the following situation. When I use the Meka Explorer GUI to test the multi-target algorithms, the results for Hamming score and Accuracy (per label) are identical across several different algorithms. I used two multi-target datasets shipped in MEKA's data folder, thyroid-L7.arff and solar_flare.arff, and the same behavior of equal metrics for different algorithms occurs with both.

Running meka.classifiers.multitarget.CC, meka.classifiers.multitarget.BCC, meka.classifiers.multitarget.CCp and meka.classifiers.multitarget.CR, each with J48 and NaiveBayes as the base classifier and default parameters, all produce the same values for Hamming score, Exact match, Hamming loss, ZeroOne loss, Levenshtein distance and Accuracy (per label).

I ran the experiments on both Mac OSX and Ubuntu.

This is the result for all of the algorithms and variations mentioned above, using the thyroid-L7.arff dataset:

N (test) 3119
L 7
Hamming score 0.281
Exact match 0
Hamming loss 0.719
ZeroOne loss 1
Levenshtein distance 0.719
Label indices [0 1 2 3 4 5 6]
Accuracy (per label) [0.002 0.023 0.006 0.939 0.013 0.001 0.980]

@thiagonazareth
Author

thiagonazareth commented Apr 14, 2021

I found the problem and created a pull request to fix it.

@jmread
Contributor

jmread commented Apr 28, 2021

It is not necessarily a problem to get the same results for different algorithms. But if I understand correctly, based on your proposed change, it looks like this may be a result of the posterior distribution information not being copied into the right place where it is later accessed by the evaluation metrics. Is that correct?

@thiagonazareth
Author

I agree that it is not necessarily a problem to get the same results for different algorithms, but it caught my attention that very different algorithms, run with several different input parameters, all produce the same result. The problem I found is the following: the array that stores the results (returned by the distributionForInstance method) is doubled in size, so that position i stores the predicted value for label i and position i + L stores the probability information for label i. Probability information has not yet been implemented for the MT classifiers, so Arrays.copyOfRange(y, L, L * 2) always picks up values of 1. The correct call is Arrays.copyOfRange(y, 0, L).
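
To make the layout concrete, here is a minimal standalone sketch (not MEKA's actual code; L, the predictions and the 1.0 placeholder are illustrative assumptions) of why slicing the upper half of the doubled array gives the same output no matter which classifier produced the predictions:

    import java.util.Arrays;

    public class CopyRangeDemo {
        public static void main(String[] args) {
            int L = 3;                       // number of target labels
            double[] y = new double[2 * L];  // layout: [predictions | probability slots]
            // hypothetical predicted values for labels 0..L-1
            y[0] = 2.0; y[1] = 0.0; y[2] = 1.0;
            // probability slots are not yet populated for MT classifiers,
            // so in this illustration they just hold the placeholder 1.0
            Arrays.fill(y, L, 2 * L, 1.0);

            double[] buggy = Arrays.copyOfRange(y, L, 2 * L); // always [1.0, 1.0, 1.0]
            double[] fixed = Arrays.copyOfRange(y, 0, L);     // the actual predictions

            System.out.println("buggy slice: " + Arrays.toString(buggy));
            System.out.println("fixed slice: " + Arrays.toString(fixed));
        }
    }

Because the buggy slice is constant, every classifier feeds the same vector into the evaluation metrics, which is exactly the symptom reported above.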

@jmread
Contributor

jmread commented Apr 30, 2021

The reason for doubling the array is to make space to store the probability information from the posterior, P(y[j] = y_max[j] | x), where y_max[j] is the most likely value. The first part of the array (up to L) is used to store y_max[j] directly for each label. This is not needed in the standard multi-label case; there we just store P(y[j] = 1), because y_max[j] can be inferred directly (there are only two possible values, 0 or 1). In the multi-target case this was included mainly for display/debug purposes, and it does not represent the full distribution anyway. I guess this is what you mean by "probability information not yet implemented for MT classifiers".

I agree that the fix you propose makes sense. It seems that in this part of the code the information from 0...L is missing altogether, which shouldn't be the case. The fix should probably be accompanied by a unit test, for example on thyroid-L7.arff (as you used above to demonstrate the issue). Are you able to put your experiment into a small unit test?
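
This is not the full regression test on thyroid-L7.arff requested above (that would go through MEKA's own evaluation code), but a minimal JUnit 4 sketch of the slicing behaviour in question; the class name, array layout and values are assumptions for illustration:

    import static org.junit.Assert.assertArrayEquals;
    import static org.junit.Assert.assertFalse;

    import java.util.Arrays;
    import org.junit.Test;

    public class CopyRangeSliceTest {

        // Pack hypothetical predictions into the doubled layout
        // [predictions | probability slots], with the probability slots left at 1.0.
        private static double[] packed(double... preds) {
            int L = preds.length;
            double[] y = new double[2 * L];
            System.arraycopy(preds, 0, y, 0, L);
            Arrays.fill(y, L, 2 * L, 1.0);
            return y;
        }

        @Test
        public void correctedSliceDistinguishesClassifiers() {
            int L = 3;
            double[] a = packed(2.0, 0.0, 1.0); // output of "classifier A"
            double[] b = packed(1.0, 1.0, 0.0); // output of "classifier B"

            // Buggy slice: both classifiers look identical (all 1.0s),
            // which is why all algorithms report the same metrics.
            assertArrayEquals(Arrays.copyOfRange(a, L, 2 * L),
                              Arrays.copyOfRange(b, L, 2 * L), 1e-9);

            // Corrected slice: the actual predictions differ.
            assertFalse(Arrays.equals(Arrays.copyOfRange(a, 0, L),
                                      Arrays.copyOfRange(b, 0, L)));
        }
    }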
