Unexpected Meka Evaluation Result #55

Open
Mali-DS opened this issue Oct 28, 2018 · 3 comments

Mali-DS commented Oct 28, 2018

Hi,
My evaluation reports zero accuracy and I don't know why. Here is my code:

    try {
        // Load the original dataset
        ConverterUtils.DataSource dataSource = new ConverterUtils.DataSource(FILE_PATH);
        Instances preparedDataSet = dataSource.getDataSet();
        preparedDataSet = filterUnsupervisedAttributes(preparedDataSet);
        preparedDataSet.setClassIndex(7);

        CRUpdateable classifier = new CRUpdateable();
        RandomForest randomForest = createRandomForest(1);
        classifier.setClassifier(randomForest);

        // Temporary dataset that accumulates the training instances
        Instances trainingInstances = new Instances(dataSource.getStructure());
        trainingInstances = filterUnsupervisedAttributes(trainingInstances);
        trainingInstances.setClassIndex(7);

        // Temporary dataset that holds the current batch of test instances
        Instances testInstances = new Instances(dataSource.getStructure());
        testInstances = filterUnsupervisedAttributes(testInstances);
        testInstances.setClassIndex(7);

        int countTestInstances = 0;
        int countTrainInstances = 0;
        boolean firstTrain = true;
        boolean benchTest = true;

        for (int row = 123; row < 5021; row++) {
            Instance trainingInstance = preparedDataSet.instance(row);
            trainingInstances.add(trainingInstance); // collect instances to use for training
            countTrainInstances++;

            // Train the classifier on the first 100 instances (none of them has missing values)
            if (firstTrain && countTrainInstances % 100 == 0) {
                firstTrain = false;
                classifier.buildClassifier(trainingInstances);
            }

            if (!firstTrain) {
                benchTest = true;
                // classifier.updateClassifier(trainingInstance);

                // Add the next 100 instances to testInstances, then evaluate
                for (int j = row + 1; j < row + 101; j++) {
                    if (benchTest && countTestInstances != 100) {
                        Instance testInstance = preparedDataSet.instance(j);
                        testInstances.add(testInstance);
                        countTestInstances++;

                        if (countTestInstances % 100 == 0) {
                            System.out.println("Evaluate CRUpdateable classifier on ");
                            String top = "PCut1";
                            String vop = "3";
                            Result result = Evaluation.evaluateModel(classifier, trainingInstances, testInstances, top, vop);
                            System.out.println("Evaluation available metrics: " + result.availableMetrics());
                            System.out.println("Evaluation Info: " + result.toString());
                            System.out.println("Levenshtein distance: " + result.getValue("Levenshtein distance"));
                            System.out.println("Type: " + result.getInfo("Type"));
                            countTestInstances = 0;
                            benchTest = false;
                            testInstances.delete(); // empty the test batch for the next round
                        }
                    }
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }

The result of Evaluation is here:

    Evaluation Info: == Evaluation Info

    Classifier                      meka.classifiers.multiltarget.incremental.CRUpdateable
    Options                         [-W, weka.classifiers.trees.RandomForest, --, -P, 100, -I, 1, -num-slots, 1, -K, 0, -M, 1.0, -V, 0.001, -S, 1]
    Additional Info
    Dataset                         Missing_values_Predicted-weka.filters.unsupervised.attribute.RemoveType-Tstring
    Number of labels (L)            7
    Type                            MT
    Verbosity                       3

    == Predictive Performance

    N(test)                         100
    L                               7
    Hamming score                   0
    Exact match                     0
    Hamming loss                    1
    ZeroOne loss                    1
    Levenshtein distance            1
    Label indices                   [ 0 1 2 3 4 5 6 ]
    Accuracy (per label)            [ 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ]

    == Additional Measurements

    Number of training instances    154
    Number of test instances        100
    Label cardinality (train set)   659.407
    Label cardinality (test set)    676.757
    Build Time                      0.061
    Test Time                       0.006
    Total Time                      0.067
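One figure in this report that may be worth a second look is the label cardinality. The sketch below assumes that label cardinality is the average, over instances, of the sum of the label values; that assumption, the class name and the sample data are all hypothetical and not taken from Meka's source. Under that assumption, binary 0/1 labels can never give a value above L, so a value in the hundreds with L = 7 would suggest that the label columns hold large numeric or many-valued entries.

```java
// Illustrative sketch only. Assumption: label cardinality is the mean,
// over instances, of the sum of each instance's label values.
final class LabelCardinalitySketch {

    // labels[i][j] is the value of label j for instance i.
    static double labelCardinality(double[][] labels) {
        double sum = 0.0;
        for (double[] row : labels) {
            for (double value : row) {
                sum += value;
            }
        }
        return sum / labels.length;
    }

    public static void main(String[] args) {
        // Hypothetical binary labels: the result cannot exceed the number of labels (3 here).
        double[][] binary = {{1, 0, 1}, {0, 0, 1}};
        // Hypothetical wide-ranging values stored in the label columns.
        double[][] wide = {{120, 300.5, 7}, {98, 410, 12}};

        System.out.println("binary labels: " + labelCardinality(binary));
        System.out.println("wide values:   " + labelCardinality(wide));
    }
}
```

With the made-up data above, the binary case gives 1.5 and the wide-ranging case gives 473.75.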

fracpete (Member) commented

From a quick glance, you seem to treat the data like you would for Weka. However, Meka works a bit differently. See the following examples:

Final remark: you only seem to have a single class attribute...
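As a rough illustration of one way Meka differs from Weka: Meka datasets normally carry the number of target attributes in the relation name as a `-C` option (for example `Music: -C 6`), and the targets are then the first attributes of the dataset. Meka's own `MLUtils.prepareData` helper reads that option. The plain-Java sketch below only mimics the lookup so the convention is visible; the class and method names are hypothetical, and it ignores quoting and the negative `-C` form.

```java
import java.util.OptionalInt;

// Illustrative sketch only: mimics reading the "-C <count>" option that Meka
// expects in a dataset's relation name. Not Meka's implementation.
final class RelationNameSketch {

    // Returns the number following "-C", or empty if the option is absent.
    static OptionalInt parseLabelCount(String relationName) {
        String[] tokens = relationName.trim().split("\\s+");
        for (int i = 0; i < tokens.length - 1; i++) {
            if (tokens[i].equals("-C")) {
                return OptionalInt.of(Integer.parseInt(tokens[i + 1]));
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // A relation name in the usual Meka style.
        System.out.println(parseLabelCount("Music: -C 6"));
        // The relation name from the report above carries no -C option.
        System.out.println(parseLabelCount(
            "Missing_values_Predicted-weka.filters.unsupervised.attribute.RemoveType-Tstring"));
    }
}
```

When the option is missing, as in the second call, the number of targets has to be supplied some other way, for instance by setting the class index by hand as the posted code does.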

Mali-DS commented Oct 29, 2018

Thanks for your answer, you raised good points. I changed my code to do things the Meka way. The code now looks like this:

    try {
        // Load the original dataset
        ConverterUtils.DataSource dataSource = new ConverterUtils.DataSource(FILE_PATH);
        Instances preparedDataSet = dataSource.getDataSet();

        CRUpdateable classifier = new CRUpdateable();
        RandomForest randomForest = createRandomForest(1); // RandomForest is not an updateable classifier
        classifier.setClassifier(randomForest);

        Instances trainingInstances = new Instances(dataSource.getStructure());
        Instances testInstances = new Instances(dataSource.getStructure());
        int countTestInstances = 0;
        int countTrainInstances = 0;
        boolean firstTrain = true;
        boolean benchTest = true;

        for (int row = 123; row < 5021; row++) {
            Instance trainingInstance = preparedDataSet.instance(row);
            trainingInstances.add(trainingInstance); // collect instances to use for training
            countTrainInstances++;

            // Build the classifier once, on the first 100 instances
            if (firstTrain && countTrainInstances % 100 == 0) {
                trainingInstances = PrepareClassAttributes(trainingInstances, "1,2,3,4,5,6,7");
                firstTrain = false;
                classifier.buildClassifier(trainingInstances);
            }

            if (!firstTrain) {
                benchTest = true;
                classifier.updateClassifier(trainingInstance);

                // Add the next 100 instances to testInstances, then evaluate
                for (int j = row + 1; j < row + 101; j++) {
                    if (benchTest && countTestInstances != 100) {
                        Instance testInstance = preparedDataSet.instance(j);
                        testInstances.add(testInstance);
                        countTestInstances++;

                        if (countTestInstances % 100 == 0) {
                            testInstances = PrepareClassAttributes(testInstances, "1,2,3,4,5,6,7");
                            System.out.println("Evaluate CRUpdateable classifier on ");
                            String top = "PCut1";
                            String vop = "3";
                            Result result = Evaluation.evaluateModel(classifier, trainingInstances, testInstances, top, vop);
                            System.out.println("Evaluation Info: " + result.toString());
                            countTestInstances = 0;
                            benchTest = false;
                            testInstances.delete(); // empty the test batch for the next round
                        }
                    }
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }

However, the accuracy is still zero, and the statistics look strange:

    N(test)                         100
    L                               7
    Hamming score                   0
    Exact match                     0
    Hamming loss                    1
    ZeroOne loss                    1
    Levenshtein distance            1
    Label indices                   [ 0 1 2 3 4 5 6 ]
    Accuracy (per label)            [ 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ]

jmread commented Oct 30, 2018

Actually, the statistics make sense given that there are 0 correct predictions. Without being familiar with your data, it is difficult to know whether this is 'strange' or not. Have you tried getting results using a simple test in the GUI first, or printing out the prediction for each instance?
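To make that concrete, here is a minimal plain-Java sketch of how these summary numbers relate when no predicted value matches. It is not Meka's implementation; the class name, method names and data are hypothetical. It prints each prediction next to the true values, then derives the Hamming score and exact match, with Hamming loss and ZeroOne loss taken as one minus those.

```java
import java.util.Arrays;

// Illustrative sketch only: plain-Java versions of the summary metrics,
// not Meka's implementation. All names and data are hypothetical.
final class LabelMetricsSketch {

    // Fraction of individual target values predicted correctly (Hamming score).
    static double hammingScore(int[][] actual, int[][] predicted) {
        int correct = 0;
        int total = 0;
        for (int i = 0; i < actual.length; i++) {
            for (int j = 0; j < actual[i].length; j++) {
                if (actual[i][j] == predicted[i][j]) {
                    correct++;
                }
                total++;
            }
        }
        return (double) correct / total;
    }

    // Fraction of instances whose whole target vector is predicted correctly (exact match).
    static double exactMatch(int[][] actual, int[][] predicted) {
        int matches = 0;
        for (int i = 0; i < actual.length; i++) {
            if (Arrays.equals(actual[i], predicted[i])) {
                matches++;
            }
        }
        return (double) matches / actual.length;
    }

    public static void main(String[] args) {
        // Hypothetical data in which no predicted value equals the true value.
        int[][] actual = {{2, 0, 1}, {1, 1, 0}};
        int[][] predicted = {{0, 1, 0}, {2, 0, 1}};

        // Print every prediction next to the truth to see where they differ.
        for (int i = 0; i < actual.length; i++) {
            System.out.println("instance " + i
                + ": actual=" + Arrays.toString(actual[i])
                + " predicted=" + Arrays.toString(predicted[i]));
        }

        double hs = hammingScore(actual, predicted);
        double em = exactMatch(actual, predicted);
        System.out.println("Hamming score = " + hs + ", Hamming loss = " + (1 - hs));
        System.out.println("Exact match = " + em + ", ZeroOne loss = " + (1 - em));
    }
}
```

With zero matches, both scores are 0 and both losses are 1, which is the pattern in the report. The per-instance printout is the part most likely to show whether the predictions and the true values are even on the same scale.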
