h20 learner : "Error in checkPredictLearnerOutput(.learner, .model, p)" #1787

Closed
caprone opened this issue May 1, 2017 · 7 comments

@caprone

caprone commented May 1, 2017

Hi,
OS: Windows 7
RStudio: 1.0.143
R: 3.4
Issue:
All h2o learners fail in the prediction task. This seems to be a conflict in the class level names:

"Error in checkPredictLearnerOutput(.learner, .model, p) :
predictLearner for classif.h2o.gbm has returned not the class levels as column names: p0, p1"

Checking the current names in the learner, they lack the 'p' prefix: they are '0', '1' and not 'p0', 'p1' as needed.
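
The mismatch described above can be illustrated in base R. This is only a sketch: `check_prob_colnames` is a hypothetical helper written for illustration, not mlr's actual `checkPredictLearnerOutput`, and the probability matrix is made-up data standing in for a learner's output.

```r
# Hypothetical stand-in for the kind of check that fails here:
# the probability columns must be named exactly like the class levels.
check_prob_colnames <- function(prob, class_levels) {
  setequal(colnames(prob), class_levels)
}

# Class levels as the task sees them: "0" and "1".
class_levels <- levels(factor(c(0, 1)))

# Made-up probability matrix whose columns carry a "p" prefix,
# as in the error message (p0, p1).
prob <- matrix(c(0.3, 0.7,
                 0.6, 0.4),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("p0", "p1")))

before <- check_prob_colnames(prob, class_levels)  # FALSE: "p0","p1" vs "0","1"

# Stripping the leading "p" makes the names line up with the levels.
colnames(prob) <- sub("^p", "", colnames(prob))
after <- check_prob_colnames(prob, class_levels)   # TRUE
```

Whether stripping the prefix is the right repair inside the learner is an assumption; the sketch only shows why names like p0, p1 cannot match levels 0, 1.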

@jakob-r
Member

jakob-r commented May 2, 2017

Can you supply a minimum (not) working example?

@caprone
Author

caprone commented May 2, 2017

Hi, OK, thanks.

# task --> class levels are '0', '1'
classif.task = makeClassifTask(id = "example",
                               data = train_scaled,
                               target = "is_duplicate",
                               positive = "1")

# learner
gbm.lrn = makeLearner("classif.h2o.gbm",
                      predict.type = "prob",
                      fix.factors.prediction = TRUE)

# cv
rdesc = makeResampleDesc("CV", iters = 3, stratify = TRUE)

r = resample(gbm.lrn, classif.task, rdesc,
             measures = list(auc, logloss)
)

This works well with all learners except the h2o ones.

@larskotthoff
Member

You haven't shown us your data.

@caprone
Author

caprone commented May 2, 2017

ok, sorry, this is a perfectly reproducible example:

library(mlr)

df <- data.frame(matrix(runif(100, 0, 1), 100, 9))
classx <- factor(sample(c(0, 1), 100, replace = TRUE))
df <- cbind(classx, df)

classif.task = makeClassifTask(id = "example", 
                               data = df,
                               target = "classx", 
                               positive = "1")

gb.lrn  = makeLearner("classif.h2o.gbm", 
                      predict.type = "prob", 
                      fix.factors.prediction = TRUE)

rdesc = makeResampleDesc("CV", iters = 3, stratify = TRUE)
rin = makeResampleInstance(rdesc, task = classif.task)

r = resample(gb.lrn, classif.task, rin, 
             measures = list(auc, logloss)
)
# error:
Error in checkPredictLearnerOutput(.learner, .model, p) : 
  predictLearner for classif.h2o.gbm has returned not the class levels as column names: p0, p1

thanks!!

@larskotthoff
Member

Thanks, this is indeed a bug (caused by completely broken code). I've fixed this in #1790.

MinhAnhL pushed a commit that referenced this issue May 9, 2017
In test_that "oneclass_h2oautoencoder", reproducible = TRUE and seed = 1234 need to be set in parset.list before the for block, instead of inside the for block as in h2ogbm; otherwise testProbParsets returns an error because the prediction values differ.

In test_that "class names are integers and probabilities predicted (#1787)", the activation function needs to be set to "Tanh" instead of the default "Rectifier"; otherwise resample() returns an error:
Error: DistributedException from localhost/127.0.0.1:54321, caused by java.lang.UnsupportedOperationException:
(This error occurs very often when using h2o. There is little clear explanation on the web of why it happens or how to solve it, but it seems to be related to the model being "unstable". The "Tanh" activation function is naturally bounded and is therefore suitable for "controlling" the instability.)

@andrewcparnell

Hi,

A similar possible bug I've just found still exists when spaces occur in the class names.

Here's an example based on the above that still breaks for me using mlr v2.13:

library(mlr)

set.seed(123)
df <- data.frame(matrix(runif(100, 0, 1), 100, 9))
classx <- sample(paste(letters[1:4],letters[1:4]), 100, replace = TRUE)
df <- cbind(classx, df)

classif.task = makeClassifTask(id = "example", 
                               data = df,
                               target = "classx")

gb.lrn  = makeLearner("classif.h2o.randomForest", 
                      predict.type = "prob")

rdesc = makeResampleDesc("CV", iters = 3, stratify = TRUE)
rin = makeResampleInstance(rdesc, task = classif.task)

r = resample(gb.lrn, classif.task, rin, 
             measures = list(mmce))

I get:

Error in checkPredictLearnerOutput(.learner, .model, p) : 
  predictLearner for classif.h2o.randomForest has returned not the class levels as column names: a.a,b.b,c.c,d.d

This will run when replacing e.g. classif.h2o.randomForest with classif.randomForestSRC. It seems to fail with other non-standard characters in the class names but I haven't done an exhaustive search.

Many thanks for the wonderful package.

Andrew

@larskotthoff
Member

Ah, looks like the names are sometimes sanitized and sometimes not. The workaround for now is to sanitize everything yourself, e.g. using the make.names() function. Could you please open a new issue for this?
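
A minimal sketch of that workaround, using only base R and the example data from the comment above. The variable names `classx_clean` and `lookup` are made up for illustration, and it is an assumption that the h2o learners then return matching column names; the sketch only shows the sanitizing step itself.

```r
set.seed(123)
df <- data.frame(matrix(runif(100, 0, 1), 100, 9))
classx <- sample(paste(letters[1:4], letters[1:4]), 100, replace = TRUE)

# Keep a lookup from sanitized names back to the original labels,
# so predictions can be reported with the original class names later.
orig_labels <- sort(unique(classx))
lookup <- setNames(orig_labels, make.names(orig_labels))

# Sanitizing can map two different labels to the same name
# (e.g. "a a" and "a.a"), so check for collisions first.
stopifnot(anyDuplicated(names(lookup)) == 0)

# Sanitize the target before building the task: "a a" becomes "a.a", etc.
classx_clean <- factor(make.names(classx))
df <- cbind(classx = classx_clean, df)

levels(df$classx)
```

The sanitized `df` can then be passed to `makeClassifTask()` as in the example above, and `lookup[as.character(pred)]` maps predicted labels back to the originals.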
