Object storage sizes #173

Open
ugroempi opened this issue May 1, 2021 · 4 comments

Comments

@ugroempi

ugroempi commented May 1, 2021

I have been working with an interaction forest (intaus) on 2000 observations, consisting of the default 20000 trees (from the package diversityForest). This forest uses 306 682 KB of disk space. I applied Interaction$new and FeatureEffects$new to that forest and stored the resulting objects on disk (R workspaces with a single object each). The stored object sizes are as follows:

```r
hilf <- Predictor$new(intaus, data = as.data.frame(yx2[, -1]), y = yx2[, 1], predict.function = predfun)
hilf2 <- Interaction$new(hilf)
## storage size is 1 227 232 KB
fes <- FeatureEffects$new(hilf)
## storage size is 1 248 226 KB
```

To me, these sizes appear excessive. I wonder which functionality of these objects I might be missing that would justify such huge object sizes. Or would it perhaps be possible for Interaction$new and FeatureEffects$new to return smaller objects without sacrificing functionality?

Best, Ulrike

@christophM
Collaborator

What is your use case for storing these objects?

One reason for the size is that the Predictor is part of Interaction / FeatureEffects. But that does not seem to fully explain the size; maybe it is stored more than once.

@ugroempi
Author

ugroempi commented May 6, 2021

The use case is that I don't want to invest the run time again and want to have the objects available later, e.g. for plotting or for printing in comparison to other numbers calculated elsewhere.

@christophM
Collaborator

I have not tried it yet, but you could try setting the predictor to NULL:
```r
interaction_object$predictor = NULL
```
This should make the object a lot smaller. The results are stored in a data.frame in $results, and plotting should not be affected either. It's a hacky solution, so I can't guarantee it works right away.
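Because these objects are R6 objects (environments with reference semantics), assigning NULL to $predictor also changes the object that is live in the current session. The following is a minimal sketch of one way around that, assuming the goal is only to save a smaller file: it drops the field for the duration of the save and restores it afterwards. The helper `save_without_predictor` is hypothetical and not part of the package, and the `toy` environment merely stands in for an Interaction or FeatureEffects object.

```r
# Hypothetical helper (not part of any package): save an environment-based
# object without its $predictor field, then restore the field afterwards.
save_without_predictor <- function(obj, file) {
  stopifnot(is.environment(obj))
  predictor <- obj$predictor
  # Put the predictor back when the function exits, even if saving fails.
  on.exit(obj$predictor <- predictor, add = TRUE)
  obj$predictor <- NULL
  saveRDS(obj, file)
  invisible(file.size(file))
}

# Stand-in for a large result object (illustrative data only).
toy <- new.env(parent = emptyenv())
toy$predictor <- rnorm(1e6)  # plays the role of the model plus data
toy$results <- data.frame(feature = c("x1", "x2"), value = c(0.12, 0.03))

full_file <- tempfile(fileext = ".rds")
slim_file <- tempfile(fileext = ".rds")
saveRDS(toy, full_file)
save_without_predictor(toy, slim_file)

# Compare the two file sizes in bytes.
c(full = file.size(full_file), slim = file.size(slim_file))
```

With a real object, the call would presumably look like `save_without_predictor(interaction_object, "interaction.rds")`. Anything that needs the predictor after reloading would have to get it reassigned first.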

@ugroempi
Author

ugroempi commented May 6, 2021

Thank you for the suggestion. After setting $predictor to NULL, the file size was only 252 kB. The plot method still works; the print method doesn't (but I can of course still access $results).

I think it would be highly desirable for the output objects for interactions and feature effects to be more parsimonious by default (green ML!).

By the way, I found it quite difficult to assess object sizes from within R. object.size(hilf2) returned a size of 448 bytes (!) for the huge object. That size remained unchanged after removing $predictor.
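A likely explanation is that R6 objects are environments, and object.size() does not count what an environment contains. One base-R way to get a rough figure is to serialize the object to a raw vector and take its length, which approximates the uncompressed size of what would be written to disk. The sketch below assumes nothing beyond base R; `serialized_size` is a hypothetical helper name, and the environment `e` is a stand-in for a large result object.

```r
# Hypothetical helper: approximate an object's footprint in bytes by
# serializing it in memory and measuring the resulting raw vector.
serialized_size <- function(x) {
  length(serialize(x, connection = NULL))
}

# Stand-in for a large environment-based object.
e <- new.env(parent = emptyenv())
e$payload <- numeric(1e6)  # one million doubles, 8 bytes each

object.size(e)      # small, because the environment's contents are not counted
serialized_size(e)  # includes the payload
```

Note that serializing temporarily needs about as much memory as the object itself, so for objects in the gigabyte range it may be easier to compare file sizes after saving. The lobstr package's obj_size() is another option that, as far as I know, follows environments.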
