Deprecate all backends except text #2189
Thanks @fabianrost84. What do you need the backend for? Have you tried the hdf5 backend? We are considering deprecating the backends. |
I thought of backends as a convenient way to store traces that result from long sampling, such that I can work with them later. Would there be another way to do that? hdf5 backend gives the same error. |
Why do you need long sampling runs? Do you have poor convergence? |
Haha ;) I should have been more specific. No, rather many datasets to which I fit the same model. For one dataset the sampling is reasonably fast (~1 minute). But when I fit 20 datasets, it takes 20 minutes, and so I just wanted to store those traces in case I want to do further analysis. |
I see, so if you were able to store traces after sampling, that would be sufficient? |
Exactly. I also tried pt = pickle.dumps(trace). However, |
you can also try to do |
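The suggestion above is truncated; judging from the reply that follows, it appears to be converting the trace to a DataFrame. A hedged sketch of that save/load round-trip, in which a plain dict of draws stands in for the trace (with pymc3, the conversion would be something like pm.trace_to_dataframe(trace); only pandas is assumed here):

```python
import pandas as pd

# Stand-in for per-variable draws from a pymc3 trace; with pymc3 you
# would build this DataFrame with pm.trace_to_dataframe(trace) instead.
draws = {'mu': [0.1, 0.2, 0.15], 'sigma': [1.0, 0.9, 1.1]}
df = pd.DataFrame(draws)

df.to_csv('trace.csv', index=False)   # persist draws for later analysis
restored = pd.read_csv('trace.csv')   # reload in a fresh session
```

As the reply notes, this keeps the draws but drops the sampler stats and the trace object's conveniences.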
Thanks for the suggestion @junpenglao. That would work for saving some important information. But I would lose the sampler stats, right? And with a dataframe I lose some convenient functionality; for instance, I cannot create a
Maybe to conclude from the discussion above:
So maybe we should split this issue into three, something like:
2) and 3) will have to wait until 1) is decided. 3) is a feature request which might have to wait. |
Can't you just pickle the trace object? |
Yes! This would all be great! For now I think a good hack would be something like:
where you specify a model id you can use. (You might not even need to keep the model, but you could use it to keep sampling) |
Also, I call this a "hack" because you shouldn't trust pickle objects! They can run arbitrary python code, but it should be fine for local work. |
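The "hack" referred to above could look something like the following minimal stdlib sketch, in which a plain dict stands in for the actual pymc3 Model and MultiTrace objects (the real objects would be pickled the same way; the file name is hypothetical):

```python
import os
import pickle
import tempfile

# Stand-in for {'model': model, 'trace': trace}; keeping the model
# alongside the trace lets you resume sampling later.
payload = {'model': 'model-0', 'trace': [0.1, 0.2, 0.3]}

path = os.path.join(tempfile.gettempdir(), 'model-0.pkl')
with open(path, 'wb') as f:
    pickle.dump(payload, f)        # save model and trace together

with open(path, 'rb') as f:
    restored = pickle.load(f)      # reload in a later session
```

As noted above, only unpickle files you created yourself, since pickle can execute arbitrary code.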
In my case I have the same errors as @fabianrost84, while I need long runs because there is some obvious random walk behaviour in my (admittedly non-trivial) model. Anyway being able to save intermediate traces is a legitimate need, so some mechanism to achieve this must be present. |
Maybe we can refactor some part of the SMC code to make the saving and loading intermediate trace a general feature. |
I also wouldn't mind doing short runs interleaved with saving, but this means that we must be able to restart sampling from the previous state (possibly loaded from a file). Is there a clean way to do it now? |
SMC does it already, yes. You do not get the error when not specifying a trace, because it then uses a numpy array as the backend... Yes, the SMC trace (it doesn't re-evaluate the model) uses a list of arrays as record input, compared to the other backends, which need a point dict. Once we decide how best to refactor, we can do that. |
@madanh How would you intend to use the file-backed traces? If it is truly too large to fit in memory, then it seems like you would need bespoke machinery to do any analysis on it anyways. It seems like one of |
@madanh What sampler do you use? What's your effective sample size? Is the model continuous? |
@twiecki I'm in the process of figuring out what to use. I tried NUTS first, but the model is pretty finicky: it has a plateau where some components of the gradient are exactly zero for a certain region of parameter space (it's a feature), so NUTS is slowly random walking (which is expected, in hindsight). Metropolis is fast, but for some starting conditions rarely accepts (also expected). Slice hangs randomly; the reason seems to lie in the implementation, which tries to sample uniformly from a hypercube for non-scalar variables rather than working them out component by component. When those variables have large sizes and some covariance, it practically never succeeds. But that is a separate issue, which I will raise once I make sure that is indeed the case. Anyway, the above has nothing to do with saving traces, and as things stand I understand that I don't need to save traces, but rather to fix the Slice sampler.
@ColCarroll For me it's not about the memory. One reason having access to traces is nice is that it aids diagnosing/debugging. Imagine your sampler hangs after a reasonable amount of work (which happened to me with Slice): if you interrupt it, all is lost and you can't diagnose it. So you restart in debug mode and wait twice as long, or make a shorter run and try to figure out what will happen before it happens, or whatever. But if your traces are safe on disk, you can begin investigating straight away. Doubly useful if you have njobs>1. |
Why don't you give SMC a try? It would also be good for us to further improve it and make its API smoother. How to start it is shown here: |
@hvasbath I took a quick look; I will try that if hacking the Slice sampler does not help. |
OK, fixed the Slice sampler in a quick and dirty fashion and now it at least does not hang. It's suboptimal though, as it returns only 1 sample per group of variables. Also found a bug in my model, now NUTS is the best. |
I have a model that currently only works well with a large number of samples, due to poor convergence. The in-memory backend cannot be used due to memory limits (32GB is not enough). At least until I can rethink and implement an alternative model, I would like to keep the possibility of using large sample runs to work on my problem. I think the backends are a useful feature that should not be deprecated. |
Per conversation drop all backends except text backend |
Just wanted to add my five cents to underline the importance of keeping at least one backend. For the parts of the community that come from the physical modelling side, sampling often takes days to weeks, because one likelihood evaluation may take several tens of seconds. A few hours would be fast! |
Yes, the plan is to keep the plain text backend. |
I was using in-memory backend. But I'm happy now that I see the text backend will be preserved. |
This is done. |
In https://pymc-devs.github.io/pymc3/api/backends.html#selecting-a-backend there is some pseudo code on how to select backends:
I tried to get a minimal example running, but I ran into some issues. First, I think
db = pm.backends.Text('test')
needs a model context. Furthermore, for the sampling we need at least one random variable. However, I get the following error:
Interestingly, using the SQLite backend gives the same error, but the example without a trace backend works just fine:
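As a stopgap while the backend API settles, draws can always be written to plain text by hand. A sketch with numpy (the file name and the stand-in draws are hypothetical, not pymc3 API; with pymc3 you would dump trace['x'] or similar):

```python
import numpy as np

# Stand-in for the draws of one variable from a sampling run.
draws = np.random.default_rng(0).normal(size=100)

np.savetxt('x_draws.txt', draws)       # one draw per line, plain text
restored = np.loadtxt('x_draws.txt')   # reload later for further analysis
```

The default savetxt format keeps enough digits for float64 values to round-trip exactly.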