
Remove Text, SQLite and HDF5 backends #4231

Merged
merged 11 commits into from Nov 19, 2020

Conversation

fonnesbeck
Member

Since the deprecation warning has been present for over half a year, this PR removes all storage backends other than NDArray.

This might also be a good place to remove save/load for NDArray, or that can be done in a separate PR.
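For reference, a sketch of the old pattern these backends supported, where a backend instance is passed to pm.sample via the trace argument (the model and directory name are placeholders):

import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal("mu", 0.0, 1.0)        # placeholder model
    db = pm.backends.Text("mcmc")         # on-disk text backend removed by this PR
    trace = pm.sample(500, trace=db)      # draws are streamed to the "mcmc" directory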

@fonnesbeck fonnesbeck changed the title [WIP] Remove Text, SQLite and HDF5 backends Remove Text, SQLite and HDF5 backends Nov 18, 2020
@codecov

codecov bot commented Nov 19, 2020

Codecov Report

Merging #4231 (fea10a1) into master (e51b9d3) will decrease coverage by 0.75%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4231      +/-   ##
==========================================
- Coverage   88.93%   88.18%   -0.75%     
==========================================
  Files          92       89       -3     
  Lines       14788    14358     -430     
==========================================
- Hits        13152    12662     -490     
- Misses       1636     1696      +60     
| Impacted Files | Coverage Δ | |
|---|---|---|
| pymc3/backends/base.py | 86.92% <ø> (-0.39%) | ⬇️ |
| pymc3/__init__.py | 100.00% <100.00%> (ø) | |
| pymc3/backends/__init__.py | 100.00% <100.00%> (ø) | |
| pymc3/sampling.py | 86.49% <100.00%> (-0.40%) | ⬇️ |
| pymc3/variational/opvi.py | 85.29% <100.00%> (-0.03%) | ⬇️ |
| pymc3/tests/backend_fixtures.py | 78.97% <0.00%> (-19.61%) | ⬇️ |
| pymc3/backends/ndarray.py | 87.36% <0.00%> (-5.27%) | ⬇️ |
| pymc3/distributions/continuous.py | 93.04% <0.00%> (-0.12%) | ⬇️ |

... and 6 more

@michaelosthege
Member

michaelosthege left a comment

So much deleted code! 🤩

In principle I approve, but before someone accidentally hits "merge": do we want to drop the backends already in 3.10?
I'd say yes, but we have a major version bump coming up and could do it there too.

@AlexAndorra
Contributor

AlexAndorra commented Nov 19, 2020

As the warnings have been in place for a while now, and as the ArviZ netcdf backend is already available, I think we can go ahead and deprecate this in 3.10.
Maybe the next major version will make return_inferencedata=True the default for pm.sample, in which case it's good that users have already experienced and transitioned to az.to_netcdf.
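A minimal sketch of that migration path (the model and file name are placeholders):

import arviz as az
import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal("mu", 0.0, 1.0)            # placeholder model
    idata = pm.sample(return_inferencedata=True)

az.to_netcdf(idata, "trace.nc")               # persist the trace to disk via ArviZ
idata_loaded = az.from_netcdf("trace.nc")     # reload it in a later session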

@DanPen

DanPen commented Dec 11, 2020

I'm pretty confused. Why were these backends removed? I'm running out of memory trying to sample a model, and writing to disk seemed like a pretty good solution. Unsure what to do.

@michaelosthege
Member

@DanPen they were removed because we couldn't think of examples that wouldn't work with the standard backend, and we saw no activity on GitHub/Discourse indicating that they were being used at all.
Also, we are interested in transitioning to an xarray-based backend, because it would integrate seamlessly with InferenceData.

Now to your problem: can you give some details about the size of the model and the sampling strategy? Running out of memory sounds like you either have a memory leak, your model is astronomically gigantic, or you're including Deterministic variables in the trace that take a lot of memory.

@DanPen

DanPen commented Dec 11, 2020

@michaelosthege Thanks for the quick response!

I'm following the Dependent density regression tutorial with my own dataset. As I increase the number of observations to ~500, my 32 GB of memory gets used up. I'm sampling 4 Metropolis chains with 20,000 samples and 10,000 burn-in. Memory usage seems to increase linearly as sampling continues.

Should I open a new issue?

@michaelosthege
Member

@DanPen that's roughly 266 kB per chain per iteration, equivalent to about 33,000 float64 values. I doubt that your model has that many parameters (if it does, you should give up on Metropolis right now). So something other than the model parameters must be chewing up that memory: either a memory leak, or a Deterministic that has many more dimensions than the actual parameters.
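A quick back-of-the-envelope check of that figure, using the numbers reported above:

# 32 GB consumed over 4 chains of 20,000 draws + 10,000 tuning steps
total_bytes = 32e9
chain_iterations = 4 * (20_000 + 10_000)        # 120,000 chain-iterations in total
bytes_per_iteration = total_bytes / chain_iterations
print(bytes_per_iteration)                      # ≈ 266,666 bytes ≈ 266 kB
print(bytes_per_iteration / 8)                  # ≈ 33,333 float64 values per chain per iteration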

You can print(model.ndim) and also print the shapes of all the variables in the trace (after sampling a few iterations).

In a Jupyter notebook:

with model:
    # short pilot run just to inspect variable shapes
    idata = pm.sample(return_inferencedata=True, tune=70, draws=30)
idata  # displaying the InferenceData shows the dimensions of every variable

@DanPen

DanPen commented Dec 13, 2020

@michaelosthege Cool. ndim=200 and the posterior's dimensions are alpha_dim_0: 40, beta_dim_0: 40, chain: 4, delta_dim_0: 40, draw: 30, gamma_dim_0: 40, mu_dim_0: 490, mu_dim_1: 40, tau_dim_0: 40, w_dim_0: 490, w_dim_1: 40.

Not sure I understand why mu and w are 2D.

@michaelosthege
Member

@DanPen Then it's indeed the 2D Deterministic variables mu and w that blow up the size of the posterior.
The other variables make up the 200 free parameters, but the (490, 40)-shaped mu and w add a combined 39,200 numbers per chain per iteration.

In such cases it's better not to save them at all. You can do this by not wrapping them in Deterministic, which should reduce your memory footprint by roughly 197x ((39,200 + 200) / 200); see the sketch below.
If you need them later, you can always reconstruct them from the free parameters (that's why it's called Deterministic...).
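A minimal sketch of that pattern, assuming a dependent-density-regression-like setup (all names, shapes, and the predictor x are illustrative; the likelihood is elided):

import numpy as np
import pymc3 as pm

x = np.random.randn(490, 1)  # hypothetical predictor: 490 observations

with pm.Model() as model:
    alpha = pm.Normal("alpha", 0.0, 1.0, shape=40)
    beta = pm.Normal("beta", 0.0, 1.0, shape=40)

    # A plain symbolic expression: the (490, 40) values are NOT stored in the trace.
    # Wrapping it as pm.Deterministic("mu", ...) would store 19,600 extra numbers
    # per chain per iteration.
    mu = alpha + beta * x

    # ... likelihood built from mu would go here ...

    idata = pm.sample(draws=100, tune=100, return_inferencedata=True)

# Reconstruct mu afterwards from the stored free parameters; result has
# shape (chain, draw, 490, 40) via NumPy broadcasting.
alpha_post = idata.posterior["alpha"].values[..., None, :]
beta_post = idata.posterior["beta"].values[..., None, :]
mu_post = alpha_post + beta_post * x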
