Refactor conversion to InferenceData #44

gibsramen · 2021-05-19T21:39:07Z

Refactors the InferenceData conversion code and adds a flag to automatically convert a fitted CmdStanMCMC to InferenceData. For parallelized models, conversion should happen after each individual fit rather than after all fits have completed. If this flag is set, BaseModel.fit will be of type InferenceData or List[InferenceData].

Also changes the way to_inference_object works. Any BaseModel subclass should now call the specify_model method to pass in params, coords, dims, etc. to_inference_object then uses these stored specifications instead of taking them as arguments.
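
A minimal sketch of the pattern described above, not BIRDMAn's actual code: the model records its conversion settings once through specify_model, and to_inference_object later reads them instead of taking arguments. The class ToyModel, the layout of the specifications dictionary, and the plain dict used in place of az.InferenceData are all assumptions made for illustration; only the method names and the "coords" key appear in this PR.

```python
class ToyModel:
    """Hypothetical stand-in for a BaseModel subclass."""

    def __init__(self):
        self.specifications = {}
        self.fit = None

    def specify_model(self, params, coords=None, dims=None):
        # Store everything the later conversion will need.
        self.specifications = {
            "params": list(params),
            "coords": dict(coords or {}),
            "dims": dict(dims or {}),
        }

    def to_inference_object(self):
        # No arguments: the stored specifications drive the conversion.
        if not self.specifications:
            raise ValueError("Call specify_model before converting.")
        if self.fit is None:
            raise ValueError("Model has not been fit.")
        spec = self.specifications
        # Assumption: a plain dict stands in for az.InferenceData here.
        return {
            "posterior": {p: self.fit[p] for p in spec["params"]},
            "coords": spec["coords"],
            "dims": spec["dims"],
        }


model = ToyModel()
model.specify_model(
    params=["beta"],
    coords={"feature": ["f1", "f2"]},
    dims={"beta": ["feature"]},
)
model.fit = {"beta": [0.1, -0.3], "lp__": [-12.0]}  # toy draws
inference = model.to_inference_object()
```

One consequence of this design is that variables not listed in params (lp__ in the toy fit) are dropped at conversion time without the caller having to repeat the selection.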

Still need to update documentation. After this will probably bump up version to 0.0.3.

Still need to check that non-concatenation works.

mortonjt · 2021-05-19T21:43:25Z

Great! Is the parallelism ready to be tested on a cluster?

gibsramen · 2021-05-19T21:47:17Z

I think it's worth trying. My guess is it will not yet work, though.

gibsramen · 2021-05-24T19:09:38Z

Note to self: we should have some sort of error handling so that if the conversion to InferenceData fails, the fit is still saved as a CmdStanMCMC. Otherwise the whole fit would be thrown away, which could cause some headaches.

mortonjt · 2021-05-24T19:14:11Z

@gibsramen yes this makes a lot of sense, since jobs fail all the time...
One possibility is we have a robust merge, where we merge together the runs that succeeded and have some record of which runs failed, so that we can rerun those features.
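
A rough sketch of the two ideas above (fall back to the raw fit when conversion fails, and keep a record of which features failed so they can be rerun). The function names convert_or_keep and robust_merge, the toy converter, and the dict-based fits are hypothetical and not part of BIRDMAn.

```python
def convert_or_keep(raw_fit, convert):
    """Return (result, error); on failure the raw fit is returned unchanged."""
    try:
        return convert(raw_fit), None
    except Exception as exc:  # broad on purpose: never lose a finished fit
        return raw_fit, exc


def robust_merge(named_fits, convert):
    """Split per-feature fits into converted results and recorded failures."""
    converted, failed = {}, {}
    for name, raw_fit in named_fits.items():
        result, error = convert_or_keep(raw_fit, convert)
        if error is None:
            converted[name] = result
        else:
            # Keep the raw fit and the reason so the feature can be rerun.
            failed[name] = (raw_fit, repr(error))
    return converted, failed


def toy_convert(raw_fit):
    # Hypothetical converter: rejects fits that contain no draws.
    if not raw_fit["draws"]:
        raise ValueError("no draws")
    return {"posterior": raw_fit["draws"]}


fits = {"f1": {"draws": [0.2, 0.4]}, "f2": {"draws": []}}
converted, failed = robust_merge(fits, toy_convert)
```

Here converted holds only the features that converted cleanly, while failed maps each failing feature name to its untouched raw fit and the error message.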

mortonjt

Spotted one potential typo

I'm testing this locally at the moment; so far, it's smooth.

Regarding the cluster setup, it looks like it can be completely decoupled from Birdman. So long as dask calls are being made inside of Birdman model fits, Birdman should not care how the dask cluster is set up. From my experiments, it looks like none of the methods need to accept a dask cluster as input.

That being said, it'll still be painful for users to set up the cluster. I think we can tackle this in two ways:

1. We can have simple commands (i.e. qiime2 commands) that don't have cluster support, but can make use of local threads.
2. We can have launch scripts for running these models on clusters, supported with documentation, which advanced users can use as a template.

I'm going to test this on the cluster shortly.

mortonjt · 2021-05-24T22:44:01Z

birdman/model_util.py

    :returns: ``arviz`` InferenceData object with selected values
    :rtype: az.InferenceData
    """
-    if dask_cluster is not None:
-        dask_cluster.scale(jobs=jobs)


I believe that this should be extended to the BaseModel as well

mortonjt · 2021-05-25T00:04:23Z

Hi @gibsramen, I've just verified that the slurm deployment appears to be working!
And this is without even passing in dask_cluster as input, so I think we can just go ahead and kill all of those parameter inputs.

mortonjt

Ok, I missed a couple of things in my previous review. I have provided fixes that work on my cluster.

mortonjt · 2021-05-25T00:22:48Z

birdman/model_base.py

+        # if already Inference, just return
+        if isinstance(self.fit, az.InferenceData):
+            return self.fit
+        if isinstance(self.fit, list):


Whoops, spoke too soon. It turns out that self.fit can be a tuple, so this will need to be:

Suggested change

-        if isinstance(self.fit, list):
+        if isinstance(self.fit, list) or isinstance(self.fit, tuple):
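
As a side note, the same check can be written more compactly, because isinstance accepts a tuple of types. The helper name is_fit_sequence below is hypothetical and only illustrates the idiom.

```python
def is_fit_sequence(fit):
    # isinstance accepts a tuple of types, so one call covers both cases.
    return isinstance(fit, (list, tuple))


from_list = is_fit_sequence([{"beta": [0.1]}])
from_tuple = is_fit_sequence(({"beta": [0.1]},))
from_single = is_fit_sequence({"beta": [0.1]})
```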

mortonjt · 2021-05-25T00:23:12Z

birdman/model_base.py

+            return self.fit
+        if isinstance(self.fit, list):
+            if isinstance(self.fit[0], az.InferenceData):
+                return self.fit


I'd think we'd want to concat these objects together, right? If so, the following will do:

Suggested change

-                return self.fit
+                cat_name = self.specifications["concatenation_name"]
+                coords = self.specifications["coords"]
+                return concatenate_inferences(self.fit, coords, cat_name)

gibsramen · 2021-05-26

Good catch. I think it makes sense here to check whether combine_individual_fits == True and then proceed accordingly.
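
One way the flag check discussed here could look, as a sketch under stated assumptions: finalize_fits and toy_concatenate are hypothetical names, plain dicts stand in for InferenceData objects, and the real concatenate_inferences (which, per the suggestion above, also takes coords and a concatenation name) is replaced by a simple callable.

```python
def finalize_fits(fits, combine_individual_fits, concatenate):
    """Return a single fit, a merged object, or a list of per-feature fits."""
    if not isinstance(fits, (list, tuple)):
        # Already a single object: nothing to combine.
        return fits
    if combine_individual_fits:
        return concatenate(list(fits))
    # Leave the per-feature objects separate, normalized to a list.
    return list(fits)


def toy_concatenate(parts):
    # Hypothetical stand-in for concatenate_inferences: merge dicts of lists.
    merged = {}
    for part in parts:
        for key, values in part.items():
            merged.setdefault(key, []).extend(values)
    return merged


parts = ({"beta": [0.1]}, {"beta": [0.2]})
combined = finalize_fits(parts, True, toy_concatenate)
separate = finalize_fits(parts, False, toy_concatenate)
```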

mortonjt · 2021-05-25T00:31:49Z

birdman/model_base.py

 import pandas as pd
 from patsy import dmatrix

-from .model_util import single_fit_to_inference, multiple_fits_to_inference
+from .model_util import (single_fit_to_inference, multiple_fits_to_inference,
+                         _single_feature_to_inf)


If you like the previous suggestion, you'll want to change this to

Suggested change

-                         _single_feature_to_inf)
+                         _single_feature_to_inf, concatenate_inferences)

mortonjt · 2021-05-27T21:48:12Z

It looks like the slurm deployment is working!

gibsramen · 2021-05-28T19:33:04Z

@mortonjt Is this good to merge or are you still testing/have suggestions?

mortonjt · 2021-05-28T19:33:55Z

Yes this is good to merge!

gibsramen added 6 commits May 17, 2021 10:43

Remove dask.delayed decorator from single feat fit

c1d7af3

Preliminary restructure of inference conversion

06470ff

Still need to check that non-concatenation works.

Add auto-conversion to inference of // models

e47b029

Remove unwanted vars when auto-converting to inf

f5cf57d

Update default model specifications

648573d

Restructure again with comments

294f726

mortonjt reviewed May 24, 2021

mortonjt reviewed May 25, 2021

mortonjt mentioned this pull request May 25, 2021

WIP : Birdman dependency flatironinstitute/q2-batch#13

Merged

2 tasks

gibsramen added 6 commits May 26, 2021 08:58

Remove dask cluster & jobs args

0eebfbf

Concatenate inferences in to_inference_object

f5f7e01

Now if self.fit is a sequence of InferenceData objects, can concatenate them in to_inference_object.

Raise error if attempted fit before compilation

039e0f9

Setup changes

b20f1af

Remove dask-jobqueue as dependency as that can be handled outside of BIRDMAn. Bump version to 0.0.3.

Update custom model docs page

ddc1ca9

Also addresses #41.

Update parallelization docs

53bd058

gibsramen changed the title ~~[WIP] Refactor conversion to InferenceData~~ Refactor conversion to InferenceData May 28, 2021

gibsramen merged commit 2b26ead into main May 28, 2021

gibsramen deleted the restructure-inf branch July 1, 2021 22:36
