Refactor conversion to InferenceData #44
Still need to check that non-concatenation works.
Great! Is the parallelism ready to be tested on a cluster?
I think it's worth trying. My guess is it will not yet work, though.
Note to myself: there should be some sort of error handling so that if the conversion to InferenceData fails, the fit is still saved as CmdStanMCMC. Otherwise the whole fit would be thrown away, which could cause some headaches.
@gibsramen yes this makes a lot of sense, since jobs fail all the time...
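The fallback described in the note above could look roughly like the following sketch. `convert_or_keep` and the `converter` callable are hypothetical names, not part of BIRDMAn; the only point is that a failed conversion returns the raw fit instead of discarding it.

```python
import warnings
from typing import Any, Callable


def convert_or_keep(fit: Any, converter: Callable[[Any], Any]) -> Any:
    """Try to convert a raw fit; fall back to the raw fit on failure.

    Hypothetical helper: ``converter`` stands in for whatever turns a
    CmdStanMCMC object into InferenceData.
    """
    try:
        return converter(fit)
    except Exception as exc:  # broad on purpose: any failure keeps the fit
        warnings.warn(f"Conversion failed ({exc!r}); keeping the raw fit.")
        return fit
```

Catching a broad `Exception` is a deliberate choice in this sketch, since the goal is to never lose a finished fit; a real implementation might narrow it.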
Spotted one potential typo
I'm testing this locally atm - so far, it's smooth.
Regarding the cluster setup, it looks like it can be completely decoupled from Birdman. So long as dask calls are being made inside of Birdman model fits, Birdman should not care how the dask cluster is set up. From my experiments, it looks like none of the methods need to accept a dask cluster as input.
That being said, it'll still be painful for users to set up the cluster. I think we can tackle this two ways:
- We can have simple commands (i.e. qiime2 commands) that don't have cluster support, but can make use of local threads.
- We can have launch scripts for running these models on clusters that are supported with documentation which advanced users can use as a template.
I'm going to test this on the cluster shortly.
        :returns: ``arviz`` InferenceData object with selected values
        :rtype: az.InferenceData
        """
        if dask_cluster is not None:
            dask_cluster.scale(jobs=jobs)
I believe that this should be extended to the BaseModel as well
Hi @gibsramen I've just verified that the slurm deployment appears to be working!
Ok, I missed a couple of things in my previous review. I have provided fixes that work on my cluster.
birdman/model_base.py
        # if already Inference, just return
        if isinstance(self.fit, az.InferenceData):
            return self.fit
        if isinstance(self.fit, list):
zoops, spoke too soon. Turns out that self.fit can be a tuple, so this will need to be
-        if isinstance(self.fit, list):
+        if isinstance(self.fit, list) or isinstance(self.fit, tuple):
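As an aside, `isinstance` also accepts a tuple of types, so the same check can be written as a single call. A minimal sketch; `is_fit_sequence` is a hypothetical name used only for illustration.

```python
def is_fit_sequence(fit) -> bool:
    # isinstance accepts a tuple of types and returns True if any match.
    return isinstance(fit, (list, tuple))
```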
birdman/model_base.py
            return self.fit
        if isinstance(self.fit, list):
            if isinstance(self.fit[0], az.InferenceData):
                return self.fit
I'd think we'd want to concat these objects together, right? If so, the following will do:
-                return self.fit
+                cat_name = self.specifications["concatenation_name"]
+                coords = self.specifications["coords"]
+                return concatenate_inferences(self.fit, coords, cat_name)
Good catch. I think here it makes sense to check if combine_individual_fits == True and then proceed accordingly.
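One way to read this, sketched with stand-ins rather than the real types: `FakeInference` replaces `az.InferenceData`, `resolve_fit` is a hypothetical free function, and `concatenate` is a placeholder for `concatenate_inferences`, whose actual signature is not assumed here.

```python
from typing import Callable, List, Sequence, Union


class FakeInference:
    """Stand-in for az.InferenceData (assumption, for illustration only)."""


def resolve_fit(
    fit: Union[FakeInference, Sequence[FakeInference]],
    combine_individual_fits: bool,
    concatenate: Callable[[Sequence[FakeInference]], FakeInference],
) -> Union[FakeInference, List[FakeInference]]:
    # A single, already-converted object is returned untouched.
    if isinstance(fit, FakeInference):
        return fit
    # A non-empty sequence of converted objects is merged only on request.
    if (
        isinstance(fit, (list, tuple))
        and len(fit) > 0
        and isinstance(fit[0], FakeInference)
    ):
        if combine_individual_fits:
            return concatenate(fit)
        return list(fit)
    raise TypeError("fit has not been converted to inference objects")
```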
birdman/model_base.py
 import pandas as pd
 from patsy import dmatrix

-from .model_util import single_fit_to_inference, multiple_fits_to_inference
+from .model_util import (single_fit_to_inference, multiple_fits_to_inference,
+                         _single_feature_to_inf)
If you like the previous suggestion, you'll want to change this to
-                         _single_feature_to_inf)
+                         _single_feature_to_inf, concatenate_inferences)
Now if self.fit is a sequence of InferenceData objects, can concatenate them in to_inference_object.
Remove dask-jobqueue as dependency as that can be handled outside of BIRDMAn. Bump version to 0.0.3.
Also addresses #41.
It looks like the slurm deployment is working!
@mortonjt Is this good to merge or are you still testing/have suggestions?
Yes this is good to merge!
On Fri, May 28, 2021 at 1:33 PM Gibs wrote:
> @mortonjt Is this good to merge or are you still testing/have suggestions?
cc @mortonjt
Refactor of InferenceData conversion code. Adds a flag to automatically convert a fitted CmdStanMCMC to InferenceData. For parallelized models this should convert after each fit and not after all are completed. If this flag is specified, the BaseModel.fit object will be of type InferenceData or List[InferenceData].

Also changes the way to_inference_object works. Now, an arbitrary BaseModel class should call the specify_model method to pass in params, coords, dims, etc. to_inference_object now uses these specifications instead of taking them as arguments.

Still need to update documentation. After this will probably bump up version to 0.0.3.
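The specify-then-convert pattern described here can be sketched with a toy class. `ToyModel` is not BIRDMAn's `BaseModel`: the argument list of `specify_model` and the dictionary returned by `to_inference_object` are assumptions made for illustration, and only the method names and the `specifications` attribute come from this PR.

```python
from typing import Any, Dict, List, Optional


class ToyModel:
    """Toy illustration only; not BIRDMAn's BaseModel."""

    def __init__(self) -> None:
        self.specifications: Dict[str, Any] = {}

    def specify_model(
        self,
        params: List[str],
        coords: Optional[Dict[str, list]] = None,
        dims: Optional[Dict[str, List[str]]] = None,
    ) -> None:
        # Record the specifications once, up front.
        self.specifications = {
            "params": params,
            "coords": coords or {},
            "dims": dims or {},
        }

    def to_inference_object(self) -> Dict[str, Any]:
        # No arguments: everything comes from the stored specifications.
        if not self.specifications:
            raise ValueError("call specify_model() first")
        # A plain dict stands in for the real InferenceData result.
        return dict(self.specifications)
```

Storing the specifications on the model keeps the conversion call free of arguments, so the same conversion can run after each individual fit without the caller passing params, coords, and dims again.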