User-facing API review #268

tbenthompson · 2023-02-07T14:10:02Z

Time calcifies designs. We're writing more and more models and tutorials using imprint. Every month, it'll get more expensive (in terms of effort/time) to change APIs. Once we have external users, we'll need to worry about backwards compatibility before changing APIs.

James, Mike, could you both spend 30-60m thinking about this and share thoughts?

I want to mostly focus on imprint rather than "internal" stuff like Adagrid since it will always be easier to make changes on internal tools where we have access to 100% of the calling code.

Let's use this issue as a place to gather complaints and comments about the comment design. Either add a comment below, or add a link to another issue.

External/open-source tooling:

Internal tooling:

Think about data schema versioning #229

The text was updated successfully, but these errors were encountered:

tbenthompson · 2023-02-07T14:13:22Z

Actually implementing and trying out the detailed parameter for sim_batch. Is that a useful thing?
Maybe: rename sim_batch to stat_batch.

JamesYang007 · 2023-02-07T15:06:44Z

I've actually never played around with calibration so no inputs there yet. All the model interface looks good to me for now. I haven't felt like it was awkward in any way. Only one suggestion:

Gridding was something that felt like a barrier sometimes. Sometimes I want to actually manually just create a grid and feed it in. Problem is that it's not just the thetas I need worry about; I need to make sure all the other columns in grid.df is initialized properly as well. Can we standardize a function for general gridding where user feeds in any subset of the columns in grid.df and the function assumes they're "correct" and infers what the rest of the columns are? For example, if I feed in thetas only, it will figure out null hypos flags (among other columns). If I feed in both thetas and null_hypos flags, it will only figure out the rest of the columns.

tbenthompson · 2023-02-07T15:31:41Z

Problem is that it's not just the thetas I need worry about; I need to make sure all the other columns in grid.df is initialized properly as well.

I'm guessing you want something halfway in between init_grid and cartesian_grid. I've wanted that too.

Function signature something like:

def create_grid(theta, radii, null_hypos=None, prune=True):
    ...

Then, I think I would set grid.init_grid to be private and call it like _raw_init_grid. Then the two main grid creation functions would be create_grid and cartesian_grid.

If I feed in both thetas and null_hypos flags, it will only figure out the rest of the columns.

Does this work? I think we always need theta and radii. I suppose if we just had theta, we could construct a Voronoi diagram for theta and then make bounding boxes for each voronoi cell to construct the tiles. Is that what you were imagining? That actually could be super nice and ergonomic for 1D and 2D problems. It would get a bit harder to understand and implement in 3D and higher...

tbenthompson · 2023-02-07T15:42:51Z

Question for @JamesYang007.

Should Grid.prune() remove inactive tiles?

What is an inactive tile? Our grids are append-only until explicitly pruned or subsetted. So, when a null hypothesis splits an input tile into two children, what happens is that the new grid actually contains three tiles: the parent tile marked as parent['active'] = False and the children with children['active'] = True. We also have a parent_id column. This inactivity/append-only design is also used heavily in the adagrid code. The nice thing about this is that you can look at the tiles to figure out how the grid got to be the way it is.
Currently, Grid.prune() removes tiles that are fully in the alternative space, but does nothing about inactive tiles.
Having inactive tiles accidentally in our results has caused a couple bugs recently.
This is because ip.validate and ip.calibrate output results for all the tiles that are passed to them including tiles that are inactive. This seems like something worth changing! I think ip.validate and ip.calibrate should produce output only for active tiles (and perhaps also print a warning if those functions were passed inactive tiles).
Grid.active() explicitly removes the inactive tiles.
Should we make Grid.prune() also remove inactive tiles?
We could rename Grid.active to Grid.prune_inactive and rename Grid.prune to Grid.prune_alternative.
We could add a second (default True) parameter to cartesian_grid (and create_grid) above called prune_inactive=True.

JamesYang007 · 2023-02-07T15:54:04Z

I'm guessing you want something halfway in between init_grid and cartesian_grid. I've wanted that too.

Function signature something like:
def create_grid(theta, radii, null_hypos=None, prune=True):
    ... 
Then, I think I would set grid.init_grid to be private and call it like _raw_init_grid. Then the two main grid creation functions would be create_grid and cartesian_grid.

Yah this is perfect!

If I feed in both thetas and null_hypos flags, it will only figure out the rest of the columns.

Does this work? I think we always need theta and radii.

Oopsie I forgot about radii, but yes, the create_grid is basically what I'm talking about.

JamesYang007 · 2023-02-07T15:56:44Z

As a user of just the cartesian_grid (and soon create_grid) I like prune_inactive=True!! While you're at it, to keep the prune_inactive, prune_alternative naming consistent, we can rename prune to prune_alternative in cartesian_grid.

tbenthompson · 2023-02-07T20:25:52Z

Current public API of imprint:

Grid class and its methods
- also grid constructors like create_grid and cartesian_grid.
- The contents of the grid dataframe.
- plot_grid
NullHypothesis class
- HyperPlane
- hypo helper function for easily defining planar nulls.
Model class
- family and family_params
- sim_batch
calibrate
validate
batch and batch_all - currently "public" because they're potentially useful for a model writer.
logging (but imprint doesn't do very much logging right now)

tbenthompson · 2023-02-07T20:27:32Z

Reading through the above, I think one more place where we should modify the API is family and family_params. It would be nice to allow defining a custom family. We can just check whether family is a string or not. If it's not, we expect it to follow the same interface as the built-in bounds.

JamesYang007 · 2023-02-07T21:14:13Z

Ah I was gonna go about this by making the users register it in the dictionary:

confirmasaurus/imprint/imprint/driver.py

Lines 14 to 20 in 34e0bb4

    
           bound_dict = { 
        
               "normal": bound.normal.NormalBound, 
        
               "normal2": bound.normal2.Normal2Bound, 
        
               "scaled_chisq": bound.scaled_chisq.ScaledChiSqBound, 
        
               "binomial": bound.binomial.BinomialBound, 
        
               "exponential": bound.exponential.ExponentialBound, 
        
           }

but this is probably safer and more scalable.

tbenthompson · 2023-02-07T21:28:02Z

We could get rid of family_params and have the Model-writer pass an object for family:

import imprint as ip
class ExampleModel(ip.Model):
    def __init__(self, ...):
        self.family = ip.bound.Binomial(n=10)

JamesYang007 · 2023-02-07T21:42:23Z

I like this a lot more. I wasn't fond of having to think about family and family_params as separate entities. We also don't have to do the dictionary checking anymore then. Validation/calibration just needs to check at some point that the model class does have a member called family.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

User-facing API review #268

User-facing API review #268

tbenthompson commented Feb 7, 2023 •

edited

Loading

tbenthompson commented Feb 7, 2023

JamesYang007 commented Feb 7, 2023 •

edited

Loading

tbenthompson commented Feb 7, 2023

tbenthompson commented Feb 7, 2023 •

edited

Loading

JamesYang007 commented Feb 7, 2023 •

edited

Loading

JamesYang007 commented Feb 7, 2023 •

edited

Loading

tbenthompson commented Feb 7, 2023

tbenthompson commented Feb 7, 2023

JamesYang007 commented Feb 7, 2023 •

edited

Loading

tbenthompson commented Feb 7, 2023 •

edited

Loading

JamesYang007 commented Feb 7, 2023

User-facing API review #268

User-facing API review #268

Comments

tbenthompson commented Feb 7, 2023 • edited Loading

tbenthompson commented Feb 7, 2023

JamesYang007 commented Feb 7, 2023 • edited Loading

tbenthompson commented Feb 7, 2023

tbenthompson commented Feb 7, 2023 • edited Loading

JamesYang007 commented Feb 7, 2023 • edited Loading

JamesYang007 commented Feb 7, 2023 • edited Loading

tbenthompson commented Feb 7, 2023

tbenthompson commented Feb 7, 2023

JamesYang007 commented Feb 7, 2023 • edited Loading

tbenthompson commented Feb 7, 2023 • edited Loading

JamesYang007 commented Feb 7, 2023

tbenthompson commented Feb 7, 2023 •

edited

Loading

JamesYang007 commented Feb 7, 2023 •

edited

Loading

tbenthompson commented Feb 7, 2023 •

edited

Loading

JamesYang007 commented Feb 7, 2023 •

edited

Loading

JamesYang007 commented Feb 7, 2023 •

edited

Loading

JamesYang007 commented Feb 7, 2023 •

edited

Loading

tbenthompson commented Feb 7, 2023 •

edited

Loading