
Merging DoubleMachineLearning into MultiplyRobust #96

Open
brash6 opened this issue Oct 29, 2024 · 6 comments

brash6 (Collaborator) commented Oct 29, 2024

@houssamzenati
Inside the med_bench_prototype code, you implemented MultiplyRobust's fit function like this:

def fit(self, t, m, x, y):
        """Fits nuisance parameters to data
        """
        # bucketize if needed
        t, m, x, y = self.resize(t, m, x, y)

        # fit nuisance functions
        self.fit_score_nuisances(t, m, x, y)

        if self._ratio == 'density':
            self.fit_treatment_propensity_x_nuisance(t, x)
            self.fit_mediator_nuisance(t, m, x)

        elif self._ratio == 'propensities':
            self.fit_treatment_propensity_x_nuisance(t, x)
            self.fit_treatment_propensity_xm_nuisance(t, m, x)

        else:
            raise NotImplementedError
        
        if self._procedure == 'nesting':
            self.fit_cross_conditional_mean_outcome_nuisance(t, m, x, y)

        elif self._procedure == 'discrete':

            if not is_array_integer(m):
                self._bucketizer = KMeans(n_clusters=10, random_state=self.rng,
                                          n_init="auto").fit(m)
                m = self._bucketizer.predict(m)
            self.fit_mediator_nuisance(t, m, x)
            self.fit_cross_conditional_mean_outcome_nuisance_discrete(t, m, x, y)
        
        else:
            raise NotImplementedError

        self._fitted = True

        if self.verbose:
            print("Nuisance models fitted")

We can see that there are two discriminating variables: ratio and procedure.

From my understanding of our discussion today, with some questions along the way:

  • The DML estimator is equivalent to the ratio='propensities' scenario of the MultiplyRobust estimator, where m is continuous (and multidimensional?).
  • The procedure variable specifies, in the scenario where m is continuous, whether we want to "discretize" m in order to make the estimation faster?
  • We want to get rid of this discrete scenario, and therefore also get rid of the procedure variable.
  • Additionally, we want to get rid of the ratio variable and detect automatically whether we are in the density (m is discrete) or propensities (m is continuous or discrete) scenario?!
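The last bullet could be sketched as follows. This is a hypothetical helper, not med_bench API: `infer_ratio` and the re-implemented `is_array_integer` are assumptions for illustration. It defaults to 'density' only when m is one-dimensional and integer-valued, and falls back to 'propensities' (the DML-like case) otherwise:

```python
import numpy as np

def is_array_integer(m):
    # Hypothetical re-implementation of the helper used in fit():
    # True when every mediator value is a whole number.
    return bool(np.all(m == np.floor(m)))

def infer_ratio(m):
    """Pick the nuisance strategy from the mediator's type.

    'density' only makes sense for a discrete, one-dimensional m;
    otherwise fall back to 'propensities'.
    """
    m = np.asarray(m, dtype=float).reshape(len(m), -1)
    if m.shape[1] == 1 and is_array_integer(m):
        return 'density'
    return 'propensities'

print(infer_ratio([[0.], [1.], [1.]]))   # discrete -> 'density'
print(infer_ratio([[0.2], [1.7]]))       # continuous -> 'propensities'
```

Per the discussion below, the user could still override this default with ratio='propensities' when m is discrete; the auto-detection would only forbid 'density' for a continuous or multidimensional m.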

fyi @judithabk6

bthirion (Collaborator)

I may have missed some of the previous steps, but I would add the logic to infer such parameters automatically after the current refactoring. It's an optimization, not really a must-have imho.

judithabk6 (Owner)

@bthirion one of the first steps included some legacy code from @houssamzenati, so we are rather reverting to the current state.

Not a must-have, but not so complicated I think?

houssamzenati (Collaborator)

@brash6 yes, I agree with what you wrote; it is indeed what we said yesterday. @bthirion we are indeed trying to implement automatic inference of such variants; the only parameter that will be left is ratio='propensities' or 'density' for the mediator m, when it is discrete only. Would you like to also enforce a single option in that case? We thought with Judith that in that case it was beneficial to leave the choice.

brash6 (Collaborator, Author) commented Oct 30, 2024

@houssamzenati It's clear on my side, so:

  • We remove the discrete scenario (and therefore the _procedure variable).
  • We keep the _ratio variable to enable the choice between propensities and density when m is discrete.
  • Finally, do we remove DML in this PR? (From my understanding, if DML is just a subcase of MR, we can remove it in this PR once the first two points are done.) @bthirion @houssamzenati @judithabk6
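Putting the three points together, fit() could reduce to something like the following. A minimal, self-contained sketch under stated assumptions: the class name and the stub nuisance fitters are placeholders, not the actual med_bench code; only the control flow mirrors the prototype snippet above, with the _procedure branch and the discrete bucketization removed.

```python
class MultiplyRobustSketch:
    """Hypothetical skeleton of MultiplyRobust after dropping _procedure."""

    def __init__(self, ratio='propensities'):
        self._ratio = ratio
        self._fitted = False
        self.calls = []  # records which nuisance models were fitted

    # --- placeholder nuisance fitters (stubs, not real estimators) ---
    def fit_treatment_propensity_x_nuisance(self, t, x):
        self.calls.append('p(t|x)')

    def fit_treatment_propensity_xm_nuisance(self, t, m, x):
        self.calls.append('p(t|x,m)')

    def fit_mediator_nuisance(self, t, m, x):
        self.calls.append('p(m|t,x)')

    def fit_cross_conditional_mean_outcome_nuisance(self, t, m, x, y):
        self.calls.append('E[y|t,m,x]')

    # --- simplified fit: no _procedure branch left ---
    def fit(self, t, m, x, y):
        self.fit_treatment_propensity_x_nuisance(t, x)
        if self._ratio == 'density':          # only meaningful for discrete m
            self.fit_mediator_nuisance(t, m, x)
        elif self._ratio == 'propensities':   # the DML-like subcase
            self.fit_treatment_propensity_xm_nuisance(t, m, x)
        else:
            raise NotImplementedError
        # single outcome procedure left: nesting
        self.fit_cross_conditional_mean_outcome_nuisance(t, m, x, y)
        self._fitted = True
        return self

est = MultiplyRobustSketch(ratio='propensities').fit(None, None, None, None)
print(est.calls)  # ['p(t|x)', 'p(t|x,m)', 'E[y|t,m,x]']
```

With ratio='density' the second call becomes 'p(m|t,x)' instead; either way, the outcome model is always the nested cross-conditional mean, which is what makes the DML estimator a removable subcase.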

judithabk6 (Owner)

I agree with the three points.

houssamzenati (Collaborator)

Yes, you can remove DML in the PR after the points are addressed, and also once cross-fitting is implemented (back, actually) in the code.
