
Fixed AnchorTabular length discrepancy between feature and names field. #902

Merged


RobertSamoilescu
Collaborator

This PR fixes the length discrepancy between the feature and names fields of the explanation object returned by AnchorTabular. To show what caused the issue, consider the following example.

Consider a dataset with a numerical feature f. Because Anchors can only handle discrete data, numerical features must first be discretized. In our examples, we discretize numerical values using the 25%, 50%, and 75% quantiles. Let t25, t50, and t75 be the corresponding quantile values. This splits the numerical feature f into 4 bins, [-inf, t25], [t25, t50], [t50, t75], and [t75, +inf], encoded by 0, 1, 2, and 3, respectively.
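A minimal sketch of this quantile discretization with NumPy, on made-up data. Which end of each interval is closed is an assumption here (values equal to a threshold go into the lower bin); it is not necessarily the convention alibi's discretizer uses.

```python
import numpy as np

# Made-up training values for a single numerical feature f.
train_f = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])

# Quantile thresholds t25, t50, t75.
thresholds = np.quantile(train_f, [0.25, 0.5, 0.75])


def encode(value: float) -> int:
    """Map a raw value of f to its bin index 0..3.

    With right=True, np.digitize returns i such that
    thresholds[i-1] < value <= thresholds[i], i.e. bins
    (-inf, t25], (t25, t50], (t50, t75], (t75, +inf).
    """
    return int(np.digitize(value, thresholds, right=True))


print(thresholds)   # the three cut points t25, t50, t75
print(encode(6.0))  # a value between t50 and t75 falls in bin 2
```

With these toy values the thresholds are 3.0, 5.0, and 7.0, so a value such as 6.0 plays the role of X[f] in the example that follows.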

Suppose we want to explain an instance X, and let X[f] denote the value of feature f for X. Assume that X[f] falls in bin 2 and is therefore encoded by the value 2.

For numerical features, the AnchorTabular algorithm creates multiple predicates associated with the same feature f. These predicates correspond to intervals from which numerical samples can be drawn during the perturbation step of the algorithm. The code for this can be seen here. In our case, the following predicates are created:

  • P1 = [1, 2, 3],
  • P2 = [2, 3],
  • P3 = [0, 1, 2]
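One rule that reproduces P1, P2, and P3 is to create one predicate per threshold between adjacent bins: "f > t" if X[f] lies above that threshold, "f <= t" otherwise. The sketch below is a hypothetical reconstruction from this example (the helper name `ordinal_predicates` is made up), not the code linked above.

```python
from typing import List, Set


def ordinal_predicates(bin_idx: int, n_bins: int) -> List[Set[int]]:
    """Hypothetical helper: one predicate (a set of bin indices) per threshold.

    Threshold i separates bin i from bin i + 1. If the instance lies above
    it, the predicate is "f > t_i" (bins i+1 .. n_bins-1); otherwise it is
    "f <= t_i" (bins 0 .. i). Every predicate contains the instance's bin.
    """
    preds = []
    for i in range(n_bins - 1):
        if bin_idx > i:
            preds.append(set(range(i + 1, n_bins)))  # lower-bound predicate
        else:
            preds.append(set(range(0, i + 1)))       # upper-bound predicate
    return preds


# X[f] in bin 2 out of 4 bins.
print([sorted(p) for p in ordinal_predicates(2, 4)])  # → [[1, 2, 3], [2, 3], [0, 1, 2]]
```

Note that, under this rule, a predicate contains bin 0 exactly when it is an upper-bound ("f <= t") predicate.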

Each predicate Pi corresponds to an interval from which values of feature f can be sampled. For example, P1 is associated with the interval [t25, +inf], P2 with [t50, +inf], and P3 with [-inf, t75].
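The mapping from a set of contiguous bins to a value interval only needs the lowest and highest bin. A minimal sketch, assuming toy thresholds t25 = 3.0, t50 = 5.0, t75 = 7.0 and a hypothetical helper `predicate_interval`:

```python
import math
from typing import List, Set, Tuple

# Toy thresholds t25, t50, t75 (assumed for illustration).
THRESHOLDS = [3.0, 5.0, 7.0]


def predicate_interval(pred: Set[int], thresholds: List[float]) -> Tuple[float, float]:
    """Return the (low, high) value interval covered by a set of contiguous bins.

    Bin 0 has no lower threshold and the last bin has no upper threshold,
    so those ends become -inf and +inf respectively.
    """
    lo_bin, hi_bin = min(pred), max(pred)
    low = -math.inf if lo_bin == 0 else thresholds[lo_bin - 1]
    high = math.inf if hi_bin == len(thresholds) else thresholds[hi_bin]
    return low, high


print(predicate_interval({1, 2, 3}, THRESHOLDS))  # P1 → (3.0, inf), i.e. [t25, +inf]
print(predicate_interval({2, 3}, THRESHOLDS))     # P2 → (5.0, inf), i.e. [t50, +inf]
print(predicate_interval({0, 1, 2}, THRESHOLDS))  # P3 → (-inf, 7.0), i.e. [-inf, t75]
```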

The final anchor may contain several of the three predicates Pi listed above. Let us assume that it contains both P1 and P2. Under this assumption, let us move on to the construction of the human-interpretable representation of the anchor, implemented here.

Suppose the anchor consists of three predicates encoded by [1, 2, 3], where 1 is associated with a feature g different from f, and 2 and 3 correspond to the predicates P1 and P2 associated with feature f.

Following the code line by line, we have:

anchor_idxs = explanation['feature']  # anchor_idxs = [1, 2, 3]
explanation['names'] = []
explanation['feature'] = [self.enc2feat_idx[idx] for idx in anchor_idxs]  # explanation['feature'] = [g, f, f]
ordinal_ranges = {self.enc2feat_idx[idx]: [float('-inf'), float('inf')] for idx in anchor_idxs}  # ordinal_ranges = {g: [-inf, +inf], f: [-inf, +inf]}

At this point we can already see that the length of explanation['feature'] (3) differs from the number of keys in ordinal_ranges (2), because explanation['feature'] contains f twice.
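The mismatch can be reproduced with plain Python. The sketch below uses toy string stand-ins "g" and "f" for the feature column ids (in alibi these are integers); `enc2feat_idx` here is a hand-written dict mirroring the example, not the explainer's attribute.

```python
# Toy stand-in: encoded predicate index -> feature it refers to.
enc2feat_idx = {1: "g", 2: "f", 3: "f"}
anchor_idxs = [1, 2, 3]

# Same two expressions as in the walkthrough above.
feature = [enc2feat_idx[idx] for idx in anchor_idxs]
ordinal_ranges = {enc2feat_idx[idx]: [float("-inf"), float("inf")] for idx in anchor_idxs}

print(feature)                            # → ['g', 'f', 'f']
print(list(ordinal_ranges.keys()))        # → ['g', 'f']
print(len(feature), len(ordinal_ranges))  # → 3 2
```

The dict comprehension silently collapses the repeated key f, while the list comprehension keeps both occurrences.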

The following block of code correctly intersects and refines the intervals for each feature in the anchor:

for idx in set(anchor_idxs) - self.cat_lookup.keys():
    feat_id = self.enc2feat_idx[idx]  # feature col. id
    if 0 in self.ord_lookup[idx]:  # tells if the feature in X falls in a higher or lower bin
        ordinal_ranges[feat_id][1] = min(
            ordinal_ranges[feat_id][1], max(list(self.ord_lookup[idx]))
        )
    else:
        ordinal_ranges[feat_id][0] = max(
            ordinal_ranges[feat_id][0], min(list(self.ord_lookup[idx])) - 1
        )
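A self-contained run of the loop above on toy data from the example: g is assumed categorical (so it is skipped), and predicates 2 and 3 are P1 and P2 on f. The lookup dicts are hand-written stand-ins for the explainer's attributes. As we read the code, the stored bounds are threshold indices rather than raw values, so a lower bound of 1 means "f > t50", the intersection of "f > t25" and "f > t50".

```python
# Toy stand-ins mirroring the example (hypothetical values).
enc2feat_idx = {1: "g", 2: "f", 3: "f"}
cat_lookup = {1: 0}                     # idx 1: categorical predicate on g (made-up category)
ord_lookup = {2: {1, 2, 3}, 3: {2, 3}}  # idx 2: P1, idx 3: P2
anchor_idxs = [1, 2, 3]

ordinal_ranges = {enc2feat_idx[idx]: [float("-inf"), float("inf")] for idx in anchor_idxs}

for idx in set(anchor_idxs) - cat_lookup.keys():
    feat_id = enc2feat_idx[idx]
    if 0 in ord_lookup[idx]:
        # Predicate contains bin 0: an upper-bound ("f <= t_k") predicate.
        ordinal_ranges[feat_id][1] = min(ordinal_ranges[feat_id][1], max(ord_lookup[idx]))
    else:
        # Otherwise a lower-bound ("f > t_k") predicate, with k = lowest bin - 1.
        ordinal_ranges[feat_id][0] = max(ordinal_ranges[feat_id][0], min(ord_lookup[idx]) - 1)

print(ordinal_ranges)  # → {'g': [-inf, inf], 'f': [1, inf]}
```

Because each bound is updated with min or max, the result does not depend on the order in which the set of indices is iterated.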

Finally, the human-interpretable representation of the anchor for numerical features is constructed here from the dictionary ordinal_ranges.
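Since the names are built from ordinal_ranges, each numerical feature yields exactly one name, however many of its predicates are in the anchor. The formatter below is a hypothetical sketch of that step (the name `ordinal_name`, the output format, and the toy thresholds are assumptions), reusing the threshold-index reading of the bounds.

```python
import math
from typing import List

THRESHOLDS = {"f": [3.0, 5.0, 7.0]}  # toy t25, t50, t75 for feature f


def ordinal_name(feat: str, bounds: List[float], thresholds: List[float]) -> str:
    """Hypothetical formatter: turn [low_idx, high_idx] threshold indices into text."""
    low, high = bounds
    has_low, has_high = low != -math.inf, high != math.inf
    if has_low and has_high:
        return f"{thresholds[int(low)]:.2f} < {feat} <= {thresholds[int(high)]:.2f}"
    if has_low:
        return f"{feat} > {thresholds[int(low)]:.2f}"
    if has_high:
        return f"{feat} <= {thresholds[int(high)]:.2f}"
    return feat  # unconstrained range


ordinal_ranges = {"f": [1, math.inf]}  # result of the refinement loop for f
names = [ordinal_name(feat, b, THRESHOLDS[feat]) for feat, b in ordinal_ranges.items()]
print(names)  # → ['f > 5.00']
```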

Note that the explanation['names'] field contains each feature only once, hence its length differs from that of explanation['feature'].

The fix is to set explanation['feature'] to the list of keys of ordinal_ranges, which contains each feature exactly once.
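The keys of ordinal_ranges are the anchor's features with duplicates removed, in order of first appearance: Python dicts keep insertion order (guaranteed since 3.7), and a repeated key keeps its original slot. A minimal sketch of that property with the toy mapping from the example; `distinct_features` is a hypothetical helper, not part of alibi, and the names list is a placeholder.

```python
from typing import Dict, Hashable, List


def distinct_features(anchor_idxs: List[int], enc2feat_idx: Dict[int, Hashable]) -> List[Hashable]:
    """Features of the anchor, each once, in order of first appearance.

    Gives the same list as list(ordinal_ranges.keys()) for the dict
    comprehension in the walkthrough above.
    """
    return list(dict.fromkeys(enc2feat_idx[idx] for idx in anchor_idxs))


enc2feat_idx = {1: "g", 2: "f", 3: "f"}   # toy mapping from the example
names = ["<name for g>", "<name for f>"]  # placeholder: one name per distinct feature

features = distinct_features([1, 2, 3], enc2feat_idx)
print(features)                     # → ['g', 'f']
assert len(features) == len(names)  # one entry per feature in both lists
```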

@jklaise
Contributor

jklaise commented Apr 17, 2023

Nice! Thanks also for the thorough explanation.

jklaise merged commit 89eb7d0 into SeldonIO:master on Apr 17, 2023