Fixed `AnchorTabular` length discrepancy between `feature` and `names` field. #902
This PR fixes the `AnchorTabular` length discrepancy between the `feature` and `names` fields returned in the explanation object. To describe what caused the issue, let us consider the following example.

Consider that the dataset has a numerical feature `f`. Because `Anchors` can only handle discrete data, a discretization step is required for numerical features. In our example, we discretize the numerical values based on the 25%, 50%, and 75% quantiles. Let `t25`, `t50`, `t75` be the associated quantile values. This results in a discretization of the numerical feature `f` into 4 bins: `[-inf, t25]`, `[t25, t50]`, `[t50, t75]`, and `[t75, +inf]`, encoded by 0, 1, 2, and 3, respectively.

Let us consider that we want to explain an instance `X`, and let us denote by `X[f]` the value of feature `f` for the instance `X`. Assume that `X[f]` falls in bin number 2, thus being encoded by the value 2.

For numerical features, the `AnchorTabular` algorithm creates multiple predicates associated with the same feature `f`. Those predicates correspond to intervals from which numerical samples can be drawn for the perturbation step in the algorithm. The code for this can be seen here. In our case the following predicates will be created: `P1 = [1, 2, 3]`, `P2 = [2, 3]`, `P3 = [0, 1, 2]`.
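The discretization and predicate construction described above can be sketched as follows. This is an illustration only, not the library's code: the helper names `discretize` and `build_predicates` and the threshold values are hypothetical, and bins are assumed to be closed on the right.

```python
from bisect import bisect_left

# Hypothetical quantile values standing in for t25, t50, t75.
thresholds = [10.0, 20.0, 30.0]


def discretize(value, thresholds):
    """Return the bin index of `value`, assuming right-closed bins."""
    return bisect_left(thresholds, value)


def build_predicates(bin_idx, n_bins):
    """For each threshold, keep the side of the split that contains `bin_idx`."""
    predicates = []
    for cut in range(n_bins - 1):
        if bin_idx > cut:
            # Corresponds to f > thresholds[cut]: all bins above the cut.
            predicates.append(list(range(cut + 1, n_bins)))
        else:
            # Corresponds to f <= thresholds[cut]: all bins up to the cut.
            predicates.append(list(range(0, cut + 1)))
    return predicates


x_f = 25.0                                  # hypothetical value of X[f]
bin_idx = discretize(x_f, thresholds)       # falls in bin 2
predicates = build_predicates(bin_idx, n_bins=4)
print(bin_idx, predicates)
```

With `X[f]` in bin 2, this yields the three bin lists given for `P1`, `P2`, and `P3`.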
Note that each predicate `Pi` corresponds to an interval from which we can sample values for the feature `f`. For example, `P1` will be associated with the interval `[t25, +inf]`, `P2` with `[t50, +inf]`, and `P3` with `[-inf, t75]`.

It is possible that the final anchor contains multiple predicates from the three `Pi`'s we listed above. Let us assume that it ends up containing `P1` and `P2`. With this assumption, let us move to the construction of the human-interpretable representation of the anchor implemented here.

Let's say that the anchor is composed of three predicates encoded by `[1, 2, 3]`, where `1` is associated with a feature `g` different from `f`, and `2`, `3` correspond to predicates `P1`, `P2` associated with feature `f`.

Following the code line by line we have:
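A minimal sketch of this step, under stated assumptions rather than the library's actual code: `enc2feat_idx` is a hypothetical mapping from encoded predicate index to feature index, and the feature indices 0 (for `g`) and 1 (for `f`) are made up for illustration.

```python
# Hypothetical mapping: encoded predicate index -> feature index.
# Predicate 1 belongs to g (index 0); predicates 2 and 3 (P1, P2) belong to f (index 1).
enc2feat_idx = {1: 0, 2: 1, 3: 1}

anchor_idxs = [1, 2, 3]
explanation = {"feature": anchor_idxs, "names": []}

# Mapping every predicate to its feature keeps one entry per predicate,
# so f appears twice.
explanation["feature"] = [enc2feat_idx[idx] for idx in anchor_idxs]

# A dictionary keyed by feature index collapses the duplicate.
ordinal_ranges = {
    enc2feat_idx[idx]: [float("-inf"), float("inf")] for idx in anchor_idxs
}

print(explanation["feature"])  # [0, 1, 1]
print(list(ordinal_ranges))    # [0, 1]
```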
We already see at this point that the length of `explanation['feature']` differs from the number of keys in `ordinal_ranges`, because `explanation['feature']` contains a duplicate of `f`.

The following block of code performs a correct intersection and refinement of the intervals for each feature in the anchor:
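As an illustration of the intersection idea only (the representation of intervals as threshold pairs, the threshold values, and the names `predicate_intervals` and `enc2feat_idx` are assumptions, not the library's data structures), the predicates of one feature can be merged into a single range like this:

```python
import math

# Hypothetical quantile values standing in for t25, t50, t75.
t25, t50, t75 = 10.0, 20.0, 30.0

# Encoded predicate index -> (lower, upper) interval it represents.
predicate_intervals = {
    2: (t25, math.inf),  # P1
    3: (t50, math.inf),  # P2
}
# Both predicates belong to the same feature f.
enc2feat_idx = {2: "f", 3: "f"}

ordinal_ranges = {}
for enc_idx, (lo, hi) in predicate_intervals.items():
    feat = enc2feat_idx[enc_idx]
    cur_lo, cur_hi = ordinal_ranges.get(feat, (-math.inf, math.inf))
    # Intersection: take the tighter bound on each side.
    ordinal_ranges[feat] = (max(cur_lo, lo), min(cur_hi, hi))

print(ordinal_ranges)
```

Both predicates collapse into one entry for `f`, whose lower bound is the larger of the two thresholds.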
Finally, the human-interpretable representation of the anchor for numerical features is constructed here based on the dictionary `ordinal_ranges`.

Note that the `explanation['names']` field avoids the duplication of the same feature, hence the difference in length with `explanation['feature']`.

The way to fix this issue is to set `explanation['feature']` to the list of keys in `ordinal_ranges`.
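A minimal sketch of deriving both fields from the same `ordinal_ranges` dictionary so that their lengths agree. The feature indices, the `feature_names` lookup, the ranges, and the `describe` helper are hypothetical and only illustrate the idea; `g` is treated as numerical here for simplicity.

```python
import math

# Hypothetical refined ranges per feature index (0 -> g, 1 -> f).
ordinal_ranges = {0: (-math.inf, 5.0), 1: (20.0, math.inf)}
feature_names = {0: "g", 1: "f"}

# State before the fix: one entry per predicate, so f (index 1) is duplicated.
explanation = {"feature": [0, 1, 1], "names": []}


def describe(name, lo, hi):
    """Render a (lo, hi] range as a human-readable condition."""
    if lo == -math.inf and hi == math.inf:
        return name
    if lo == -math.inf:
        return f"{name} <= {hi:.2f}"
    if hi == math.inf:
        return f"{name} > {lo:.2f}"
    return f"{lo:.2f} < {name} <= {hi:.2f}"


# One name per feature, built from the refined ranges.
explanation["names"] = [
    describe(feature_names[feat], lo, hi) for feat, (lo, hi) in ordinal_ranges.items()
]
# One feature index per key, which removes the duplicate.
explanation["feature"] = list(ordinal_ranges.keys())

print(explanation)
```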