-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for full set of XGBoost feature importance metrics #52
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor things in XGBoostFeatureImportance, and could you add tests to make sure it correctly generates the object for one of the dummy datasets in XGBoostClassificationModel and XGBoostRegressionModel?
*/ | ||
public static class XGBoostFeatureImportanceRecord { | ||
|
||
private String featureName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The feature name should be final.
} | ||
|
||
private Booster booster; | ||
private ImmutableFeatureMap featureMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fields probably should be final.
|
||
/** | ||
*/ | ||
public List<XGBoostFeatureImportanceRecord> getImportances() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing javadoc.
* @param numFeatures number of features to return | ||
* @return records of all importance metrics for each feature, sorted by gain. | ||
*/ | ||
public List<XGBoostFeatureImportanceRecord> getImportances(int numFeatures) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing javadoc.
} | ||
|
||
public String toString() { | ||
return "XGBoostFeatureImportance(" + getImportances(5).stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a toString that generates an awful lot of garbage. Do we even want a toString here given it does computation?
* An instance of feature importance values for a single feature. See {@link XGBoostFeatureImportance} for details | ||
* on interpreting the metrics. | ||
*/ | ||
public static class XGBoostFeatureImportanceRecord { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm dubious about giving this a name including "Record" given it's not a record yet, and calling records "somethingRecord" sounds a bit odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about XGBoostFeatureImportanceInstance
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
return "XGBoostFeatureImportance(" + getImportances(5).stream() | ||
.map(XGBoostFeatureImportanceRecord::toString) | ||
.collect(Collectors.joining(",\n\t")) + ")"; | ||
return String.format("XGBoostFeatureImportance(booster=%s, featureIdMap=%s)", booster.toString(), featureMap.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be better to return a toString that refers to the model provenance? The booster and featureIdMap toStrings aren't very helpful.
…ng importance method te `XGBoostExternalModel`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Adding richer feature importance support to XGBoost modeling to get parity with other XGBoost wrappers.