Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for full set of XGBoost feature importance metrics #52

Merged
merged 3 commits into from
Sep 30, 2020

Conversation

JackSullivan
Copy link
Member

Adding richer feature importance support to XGBoost modeling to get parity with other XGBoost wrappers.

Copy link
Member

@Craigacp Craigacp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor things in XGBoostFeatureImportance, and could you add tests to make sure it correctly generates the object for one of the dummy datasets in XGBoostClassificationModel and XGBoostRegressionModel?

*/
public static class XGBoostFeatureImportanceRecord {

private String featureName;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The feature name should be final.

}

private Booster booster;
private ImmutableFeatureMap featureMap;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fields probably should be final.


/**
*/
public List<XGBoostFeatureImportanceRecord> getImportances() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing javadoc.

* @param numFeatures number of features to return
* @return records of all importance metrics for each feature, sorted by gain.
*/
public List<XGBoostFeatureImportanceRecord> getImportances(int numFeatures) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing javadoc.

}

public String toString() {
return "XGBoostFeatureImportance(" + getImportances(5).stream()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a toString that generates an awful lot of garbage. Do we even want a toString here given it does computation?

* An instance of feature importance values for a single feature. See {@link XGBoostFeatureImportance} for details
* on interpreting the metrics.
*/
public static class XGBoostFeatureImportanceRecord {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm dubious about giving this a name including "Record" given it's not a record yet, and calling records "somethingRecord" sounds a bit odd.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about XGBoostFeatureImportanceInstance?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

return "XGBoostFeatureImportance(" + getImportances(5).stream()
.map(XGBoostFeatureImportanceRecord::toString)
.collect(Collectors.joining(",\n\t")) + ")";
return String.format("XGBoostFeatureImportance(booster=%s, featureIdMap=%s)", booster.toString(), featureMap.toString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be better to return a toString that refers to the model provenance? The booster and featureIdMap toStrings aren't very helpful.

…ng importance method te `XGBoostExternalModel`.
Copy link
Member

@Craigacp Craigacp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Craigacp Craigacp merged commit 3b647a3 into main Sep 30, 2020
@Craigacp Craigacp deleted the xgboost-feature-importance branch September 30, 2020 21:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants