In-database parallel grid-search wrapper for XGBoost

Dependencies

  1. XGBoost library
  2. scikit-learn

Installation

  1. Download the XGBoost and scikit-learn libraries to all nodes of your Greenplum cluster (using gpscp -f hostfile <source> =:<destination>).
  2. Compile and install both libraries on all nodes (using gpssh -f hostfile, followed by the compile and install commands).
  3. Run the SQL file in this repository (it creates a schema called xgbdemo).
  4. Invoke the UDFs as shown in the sample snippet.
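
After installation, it can help to confirm that the Python interpreter on each node can actually import both libraries. The snippet below is a minimal sketch and not part of this repository: the helper name missing_modules is hypothetical, and it assumes the usual import names xgboost and sklearn. It sticks to constructs available in Python 2.6 so that it can run under the same interpreter PL/Python uses.

```python
def missing_modules(names):
    """Return the subset of `names` that cannot be imported.

    Hypothetical helper for illustration only. Any exception raised
    during import (not just ImportError) counts as "unavailable",
    since a half-built native library can fail in other ways.
    """
    missing = []
    for name in names:
        try:
            __import__(name)
        except Exception:
            missing.append(name)
    return missing


# An empty list means both dependencies are importable on this node.
print(missing_modules(["xgboost", "sklearn"]))
```

Running the script on every node (for example through gpssh) should print an empty list everywhere before you move on to invoking the UDFs.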

Note: XGBoost and Python 2.6

The XGBoost implementation at https://github.com/dmlc/xgboost is not compatible with Python 2.6. I recommend cloning my version from https://github.com/vatsan/xgboost and using it instead; it is Python 2.6 compatible and works with PL/Python on Greenplum/HAWQ.

Implementation details for in-database parallel grid search

[Figure: In-database parallel grid-search]
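
This README does not spell out the mechanics in text, so the following is only a rough sketch of the general idea behind a grid search, not the wrapper's actual code. It assumes that a grid is given as a mapping from parameter name to candidate values and that every combination becomes one independent training task, which is what makes the search easy to spread across segments. The function name expand_grid and the parameter values are hypothetical; the code avoids features newer than Python 2.6.

```python
import itertools


def expand_grid(param_grid):
    """Return one dict per combination of the candidate values.

    `param_grid` maps a parameter name to a list of values to try.
    Names are sorted so the output order is deterministic.
    """
    names = sorted(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, combo))
            for combo in itertools.product(*value_lists)]


# Illustrative values only; they are not taken from this repository.
grid = {
    "max_depth": [3, 6],
    "learning_rate": [0.1, 0.3],
    "n_estimators": [100],
}

combos = expand_grid(grid)
for params in combos:
    # Each dict could be stored as one row and trained independently.
    print(params)
```

Because no combination depends on another, the resulting rows can be distributed across the cluster and each model fitted in parallel, with the results collected afterwards to pick the best parameter set.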