- Download the XGBoost and sklearn libraries to all nodes of your Greenplum cluster (using `gpscp -f hostfile <source> =:<destination>`).
- Compile and install the XGBoost and sklearn libraries on all nodes (using `gpssh -f hostfile` followed by the compile and install commands).
- Run the above SQL file (it will create a schema called `xgbdemo`).
- Invoke the UDFs as shown in the sample snippet.
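
The first two steps can be scripted. Below is a minimal sketch in Python that only builds the `gpscp` and `gpssh` command lines and does not run them. The helper names, the archive path, and the build command in the usage section are hypothetical placeholders, not part of the original instructions; substitute the paths and compile/install steps that match your library versions.

```python
import shlex


def gpscp_command(hostfile, source, destination):
    """Build the gpscp argv that copies `source` to every host in `hostfile`."""
    # "=:" tells gpscp to substitute each host listed in the hostfile.
    return ["gpscp", "-f", hostfile, source, "=:" + destination]


def gpssh_command(hostfile, remote_command):
    """Build the gpssh argv that runs `remote_command` on every host."""
    # Passing the command as a trailing argument runs it non-interactively.
    return ["gpssh", "-f", hostfile, remote_command]


def deployment_plan(hostfile, archive, remote_dir, build_command):
    """Return the copy step followed by the compile/install step."""
    return [
        gpscp_command(hostfile, archive, remote_dir),
        gpssh_command(hostfile, build_command),
    ]


def to_shell(argv):
    """Render an argv list as a copy-pasteable, safely quoted shell line."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # Hypothetical values: adjust the archive name, target directory,
    # and build steps for your own cluster.
    plan = deployment_plan(
        hostfile="hostfile",
        archive="/tmp/xgboost.tar.gz",
        remote_dir="/tmp/",
        build_command="cd /tmp && tar xzf xgboost.tar.gz && cd xgboost && make",
    )
    for argv in plan:
        print(to_shell(argv))
```

Printing the commands first makes it easy to review them before running anything against the cluster; once they look right, each argv list can be passed to `subprocess.check_call`.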
The XGBoost implementation at https://github.com/dmlc/xgboost is not compatible with Python 2.6, so I recommend you clone my version from https://github.com/vatsan/xgboost and use it instead. It is Python 2.6 compatible and will work with PL/Python on Greenplum/HAWQ.