[dask] FutureCancelledError when training DaskLGBMClassifier on databricks #6660

Open
gdubs89 opened this issue Oct 1, 2024 · 1 comment

Comments


gdubs89 commented Oct 1, 2024

Description

When I try to train a DaskLGBMClassifier using dask-databricks, I run into the error FutureCancelledError: operator.itemgetter(1)-aedec8478dd062943dfc5db591c68b4c cancelled for reason: unknown. no matter what I do. Databricks Assistant/ChatGPT suggested it might be due to the partitioning of the training data, but I've tried massively increasing and decreasing the number of partitions, and all that changes is how quickly it fails.

Reproducible example

import lightgbm.dask as lgb_dask
from lightgbm import DaskLGBMClassifier
import dask_databricks
import dask.dataframe as dd

client = dask_databricks.get_client()

train_ddf = dd.read_parquet(train_filepath, storage_options=storage_options).repartition(npartitions=320)
eval_ddf = dd.read_parquet(eval_filepath, storage_options=storage_options).repartition(npartitions=320)
# have played with lots of different partitioning strategies

# cast categorical features to the categorical dtype, categorize in the training data, then apply that categorization to the eval data
train_ddf[categorical_features] = train_ddf[categorical_features].astype('category').categorize()
category_mappings = {col: train_ddf[col].cat.categories for col in categorical_features}  # keep these for making predictions on unseen data
eval_ddf[categorical_features] = eval_ddf[categorical_features].astype('category')
for col in categorical_features:
    eval_ddf[col] = eval_ddf[col].cat.set_categories(category_mappings[col])

clf = DaskLGBMClassifier(
    client=client,
    objective="binary",
    max_depth=-1,
    num_leaves=5000,
    metric="binary_logloss",
    boosting_type="gbdt"
)

clf.fit(
    X=train_ddf[features], 
    y=train_ddf['target'], 
    eval_set=[(eval_ddf[features], eval_ddf['target'])],
    eval_names=['eval'],

)
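
For reference, the category alignment step in the example above can be illustrated on plain pandas. This is a small sketch with made-up values; it assumes the Dask `.cat` accessor behaves like the pandas one here, which has not been checked on the cluster in question.

```python
import pandas as pd

# Hypothetical toy data: "d" appears in eval but never in train.
train = pd.Series(["a", "b", "c", "a"], dtype="category")
eval_ = pd.Series(["a", "d", "c"]).astype("category")

# Re-express the eval column using the categories learned from train.
aligned = eval_.cat.set_categories(train.cat.categories)

# Values that were not in the training categories become missing (code -1).
unseen = int(aligned.isna().sum())
codes = aligned.cat.codes.tolist()
print(unseen, codes)
```

The point of the alignment is that both frames then share one category-to-code mapping, so the same string maps to the same integer code at training and prediction time.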

Environment info

LightGBM 4.5.0, installed with !pip install lightgbm at the top of the notebook.

I'm using Databricks Runtime 15.4 LTS with the following init script:

#!/bin/bash

# Install Dask + Dask Databricks
/databricks/python/bin/pip install --upgrade xgboost s3fs dask[complete] dask-databricks "numpy==1.*"

# Start Dask cluster components
dask databricks run

jameslamb changed the title from "FutureCancelledError when training DaskLGBMClassifier on databricks" to "[dask] FutureCancelledError when training DaskLGBMClassifier on databricks" on Oct 1, 2024

jmoralez (Collaborator) commented

Hey @gdubs89, thanks for using LightGBM.

Is it possible for you to install LightGBM on the cluster itself? Running pip install at the top of the notebook installs it on the driver, but every executor needs it too, so that may be the cause.
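
One way to check this from the notebook is sketched below. It assumes `client` is the `distributed.Client` returned by `dask_databricks.get_client()`; `Client.run` calls a function on every worker and returns a dict keyed by worker address. The helper names `installed_version`, `versions_on_workers`, and `workers_missing` are hypothetical and not part of LightGBM or Dask.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def versions_on_workers(client, package="lightgbm"):
    """Ask every Dask worker which version of `package` it can see.

    Assumes `client` is a distributed.Client; Client.run executes the
    function on each worker and returns {worker_address: result}.
    """
    return client.run(installed_version, package)


def workers_missing(versions):
    """List the worker addresses where the package was not found."""
    return sorted(addr for addr, ver in versions.items() if ver is None)
```

In the notebook this might be called as `workers_missing(versions_on_workers(client))`. A non-empty result would suggest LightGBM is absent on those workers, in which case adding `lightgbm` to the `pip install` line of the init script is one option to try.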
It would also help if you can access the logs from the executors; those may contain the exact error.
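
If the Databricks UI is awkward to dig through, the worker logs can also be pulled through the Dask client. The following is a minimal sketch, assuming `client` is a `distributed.Client` and that `Client.get_worker_logs()` returns a mapping from worker address to `(level, message)` pairs; `error_lines` and `fit_and_report` are hypothetical helper names.

```python
def error_lines(worker_logs, levels=("ERROR", "CRITICAL")):
    """Keep only log messages at the given levels.

    Assumes `worker_logs` maps worker address -> iterable of
    (level, message) pairs.
    """
    found = {}
    for address, entries in worker_logs.items():
        hits = [msg for level, msg in entries if level in levels]
        if hits:
            found[address] = hits
    return found


def fit_and_report(clf, client, **fit_kwargs):
    """Call clf.fit; on failure, print worker error logs and re-raise."""
    try:
        return clf.fit(**fit_kwargs)
    except Exception:
        for address, msgs in error_lines(client.get_worker_logs()).items():
            print(address)
            for msg in msgs:
                print("   ", msg)
        raise
```

Logs from workers that have already shut down may not appear here, so the cluster's own log files remain the more complete source.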
