[Bug Report] SageMaker Pipelines Notebook - XGBoost is first installed with anaconda then upgraded with pip. #4779

Open
kmanuwai opened this issue Nov 11, 2024 · 0 comments

kmanuwai commented Nov 11, 2024

Link to the notebook
https://github.com/aws/amazon-sagemaker-examples/blob/cddb473cc79c2eaae5d7fb79c456280cc5d6471d/%20%20%20ml_ops/sm-pipelines_batch_inference_step_decorator/sm-pipelines_batch_inference_step_decorator.ipynb

Describe the bug
Pipeline execution fails at the training step because two different versions of XGBoost are installed: one by Conda, from the SageMaker Distribution image, and the other by pip, from requirements.txt.

To reproduce
Run the notebook in SageMaker Studio with SageMaker Distribution image 2.1.0.

Potential Fix identified
The notebook works on SageMaker Distribution image 1.11. However, that image is no longer the default, so customers will run into this issue more often.

  1. Update XGBoost in requirements.txt to 2.1.1.
  2. Move early_stopping_rounds=5 from xgb.fit() to the XGBClassifier() constructor, like below:
    xgb = XGBClassifier(n_estimators=num_round, early_stopping_rounds=5, **param)
    xgb.fit(
        train_df,
        y_train,
        eval_set=[(validation_df, y_validation)],
    )

  3. The Evaluate step then needs a fix as well: it fails with a ModelBuilder error.
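
If the notebook has to keep working on both the older and the newer image, step 2 could be made version-tolerant with a small helper. This is only a sketch: the helper name `split_early_stopping` is hypothetical, and the 1.6 cutoff is an assumption (the release from which, as far as I know, the constructor accepts `early_stopping_rounds`) that should be checked against the XGBoost release notes.

```python
def split_early_stopping(xgb_version, rounds=5):
    """Return (constructor_kwargs, fit_kwargs) for an XGBoost version string.

    Assumption: from XGBoost 1.6 on, early_stopping_rounds belongs in the
    estimator constructor; older releases take it in fit().
    """
    major, minor = (int(part) for part in xgb_version.split(".")[:2])
    if (major, minor) >= (1, 6):
        return {"early_stopping_rounds": rounds}, {}
    return {}, {"early_stopping_rounds": rounds}


# Usage inside the train step (xgboost is imported there, not here):
#   init_kw, fit_kw = split_early_stopping(xgboost.__version__)
#   xgb = XGBClassifier(n_estimators=num_round, **init_kw, **param)
#   xgb.fit(train_df, y_train, eval_set=[(validation_df, y_validation)], **fit_kw)
```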

Logs

Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/sagemaker/remote_function/invoke_function.py", line 144, in main
    _execute_remote_function(
  File "/opt/conda/lib/python3.11/site-packages/sagemaker/remote_function/invoke_function.py", line 119, in _execute_remote_function
    stored_function.load_and_invoke()
  File "/opt/conda/lib/python3.11/site-packages/sagemaker/remote_function/core/stored_function.py", line 189, in load_and_invoke
    result = func(*resolved_args, **resolved_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipykernel_1942/1143899264.py", line 26, in train
  File "/opt/conda/lib/python3.11/site-packages/xgboost/__init__.py", line 7, in <module>
    from . import collective, dask, rabit
  File "/opt/conda/lib/python3.11/site-packages/xgboost/collective.py", line 12, in <module>
    from .core import _LIB, _check_call, c_str, py_str, from_pystr_to_cstr
  File "/opt/conda/lib/python3.11/site-packages/xgboost/core.py", line 264, in <module>
    _LIB = _load_lib()
           ^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/xgboost/core.py", line 258, in _load_lib
    raise ValueError(msg)

ValueError: Mismatched version between the Python package and the native shared object. Python package version: 1.7.1. Shared object version: 2.1.1. Shared object is loaded from: /opt/conda/lib/libxgboost.so.
Likely cause: * XGBoost is first installed with anaconda then upgraded with pip. To fix it please remove one of the installations.
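
To confirm the double installation from inside the image, something like the following could be run in a notebook cell or at the top of the step. It is a rough sketch, not part of the notebook: the function names are hypothetical, and the Conda package names it looks for (`xgboost`, `py-xgboost`, `libxgboost`) are an assumption about how the image packages XGBoost. It reads the Conda records under `<prefix>/conda-meta` and compares them with the version reported by the Python package metadata.

```python
import json
import sys
from importlib import metadata
from pathlib import Path

# Assumed Conda package names for XGBoost; adjust if the image uses others.
CONDA_NAMES = ("xgboost", "py-xgboost", "libxgboost")


def conda_versions(prefix, names=CONDA_NAMES):
    """Return {package name: version} for matching records in <prefix>/conda-meta."""
    found = {}
    meta_dir = Path(prefix) / "conda-meta"
    if not meta_dir.is_dir():
        return found
    for record in meta_dir.glob("*.json"):
        info = json.loads(record.read_text())
        if info.get("name") in names:
            found[info["name"]] = info.get("version")
    return found


def python_package_version(name="xgboost"):
    """Return the version recorded in the Python package metadata, or None."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Conda records:", conda_versions(sys.prefix))
    print("Python package:", python_package_version())
```

If the Conda records and the Python package report different versions, as in the error above (1.7.1 versus 2.1.1), the environment contains both installations.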

After applying steps 1 and 2 of the fix above, the Evaluate step fails with this error:

ValueError: Unable to auto detect a DLC for framework xgboost, framework version py311 and python version 2.1.1. Please manually provide image_uri to ModelBuilder()

Note: I am an AWS employee. Please feel free to message internally.

kmanuwai changed the title from "[Bug Report] SageMaker Pipelines Notebook - XGBoost is first installed with anaconda then upgraded with pip. To fix it please remove one of the installations." to "[Bug Report] SageMaker Pipelines Notebook - XGBoost is first installed with anaconda then upgraded with pip." on Nov 11, 2024