[BUG]: "Member must have length less than or equal to 63" when creating a job in SageMaker #1517

Closed
1 task done
Frank995 opened this issue May 4, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@Frank995
Contributor

Frank995 commented May 4, 2023

Contact Details [Optional]

[email protected]

System Information

ZENML_LOCAL_VERSION: 0.37.0
ZENML_SERVER_VERSION: 0.37.0
ZENML_SERVER_DATABASE: mysql
ZENML_SERVER_DEPLOYMENT_TYPE: other
ZENML_CONFIG_DIR: /home/francesco/.config/zenml
ZENML_LOCAL_STORE_DIR: /home/francesco/.config/zenml/local_stores
ZENML_SERVER_URL: https://agp2rc7652.eu-west-1.awsapprunner.com
ZENML_ACTIVE_REPOSITORY_ROOT: /home/francesco/repos/shipamax-vulcan-review/data-science/src/modelling_tools/zenml
PYTHON_VERSION: 3.10.10
ENVIRONMENT: native
SYSTEM_INFO: {'os': 'linux', 'linux_distro': 'ubuntu', 'linux_distro_like': 'debian', 'linux_distro_version': '20.04'}
ACTIVE_WORKSPACE: default
ACTIVE_STACK: francesco_stack
ACTIVE_USER: francesco
TELEMETRY_STATUS: enabled
ANALYTICS_CLIENT_ID: fac1c365-39ab-48cb-8147-4bfcb59c3bd6
ANALYTICS_USER_ID: 5135c97c-454d-4ea4-b04c-67c9b9464cef
ANALYTICS_SERVER_ID: face5c38-e106-44f8-aa3b-88ffcab8b10c
INTEGRATIONS: ['aws', 'kaniko', 'lightgbm', 'pillow', 'plotly', 'pytorch', 's3', 'scipy', 'sklearn']
PACKAGES: {'pdfminer.six': '20221105', 'regex': '2023.3.22', 'tifffile': '2023.3.21', 'certifi': '2022.12.7', 
's3fs': '2022.11.0', 'fsspec': '2022.11.0', 'pytz': '2022.7.1', 'tzdata': '2022.7', 'setuptools': '65.6.3', 
'cryptography': '38.0.4', 'pyzmq': '25.0.2', 'black': '23.1.0', 'pip': '23.0.1', 'packaging': '23.0', 'attrs': 
'22.2.0', 'contextlib2': '21.6.0', 'argon2-cffi': '21.3.0', 'argon2-cffi-bindings': '21.2.0', 
'azure-storage-blob': '12.9.0', 'rich': '12.6.0', 'pillow': '9.4.0', 'more-itertools': '9.1.0', 'phonenumbers': 
'8.13.7', 'ipython': '8.11.0', 'tenacity': '8.2.2', 'click': '8.1.3', 'python-slugify': '8.0.1', 'ipywidgets': 
'7.7.4', 'azure-servicebus': '7.6.0', 'nbconvert': '7.2.10', 'coverage': '7.2.2', 'jupyter-client': '7.2.0', 
'ipykernel': '6.22.0', 'notebook': '6.5.3', 'tornado': '6.2', 'ftfy': '6.1.1', 'multidict': '6.0.4', 'docker': 
'6.0.1', 'bleach': '6.0.0', 'plotly': '5.14.1', 'psutil': '5.9.4', 'traitlets': '5.9.0', 'nbformat': '5.8.0', 
'pyyaml': '5.4.1', 'jupyter-core': '5.3.0', 'decorator': '5.1.1', 'configobj': '5.0.8', 'mailchecker': '5.0.7', 
'dash-table': '5.0.0', 'smmap': '5.0.0', 'tqdm': '4.65.0', 'fonttools': '4.39.2', 'transformers': '4.27.2', 
'jsonschema': '4.17.3', 'importlib-metadata': '4.13.0', 'beautifulsoup4': '4.12.0', 'antlr4-python3-runtime': 
'4.9.3', 'lxml': '4.9.2', 'rsa': '4.9', 'pexpect': '4.8.0', 'pytest': '4.6.11', 'opencv-python-headless': 
'4.6.0.66', 'typing-extensions': '4.5.0', 'isort': '4.3.21', 'azure-keyvault-secrets': '4.3.0', 'tzlocal': '4.3',
'cachetools': '4.2.4', 'altair': '4.2.2', 'gitdb': '4.0.10', 'async-timeout': '4.0.2', 'bcrypt': '4.0.1', 
'singledispatch': '4.0.0', 'pytest-cov': '4.0.0', 'protobuf': '3.20.1', 'zipp': '3.15.0', 'ply': '3.11', 
'filelock': '3.10.2', 'aiohttp': '3.8.4', 'matplotlib': '3.7.1', 'h5py': '3.7.0', 'widgetsnbextension': '3.6.3', 
'anyio': '3.6.2', 'markdown': '3.4.3', 'idna': '3.4', 'nltk': '3.4', 'lightgbm': '3.3.5', 'oauthlib': '3.2.2', 
'flufl.lock': '3.2', 'pytest-mock': '3.2.0', 'azure-ai-formrecognizer': '3.2.0b3', 'gitpython': '3.1.18', 
'jinja2': '3.1.2', 'atpublic': '3.1.1', 'platformdirs': '3.1.1', 'threadpoolctl': '3.1.0', 'prompt-toolkit': 
'3.0.38', 'chardet': '3.0.4', 'intervaltree': '3.0.2', 'zc.lockfile': '3.0.post1', 'networkx': '3.0', 'watchdog':
'3.0.0', 'sagemaker': '2.117.0', 'boto': '2.49.0', 'requests': '2.28.1', 'imageio': '2.26.1', 'pycparser': 
'2.21', 'fastjsonschema': '2.16.3', 'pygments': '2.14.0', 'tensorboard': '2.12.0', 'aws-xray-sdk': '2.11.0', 
'psycopg2-binary': '2.9.5', 'dash': '2.9.3', 'python-dateutil': '2.8.1', 'portalocker': '2.7.0', 'pyjwt': 
'2.6.0', 'pyparsing': '2.4.7', 'pylint': '2.4.4', 'aiobotocore': '2.4.2', 'soupsieve': '2.4', 'sortedcontainers':
'2.4.0', 'astroid': '2.3.3', 'pygtrie': '2.3.2', 'omegaconf': '2.3.0', 'werkzeug': '2.2.3', 'flask': '2.2.3', 
'asttokens': '2.2.1', 'cloudpickle': '2.2.1', 'termcolor': '2.2.0', 'dpath': '2.1.4', 'markupsafe': '2.1.2', 
'itsdangerous': '2.1.2', 'base58': '2.1.1', 'charset-normalizer': '2.1.1', 'pycocotools': '2.0.6', 'mistune': 
'2.0.5', 'greenlet': '2.0.2', 'portion': '2.0.2', 'tomli': '2.0.1', 'dash-html-components': '2.0.0', 
'dash-core-components': '2.0.0', 'googleapis-common-protos': '1.56.4', 'grpcio': '1.51.3', 'google-auth': 
'1.35.0', 'botocore': '1.27.59', 'urllib3': '1.26.15', 'pypdf2': '1.26.0', 'boto3': '1.24.59', 'numpy': '1.24.0',
'jupyter-server': '1.23.6', 'azure-core': '1.22.1', 'pymupdf': '1.18.17', 'funcy': '1.18', 'msal': '1.17.0', 
'google-api-core': '1.17.0', 'six': '1.16.0', 'cffi': '1.15.1', 'mypy-boto3': '1.14.40.0', 'boto3-stubs': 
'1.14.40.0', 'dvc': '1.11.16', 'wrapt': '1.11.2', 'py': '1.11.0', 'torch': '1.11.0', 'backoff': '1.10.0', 
'scipy': '1.9.3', 'pydantic': '1.9.2', 'shapely': '1.8.5', 'yarl': '1.8.2', 'tensorboard-plugin-wit': '1.8.1', 
'alembic': '1.8.1', 'distro': '1.8.0', 'send2trash': '1.8.0', 'azure-identity': '1.8.0', 'ppft': '1.7.6.6', 
'passlib': '1.7.4', 'pysocks': '1.7.1', 'debugpy': '1.6.6', 'uamqp': '1.6.4', 'monotonic': '1.6', 'shtab': 
'1.5.8', 'nest-asyncio': '1.5.6', 'jsonpath-ng': '1.5.3', 'websocket-client': '1.5.1', 'blinker': '1.5', 
'pandocfilters': '1.5.0', 'sqlalchemy': '1.4.41', 'kiwisolver': '1.4.4', 'appdirs': '1.4.4', 'typed-ast': 
'1.4.3', 'lazy-object-proxy': '1.4.3', 'pydot': '1.4.2', 'dash-bootstrap-components': '1.4.1', 'atomicwrites': 
'1.4.1', 'pdf2image': '1.4.1', 'pywavelets': '1.4.1', 'analytics-python': '1.4.post1', 'absl-py': '1.4.0', 
'pyahocorasick': '1.4.0', 'unidecode': '1.3.6', 'frozenlist': '1.3.3', 'hydra-core': '1.3.2', 
'requests-oauthlib': '1.3.1', 'aiosignal': '1.3.1', 'sniffio': '1.3.0', 'pytest-datadir': '1.3.0', 
'text-unidecode': '1.3', 'mako': '1.2.4', 'tinycss2': '1.2.1', 'executing': '1.2.0', 'joblib': '1.2.0', 'xlrd': 
'1.2.0', 'azure-common': '1.1.28', 'pandas': '1.1.5', 'jupyterlab-widgets': '1.1.3', 'shortuuid': '1.0.11', 
'contourpy': '1.0.7', 'scikit-learn': '1.0.2', 'pymysql': '1.0.2', 'smdebug-rulesconfig': '1.0.1', 
'google-cloud-vision': '1.0.0', 'mypy-extensions': '1.0.0', 'requests-aws4auth': '1.0.0', 'streamlit': '0.77.0', 
'multiprocess': '0.70.14', 'wheel': '0.38.4', 'sqlalchemy-utils': '0.38.3', 'zenml': '0.37.0', 'python-benedict':
'0.30.0', 'cython': '0.29.33', 'dulwich': '0.21.3', 'python-dotenv': '0.21.0', 'pyrsistent': '0.19.3', 
'httplib2': '0.19.1', 'future': '0.18.3', 'validators': '0.18.2', 'jedi': '0.18.2', 'scikit-image': '0.18.1', 
'ruamel.yaml': '0.17.21', 'terminado': '0.17.1', 'prometheus-client': '0.16.0', 'huggingface-hub': '0.13.3', 
'tokenizers': '0.13.2', 'pluggy': '0.13.1', 'voluptuous': '0.13.1', 'xmltodict': '0.13.0', 'python-levenshtein': 
'0.12.1', 'torchvision': '0.12.0', 'toolz': '0.12.0', 'pathspec': '0.11.1', 'cycler': '0.11.0', 'torchaudio': 
'0.11.0', 'aioitertools': '0.11.0', 'toml': '0.10.2', 'imbalanced-learn': '0.10.1', 'python-terraform': '0.10.1',
'python-fsutil': '0.10.0', 'jmespath': '0.10.0', 'rtree': '0.9.5', 'commonmark': '0.9.1', 'tabulate': '0.9.0', 
'dictdiffer': '0.9.0', 'parso': '0.8.3', 'astor': '0.8.1', 'dgl': '0.8.0.post1', 'pydeck': '0.8.0', 
'pickleshare': '0.7.5', 'schema': '0.7.5', 'nbclient': '0.7.2', 'pytorch-crf': '0.7.2', 'defusedxml': '0.7.1', 
'ptyprocess': '0.7.0', 'tensorboard-data-server': '0.7.0', 'msrest': '0.6.21', 'stack-data': '0.6.2', 'isodate': 
'0.6.1', 'string-grouper': '0.6.1', 'mccabe': '0.6.1', 'multipledispatch': '0.6.0', 'grandalf': '0.6', 
's3transfer': '0.6.0', 'detectron2': '0.6', 'pyocr': '0.5.3', 'nbclassic': '0.5.3', 'nanotime': '0.5.2', 
'webencodings': '0.5.1', 'pytorch-nlp': '0.5.0', 'pyasn1': '0.4.8', 'colorama': '0.4.6', 'google-auth-oauthlib': 
'0.4.6', 'flatten-dict': '0.4.2', 'entrypoints': '0.4', 'dill': '0.3.6', 'pox': '0.3.2', 
'sparse-dot-topn-for-blocks': '0.3.1.post3', 'msal-extensions': '0.3.1', 'dash-cytoscape': '0.3.0', 
'click-params': '0.3.0', 'pathos': '0.3.0', 'pyasn1-modules': '0.2.8', 'ruamel.yaml.clib': '0.2.7', 'wcwidth': 
'0.2.6', 'jupyterlab-pygments': '0.2.2', 'notebook-shim': '0.2.2', 'pure-eval': '0.2.2', 'ipython-genutils': 
'0.2.0', 'backcall': '0.2.0', 'xmljson': '0.2.0', 'google-pasta': '0.2.0', 'iopath': '0.1.9', 'yacs': '0.1.8', 
'matplotlib-inline': '0.1.6', 'fvcore': '0.1.5.post20221221', 'wordninja': '0.1.5', 'protobuf3-to-dict': '0.1.5',
'comm': '0.1.3', 'pytz-deprecation-shim': '0.1.0.post0', 'sqlmodel': '0.0.8', 'topn': '0.0.7', 
'sqlalchemy2-stubs': '0.0.2a32', 'imblearn': '0.0'}
The attribute instance_type of class SagemakerStepOperatorConfig will be deprecated soon.
The stack francesco_stack contains components that require building Docker images. Older versions of ZenML always built these images locally, but since version 0.32.0 this behavior can be configured using the image_builder stack component. This stack will temporarily default to a local image builder that mirrors the previous behavior, but this will be removed in future versions of ZenML. Please add an image builder to this stack:
zenml image-builder register <NAME> ...
zenml stack update 1958ff2b-7f48-4d4a-944e-f51095dbeeed -i <NAME>

CURRENT STACK

Name: francesco_stack
ID: 1958ff2b-7f48-4d4a-944e-f51095dbeeed
Shared: No
User: francesco / 5135c97c-454d-4ea4-b04c-67c9b9464cef
Workspace: default / 375b424b-a9f0-4806-aaac-20c3d6932740

ORCHESTRATOR: default

Name: default
ID: 987988bd-3b0d-4a12-b62d-38a479080d9a
Type: orchestrator
Flavor: local
Configuration: {}
Shared: No
User: francesco / 5135c97c-454d-4ea4-b04c-67c9b9464cef
Workspace: default / 375b424b-a9f0-4806-aaac-20c3d6932740

ARTIFACT_STORE: francesco_store

Name: francesco_store
ID: 95497a65-abfd-4f93-a480-df52805ebc08
Type: artifact_store
Flavor: s3
Configuration: {'authentication_secret': None, 'path': 's3://zenml-store-francesco', 'key': '********', 'secret': '********', 'token': '********', 'client_kwargs': None, 'config_kwargs': None, 's3_additional_kwargs': None}
Shared: No
User: francesco / 5135c97c-454d-4ea4-b04c-67c9b9464cef
Workspace: default / 375b424b-a9f0-4806-aaac-20c3d6932740

CONTAINER_REGISTRY: ecr_registry

Name: ecr_registry
ID: a6c5fbfc-d79b-44b0-ab2b-5da060bd2ee0
Type: container_registry
Flavor: aws
Configuration: {'authentication_secret': None, 'uri': '470832953632.dkr.ecr.eu-west-1.amazonaws.com'}
Shared: Yes
User: default / b45d3a21-bf75-4f6f-9aac-88aeb712cf75
Workspace: default / 375b424b-a9f0-4806-aaac-20c3d6932740

SECRETS_MANAGER: francesco-secret-store

Name: francesco-secret-store
ID: bf8a71a1-b33c-4a01-bc14-a4c91242629c
Type: secrets_manager
Flavor: aws
Configuration: {'scope': <SecretsManagerScope.COMPONENT: 'component'>, 'namespace': None, 'region_name': 'eu-west-1'}
Shared: No
User: francesco / 5135c97c-454d-4ea4-b04c-67c9b9464cef
Workspace: default / 375b424b-a9f0-4806-aaac-20c3d6932740

STEP_OPERATOR: sagemaker_trainer_cpu

Name: sagemaker_trainer_cpu
ID: 9aa6203f-4a27-47c9-9cd8-3d1c2cef7082
Type: step_operator
Flavor: sagemaker
Configuration: {'instance_type': 'ml.c5.18xlarge', 'experiment_name': None, 'input_data_s3_uri': None, 'estimator_args': {}, 'role': 'notebook-role-datascience-zenml', 'bucket': None}
Shared: Yes
User: francesco / 5135c97c-454d-4ea4-b04c-67c9b9464cef
Workspace: default / 375b424b-a9f0-4806-aaac-20c3d6932740

What happened?

When running a pipeline with a SageMaker step operator, I get the following error if the pipeline name is of medium length:
An error occurred (ValidationException) when calling the CreateTrainingJob operation: 1 validation error detected: Value 'generic-training-pipeline-2023-04-12-15-36-41-124330-model-trainer' at 'trainingJobName' failed to satisfy constraint: Member must have length less than or equal to 63
The generated job name is 66 characters long, 3 more than the 63-character limit on trainingJobName.
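To make the overflow concrete, here is a small sketch that breaks the rejected name into its parts and counts characters. The split into pipeline name, run timestamp, and step name is an assumption inferred from the shape of the string in the error, not something confirmed from the ZenML source.

```python
# Name copied verbatim from the ValidationException above.
job_name = "generic-training-pipeline-2023-04-12-15-36-41-124330-model-trainer"
MAX_LEN = 63  # limit quoted in the error message

# Assumed decomposition (inferred from the string, not from ZenML internals).
parts = {
    "pipeline": "generic-training-pipeline",
    "timestamp": "2023-04-12-15-36-41-124330",
    "step": "model-trainer",
}

# Sanity check: the parts joined by hyphens reproduce the rejected name.
assert "-".join(parts.values()) == job_name

for label, value in parts.items():
    print(f"{label:<10} {len(value):>2}  {value}")

overflow = len(job_name) - MAX_LEN
print(f"total      {len(job_name)}  (limit {MAX_LEN}, over by {overflow})")
```

Under this reading, the timestamp and step name alone take up most of the budget, which is why even a moderately long pipeline name pushes the total past the limit.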

Reproduction steps

  1. Create a pipeline with only one step
  2. Give it a medium-length name, such as 'generic-training-pipeline'
  3. Register a SageMaker step operator in the stack
  4. Use the step decorator to run the step on SageMaker
    ...
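As a rough guide to what "medium-length" means here, the sketch below estimates how many characters remain for the pipeline name once the timestamp and step name are accounted for. The timestamp format and the `max_pipeline_name_length` helper are hypothetical, chosen only to match the name shown in the error.

```python
from datetime import datetime

MAX_JOB_NAME_LEN = 63  # limit quoted in the ValidationException

# Assumed format: matches the "2023-04-12-15-36-41-124330" shape seen in the error.
TIMESTAMP_FORMAT = "%Y-%m-%d-%H-%M-%S-%f"


def max_pipeline_name_length(step_name: str) -> int:
    """Characters left for the pipeline name in '<pipeline>-<timestamp>-<step>'."""
    sample = datetime(2023, 4, 12, 15, 36, 41, 124330).strftime(TIMESTAMP_FORMAT)
    separators = 2  # the two hyphens joining the three parts
    return MAX_JOB_NAME_LEN - len(sample) - len(step_name) - separators


budget = max_pipeline_name_length("model-trainer")
pipeline_name = "generic-training-pipeline"
print(f"budget: {budget}, pipeline name length: {len(pipeline_name)}")
print("fits" if len(pipeline_name) <= budget else "too long")
```

With a step named `model-trainer`, any pipeline name longer than the computed budget should trigger the error under these assumptions; longer step names shrink the budget further.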

Relevant log output

Exception has occurred: ClientError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
An error occurred (ValidationException) when calling the CreateTrainingJob operation: 1 validation error detected: Value 'generic-training-pipeline-2023-04-13-06-05-26-177259-model-trainer' at 'trainingJobName' failed to satisfy constraint: Member must have length less than or equal to 63
  File "/home/francesco/miniconda3/envs/zenml/lib/python3.10/site-packages/botocore/client.py", line 915, in _make_api_call
    raise error_class(parsed_response, operation_name)
  File "/home/francesco/miniconda3/envs/zenml/lib/python3.10/site-packages/botocore/client.py", line 508, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/francesco/miniconda3/envs/zenml/lib/python3.10/site-packages/sagemaker/session.py", line 611, in submit
    self.sagemaker_client.create_training_job(**request)
  File "/home/francesco/miniconda3/envs/zenml/lib/python3.10/site-packages/sagemaker/session.py", line 4344, in _intercept_create_request
    return create(request)
  File "/home/francesco/miniconda3/envs/zenml/lib/python3.10/site-packages/sagemaker/session.py", line 613, in train
    self._intercept_create_request(train_request, submit, self.train.__name__)
  File "/home/francesco/miniconda3/envs/zenml/lib/python3.10/site-packages/sagemaker/estimator.py", line 2042, in start_new
    estimator.sagemaker_session.train(**train_args)
  File "/home/francesco/miniconda3/envs/zenml/lib/python3.10/site-packages/sagemaker/estimator.py", line 1125, in fit
    self.latest_training_job = _TrainingJob.start_new(self, inputs, experiment_config)
  File "/home/francesco/miniconda3/envs/zenml/lib/python3.10/site-packages/sagemaker/workflow/pipeline_context.py", line 272, in wrapper
    return run_func(*args, **kwargs)
  File "/home/francesco/miniconda3/envs/zenml/lib/python3.10/site-packages/zenml/integrations/aws/step_operators/sagemaker_step_operator.py", line 207, in launch
    estimator.fit(
  File "/home/francesco/miniconda3/envs/zenml/lib/python3.10/site-packages/zenml/orchestrators/step_launcher.py", line 430, in _run_step_with_step_operator
    step_operator.launch(
  File "/home/francesco/miniconda3/envs/zenml/lib/python3.10/site-packages/zenml/orchestrators/step_launcher.py", line 375, in _run_step
    self._run_step_with_step_operator(
  File "/home/francesco/miniconda3/envs/zenml/lib/python3.10/site-packages/zenml/orchestrators/step_launcher.py", line 198, in launch
    self._run_step(
  File "/home/francesco/miniconda3/envs/zenml/lib/python3.10/site-packages/zenml/orchestrators/base_orchestrator.py", line 186, in run_step
    launcher.launch()
  File "/home/francesco/miniconda3/envs/zenml/lib/python3.10/site-packages/zenml/orchestrators/local/local_orchestrator.py", line 82, in prepare_or_run_pipeline
    self.run_step(
  File "/home/francesco/miniconda3/envs/zenml/lib/python3.10/site-packages/zenml/orchestrators/base_orchestrator.py", line 166, in run
    result = self.prepare_or_run_pipeline(
  File "/home/francesco/miniconda3/envs/zenml/lib/python3.10/site-packages/zenml/stack/stack.py", line 864, in deploy_pipeline
    return self.orchestrator.run(deployment=deployment, stack=self)
  File "/home/francesco/miniconda3/envs/zenml/lib/python3.10/site-packages/zenml/pipelines/base_pipeline.py", line 599, in run
    stack.deploy_pipeline(deployment=deployment_model)
  File "/home/francesco/repos/shipamax-vulcan-review/data-science/src/modelling_tools/zenml/run_standard_training_pipeline.py", line 46, in <module>
    pipeline.run(config_path="standard_training_config.yaml")
  File "/home/francesco/miniconda3/envs/zenml/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/francesco/miniconda3/envs/zenml/lib/python3.10/runpy.py", line 196, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the CreateTrainingJob operation: 1 validation error detected: Value 'generic-training-pipeline-2023-04-13-06-05-26-177259-model-trainer' at 'trainingJobName' failed to satisfy constraint: Member must have length less than or equal to 63
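One possible way to avoid this rejection is to shorten the name before it reaches `CreateTrainingJob`. The sketch below is only an illustration: `shorten_job_name` is a hypothetical helper, not part of ZenML or the SageMaker SDK, and not necessarily how ZenML resolves the problem. The name pattern is the one I recall from the `CreateTrainingJob` API reference and is worth double-checking there.

```python
import hashlib
import re

MAX_LEN = 63  # limit quoted in the ValidationException

# Pattern as listed for TrainingJobName in the CreateTrainingJob API reference
# (assumption: verify against the current AWS documentation).
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")


def shorten_job_name(name: str, max_len: int = MAX_LEN) -> str:
    """Hypothetical helper: sanitize a job name and cap its length.

    Over-long names are truncated and given a short hash suffix so that
    two names differing only in their tail are unlikely to collide.
    """
    # Replace runs of disallowed characters (e.g. underscores) with a hyphen.
    sanitized = re.sub(r"[^a-zA-Z0-9-]+", "-", name).strip("-")
    if len(sanitized) <= max_len:
        return sanitized
    digest = hashlib.sha1(sanitized.encode("utf-8")).hexdigest()[:8]
    # Leave room for "-<digest>" and avoid a double hyphen at the join.
    head = sanitized[: max_len - len(digest) - 1].rstrip("-")
    return f"{head}-{digest}"


rejected = "generic-training-pipeline-2023-04-13-06-05-26-177259-model-trainer"
shortened = shorten_job_name(rejected)
print(len(shortened), shortened)
```

The hash suffix trades readability of the step name for uniqueness; dropping the microseconds from the timestamp would be a simpler alternative if collisions are not a concern.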

Code of Conduct

  • I agree to follow this project's Code of Conduct
Frank995 added the bug label May 4, 2023
@strickvl
Contributor

strickvl commented May 4, 2023

Thanks for this report. @christianversloot's #1505, which fixes this (and the previous issue #1502), was merged into develop very recently. It'll be in the next release. Thanks for reporting it, though!

@strickvl strickvl closed this as completed May 4, 2023