
Tensorflowmodel points to images that do not exist #912

Closed
NoahDolev opened this issue Jul 7, 2019 · 16 comments

Comments

@NoahDolev commented Jul 7, 2019

Please fill out the form below.

System Information

  • Tensorflow:
  • Fails for all versions:
  • Fails for py3 and py2:
  • Fails for CPU and GPU:
  • No custom image:

Describe the problem

If I try to deploy a pre-built model like so:

sagemaker_model = TensorFlowModel(model_data='s3://' + sagemaker_session.default_bucket() + '/model/model0100.tar.gz',
                                  role=role,
                                  framework_version='1.13', py_version='py3',
                                  entry_point='train.py')

It will fail upon deploying:

predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.p2.xlarge')

I receive:

ValueError: Error hosting endpoint sagemaker-tensorflow-2019-07-07-11-50-45-473: Failed Reason:  The image '520713654638.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-tensorflow:1.13-gpu-py3' does not exist.

I can get past this error by specifying the image (which is not well documented; it took a lot of digging to find a link that worked):

sagemaker_model = TensorFlowModel(model_data='s3://' + sagemaker_session.default_bucket() + '/model/model0100.tar.gz',
                                  role=role,
                                  framework_version='1.13', py_version='py3',
                                  entry_point='train.py',
                                  image='763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:1.13-gpu')

Any idea how to solve this?
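For reference, the manually specified URI above follows a region/account/tag pattern. A minimal sketch of how it is composed; the helper name is hypothetical, and the account ID and tag format are copied from the URIs quoted in this thread, not from an authoritative registry listing:

```python
# Hypothetical helper illustrating the pattern of the working image URI above.
# The account ID and tag format are taken from this thread and may differ for
# other regions or framework versions.

def tf_inference_image_uri(region, version, device):
    # ECR URI pattern for the newer tensorflow-inference images
    account = "763104351884"  # account seen in this thread (eu-west-1, us-east-1)
    return (
        f"{account}.dkr.ecr.{region}.amazonaws.com/"
        f"tensorflow-inference:{version}-{device}"
    )

print(tf_inference_image_uri("eu-west-1", "1.13", "gpu"))
# 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:1.13-gpu
```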

@chuyang-deng (Contributor)

Hi @NoahDolev, thank you for using SageMaker! From the code you provided, it seems you want to train your model with train.py?

In order to use TensorFlow script mode to train your model (and then deploy it), you want to start with the TensorFlow Estimator class: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/estimator.py#L188

You can set either script_mode=True or py_version="py3" to enable script mode.

@NoahDolev (Author)

Hi @ChuyangDeng ,

I am not sure that has anything to do with the issue I posted. I am reporting that the Docker image which SageMaker searches for by default is not correct for eu-west-1. Also, script_mode is not a valid flag of TensorFlowModel; to the best of my knowledge, that flag exists only in the TensorFlow estimator class.

Best,
Noah

@chuyang-deng (Contributor)

Hi @NoahDolev,

Are you trying to do training or hosting here? Our TensorFlow script mode is only supported for training, and the TensorFlowModel class is for hosting; that's why the Docker image URI is not correct (cannot be found).

If you are training your model, you should use TensorFlow estimator class so that you can train with our script mode image.

If you are deploying your trained model, you will use TensorFlowModel class, but no script mode is supported with deploying.

@yuchuang1979 commented Jul 10, 2019

@NoahDolev @ChuyangDeng I hit the same error when following this link:
https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/
to deploy a pre-trained model in SageMaker with a different model. Since my model uses py3, I have to specify py_version='py3' like this:

sagemaker_model = TensorFlowModel(model_data='s3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                                  role=role,
                                  py_version='py3',
                                  framework_version='1.12',
                                  entry_point='train.py')

predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.p2.xlarge')

ValueError: Error hosting endpoint sagemaker-tensorflow-2019-07-10-05-06-02-075: Failed Reason: The image '520713654638.dkr.ecr.us-east-2.amazonaws.com/sagemaker-tensorflow:1.12-gpu-py3' does not exist.

When I delete py_version='py3', there is no error anymore.

@NoahDolev (Author)

Hi @yuchuang1979 ,

Precisely what I am referring to. I am trying to deploy a model I trained elsewhere. You can also specify the image to solve the problem. My point, however, is that the default is pointing to the wrong docker image. It's a bug.

Best,
Noah

@yuchuang1979

@NoahDolev thanks for pointing out that there is another route by specifying the image. I am totally new to SageMaker and just began the work several days ago.

How could you create the image before specifying it in the function?

@ChoiByungWook (Contributor)

Just some context.

There are two TensorFlow solutions that handle serving in the Python SDK.

They have different class representations and documentation, as shown here:

  1. TensorFlowModel - https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/model.py#L47
    Doc: https://github.com/aws/sagemaker-python-sdk/tree/v1.12.0/src/sagemaker/tensorflow#deploying-directly-from-model-artifacts
    Key difference: Uses a proxy gRPC client to send requests
    Container impl: https://github.com/aws/sagemaker-tensorflow-container/blob/master/src/tf_container/serve.py

  2. Model - https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/serving.py#L96
    Doc: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_tensorflow_serving.rst
    Key difference: Utilizes the TensorFlow serving rest API
    Container impl: https://github.com/aws/sagemaker-tensorflow-serving-container/blob/master/container/sagemaker/serve.py

Python 3 isn't supported with the TensorFlowModel object: the container uses the TensorFlow Serving API library in conjunction with the gRPC client to handle inference, but the TensorFlow Serving API isn't officially supported in Python 3, so there are only Python 2 versions of the containers for the TensorFlowModel object.

If you need Python 3, then you will need to use the Model object defined in item 2 above. The inference script format will change if you need to handle pre- and post-processing: https://github.com/aws/sagemaker-tensorflow-serving-container#prepost-processing.

Also, your inference requests will need to follow the TFS REST API:
https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_tensorflow_serving.rst#making-predictions-against-a-sagemaker-endpoint
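For a concrete sense of the request shape: the TFS REST predict API wraps the input tensors in an "instances" list. A minimal sketch, with made-up feature values:

```python
import json

# The TFS REST "predict" API expects a JSON body with an "instances" list,
# one entry per example to score. The feature values below are made up
# purely for illustration.
payload = {"instances": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]}
body = json.dumps(payload)
print(body)

# The endpoint's response comes back as a JSON object keyed by "predictions".
```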

Since you train externally, you're going to need to make sure your model artifacts follow the correct format. https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_tensorflow_serving.rst#deploying-more-than-one-model-to-your-endpoint
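As a rough sketch of what such an artifact can look like, the TFS-based container expects a SavedModel under a numbered version directory inside a model directory. The names "model1" and "00000001" below are placeholders; check the linked doc for the exact layout your SDK version expects:

```shell
# Build a placeholder model.tar.gz with the <model-name>/<version>/ layout.
# saved_model.pb and variables/ would come from your real TF export;
# here they are empty files purely to illustrate the archive structure.
mkdir -p model1/00000001/variables
touch model1/00000001/saved_model.pb
touch model1/00000001/variables/variables.index
tar -czf model.tar.gz model1        # archive rooted at the model directory
tar -tzf model.tar.gz               # list contents to verify the layout
```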

Here is an example that does, for the most part, what you're trying to do: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_serving_container/tensorflow_serving_container.ipynb

Sorry for the confusion and wall of text and links. Please let me know if there is anything I can clarify.

Thanks!

@yuchuang1979

@ChoiByungWook This is quite clear. Thanks!

@panfeng-hover commented Jul 23, 2019

@ChoiByungWook Thanks for the introduction! I am wondering when TF 1.14 will be supported for serving.

I tried the CPU, GPU, and Elastic Inference variants, but it seems none of the corresponding images are available:

The image '763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:1.14-cpu' does not exist.

The image '763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:1.14-gpu' does not exist.

I used your second one:

from sagemaker import get_execution_role
from sagemaker.tensorflow.serving import Model
role = get_execution_role()

sagemaker_model = Model(model_data = 's3://sagemaker-hover/Models/zulu/tpu/model.tar.gz',
                        role = role,
                        framework_version='1.14')
predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.p2.xlarge',
                                   endpoint_name='test-001')

Also, the TensorFlowModel class seems to support only up to TF 1.12.

@tomislavmitic2012

We have to use the proxy server with circle to run this.

@keelerh commented Apr 13, 2020

Did the format for specifying images change after TensorFlow 2 support was added? Or are there just no pre-built images for TensorFlow frameworks 2.0 and 2.1? I get

UnexpectedStatusException: Error hosting endpoint sagemaker-tensorflow-2020-04-13-14-02-35-992: Failed. Reason:  The image '520713654638.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tensorflow:2.1.0-cpu-py2' does not exist..
UnexpectedStatusException: Error hosting endpoint sagemaker-tensorflow-2020-04-13-14-02-35-992: Failed. Reason:  The image '520713654638.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tensorflow:2.1.0-gpu-py2' does not exist..
UnexpectedStatusException: Error hosting endpoint sagemaker-tensorflow-2020-04-13-14-02-35-992: Failed. Reason:  The image '520713654638.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tensorflow:2.1.0-cpu-py3' does not exist..
UnexpectedStatusException: Error hosting endpoint sagemaker-tensorflow-2020-04-13-14-02-35-992: Failed. Reason:  The image '520713654638.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tensorflow:2.1.0-gpu-py3' does not exist..

When trying to specify

from sagemaker.tensorflow.model import TensorFlowModel
sagemaker_model = TensorFlowModel(model_data='s3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                                  role=role,
                                  framework_version='2.1.0',
                                  entry_point='train.py')

in the sample notebook available at https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/.

@ratulray commented May 6, 2020

@ChoiByungWook The container implementation code locations given above (for TensorFlowModel & Model) are outdated. Can you please point to the current implementations?

@laurenyu (Contributor) commented May 6, 2020

@keelerh @ratulray I believe the class you're looking for is sagemaker.tensorflow.serving.Model (the second one that @ChoiByungWook mentioned): https://sagemaker.readthedocs.io/en/stable/sagemaker.tensorflow.html#tensorflow-serving-model. That class should retrieve the correct image URI for the TF 2.x images.

If you have any further questions, please open a new issue (it'll help with our internal tracking).

@ratulray commented May 7, 2020

Thanks, Lauren, for your response. Actually, my question was not that; I opened a new issue: #1472

@abdelhamidnouh

[screenshots of an error attached]

What should I do?

@laurenyu (Contributor) commented Feb 3, 2021

@abdelhamidnouh you're commenting on an old, closed issue with an unrelated error message; can you open a new issue?
