
Testing Bert on Inferentia2 for text classification - Neuron container runtime errors [bug] #3895

madhurprash opened this issue May 8, 2024 · 1 comment

madhurprash commented May 8, 2024


Concise Description:
I tried deploying BERT on inf2 using the 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference-neuronx:1.13.1-transformers4.36.2-neuronx-py310-sdk2.16.1-ubuntu20.04 image from the Neuron containers list: https://github.com/aws/deep-learning-containers/blob/master/available_images.md#neuron-containers

I tried running the container in the notebook linked here: https://github.com/huggingface/notebooks/blob/main/sagemaker/18_inferentia_inference/sagemaker-notebook.ipynb

This notebook uses inf1, but I had to use inf2, so I made the necessary package updates and used the following versions in the HuggingFaceModel object:

huggingface_model = HuggingFaceModel(
    model_data=s3_model_uri,        # path to your model and script
    role=role,
    transformers_version="4.36",    # transformers version used
    pytorch_version="1.13.1",       # pytorch version used
    py_version="py310",             # python version used
    image_uri=ecr_image,
)
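
The deploy call that followed was along these lines (a sketch; the instance size is an assumption based on the inf2 target):

predictor = huggingface_model.deploy(
    initial_instance_count=1,          # number of instances
    instance_type="ml.inf2.xlarge",    # inf2 instance type (assumed size)
)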

Error

I get the following errors in the logs:

Error 1:
Warning: Model was compiled with a newer version of torch-neuron than the current runtime (function operator())
To address this, I tried both older and newer versions, but still got the same warning.

Error 2:
WorkerLifeCycle - RuntimeError: The PyTorch Neuron Runtime could not be initialized. Neuron Driver issues are logged

Error 3:
WorkerLifeCycle - 2024-May-08 14:02:23.699765 64:64 ERROR NRT:nrt_allocate_neuron_cores NeuronCore(s) not available - Requested:1 Available:0

I get the error above even though I set the relevant environment variable before importing torch_neuron in the inference.py file; a sketch of that ordering follows.
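
Roughly, the top of my inference.py looks like this (assuming NEURON_RT_NUM_CORES is the variable in question; nrt_allocate_neuron_cores in the error is a runtime function, not an environment variable):

import os

# Must be set before the Neuron runtime initializes, i.e. before importing
# torch_neuron/torch_neuronx. The variable name here is an assumption.
os.environ["NEURON_RT_NUM_CORES"] = "1"

import torch
import torch_neuron  # torch_neuronx on inf2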

Interestingly, however, I am sometimes able to get inference from the model:

[{'label': 'POSITIVE', 'score': 0.9998840093612671}]

Error (the endpoint sometimes returns predictions and sometimes fails):

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",

DLC image/dockerfile:
The huggingface-pytorch-inference-neuronx image listed above.

Current behavior:
Throws an error (explained above in detail)

Expected behavior:
BERT deploys successfully on an inf2 instance; an example with the latest container and optimum-neuron updates would help. A couple of other links that I tried:

Additional context:

philschmid (Contributor) commented:

@madhurprash the error

Warning: Model was compiled with a newer version of torch-neuron than the current runtime (function operator())

indicates that the neuron-cc version you used to compile the model differs from the version that runs in the container. We just added a feature in the latest version that lets you compile the model on startup to avoid this scenario:

import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel

try:
	role = sagemaker.get_execution_role()
except ValueError:
	iam = boto3.client('iam')
	role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'distilbert/distilbert-base-uncased-finetuned-sst-2-english',
	'HF_TASK':'text-classification',
	'HF_OPTIMUM_BATCH_SIZE': '1', # Batch size used to compile the model (env values must be strings)
	'HF_OPTIMUM_SEQUENCE_LENGTH': '512', # Sequence length used to compile the model
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.36.2',
	pytorch_version='2.1.2',
	py_version='py310',
	env=hub,
	role=role, 
)

# Let SageMaker know that we compile on startup
huggingface_model._is_compiled_model = True

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.inf2.xlarge' # ec2 instance type
)

predictor.predict({
	"inputs": "I like you. I love you",
})
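
Note that with this approach the model is compiled on the endpoint at startup, so the first deployment can take several extra minutes, but the compiled artifact then matches the neuron-cc version inside the container, which avoids the version-mismatch warning above.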
