[Bug Report] Unable to utilize multiple instances in sagemaker batch transform request #3134
Comments
I ran into this same problem, and I'm really surprised that the behavior is to assign a whole file to a host. This could also cause subtler performance issues: if some files are much larger than others, it won't be immediately obvious there's an issue, because the other hosts will still be doing some work. When I split up the files one per host, it did work as expected for me. But, as was said in the previous thread, the sharding should happen at the record/batch level, not the file level.
@grantdelozier, could you clarify why this was closed? I am running into the same behavior: only one instance being used, even when the number of input files greatly exceeds the number of instances.
@jholmes-godaddy it looks like this is a feature, not a bug. The solution is to split your input file into multiple pieces, though it seems that @grantdelozier also had trouble with that route.
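For anyone trying the split-the-input workaround described above, here is a minimal sketch of one way to do it locally before uploading to S3. It assumes the input is newline-delimited (one record per line); the function name, the output naming scheme, and the round-robin assignment are illustrative choices, not anything prescribed by SageMaker.

```python
from pathlib import Path
from typing import List


def split_records(input_path: str, output_dir: str, num_shards: int) -> List[Path]:
    """Split a newline-delimited file into num_shards files, round-robin by line.

    Hypothetical helper: assumes one record per line and that the whole
    job's input fits the "one file per shard" layout discussed in this thread.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")

    src = Path(input_path)
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # e.g. input.jsonl -> input-part-0000.jsonl, input-part-0001.jsonl, ...
    shard_paths = [
        out_dir / f"{src.stem}-part-{i:04d}{src.suffix}" for i in range(num_shards)
    ]
    handles = [p.open("w", encoding="utf-8") for p in shard_paths]
    try:
        with src.open("r", encoding="utf-8") as f:
            for line_number, line in enumerate(f):
                # Round-robin keeps shard sizes within one record of each other.
                handles[line_number % num_shards].write(line)
    finally:
        for h in handles:
            h.close()
    return shard_paths
```

Choosing num_shards as a multiple of the instance count gives roughly equal files, which should also avoid the uneven-file-size problem mentioned earlier in the thread. The resulting files would then be uploaded under a single S3 prefix and that prefix used as the job input.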
The short answer to why I closed this issue is that it stopped happening to me. I deleted and re-created my SageMaker model artifact, rebuilt my inference container on ECR, and double- and triple-checked that my batch inference invocation parameters were correct, confirming through the SageMaker batch inference management UI that I had given the parameters and arguments correctly. After doing this, everything started working as expected when I specified an InstanceCount > 1. So I guess I had simply misconfigured something. I would encourage others struggling with this issue to go through the whole process of creating the SageMaker model, ECR image, and batch transform again to verify that everything has been set up correctly.
Can you share the workflow for how you did it?
Describe the bug
Throughout the SageMaker batch transform documentation, it is suggested that multiple instances can be used to fulfill inference requests. The API documentation for createTransformJob lists a parameter called InstanceCount.
However, whenever I create transform jobs that use more than one instance, only one instance actually fulfills inferences. I can see through the logs that multiple instances are started, but only one of them is used to fulfill requests.
It looks as though someone else noticed this previously, but the issue was closed without being resolved. In this thread @djarpin suggests that multiple instances will be utilized if multiple input files are utilized. However, this still doesn't seem to work either. If you include a folder with multiple files in your TransformInput argument, it will utilize both files but still send all invocations to a single instance.
To reproduce
Invoke a createTransformJob() request with InstanceCount > 1. In CloudWatch, observe that all invocations are sent to a single instance while all other instances sit idle.
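As an illustration of the kind of request being described, here is a rough sketch that builds the keyword arguments for boto3's create_transform_job with more than one instance. Every value in it (job name, model name, bucket paths, instance type, content type) is a hypothetical placeholder and not the reporter's actual configuration; the S3Prefix data type and Line split type are assumptions about a typical multi-file, line-delimited input.

```python
from typing import Any, Dict


def build_transform_request(
    job_name: str,
    model_name: str,
    input_s3_prefix: str,
    output_s3_path: str,
    instance_count: int = 2,
    instance_type: str = "ml.m5.xlarge",  # placeholder instance type
) -> Dict[str, Any]:
    """Build kwargs for sagemaker_client.create_transform_job (illustrative only)."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {
                    # S3Prefix: use every object under the prefix as input.
                    "S3DataType": "S3Prefix",
                    "S3Uri": input_s3_prefix,
                }
            },
            "ContentType": "text/csv",  # assumed input format
            "SplitType": "Line",        # assumed: one record per line
        },
        "TransformOutput": {"S3OutputPath": output_s3_path},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


# Submitting the job needs AWS credentials, so it is left commented out:
# import boto3
# boto3.client("sagemaker").create_transform_job(**build_transform_request(
#     "demo-job", "demo-model",
#     "s3://example-bucket/input/", "s3://example-bucket/output/",
#     instance_count=2,
# ))
```

Building the request as a plain dictionary first makes it easy to print and compare against what the SageMaker console shows for the job, which is the kind of double-checking suggested in the comments.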
Here is a full list of the parameters I include in my createTransformJob() request: