-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug] BWC Rolling Upgrade Tests are failing on neural search due to changes in MLPredictionTaskRequest Object #1838
Comments
I have discussed the issue with @reta . |
I am still writing the test cases now, since in previous we only used built in function to test and we need to test neural models this time it’s actually like writing from very beginning. |
Can confirm bug reproduced in manually setup cluster when sending predict request to 2.11 cluster with 2.9 master node:
|
In the meantime we noticed that pretrained model registration tasks also failed from 2.9 to 2.11
|
Stack trace for the error:
|
After fixing, the predict still return the exception:
|
It seems this is a change from 2.9.0 to 2.9.1 (Not release) at this line: |
PR is merged and I manually did the sanity test on local neural models on different version, looks good to me, but when will this update be applied to neural search team's CI workflow still seems uncleared to me. |
When testing remote model:
|
|
@b4sjoo Can you share what is the current update on this bug? |
@vibrantvarun Right now with a new master node (>= 2.11) and an old data node (<= 2.10), it seems remote model cannot be deployed in this case, except for this issue our bwc looks good. |
@vibrantvarun Quick notes: there are backward compatibility issue in ml plugin 2.9 and 2.10 towards 2.11 or later version. There was PR fixed this issue recently but since 2.9.0/2.10.0 release has already cutoff and no patch version has been released, they are still using old problematics source code to build. Accordingly, since neural search team's bwc test workflow use them as test artifacts, it will failed anyways. |
What is the bug?
The BWC Rolling upgrade tests are failing in the neural search because of recent changes in ML-Commons.
So the scenario is when the bwc version is 2.10 and current version is 2.12.0-SNAPSHOT
The tests fails with a reason
It is because there is one more constructor for MLPredictionTaskRequest has been added in the PR mentioned above.
In BWC tests when cluster has 3 nodes and 2 of them are running on 2.10 and one node gets upgraded to 2.12.0-SNAPSHOT. Now
when neural search calls predict api then while initializing the MLPredictionTaskRequest here it fails at this line with error mentioned above.
It is happening because it is either getting more parameters or no parameters for boolean while executing in.readBoolean().
The same is happening when the BWC tests are executed for 2.9 OS version.
The tests are running fine for 2.11 OS version.
But it makes this tests execution flaky.
Because if Out of 3 nodes if it uses the node which is running 2.11 version for calling predict api rather than 2.12.0-SNAPSHOT then it will face the same issue.
How can one reproduce the bug?
Steps to reproduce the behavior:
What is the expected behavior?
The BWC rolling upgrade tests should run with all previous version with no error.
What is your host/environment?
The text was updated successfully, but these errors were encountered: