Expose dense vector iterator in painless #51964
Pinging @elastic/es-core-infra (:Core/Infra/Scripting)
I would suggest NOT exposing a dense vector iterator.
@stu-elastic I am wondering: do you have a specific need to expose a dense vector iterator?
@mayya-sharipova this is based on a request from @honzakral. After chatting, we thought an iterator would be pretty lightweight. I'm trying to understand point 2: if we iterated over a vector and provided the values, what's the issue there? We can simply ignore the magnitude at the end.
@honzakral I am interested in learning about a use case for accessing vector values directly.
For now, depending on the index version, there is just the magnitude at the end, but later we may add more metadata. Were you planning to iterate over the binary doc values? It would be tricky for a user to decode them.
That's what we were thinking: allow users to get their data back.
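A minimal Java sketch of the hand-decoding a user would otherwise face. It assumes, without confirmation from this thread, that each stored vector is `dims` big-endian 4-byte floats followed, on index versions that store it, by one trailing float for the magnitude. `DenseVectorDecoder` and its methods are hypothetical names, not Elasticsearch classes.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: decodes one dense_vector doc value by hand.
// ASSUMPTION: layout is `dims` big-endian floats, optionally followed by a
// trailing float magnitude (present only on some index versions).
final class DenseVectorDecoder {
    private DenseVectorDecoder() {}

    /** Returns only the vector components, skipping any trailing metadata. */
    static float[] decodeValues(byte[] bytes, int dims) {
        if (bytes.length < dims * Float.BYTES) {
            throw new IllegalArgumentException("too few bytes for " + dims + " dims");
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes); // big-endian by default
        float[] values = new float[dims];
        for (int i = 0; i < dims; i++) {
            values[i] = buffer.getFloat(); // relative read, advances 4 bytes
        }
        return values;
    }

    /** Reads the trailing magnitude, assuming it directly follows the values. */
    static float decodeMagnitude(byte[] bytes, int dims) {
        int offset = dims * Float.BYTES;
        if (bytes.length < offset + Float.BYTES) {
            throw new IllegalStateException("no magnitude stored for this index version");
        }
        return ByteBuffer.wrap(bytes).getFloat(offset); // absolute read
    }
}
```

Whether a trailing magnitude is present depends on the index version, which is exactly the kind of detail an accessor or iterator would hide from script authors.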
The use case we encountered is the ability to access data in order, just as in #49695. The idea was to store historical records (price at a point in time) where we are then interested in deltas between two arbitrary points (price yesterday compared to price a year ago). Another use case that came up was a user implementing a custom vector function in Painless.
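The price-history use case boils down to indexing into the vector by position. A minimal Java sketch, assuming the values are already available as a `float[]` in the order they were indexed (oldest first); `PriceHistory.delta` is a hypothetical helper, not part of any Elasticsearch API.

```java
// Hypothetical helper for the price-history use case.
// ASSUMPTION: prices[i] holds the price at time step i, oldest first.
final class PriceHistory {
    private PriceHistory() {}

    /** Change in price from step `from` to step `to` (positive means the price went up). */
    static float delta(float[] prices, int from, int to) {
        if (from < 0 || to < 0 || from >= prices.length || to >= prices.length) {
            throw new IndexOutOfBoundsException("step outside the stored history");
        }
        return prices[to] - prices[from];
    }
}
```

For example, with prices `{100f, 104f, 110f}`, `delta(prices, 0, 2)` returns `10.0`. This only works if the values come back in indexed order, which is why ordered access matters for this use case.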
@honzakral thanks, these look like valid use cases to me.
@stu-elastic thanks, makes sense. We need to think about how to implement it as
A relevant request from another user: exposing vector functions in other Painless contexts besides
@mayya-sharipova Just wanted to check whether you would still like this to be exposed.
@jdconrad @mayya-sharipova Excuse me if this is off topic, but as I understand it, the following example could be another use case for having access to the vector fields. I'm currently seeing something odd in the following:
The result that is added to the array is always the cosineSimilarity outcome of the first iteration; however, if you log xt[i], it's definitely looping through the vectors we pass as parameters. My best hunch so far: is this because the script is compiled, and the function is compiled without the variable values injected? I just can't wrap my head around how I should be using these functions within a for/while loop :/ If I had access to the dense vector field, I could write my own cosine similarity function, which might get rid of that problem... but I'm not sure about that one. Any pointers would be great!
@coreation I see the problem. Thank you for reporting it. I can reproduce your problem with this example:

PUT test_index
{
  "mappings": {
    "properties": {
      "v": {
        "type": "dense_vector",
        "dims": 3
      }
    }
  }
}

POST test_index/_bulk
{ "index" : { "_id" : "1" } }
{"v" : [10, 10, 10]}
{ "index" : { "_id" : "2" } }
{"v" : [10, 20, 30]}

POST test_index/_explain/1
{
  "query": {
    "script_score": {
      "query": {"match_all": {}},
      "script": {
        "source": """
          def aha = [];
          def xt = params.filterVectors;
          for (int i = 0; i < xt.size(); i++) {
            aha.add(cosineSimilarity(xt[i], 'v'));
          }
          Debug.explain(aha);
        """,
        "params": {
          "filterVectors": [[10, 10, 10], [10, 20, 30]]
        }
      }
    }
  }
}
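For comparison, here is a plain Java sketch of what the loop in this reproduction is meant to collect for document 1 (`v = [10, 10, 10]`): textbook cosine similarity, recomputed from scratch for each query vector. This is not Elasticsearch's `cosineSimilarity` implementation, and `LoopCosine` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Textbook cosine similarity, recomputed for every query vector (nothing is cached).
final class LoopCosine {
    private LoopCosine() {}

    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Mirrors the script's loop: one score per query vector, in order. */
    static List<Double> scoreAll(double[][] queryVectors, double[] docVector) {
        List<Double> scores = new ArrayList<>();
        for (double[] q : queryVectors) {
            scores.add(cosine(q, docVector));
        }
        return scores;
    }
}
```

With the two query vectors from `params.filterVectors`, the expected scores are 1.0 (up to floating-point rounding) and sqrt(6/7) ≈ 0.926; the symptom reported above is that every entry equals the first one.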
@jdconrad @stu-elastic The problem here is that the result of
@coreation and @mayya-sharipova, let's move this discussion to a new issue. You'll want to look at the implementation of the
@stu-elastic @mayya-sharipova thanks for following up! I've already opened a separate issue for this, and it has been triaged :) The ticket is #70437.
Allow direct access to a dense_vector's values in scripts through the following functions:
- getVectorValue – returns a vector's value as an array of floats
- getVectorMagnitude – returns a vector's magnitude
Closes elastic#51964

Allow direct access to a dense_vector's values in scripts through the following functions:
- getVectorValue – returns a vector's value as an array of floats
- getMagnitude – returns a vector's magnitude
Closes #51964

Allow direct access to a dense_vector's values in scripts through the following functions:
- getVectorValue – returns a vector's value as an array of floats
- getMagnitude – returns a vector's magnitude
Closes elastic#51964
Backport for elastic#71313
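With both accessors available, a script can implement its own vector functions. A minimal Java sketch of one such function, normalizing a vector to unit length with a precomputed magnitude, assuming (the commit messages do not say so) that the magnitude is the Euclidean (L2) norm of the values. `VectorOps` is hypothetical plain Java, not the Painless API.

```java
// Hypothetical stand-in written in plain Java, not the Painless API.
// ASSUMPTION: magnitude == Euclidean (L2) norm of the values.
final class VectorOps {
    private VectorOps() {}

    /** L2 norm: sqrt(v[0]^2 + ... + v[n-1]^2), accumulated in double precision. */
    static float magnitude(float[] values) {
        double sumSquares = 0;
        for (float v : values) {
            sumSquares += (double) v * v;
        }
        return (float) Math.sqrt(sumSquares);
    }

    /** Scales the values to unit length using a precomputed magnitude. */
    static float[] normalize(float[] values, float magnitude) {
        if (magnitude == 0f) {
            throw new IllegalArgumentException("cannot normalize a zero vector");
        }
        float[] unit = new float[values.length];
        for (int i = 0; i < values.length; i++) {
            unit[i] = values[i] / magnitude;
        }
        return unit;
    }
}
```

Reusing a stored magnitude avoids re-summing squares for every document, which is presumably why it is worth exposing alongside the raw values.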
Painless does not have access to dense_vector at the moment except as an opaque value: users can pass it around but cannot access its contents. We can expose the vector as an iterator, which would give access to the data without revealing the internal representation.
cc: @mayya-sharipova @jtibshirani
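A minimal Java sketch of the proposed iterator, to show how it could hide the representation: it reads components lazily from the raw bytes and stops after `dims` values, so any trailing magnitude or future metadata is never exposed. The byte layout (big-endian floats first) is an assumption, and `DenseVectorIterator` is a hypothetical class, not Elasticsearch code.

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch of the proposed iterator: yields only the vector components.
// ASSUMPTION: `dims` big-endian floats come first; anything after them is ignored.
final class DenseVectorIterator implements Iterator<Float> {
    private final ByteBuffer buffer;
    private final int dims;
    private int next = 0;

    DenseVectorIterator(byte[] bytes, int dims) {
        if (bytes.length < dims * Float.BYTES) {
            throw new IllegalArgumentException("too few bytes for " + dims + " dims");
        }
        this.buffer = ByteBuffer.wrap(bytes); // big-endian by default
        this.dims = dims;
    }

    @Override
    public boolean hasNext() {
        return next < dims; // stop before any trailing magnitude or metadata
    }

    @Override
    public Float next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        float value = buffer.getFloat(next * Float.BYTES); // absolute read of component `next`
        next++;
        return value;
    }
}
```

Because `hasNext()` is bounded by `dims` rather than by the byte length, appending more metadata to the stored value later would not change what scripts see.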