[BUG] Deep nested map type configuration issue in text_embedding processor #686

Open
zane-neo opened this issue Apr 11, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@zane-neo
Collaborator

What is the bug?

When the field_map of the text_embedding processor is a deeply nested map, the embedding result overwrites the original value of the document key instead of being added next to it.
Pipeline configuration:

{
  "description": "An example neural search pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "qhO5xY4BYwgbtrHt7KDf",
        "field_map": {
          "category": {
            "name": {
              "en": "category_name_vector"
            }
          }
        }
      }
    }
  ]
}

Then simulate the pipeline with this document:

{
    "docs": [
        {
            "_index": "neural-search-index-v2",
            "_id": "1",
            "_source": {
                "category": [
                    {
                        "name": {
                            "en": "this is a name"
                        }
                    },
                    {
                        "name": {
                            "en": "hello world"
                        }
                    }
                ]
            }
        }
    ]
}

Result:

{
    "docs": [
        {
            "doc": {
                "_index": "neural-search-index-v2",
                "_id": "1",
                "_source": {
                    "category": [
                        {
                            "name": [
                                -0.10758455,
                                0.07971476,
                                -0.04948872,
                                ...
                            ]
                        },
                        {
                            "name": [
                                -0.034477253,
                                0.031023245,
                                0.006734962,
                                ...
                            ]
                        }
                    ]
                },
                "_ingest": {
                    "timestamp": "2024-04-10T03:51:53.496385Z"
                }
            }
        }
    ]
}

Expected result:

{
    "docs": [
        {
            "doc": {
                "_index": "neural-search-index-v2",
                "_id": "1",
                "_source": {
                    "category": [
                        {
                            "name": {
                                "category_name_vector": [
                                    -0.10758455,
                                    0.07971476,
                                    -0.04948872,
                                    ...
                                ],
                                "en": "this is a name"
                            }
                        },
                        {
                            "name": {
                                "category_name_vector": [
                                    -0.034477253,
                                    0.031023245,
                                    0.006734962,
                                    ...
                                ],
                                "en": "hello world"
                            }
                        }
                    ]
                },
                "_ingest": {
                    "timestamp": "2024-04-10T03:51:53.496385Z"
                }
            }
        }
    ]
}

What is the expected behavior?

The generated embedding results should be placed in the right position of the document.
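
To make the expected placement concrete, here is a minimal Python sketch. It is not the plugin's Java implementation: the function name `place_embeddings` is hypothetical, and it assumes one vector is supplied per text leaf, in document order. It walks the document alongside the field_map and adds each vector next to its source text rather than replacing the parent key.

```python
from typing import Any


def place_embeddings(source: Any, field_map: dict, vectors: list) -> Any:
    """Return a copy of `source` with each vector added beside its text field.

    Hypothetical helper for illustration only. Assumes `vectors` holds one
    vector per mapped text leaf, ordered the way the leaves appear in `source`.
    """
    remaining = iter(vectors)

    def walk(node: Any, mapping: dict) -> Any:
        if isinstance(node, list):
            # List of maps: apply the same mapping to every element.
            return [walk(item, mapping) for item in node]
        if not isinstance(node, dict):
            return node
        result = dict(node)  # shallow copy so sibling keys are preserved
        for key, target in mapping.items():
            if key not in node:
                continue
            if isinstance(target, dict):
                # Nested mapping: descend one level.
                result[key] = walk(node[key], target)
            else:
                # Leaf mapping: store the vector under the target name,
                # next to the original text.
                result[target] = next(remaining)
        return result

    return walk(source, field_map)


doc = {
    "category": [
        {"name": {"en": "this is a name"}},
        {"name": {"en": "hello world"}},
    ]
}
field_map = {"category": {"name": {"en": "category_name_vector"}}}
# Placeholder vectors; real embeddings would come from the model.
result = place_embeddings(doc, field_map, [[0.1, 0.2], [0.3, 0.4]])
```

With this input, each `name` object in `result` keeps its `en` text and gains a `category_name_vector` key, which is the shape shown in the expected result.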

@krishy91
Contributor

Hi, I'll look into this! It seems to be an issue with a nesting depth of 2.

@zane-neo
Collaborator Author

@krishy91 Since we support lists of maps, we don't want any limitation here, such as supporting only a depth of 2 or 3. We should support deeply nested cases if possible.
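
As a rough illustration of handling any depth, a recursive walk avoids hard-coding a level count. The Python sketch below is only an assumption about one possible approach, not the plugin's code; `collect_texts` is a hypothetical name. It gathers the texts to embed by following the field_map through maps and lists of maps, however deep they go.

```python
from typing import Any


def collect_texts(source: Any, field_map: dict) -> list:
    """Collect mapped text leaves from `source` in document order.

    Hypothetical helper for illustration. Recursion means there is no
    fixed limit on nesting depth.
    """
    texts = []

    def walk(node: Any, mapping: dict) -> None:
        if isinstance(node, list):
            # List of maps: visit each element with the same mapping.
            for item in node:
                walk(item, mapping)
        elif isinstance(node, dict):
            for key, target in mapping.items():
                if key not in node:
                    continue
                if isinstance(target, dict):
                    # Nested mapping: go one level deeper.
                    walk(node[key], target)
                elif isinstance(node[key], str):
                    # Leaf mapping pointing at a text value.
                    texts.append(node[key])

    walk(source, field_map)
    return texts


# Four levels of nesting, mixing maps and lists of maps.
deep_doc = {"a": [{"b": {"c": [{"d": "x"}, {"d": "y"}]}}, {"b": {"c": [{"d": "z"}]}}]}
deep_map = {"a": {"b": {"c": {"d": "d_vector"}}}}
deep_texts = collect_texts(deep_doc, deep_map)
```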

@krishy91
Contributor

krishy91 commented May 6, 2024

I could reproduce the issue. I will push the fix and an additional integration test for such deep nesting cases.

@naveentatikonda
Member

> Could reproduce the issue. Will push the fix & additional integration test for such deep nesting cases.

@krishy91 Is there any update on the fix?

@jmazanec15
Member

@zane-neo Is this still an issue? Can this be fixed?

@zane-neo
Collaborator Author

zane-neo commented Oct 9, 2024

Yes, this is still an issue. It seems @krishy91 doesn't have bandwidth for this, so I'll pick it up.

@yizheliu-amazon
Contributor

yizheliu-amazon commented Dec 16, 2024

Hello,

When I tested on top of this commit on the main branch and followed the same process described here, I got an exception instead of the overriding issue mentioned here:

How to reproduce:

  1. Create ingest pipeline
PUT /_ingest/pipeline/nlp-ingest-pipeline-v4

{
  "description": "An example neural search pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "H7tN0ZMBSNRyUB1vcEi2",
        "field_map": {
          "category": {
            "name": {
              "en": "category_name_vector"
            }
          }
        }
      }
    }
  ]
}

  2. Simulate
POST /_ingest/pipeline/nlp-ingest-pipeline-v4/_simulate

{
    "docs": [
        {
            "_index": "neural-search-index-v2",
            "_id": "1",
            "_source": {
                "category": [
                    {
                        "name": {
                            "en": "this is 1st name"
                        }
                    },
                    {
                        "name": {
                            "en": "this is 2nd name"
                        }
                    }
                ]
            }
        }
    ]
}

  3. Result
{
    "docs": [
        {
            "error": {
                "root_cause": [
                    {
                        "type": "class_cast_exception",
                        "reason": "class java.util.LinkedHashMap cannot be cast to class java.util.List (java.util.LinkedHashMap and java.util.List are in module java.base of loader 'bootstrap')"
                    }
                ],
                "type": "class_cast_exception",
                "reason": "class java.util.LinkedHashMap cannot be cast to class java.util.List (java.util.LinkedHashMap and java.util.List are in module java.base of loader 'bootstrap')"
            }
        }
    ]
}

Exception is thrown at this line
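
The error message says a map (LinkedHashMap) was treated as a List. One defensive pattern for this kind of failure is to check the runtime type before treating a nested value as a list. The Python sketch below is an analogy under that assumption only; it is not the code at the failing line and not the fix applied in the plugin, and `as_list_of_maps` is a hypothetical name.

```python
from typing import Any


def as_list_of_maps(value: Any) -> list:
    """Normalize a nested field value to a list of maps.

    Hypothetical helper: a single map becomes a one-element list, a list
    is returned unchanged, and anything else is rejected with a clear
    error instead of failing later during a blind cast.
    """
    if isinstance(value, dict):
        return [value]
    if isinstance(value, list):
        return value
    raise TypeError(f"expected a map or a list of maps, got {type(value).__name__}")


# A single nested map and a list of maps are both handled the same way.
single = as_list_of_maps({"en": "this is 1st name"})
several = as_list_of_maps([{"en": "this is 1st name"}, {"en": "this is 2nd name"}])
```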

It seems to have been introduced by PR #913. I think we can close this issue and create a new one for the class_cast_exception. Please let me know if anyone has concerns.

Thanks.

@heemin32
Collaborator

Let's keep it open for now. Once the class_cast_exception issue is resolved, the original issue might resurface.

@yizheliu-amazon
Contributor

Agreed. I created issue #1024 to track the class_cast_exception.
