Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Azure] [OpenAI] Malformed FunctionAppLogs documents found in azure_openai.logs dataset #9964

Closed
lucian-ioan opened this issue May 24, 2024 · 3 comments · Fixed by #10055
Closed
Assignees
Labels
bug Something isn't working, use only for issues Integration:azure Azure Logs

Comments

@lucian-ioan
Copy link
Contributor

In my analysis of Azure OpenAI logs, 21/91 documents contain malformed JSONs, breaking the processing of the fields and displaying the same error.message:

Error Message

The documents appear unrelated to Azure OpenAI (from FunctionAppLogs instead):
document

The Azure OpenAI metrics contain 1153 documents, all without the error.message field populated. This is likely because this issue is eventhub-specific and metrics use Azure Monitor.

Elasticsearch Query
GET ${exampleVariable1} // _search
{
  "size": 0,
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "data_stream.dataset": "azure.open_ai"
          }
        },
        {
          "term": {
            "data_stream.dataset": "azure_openai.logs"
          }
        }
      ]
    }
  },
  "aggs": {
    "datasets": {
      "terms": {
        "field": "data_stream.dataset",
        "size": 2
      },
      "aggs": {
        "error_message_exists": {
          "filter": {
            "exists": {
              "field": "error.message"
            }
          }
        },
        "error_message_not_exists": {
          "filter": {
            "bool": {
              "must_not": {
                "exists": {
                  "field": "error.message"
                }
              }
            }
          }
        }
      }
    }
  }
}

Possible fix

This issue can likely be fixed by enabling sanitization in the logs data stream.

Right now it's missing from the manifest.yml file. I will test it and open a PR to add and enable it by default.

Question

Why do the malformed JSONs that break the processing default to data_stream.dataset: azure_openai.logs in this case?

@lucian-ioan lucian-ioan added the Integration:azure Azure Logs label May 24, 2024
@lucian-ioan lucian-ioan self-assigned this May 24, 2024
@lucian-ioan lucian-ioan added the bug Something isn't working, use only for issues label May 24, 2024
@muthu-mps
Copy link
Contributor

@lucian-ioan - This logs are from the Azure function app. These logs are also getting pushed to the event-hub earlier for testing. I don't think we need to sanitise these unstructured logs which is not relevant to OpenAI service.

@lucian-ioan
Copy link
Contributor Author

lucian-ioan commented May 24, 2024

@muthu-mps the issue is the event.dataset and data_stream.type are the exact same as for normal OpenAI logs in these Function App Logs:

dataset

@muthu-mps
Copy link
Contributor

@muthu-mps the issue is the event.dataset and data_stream.type are the exact same as for normal OpenAI logs in these Function App Logs:

dataset

Seems like these logs are not getting dropped in the pipeline because of the errors. We can enable sanitisation and verify there is no malformed error logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working, use only for issues Integration:azure Azure Logs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants