
x-pack/metricbeat/module/openai: Add new module #41516

Merged: 49 commits merged into elastic:main on Dec 13, 2024

Conversation

@shmsr (Member) commented Nov 4, 2024

Proposed commit message

Implement a new module for OpenAI usage collection. This module operates on https://api.openai.com/v1/usage (by default; also configurable for proxy URLs, etc.) and collects the limited set of usage metrics emitted by this undocumented endpoint.

Example of how the usage endpoint emits metrics:

Given timestamps t0, t1, t2, ... tn in ascending order:

  • At t0 (first collection):
   usage_metrics_1: *
  • At t1 (after new API usage):
   usage_metrics_1: *
   usage_metrics_2: *
  • At t2 (continuous collection):
   usage_metrics_1: *
   usage_metrics_2: *
   usage_metrics_3: *

and so on.

Example response:

{
  "object": "list",
  "data": [
    {
      "organization_id": "org-xxx",
      "organization_name": "Personal",
      "aggregation_timestamp": 1725389580,
      "n_requests": 1,
      "operation": "completion",
      "snapshot_id": "gpt-4o-mini-2024-07-18",
      "n_context_tokens_total": 62,
      "n_generated_tokens_total": 21,
      "email": null,
      "api_key_id": null,
      "api_key_name": null,
      "api_key_redacted": null,
      "api_key_type": null,
      "project_id": null,
      "project_name": null,
      "request_type": ""
    },
    {
      "organization_id": "org-xxx",
      "organization_name": "Personal",
      "aggregation_timestamp": 1725389640,
      "n_requests": 1,
      "operation": "completion",
      "snapshot_id": "gpt-4o-mini-2024-07-18",
      "n_context_tokens_total": 97,
      "n_generated_tokens_total": 17,
      "email": null,
      "api_key_id": null,
      "api_key_name": null,
      "api_key_redacted": null,
      "api_key_type": null,
      "project_id": null,
      "project_name": null,
      "request_type": ""
    }
  ],
  "tpm_data": [
    {
      "organization_id": "org-xxx",
      "organization_name": "Personal",
      "day_timestamp": 1725321600,
      "snapshot_id": "gpt-4o-mini-2024-07-18",
      "operation": "completion",
      "p90_context_tpm": 97,
      "p90_generated_tpm": 21,
      "p90_provisioned_context_tpm": 0,
      "p90_provisioned_generated_tpm": 0,
      "max_context_tpm": 97,
      "max_generated_tpm": 21,
      "max_provisioned_context_tpm": 0,
      "max_provisioned_generated_tpm": 0
    }
  ],
  "ft_data": [],
  "dalle_api_data": [],
  "whisper_api_data": [],
  "tts_api_data": [],
  "assistant_code_interpreter_data": [],
  "retrieval_storage_data": []
}
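
For reference, here is a hedged sketch (in Go) of how a subset of the response above could be modeled; the type and field names are illustrative and are not the module's actual definitions:

package main

import (
	"encoding/json"
	"fmt"
)

// UsageDatum models one element of the "data" array shown above. Only a
// subset of fields is included; pointer types allow for the null values
// seen in the example response.
type UsageDatum struct {
	OrganizationID        string  `json:"organization_id"`
	OrganizationName      string  `json:"organization_name"`
	AggregationTimestamp  int64   `json:"aggregation_timestamp"`
	NRequests             int     `json:"n_requests"`
	Operation             string  `json:"operation"`
	SnapshotID            string  `json:"snapshot_id"`
	NContextTokensTotal   int     `json:"n_context_tokens_total"`
	NGeneratedTokensTotal int     `json:"n_generated_tokens_total"`
	Email                 *string `json:"email"`
	ProjectID             *string `json:"project_id"`
	RequestType           string  `json:"request_type"`
}

// UsageResponse is the top-level shape; the remaining arrays (tpm_data,
// ft_data, dalle_api_data, ...) are omitted here for brevity.
type UsageResponse struct {
	Object string       `json:"object"`
	Data   []UsageDatum `json:"data"`
}

func main() {
	raw := `{"object":"list","data":[{"organization_id":"org-xxx","operation":"completion","n_requests":1,"n_context_tokens_total":62,"n_generated_tokens_total":21}]}`
	var resp UsageResponse
	if err := json.Unmarshal([]byte(raw), &resp); err != nil {
		panic(err)
	}
	fmt.Println(resp.Data[0].Operation, resp.Data[0].NContextTokensTotal) // completion 62
}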

Usage appears on this endpoint a short while after the API is used. So, if the module collects in real time, and multiple times a day at that, it would collect duplicates, which is bad both for storage and for analytics of the usage data.

It's better to collect time.Now() (in UTC) - 24h so that we get the full usage for the past day (in UTC) and avoid duplication. That's why I have introduced a realtime config option and set it to false: collection is delayed by 24h, so we now get daily data. realtime: true works like any other normal collection where metrics are fetched at set intervals. Our recommendation is to keep realtime: false. A minimal sketch of the delayed-window idea follows.
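
The sketch below assumes a boolean realtime option; function and variable names are illustrative, not the module's actual code:

package main

import (
	"fmt"
	"time"
)

// collectionTime sketches the delayed-window idea: with realtime disabled,
// collect time.Now() (UTC) - 24h so only the fully completed previous UTC
// day is fetched, which avoids duplicate usage records.
func collectionTime(realtime bool, now time.Time) time.Time {
	if realtime {
		return now.UTC()
	}
	return now.UTC().Add(-24 * time.Hour)
}

func main() {
	// With realtime: false we end up querying yesterday's (complete) usage day.
	fmt.Println(collectionTime(false, time.Now()).Format("2006-01-02"))
}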

As this is a Metricbeat module, there is no existing package that gives us support for storing a cursor. So, to avoid pulling already-pulled data, timestamps are stored per API key; how this is stored is explained in comments in the code. We use new custom state-store code to persist the cursor and begin from the next available date (sketched below).
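
A hedged sketch of what such a per-API-key cursor could look like; the stateStore type, its methods, and the key hashing are illustrative assumptions, not the module's actual implementation. The real module persists its state so the cursor survives restarts; this in-memory version only illustrates the bookkeeping:

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
	"time"
)

// stateStore keeps the last fully collected date per API key so that the
// next run resumes from the following day. The API key is hashed so the
// raw secret never appears in the stored state.
type stateStore struct {
	mu       sync.Mutex
	lastDate map[string]time.Time
}

func newStateStore() *stateStore {
	return &stateStore{lastDate: make(map[string]time.Time)}
}

func hashKey(apiKey string) string {
	sum := sha256.Sum256([]byte(apiKey))
	return hex.EncodeToString(sum[:])
}

// Next returns the first date that still needs to be collected for the key,
// falling back to the provided start date when no cursor exists yet.
func (s *stateStore) Next(apiKey string, fallback time.Time) time.Time {
	s.mu.Lock()
	defer s.mu.Unlock()
	if last, ok := s.lastDate[hashKey(apiKey)]; ok {
		return last.Add(24 * time.Hour)
	}
	return fallback
}

// Mark records that a date has been fully collected for the key.
func (s *stateStore) Mark(apiKey string, date time.Time) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.lastDate[hashKey(apiKey)] = date
}

func main() {
	st := newStateStore()
	start := time.Date(2024, 12, 1, 0, 0, 0, 0, time.UTC)
	fmt.Println(st.Next("sk-xxx", start)) // no cursor yet: start date
	st.Mark("sk-xxx", start)
	fmt.Println(st.Next("sk-xxx", start)) // resumes from the next day
}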

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • Check the state store
  • Validate with usage dashboard of OpenAI

How to test this PR locally

  • Run Metricbeat and use your OpenAI API key to collect usage metrics.

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 4, 2024
@mergify mergify bot assigned shmsr Nov 4, 2024
mergify bot (Contributor) commented Nov 4, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @shmsr? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

mergify bot (Contributor) commented Nov 4, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Nov 4, 2024
@shmsr shmsr added Module: openai Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team labels Nov 4, 2024
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Nov 4, 2024
@shmsr shmsr requested a review from a team November 5, 2024 07:51
@shmsr (Member, Author) commented Nov 5, 2024

Getting hit by this error: #41174 (comment), and hence the CI is failing. Everything else is okay.

@shmsr (Member, Author) commented Nov 6, 2024

To continue my testing and avoid the Limit of total fields [10000] has been exceeded error until it is fixed for 8.15.x and older, I am using setup.template.fields to point at a new fields file that contains only the ECS fields and the openai fields from fields.yml, and nothing else.

See this: https://www.elastic.co/guide/en/beats/metricbeat/current/configuration-template.html

So this has unblocked me for now, but yes, we definitely need a fix for this.

@shmsr shmsr marked this pull request as ready for review November 12, 2024 07:28
@shmsr shmsr requested a review from a team as a code owner November 12, 2024 07:28
@shmsr shmsr requested review from AndersonQ and belimawr November 12, 2024 07:28
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Nov 12, 2024
@elasticmachine (Collaborator) commented: Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@shmsr (Member, Author) commented Nov 12, 2024

I've explained the complicated collection mechanism in the PR description itself. The rest is self-explanatory from the code. Please let me know if anything needs further clarification.

x-pack/metricbeat/module/openai/usage/client.go (outdated review thread, resolved)
x-pack/metricbeat/module/openai/usage/client.go (outdated review thread, resolved)
x-pack/metricbeat/module/openai/usage/config.go (outdated review thread, resolved)
x-pack/metricbeat/module/openai/usage/usage.go (outdated review thread, resolved)
x-pack/metricbeat/module/openai/usage/usage.go (outdated review thread, resolved)
}
],
"ft_data": [],
"dalle_api_data": [],
Member commented:

It'd be good to have data for each data set

@shmsr (Member, Author) replied:

Yeah, I tried generating ft (fine-tuning) data but it doesn't seem to work. Since OpenAI keeps this API undocumented, I couldn't find a single source with any samples; I'm not even sure they populate that field in the response of this particular endpoint. For dalle_api_data, I'll add sample data.

# - "k2: v2"
## Rate Limiting Configuration
# rate_limit:
# limit: 60 # requests per second
Contributor commented:

Is this to be changed to 12 as well?
Why have we changed the limit from 60 to 12?
I thought 60 was the agreed-upon limit?

@shmsr (Member, Author) replied on Dec 10, 2024:

I was testing everything from scratch today, and thoroughly at that, and noticed a slower rate of firing of requests. My understanding of limit and burst was confused and I had put incorrect values there, which I have corrected now.

This part of the doc needs to be updated with make update; I will run that. All other doc files are updated.

The rate limiter works as follows (a minimal sketch is included at the end of this comment):

  • limit: 12 means one request every 12 seconds (60 seconds / 5 requests = 12 seconds per request)
  • burst: 1 means only 1 request can be made in a burst

This ensures you never exceed 5 requests per minute.

So nothing changed; it just wasn't configured properly by default. The rate limit is still 5 req/min, as per OpenAI.
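
A minimal sketch of those limit/burst semantics using golang.org/x/time/rate (assumed here for illustration; the module's actual limiter may differ):

package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// One token every 12 seconds with a burst of 1: at most 5 requests/minute.
	limiter := rate.NewLimiter(rate.Every(12*time.Second), 1)
	ctx := context.Background()
	for i := 0; i < 3; i++ {
		// Wait blocks until the limiter allows the next request.
		if err := limiter.Wait(ctx); err != nil {
			fmt.Println("wait:", err)
			return
		}
		fmt.Println("request", i+1, "sent at", time.Now().Format(time.RFC3339))
	}
}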

@shmsr (Member, Author) commented Dec 10, 2024

I hope I've addressed all major review comments. Now I'll begin thorough testing to check:

  • Data matches OpenAI's dashboard.
  • Cursor management works correctly (across Metricbeat restarts).
  • Visualizations similar to OpenAI's own dashboard can be created.
  • Any other bugs or enhancements that I might have to incorporate.

Thanks to all the reviewers!

@shmsr (Member, Author) commented Dec 10, 2024

So far, testing looks good. I ran it for a few hours today and collected all of my (limited) OpenAI API usage over a 4-month period. The data has matched so far, and I also found a case where OpenAI's own usage dashboard doesn't show a specific data point even though it is present in the JSON returned by the usage API. In our dashboard it shows up perfectly, which is a good thing.

Here's a basic sample dashboard with panels similar to those of OpenAI's usage dashboard.

(Screenshot attached: kbn_dashboar_openai)

@shmsr (Member, Author) commented Dec 10, 2024

I think we are ready to merge now unless there are more comments.

@shmsr (Member, Author) commented Dec 11, 2024

@ishleenk17 / @devamanv, let me know if you have any comments. Also, @muthu-mps, do you have any comments regarding Azure OpenAI vs. this module?

@bmorelli25 (Member) commented: run docs-build

@ishleenk17 (Contributor) left a review comment:

Changes look good. Once CI passes, we are GTG!

@shmsr shmsr requested a review from a team as a code owner December 12, 2024 08:56
@shmsr (Member, Author) commented Dec 12, 2024

Updated the CODEOWNERS too.

cc: @lalit-satapathy Can you please approve as well?

@lalit-satapathy (Contributor) left a review comment:

LGTM on the CODEOWNERS changes.

@shmsr shmsr merged commit 93b018a into elastic:main Dec 13, 2024
32 checks passed
mergify bot pushed a commit that referenced this pull request Dec 13, 2024
Labels
backport-8.x Automated backport to the 8.x branch with mergify Module: openai Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team
9 participants