x-pack/metricbeat/module/openai: Add new module #41516

Merged on Dec 13, 2024 (49 commits)
Changes from 11 commits
Commits
0e1175c
x-pack/metricbeat/module/openai: Add new module
shmsr Nov 4, 2024
bf45bf6
Merge branch 'main' into openai-metrics
shmsr Nov 4, 2024
8725cbb
update module
shmsr Nov 5, 2024
03995f6
update module
shmsr Nov 5, 2024
406c1a4
update module
shmsr Nov 5, 2024
a3c46b7
update module
shmsr Nov 5, 2024
72488f7
update module
shmsr Nov 5, 2024
b5c750a
Merge branch 'main' into openai-metrics
shmsr Nov 5, 2024
2d4027d
fix bug
shmsr Nov 5, 2024
e3f8fe2
update module
shmsr Nov 6, 2024
82cd979
Merge branch 'main' into openai-metrics
shmsr Nov 12, 2024
be628fe
Merge branch 'main' into openai-metrics
shmsr Nov 25, 2024
3259612
Address review comments
shmsr Nov 25, 2024
40b1131
Address review comments
shmsr Nov 25, 2024
2936540
Address review comments
shmsr Nov 25, 2024
850fc43
Merge branch 'main' into openai-metrics
shmsr Nov 25, 2024
5061e46
Address review comments
shmsr Nov 25, 2024
34a7128
Improvements
shmsr Nov 25, 2024
7f4cc75
Fix *.yml
shmsr Nov 26, 2024
7919c81
Address review comments
shmsr Nov 26, 2024
2a4bb56
Improvements
shmsr Nov 26, 2024
4899f3a
Merge branch 'main' into openai-metrics
shmsr Nov 26, 2024
217f7d5
Make linter happy
shmsr Nov 26, 2024
b5dbfb0
Address review comments
shmsr Nov 27, 2024
492eca1
gofumpt'ed
shmsr Nov 28, 2024
413010f
Address review comments
shmsr Nov 28, 2024
92f5062
Address review comments
shmsr Nov 28, 2024
fdb81c9
Merge branch 'main' into openai-metrics
shmsr Nov 28, 2024
78bbd11
Merge branch 'main' into openai-metrics
shmsr Nov 29, 2024
9c28e4d
Address review comments
shmsr Dec 4, 2024
f6067b8
Add more dummy data
shmsr Dec 4, 2024
ab98369
Better prealloc
shmsr Dec 4, 2024
1201845
Address review comments
shmsr Dec 8, 2024
90c2680
make update
shmsr Dec 9, 2024
63703d4
Merge branch 'main' into openai-metrics
shmsr Dec 9, 2024
e1ea8d2
Include OpenAI with Agentbeat
shmsr Dec 9, 2024
e907d98
More changes
shmsr Dec 9, 2024
942454d
make check
shmsr Dec 9, 2024
be9b5f2
make update
shmsr Dec 10, 2024
aff36b5
nitpick
shmsr Dec 10, 2024
549f26e
Logging and other basic improvements
shmsr Dec 10, 2024
e42472c
Merge branch 'main' into openai-metrics
shmsr Dec 10, 2024
33daed7
Update metricbeat/docs/modules/openai.asciidoc
shmsr Dec 12, 2024
33b7267
Merge branch 'main' into openai-metrics
shmsr Dec 12, 2024
2b836c2
Fix docs
shmsr Dec 12, 2024
32a9593
Update CODEOWNER
shmsr Dec 12, 2024
9698cac
Merge branch 'main' into openai-metrics
shmsr Dec 12, 2024
ae85fa8
Make linter happy
shmsr Dec 13, 2024
0b970bd
Merge branch 'main' into openai-metrics
shmsr Dec 13, 2024
619 changes: 619 additions & 0 deletions metricbeat/docs/fields.asciidoc

Large diffs are not rendered by default.

74 changes: 74 additions & 0 deletions metricbeat/docs/modules/openai.asciidoc
@@ -0,0 +1,74 @@
////
This file is generated! See scripts/mage/docs_collector.go
////

:modulename: openai
:edit_url: https://github.com/elastic/beats/edit/main/x-pack/metricbeat/module/openai/_meta/docs.asciidoc


[[metricbeat-module-openai]]
[role="xpack"]
== openai module

beta[]

This is the openai module.



:edit_url:

[float]
=== Example configuration

The openai module supports the standard configuration options that are described
in <<configuration-metricbeat>>. Here is an example configuration:

[source,yaml]
----
metricbeat.modules:
- module: openai
  metricsets: ["usage"]
  enabled: false
  period: 1h

  # # Project API Keys - Multiple API keys can be specified for different projects
  # api_keys:
  # - key: "api_key1"
  # - key: "api_key2"

  # # API Configuration
  # ## Base URL for the OpenAI usage API endpoint
  # api_url: "https://api.openai.com/v1/usage"
  # ## Custom headers to be included in API requests
  # headers:
  # - "k1: v1"
  # - "k2: v2"
  ## Rate Limiting Configuration
  # rate_limit:
  #   limit: 60 # requests per second
Contributor commented:

Is this going to be changed to 12 as well? Why did we change the limit from 60 to 12? I thought 60 was the agreed-upon limit.

shmsr (Member Author) commented on Dec 10, 2024:

I retested everything thoroughly from scratch today and noticed that requests were being fired more slowly than expected. I had misunderstood how limit and burst interact and had put incorrect values there, which I have now corrected.

This part of the doc still needs to be regenerated with make update; I will run that. All the other doc files are already updated.

The rate limiter works as follows:

limit: 12 means one request every 12 seconds (60 seconds / 5 requests = 12 seconds per request)
burst: 1 means only 1 request can be made in a burst
This ensures you never exceed 5 requests per minute

So nothing has changed; the defaults just weren't configured properly. The rate limit is still 5 requests per minute, as per OpenAI.

  #   burst: 5 # burst size
  # ## Request timeout duration
  # timeout: 30s

  # # Data Collection Configuration
  # collection:
  #   ## Number of days to look back when collecting usage data
  #   lookback_days: 30
  #   ## Whether to collect usage data in realtime. Defaults to false as how
  #   # OpenAI usage data is collected will end up adding duplicate data to ES
  #   # and also making it harder to do analytics. Best approach is to avoid
  #   # realtime collection and collect only upto last day (in UTC). So, there's
  #   # at most 24h delay.
  #   realtime: false
----
shmsr marked this conversation as resolved.

[float]
=== Metricsets

The following metricsets are available:

* <<metricbeat-metricset-openai-usage,usage>>

include::openai/usage.asciidoc[]

:edit_url!:
29 changes: 29 additions & 0 deletions metricbeat/docs/modules/openai/usage.asciidoc
@@ -0,0 +1,29 @@
////
This file is generated! See scripts/mage/docs_collector.go
////
:edit_url: https://github.com/elastic/beats/edit/main/x-pack/metricbeat/module/openai/usage/_meta/docs.asciidoc


[[metricbeat-metricset-openai-usage]]
[role="xpack"]
=== openai usage metricset

beta[]

include::../../../../x-pack/metricbeat/module/openai/usage/_meta/docs.asciidoc[]


:edit_url:

==== Fields

For a description of each field in the metricset, see the
<<exported-fields-openai,exported fields>> section.

Here is an example document generated by this metricset:

[source,json]
----
include::../../../../x-pack/metricbeat/module/openai/usage/_meta/data.json[]
----
:edit_url!:
3 changes: 3 additions & 0 deletions metricbeat/docs/modules_list.asciidoc
@@ -240,6 +240,8 @@ This file is generated! See scripts/mage/docs_collector.go
|<<metricbeat-metricset-nats-subscriptions,subscriptions>>
|<<metricbeat-module-nginx,Nginx>> |image:./images/icon-yes.png[Prebuilt dashboards are available] |
.1+| .1+| |<<metricbeat-metricset-nginx-stubstatus,stubstatus>>
|<<metricbeat-module-openai,openai>> beta[] |image:./images/icon-no.png[No prebuilt dashboards] |
.1+| .1+| |<<metricbeat-metricset-openai-usage,usage>> beta[]
|<<metricbeat-module-openmetrics,Openmetrics>> beta[] |image:./images/icon-no.png[No prebuilt dashboards] |
.1+| .1+| |<<metricbeat-metricset-openmetrics-collector,collector>> beta[]
|<<metricbeat-module-oracle,Oracle>> |image:./images/icon-yes.png[Prebuilt dashboards are available] |
@@ -381,6 +383,7 @@ include::modules/munin.asciidoc[]
include::modules/mysql.asciidoc[]
include::modules/nats.asciidoc[]
include::modules/nginx.asciidoc[]
include::modules/openai.asciidoc[]
include::modules/openmetrics.asciidoc[]
include::modules/oracle.asciidoc[]
include::modules/panw.asciidoc[]
2 changes: 2 additions & 0 deletions x-pack/metricbeat/include/list.go

Some generated files are not rendered by default.

35 changes: 35 additions & 0 deletions x-pack/metricbeat/metricbeat.reference.yml
@@ -1257,6 +1257,41 @@ metricbeat.modules:
# Path to server status. Default nginx_status
server_status_path: "nginx_status"

#-------------------------------- Openai Module --------------------------------
- module: openai
  metricsets: ["usage"]
  enabled: false
  period: 1h

  # # Project API Keys - Multiple API keys can be specified for different projects
  # api_keys:
  # - key: "api_key1"
  # - key: "api_key2"

  # # API Configuration
  # ## Base URL for the OpenAI usage API endpoint
  # api_url: "https://api.openai.com/v1/usage"
  # ## Custom headers to be included in API requests
  # headers:
  # - "k1: v1"
  # - "k2: v2"
  ## Rate Limiting Configuration
  # rate_limit:
  #   limit: 60 # requests per second
  #   burst: 5 # burst size
  # ## Request timeout duration
  # timeout: 30s

  # # Data Collection Configuration
  # collection:
  #   ## Number of days to look back when collecting usage data
  #   lookback_days: 30
  #   ## Whether to collect usage data in realtime. Defaults to false as how
  #   # OpenAI usage data is collected will end up adding duplicate data to ES
  #   # and also making it harder to do analytics. Best approach is to avoid
  #   # realtime collection and collect only upto last day (in UTC). So, there's
  #   # at most 24h delay.
  #   realtime: false
#----------------------------- Openmetrics Module -----------------------------
- module: openmetrics
metricsets: ['collector']
34 changes: 34 additions & 0 deletions x-pack/metricbeat/module/openai/_meta/config.yml
@@ -0,0 +1,34 @@
- module: openai
  metricsets: ["usage"]
  enabled: false
  period: 1h

  # # Project API Keys - Multiple API keys can be specified for different projects
  # api_keys:
  # - key: "api_key1"
  # - key: "api_key2"

  # # API Configuration
  # ## Base URL for the OpenAI usage API endpoint
  # api_url: "https://api.openai.com/v1/usage"
  # ## Custom headers to be included in API requests
  # headers:
  # - "k1: v1"
  # - "k2: v2"
  ## Rate Limiting Configuration
  # rate_limit:
  #   limit: 60 # requests per second
  #   burst: 5 # burst size
  # ## Request timeout duration
  # timeout: 30s

  # # Data Collection Configuration
  # collection:
  #   ## Number of days to look back when collecting usage data
  #   lookback_days: 30
  #   ## Whether to collect usage data in realtime. Defaults to false as how
  #   # OpenAI usage data is collected will end up adding duplicate data to ES
  #   # and also making it harder to do analytics. Best approach is to avoid
  #   # realtime collection and collect only upto last day (in UTC). So, there's
  #   # at most 24h delay.
  #   realtime: false
2 changes: 2 additions & 0 deletions x-pack/metricbeat/module/openai/_meta/docs.asciidoc
@@ -0,0 +1,2 @@
This is the openai module.

10 changes: 10 additions & 0 deletions x-pack/metricbeat/module/openai/_meta/fields.yml
@@ -0,0 +1,10 @@
- key: openai
title: "openai"
release: beta
description: >
openai module
fields:
- name: openai
type: group
description: >
fields:
6 changes: 6 additions & 0 deletions x-pack/metricbeat/module/openai/doc.go
@@ -0,0 +1,6 @@
// Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
// or more contributor license agreements. Licensed under the Elastic License;
// you may not use this file except in compliance with the Elastic License.

// Package openai is a Metricbeat module that contains MetricSets.
package openai
23 changes: 23 additions & 0 deletions x-pack/metricbeat/module/openai/fields.go

Some generated files are not rendered by default.

38 changes: 38 additions & 0 deletions x-pack/metricbeat/module/openai/usage/_meta/data.json
@@ -0,0 +1,38 @@
{
    "@timestamp": "2017-10-12T08:05:34.853Z",
    "event": {
        "dataset": "openai.usage",
        "duration": 115000,
        "module": "openai"
    },
    "metricset": {
        "name": "usage",
        "period": 10000
    },
    "openai": {
        "usage": {
            "data": {
                "aggregation_timestamp": "2024-11-04T05:01:00Z",
                "api_key_id": null,
                "api_key_name": null,
                "api_key_redacted": null,
                "api_key_type": null,
                "email": null,
                "n_cached_context_tokens_total": 0,
Contributor commented:

@shmsr, @ishleenk17: Are we okay with the field names? IMO, renaming them along the lines of n_context_tokens_total -> tokens.total or tokens_total would improve usability. On the other hand, keeping them as they are makes it easy to relate the ES fields to the OpenAI API metrics. WDYT?

Contributor commented:

Yeah, we should keep the fields more readable and in line with other LLM integrations.

Member Author commented:

I was thinking of keeping the names as they are here and changing them in the ingest pipelines of the integration. What do you think?

Contributor commented:

If we want parity between Beats and integrations, then ideally we should have similar field names.

Member Author commented:

But we don't want the Beats module to be used, right? So I felt that renaming in the integration's ingest pipelines was also a good idea. But yes, no strong opinion; it doesn't matter much.

Contributor commented:

Ideally we try to keep the fields in the Beats module and the integrations the same. If, in this particular case, we are sure that we are never going to use the Beats module, it should be OK.
@lalit-satapathy: WDYT?

Member Author commented:

If that is what we follow for other modules, let's do it here too. I will rename the fields.

Contributor commented:

Are we renaming the fields? If so, this sample JSON should also be updated along with the field name change.

shmsr (Member Author) commented on Dec 9, 2024:

I have renamed the fields based on the field names in the Azure OpenAI Elastic Integration docs, although only a few fields were similar.

                "n_context_tokens_total": 118,
                "n_generated_tokens_total": 35,
                "n_requests": 1,
                "operation": "completion-realtime",
                "organization_id": "org-dummy",
                "organization_name": "Personal",
                "project_id": null,
                "project_name": null,
                "request_type": "",
                "snapshot_id": "gpt-4o-realtime-preview-2024-10-01"
            }
        }
    },
    "service": {
        "type": "openai"
    }
}
@@ -0,0 +1 @@
This is the usage metricset of the module openai.