add openai batch API agent #2353

Merged: 62 commits, May 22, 2024

@samhita-alla samhita-alla commented Apr 16, 2024

Tracking issue

Fixes flyteorg/flyte#5274

Why are the changes needed?

The Flyte OpenAI batch API agent creates an OpenAI batch and polls its status while otherwise remaining dormant, so no costs are incurred when there is no work to be done. OpenAI's 50% discount on batch requests adds to the savings.

The agent seamlessly integrates with end-to-end Flyte workflows, making it especially useful when downstream tasks depend on batch processing jobs.
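
The create-then-poll behaviour described above can be pictured with a small polling loop. This is only a sketch, not the agent's actual implementation: `poll_until_terminal`, its parameters, and the fake status source are hypothetical names introduced for illustration, and the set of terminal states assumes the batch statuses documented by OpenAI (`completed`, `failed`, `expired`, `cancelled`).

```python
import time
from typing import Callable

# Assumed terminal states of an OpenAI batch; every other status means "keep waiting".
TERMINAL_STATES = {"completed", "failed", "expired", "cancelled"}


def poll_until_terminal(
    fetch_status: Callable[[], str],
    interval_s: float = 30.0,
    max_polls: int = 1000,
) -> str:
    """Call fetch_status() until it reports a terminal state, sleeping in between."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        # Nothing to do yet: stay idle until the next poll.
        time.sleep(interval_s)
    raise TimeoutError(f"batch did not finish after {max_polls} polls")


# Stand-in for a real status lookup, so the sketch runs without network access.
_fake_statuses = iter(["validating", "in_progress", "finalizing", "completed"])
final_status = poll_until_terminal(lambda: next(_fake_statuses), interval_s=0.0)
print(final_status)
```

In a real setup, `fetch_status` would wrap whatever call retrieves the batch status from OpenAI; the loop itself does not depend on how that lookup is made.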

What changes were proposed in this pull request?

This PR adds an agent to support OpenAI batch inference. It also adds a JSONL type transformer, allowing users to specify the Iterator[JSON] type, which automatically handles the conversion between iterator and JSONL FlyteFile. The JSONL type can be used when sending input to the OpenAI batch API agent.

from typing import Iterator

from flytekit import workflow, Secret
from flytekit.types.file import JSONLFile
from flytekit.types.iterator import JSON
from flytekitplugins.openai import create_batch, BatchResult


def jsons():
    for x in [
        {
            "custom_id": "request-1",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-3.5-turbo",
                "messages": [
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": "What is 2+2?"},
                ],
            },
        },
        {
            "custom_id": "request-2",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-3.5-turbo",
                "messages": [
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": "Who won the world series in 2020?"},
                ],
            },
        },
    ]:
        yield x


# Batch task that takes an Iterator[JSON] input (is_json_iterator left unset).
it_batch = create_batch(
    name="gpt-3.5-turbo",
    openai_organization="your-org",
    secret=Secret(group="openai-secret", key="api-key"),
)

# Batch task that takes a JSONLFile input instead of an iterator.
file_batch = create_batch(
    name="gpt-3.5-turbo",
    openai_organization="your-org",
    secret=Secret(group="openai-secret", key="api-key"),
    is_json_iterator=False,
)


@workflow
def json_iterator_wf(json_vals: Iterator[JSON] = jsons()) -> BatchResult:
    return it_batch(jsonl_in=json_vals)


@workflow
def jsonl_wf(jsonl_file: JSONLFile = "data.jsonl") -> BatchResult:
    return file_batch(jsonl_in=jsonl_file)
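
To make the iterator-to-JSONL conversion mentioned in the description more concrete, here is a minimal standard-library sketch of a round trip between an iterator of dicts and a JSONL file. It is not the flytekit type transformer: `write_jsonl` and `read_jsonl` are hypothetical helpers, and rejecting an empty iterator is an assumption that loosely mirrors the "validate if iterator's empty" commit in this PR.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterator


def write_jsonl(records: Iterator[Dict[str, Any]], path: Path) -> int:
    """Write one JSON document per line and return the number of records written."""
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
            count += 1
    if count == 0:
        # Assumption: an empty iterator is treated as an error.
        raise ValueError("iterator produced no records")
    return count


def read_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Lazily yield one parsed JSON document per non-empty line."""
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


sample_requests = [
    {"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions"},
    {"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions"},
]

with tempfile.TemporaryDirectory() as tmp:
    jsonl_path = Path(tmp) / "data.jsonl"
    written = write_jsonl(iter(sample_requests), jsonl_path)
    round_tripped = list(read_jsonl(jsonl_path))
```

Reading lazily keeps memory use flat for large request files, which is the main reason to expose the data as an iterator rather than a list.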

How was this patch tested?

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

TODO:

After PR #2297 is merged:

  • Run test cases
  • Run a real-world example on the Flyte cluster with secrets configured


@kumare3 kumare3 left a comment

This is an awesome start. It's worth a blog!!

@samhita-alla samhita-alla changed the title from "add openai batch endpoint agent" to "add openai batch API agent" on Apr 23, 2024
Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Samhita Alla <[email protected]>
@samhita-alla samhita-alla marked this pull request as ready for review April 23, 2024 15:36

codecov bot commented Apr 24, 2024

Codecov Report

Attention: Patch coverage is 88.75%, with 9 lines in your changes missing coverage. Please review.

Project coverage is 78.02%. Comparing base (70332db) to head (2152efc).
Report is 3 commits behind head on master.

Current head 2152efc differs from pull request most recent head 7191e39

Please upload reports for the commit 7191e39 to get more accurate results.

Files                                      Patch %   Lines
flytekit/types/iterator/json_iterator.py   93.10%    3 Missing and 1 partial ⚠️
flytekit/interaction/click_types.py        57.14%    2 Missing and 1 partial ⚠️
flytekit/core/type_engine.py               81.81%    1 Missing and 1 partial ⚠️
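
The figures in this report can be cross-checked with a little arithmetic. The sketch below assumes that partially covered lines count as uncovered in the patch percentage, which is consistent with the numbers shown; the patch sizes it derives are inferred from the percentages, not stated in the report.

```python
# (file, patch coverage %, missing lines, partial lines) as listed in the report.
rows = [
    ("flytekit/types/iterator/json_iterator.py", 93.10, 3, 1),
    ("flytekit/interaction/click_types.py", 57.14, 2, 1),
    ("flytekit/core/type_engine.py", 81.81, 1, 1),
]


def implied_patch_lines(coverage_pct: float, uncovered: int) -> int:
    """Changed-line count implied by a coverage percentage and an uncovered count."""
    return round(uncovered / (1 - coverage_pct / 100))


# Assumption: partial lines are counted together with missing lines.
implied = {name: implied_patch_lines(pct, miss + part) for name, pct, miss, part in rows}
total_uncovered = sum(miss + part for _, _, miss, part in rows)

# Overall patch: 88.75% coverage with total_uncovered lines not covered.
implied_patch_total = implied_patch_lines(88.75, total_uncovered)

for name, lines in implied.items():
    print(f"{name}: ~{lines} changed lines")
print(f"uncovered: {total_uncovered}, implied patch size: ~{implied_patch_total}")
```

The three rows account for the 9 uncovered lines in the headline figure; the implied overall patch size is slightly larger than the sum of the three files, which would be explained by fully covered changes in files not listed.
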
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2353      +/-   ##
==========================================
+ Coverage   72.10%   78.02%   +5.91%     
==========================================
  Files         181      182       +1     
  Lines       18395    18470      +75     
  Branches     3601     3609       +8     
==========================================
+ Hits        13264    14411    +1147     
+ Misses       4506     3458    -1048     
+ Partials      625      601      -24     

Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Samhita Alla <[email protected]>

@pingsutw pingsutw left a comment

LGTM, just one question about JSONIterator

Review thread on flytekit/types/iterator/json_iterator.py (resolved)
pingsutw previously approved these changes May 21, 2024
@samhita-alla samhita-alla merged commit fe0467c into master May 22, 2024
45 of 46 checks passed
austin362667 pushed a commit to austin362667/flytekit that referenced this pull request Jun 15, 2024
* add openai batch endpoint agent

Signed-off-by: Samhita Alla <[email protected]>

* add jsonl type transformer, modify agent

Signed-off-by: Samhita Alla <[email protected]>

* revert format changes

Signed-off-by: Samhita Alla <[email protected]>

* revert iterator edits

Signed-off-by: Samhita Alla <[email protected]>

* typealias

Signed-off-by: Samhita Alla <[email protected]>

* add jsonlines

Signed-off-by: Samhita Alla <[email protected]>

* remove typealias

Signed-off-by: Samhita Alla <[email protected]>

* update readthedocs python version

Signed-off-by: Samhita Alla <[email protected]>

* update docs python version

Signed-off-by: Samhita Alla <[email protected]>

* replace dict with enum

Signed-off-by: Samhita Alla <[email protected]>

* modify json type; add a check to validate if iterator's empty

Signed-off-by: Samhita Alla <[email protected]>

* ignore mypy check

Signed-off-by: Samhita Alla <[email protected]>

* modify JSON type

Signed-off-by: Samhita Alla <[email protected]>

* move to openai folder

Signed-off-by: Samhita Alla <[email protected]>

* update readme

Signed-off-by: Samhita Alla <[email protected]>

* update setup.py

Signed-off-by: Samhita Alla <[email protected]>

* batch_api to batch, add plugin to setup.py

Signed-off-by: Samhita Alla <[email protected]>

* modify return type

Signed-off-by: Samhita Alla <[email protected]>

* modify return type

Signed-off-by: Samhita Alla <[email protected]>

* fix lint error

Signed-off-by: Samhita Alla <[email protected]>

* remove guess_python_type

Signed-off-by: Samhita Alla <[email protected]>

* modify json type

Signed-off-by: Samhita Alla <[email protected]>

* modify tests

Signed-off-by: Samhita Alla <[email protected]>

* json to iterator[json]

Signed-off-by: Samhita Alla <[email protected]>

* update plugin readme

Signed-off-by: Samhita Alla <[email protected]>

* replace flytefile with jsonlfile

Signed-off-by: Samhita Alla <[email protected]>

* modify output type of batch; add typealias

Signed-off-by: Samhita Alla <[email protected]>

* nit

Signed-off-by: Samhita Alla <[email protected]>

* nit

Signed-off-by: Samhita Alla <[email protected]>

* replace openai-batch-endpoint with openai-batch

Signed-off-by: Samhita Alla <[email protected]>

* update readme

Signed-off-by: Samhita Alla <[email protected]>

* revert to secrets

Signed-off-by: Samhita Alla <[email protected]>

* fix openai batch code; add json iterator click type

Signed-off-by: Samhita Alla <[email protected]>

* update readme

Signed-off-by: Samhita Alla <[email protected]>

* fix types

Signed-off-by: Samhita Alla <[email protected]>

* add shim tasks

Signed-off-by: Samhita Alla <[email protected]>

* dict to dataclass; add container image; add guess_python_type to json iterator

Signed-off-by: Samhita Alla <[email protected]>

* json iterator check in type engine and flytefile

Signed-off-by: Samhita Alla <[email protected]>

* update image version

Signed-off-by: Samhita Alla <[email protected]>

* add secret; update json iterator

Signed-off-by: Samhita Alla <[email protected]>

* add secret to shim task init method

Signed-off-by: Samhita Alla <[email protected]>

* fix secret

Signed-off-by: Samhita Alla <[email protected]>

* fix secret

Signed-off-by: Samhita Alla <[email protected]>

* add secret to dict

Signed-off-by: Samhita Alla <[email protected]>

* fix logging error; remove iterator copy; remove  in flyte entity names

Signed-off-by: Samhita Alla <[email protected]>

* openai tests

Signed-off-by: Samhita Alla <[email protected]>

* lint and remove auto spec in openai tests

Signed-off-by: Samhita Alla <[email protected]>

* fix test

Signed-off-by: Samhita Alla <[email protected]>

* json key check

Signed-off-by: Samhita Alla <[email protected]>

* modify input type of upload json data to jsonlfile only

Signed-off-by: Samhita Alla <[email protected]>

* add jsonl to mime type

Signed-off-by: Samhita Alla <[email protected]>

* change mime type

Signed-off-by: Samhita Alla <[email protected]>

* change mime type and fix tests

Signed-off-by: Samhita Alla <[email protected]>

* add json mime type

Signed-off-by: Samhita Alla <[email protected]>

* lint

Signed-off-by: Samhita Alla <[email protected]>

* lint

Signed-off-by: Samhita Alla <[email protected]>

* incorporate kevin's suggestion

Signed-off-by: Samhita Alla <[email protected]>

* requests 2.32.2 doesn't work either

Signed-off-by: Samhita Alla <[email protected]>

---------

Signed-off-by: Samhita Alla <[email protected]>
fiedlerNr9 pushed a commit that referenced this pull request Jul 25, 2024
Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>
4 participants