-
Notifications
You must be signed in to change notification settings - Fork 241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OPIK-99] Implement batching for spans creation #298
[OPIK-99] Implement batching for spans creation #298
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment, but mostly LGTM. I had maybe some comment about optimising the batcher lock, but I don't want to create additional noise.
Once this is reviewed and validated, from my side we're good to go.
sdks/python/src/opik/message_processing/batching/base_batcher.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, let's debug why span name
is none with this change before proceeding with the merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few more comments while the discussed items are implemented.
.github/workflows/sdk-e2e-tests.yaml
Outdated
@@ -34,7 +34,7 @@ jobs: | |||
- name: Run latest Opik server | |||
run: | | |||
cd deployment/docker-compose | |||
docker compose up -d | |||
docker compose up -d --build --force-recreate --no-deps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's double check if these flags --force-recreate --no-deps
were just a temporary fix or if we need it in the long run:
If the later, we need to propagate this everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that --build only builds if the tag specified doesn't exist yet. However, all PR seems to run with the value.
https://github.com/comet-ml/opik/blob/main/deployment/docker-compose/docker-compose.yaml#L61
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This forces the rebuid of the image/tag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will conflict with this PR:
- build opik server while runing it for tests #310
Please @liyaka leave your thoughts here.
args: | ||
OPIK_VERSION: ${OPIK_VERSION:-latest} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, let's double check if this was temporary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thiagohora ? I thought it's a part of the fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, shit seems to be missing in our docker image. We such a variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liyaka please see this thread above.
sdks/python/src/opik/message_processing/batching/batch_manager_constuctors.py
Outdated
Show resolved
Hide resolved
sdks/python/src/opik/message_processing/batching/flushing_thread.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just wait on some input from @liyaka for the build fixes. Anything else LGTM.
.github/workflows/sdk-e2e-tests.yaml
Outdated
@@ -34,7 +34,7 @@ jobs: | |||
- name: Run latest Opik server | |||
run: | | |||
cd deployment/docker-compose | |||
docker compose up -d | |||
docker compose up -d --build --force-recreate --no-deps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will conflict with this PR:
- build opik server while runing it for tests #310
Please @liyaka leave your thoughts here.
args: | ||
OPIK_VERSION: ${OPIK_VERSION:-latest} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liyaka please see this thread above.
…add tests for span batcher
…anually in Opik clients created under the hood of SDK
6357040
to
ed29744
Compare
9d5d062
to
944fa9d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Undone the e2e infra changes as they conflict and they should tackled separately.
* Batch processing draft * Batching draft * [DRAFT] Update backend emulator for tests, update message processing to use batching * Fix lint errors * Implement _process_create_span_batch_message method in MessageProcessor * Remove assert, add debug log message * Add base batcher unit test * Make batch manager flush all batches if streamer.flush() was called, add tests for span batcher * Add tests for batch manager * Add test for flushing thread * Add one more test for batch manager * Fix lint errors * Fix lint errors * Fix bug when backend emulator didn't add feedback scores to traces * Fix lint errors * Rename flush_interval to flush_interval_seconds * Enable debug logs for e2e tests. Update debug messages in message_processors.py * Remove export statement from e2e workflow * Enable backend build. Enable debug logs for e2e tests * Update docker compose and e2e workflow file * Make batching disabled by default in Opik client. It is now enabled manually in Opik clients created under the hood of SDK * Add more unit tests for message processing * Add docstring for _use_batching parameter * Add missing _SECONDS suffix * Rename constant * Undone e2e tests infra changes --------- Co-authored-by: Andres Cruz <[email protected]>
Details
Added batching for CREATE-SPAN requests.
Batches are flushed to the streamer queue automatically every second or whenever accumulated message items reach max batch size limit (at the moment - 1000 items).
Batching works only inside the SDK, if someone tries to do the following
... batching will not work for that client (and all clients derived from it).
Under the hood of the SDK we instantiate Opik client like that
Issues
Resolves the issue when SDK sends a lot of CREATE-SPAN requests to the backend independetly, loading it and utilizing more network resources.
Backend should be less loaded,
Testing
Added unit tests. E2E tests also pass, but this PR requires extra QA.
Documentation