kedro-telemetry: Spike to reduce redundant telemetry events #730
Comments
I have some ideas, which may depend on [...]. There were 2 events sent originally. The third was added in [...].
Is the performance critical? Especially if we can send the event only at the end, I don't think there will be a strong performance hit (I haven't seen any complaints about this before). One thing that I am certain would help is combining the CLIHook and the regular Hook into one single class. In general, we enrich the event and only send it at the end. We also have to consider the different entry points.
After checking the data, the first 2 events are almost identical. The only difference is that the first one gets a different "event_name", so it could have some ergonomic benefit on HEAP specifically. I went through the charts quickly and couldn't find any usage of it, so I think it's safe to remove.
Description
Currently, when running a `kedro new` command with telemetry enabled, we send mostly the same data using `_send_heap_event()` three times. This seems excessive, especially as we move to opt-out telemetry with #715, which will definitely increase traffic.

The reason for this redundancy is that we send two identical events for each kedro command in the `before_command_run` hook. I don't know the exact reasons for this duplication, but it seems like the last one is sufficient since it fully contains the first one.

Additionally, if `after_catalog_created` is triggered, we send one more event with data that was already included in the previous two events, along with additional project properties like the number of nodes and pipelines.

So the proposal is to send only one event per command:
- If `after_catalog_created` will be triggered after the command, send all information for that command in that hook.
- Otherwise, send it in the `before_command_run` hook.

We need to gather requirements about what exactly is needed on the `HEAP` / Snowflake(?) side.
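
To make the one-event-per-command idea more concrete, here is a minimal sketch of how it might look. It is not the kedro-telemetry implementation: the class name `SingleEventTelemetry`, the `flush()` method, the event name `"CLI command"`, and the property names are all hypothetical, and the `send` callable is only a stand-in for something like `_send_heap_event()`. The sketch assumes there is some point at the end of a command where `flush()` can be called.

```python
from typing import Any, Callable, Dict, List, Optional


class SingleEventTelemetry:
    """Hypothetical sketch: collect properties during a command, send once."""

    def __init__(self, send: Callable[[str, Dict[str, Any]], None]) -> None:
        # `send` is a stand-in for the real sender (e.g. _send_heap_event()).
        self._send = send
        self._properties: Dict[str, Any] = {}
        self._command: Optional[str] = None
        self._sent = False

    def before_command_run(self, command_args: List[str]) -> None:
        # Record the command instead of sending an event immediately.
        self._command = " ".join(["kedro", *command_args])
        self._properties["command"] = self._command

    def after_catalog_created(self, project_properties: Dict[str, Any]) -> None:
        # Enrich the pending event with project-level properties
        # (for example, the number of nodes and pipelines).
        self._properties.update(project_properties)

    def flush(self) -> None:
        # Send exactly one event per command, however many hooks fired.
        if self._sent or self._command is None:
            return
        self._send("CLI command", dict(self._properties))
        self._sent = True


# Usage with an in-memory sender, so nothing leaves the machine.
sent: List[tuple] = []
telemetry = SingleEventTelemetry(lambda name, props: sent.append((name, props)))
telemetry.before_command_run(["run"])
telemetry.after_catalog_created({"number_of_nodes": 5, "number_of_pipelines": 2})
telemetry.flush()
telemetry.flush()  # second call is a no-op
print(len(sent))  # 1
```

If `after_catalog_created` never fires (as with `kedro new`), the same `flush()` call still sends a single event containing only the command-level properties, which covers both branches of the proposal with one code path.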