-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce an analytics sampling rate for VIP sites #48
Conversation
@@ -107,7 +107,7 @@ public function test_rest_api_returns_blocks_for_post() { | |||
'mediaId' => 6, | |||
'mediaLink' => 'https://gutenberg-block-data-api-test.go-vip.net/?attachment_id=6', | |||
'mediaType' => 'image', | |||
'align' => 'wide', | |||
'align' => 'none', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my local testing, it looks like the default value for this has changed so this keeps it in line with that. Wonder if it's a 6.3 change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Looks like this was changed 6 months ago in Gutenberg, which was later merged into WordPress 6.3 like you said.
src/analytics/analytics.php
Outdated
$minutes = $current_timestamp->format( 'i' ); | ||
|
||
// Get the seconds from the date. | ||
$seconds = $current_timestamp->format( 's' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
format()
produces strings, so we should probably intval()
these or expressions like 0 !== $seconds
(i.e. 0 !== "0"
) won't work as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, thinking about this a bit more we may not need checks if seconds
/minutes
is 0
. If I'm doing my math right, the code (with proper types) would be skipping 1/6th of the sampling times:
00:00:00 -> Skip, seconds is 0
00:00:10 -> Send analytics
00:00:20 -> Send analytics
00:00:30 -> Send analytics
00:00:40 -> Send analytics
00:00:50 -> Send analytics
00:01:00 -> Skip, seconds is 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yip, updated that! Good catch 👍🏾
src/analytics/analytics.php
Outdated
* | ||
* Current sampling algorithm is that every 10s or 10m, we send analytics. | ||
* | ||
* Max calls possible based on 1 call every 10s: 8640. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ingeniumed Please correct me if I'm wrong, but does this math work? We could have several requests come through in the same second which would result in multiple analytics calls in the same second. I think that's fine for sampling purposes, but it doesn't necessarily give us an upper limit on analytics requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I added that in mainly for reference. But I don't want to make us think that we can use this to determine actual usage easily. I've taken it out
src/analytics/analytics.php
Outdated
$seconds = $current_timestamp->format( 's' ); | ||
|
||
// Only send analytics every 10 minutes or 10 seconds. | ||
if ( ( 0 !== $seconds && 0 === $seconds % WPCOMVIP__BLOCK_DATA_API__STAT_SAMPLING_RATE_SEC ) || ( 0 !== $minutes && 0 === $minutes % WPCOMVIP__BLOCK_DATA_API__STAT_SAMPLING_RATE_MIN ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ingeniumed Can you explain a bit why we're doing one sample every 10 seconds and another every 10 minutes? It seems like the sampling math is easier to estimate real usage if we stick with just one of these. Even just doing seconds would reduce analytics by 90% but still give us relatively good coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed, and I have dropped the 10m leaving just 10s in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sampling is a great way to reduce our analytics overhead with no extra db/caching cost, I love it. I have a few questions above about the implementation.
@@ -942,6 +942,8 @@ The plugin records two data points for analytics, on VIP sites: | |||
|
|||
Both of these data points are a counter that is incremented, and do not contain any other telemetry or sensitive data. You can see what's being [collected in code here][repo-analytics], and WPVIP's privacy policy [here](https://wpvip.com/privacy/). | |||
|
|||
In addition, the analytics are sent every 10 seconds only. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could say "analytics are sampled" that way we don't have to update the README if we change the sampling rate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I do see what you mean, I would lean towards being open about the sampling rate as well as the method with which we do that in the README. That way, we are being transparent about it and when we do change it we also have to update the README to reflect that.
$seconds = intval( $current_timestamp->format( 's' ) ); | ||
|
||
// Only send analytics every 10 seconds. | ||
if ( 0 === ( $seconds % WPCOMVIP__BLOCK_DATA_API__STAT_SAMPLING_RATE_SEC ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Someday we may want a hook so that customers can sent their own sampling rate no lower than X, but for a different day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do see the value of that, especially if they are at a scale where even the tiniest change could have a large impact. We would need to figure out a good floor that we can use, that would still make the data useful to us. I'll add this to the backlog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does what it says on the tin!
Description
This PR will introduce a sampling algorithm for the analytics that sent, when this plugin is used on a VIP site. The sampling algorithm is as follows:
This will ensure that we catch burst usage, or regular usage throughout the day without losing the core purpose behind the analytics.
Fixes #45
Steps to Test
vip dev-env
is_wpvip_site()
to give back trueis_it_sampling_time
.http://<slug>.vipdev.lndo.site/wp-json/vip-block-data-api/v1/posts/1/blocks
, while keeping an eye on the time