-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
InteractiveComplex3 #14
Comments
So this is a weird one. Here's the original query:
The context is, given a person id, two country names, a start date and a duration, expand out from that single user to FRIENDS..2 that are not the same as the originating person (they can't be because we don't have reciprocal relationships, that check can be dropped), where the friends are not located in a place that is a part of either of the two countries. Expand out from those friends to messages that they've created that are located in either of the two countries and that meet the time range criteria for the message createdDate. Per friend, aggregate the counts of messages for each of the two countries (we must have non-zero counts of messages per country) and sum those counts. With the top 20 ordered by the total count and then the friend's id, project out the friend's id, first name, last name, each of the counts, and the combined count. The traditional approach here is expensive. With the parameters given, the person has about 8k distinct friends, who have collectively created near 6 million messages, and the expand from friends to messages, and from messages to country they are located in takes about 18 million db hits, then we have to filter the remaining by date. On my laptop this takes about 1 minute or so. I've got an improved query that takes around 5 seconds or so, but it does so in a completely backwards way that wouldn't make sense for any sane version of this dataset. We start with gathering all the FRIENDS..2 into a list, then expand down from the two countries to their messages and their creators, keeping only those whose creators are in that friends list. Then we filter the messages by their created time, then count() messages per friend and country, do some CASE magic to get the counts into separate columns, sum the counts, filter out any with counts of 0, then get total count and do our ordering and limit. While this works well with the given countries of Scotland and Laos, I don't think this will hold up for the larger countries. Also if we start with users with a less dense friend network (and/or friends who create far fewer messages) then the original approach would probably beat this out. I don't have strong confidence of this as a general replacement, but I'll leave that decision to you, as in this case it's clearly a win.
|
No description provided.
The text was updated successfully, but these errors were encountered: