Add source suggestions for Brave News #25563
Comments
I was thinking something like this. For the comparison priority, I would treat indirectly subscribed sources (via Channels) as simple unsubscribed sources, and only consider the following signals:
As for showing:
I wouldn't consider the indirect subscription signal unless it's supported by a stronger interest signal (history), because some categories/channels contain sources that the user might entirely ignore, and we should not prioritise those (e.g. a user who is subscribed to Entertainment but is not interested in Music [Pitchfork, NME] at all).
The Source Suggestions spec doc has been updated to reflect these details. It says we want to reflect direct subscriptions and history, not channel membership. |
Okay, I'm working on implementing this at the moment, and it would be good to formalize this a bit more (say with some weights we give to everything). Apart from that, I have a few questions:
// 0 - 1, depending on whether the publisher is enabled.
const getEnabledWeighting = (publisher) => {
  // Maybe we want to do some extra weighting here for being in channels the user is subscribed to?
  return publisher.subscribed ? 1 : 0;
};

// Completely arbitrary, but 0.4 - 1, based on how often the user has visited this publisher in the past.
// |normalizedVisitWeights| are the visits to each publisher, divided by the visits to the most visited
// publisher in the last 200 days.
const getVisitRating = (publisher, normalizedVisitWeights) => {
  const kMinWeight = 0.4;
  const kMaxWeight = 1;
  return normalizedVisitWeights[publisher] * (kMaxWeight - kMinWeight) + kMinWeight;
};
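For illustration, here is a minimal sketch of how `normalizedVisitWeights` could be derived from raw visit counts. The input shape (a plain object keyed by publisher id) and the names `normalizeVisitCounts` and `projectVisitWeight` are assumptions made for this example, not code from brave-core.

```javascript
// Hypothetical input: an object mapping publisher id -> raw visit count
// over the lookback window (200 days in the comment above).
const normalizeVisitCounts = (visitCounts) => {
  const counts = Object.values(visitCounts);
  const maxVisits = counts.length > 0 ? Math.max(...counts) : 0;
  const normalized = {};
  for (const [publisherId, visits] of Object.entries(visitCounts)) {
    // Divide by the most visited publisher's count; guard against all-zero input.
    normalized[publisherId] = maxVisits > 0 ? visits / maxVisits : 0;
  }
  return normalized;
};

// Same linear projection as getVisitRating, taking the normalized weight directly.
const projectVisitWeight = (normalizedWeight, minWeight = 0.4, maxWeight = 1) =>
  normalizedWeight * (maxWeight - minWeight) + minWeight;
```

Note that under this projection a publisher with zero visits still rates 0.4, so a caller would probably want to apply it only to publishers that were actually visited.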
cc @LorenzoMinto and @aurangzaib048 |
Hey @fallaciousreasoning, are you talking about suggestions ranking or feed ranking? If you mean suggestions ranking, my thought is that we should prioritise "visits" over "similar to visits", because visits would be the strongest signal there. As for "similar to subscribed" vs "similar to visits", I would prioritise the first. Wdyt? PS: Just noticed the function in the spec is way outdated. I'm working on a new one that reflects the above priorities.
We could have three different regularised contributors:
(the score ranges are pretty much arbitrary; we can discuss). The final score for each source would then be the sum of the three scores above
and finally we would sample from the score distribution. Each contributor is normalised independently over the scores from all other sources and projected to that specific range. I would suggest we only look at the top-10 similar sources to compute the
the score would then be normalised over the entire
Let me know what you think; maybe there's a simpler solution to achieve those priorities. This solution (with decent tuning) would allow a gradual blending of them.
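To make the "normalise, project, sum, sample" idea concrete, here is a rough sketch. The thread does not specify the normalisation or the sampling method, so min-max scaling and score-proportional sampling are assumptions; the function names and the example ranges are hypothetical.

```javascript
// Min-max normalise one contributor's raw scores, then project into [lo, hi].
// Min-max scaling is an assumption; the thread only says "normalised".
const projectToRange = (rawScores, lo, hi) => {
  const values = Object.values(rawScores);
  const min = Math.min(...values);
  const max = Math.max(...values);
  const projected = {};
  for (const [id, value] of Object.entries(rawScores)) {
    const unit = max > min ? (value - min) / (max - min) : 0;
    projected[id] = lo + unit * (hi - lo);
  }
  return projected;
};

// contributors: [{ scores: { sourceId: rawScore }, range: [lo, hi] }, ...]
// The final score of a source is the sum of its projected contributor scores.
const combineContributors = (contributors) => {
  const totals = {};
  for (const { scores, range: [lo, hi] } of contributors) {
    for (const [id, score] of Object.entries(projectToRange(scores, lo, hi))) {
      totals[id] = (totals[id] ?? 0) + score;
    }
  }
  return totals;
};

// Draw one source id with probability proportional to its total score
// (one possible reading of "sample from the score distribution").
const sampleByScore = (totals, random = Math.random) => {
  const entries = Object.entries(totals);
  const sum = entries.reduce((acc, [, score]) => acc + score, 0);
  if (entries.length === 0 || sum <= 0) return undefined;
  let threshold = random() * sum;
  for (const [id, score] of entries) {
    threshold -= score;
    if (threshold < 0) return id;
  }
  return entries[entries.length - 1][0];
};
```

Passing `random` as a parameter keeps the sampler deterministic in tests; tuning the per-contributor ranges is what would control how strongly each signal dominates the blend.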
For the questions around which locales to pull suggestions from: if we consider that the same sources can appear in multiple locales, then I think we have to consider a user's "locale list" to decide which similarity files to download and pull from. Here is a scenario I'm thinking about: if the user is subscribed to "XYZ News" and it is a source that is "in" both EN_US and EN_CA, we do not want to suggest other EN_CA news sources to users who have the EN_US locale. So perhaps the list of similarity locales to consider for a source are:
If there's still no match, which is possible especially if the user has no channel subscriptions, then perhaps we scan the list of subscriptions for the most common locale, or maybe just combine all the relevant locale similarity matrices. |
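As a sketch of the fallback described here, the following picks the OS locale when a similarity file exists for it and otherwise falls back to the most common locale among the user's subscriptions. The `locales` field on each publisher, the `availableLocales` list, and both function names are assumptions for illustration only.

```javascript
// Hypothetical shape: each subscribed publisher lists the locales it appears in.
const mostCommonLocale = (subscribedPublishers) => {
  const counts = new Map();
  for (const publisher of subscribedPublishers) {
    for (const locale of publisher.locales) {
      counts.set(locale, (counts.get(locale) ?? 0) + 1);
    }
  }
  let best;
  let bestCount = 0;
  // Ties are resolved in favour of the locale seen first.
  for (const [locale, count] of counts) {
    if (count > bestCount) {
      best = locale;
      bestCount = count;
    }
  }
  return best;
};

// Prefer the OS locale if a similarity file exists for it; otherwise fall back.
const pickSimilarityLocale = (osLocale, availableLocales, subscribedPublishers) => {
  if (availableLocales.includes(osLocale)) return osLocale;
  return mostCommonLocale(subscribedPublishers);
};
```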
If this creates a lot of work then it might be best to identify a single rule that likely covers the most common cases. I think we would cover a lot of ground with "The locale the user's OS is set to". |
Absolutely fine to at least start with that then build incrementally if needed, since that's contained within the suggestion above. |
Okay, I have a first pass implementation based on our discussion (brave/brave-core#15447). While writing it I came up with a few more questions:
@LorenzoMinto, I agree, |
@LorenzoMinto what do you think? |
Yes. Fully agree on using the visit score to weight visited domains contributions 👌 |
Verification
Brave | 1.46.79 Chromium: 107.0.5304.62 (Official Build) dev (x86_64) |
---|---|
Revision | 1eec40d3a5764881c92085aaee66d25075c159aa-refs/branch-heads/5304@{#942} |
OS | macOS Version 11.7.1 (Build 20G918) |
Steps:
1. installed `1.46.79`
2. launched Brave
3. opened `brave://flags/`
4. set `brave://flags/#brave-news-v2` to `Enabled`
5. clicked `Relaunch`
6. opened a new-tab page
7. scrolled down
8. clicked on `Show Brave News`
9. clicked on `Customize`
10. searched for `the drive`
11. clicked on the `Follow` button
12. clicked on the `x` to close the `Customize` dialog
13. reloaded the Brave News tab
14. clicked `Customize` again
15. examined the `Suggestions` list
Confirmed I got `Car & Driver`, `PopularMechanics` (sic), and `Ars Technica` suggestions
step 5 | step 8 | step 9 | step 10 | step 11 | step 15 |
---|---|---|---|---|---|
URL format
`https://[hostname]/source-suggestions/source_similarity_t10.[region].json`
Where region is, e.g. `en_US`
The format is:
There is also a human readable file at `https://[hostname]/source-suggestions/source_similarity_t10_hr.[region].json`, the only purpose of which is to more easily check expected results, where the format is:
Each file provides a lookup for a given PublisherID to a list of similar PublisherIDs with a score ranking for how similar they are to each other (higher score means more similar).
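A small sketch of building these URLs and reading a lookup. The URL pattern comes from the text above; the JSON shape used by `similarTo` (publisher id mapped to an array of `{ source, score }` entries) is an assumption, since the exact file format is not reproduced here, and both helper names are hypothetical.

```javascript
// Build the similarity file URL for a region, optionally the human readable variant.
const similarityUrl = (hostname, region, humanReadable = false) =>
  `https://${hostname}/source-suggestions/source_similarity_t10${humanReadable ? "_hr" : ""}.${region}.json`;

// Assumed shape: { [publisherId]: [{ source: publisherId, score: number }, ...] }
// Returns the similar publishers for one id, most similar first.
const similarTo = (similarityMap, publisherId) =>
  [...(similarityMap[publisherId] ?? [])].sort((a, b) => b.score - a.score);
```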
Sources we should compare from, in priority order:
We will take that source list and use the similarity matrix map to produce a list of "suggested sources".
List we should show, in priority order:
(We should not show sources that the user is already directly subscribed to)
Note: when talking about "direct" subscriptions above, we refer to any mode of subscription: combined sources or RSS feed.
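The overall flow (take seed sources, look them up in the similarity map, drop anything the user already follows directly) could be sketched roughly as follows. The `{ source, score }` entry shape, the per-seed `weight` (for example a visit-based rating), the default `limit`, and the function name are all assumptions for illustration.

```javascript
// seeds: [{ id, weight }], e.g. directly subscribed or visited publishers.
// similarityMap (assumed shape): { [publisherId]: [{ source, score }, ...] }
const suggestSources = (seeds, similarityMap, directlySubscribedIds, limit = 3) => {
  const excluded = new Set(directlySubscribedIds);
  const totals = new Map();
  for (const { id, weight } of seeds) {
    for (const { source, score } of similarityMap[id] ?? []) {
      // Never suggest a source the user is already directly subscribed to.
      if (excluded.has(source)) continue;
      totals.set(source, (totals.get(source) ?? 0) + weight * score);
    }
  }
  // Highest accumulated score first, truncated to the requested length.
  return [...totals.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([source]) => source);
};
```

Summing weighted similarity scores means a source that is similar to several seeds ranks above one that is similar to only a single seed at the same strength.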
Which similarity region files to download? Any regions in which the user has channel or feed subscriptions, i.e. the same regions we download feed.json files for.
When should we download the similarity files? An appropriate time seems to be when downloading feed subscriptions, since that occurs when the user modifies their feed subscriptions, and is also when we calculate which regions to download from. There may be a couple of benefits to doing it when downloading sources instead, since that is when we search through history. However, we can search history for publisher matches again at this new "source similarity comparison" time.