Add source suggestions for Brave News #25563

petemill · 2022-09-22T17:38:37Z

Url format https://[hostname]/source-suggestions/source_similarity_t10.[region].json
Where region is, e.g. en_US

The format is:

{
  [key: PublisherID]: {
    source: PublisherID
    score: number
  }[]
}

There is also a human readable file at https://[hostname]/source-suggestions/source_similarity_t10_hr.[region].json, the only purpose of which is to more easily check expected results, where the format is:

{
  [key: PublisherName]: {
    source: PublisherName
    score: number
  }[]
}

Each file provides a lookup for a given PublisherID to a list of similar PublisherIDs with a score ranking for how similar they are to each other (higher score means more similar).
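
As a rough illustration of the lookup described above, the sketch below builds the file URL for a region and reads the similar publishers for one ID. The hostname, the publisher IDs, and the helper names (`similarityUrl`, `getSimilar`) are made up for the example; only the URL pattern and the `{ source, score }` shape come from the description.

```javascript
// Hypothetical helper: builds the URL following the pattern above.
const similarityUrl = (hostname, region) =>
  `https://${hostname}/source-suggestions/source_similarity_t10.${region}.json`;

// Returns the publishers similar to |publisherId|, highest score first.
// Unknown IDs yield an empty list.
const getSimilar = (similarityMap, publisherId) =>
  [...(similarityMap[publisherId] ?? [])].sort((a, b) => b.score - a.score);

// Made-up sample data in the documented shape.
const sampleMap = {
  'pub-a': [
    { source: 'pub-b', score: 0.2 },
    { source: 'pub-c', score: 0.7 },
  ],
};

console.log(similarityUrl('example.invalid', 'en_US'));
console.log(getSimilar(sampleMap, 'pub-a').map((s) => s.source)); // [ 'pub-c', 'pub-b' ]
```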

Sources we should compare from, in priority order:

Sources the user has directly subscribed to
Sources the user has indirectly subscribed to (i.e. as part of a channel) and the user has visited the site recently
Sources the user has indirectly subscribed to (i.e. as part of a channel) and we have no interest signal

We will take that source list and use the similarity matrix map to produce a list of "suggested sources".

List we should show, in priority order:

Sources that the user is not directly or indirectly subscribed to
Sources that the user is indirectly subscribed to (i.e. as part of a channel)
(We should not show sources that the user is already directly subscribed to)

Note: when talking about "direct" subscriptions above, we refer to any mode of subscription: combined sources or rss feed.
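
The two priority lists above could be combined roughly as follows. This is only a sketch under assumptions: the function name, the `Set` inputs, and the choice to keep the highest score when a candidate is reached from several seeds are all illustrative, not something the proposal specifies.

```javascript
// |seeds|: publisher IDs to compare from, already in priority order.
// |direct| / |indirect|: Sets of publisher IDs the user follows directly
// or only via a channel. All names here are illustrative.
const suggestSources = (seeds, similarityMap, direct, indirect) => {
  const best = new Map(); // candidate ID -> highest score seen (assumption)
  for (const seed of seeds) {
    for (const { source, score } of similarityMap[seed] ?? []) {
      if (direct.has(source)) continue; // never suggest direct subscriptions
      best.set(source, Math.max(best.get(source) ?? 0, score));
    }
  }
  // Unsubscribed sources first, then channel-only ones; ties broken by score.
  return [...best.entries()]
    .sort(([idA, a], [idB, b]) =>
      (indirect.has(idA) - indirect.has(idB)) || (b - a))
    .map(([id]) => id);
};

const exampleMap = {
  a: [
    { source: 'b', score: 0.9 },
    { source: 'c', score: 0.5 },
    { source: 'd', score: 0.8 },
  ],
};
console.log(suggestSources(['a'], exampleMap, new Set(['a', 'b']), new Set(['d'])));
// [ 'c', 'd' ]
```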

Which similarity region files to download? Any regions in which the user has channel or feed subscriptions, i.e. the same regions we download feed.json files for.

When should we download the similarity files? An appropriate time seems to be when downloading feed subscriptions, since that occurs when the user modifies their feed subscriptions and is also when we calculate which regions to download from. There may be a couple of benefits to doing it when downloading sources instead, since that is when we search through history, but we can search history for publisher matches again at this new "source similarity comparison" time.


LorenzoMinto · 2022-09-23T10:18:20Z

I was thinking something like this. For the comparing priority, I would treat indirectly subscribed sources (via Channels) as simple unsubscribed sources, and only consider the following signals:

Sources the user has directly subscribed to
Sources the user has visited recently at least a threshold number of times (independently of subscription status)

As for showing:

Sources that the user is not directly subscribed to and that have strong interest signal (history) (not coming via suggestions)
Sources that the user is not directly subscribed to (coming from suggestions)

I wouldn't consider the indirect subscription signal unless it's supported by a stronger interest signal (history), because for some categories/channels there might be sources that the user entirely ignores, and we should not prioritise those (e.g. a user subscribed to Entertainment who is not interested in Music [Pitchfork, NME] at all).

mattmcalister · 2022-09-26T14:37:31Z

The Source Suggestions spec doc has been updated to reflect these details. It says we want to reflect direct subscriptions and history, not channel membership.

fallaciousreasoning · 2022-10-16T23:58:29Z

Okay, I'm working on implementing this at the moment, and it would be good to formalize this a bit more (say with some weights we give to everything). Apart from that, I have a few questions:

To confirm, we SHOULD suggest sources similar to ones the user has visited, even if they aren't subscribed to that source?
Should visits to a source that isn't subscribed mean we should suggest subscribing to that source? How do we calculate the rank here?
How do we handle different locales here? For example, if a user is subscribed to a feed in en_US and es_MX we should show suggestions for both of these locales. Simplest for me, I think would be to download both similarity matrices and merge them into one big similarity matrix. However, I'm not sure what to do if a publisher is in multiple locales with different weights? Just take the highest/lowest score? Average them?

// 0 - 1, depending on whether the publisher is enabled.
const getEnabledWeighting = (publisher) => {
    // Maybe we want to do some extra weighting here for being in channels the user is subscribed to?
    return publisher.subscribed ? 1 : 0;
};

// Completely arbitrary, but 0.4 - 1, based on how often the user has visited this publisher in the past.
// |normalizedVisitWeights| are the visits to each publisher, divided by the visits to the most visited
// publisher in the last 200 days.
const getVisitRating = (publisher, normalizedVisitWeights) => {
    const kMinWeight = 0.4;
    const kMaxWeight = 1;
    return normalizedVisitWeights[publisher] * (kMaxWeight - kMinWeight) + kMinWeight;
};
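
For reference, a minimal sketch of how |normalizedVisitWeights| might be derived from raw visit counts and then fed into the rating. The `normalizeVisits` helper, the publisher IDs, and the counts are assumptions for illustration; `visitRating` restates the linear projection from `getVisitRating` so the example runs on its own, and additionally treats publishers with no recorded visits as weight 0 (also an assumption).

```javascript
// |visitCounts|: publisher ID -> visits in the lookback window (made-up input).
const normalizeVisits = (visitCounts) => {
  const max = Math.max(0, ...Object.values(visitCounts));
  const normalized = {};
  for (const [id, count] of Object.entries(visitCounts)) {
    // Divide by the most-visited publisher's count; guard against max === 0.
    normalized[id] = max > 0 ? count / max : 0;
  }
  return normalized;
};

// Same projection onto [0.4, 1] as getVisitRating, keyed by publisher ID.
const visitRating = (publisherId, normalizedVisitWeights) => {
  const kMinWeight = 0.4;
  const kMaxWeight = 1;
  const weight = normalizedVisitWeights[publisherId] ?? 0;
  return weight * (kMaxWeight - kMinWeight) + kMinWeight;
};

const weights = normalizeVisits({ 'pub-a': 20, 'pub-b': 5 });
console.log(visitRating('pub-a', weights)); // 1
console.log(visitRating('pub-b', weights)); // about 0.55
```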

mattmcalister · 2022-10-17T15:01:36Z

cc @LorenzoMinto and @aurangzaib048

LorenzoMinto · 2022-10-17T15:53:49Z

hey @fallaciousreasoning are you talking about suggestions ranking or feed ranking?

If referring to suggestions ranking, my thought is that we should prioritise visits over similar to visits. Because visits would be the strongest signal there. As for similar to subscribed vs similar to visits I would prioritise the first. Wdyt?

PS: Just noticed the function in the spec is way outdated. Working now on coming up with a new one that reflects the above priorities.

LorenzoMinto · 2022-10-17T17:15:44Z

We could have three different regularised contributors:

visited: [0.4, 1], similar_sub: [0, 0.4], similar_visited: [0, 0.3]

(the score ranges are pretty much arbitrary, we can discuss). The final score for each source would then be the sum of the three scores above

s(i) = visited[i] + similar_sub[i] + similar_visited[i]  # min: 0, max: 1.7

and finally we would sample from the score distribution s to create the actual suggestion list.

Each contributor is normalised independently over the scores from all other sources and projected to that specific range. I would suggest we only look at the top-10 similar sources to compute the similar_sub and similar_visited scores. For example, in pseudo code:

similar_sub[i] = sum([sim(i,j)*getEnabledWeighting(j) for j in top_similar(i, 10)])

The score would then be normalised over the entire similar_sub vector and projected onto [0, 0.4] in the case of this predictor, as @fallaciousreasoning did for getVisitRating.

Let me know what you think, maybe there's a simpler solution to achieve those priorities. This solution (with a decent tuning) would allow a gradual blending of them.
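
To make the proposal above concrete, here is one possible sketch of the three steps: project each contributor onto its range, sum them, and sample from the result. Min-max normalisation, treating a missing contributor as 0, and sampling without replacement in proportion to score are all assumptions; the function names and input numbers are invented for the example.

```javascript
// Min-max normalise |raw| (publisher ID -> number) and project onto [lo, hi].
const project = (raw, lo, hi) => {
  const values = Object.values(raw);
  const min = Math.min(...values);
  const max = Math.max(...values);
  const out = {};
  for (const [id, v] of Object.entries(raw)) {
    const unit = max > min ? (v - min) / (max - min) : 0;
    out[id] = lo + unit * (hi - lo);
  }
  return out;
};

// s(i) = visited[i] + similar_sub[i] + similar_visited[i]; missing entries count as 0.
const combine = (...contributors) => {
  const s = {};
  for (const contributor of contributors) {
    for (const [id, v] of Object.entries(contributor)) {
      s[id] = (s[id] ?? 0) + v;
    }
  }
  return s;
};

// Draw up to |count| distinct IDs with probability proportional to score.
// |random| is injectable so the draw can be made deterministic.
const sampleSuggestions = (scores, count, random = Math.random) => {
  const pool = Object.entries(scores).filter(([, v]) => v > 0);
  const picked = [];
  while (picked.length < count && pool.length > 0) {
    const total = pool.reduce((sum, [, v]) => sum + v, 0);
    let r = random() * total;
    let index = pool.findIndex(([, v]) => (r -= v) < 0);
    if (index < 0) index = pool.length - 1; // guard against rounding
    picked.push(pool.splice(index, 1)[0][0]);
  }
  return picked;
};

const s = combine(
  project({ a: 10, b: 2 }, 0.4, 1),       // visited
  project({ b: 0.9, c: 0.3 }, 0, 0.4),    // similar_sub
  project({ c: 0.5, d: 0.1 }, 0, 0.3));   // similar_visited
console.log(s);
console.log(sampleSuggestions(s, 2));
```

Sources whose total score is 0 are never drawn in this sketch, which may or may not be the desired behaviour.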

petemill · 2022-10-17T20:24:20Z

For the questions around which locales to pull suggestions from, if we consider the same sources can appear in multiple locales, then I think we have to consider a user's "locale list" to decide which similarity files to download and pull from.

Here is a scenario I'm thinking about: if the user is subscribed to "XYZ News" and it is a source that is "in" both EN_US and EN_CA, we do not want to suggest other EN_CA news sources to users who have the EN_US locale.
However, if the user has purposefully subscribed to "ZYX News" which is only in EN_CA then, even if the user has the EN_US locale, we should suggest other EN_CA news sources to the user. This becomes more important where the user has a locale which we don't have a direct list of sources for, e.g. EN_FR.

So perhaps the list of similarity locales to consider for a source is:

If the source has a single locale:
  The source's locale
If the source has multiple locales:
  The locale the user's OS is set to, if there's a match
  OR any of those locales which the user also has channel subscriptions to
  OR any of those locales which the user also has single-locale source subscriptions to

If there's still no match, which is possible especially if the user has no channel subscriptions, then perhaps we scan the list of subscriptions for the most common locale, or maybe just combine all the relevant locale similarity matrices.
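
One way to read the rules above, as a sketch only. It assumes the "OR" branches are unioned rather than tried in order, and that the fallback is "every locale the source is in"; the object shapes and field names (`locales`, `osLocale`, `channelLocales`, `singleLocaleSourceLocales`) are hypothetical.

```javascript
// |source.locales|: array of locales the source appears in.
// |user.osLocale|: string. |user.channelLocales| and
// |user.singleLocaleSourceLocales|: Sets of locales. All names are hypothetical.
const similarityLocalesFor = (source, user) => {
  if (source.locales.length === 1) return [...source.locales];
  const matches = source.locales.filter(
    (locale) =>
      locale === user.osLocale ||
      user.channelLocales.has(locale) ||
      user.singleLocaleSourceLocales.has(locale));
  // Fallback assumed here: combine every locale the source is in.
  return matches.length > 0 ? matches : [...source.locales];
};

const user = {
  osLocale: 'en_US',
  channelLocales: new Set(),
  singleLocaleSourceLocales: new Set(),
};
// "XYZ News" is in both locales, so only the OS locale is used.
console.log(similarityLocalesFor({ locales: ['en_US', 'en_CA'] }, user)); // [ 'en_US' ]
// "ZYX News" is only in en_CA, so en_CA is used despite the OS locale.
console.log(similarityLocalesFor({ locales: ['en_CA'] }, user)); // [ 'en_CA' ]
```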

mattmcalister · 2022-10-17T20:43:42Z

If this creates a lot of work then it might be best to identify a single rule that likely covers the most common cases. I think we would cover a lot of ground with "The locale the user's OS is set to".

petemill · 2022-10-17T20:54:57Z

> If this creates a lot of work then it might be best to identify a single rule that likely covers the most common cases. I think we would cover a lot of ground with "The locale the user's OS is set to".

Absolutely fine to at least start with that then build incrementally if needed, since that's contained within the suggestion above.

fallaciousreasoning · 2022-10-18T01:51:59Z

Okay, I have a first pass implementation based on our discussion (brave/brave-core#15447). While writing it I came up with a few more questions:

Should sources the user has disabled ever be suggested? (in the PR, they are).
For a source which is similar to one the user has visited before, should that similarity be multiplied by the visit score (i.e. if I visit theatlantic.com lots, should sources similar to it be more recommended than sources similar to fox.com, which I only visited once)?

@LorenzoMinto, I agree, visits should probably be our strongest signal, then similar to subscribed then similar to visits.

mattmcalister · 2022-10-18T07:36:28Z

We definitely shouldn't suggest a source that a user has chosen to "Hide". If they "Unfollow" a source, they would probably also expect it not to be suggested, but that's a weaker signal.
Good point. Since visits are the signal we value most, maybe they should be used to weight the suggestions.

@LorenzoMinto what do you think?

LorenzoMinto · 2022-10-18T10:41:24Z

Yes. Fully agree on using the visit score to weight visited domains contributions 👌
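
A small sketch of what was agreed here: each visited source contributes to the sources similar to it in proportion to its visit score, and hidden sources are skipped. The function name, the publisher IDs, and the numbers are illustrative assumptions, not taken from the PR.

```javascript
// similar_visited[i] = sum over visited sources j of sim(j, i) * visitScore[j].
// |visitScores|: visited publisher ID -> visit score.
// |hidden|: Set of publisher IDs the user chose to hide.
const similarVisitedScores = (similarityMap, visitScores, hidden) => {
  const scores = {};
  for (const [visitedId, visitScore] of Object.entries(visitScores)) {
    for (const { source, score } of similarityMap[visitedId] ?? []) {
      if (hidden.has(source)) continue; // hidden sources are never suggested
      scores[source] = (scores[source] ?? 0) + score * visitScore;
    }
  }
  return scores;
};

// Made-up data: one heavily visited source and one rarely visited source.
const similarity = {
  'often-visited': [{ source: 'x', score: 0.5 }],
  'rarely-visited': [
    { source: 'y', score: 0.5 },
    { source: 'z', score: 0.9 },
  ],
};
const result = similarVisitedScores(
  similarity,
  { 'often-visited': 1, 'rarely-visited': 0.4 },
  new Set(['z']));
console.log(result); // x outranks y; z is absent because it is hidden
```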

stephendonner · 2022-10-25T22:05:58Z

Verification `PASSED` using

Brave	1.46.79 Chromium: 107.0.5304.62 (Official Build) dev (x86_64)
Revision	1eec40d3a5764881c92085aaee66d25075c159aa-refs/branch-heads/5304@{#942}
OS	macOS Version 11.7.1 (Build 20G918)

Steps:

installed 1.46.79
launched Brave
opened brave://flags/
set brave://flags/#brave-news-v2 to Enabled
clicked Relaunch
opened a new-tab page
scrolled down
clicked on Show Brave News
clicked on Customize
searched for the drive
clicked on the Follow button
clicked on the x to close the Customize dialog
reloaded the Brave News tab
clicked Customize again
examined the Suggestions list

Confirmed I got `Car & Driver`, `PopularMechanics` (sic), and `Ars Technica` suggestions


srirambv · 2022-11-03T06:09:44Z

Removing OS/Android label as the front-end work is not yet done for Android. Logged follow up issue #26497. More info here

petemill added the QA/Yes, release-notes/include, OS/Android, and OS/Desktop labels Sep 22, 2022

petemill assigned petemill and fallaciousreasoning Sep 22, 2022

fallaciousreasoning mentioned this issue Oct 13, 2022

Brave News Source Suggestions brave/brave-core#15447

Merged

25 tasks

petemill mentioned this issue Oct 13, 2022

[Desktop] Set Brave News 2.0 flag (new customize UX) to enabled #25890

Closed

13 tasks

petemill closed this as completed in brave/brave-core#15447 Oct 18, 2022

brave-builds added this to the 1.46.x - Nightly milestone Oct 18, 2022

fallaciousreasoning mentioned this issue Oct 18, 2022

[Brave News] Suggestions Followup for #15447 brave/brave-core#15522

Merged

25 tasks

stephendonner added the feature/brave-news and QA Pass-macOS labels Oct 25, 2022

stephendonner mentioned this issue Oct 25, 2022

Customize Dashboard dialog disappears when clicking Back to Dashboard #26252

Closed

srirambv mentioned this issue Nov 3, 2022

Brave News Source suggestions #26497

Open

srirambv removed the OS/Android label Nov 3, 2022

LaurenWags mentioned this issue Nov 30, 2022

Release notes for 1.46.x - Release #27065

Closed

rebron changed the title ~~Brave News Source suggestions~~ Add source suggestions for Brave News Nov 30, 2022
