Create averageClusteringScoringStrategy #232

paulalbert1 · 2018-07-12T14:19:25Z

Background

Clustering is generally reliable, so we can use the shared associations within a cluster to make the scores of articles in the same cluster more similar to one another.

When clustering fails, however, it can seriously undermine accuracy (examples: amc2056, cos2006, rjk9003).

In such cases, we can compare the firstName of the targetAuthors in a given cluster to infer how reliable that cluster is. If the names are consistent, we use the full cluster score; if they are not, we discount it.

Properties

Store these in application.properties.

clusterReliabilityScoreFactor: 3
clusterScore-Factor: 0.4
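If application.properties follows the standard Java `.properties` format, either `:` or `=` may separate a key from its value, so the two entries above are valid as written. The following is a minimal Python sketch of reading them. It assumes a hypothetical `load_properties` helper that handles only simple one-line `key: value` / `key=value` entries, with no escapes or line continuations.

```python
def load_properties(text: str) -> dict:
    """Hypothetical helper: parse simple one-line "key: value" or "key=value" entries.

    Not a full .properties parser (no escapes, continuations, or
    whitespace-only separators).
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        # Split on whichever separator (":" or "=") appears first.
        positions = [i for i in (line.find(":"), line.find("=")) if i != -1]
        if not positions:
            continue  # no separator: ignored in this sketch
        sep = min(positions)
        props[line[:sep].strip()] = line[sep + 1:].strip()
    return props


props = load_properties("""
clusterReliabilityScoreFactor: 3
clusterScore-Factor: 0.4
""")
cluster_reliability_score_factor = float(props["clusterReliabilityScoreFactor"])
cluster_score_factor = float(props["clusterScore-Factor"])
print(cluster_reliability_score_factor, cluster_score_factor)  # 3.0 0.4
```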

Pseudocode

If an article is in its own cluster with no other members, do not apply this strategy to it. Otherwise:

1. Is use.gold.standard.evidence=true in application.properties?
   - If yes, go to 2.
   - If no, go to 3.
2. Add up all evidence scores for that article, including acceptedArticleScore or rejectedArticleScore if either exists. Call this totalArticleScore-WithoutClustering. Go to 4.
3. Add up all evidence scores for that article, excluding acceptedArticleScore and rejectedArticleScore. Call this totalArticleScore-WithoutClustering. Go to 4.
4. Take the average of totalArticleScore-WithoutClustering over all articles in the cluster. Call this clusterScore-Average.
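The following is a rough Python sketch of the scoring and averaging steps. It assumes each article's evidence is a dict mapping evidence names to scores. Only acceptedArticleScore and rejectedArticleScore come from this issue. The other keys (`emailScore`, `journalScore`) and the function names are hypothetical.

```python
# Only these two names come from the issue; all other evidence keys are hypothetical.
GOLD_STANDARD_KEYS = {"acceptedArticleScore", "rejectedArticleScore"}


def total_article_score_without_clustering(evidence, use_gold_standard_evidence):
    """Sum evidence scores, dropping gold-standard ones when the flag is off."""
    return sum(score for name, score in evidence.items()
               if use_gold_standard_evidence or name not in GOLD_STANDARD_KEYS)


def cluster_score_average(totals):
    """Mean of totalArticleScore-WithoutClustering across the cluster."""
    return sum(totals) / len(totals)


cluster = [
    {"emailScore": 4.0, "journalScore": 2.0, "acceptedArticleScore": 3.0},
    {"emailScore": 1.0, "journalScore": 1.0},
]
with_gold = [total_article_score_without_clustering(a, True) for a in cluster]
without_gold = [total_article_score_without_clustering(a, False) for a in cluster]
print(with_gold, cluster_score_average(with_gold))        # [9.0, 2.0] 5.5
print(without_gold, cluster_score_average(without_gold))  # [6.0, 2.0] 4.0
```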
For every article in a given cluster, retrieve all instances of articleAuthorName.firstName. For example:

firstName=[RaeKwon]
firstName=[RaeKwon]
firstName=[RaeKwon]
firstName=[RaeKwon]
firstName=[RaeKwon]
firstName=[RK]
firstName=[RockBum]
firstName=[RockBum]
firstName=[RockBum]
firstName=[RockBum]
firstName=[RockBum]
firstName=[RulBin]
firstName=[RyeoJin]
firstName=[RyeoJin]
firstName=[RyoonHo]

Remove all capital letters, so that names consisting only of initials (such as RK) become empty. For example:

firstName=[aewon]
firstName=[aewon]
firstName=[aewon]
firstName=[aewon]
firstName=[aewon]
firstName=[]
firstName=[ockum]
firstName=[ockum]
firstName=[ockum]
firstName=[ockum]
firstName=[ockum]
firstName=[ulin]
firstName=[yeoin]
firstName=[yeoin]
firstName=[yoono]
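Removing capitals is a single regular-expression substitution. In this minimal sketch, `remove_capitals` is a hypothetical name. Note that the pattern `[A-Z]` matches only ASCII capitals. If names can contain non-ASCII uppercase letters, filtering characters with `str.isupper()` would be the Unicode-aware alternative.

```python
import re


def remove_capitals(first_name):
    """Delete every uppercase ASCII letter A-Z, e.g. "RaeKwon" -> "aewon"."""
    return re.sub(r"[A-Z]", "", first_name)


names = ["RaeKwon", "RK", "RockBum", "RulBin", "RyeoJin", "RyoonHo"]
print([remove_capitals(n) for n in names])
# ['aewon', '', 'ockum', 'ulin', 'yeoin', 'yoono']
```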

Remove any names that no longer contain letters. For example:

firstName=[aewon]
firstName=[aewon]
firstName=[aewon]
firstName=[aewon]
firstName=[aewon]
firstName=[ockum]
firstName=[ockum]
firstName=[ockum]
firstName=[ockum]
firstName=[ockum]
firstName=[ulin]
firstName=[yeoin]
firstName=[yeoin]
firstName=[yoono]
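This filtering step can be sketched in Python with a hypothetical helper. It tests for "at least one letter" rather than "non-empty", so leftovers such as "." from a hypothetical input like "R.K." are also dropped, which matches the wording "no longer any letters".

```python
def drop_names_without_letters(names):
    """Keep only names that still contain at least one letter."""
    return [n for n in names if any(c.isalpha() for c in n)]


stripped = ["aewon"] * 5 + [""] + ["ockum"] * 5 + ["ulin"] + ["yeoin"] * 2 + ["yoono"]
remaining = drop_names_without_letters(stripped)
print(len(stripped), len(remaining))  # 15 14
```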

Count the total number of remaining names (totalNameCount) and the number of occurrences of the most frequent name (maxIdenticalNameCount). In this example, totalNameCount is 14 and maxIdenticalNameCount is 5 ("aewon" and "ockum" each occur 5 times).
Compute clusterReliabilityScore using this formula:

clusterReliabilityScore = (maxIdenticalNameCount / totalNameCount) ^ clusterReliabilityScoreFactor

For example: (5/14)^3 = 125/2744 ≈ 0.0456.
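The following minimal sketch computes the counting and the reliability formula with `collections.Counter`. The function name is hypothetical. With this formula, a cluster whose remaining names all agree scores 1.0. The issue does not say what should happen when no names remain. In that case this sketch simply raises `ValueError` from `max()`.

```python
from collections import Counter


def cluster_reliability_score(names, cluster_reliability_score_factor):
    """(maxIdenticalNameCount / totalNameCount) ** clusterReliabilityScoreFactor."""
    counts = Counter(names)
    total_name_count = sum(counts.values())
    max_identical_name_count = max(counts.values())  # raises ValueError if names is empty
    return (max_identical_name_count / total_name_count) ** cluster_reliability_score_factor


names = ["aewon"] * 5 + ["ockum"] * 5 + ["ulin"] + ["yeoin"] * 2 + ["yoono"]
print(round(cluster_reliability_score(names, 3), 4))  # 0.0456
```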

Next, determine article by article how much clusterScore-Average should affect each article's score.

Retrieve clusterScore-Factor from application.properties.

clusterScore-Factor: 0.4

For each article in the cluster, calculate clusterScore-Discrepancy:

clusterScore-Discrepancy = (totalArticleScore-WithoutClustering - clusterScore-Average) * clusterScore-Factor * clusterReliabilityScore
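Here is a small Python sketch of the discrepancy. The function name and inputs are hypothetical. The sketch scales by clusterReliabilityScore, which is the value computed from the first names, so that clusters with inconsistent names are discounted as the Background describes. Scaling by the exponent clusterReliabilityScoreFactor (3) instead would multiply the gap by 1.2 and push scores past the cluster average. The result is signed: it is positive for articles above the average and negative for those below.

```python
def cluster_score_discrepancy(total_without_clustering, cluster_average,
                              cluster_score_factor, cluster_reliability_score):
    """Signed gap from the cluster average, scaled by the factor and the reliability score."""
    return ((total_without_clustering - cluster_average)
            * cluster_score_factor * cluster_reliability_score)


# Hypothetical inputs: cluster average 5.0, clusterScore-Factor 0.4, reliability 1.0.
print(cluster_score_discrepancy(10.0, 5.0, 0.4, 1.0))  # 2.0
print(cluster_score_discrepancy(3.0, 5.0, 0.4, 1.0))   # -0.8
```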

Calculate totalArticleScore-nonStandardized:

totalArticleScore-nonStandardized = totalArticleScore-WithoutClustering - clusterScore-Discrepancy
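Assume the discrepancy is scaled by clusterReliabilityScore. Substituting it into this subtraction then shows that each article's score moves toward the cluster average by the fraction clusterScore-Factor * clusterReliabilityScore of the gap. Equivalently, the result is a weighted average of totalArticleScore-WithoutClustering and clusterScore-Average. The sketch below checks that identity on hypothetical inputs.

```python
import math


def total_article_score_non_standardized(total_without_clustering, discrepancy):
    """totalArticleScore-WithoutClustering minus clusterScore-Discrepancy."""
    return total_without_clustering - discrepancy


# Hypothetical inputs.
w, avg, factor, reliability = 10.0, 5.0, 0.4, 1.0
discrepancy = (w - avg) * factor * reliability
result = total_article_score_non_standardized(w, discrepancy)

# Same value written as a weighted average that gives the cluster average
# a weight of clusterScore-Factor * clusterReliabilityScore.
weight = factor * reliability
blended = (1 - weight) * w + weight * avg
print(result, math.isclose(result, blended))  # 8.0 True
```

With a reliability of 1.0, an article scoring 10.0 in a cluster averaging 5.0 keeps 60% of its own score and takes 40% from the average. With a low reliability such as 0.0456, the article's score barely moves.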

Output the following at the article level:

totalArticleScore-nonStandardized = 7.2 /* example */
clusterScore-Average = 7.5 /* example */
clusterScore-Discrepancy = 0.3 /* example */
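The following end-to-end sketch produces the three output fields for every article in one cluster. It takes precomputed totalArticleScore-WithoutClustering values and a clusterReliabilityScore. It also skips single-article clusters, as the pseudocode requires. `score_cluster` and its arguments are hypothetical. Because the discrepancies within a cluster sum to zero, the nonStandardized scores keep the cluster's mean unchanged.

```python
def score_cluster(totals, cluster_reliability_score, cluster_score_factor=0.4):
    """Hypothetical helper: per-article output records for one cluster.

    totals holds totalArticleScore-WithoutClustering for each article.
    Returns None for a single-article cluster, where the strategy is skipped.
    """
    if len(totals) < 2:
        return None
    average = sum(totals) / len(totals)
    records = []
    for total in totals:
        discrepancy = (total - average) * cluster_score_factor * cluster_reliability_score
        records.append({
            "totalArticleScore-nonStandardized": total - discrepancy,
            "clusterScore-Average": average,
            "clusterScore-Discrepancy": discrepancy,
        })
    return records


for record in score_cluster([9.0, 2.0, 7.0], cluster_reliability_score=1.0):
    print(record)
print(score_cluster([5.0], cluster_reliability_score=1.0))  # None
```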


paulalbert1 assigned sarbajitdutta Jul 17, 2018

sarbajitdutta added the enhancement label Jul 18, 2018

paulalbert1 mentioned this issue Jul 18, 2018 in Create totalArticleScore-standardized and totalArticleScore-nonStandardized #233 (closed)

paulalbert1 closed this as completed Nov 5, 2018
