Background
Clustering is generally reliable. We can use shared associations to ensure that articles in the same cluster have scores that are more similar to one another.
However, when clustering fails, it can substantially undermine accuracy. (Examples: amc2056, cos2006, rjk9003).
In such cases, we can compare the firstName of targetAuthors across the articles in a given cluster to infer how reliable that cluster is. If the names are consistent, we use the full cluster score. If they are not, we discount the cluster score.
Properties
Store this in application.properties.
Pseudocode
If an article is in its own cluster with no other members, do not use this strategy.
1. Is use.gold.standard.evidence=true in application.properties? If yes, go to 2. If no, go to 3.
2. Add up all evidence scores for that article including, if they exist, acceptedArticleScore or rejectedArticleScore. We will call this totalArticleScore-WithoutClustering. Go to 4.
3. Add up all evidence scores for that article excluding acceptedArticleScore or rejectedArticleScore. We will call this totalArticleScore-WithoutClustering. Go to 4.
4. Take the average of the values of totalArticleScore-WithoutClustering in a given cluster. We will call this clusterScore-Average.
5. For every article in a given cluster, retrieve all instances of articleAuthorName.firstName. For example:
6. Count the total number of names remaining and the count of the most frequent name. For example, the totalNameCount is 14, and the maxIdenticalNameCount is 5 ("aewon" and "ockum" are both 5).
7. Compute clusterReliabilityScore using this formula. For example: (5/14)^3 ≈ 0.0456.
8. Determine how much clusterScore-Average should affect any one cluster. Retrieve clusterScore-Factor from application.properties.
9. For each given article, we will calculate clusterScoreDiscrepancy.
totalArticleScore-nonStandardized
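The scoring steps up to clusterReliabilityScore can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's implementation. The function names, the dictionary layout of the evidence scores, the filler first names, and the `emailScore` key are all hypothetical. The reliability formula is inferred from the worked example (5/14)^3, that is, (maxIdenticalNameCount / totalNameCount) raised to the power 3; the exponent is taken from that example and may in practice be configurable. Any name filtering that happens before counting is not shown.

```python
from collections import Counter
from statistics import mean

# Evidence keys that only count when use.gold.standard.evidence=true.
GOLD_STANDARD_KEYS = {"acceptedArticleScore", "rejectedArticleScore"}


def total_score_without_clustering(evidence_scores, use_gold_standard_evidence):
    """Sum an article's evidence scores (totalArticleScore-WithoutClustering).

    evidence_scores is assumed to be a dict of score name -> numeric value.
    Gold-standard scores are included only when the flag is true.
    """
    return sum(
        score
        for name, score in evidence_scores.items()
        if use_gold_standard_evidence or name not in GOLD_STANDARD_KEYS
    )


def cluster_score_average(article_totals):
    """Average of totalArticleScore-WithoutClustering across one cluster."""
    return mean(article_totals)


def cluster_reliability_score(first_names, exponent=3):
    """(maxIdenticalNameCount / totalNameCount) ** exponent.

    The exponent of 3 is an assumption taken from the worked example.
    Returning 0.0 for an empty name list is also an assumption.
    """
    counts = Counter(first_names)
    total_name_count = sum(counts.values())
    if total_name_count == 0:
        return 0.0
    max_identical_name_count = max(counts.values())
    return (max_identical_name_count / total_name_count) ** exponent


# 14 names in total; "aewon" and "ockum" each appear 5 times.
# The remaining four names are made-up filler.
names = ["aewon"] * 5 + ["ockum"] * 5 + ["w", "x", "y", "z"]
print(cluster_reliability_score(names))
```

A cluster whose target-author first names all agree gets a reliability of 1.0, while a cluster with mixed names is discounted sharply because of the cubic exponent. Single-article clusters should skip this calculation entirely, as stated at the top of the pseudocode.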