Normalize referrer domains #4033
Good idea to use a list to improve the Referrer Websites report. We could also implement this as a plugin in the upcoming marketplace at http://plugins.piwik.org/
Another very smart solution would be to just group the visits by domain and subdomain. This seems easier, as we don't need to maintain the effective TLD list at all. The result could look like this:

||= Website =||= Visits =||

OK, we might still need to maintain a shorter list of effective TLDs containing some country-specific TLDs, such as co.uk, but we don't need to cover company-specific TLDs such as blogsp0t.com, as users can easily unfold the domain to see which blogs are linking most. (By the way, I hate this comment system, which always blacklists my comments just because I include blogsp0t.com. Silly!)
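The grouping idea can be sketched in PHP as follows. This is a minimal sketch, not Piwik code: `groupDomain()`, `groupVisitsByDomain()` and the three-entry `SHORT_SUFFIX_LIST` are hypothetical names, and the rule "keep three labels if the last two form a listed suffix, otherwise keep two" is a simplifying assumption standing in for the shorter list of country-specific TLDs.

```php
<?php
// Minimal sketch (hypothetical names): group referrer hostnames by domain,
// using only a short, hand-maintained list of multi-label country suffixes.
const SHORT_SUFFIX_LIST = ['co.uk', 'com.br', 'com.au'];

function groupDomain(string $host): string
{
    $labels = explode('.', strtolower($host));
    $n = count($labels);
    if ($n <= 2) {
        return implode('.', $labels);
    }
    // Keep three labels when the last two form a listed suffix (bbc.co.uk),
    // otherwise keep two (lemonde.fr).
    $lastTwo = $labels[$n - 2] . '.' . $labels[$n - 1];
    $keep = in_array($lastTwo, SHORT_SUFFIX_LIST, true) ? 3 : 2;
    return implode('.', array_slice($labels, -$keep));
}

/**
 * @param array<string,int> $visitsByHost hostname => visits
 * @return array<string,array<string,int>> domain => [hostname => visits]
 */
function groupVisitsByDomain(array $visitsByHost): array
{
    $grouped = [];
    foreach ($visitsByHost as $host => $visits) {
        $domain = groupDomain((string) $host);
        $grouped[$domain][$host] = ($grouped[$domain][$host] ?? 0) + $visits;
    }
    // Domains with the most visits in total come first.
    uasort($grouped, fn(array $a, array $b): int => array_sum($b) <=> array_sum($a));
    return $grouped;
}
```

For example, `www.lemonde.fr` and `www7.lemonde.fr` both land under `lemonde.fr`, while the per-host counts stay available for unfolding.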
Great idea to add a new "view" of the report with subtables showing subdomains. Maybe we could show this new report as a "Related Report" footer link, "Websites by Domain", under the "Websites" report, or as an option in the "COG" dropdown.
I would prefer making the hierarchical view the new default and then letting the user "make it flat", as we are doing with the Pages report. Does anyone think the flat view is better than grouping by domain?
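A rough sketch of the two views being compared, assuming visits are already grouped as `domain => [hostname => visits]`. The function and field names (`hierarchicalRows()`, `flatRows()`, `label`, `visits`, `subtable`) are hypothetical and do not refer to Piwik's DataTable API.

```php
<?php
// Minimal sketch (hypothetical names): derive a hierarchical and a flat view
// from the same domain => [hostname => visits] array.

/** Hierarchical view: one row per domain with its total; hostnames as a subtable. */
function hierarchicalRows(array $grouped): array
{
    $rows = [];
    foreach ($grouped as $domain => $hosts) {
        arsort($hosts); // most visited hostname first inside the subtable
        $rows[] = ['label' => (string) $domain, 'visits' => array_sum($hosts), 'subtable' => $hosts];
    }
    usort($rows, fn(array $a, array $b): int => $b['visits'] <=> $a['visits']);
    return $rows;
}

/** Flat view: one row per hostname, as the report works today. */
function flatRows(array $grouped): array
{
    $rows = [];
    foreach ($grouped as $hosts) {
        foreach ($hosts as $host => $visits) {
            $rows[] = ['label' => (string) $host, 'visits' => $visits];
        }
    }
    usort($rows, fn(array $a, array $b): int => $b['visits'] <=> $a['visits']);
    return $rows;
}
```

Keeping the grouped array as the single source means the flat view can be derived on demand, which fits making the hierarchical view the default.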
Nice idea for a plugin, which could filter the Referrers DataTable to apply the grouping explained here!
As a first step toward this, I worked on a PHP implementation for extracting the "effective" domain name of a hostname. Usage is very simple:

```
> include('EffectiveDomainName.php');
> print EffectiveDomainName::get('mobile.nytimes.com') . "\n";
nytimes.com
> print EffectiveDomainName::get('flightjs.github.io') . "\n";
flightjs.github.io
> print EffectiveDomainName::get('www.google.com.br') . "\n";
google.com.br
```
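The lookup behind this kind of function can be sketched as follows. This is not the `EffectiveDomainName.php` code from the comment; it is a minimal illustration of the Public Suffix List matching rules (a `!` exception rule wins, otherwise the longest matching normal or `*.` wildcard rule, with an implicit `*` default), run against a tiny hypothetical `$sampleRules` list instead of the full file. `publicSuffixLength()` and `effectiveDomain()` are hypothetical names.

```php
<?php
// Minimal sketch of Public Suffix List matching (not EffectiveDomainName.php).
// $sampleRules is a tiny hypothetical excerpt; a real setup would load the full list.
$sampleRules = ['com', 'br', 'com.br', 'io', 'github.io', '*.ck', '!www.ck'];

/** Number of trailing labels that form the public suffix of $labels. */
function publicSuffixLength(array $labels, array $ruleSet): int
{
    $n = count($labels);
    // 1. Exception rules ("!www.ck") take priority: the suffix is the
    //    matched name minus its leftmost label.
    for ($i = 0; $i < $n; $i++) {
        if (isset($ruleSet['!' . implode('.', array_slice($labels, $i))])) {
            return $n - $i - 1;
        }
    }
    // 2. Otherwise the longest matching normal or wildcard rule wins;
    //    scanning from the left finds the longest candidate first.
    for ($i = 0; $i < $n; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        $wildcard = '*.' . implode('.', array_slice($labels, $i + 1));
        if (isset($ruleSet[$candidate]) || ($i < $n - 1 && isset($ruleSet[$wildcard]))) {
            return $n - $i;
        }
    }
    // 3. Implicit default rule "*": the last label alone is the suffix.
    return 1;
}

/** Public suffix plus one label; a host that is itself a suffix is returned as-is. */
function effectiveDomain(string $host, array $rules): string
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $suffixLen = publicSuffixLength($labels, array_flip($rules));
    $keep = min(count($labels), $suffixLen + 1);
    return implode('.', array_slice($labels, -$keep));
}
```

With these sample rules it reproduces the three results from the transcript: `nytimes.com`, `flightjs.github.io` (because `github.io` is itself a listed suffix) and `google.com.br`.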
@gka Thanks for the tip. Weird that this issue got closed; I don't think I closed it, unless it was by mistake. It would be relatively easy to create a plugin that will either modify existing
Would you also group and maybe group
Since facebook.com is not listed as an effective TLD (aka "public suffix"), any subdomain *.facebook.com will indeed be "normalized" to facebook.com. However, t.co is not "grouped" with twitter.com, since they are entirely different domains.
Hi @gka, alright, maybe we could use your list and then customise it, for example with all known social network domains. We'd simply apply the normalisation function in a custom filter, that would
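One way to combine the suffix list with social-network customisation is to normalize first and then apply a small alias table. This is a sketch under assumptions: `DOMAIN_ALIASES`, `normalizeReferrerHost()` and `groupReferrers()` are hypothetical, and `$lastTwoLabels` is a deliberately naive stand-in for a real effective-domain lookup, used only to keep the example short.

```php
<?php
// Minimal sketch (hypothetical names): normalize a referrer host to its
// effective domain, then map known aliases such as link shorteners.
const DOMAIN_ALIASES = [
    't.co'    => 'twitter.com',
    'lnkd.in' => 'linkedin.com',
];

function normalizeReferrerHost(string $host, callable $effectiveDomain): string
{
    $domain = $effectiveDomain(strtolower($host));
    return DOMAIN_ALIASES[$domain] ?? $domain;
}

/** Sum visits per normalized domain, most visits first. */
function groupReferrers(array $visitsByHost, callable $effectiveDomain): array
{
    $totals = [];
    foreach ($visitsByHost as $host => $visits) {
        $key = normalizeReferrerHost((string) $host, $effectiveDomain);
        $totals[$key] = ($totals[$key] ?? 0) + $visits;
    }
    arsort($totals);
    return $totals;
}

// Naive stand-in for a real public-suffix lookup: keeps the last two labels.
$lastTwoLabels = fn(string $h): string => implode('.', array_slice(explode('.', $h), -2));
```

Because `t.co` is mapped explicitly, its visits are added to `twitter.com`, which the suffix list alone would never do.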
Listing the referrer websites can be significantly improved by normalizing the domain names. Currently, subdomains such as "www7" are treated as separate websites. Here's an example of such a referrer list, in which lemonde.fr is listed several times:
[[Image(http://new.tinygrab.com/f3aa221edeba52ea05e91e20b51690a2c38c508b47.png)]]
Of course, this is not trivial, as some subdomains point to separate websites while others are only mirrors or mobile variants of the same site.
To help with this, Mozilla maintains a list of "effective" TLD names, also known as the Public Suffix List. This list includes domains such as bl0gsp0t.com and dyndns.org, because each X.dyndns.org should be treated as a separate website.
http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
Using this list, it is easy to normalize the domains, or in other words, to extract the "effective" websites. The list is not perfect (for instance, tumblr.com is missing), but it should solve roughly 95% of the problem.
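Reading the list is straightforward. A minimal sketch, assuming the file's text has already been fetched from the URL above: lines starting with `//` are comments, blank lines are skipped, and the first whitespace-separated token on each remaining line is a rule, where a leading `!` marks an exception and a leading `*.` a wildcard. `parseSuffixList()` and `classifyRules()` are hypothetical helper names; downloading and caching the file are left out.

```php
<?php
// Minimal sketch (hypothetical helpers): turn the text of effective_tld_names.dat
// into a list of rules, then split them by rule type.

function parseSuffixList(string $text): array
{
    $rules = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        // Skip blank lines and "//" comments.
        if ($line === '' || strncmp($line, '//', 2) === 0) {
            continue;
        }
        // Only the first whitespace-separated token on a line is the rule.
        $rules[] = preg_split('/\s+/', $line)[0];
    }
    return $rules;
}

/** Group rules into normal, wildcard ("*.x" stored as "x") and exception ("!x" stored as "x"). */
function classifyRules(array $rules): array
{
    $kinds = ['normal' => [], 'wildcard' => [], 'exception' => []];
    foreach ($rules as $rule) {
        if ($rule[0] === '!') {
            $kinds['exception'][] = substr($rule, 1);
        } elseif (strncmp($rule, '*.', 2) === 0) {
            $kinds['wildcard'][] = substr($rule, 2);
        } else {
            $kinds['normal'][] = $rule;
        }
    }
    return $kinds;
}
```

Since dyndns.org appears as a normal rule, a host such as X.dyndns.org keeps its own label during normalization instead of collapsing into dyndns.org.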