Privacy 2024 - CNAMEs and cleanup #117

Merged on Jun 10, 2024 (12 commits)
53 changes: 39 additions & 14 deletions README.md
@@ -10,25 +10,50 @@ To add a new custom metric to HTTP Archive:

1. For scripts that return a JSON object, name each key after what it measures. For example, `meta-nodes` returns an array of all `<meta>` nodes and their attributes:

```js
return JSON.stringify({
'meta-nodes': (() => {
// Returns a JSON array of meta nodes and their key/value attributes.
var nodes = document.querySelectorAll('head meta');
var metaNodes = parseNodes(nodes);

return metaNodes;
})(),

// check if there is any picture tag containing an img tag
'has_picture_img': document.querySelectorAll('picture img').length > 0
});
```

2. Test your changes on WPT using the workflow below.

3. Submit a pull request. Include one or more links to test results in your PR description to verify that the script is working.

## Custom WPT data objects

The following objects are available for use in custom metrics:

- `$WPT_REQUESTS` - All request data except for bodies (significantly smaller)
- `$WPT_BODIES` - All request data including bodies in the "response_body" entry
- `$WPT_ACCESSIBILITY_TREE` - Array of the nodes of the Chromium Accessibility tree (with the DOM node info recorded in node_info for each node in the array)
- `$WPT_COOKIES` - Array of cookies set by the page
- `$WPT_DNS` - Array of DNS records for the page

More details can be found in the [WPT custom metrics documentation](https://docs.webpagetest.org/custom-metrics/).

You can explore them by running WPT with the following custom metric:

```js
[custom_wpt_objects]
return {
requests: $WPT_REQUESTS,
bodies: $WPT_BODIES,
accessibility: $WPT_ACCESSIBILITY_TREE,
cookies: $WPT_COOKIES,
dns: $WPT_DNS
};
```

## Testing

### Manual testing using webpagetest.org website
167 changes: 106 additions & 61 deletions dist/privacy.js
@@ -6,13 +6,11 @@
// 3. Test your change by following the instructions at https://github.com/HTTPArchive/almanac.httparchive.org/issues/33#issuecomment-502288773.
// 4. Submit a PR to update this file.

const response_bodies = $WPT_BODIES.filter(body => body.type === 'Document' || body.type === 'Script')
const response_bodies = $WPT_BODIES.filter(body => (body.response_body && (body.type === 'Document' || body.type === 'Script')))

/**
* @function testPropertyStringInResponseBodies
* Test that a JS property string is accessed in response bodies
* (given that wrapping properties to log accesses is not possible as metrics run at the end)
* only in Document and Script resources (HTML/JS)
* inspired by https://github.com/HTTPArchive/legacy.httparchive.org/blob/master/custom_metrics/event-names.js
*
* @param {string} pattern - Regex pattern to match in the response bodies.
@@ -21,14 +19,7 @@ const response_bodies = $WPT_BODIES.filter(body => body.type === 'Document' || b
function testPropertyStringInResponseBodies(pattern) {
try {
let re = new RegExp(pattern);
return response_bodies
.some(body => {
if (body.response_body) {
return re.test(body.response_body);
} else {
return false;
}
});
return response_bodies.some(body => body.response_body ? re.test(body.response_body) : false);
} catch (error) {
return error.toString();
}
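
As a rough illustration of what the helper above does, the sketch below runs the same `some` plus `RegExp.test` check against a small hand-made array standing in for `$WPT_BODIES`. The `mockBodies` data, the `Image` and `Stylesheet` type values, and the `testPattern` name are assumptions made for this example; only the `type` and `response_body` fields come from the code in this diff. It also shows why a pattern without the `navigator.+` prefix catches more call sites, such as aliased or minified property accesses.

```js
// Hypothetical stand-in for $WPT_BODIES; real entries carry many more fields.
const mockBodies = [
  { type: 'Document', response_body: 'function dnt(n) { return n.doNotTrack === "1"; }' },
  { type: 'Script', response_body: 'if (navigator.globalPrivacyControl) { optOut(); }' },
  { type: 'Image' }, // no response_body, dropped by the filter
  { type: 'Stylesheet', response_body: '.doNotTrack { display: none }' }, // wrong type, dropped
];

// Same filter as in the metric: keep only HTML/JS resources that have a body.
const bodies = mockBodies.filter(
  body => body.response_body && (body.type === 'Document' || body.type === 'Script')
);

// Illustrative equivalent of testPropertyStringInResponseBodies.
function testPattern(pattern, responseBodies) {
  try {
    const re = new RegExp(pattern); // no 'g' flag, so test() is stateless
    return responseBodies.some(body => re.test(body.response_body));
  } catch (error) {
    return error.toString(); // e.g. an invalid pattern
  }
}

console.log(testPattern('navigator.+doNotTrack', bodies)); // false: "navigator" never precedes it
console.log(testPattern('doNotTrack', bodies)); // true: matches the aliased access n.doNotTrack
console.log(testPattern('geolocation.+getCurrentPosition', bodies)); // false
```

The trade-off is that a shorter pattern can also match unrelated strings, for example a variable or CSS class that happens to be named `doNotTrack`.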
@@ -41,13 +32,79 @@ return JSON.stringify({
* words = privacy_wording.map(country => country.words).filter((v, i, a) => a.indexOf(v) === i).flat().sort().join('|');
*/
privacy_wording_links: (() => {
let words =
'adatkezelési|adatvédelem|adatvédelmi|andmekaitsetingimused|aviso legal|beskyttelse af personlige oplysninger|cgu|cgv|confidentialitate|confidentialite|confidentialité|confidentialité|confidentialité|confidentialité|confidentialité|confidențialitate|cookie policy|cookie-uri|cookie-urilor|cookiepolitik|cookies|data policy|data policy|data policy|data policy|datapolicy|datapolitik|datenrichtlinie|datenrichtlinie|datenrichtlinie|datenrichtlinie|datenschutz|datenschutz|datenschutz|datenschutz|datenschutzbestimmungen|datenschutzrichtlinie|donnees personelles|gdpr|gegevensbeleid|gegevensbeleid|gizlilik|gizlilik|integritetspolicy|isikuandmete|isikuandmete töötlemise|kasutustingimused|kişisel verilerin korunması|kolačići|konfidencialiteti|konfidentsiaalsuse|kvkk|küpsised|mbrojtja e të dhënave|mentions légales|mentions légales|normativa sui dati|ochrana dat|ochrana osobních údajů|ochrana osobných údajov|ochrana soukromí|ochrana súkromia|ochrana udaju|ochrana údajov|ochrany osobných údajov|osobné údaje|personlige data|personoplysninger|personuppgifter|personvern|persónuvernd|piškotki|piškotkih|podmínky|policy|politica de utilizare|politika e të dhënave|politikat e privatesise|politikat e privatësisë|politique d’utilisation des données|politique d’utilisation des données|politique d’utilisation des données|politique d’utilisation des données|politique d’utilisation des données|política de dados|política de dados|política de datos|política de datos|pravila o upotrebi podataka|privaatsus|privacidad|privacidad|privacidade|privacidade|privacy|privacy|privacy|privacy|privacy|privacy policy|privacybeleid|privacybeleid|privatezza|privatlivspolitik|privatnost|privatnost|privatnosti|privatssphäre|privatumas|privatumo|privatësia|privātuma|privātums|protecció de dades|protecţia datelor|prywatnosci|prywatności|prywatność|regler om fortrolighed|rekisteriseloste|retningslinjer for data|rgpd|sekretess|slapukai|soukromi|soukromí|személyes adatok 
védelme|súkromie|sīkdatne|sīkdatņu|tietokäytäntö|tietosuoja|tietosuojakäytäntö|tietosuojaseloste|varstvo podatkov|veri i̇lkesi|veri i̇lkesi|veri politikası|vie privée|webbplatsen|yksityisyyden suoja|yksityisyydensuoja|yksityisyys|zasady dotyczące danych|zasebnost|zaštita podataka|zásady ochrany osobných|zásady používání dat|zásady používání dat|zásady využívania údajov|απόρρητο|απόρρητο|πολιτική απορρήτου|πολιτική δεδομένων|προσωπικά δεδομένα|όροι και γνωστοποιήσεις|конфиденциальность|конфіденційність|поверителност|политика за бисквитки|политика за данни|политика использования данных|политика конфиденциальности|политика о подацима|политика о подацима|политика о подацима|политика обработки персональных данных|приватност|приватност|приватност|условия|условия за ползване|מדיניות נתונים|פרטיות|الخصوصية|سياسة البيانات|数据使用政策|數據使用政策|私隱政策|隐私权政策';
let pattern = new RegExp('\\b(?:' + words + ')\\b', 'ig');
const languageKeywords = {
af: "beskyttelse af personlige oplysninger|privatlivspolitik|persondata",
ar: "الخصوصية|سياسة البيانات|سياسة الخصوصية|سياسة الخصوصية والبيانات",
az: "məxfilik|şəxsi məlumatlar",
be: "абарона дадзеных|палітыка прыватнасці",
bg: "поверителност|политика за бисквитки|политика за данни|условия|условия за ползване|политика за поверителност",
bn: "গোপনীয়তা|ডেটা নীতি|গোপনীয়তা নীতি",
bs: "privatnost|politika privatnosti|politika podataka|pravila o privatnosti",
ca: "protecció de dades|política de privacitat",
cs: "ochrana dat|ochrana osobních údajů|ochrana soukromí|ochrana súkromia|ochrana udaju|ochrana údajov|ochrany osobných údajov|podmínky|soukromi|soukromí|zásady používání dat|zásady používání cookies",
da: "cookiepolitik|datapolicy|beskyttelse af personlige oplysninger|personlige data|personoplysninger|privatlivspolitik|regler om fortrolighed",
de: "datenrichtlinie|datenschutz|datenschutzbestimmungen|datenschutzrichtlinie|privatssphäre|cookie-richtlinie|privatsphärenerklärung",
el: "απόρρητο|πολιτική απορρήτου|πολιτική δεδομένων|προσωπικά δεδομένα|όροι και γνωστοποιήσεις|πολιτική cookies",
en: "cookie policy|cookies|data policy|datapolicy|privacy|privacy policy|cookiepolicy",
es: "aviso legal|confidencialidad|confidencialite|confidentialité|política de datos|privacidad|privacidad|politica de datos|política de privacidad|política de cookies",
et: "andmekaitsetingimused|isikuandmete|isikuandmete töötlemise|kasutustingimused|privaatsuspoliitika|andmepoliitika|küpsisepoliitika",
eu: "privatua|datu pertsonalen babesa|datu pertsonalen politika",
fa: "حریم خصوصی|سیاست حفظ حریم خصوصی|سیاست داده|داده های شخصی",
fi: "yksityisyyden suoja|yksityisyydensuoja|yksityisyys|tietokäytäntö|tietosuoja|tietosuojakäytäntö|tietosuojaseloste|evästekäytäntö",
fil: "patakaran sa cookies",
fr: "cgu|cgv|confidentialité|mentions légales|politique d’utilisation des données|rgpd|vie privée|politique de confidentialité|politique de données|politique de cookie",
ga: "beartas príobháideachta|beartas sonraí|beartas fianán|beartas sonraí pearsanta",
he: "מדיניות נתונים|פרטיות",
hi: "गोपनीयता|डेटा नीति|गोपनीयता नीति",
hr: "privatnost|pravila o privatnosti|pravila o podacima|pravila o kolačićima",
hu: "adatvédelem|adatvédelmi|személyes adatok védelme|adatvédelmi nyilatkozat|adatkezelési tájékoztató|cookie-kra vonatkozó irányelv",
id: "integritetspolicy|piškotki|kebijakan privasi",
is: "persónuvernd|persónuverndarstefna",
it: "normativa sui dati|privatezza|informativa sulla privacy|informativa sui dati|informativa sui cookie|politica dei dati|politica dei cookies",
ja: "プライバシー|データポリシー|個人情報保護",
ko: "개인정보|개인정보 처리방침|개인정보 보호정책|개인정보 보호|정보 처리 방침",
ka: "კერძო წამყვანი|პირადი ინფორმაციის დაცვა|პირადი ინფორმაციის პოლიტიკა",
lt: "privatumas|privatumo|slapukai|slapukkih|privatumo politika|duomenų politika|slapukų politika|privatumo pareiškimas",
lv: "sīkdatne|sīkdatņu|privātuma|privātums|privātuma politika|datu politika|sīkdatņu politika|privātuma politikas paziņojums",
mt: "politika dwar il-privatezza|politika tad-data|politika tal-cookies|politika dwar id-dati",
ms: "privasi|polisi data|polisi privasi|data peribadi|terma dan syarat",
nb: "personvern|informasjonskapselregler",
nl: "gegevensbeleid|privacybeleid|cookiebeleid|privacyverklaring",
no: "personvern|personvernerklæring|informasjonskapsler|personvernspolicy",
pl: "prywatnosci|prywatności|prywatność|zasady dotyczące danych|polityka prywatności|polityka danych|polityka plików cookie",
pt: "privacidade|política de privacidade|política de dados|política de cookies",
ro: "confidențialitate|politica de utilizare|protectia datelor|politica de confidențialitate|politica de date|politica cookie",
ru: "конфиденциальность|политика использования данных|политика конфиденциальности|политика данных|политика файлов cookie|персональных данных",
si: "piškotki",
sk: "ochrana osobných údajov|zásady ochrany osobných|zásady používání dat|zásady využívania údajov|zásady ochrany osobných údajov|zásady používania údajov|zásady používania cookies|ochrana údajov",
sl: "piškotki|varstvo podatkov|zasebnost|pravilnik o zasebnosti|pravilnik o podatkih|pravilnik o piškotkih|politika zasebnosti",
sq: "konfidencialiteti|politika e privatësisë|politika e të dhënave personale",
sr: "konfidentsiaalsuse|pravila o upotrebi podataka|privatnost|privatnosti|prywatnosci|prywatności|prywatność|protecţia datelor|политика о подацима|приватност|защита података",
sv: "integritetspolicy|personuppgifter|privatlivspolitik|sekretess|webbplatsen|yksityisyyden suoja|yksityisyydensuoja|yksityisyys|datapolitik",
sw: "política de datos",
tr: "gizlilik|kişisel verilerin korunması|politika e të dhënave|politikat e privatesise|politikat e privatësisë|veri i̇lkesi|veri politikası|gizlilik politikası|veri politikası|çerez politikası",
th: "ความเป็นส่วนตัว|นโยบายความเป็นส่วนตัว|นโยบายข้อมูล|ข้อมูลส่วนบุคคล|เงื่อนไข",
vi: "quyền riêng tư|chính sách bảo mật|chính sách dữ liệu|dữ liệu cá nhân|điều khoản và điều kiện",
uk: "конфіденційність|конфіденційності|політика даних|файлів cookie|персональних даних|захисту даних",
zh: "数据使用政策|隐私政策|数据保护政策|隐私保护政策|數據使用政策|隱私政策|數據保護政策|隱私保護政策"
}
const websiteLanguage = document.documentElement.lang.slice(0, 2).toLowerCase();
let keywords;
if (websiteLanguage === 'en') {
keywords = languageKeywords[websiteLanguage];
} else if (!(websiteLanguage in languageKeywords)) {
keywords = Object.values(languageKeywords).join('|');
} else {
keywords = languageKeywords[websiteLanguage] + '|' + languageKeywords['en'];
}
// No 'g' flag: a global regex keeps lastIndex between test() calls and can skip matching links.
const pattern = new RegExp(`(?:${keywords})`, 'i');

let privacy_links = Array.from(document.querySelectorAll('a')).map(
a => ({ keywords: a.innerText.match(pattern), text: a.innerText })
).filter(a => a.keywords); // filter out non-matching texts (keywords = null)
const privacy_links = Array.from(document.querySelectorAll('a')).filter(a =>
pattern.test(a.innerText)
).map(
a => ({
text: a.innerText,
})
);

return privacy_links;
})(),
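
The language-based keyword selection in this hunk can be tried outside the browser with a reduced keyword map. The following is a minimal sketch under stated assumptions: the three-language `languageKeywords` subset, the `buildKeywordPattern` and `matchingLinkTexts` names, and the sample link texts are all hypothetical. The sketch builds the regex with only the `i` flag, because a regex created with `g` carries `lastIndex` from one `test()` call to the next.

```js
// Tiny illustrative subset; the map in the diff covers many more languages.
const languageKeywords = {
  en: 'cookie policy|privacy policy|privacy',
  de: 'datenschutz|datenrichtlinie',
  fr: 'confidentialité|vie privée',
};

// Build the keyword pattern from a page's <html lang> value.
function buildKeywordPattern(lang) {
  const code = (lang || '').slice(0, 2).toLowerCase();
  let keywords;
  if (code === 'en') {
    keywords = languageKeywords.en; // English pages: English keywords only
  } else if (!(code in languageKeywords)) {
    keywords = Object.values(languageKeywords).join('|'); // unknown language: try everything
  } else {
    keywords = languageKeywords[code] + '|' + languageKeywords.en; // known language plus English
  }
  return new RegExp(`(?:${keywords})`, 'i');
}

// Keep the link texts that contain at least one keyword.
function matchingLinkTexts(lang, texts) {
  const pattern = buildKeywordPattern(lang);
  return texts.filter(text => pattern.test(text));
}

const linkTexts = ['Datenschutz', 'Privacy Policy', 'Vie privée', 'Kontakt'];
console.log(matchingLinkTexts('de-DE', linkTexts)); // [ 'Datenschutz', 'Privacy Policy' ]
console.log(matchingLinkTexts('en', linkTexts)); // [ 'Privacy Policy' ]
console.log(matchingLinkTexts('', linkTexts)); // [ 'Datenschutz', 'Privacy Policy', 'Vie privée' ]
```

Restricting the alternation to the page language plus English keeps the pattern short for most pages, while pages with a missing or unrecognized `lang` fall back to the full list.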
@@ -151,62 +208,31 @@ return JSON.stringify({
}
})(),

/**
* Ads Transparency Spotlight Data Disclosure schema
* Only for top frame, can't access child frames (same-origin policy)
*/
ads_transparency_spotlight: (() => {
// Check `meta` tag cf. https://github.com/Ads-Transparency-Spotlight/documentation/blob/main/implement.md
meta_tag = document.querySelector('meta[name="AdsMetadata"]');
let ats = {
present: meta_tag != null,
ads_metadata: null,
};
if (ats.present) {
ats.ads_metadata = meta_tag.content;
}
return ats;
})(),

/**
* FLoC (Federated Learning of Cohorts) - deprecated
*
* Test site: https://floc.glitch.me/
*
* @todo Check function/variable accesses through string searches (wrappers cannot be used, as the metrics are only collected at the end of the test)
*/
document_interestCohort: testPropertyStringInResponseBodies('document.+interestCohort'),

/**
* Do Not Track (DNT)
* https://www.eff.org/issues/do-not-track
*/
navigator_doNotTrack: testPropertyStringInResponseBodies('navigator.+doNotTrack'),
navigator_doNotTrack: testPropertyStringInResponseBodies('doNotTrack'),

/**
* Global Privacy Control
* https://globalprivacycontrol.org/
*/
navigator_globalPrivacyControl: testPropertyStringInResponseBodies(
'navigator.+globalPrivacyControl'
'globalPrivacyControl'
),

// Sensitive resources

/**
* Permissions policy
* https://www.w3.org/TR/permissions-policy-1/#introspection
* Previously known as Feature policy
* iframes properties in `almanac` and `security` custom metrics.
*/
document_permissionsPolicy: testPropertyStringInResponseBodies('document.+permissionsPolicy'),

/**
* Feature policy
* (previous name of Permission policy: https://www.w3.org/TR/permissions-policy-1/#introduction)
*/
document_featurePolicy: testPropertyStringInResponseBodies('document.+featurePolicy'),

// Permissions Policy / Feature Policy on iframes already implemented in `security.js` custom metrics.

/**
* Referrer Policy
* https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referrer-Policy
@@ -218,13 +244,11 @@ return JSON.stringify({
link_relations: null,
};
// Referrer policy set for entire document using `meta` tag
// Test site: https://www.cnet.com/
let referrer_meta_tag = document.querySelector('meta[name="referrer"]');
if (referrer_meta_tag) {
rp.entire_document_policy = referrer_meta_tag.content; // Get policy value
}
// Referrer policy set for individual requests with the `referrerpolicy` attribute
// Test site: https://www.brilio.net/
let referrerpolicy_attributes = document.querySelectorAll('[referrerpolicy]');
// Leave `individual_requests` at `null` if no attributes are found.
if (referrerpolicy_attributes.length > 0) {
@@ -250,8 +274,8 @@ return JSON.stringify({
[]
);
}

// Referrer policy set for a link using `noreferrer` link relation
// Test site: https://www.cnet.com/
let noreferrer_link_relations = document.querySelectorAll('[rel*="noreferrer"]');
// Leave `link_relations` at `null` if no attributes are found.
if (noreferrer_link_relations.length > 0) {
@@ -275,13 +299,13 @@ return JSON.stringify({
*/
media_devices: {
navigator_mediaDevices_enumerateDevices: testPropertyStringInResponseBodies(
'navigator.+mediaDevices.+enumerateDevices'
'mediaDevices.+enumerateDevices'
),
navigator_mediaDevices_getUserMedia: testPropertyStringInResponseBodies(
'navigator.+mediaDevices.+getUserMedia'
'mediaDevices.+getUserMedia'
),
navigator_mediaDevices_getDisplayMedia: testPropertyStringInResponseBodies(
'navigator.+mediaDevices.+getDisplayMedia'
'mediaDevices.+getDisplayMedia'
),
},

@@ -291,10 +315,31 @@ return JSON.stringify({
*/
geolocation: {
navigator_geolocation_getCurrentPosition: testPropertyStringInResponseBodies(
'navigator.+geolocation.+getCurrentPosition'
'geolocation.+getCurrentPosition'
),
navigator_geolocation_watchPosition: testPropertyStringInResponseBodies(
'navigator.+geolocation.+watchPosition'
'geolocation.+watchPosition'
),
}
},

/**
* List of hostnames with CNAME record
*/
request_hostnames_with_cname: (() => {
let results = {};

for (const request of $WPT_REQUESTS) {
const request_hostname = (new URL(request.url)).hostname;

for (const [origin, dns_info] of Object.entries($WPT_DNS)) {
const dns_hostname = (new URL(origin)).hostname;

if (request_hostname == dns_hostname && request_hostname !== dns_info.results.canonical_names[0]) {
results[dns_hostname] = dns_info.results.canonical_names;
}
}
}

return results;
})()
});
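
The CNAME check added at the end of this diff can be exercised with mocked inputs. The sketch below assumes the shapes the metric reads: request entries with a `url`, and a DNS object keyed by origin whose values expose `results.canonical_names`. The `mockRequests` and `mockDns` data and the `hostnamesWithCname` name are hypothetical. Unlike the metric, this version indexes the DNS entries by hostname once instead of rescanning them for every request, and it skips empty `canonical_names` lists.

```js
// Hypothetical stand-ins for $WPT_REQUESTS and $WPT_DNS.
const mockRequests = [
  { url: 'https://www.example.com/index.html' },
  { url: 'https://metrics.example.com/collect?id=1' },
  { url: 'https://metrics.example.com/collect?id=2' },
];
const mockDns = {
  'https://www.example.com': { results: { canonical_names: ['www.example.com'] } },
  'https://metrics.example.com': {
    results: { canonical_names: ['tracker.example.net', 'edge.tracker.example.net'] },
  },
};

function hostnamesWithCname(requests, dns) {
  // Index DNS info by hostname once.
  const dnsByHostname = new Map(
    Object.entries(dns).map(([origin, info]) => [new URL(origin).hostname, info])
  );

  const results = {};
  for (const request of requests) {
    const hostname = new URL(request.url).hostname;
    const names = dnsByHostname.get(hostname)?.results?.canonical_names;
    // A first canonical name that differs from the requested hostname suggests a CNAME record.
    if (names && names.length > 0 && names[0] !== hostname) {
      results[hostname] = names;
    }
  }
  return results;
}

console.log(JSON.stringify(hostnamesWithCname(mockRequests, mockDns)));
// {"metrics.example.com":["tracker.example.net","edge.tracker.example.net"]}
```

Because results are keyed by hostname, repeated requests to the same host collapse into a single entry.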