Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Privacy Sandbox APIs #129

Merged
merged 30 commits into from
Jun 11, 2024

Conversation

Yash-Vekaria
Copy link
Contributor

@Yash-Vekaria Yash-Vekaria commented Jun 6, 2024

@Yash-Vekaria Yash-Vekaria mentioned this pull request Jun 6, 2024
@tunetheweb
Copy link
Member

@Yash-Vekaria @max-ostapenko what's the latest on this PR as need to merge it today if we want it included in the Web Almanac 2024.

@Yash-Vekaria
Copy link
Contributor Author

@Yash-Vekaria @max-ostapenko what's the latest on this PR as need to merge it today if we want it included in the Web Almanac 2024.

@tunetheweb Topics API is ready to merge. Protected Audience API and Attribution APIs have been correctly implemented but I don't have a good example that simulates all the cases, so partial response was recorded on the current test website. I'll let @yohhaan comment if he plans to incorporate any other APIs. @max-ostapenko informed me that he shall be taking a look at the code later today to finalize these APIs and perhaps remove stale code to make the PR ready to merge.

@max-ostapenko
Copy link
Contributor

@tunetheweb let's merge it.
I'll make my PR on top of main then

@yohhaan
Copy link
Member

yohhaan commented Jun 10, 2024

Hi,
I am modifying at the moment the Topics API implementation (some errors in the current one), I will push shortly and add more details about the rest.

- previous implementation of Topics Detection was returning skipObservation = true if nothing detected and so unwanted urls were added to array
- refactored Topics detection:
  - result = array with object {origin, skipObservation
  - escaping special char { and } in regex
  - adding directly to array if anything detected
@yohhaan
Copy link
Member

yohhaan commented Jun 10, 2024

Can we add the following test sites to this PR for unit tests (I am still checking if header detection is working as expected)?

Also, I am not sure the async check for the attestation is working/returning

Copy link
Member

@tunetheweb tunetheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of confusion of origins and domains in this. Suggested some clean ups.

dist/privacy-sandbox.js Outdated Show resolved Hide resolved
dist/privacy-sandbox.js Outdated Show resolved Hide resolved
dist/privacy-sandbox.js Outdated Show resolved Hide resolved
dist/privacy-sandbox.js Outdated Show resolved Hide resolved
dist/privacy-sandbox.js Outdated Show resolved Hide resolved
dist/privacy-sandbox.js Outdated Show resolved Hide resolved
Copy link
Member

@tunetheweb tunetheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@max-ostapenko any further comments or you happy to merge?

@tunetheweb
Copy link
Member

This sort of thing is very verbose and needlessly increases the size of custom metrics for all 15 million sites (most of which likely don't use shared storage):

        "sharedStorage": {
            "shared-storage": true,
            "shared-storage-select-url": true,
            "append": [],
            "clear": [],
            "delete": [],
            "set": [],
            "run": [],
            "selectURL": [],
            "addModule": []
        },

Can we exclude empty values in all the data in this custom metric?

I made a similar comment on #125 (review) because if EVERY custom metric adds this sort of thing then the custom metric (already 8.32TB or $50 for a single query) will become too large for anyone to query it without taking out a mortgage.

@yohhaan
Copy link
Member

yohhaan commented Jun 10, 2024

Sounds good, I am on it

This sort of thing is very verbose and needlessly increases the size of custom metrics for all 15 million sites (most of which likely don't use shared storage):

        "sharedStorage": {
            "shared-storage": true,
            "shared-storage-select-url": true,
            "append": [],
            "clear": [],
            "delete": [],
            "set": [],
            "run": [],
            "selectURL": [],
            "addModule": []
        },

Can we exclude empty values in all the data in this custom metric?

I made a similar comment on #125 (review) because if EVERY custom metric adds this sort of thing then the custom metric (already 8.32TB or $50 for a single query) will become too large for anyone to query it without taking out a mortgage.

@yohhaan
Copy link
Member

yohhaan commented Jun 10, 2024

Per @tunetheweb 's comment: I changed the format of the result object to make it smaller if APIs are not called.

@Yash-Vekaria: can you see if you can port the ProtectedAudience and Attribution Reporting APIs to a similar format?

@max-ostapenko
Copy link
Contributor

@yohhaan I couldn't find any way to verify cross-origin files existence.
Fetch doesn't fail because of 404.
Do you have any ideas?

Otherwise I would say we remove the attestation checks from custom metrics completely.

@tunetheweb
Copy link
Member

Same comment, do we need these all to be set to empty values or can we skip them when nothing of interest in them to save storage space/query costs?

        "fencedFrames": [],
        "floc": [],
        "privateAggregation": [],
        "privateStateTokens": [],
        "protectedAudienceAPI": {
            "interestGroups": {
                "joinAdInterestGroup": [],
                "leaveAdInterestGroup": [],
                "updateAdInterestGroups": [],
                "clearOriginJoinedAdInterestGroups": []
            },
            "runAdAuction": [],
            "generateBid": [],
            "scoreAd": [],
            "reportWin": [],
            "reportResult": []
        },
        "sharedStorage": [],
        "storageAccess": [],
        "topicsAPI": [],
        "userAgentClientHints": []

@yohhaan
Copy link
Member

yohhaan commented Jun 11, 2024

I am done with the edits I wanted to do on this custom metric, the tests have started to return results as expected since I fixed the double escape issue for the regex.

If people have more cycles to commit, they could try to find websites to test on for each API and see that the calls are actually correctly detected. I have done a few tests + the ones from the automated GitHub actions and things look good to me for now.
Also, note: some websites listed to test on require user interaction, thus the empty results sometimes.

Thanks to @Yash-Vekaria for all the help along the implementation of this custom metric and to @max-ostapenko as well!

@Yash-Vekaria
Copy link
Contributor Author

Thanks @yohhaan and @max-ostapenko for bringing this to completion.

@tunetheweb All the changes are locked from my side as well. We are ready to merge this PR.

@max-ostapenko Extraction of attestation has been removed as it was working in WPT without Privacy Sandbox API capturing in WPT but not with it on live checks (some async/promise issue thats hard to debug due to different behavior). We have decided to analyse it in analysis phase and remove it in custom metrics altogether.

@tunetheweb
Copy link
Member

I still think that is a lot of extra data to store per URL when in a lot of cases there won’t be any different. Is there no way to only report if the defaults are different?

I tried adding https://www.example.com to see what it would produce and reran the test but it didn’t pick them up. But I’m presuming it would produce the same as the blank ones?

@Yash-Vekaria
Copy link
Contributor Author

@tunetheweb One way to optimize further is that we remove all permission-policy flags (as shown below) because now that I'm thinking more about it, they should always be true (except for "top-level-storage-access"). Using command line activation fo Privacy Sandbox APIs for the browser being used makes these set to true by default.

"permissionsPolicy": {
            "attribution-reporting": true,
            "browsing-topics": true,
            "identity-credentials-get": true,
            "interest-cohort": true,
            "join-ad-interest-group": true,
            "private-aggregation": true,
            "run-ad-auction": true,
            "shared-storage": true,
            "shared-storage-select-url": true,
            "storage-access": true,
            "top-level-storage-access": false
        },

Next, for other APIs, we can have a domain-level mapping where key is domain and the value is a list of APIs used by that domain on the current website, rather than having separate keys for each API and same domain repeating in each of them.

So, reduced output will look like the following for operafootball.com:

{
    "_privacy-sandbox": {
        "top-level-storage-access": false
        "securepubads.g.doubleclick.net": ["runAdAuction", "getHighEntropyValues"],
        "pagead2.googlesyndication.com": ["runAdAuction", "joinAdInterestGroup", "getHighEntropyValues"],
        "www.googletagmanager.com": ["joinAdInterestGroup", "getHighEntropyValues"],
        "googleads.g.doubleclick.net": ["browsingTopicsHeader|true",],
    }
}

@tunetheweb
Copy link
Member

That looks a lot better if you can do that. And happy to include permission policy data if it;'s different from the default (i.e. explicitly blocked) but seems silly to log that same default of all true millions of times.

Copy link

Custom metrics for https://almanac.httparchive.org/en/2022/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_9X_2N

Custom metrics for https://www.operafootball.com/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_2W_2P
Changed custom metrics values:

{
    "_privacy-sandbox": {
        "top-level-storage-access": false,
        "privacySandBoxAPIUsage": {
            "securepubads.g.doubleclick.net": [
                "runAdAuction",
                "navigator.userAgentData.getHighEntropyValues",
                "fencedFrameJs",
                "sec-browsing-topics|false",
                "attribution-reporting-eligible"
            ],
            "pagead2.googlesyndication.com": [
                "runAdAuction",
                "navigator.userAgentData.getHighEntropyValues",
                "joinAdInterestGroup"
            ],
            "googleads.g.doubleclick.net": [
                "sec-browsing-topics|true",
                "sec-browsing-topics|false",
                "navigator.userAgentData.getHighEntropyValues"
            ],
            "www.googletagmanager.com": [
                "joinAdInterestGroup",
                "navigator.userAgentData.getHighEntropyValues"
            ],
            "fundingchoicesmessages.google.com": [
                "accept-ch|Sec-CH-UA-Arch, Sec-CH-UA-Bitness, Sec-CH-UA-Full-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Model, Sec-CH-UA-WoW64, Sec-CH-UA-Form-Factor, Sec-CH-UA-Platform, Sec-CH-UA-Platform-Version"
            ],
            "cdn.ampproject.org": [
                "reportWin"
            ],
            "tpc.googlesyndication.com": [
                "navigator.userAgentData.getHighEntropyValues"
            ],
            "www.gstatic.com": [
                "navigator.userAgentData.getHighEntropyValues"
            ],
            "www.googleadservices.com": [
                "attribution-reporting-eligible",
                "attribution-reporting-register-source|destination=https://homeserve.com|epsilon=undefined",
                "attribution-reporting-register-source|destination=https://fubo.tv|epsilon=undefined",
                "privateAggregation.enableDebugMode",
                "attribution-reporting-register-source|destination=https://t-mobile.com|epsilon=undefined"
            ],
            "pixel.tapad.com": [
                "accept-ch|Sec-CH-UA\nSec-CH-UA-Arch\nSec-CH-UA-Bitness\nSec-CH-UA-Full-Version-List\nSec-CH-UA-Mobile\nSec-CH-UA-Model\nSec-CH-UA-Platform\nSec-CH-UA-Platform-Version\nSec-CH-UA-WoW64",
                "accept-ch|Sec-CH-UA\u0000Sec-CH-UA-Arch\u0000Sec-CH-UA-Bitness\u0000Sec-CH-UA-Full-Version-List\u0000Sec-CH-UA-Mobile\u0000Sec-CH-UA-Model\u0000Sec-CH-UA-Platform\u0000Sec-CH-UA-Platform-Version\u0000Sec-CH-UA-WoW64"
            ]
        }
    }
}
Custom metrics for https://floc.glitch.me/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_KW_2Q
Changed custom metrics values:

{
    "_privacy-sandbox": {
        "top-level-storage-access": false,
        "privacySandBoxAPIUsage": {
            "floc.glitch.me": [
                "document.interestCohort"
            ]
        }
    }
}
Custom metrics for https://pets-animals-pets-cats.glitch.me/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_G3_2R
Changed custom metrics values:

{
    "_privacy-sandbox": {
        "top-level-storage-access": false,
        "privacySandBoxAPIUsage": {
            "pets-animals-pets-cats.glitch.me": [
                "document.browsingTopics|false"
            ],
            "topics-demo.glitch.me": [
                "document.browsingTopics|false"
            ]
        }
    }
}
Custom metrics for https://tennis-tennis.glitch.me/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_V5_2S
Changed custom metrics values:

{
    "_privacy-sandbox": {
        "top-level-storage-access": false,
        "privacySandBoxAPIUsage": {
            "tennis-tennis.glitch.me": [
                "document.browsingTopics|true"
            ],
            "topics-server.glitch.me": [
                "sec-browsing-topics|false"
            ]
        }
    }
}
Custom metrics for https://protected-audience-demo.web.app/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_60_2T
Changed custom metrics values:

{
    "_privacy-sandbox": {
        "top-level-storage-access": false,
        "privacySandBoxAPIUsage": []
    }
}
Custom metrics for https://protected-audience-demo-advertiser.web.app/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_DD_2V
Changed custom metrics values:

{
    "_privacy-sandbox": {
        "top-level-storage-access": false,
        "privacySandBoxAPIUsage": {
            "protected-audience-demo-dsp.web.app": [
                "joinAdInterestGroup"
            ]
        }
    }
}
Custom metrics for https://protected-audience-demo-publisher.web.app/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_3E_2W
Changed custom metrics values:

{
    "_privacy-sandbox": {
        "top-level-storage-access": false,
        "privacySandBoxAPIUsage": {
            "protected-audience-demo-ssp.web.app": [
                "runAdAuction"
            ]
        }
    }
}
Custom metrics for https://fedcm-rp-demo.glitch.me/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_AV_2X
Changed custom metrics values:

{
    "_privacy-sandbox": {
        "top-level-storage-access": false,
        "privacySandBoxAPIUsage": {
            "fedcm-idp-demo.glitch.me": [
                "navigator.credentials.get"
            ]
        }
    }
}
Custom metrics for https://fedcm-idp-demo.glitch.me/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_8T_2Y
Changed custom metrics values:

{
    "_privacy-sandbox": {
        "top-level-storage-access": false,
        "privacySandBoxAPIUsage": {
            "fedcm-idp-demo.glitch.me": [
                "navigator.credentials.get",
                "IdentityProvider.getUserInfo"
            ]
        }
    }
}
Custom metrics for https://private-state-token-redeemer.glitch.me/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_6V_2Z
Changed custom metrics values:

{
    "_privacy-sandbox": {
        "top-level-storage-access": false,
        "privacySandBoxAPIUsage": []
    }
}
Custom metrics for https://private-state-token-issuer.glitch.me/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_PT_30
Changed custom metrics values:

{
    "_privacy-sandbox": {
        "top-level-storage-access": false,
        "privacySandBoxAPIUsage": []
    }
}
Custom metrics for https://user-agent-client-hints.glitch.me/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_X6_31
Changed custom metrics values:

{
    "_privacy-sandbox": {
        "top-level-storage-access": false,
        "privacySandBoxAPIUsage": []
    }
}
Custom metrics for https://shared-storage-demo.web.app/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_2G_32
Changed custom metrics values:

{
    "_privacy-sandbox": {
        "top-level-storage-access": false,
        "privacySandBoxAPIUsage": []
    }
}
Custom metrics for https://www.example.com/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240611_Q8_33
Changed custom metrics values:

{
    "_privacy-sandbox": {
        "top-level-storage-access": false,
        "privacySandBoxAPIUsage": []
    }
}

@Yash-Vekaria
Copy link
Contributor Author

@tunetheweb I have refactored the code as shared in the previous comment. If it looks good, we can merge.

@tunetheweb
Copy link
Member

Thanks for that. Merging!

@tunetheweb tunetheweb merged commit b30c356 into HTTPArchive:main Jun 11, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants