topics classifier

This repository reproduces Google's implementations of the Topics API for the Web and for Android. I mainly use it in my research to study the privacy and utility guarantees of these proposals: PETS'24 and SecWeb'24.

Getting started

Clone this repository, then install the required dependencies. A Dockerfile is provided under .devcontainer/; see here for direct integration with VS Code or for manual deployment instructions.

Usage

usage: python3 classify.py [-h] -mv {chrome1,chrome4,chrome5,android1,android2} -ct {topics-api,model-only,raw-model} -i INPUTS [INPUTS ...] [-id [INPUTS_DESCRIPTION ...]] [-ohr]

Reimplementations of the Topics API

options:
  -h, --help            show this help message and exit
  -id [INPUTS_DESCRIPTION ...], --inputs_description [INPUTS_DESCRIPTION ...]
                        additional input description(s) (for android classification)
  -ohr, --output_human_readable
                        make output human readable, does not work with --classification-type raw-model

required optional arguments:
  -mv {chrome1,chrome4,chrome5,android1,android2}, --model_version {chrome1,chrome4,chrome5,android1,android2}
                        model version to use
  -ct {topics-api,model-only,raw-model}, -classification_type {topics-api,model-only,raw-model}
                        type of classification: either run the full Topics classification (override+model+filtering), the model only (model+filtering), or get the raw classification by the model
  -i INPUTS [INPUTS ...], --inputs INPUTS [INPUTS ...]
                        input(s) to classify
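
To script several runs, the options above can be assembled programmatically. The following is a minimal sketch, not part of this repository: `build_command` and the `STAGES` table are hypothetical names, the stage lists are copied from the `-ct` help text, and rejecting `-ohr` together with `raw-model` is an assumption based on the note that the two do not work together.

```python
import shlex

# Choices copied from the usage text above.
MODEL_VERSIONS = ("chrome1", "chrome4", "chrome5", "android1", "android2")

# Stages run by each classification type, as described in the -ct help text.
STAGES = {
    "topics-api": ("override", "model", "filtering"),
    "model-only": ("model", "filtering"),
    "raw-model": ("model",),
}


def build_command(model_version, classification_type, inputs,
                  descriptions=None, human_readable=False):
    """Return the argv list for one classify.py run (hypothetical helper)."""
    if model_version not in MODEL_VERSIONS:
        raise ValueError(f"unknown model version: {model_version}")
    if classification_type not in STAGES:
        raise ValueError(f"unknown classification type: {classification_type}")
    if not inputs:
        raise ValueError("at least one input is required")
    # Assumption: refuse the combination the help text says does not work.
    if human_readable and classification_type == "raw-model":
        raise ValueError("-ohr does not work with raw-model")

    cmd = ["python3", "classify.py",
           "-mv", model_version,
           "-ct", classification_type,
           "-i", *inputs]
    if descriptions:
        # Additional descriptions are meant for Android classification.
        cmd += ["-id", *descriptions]
    if human_readable:
        cmd.append("-ohr")
    return cmd


if __name__ == "__main__":
    argv = build_command("chrome5", "topics-api", ["example.com"],
                         human_readable=True)
    # Print a shell-safe command line; pass argv to subprocess.run to execute.
    print(shlex.join(argv))
```

Returning a list rather than a string keeps inputs with spaces or shell metacharacters intact when the command is later handed to `subprocess.run`.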

Supported versions

  • chrome1

    • Web model version: 1
    • Override list: 9 254 domains (about 10k)
    • Web taxonomy version: 1 (349 topics)
  • chrome4

    • Web model version: 4
    • Override list: 47 128 domains (about 50k); 625 of these domains are incorrectly formatted in the list shipped by Google, see here
    • Web taxonomy version: 2 (469 topics)
    • Introduction of utility buckets: version 1
  • chrome5

    • Web model version: 5
    • Override list: 45 270 domains (about 45k)
    • Web taxonomy version: 2 (469 topics)
    • Utility buckets version: 1
    • Note: the only change from chrome4 is the modified override list, see here
  • android1

    • Android model version: 1
    • Override list: 10 012 apps (about 10k)
    • Android taxonomy version: 1 (349 topics)
  • android2

    • Android model version: 2
    • Override list: 10 014 apps (about 10k)
    • Android taxonomy version: 2 (446 topics)
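
For scripts that need these figures, the list above can be captured as a small lookup table. This is only a sketch: `ModelInfo`, `SUPPORTED`, and `versions_for` are hypothetical names that do not exist in this repository, and the numbers are transcribed from the list above.

```python
from typing import NamedTuple, Optional


class ModelInfo(NamedTuple):
    platform: str                   # "web" or "android"
    model_version: int
    override_entries: int           # domains (web) or apps (android)
    taxonomy_version: int
    topics: int                     # number of topics in the taxonomy
    utility_buckets: Optional[int]  # None where the list mentions no buckets


# Values transcribed from the "Supported versions" list.
SUPPORTED = {
    "chrome1": ModelInfo("web", 1, 9254, 1, 349, None),
    "chrome4": ModelInfo("web", 4, 47128, 2, 469, 1),
    "chrome5": ModelInfo("web", 5, 45270, 2, 469, 1),
    "android1": ModelInfo("android", 1, 10012, 1, 349, None),
    "android2": ModelInfo("android", 2, 10014, 2, 446, None),
}


def versions_for(platform):
    """Return the supported version names for one platform, in listed order."""
    return [name for name, info in SUPPORTED.items()
            if info.platform == platform]


if __name__ == "__main__":
    for name in versions_for("web"):
        info = SUPPORTED[name]
        print(f"{name}: taxonomy v{info.taxonomy_version}, {info.topics} topics")
```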

If a new model for the Topics API has been released and is not available here yet, please let me know by contacting me or opening an issue.