Category API #6

Open
rviscomi opened this issue Aug 23, 2023 · 8 comments

@rviscomi
Member

For feature parity in v1 we'll also need an API to list all of the technologies for each category.

You can see how it works in the existing dashboard:

1. Enter a category name.
2. The Technology dropdown updates to display only the technologies of the filtered category.

The shape of the API should be an object where the keys are category names and the values are arrays of technologies sorted by popularity:

{
  "Most popular category by total number of origins": [
    "Most popular technology in the category",
    "Second most popular technology",
    "..."
  ],
  "Second most popular category": [
    "..."
  ]
}
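As a rough illustration of how a client might consume an object of this shape, here is a minimal Python sketch. The sample data, the placeholder technology names, and the function names `technologies_for` and `category_names` are all hypothetical; the sketch only relies on the keys and arrays already being ordered by popularity.

```python
from typing import Dict, List

# Hypothetical sample in the shape described above. Real data would list
# categories by number of origins, each with its technologies by popularity.
CATEGORIES: Dict[str, List[str]] = {
    "Blogs": ["Tech A", "Tech B"],
    "Domain parking": ["Tech C"],
}


def technologies_for(categories: Dict[str, List[str]], name: str) -> List[str]:
    """Return the technologies to show in the dropdown for one category.

    Unknown categories yield an empty list instead of raising KeyError.
    """
    return list(categories.get(name, []))


def category_names(categories: Dict[str, List[str]]) -> List[str]:
    """Return category names in their stored (popularity) order."""
    return list(categories)


print(technologies_for(CATEGORIES, "Blogs"))
print(category_names(CATEGORIES))
```

Because Python dicts (like JSON objects parsed by most clients) preserve insertion order, no extra sorting is needed on the consuming side.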

Here's an example query to extract the categories:

WITH categories AS (
  SELECT
    category,
    COUNT(DISTINCT root_page) AS origins
  FROM
    `httparchive.all.pages`,
    UNNEST(technologies) AS t,
    UNNEST(t.categories) AS category
  WHERE
    date = '2023-08-01' AND
    client = 'mobile'
  GROUP BY
    category
),

technologies AS (
  SELECT
    category,
    technology,
    COUNT(DISTINCT root_page) AS origins
  FROM
    `httparchive.all.pages`,
    UNNEST(technologies) AS t,
    UNNEST(t.categories) AS category
  WHERE
    date = '2023-08-01' AND
    client = 'mobile'
  GROUP BY
    category,
    technology
)


SELECT
  category,
  categories.origins,
  ARRAY_AGG(technology ORDER BY technologies.origins DESC) AS technologies
FROM
  categories
JOIN
  technologies
USING
  (category)
GROUP BY
  category,
  categories.origins
ORDER BY
  categories.origins DESC

I've formatted the output and saved the results to a static file: https://github.com/HTTPArchive/tech-report-apis/blob/main/static/categories.json

Also available via the CDN: https://cdn.httparchive.org/reports/cwvtech/categories.json
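The formatting step itself is not shown in this thread. A minimal sketch of how the query rows could be reshaped into the object described above, assuming each row is a dict with the `category`, `origins`, and `technologies` columns of the final SELECT (the function name `rows_to_categories` and the sample rows are hypothetical):

```python
import json
from typing import Any, Dict, Iterable, List, Mapping


def rows_to_categories(rows: Iterable[Mapping[str, Any]]) -> Dict[str, List[str]]:
    """Map each category to its technologies, most popular category first."""
    # The query already orders by origins; sorting again keeps the output
    # stable even if the rows arrive in a different order.
    ordered = sorted(rows, key=lambda row: row["origins"], reverse=True)
    return {row["category"]: list(row["technologies"]) for row in ordered}


# Made-up rows for illustration only, not real HTTP Archive results.
sample_rows = [
    {"category": "Domain parking", "origins": 10, "technologies": ["Tech C"]},
    {"category": "Blogs", "origins": 25, "technologies": ["Tech A", "Tech B"]},
]

print(json.dumps(rows_to_categories(sample_rows), indent=2))
```

The `technologies` arrays are left untouched because `ARRAY_AGG(... ORDER BY technologies.origins DESC)` already sorts them in the query.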

cc @sarahfossheim

@maceto
Collaborator

maceto commented Sep 9, 2023

@rviscomi, should we have any mandatory param for this endpoint?

@rviscomi
Member Author

I'd say only the category name should be a required parameter, but I'll defer to @sarahfossheim on whether it'd be useful to have any special behavior when it's omitted. For example, maybe it could list only the category names.

@sarahfossheim

We do need to get the list of category names as well (for the category filter dropdown), so that'd be useful, yes.
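The behavior proposed in the two comments above could be sketched roughly as follows. This is only an illustration of the idea (names only when the category is omitted), not the deployed implementation; the function name `list_categories` and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Union

Categories = Dict[str, List[str]]


def list_categories(
    data: Categories, category: Optional[str] = None
) -> Union[List[str], Categories]:
    """Return technologies for one category, or only names if none is given."""
    if category is None:
        # No category requested: return just the names, e.g. to populate
        # the category filter dropdown.
        return list(data)
    # Unknown categories produce an empty object rather than an error.
    return {category: data[category]} if category in data else {}


# Placeholder data for illustration only.
data = {"Blogs": ["Tech A", "Tech B"], "Domain parking": ["Tech C"]}
print(list_categories(data))
print(list_categories(data, "Blogs"))
```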

@maceto
Collaborator

maceto commented Sep 15, 2023

Example of how to consume this endpoint:

One category or multiple categories (--globoff stops curl from treating the square brackets as URL glob ranges):

curl --globoff --request GET \
  --url 'https://dev-gw-2vzgiib6.ue.gateway.dev/v1/categories?category=["Blogs"]'
curl --globoff --request GET \
  --url 'https://dev-gw-2vzgiib6.ue.gateway.dev/v1/categories?category=["Blogs","Domain%20parking"]'

Or, for only the category names:

curl --request GET \
  --url 'https://dev-gw-2vzgiib6.ue.gateway.dev/v1/categories?onlyname=true'

@rviscomi @sarahfossheim let me know if this format works for you.

@rviscomi
Member Author

rviscomi commented Sep 15, 2023

Per our chat, let's change this to a comma-separated list (here and in the other APIs):

https://dev-gw-2vzgiib6.ue.gateway.dev/v1/categories?category=Blogs,Domain%20parking

On the frontend we'll need to URL-encode each input param.
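A minimal sketch of the comma-separated format, assuming each category name is percent-encoded on its own and the encoded names are then joined with literal commas. The helper names `build_categories_url` and `parse_category_param` are hypothetical, and the parsing half only guesses at what a server might do.

```python
from typing import List
from urllib.parse import parse_qs, quote, urlsplit

BASE_URL = "https://dev-gw-2vzgiib6.ue.gateway.dev/v1/categories"


def build_categories_url(categories: List[str], base_url: str = BASE_URL) -> str:
    """Percent-encode each name separately, then join with literal commas."""
    encoded = ",".join(quote(name, safe="") for name in categories)
    return f"{base_url}?category={encoded}"


def parse_category_param(url: str) -> List[str]:
    """Decode the category parameter and split it back into names."""
    query = parse_qs(urlsplit(url).query)
    raw = query.get("category", [""])[0]
    return [name for name in raw.split(",") if name]


url = build_categories_url(["Blogs", "Domain parking"])
print(url)
# https://dev-gw-2vzgiib6.ue.gateway.dev/v1/categories?category=Blogs,Domain%20parking
print(parse_category_param(url))
# ['Blogs', 'Domain parking']
```

One caveat with this approach: a category name that itself contains a comma would be ambiguous once the query string is decoded, so it would need separate handling.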

@maceto
Collaborator

maceto commented Sep 16, 2023

@rviscomi @sarahfossheim all the changes discussed are already deployed.

New URL: https://dev-gw-2vzgiib6.uk.gateway.dev/v1/categories

Documentation: https://github.com/HTTPArchive/tech-report-apis#get-categories

@maceto
Collaborator

maceto commented Dec 4, 2023

Hi @rviscomi,

Why does the query for categories contain WHERE ... client = 'mobile'? Are there no categories for desktop?

@rviscomi
Member Author

rviscomi commented Dec 4, 2023

Every technology category that exists on desktop pages almost certainly exists on mobile, so this was a small query optimization to avoid processing half the dataset.
