Privacy 2020 #913

Closed
10 tasks done
foxdavidj opened this issue Jun 27, 2020 · 29 comments
Labels: 2020 chapter, writing

Comments

@foxdavidj
Contributor

foxdavidj commented Jun 27, 2020

Part II Chapter 10: Privacy

Content team

Authors: @ydimova
Reviewers: @ldevernay
Analysts: @ydimova, @max-ostapenko
Draft: Doc
Queries: *.sql
Results: Sheet

Content team lead: @ydimova

Welcome chapter contributors! You'll be using this issue throughout the chapter lifecycle to coordinate on the content planning, analysis, and writing stages.

The content team is made up of the following contributors:

New contributors: If you're interested in joining the content team for this chapter, just leave a comment below and the content team lead will loop you in.

Note: To ensure that you get notifications when tagged, you must be "watching" this repository.

Milestones

0. Form the content team

  • Jul 6th: Project owners have selected an author to be the content team lead
  • Jul 13th: The content team has at least one author, reviewer, and analyst (minimally viable team formed)

1. Plan content

  • Jul 20th: The content team has completed the chapter outline in the draft doc
  • Jul 27th: Analysts have triaged the feasibility of all proposed metrics

2. Gather data

  • Aug 1 - 31: August crawl
  • Sep 7th: Analysts have queried all metrics and saved the output to the results sheet

3. Validate results

4. Draft content

  • Nov 12th: Authors have completed the first draft in the doc
  • Nov 26th: The content team has prototyped all data visualizations

5. Publication

  • Nov 26th: The content team has reviewed the final draft, converted to markdown, and filed a PR to add it to the 2020 content directory
  • Dec 9th: Target launch date
@foxdavidj foxdavidj added help wanted Extra attention is needed analysis Querying the dataset writing Related to wording and content labels Jun 27, 2020
@foxdavidj foxdavidj added this to the 2020 Content Planning milestone Jun 27, 2020
@tunetheweb
Member

Great (and very relevant!) topic. Should it be merged with Cookies, though, since the two are often closely related? Or do we think there's enough for each to be its own chapter?

@rviscomi rviscomi added the 2020 chapter Tracking issue for a 2020 chapter label Jun 27, 2020
@foxdavidj
Contributor Author

foxdavidj commented Jun 28, 2020

@bazzadp Good q. I think this is something that will be made clear as we brainstorm what metrics should belong in this chapter. If there ends up being a lot of overlap and not enough unique to this chapter alone... then we can talk about merging it with another chapter like Cookies

That's what we're doing with some other chapters like JAMstack

@ldevernay
Contributor

I think I could contribute as a reviewer on this topic.

@zcorpan
Contributor

zcorpan commented Jul 1, 2020

I nominate @johnwilander (see #876 )

@rviscomi rviscomi added help wanted: reviewers This chapter is looking for reviewers help wanted: analysts This chapter is looking for data analysts and removed help wanted Extra attention is needed labels Jul 2, 2020
@ydimova
Contributor

ydimova commented Jul 6, 2020

I would like to volunteer as an analyst/author.

@rviscomi
Member

rviscomi commented Jul 8, 2020

Thanks @ydimova! I'll put you down as an analyst. Would you also mind sharing some of your qualifications/experience with web privacy? I'm not able to find much info from a cursory search, as your full name is not associated with your GitHub profile. Just want to check before assigning the chapter to you :)

@johnwilander are you interested in coauthoring this chapter?

@ydimova
Contributor

ydimova commented Jul 11, 2020

@rviscomi Of course! I'm a computer scientist and PhD student in web privacy and security (no publications yet). I have been using the httparchive dataset for some of my research so I think I could contribute as an analyst and coauthor.

@rviscomi rviscomi removed help wanted: analysts This chapter is looking for data analysts help wanted: reviewers This chapter is looking for reviewers labels Jul 11, 2020
@rviscomi
Member

Thanks @ydimova, you sound like a great fit for this chapter! I've added you as an author. Can I also put you down as the content team lead? You'd be the point person for keeping the chapter on schedule. You're also free to add people as coauthors/reviewers as needed.

A few resources to get you started:

I've also added @ldevernay as a reviewer.

@johnwilander we'd still love to have you contribute as a coauthor/reviewer. Let us know!

@ydimova
Contributor

ydimova commented Jul 13, 2020

@rviscomi Sure, thanks!

@foxdavidj
Contributor Author

Hey @ydimova, just wanted to check in and see if there's anything you need from me to keep things moving forward.

We're trying to have the outline and metrics settled by the end of the week so we have time to configure the Web Crawler to track everything you need :)

Also, can you remind your team to properly add and credit themselves in your chapter's Google Doc?

@tunetheweb
Member

Unfortunately we've had to close the Cookie chapter, but think it's heavily related to Privacy anyway so that's another interesting angle to cover in this chapter if you want!

@max-ostapenko
Contributor

Migrating to this one from Cookie chapter as an analyst ;)

@foxdavidj
Contributor Author

@ydimova How is the outline coming along? We want to have that wrapped up by the end of the week so we have time to set up our Web Crawler :)

@foxdavidj
Contributor Author

@ydimova Also don't forget to join the #web-almanac slack if you haven't already so @paulcalvano can invite you to the Analysts channel and help set you up.

@rockeynebhwani
Contributor

@ydimova - Since the Cookies chapter was closed, this may be of interest to this group: https://github.com/AliasIO/wappalyzer/issues/3219

It would be good to report the percentage of sites using cookie consent management solutions (within the EU we should obviously see a higher percentage) and, of those sites, how many use explicit versus implicit consent. Because of the way HTTP Archive works, sites with explicit consent will have fewer third parties reported, which can also affect the Third Parties chapter of the Web Almanac. I am not sure whether you are thinking in that direction or whether any such analysis has been done before.

@rviscomi / @simonhearne / @patrickhulce

@ydimova
Contributor

ydimova commented Jul 17, 2020

@rockeynebhwani I think it would indeed be interesting to measure the percentage of websites using popular and less popular cookie consent management platforms and IAB Europe's TCF (if feasible).

@max-ostapenko @ldevernay Could you join the outline document :)
https://docs.google.com/document/d/1hIllsWd_IqfYuGT_qUFA2ruoQaIvcbuYpNHJLB4AqkU
Feel free to change/add anything

@rviscomi
Member

@ydimova I've sent you an invite to join the 2020 Authors team, which we'll use to communicate to authors about upcoming milestones. Could you visit https://github.com/HTTPArchive to accept the invitation? I want to make sure you're included in our messages :)

@rockeynebhwani
Contributor

@ydimova @bazzadp @rviscomi - Created a PR for Wappalyzer - https://github.com/AliasIO/wappalyzer/pull/3227. This team can add more technology vendors to the list on top of this.

@foxdavidj
Contributor Author

@ydimova @max-ostapenko Noticed there are a few metrics you might need custom metrics written for (e.g., finding policy links on a webpage). Can you make a list of what custom metrics you need by EOD tomorrow?
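
As a rough illustration of what a policy-link custom metric might look like, here is a minimal sketch. The function name `findPolicyLinks`, the keyword pattern, and the `{ href, text }` input shape are all assumptions made for this example; none of them come from the chapter outline or from the existing custom metrics.

```javascript
// Hypothetical keyword pattern; an assumption for illustration only.
// No "g" flag, so repeated test() calls carry no state between them.
const POLICY_PATTERN = /privacy|cookie policy|data protection/i;

// links: array of { href, text } objects describing the anchors on a page.
// Returns the links whose visible text or URL matches the pattern.
function findPolicyLinks(links) {
  return links.filter(({ href = "", text = "" }) =>
    POLICY_PATTERN.test(text) || POLICY_PATTERN.test(href)
  );
}

// In a page context the input could be gathered like this:
// const links = [...document.querySelectorAll("a")]
//   .map((a) => ({ href: a.href, text: a.textContent }));
```

Taking plain objects rather than DOM nodes keeps the matching logic testable outside a browser; the keyword list would need tuning (and probably translations) before being used on real crawl data.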

@rockeynebhwani
Contributor

@rockeynebhwani I think it would indeed be interesting to measure the percentage of websites using popular and less popular cookie consent management platforms and IAB Europe's TCF (if feasible).

@max-ostapenko @ldevernay Could you join the outline document :)
https://docs.google.com/document/d/1hIllsWd_IqfYuGT_qUFA2ruoQaIvcbuYpNHJLB4AqkU
Feel free to change/add anything

@ydimova - I have done the PR, and Wappalyzer now has a new category called 'Cookie Compliance'. For now I have managed to add 17 vendors to this category. While working on this, I realized that there are far too many vendors in this space, and I don't have time to add them all. I am not familiar with IAB Europe's TCF solution, but if you can give me a pattern we can use to detect it and a sample site, I think we still have time to add it to my PR.

@ydimova
Contributor

ydimova commented Jul 27, 2020

@rockeynebhwani Great, thank you! Maybe we could just stick to the biggest vendors?
The presence of the TCF framework can easily be evaluated by detecting a __cmp function on the window object.
For instance, 'typeof window.__cmp !== "undefined"' would work.
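
A minimal sketch of that check, written as a standalone function so it can be tried outside a browser. The helper name `detectTcf` is hypothetical, and the extra `__tcfapi` check is an addition not discussed above: it is the entry point defined by TCF v2.0, while `__cmp` belongs to TCF v1. The check inspects the property rather than calling it, since calling a missing `__cmp` would throw.

```javascript
// Hypothetical helper: reports which TCF entry points a window-like object exposes.
function detectTcf(win) {
  // typeof on a property never throws, even when the property is missing.
  const v1 = typeof win.__cmp === "function";    // TCF v1 entry point
  const v2 = typeof win.__tcfapi === "function"; // TCF v2.0 entry point
  return { v1, v2, any: v1 || v2 };
}

// In a browser console this would be called as: detectTcf(window)
console.log(detectTcf({ __cmp: () => {} }));
```

Testing for `"function"` is slightly stricter than testing for `!== "undefined"`; whether that stricter check suits every consent platform is an assumption that would need verifying on real sites.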

@rockeynebhwani
Contributor

Thanks @ydimova. Can you please give me an example site on which to test TCF detection?

@max-ostapenko
Contributor

max-ostapenko commented Jul 27, 2020

@rockeynebhwani FYI, here are the official framework docs from IAB: https://iabeurope.eu/tcf-2-0/
A test website would indeed help.
I was looking into vendor domain lists (e.g. https://vendorlist.consensu.org/vendorinfo.json), but have found no valuable data so far.

@ydimova
Contributor

ydimova commented Jul 27, 2020

@rockeynebhwani @max-ostapenko I found it on https://www.letudiant.fr/ by evaluating "window.__cmp" (without the brackets). https://www.senscritique.com/ is another one.

@foxdavidj
Contributor Author

@ydimova @max-ostapenko for the two milestones overdue on July 27 could you check the boxes if:

  • the outline has been reviewed and all feasible metrics have been identified
  • any necessary custom metrics have been created and you've created a draft PR to track which feasible metrics have had their queries implemented (we've updated the milestone description to clarify this)

Keeping the milestone checklist up to date helps us to see at a glance how all of the chapters are progressing. Thanks for helping us to stay on schedule!

@max-ostapenko max-ostapenko linked a pull request Jul 30, 2020 that will close this issue
10 tasks
@max-ostapenko max-ostapenko removed a link to a pull request Jul 30, 2020
10 tasks
@max-ostapenko max-ostapenko mentioned this issue Aug 14, 2020
10 tasks
@foxdavidj
Contributor Author

I've updated the chapter metadata at the top of this issue to link to the public spreadsheet that will be used for this chapter's query results. The sheet serves 3 purposes:

  1. Enable authors/reviewers to analyze the results for each metric without running the queries themselves
  2. Generate data visualizations to be embedded in the chapter
  3. Serve as a public audit trail of this chapter's data collection/analysis, linked from the chapter footer

@foxdavidj
Contributor Author

@ydimova in case you missed it, we've adjusted the milestones to push the launch date back from November 9 to December 9. This gives all chapters exactly 7 weeks from now to wrap up the analysis, write a draft, get it reviewed, and submit it for publication. So the next milestone will be to complete the first draft by November 12.

However if you're still on schedule to be done by the original November 9 launch date we want you to know that this change doesn't mean your hard work was wasted, and that you'll get the privilege of being part of our "Early Access" launch.

Please see the link above for more info and reach out to @rviscomi or me if you have any questions or concerns about the timeline. We hope this change gives you a bit more breathing room to finish the chapter comfortably and we're excited to see it go live!

@rviscomi rviscomi added ASAP This issue is blocking progress and removed analysis Querying the dataset labels Nov 30, 2020
@max-ostapenko
Contributor

@ydimova FYI, there is cookie parameter data in the Security chapter, in case you want to share some insights.

8 participants