SecurityEngineering/ThirdPartyCookies/Telemetry

Defunct: this measurement plan has been defunct since Firefox 40, when the Telemetry measures expired.

This Telemetry measurement plan measures the impact of blocking third-party cookies from non-visited sites.

Initial Questions and Answers

Initial whiteboarding session results: whiteboard.

It is critical that every ping records the profile's cookie setting: allow-from-visited, allow-from-firstparty, or allow-all. This is how we will compare the effect across cookie-blocking settings.
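
As a minimal sketch of what this requirement implies, each ping could carry the profile's cookie setting next to the histograms, so results can be split by setting during analysis. The `CookieSetting` enum, the `build_ping` helper, and the payload field names are hypothetical and only illustrate the idea; they are not the actual Telemetry ping format.

```python
from enum import Enum


class CookieSetting(Enum):
    # The three settings named in this plan.
    ALLOW_FROM_VISITED = "allow-from-visited"
    ALLOW_FROM_FIRSTPARTY = "allow-from-firstparty"
    ALLOW_ALL = "allow-all"


def build_ping(setting, histograms):
    # Hypothetical payload: the setting travels with every ping so the
    # histograms can be grouped by cookie-blocking setting later.
    return {"cookieSetting": setting.value, "histograms": dict(histograms)}


ping = build_ping(CookieSetting.ALLOW_FROM_VISITED, {"breadth": [0] * 30})
```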

For each third-party site, how many first parties embed it and result in cookie traffic?

This question is a little unclear, but we need to know how many sites are affected when their cookies are blocked across a wide range of first parties. So for each third-party site: how many first parties reference it (involving cookies), and how often are its cookies blocked?

To answer

We will obtain two histograms: one to measure breadth of exposure, and one to measure impact of the new default. Site granularity is ETLD+1 (x.y.com and z.y.com are the same for this measurement) since the blocking feature's granularity is the same.
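
To make the ETLD+1 granularity concrete, here is a rough sketch of the normalization. It is only an approximation: a real implementation would consult the full Public Suffix List, while `MULTI_LABEL_SUFFIXES` below is a tiny hypothetical stand-in and `site_key` is a name chosen for this example.

```python
# Hypothetical, tiny stand-in for the Public Suffix List (assumption: a real
# implementation would use the full list rather than this set).
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au"}


def site_key(host):
    """Reduce a host name to an approximate ETLD+1."""
    labels = host.lower().rstrip(".").split(".")
    # Keep three labels when the last two form a known multi-label suffix.
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


# x.y.com and z.y.com collapse to the same site for this measurement.
same_site = site_key("x.y.com") == site_key("z.y.com")
```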

Breadth

First, we will collect a histogram of how many first parties each site shows up on. In depth: each bucket covers a range of first-party counts and holds the number of third-party sites that were referenced on that many first parties.

Example: socialnetwork.com has a "share" widget that's seen on 30 different web sites. socialnetwork.com is placed in the bucket labeled "30-34".

In the end, we won't be recording the site names, but we will record how many sites fall into each of these categories.

Granularity: Initially, we will create 30 buckets of width 5. The first bucket will be 0-4, and the last bucket will be for sites loaded as third party on 145 or more first parties.
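
The bucketing described above can be sketched as follows. The function names are chosen for this example; the defaults mirror the 30 buckets of width 5, with the last bucket collecting everything from 145 upward.

```python
def bucket_index(count, width=5, n_buckets=30):
    """Index of the bucket for `count`; the last bucket collects overflow."""
    return min(count // width, n_buckets - 1)


def bucket_label(count, width=5, n_buckets=30):
    """Human-readable label of the bucket that `count` falls into."""
    i = bucket_index(count, width, n_buckets)
    low = i * width
    if i == n_buckets - 1:
        return f"{low}+"
    return f"{low}-{low + width - 1}"


print(bucket_label(30))   # 30-34
print(bucket_label(200))  # 145+
```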

Impact

Second, we will collect a histogram of how many first parties each site is denied cookies on. In depth: each bucket covers a range of first-party counts and holds the number of third-party sites that were blocked from cookies on that many first parties.

Example: A user who has visited socialnetwork.com sees its "share" widget on 30 different web sites while browsing. Its cookies are allowed, so socialnetwork.com is placed in the bucket labeled "0-4". For another user who has never visited socialnetwork.com, Firefox blocks its cookies on all 30 of the first parties they've visited, so it appears in bucket "30-34".
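
The two users in this example can be sketched as below. The rule is deliberately simplified (an assumption for illustration): a third party's cookies are allowed only if the user has previously visited that site directly. The helper name and the generated first-party names are hypothetical.

```python
def blocked_first_parties(third_party, first_parties, visited_sites):
    """First parties on which `third_party` would have its cookie blocked.

    Simplified rule (assumption): cookies are allowed only if the user has
    previously visited the third party directly.
    """
    if third_party in visited_sites:
        return set()
    return set(first_parties)


first_parties = [f"site{i}.com" for i in range(30)]

# User A has visited socialnetwork.com; user B never has.
user_a = blocked_first_parties("socialnetwork.com", first_parties, {"socialnetwork.com"})
user_b = blocked_first_parties("socialnetwork.com", first_parties, set())

print(len(user_a))  # 0, which falls in the lowest bucket
print(len(user_b))  # 30, which falls in bucket "30-34"
```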

Granularity: Initially, we will create 30 buckets of width 5. The first bucket will be 0-4, and the last bucket will be for sites blocked from cookies on 145 or more first parties.

Implementation

Create a hashtable (ht) keyed on third party sites (domain stripped to ETLD+1).

Each entry x of ht is a pair of lists: { blocked, allowed }

  • ht[x].blocked = list of sites (stripped to ETLD+1) where x's set-cookie was blocked as a third party
  • ht[x].allowed = list of sites (stripped to ETLD+1) where x's set-cookie was allowed as a third party


On attempted set-cookie:

 f = firstPartyDomain (stripped to ETLD+1)
 x = domain setting the cookie (stripped to ETLD+1)
 if x is a third party (x != f):
   if set-cookie is blocked by policy:
     ht[x].blocked.appendIfNotExists(f)
   else:
     ht[x].allowed.appendIfNotExists(f)
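
A runnable sketch of this bookkeeping, assuming both domains are already stripped to ETLD+1 and that "third party" simply means the two differ. It uses sets instead of lists so that appendIfNotExists becomes a plain `add()`; the function name and sample domains are hypothetical.

```python
from collections import defaultdict

# ht maps a third-party site (ETLD+1) to the first parties on which its
# set-cookie attempts were blocked or allowed.
ht = defaultdict(lambda: {"blocked": set(), "allowed": set()})


def on_set_cookie(first_party, cookie_site, blocked_by_policy):
    # Both arguments are assumed to be already stripped to ETLD+1.
    if cookie_site == first_party:
        return  # first-party cookie: not part of this measurement
    key = "blocked" if blocked_by_policy else "allowed"
    ht[cookie_site][key].add(first_party)


on_set_cookie("news.com", "socialnetwork.com", blocked_by_policy=True)
on_set_cookie("news.com", "socialnetwork.com", blocked_by_policy=True)  # duplicate, no effect
on_set_cookie("blog.com", "socialnetwork.com", blocked_by_policy=False)
on_set_cookie("news.com", "news.com", blocked_by_policy=False)  # first party, ignored
```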

On roll-up (prepare to ping):

 HA, HB = new histograms.
 for each key x in ht:
   HA.incrementBucketFor(ht[x].allowed.length)
   HB.incrementBucketFor(ht[x].blocked.length)

Reset ht and HA, HB per session.
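
The roll-up and reset can be sketched like this, with each histogram stored as a plain list of bucket counts. The sample `ht` contents and function names are made up for illustration. Note that only counts reach the histograms; site names stay in `ht` and are discarded on reset.

```python
def bucket_index(count, width=5, n_buckets=30):
    # The last bucket collects every count at or above (n_buckets - 1) * width.
    return min(count // width, n_buckets - 1)


def roll_up(ht, width=5, n_buckets=30):
    """Build the allowed (HA) and blocked (HB) histograms from ht."""
    ha = [0] * n_buckets
    hb = [0] * n_buckets
    for entry in ht.values():
        ha[bucket_index(len(entry["allowed"]), width, n_buckets)] += 1
        hb[bucket_index(len(entry["blocked"]), width, n_buckets)] += 1
    return ha, hb


ht = {
    "socialnetwork.com": {"allowed": set(), "blocked": {f"site{i}.com" for i in range(30)}},
    "ads.example": {"allowed": {"news.com", "blog.com"}, "blocked": set()},
}
ha, hb = roll_up(ht)
ht.clear()  # reset per session, once the histograms are handed to the ping
```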

Expected Results

Our hypothesis is that the most widely embedded third-party sites (those present on a large number of other sites) will also be the sites most impacted by this change. Some widely deployed third parties will not be impacted for users who have an existing relationship with them.

How many set-cookie attempts did each third party make?

This question is not useful alone, but measures the "persistence" dimension. Together with the "breadth" measurement, we can see which types of sites will be most affected. Will it be sites that show up all over and try setting cookies often? Or will it be sites that only try to set cookies on a few sites?

To answer

We will obtain two histograms: one to measure set-cookie attempts, and one to measure impact of the new default. Site granularity is ETLD+1 (x.y.com and z.y.com are the same for this measurement) since the blocking feature's granularity is the same.

Attempts

First, we measure how many sites made a given number of attempts to set or get third-party cookies (in 24 hours). Each histogram bucket covers a range of attempt counts and accumulates the number of sites that made that many attempts in 24 hours.

Example: A user browses to first.com, and over the course of 24 hours it causes 30 requests to third.com (resulting in set-cookie attempts). third.com is added to the histogram bucket "30-39".
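
This example can be reproduced with a small sketch that counts attempts per third-party site, keyed the same way as the hashtable in the Implementation section. The request list is made up, and the overflow bucket is left out for brevity.

```python
from collections import Counter

# 30 requests to third.com caused by browsing first.com within one 24-hour
# window; each entry is a (first party, third party) pair.
requests = [("first.com", "third.com")] * 30

# Count set-cookie attempts per third-party site.
attempts = Counter(third for first, third in requests if first != third)


def bucket_label(count, width=10):
    # Label of the width-10 bucket containing `count` (no overflow handling).
    low = (count // width) * width
    return f"{low}-{low + width - 1}"


print(bucket_label(attempts["third.com"]))  # 30-39
```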

Granularity: Initially, we will create 50 buckets of width 10. The first bucket will be 0-9, and the last bucket will be for sites that have attempted to set third party cookies 500 or more times in the last 24 hours.

Impact

Second, we measure how many sites have had a given number of attempts to set or get third-party cookies blocked (in 24 hours). Each histogram bucket covers a range of blocked-attempt counts and accumulates the number of sites that had that many attempts blocked in 24 hours.

Example: A user browses to first.com, and over the course of 24 hours it causes 30 requests to third.com (all with set-cookie attempts that were blocked by the new policy). third.com is added to the histogram bucket "30-39". For another user who has previously visited third.com, it is recorded in the "0-9" bucket.

Granularity: Initially, we will create 50 buckets of width 10. The first bucket will be 0-9, and the last bucket will be for sites that have had 500 or more third-party set-cookie attempts blocked in the last 24 hours.

Implementation

Create a hashtable (ht) keyed on third party sites (domain stripped to ETLD+1).

Each entry x of ht is a pair of counts: { blockedAttempts, allowedAttempts }

  • ht[x].blockedAttempts = number of loads where x's set-cookie was blocked as a third party
  • ht[x].allowedAttempts = number of loads where x's set-cookie was allowed as a third party


On attempted set-cookie:

 x = domain setting the cookie (stripped to ETLD+1)
 if x is a third party:
   if set-cookie is blocked by policy:
     ht[x].blockedAttempts++
   else:
     ht[x].allowedAttempts++
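
A runnable sketch of the counters, again assuming domains are already stripped to ETLD+1 and that a third party is any cookie-setting site that differs from the first party. The function name and sample domains are hypothetical.

```python
from collections import defaultdict

# ht maps a third-party site (ETLD+1) to its blocked and allowed attempt counts.
ht = defaultdict(lambda: {"blockedAttempts": 0, "allowedAttempts": 0})


def on_set_cookie(first_party, cookie_site, blocked_by_policy):
    if cookie_site == first_party:
        return  # first-party cookie: not counted
    key = "blockedAttempts" if blocked_by_policy else "allowedAttempts"
    ht[cookie_site][key] += 1


for _ in range(30):
    on_set_cookie("first.com", "third.com", blocked_by_policy=True)
on_set_cookie("first.com", "widgets.example", blocked_by_policy=False)
```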

On roll-up (prepare to ping):

 HA, HB = new histograms.
 for each key x in ht:
   HA.incrementBucketFor(ht[x].allowedAttempts)
   HB.incrementBucketFor(ht[x].blockedAttempts)

Reset ht and HA, HB per session.
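
The roll-up for attempt counts can be sketched the same way. `width` and `n_buckets` are parameters of this sketch, and the overflow boundary falls wherever they place it; the sample `ht` contents are made up.

```python
def roll_up(ht, width=10, n_buckets=50):
    """Build the allowed (HA) and blocked (HB) attempt histograms from ht."""
    ha = [0] * n_buckets
    hb = [0] * n_buckets
    for entry in ht.values():
        # Counts past the last regular bucket all land in the final bucket.
        ha[min(entry["allowedAttempts"] // width, n_buckets - 1)] += 1
        hb[min(entry["blockedAttempts"] // width, n_buckets - 1)] += 1
    return ha, hb


ht = {
    "third.com": {"allowedAttempts": 0, "blockedAttempts": 30},
    "widgets.example": {"allowedAttempts": 12, "blockedAttempts": 0},
}
ha, hb = roll_up(ht)
ht.clear()  # reset per session, once the histograms are handed to the ping
```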

Expected Results

Our hypothesis is that the most persistent sites (those with many attempts) will also be the sites impacted by this change.

Future Work

We also want to measure what types of sites are impacted by this.

Instead of bucketing sites by how many cookies were blocked, we can make the buckets represent an Alexa site categorization and count how many sites in each category are affected by this third-party cookie change. This may have performance impacts and will require more engineering work, so we'll do it second.
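
A rough sketch of what category bucketing might look like. The `CATEGORY_OF` table is a hypothetical stand-in for an external categorization, and "affected" is assumed here to mean that the site had its cookie blocked on at least one first party.

```python
from collections import Counter

# Hypothetical site-to-category table; in practice this would come from an
# external categorization such as the Alexa one mentioned above.
CATEGORY_OF = {
    "socialnetwork.com": "social",
    "third.com": "advertising",
    "widgets.example": "advertising",
}


def categories_affected(ht):
    """Count affected third parties (any blocked first party) per category."""
    counts = Counter()
    for site, entry in ht.items():
        if entry["blocked"]:
            counts[CATEGORY_OF.get(site, "uncategorized")] += 1
    return counts


ht = {
    "socialnetwork.com": {"blocked": {"news.com"}, "allowed": set()},
    "third.com": {"blocked": {"first.com"}, "allowed": set()},
    "widgets.example": {"blocked": set(), "allowed": {"blog.com"}},
    "unknown.example": {"blocked": {"blog.com"}, "allowed": set()},
}
counts = categories_affected(ht)
```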