Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc: explain the throttling methodology #1546

Merged
merged 18 commits into from
Jun 21, 2024
254 changes: 254 additions & 0 deletions docs/design/dd-007-throttling.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,254 @@
# Throttling measurement methodology

| | |
|--------------|------------------------------------------------|
| Author | [@bassosimone](https://github.com/bassosimone) |
| Last-Updated | 2024-06-21 |
| Reviewed-by | [@DecFox](https://github.com/DecFox) |
| Status | accepted |

This document explains the throttling measurement methodology implemented inside
the [ooni/probe-cli](https://github.com/ooni/probe-cli) repository.

We are publishing this document as part of this repository for discussion. A future
version of this document may be moved into the [ooni/spec](https://github.com/ooni/spec)
repository.

## Problem statement

We are interested to detect cases of _extreme throttling_. We say that throttling is
_extreme_ when the speed to access web resources is _significantly reduced_ (10x or more)
compared to what is _typically_ observed. We care about extreme throttling because we
are interested in cases in which the performance impact is such to make the website
_unlikely_ to work as intended for web users in a country.

Additionally, as recently discussed with [@inetintel](https://github.com/InetIntel/)
researchers et al., we are interested to detect cases of _targeted throttling_. That is
cases where a set of specific websites gets significantly worse performance while the
overall users' internet experience is unchanged. This kind of throttling is in opposition
to _generalized throttling_ where the internet experience is degraded regardless of the
website compared to the previous internet experience (see [Dimming the Internet by Collin
Anderson](https://censorbib.nymity.ch/#Anderson2013a) for seminal work on this topic).

We, and other researchers, have documented extreme, targeted throttling in the
past. See, for example:

1. [our blog post documenting twitter throttling in Russia](
https://ooni.org/post/2022-russia-blocks-amid-ru-ua-conflict/), which is the
first instance in which we tested this methodology.

2. [our blog post documenting throttling in Kazakhstan](
https://ooni.org/post/2023-throttling-kz-elections/).

3. ["Throttling Twitter: an emerging censorship technique in Russia" by Xue et al.](
https://censorbib.nymity.ch/#Xue2021a).

OONI Probe measures websites as part of the [Web Connectivity experiment](
https://github.com/ooni/spec/blob/master/nettests/ts-017-web-connectivity.md) and
these measurements contain peformance metrics.

The next section explains which performance metrics we collect and how these can
be useful to document episodes of extreme, targeted throttling.

## Methodology

The overall idea of our methodology is that, as a first approximation,
we're not concerned with _how_ throttling is implemented, rather we aim to
show clearly-degraded network performance.

We aim to detect such a degradation by comparing metrics collected by OONI Probe instances
running in a country and network with measurements previously collected by users and/or with
concurrent measurements towards different targets.

### Network Events

Web Connectivity v0.5 collects the first 64 [network events](
https://github.com/ooni/spec/blob/master/data-formats/df-008-netevents.md) occurring
on a given TCP connection. These events include "read" and "write" events, which
map directly to network I/O operations (i.e., the `recv` and `send` syscalls
respectively). We focus on throttling in the download direction, therefore we're
mostly interested into "read" events.

The basic structure of a "read" network events is the following:

```JSON
{
"address": "1.1.1.1:443",
"failure": null,
"num_bytes": 4114,
"operation": "read",
"proto": "tcp",
"t0": 1.001,
"t": 1.174,
"tags": [],
"transaction_id": 1,
}
```

Through these events, we know when "read" returned (`t`), for how much time it was blocked
(`t - t0`), and the number of bytes received (`num_bytes`).

The slope of the integral of "read" events, provides information about the speed
at which we were receiving data from the network. Slow downs in the stream either correspond
to reordering and retransmission events (where there is head-of-line blocking) or to
timeout events (where the TCP pipe is empty).

Additionally, network events contain events such as `"tls_handshake_start"` and
`"tls_handshake_done`", which look like the following:

```JSON
{
"address": "1.1.1.1:443",
"failure": null,
"num_bytes": 0,
"operation": "tls_handshake_start",
"proto": "tcp",
"t0": 1.001,
"t": 1.001,
"tags": [],
"transaction_id": 1,
}
```

These events allow us to know when we started and we stopped handshaking.

Now, considering that the amount of bytes transferred by a TLS handshake with the
same server using similar client code is not far from being constant (i.e., it's a relatively
narrow gaussian with small sigma), we can model the problem of TLS handshaking as
the problem of downloading a ~fixed amount of data.

With many users measuring popular websites using OONI Probe in a given country
and network, we can therefore establish comparisons of current performance metrics with
previous performance metrics. In case of extreme throttling, where the download speed
is reduced of at least 10x, we would notice a performance difference. The _time_
required to complete the TLS handshake should be a sufficient metric (and, in fact,
_is_ a performance metric used by speed tests such as
[speed.cloudflare.com](https://speed.cloudflare.com/)).

Additionally, in Web Connectivity v0.5, the "read" events data collection does not
stop after the TLS handshake, therefore, we will have several post-handshake data
points we could also use to make statements about throttling. The size of the webpage
fetched from a given country and network, in fact, should also be pretty constant,
so a reasoning similar to the one made above for the TLS handshake also applies to the
process of handshadking and then downloading a web page. However, because very long
downloads could collect lots of "read" events, and because we want to limit the maximum
amount of "read" events we collected to 64, we have also introduced the following,
complementary metric to investigate throttling.

### Download speed metrics

Web Connectivity v0.5 also collects download speed samples for connections
used to access websites. We use the same methodology used by [ndt7](
https://github.com/m-lab/ndt-server/blob/main/spec/ndt7-protocol.md). We measure
the cumulative number of bytes received by a connection using a truncated exponential
distribution to decide when to collect samples. By not collecting samples at fixed
intervals, we [should have PASTA properties](https://en.wikipedia.org/wiki/Arrival_theorem#Theorem_for_arrivals_governed_by_a_Poisson_process).

The total TLS handshaking, HTTP round trip and body fetching time is bounded by a fixed amount of
seconds (currently ten seconds for the handshake and ten additional seconds for HTTP). Additionally,
there is a cap on the maximum amount of body bytes to download (`1<<19`).

The expected size of a downloaded webpage should be pretty constant for clients
attempting to fetch such a webpage from the same country and network. Therefore, we
can model handshaking plus fetching the body as asking the question of how much
time it takes to download `handshake_size + min(body_size, 1<<19)` bytes in up to
~twenty seconds.

If we assume that the server is not going to throttle downloads (which is still
an hypothesis worth considering), save for some (healthy) packet pacing, significant
changes in the _time_ required to perform the whole set of operations would be
an indication of extreme throttling. However, in using time as the metric, or any
other metric, we need to remember to classify measurements that time out (i.e., are
not able to fetch the whole body) apart from the ones that complete successfully.

Those measurements, in fact, should not be considered "failed" for the purpose of
measuring throttling. Rather, if the TCP connection could progress into the handshake
and possibily into downloading a webpage, these measurements would possibly be
an additional indication of extreme, targeted throttling.

## Discussion

This methodology leverages existing performance metrics inside of Web Connectivity
v0.5 to passively detect extreme throttling. Because this methodology models
the TLS handshake and fetching the body as speed tests, it is, however, not possible
to provide users with clear indication of throttling after a single run. We will,
instead, need to collect several samples over time and cross compare them using
the [ooni/data](https://github.com/ooni/data) measurement pipeline.

More specifically, we would need to compare current measurements with past
measurements collected for the same target website by users living in the same
country and using the same autonomous system. Alternatively, we could compare
measurements collected during the same time frame towards different websites, even
though this signal is weaker because it can just be caused by interconnection
issues. In any case, these considerations imply that our methodology rests
on the assumption that we will have several measurements for the targer websites,
bassosimone marked this conversation as resolved.
Show resolved Hide resolved
and our confidence would clearly lower with little available data.

In analysing the data, it would also be useful to consider the possibility of
checking whether specific HTTP headers or the host name (after a redirect) clearly
indicate specific geographic locations. For example, Cloudflare includes a
`cf-ray` header indicating the specific cache that is serving the content using
the name of the nearest airport.

Additionally, with the availability of
[richer input](https://github.com/ooni/probe-cli/blob/master/docs/design/dd-008-richer-input.md),
it would become possible to
run custom `urlgetter` experiments where we use possibly offending and possibly not
offending SNIs with target addresses and possibly-unrelated addresses, thus giving
us a chance to narrow down the cause of throttling to, say, the SNI being used. This
kind of A/B experiments would basically replicate the functionality of [the prototype
that we originally wrote to investigate throttling](https://github.com/ooni/probe-cli/pull/684).

Throttling could be caused by policers and shapers as well as by forcing specific
users to pass through a congested path. When policers and shapers are used, we
expect the speed to likely converge to predictable values (e.g., 128 kbit/s). On the
contrary, when throttling is driven by congestion, we expect to see higher variance
in the results, possibly correlated with daily usage patterns.

## Digital Divide Implications

By collecting passive performance metrics, we are not only equipped to detect
extreme, targeted throttling, but we are also gathering information about the performance
achievable by clients in several world regions for reaching specific websites. The
availability of HTTP headers and the practice of some CDNs of annotating the
responses with headers indicating which specific cache is being used (as mentioned above
in the case of Cloudflare) could also be exploited to make interesting
digital-divide statements.

## Future Work

With network events, we can also collect some ~baseline RTT samples. The `t - t0` time
of the TCP connect event provides an upper bound of the path RTT _unless_ there is a
retransmission of the `SYN` segment. The TLS handshake also involves sending TCP segments
back and forth in such a fashion that it's possible to extract RTT metrics. Howewer, we
should be careful to exclude segments sent back to back.

In general, detecting more precisely the characteristics of throttling
requires additional research aimed at classifying the stream of events emitted
by a receiving socket under specific throttling conditions. A possible starting
point for this research could be ["Strengthening measurements from the edges:
application-level packet loss rate estimation" by Basso et al.](
https://www.sigcomm.org/sites/default/files/ccr/papers/2013/July/2500098-2500104.pdf).

An alternative approach, already mentioned above, would require the possibility
of providing OONI experiments such as `urlgetter` with
[richer input](https://github.com/ooni/probe-cli/blob/master/docs/design/dd-008-richer-input.md)
parameters that
could provide additional data to answer more-narrow research questions. For example, if
there are reports that a website is throttled by SNI, we could perform a download
from a given test server with certificate verification disabled, using the offending
SNI and an innocuous SNI.

Because HTTP/3 used QUIC and because QUIC operates in userspace, there is
also the possibility of instrumenting the QUIC library to periodically collect
snapshots about the receiver's state. However, in general, sender stats are
much more useful to understand QUIC performance. This fact implies that we could
instrument a QUIC library to observe the sender's state and gather information
about throttling uploads.

Yet, the whole design of Web Connectivity is not
such that we upload resources, therefore we would need to figure out whether
it is possible to overcome this fundamental limitation for HTTP/1.1 and HTTP/2
first. A technique that has sometimes been applied is that of including very
large headers into the request body, even though servers may not
necessarily accept such headers.
Loading