Tor targets #495

hellais · 2022-05-16T16:42:51Z

Upon manual inspection I noticed that 2 things change (it's hard to see
in the diff because we weren't previously sorting keys):

The removal of the smallerRichard bridge as requested by @meskio in remove smallerRichard builtin bridge #493
The id of the bridge 51.222.13.177:80 changed from
fa69b2ee7a7c8975af2452ecd566f67a6459d397a4cefc30be86a670675cdc23 to
662218447d396b9d4f01b585457d267735601fedbeb9a19b86b942f238fe4e7b. The
new ID is correct and it's unclear how this ID was determined in the
original PR: Add new default obfs4 bridge. #463.

The second point might create data analysis issues in the future, but I
don't really see a way around it except remembering about this
inconsistency so we can manually remap the in the future.

@meskio

Upon manual inspection I noticed that 2 things change (it's hard to see in the diff because we weren't previously sorting keys): 1. The removal of the smallerRichard bridge as requested by @meskio in #493 2. The id of the bridge 51.222.13.177:80 changed from fa69b2ee7a7c8975af2452ecd566f67a6459d397a4cefc30be86a670675cdc23 to 662218447d396b9d4f01b585457d267735601fedbeb9a19b86b942f238fe4e7b. The new ID is correct and it's unclear how this ID was determined in the original PR: #463. The second point might create data analysis issues in the future, but I don't really see a way around it except remembering about this inconsistency so we can manually remap the in the future.

bassosimone · 2022-05-18T09:49:40Z

The second point might create data analysis issues in the future, but I
don't really see a way around it except remembering about this
inconsistency so we can manually remap the in the future.

Wouldn't the new fingerprint make the bridge appear as a new bridge? (Which is the primary key we use for selecting bridges for the purpose of testing?)

Which would otherwise be the best place to document this oddity in your opinion?

hellais · 2022-05-18T09:53:30Z

Wouldn't the new fingerprint make the bridge appear as a new bridge? (Which is the primary key we use for selecting bridges for the purpose of testing?)

Yeah basically it will mean we won't be able to easily link measurements for this bridge from the past with the future. I don't think it's a huge deal, but it's worth taking note of it.

Which would otherwise be the best place to document this oddity in your opinion?

I think we could maybe add it as a note to the tor ooni/spec document?

bassosimone

Mostly LGTM but there are some talking points that I think we should discuss before proceeding with merging. My main concern here has been to double check the algorithms and suggest ways to either improve the code or document how we could improve the code.

Thank you for moving this work forward ❤️ 🥳 🦜

bassosimone · 2022-05-18T09:50:58Z

ansible/roles/probe-services/templates/tor_targets.json

@@ -1,124 +1,101 @@
 {


Did you perhaps figure out why we have two distinct files for serving the same input? If so, it may perhaps be useful to document this oddity with (1) an issue and (2) a comment in the generating script referencing the issue

I don't know the answer to this. I updated both because I wasn't sure which one is actually used by the backend to provide data to probes.
@FedericoCeratto should know the answer to this.

If one of the two roles is deprecated, I think we should delete it.

I suggest to do the minimum work of adding a note mentioning that one role is deprecated and the other one is the new role so anyone reading is aware of this oddity and calling it a day, then. (If you could have more precise information that would be better but I don't think we should block this PR on more precise information.)

The "probe-services" role is legacy and was used for the backend hosts on Greenhost like hkg-ps.ooni.nu
The new backend uses "ooni-backend"
I suggest deleting the legacy role.

bassosimone · 2022-05-18T09:52:52Z

scripts/make_tor_targets.py

@@ -0,0 +1,114 @@
+import json


Suggested change

import json

"""

This script fetches bridge descriptors from Tor's `moat' bridge API and saves files on

disk in locations that are consumed by the OONI API.

The output data format is compatible with https://github.com/ooni/spec/blob/master/nettests/ts-023-tor.md#expected-inputs.

"""

import json

Also, it would be probably worth mentioning in some readme/manual for deployment what one needs to do when their goal is to update the list of tor bridges we use. (Or, is there some plan to perform this kind of update automatically when we're deploying? Would that be safe and desirable?)

In an ideal world this would be done automatically and periodically on the deployed host. Since I wasn't sure if we wanted to do this I implemented it for the time being as something that somebody has to manually run.

If we are to require manually running it, we should document it's usage in the backend docs, though I am not sure what is the correct place to do so.

We currently have backend related documentation in the following places:

https://github.com/ooni/backend/wiki

https://ooni.github.io/pipeline/

https://github.com/ooni/pipeline#ooni-backend

It's unclear to me which one is the latest and where I should be putting this documentation.

cc @FedericoCeratto

scripts/make_tor_targets.py

bassosimone · 2022-05-18T10:01:02Z

scripts/make_tor_targets.py

+    bridges = {}
+    for b in j["obfs4"]:
+        bridge_id, bd = parse_bridge_line(b)
+        assert bridge_id not in bridges


You should probably explain with a comment why the policy is to fail the script in case of duplicates. (Other policies may be possible, such as using the first or the last entry, but here you chose explicitly to fail the script if that happens, which likely has some background reasoning. Because it may be that you're not the only future person running this script, I was wondering whether it would be best to explain the policy with a comment and provide additional instructions in the same comment regarding what the operator should do in this case.)

It's failing because the data format is a dict, so if there is a duplicate key, you will end up overwriting the previous entry without noticing.

I agree this should be added as a comment. I will do so.

bassosimone · 2022-05-18T10:06:36Z

scripts/make_tor_targets.py

+    res = {}
+    res.update(dir_auths)
+    res.update(bridges)
+


So, this is probably the right place where to document why we have two files using a comment.

scripts/make_tor_targets.py

bassosimone · 2022-05-18T10:25:48Z

scripts/make_tor_targets.py

+    for line in da_lines:
+        da = parse_dirauth(line)
+
+        or_address = da["dir_address"].split(":")[0] + ":" + da["or_port"]


Suggested change

or_address = da["dir_address"].split(":")[0] + ":" + da["or_port"]

# Note: this algorithm only manipulates IPv4 addresses

or_address = da["dir_address"].split(":")[0] + ":" + da["or_port"]

Honestly here I would be happier to use correct parsing of endpoints rather than splitting by : because that would break when using IPv6. I would recommend borrowing more robust code to perform this kind of splitting from here: https://github.com/bassosimone/websteps-illustrated/blob/ef7a88b15ac5d13fb95525551e9cb078853cfa7b/python/websteps.py#L325

bassosimone · 2022-05-18T10:26:42Z

scripts/make_tor_targets.py

+            "protocol": "or_port_dirauth"
+        }
+
+        assert da["dir_address"] not in dir_auths


Same as above, maybe mention what an operator should do in case of error?

Co-authored-by: Simone Basso <[email protected]>

bassosimone · 2022-05-18T10:59:57Z

Wouldn't the new fingerprint make the bridge appear as a new bridge? (Which is the primary key we use for selecting bridges for the purpose of testing?)

Yeah basically it will mean we won't be able to easily link measurements for this bridge from the past with the future. I don't think it's a huge deal, but it's worth taking note of it.

Which would otherwise be the best place to document this oddity in your opinion?

I think we could maybe add it as a note to the tor ooni/spec document?

Sounds good!

Co-authored-by: Simone Basso <[email protected]>

FedericoCeratto · 2022-05-18T11:30:15Z

scripts/make_tor_targets.py

@@ -0,0 +1,114 @@
+import json


This script should go in the API package with its dedicated .timer and .service
With logging and metrics similar to the API so that we can monitor the run and alarm when needed.
The 2 .json files can be removed from here at a later time. The API runner would start without .json files and download the first copy.

Additionally, the script in the API package could be shipped with some basic unit and integ test as a first step.

See: ooni/sysadmin#495

hellais added 2 commits May 16, 2022 18:37

Add script for creating tor_targets files

aaec7fb

bassosimone reviewed May 18, 2022

View reviewed changes

Update scripts/make_tor_targets.py

99adbab

Co-authored-by: Simone Basso <[email protected]>

hellais and others added 4 commits May 18, 2022 13:00

Update scripts/make_tor_targets.py

57da1e5

Co-authored-by: Simone Basso <[email protected]>

Update scripts/make_tor_targets.py

9178678

Co-authored-by: Simone Basso <[email protected]>

Update scripts/make_tor_targets.py

0c4e494

Co-authored-by: Simone Basso <[email protected]>

Update scripts/make_tor_targets.py

c3fcfc0

Co-authored-by: Simone Basso <[email protected]>

FedericoCeratto suggested changes May 18, 2022

View reviewed changes

hellais added 3 commits May 18, 2022 13:32

Improve comments

c104580

Improve typing

c8f653e

Improve the IP parsing code

92812da

bassosimone mentioned this pull request May 18, 2022

Add new default bridge to OONI tests ooni/probe#1915

Closed

hellais added a commit to ooni/spec that referenced this pull request May 18, 2022

Add data analysis consideration related to the changing bridge ID

fe02270

See: ooni/sysadmin#495

hellais mentioned this pull request May 18, 2022

Add data analysis consideration related to the changing bridge ID ooni/spec#240

Merged

hellais added a commit to ooni/spec that referenced this pull request May 18, 2022

Add data analysis consideration related to the changing bridge ID (#240)

d6e8fbe

See: ooni/sysadmin#495

FedericoCeratto merged commit 9d96f6c into master May 23, 2022

FedericoCeratto deleted the tor-targets branch May 23, 2022 11:20

hellais mentioned this pull request May 23, 2022

remove smallerRichard builtin bridge #493

Closed

bassosimone mentioned this pull request May 23, 2022

tor: obfs4 handshake: ntor AUTH mismatch ooni/probe#1986

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tor targets #495

Tor targets #495

hellais commented May 16, 2022 •

edited

Loading

bassosimone commented May 18, 2022

hellais commented May 18, 2022

bassosimone left a comment

bassosimone May 18, 2022

hellais May 18, 2022

bassosimone May 18, 2022 •

edited

Loading

FedericoCeratto May 18, 2022 •

edited

Loading

bassosimone May 18, 2022

bassosimone May 18, 2022

hellais May 18, 2022

bassosimone May 18, 2022

hellais May 18, 2022

bassosimone May 18, 2022

bassosimone May 18, 2022

bassosimone May 18, 2022

bassosimone commented May 18, 2022

FedericoCeratto May 18, 2022 •

edited

Loading

-import json
+"""
+This script fetches bridge descriptors from Tor's `moat' bridge API and saves files on
+disk in locations that are consumed by the OONI API.
+The output data format is compatible with https://github.com/ooni/spec/blob/master/nettests/ts-023-tor.md#expected-inputs.
+"""
+import json

	or_address = da["dir_address"].split(":")[0] + ":" + da["or_port"]
	# Note: this algorithm only manipulates IPv4 addresses
	or_address = da["dir_address"].split(":")[0] + ":" + da["or_port"]

Tor targets #495

Tor targets #495

Conversation

hellais commented May 16, 2022 • edited Loading

bassosimone commented May 18, 2022

hellais commented May 18, 2022

bassosimone left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bassosimone May 18, 2022 • edited Loading

Choose a reason for hiding this comment

FedericoCeratto May 18, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bassosimone commented May 18, 2022

FedericoCeratto May 18, 2022 • edited Loading

Choose a reason for hiding this comment

hellais commented May 16, 2022 •

edited

Loading

bassosimone May 18, 2022 •

edited

Loading

FedericoCeratto May 18, 2022 •

edited

Loading

FedericoCeratto May 18, 2022 •

edited

Loading