-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tor targets #495
Tor targets #495
Conversation
Upon manual inspection I noticed that 2 things change (it's hard to see in the diff because we weren't previously sorting keys): 1. The removal of the smallerRichard bridge as requested by @meskio in #493 2. The id of the bridge 51.222.13.177:80 changed from fa69b2ee7a7c8975af2452ecd566f67a6459d397a4cefc30be86a670675cdc23 to 662218447d396b9d4f01b585457d267735601fedbeb9a19b86b942f238fe4e7b. The new ID is correct and it's unclear how this ID was determined in the original PR: #463. The second point might create data analysis issues in the future, but I don't really see a way around it except remembering about this inconsistency so we can manually remap the in the future.
Wouldn't the new fingerprint make the bridge appear as a new bridge? (Which is the primary key we use for selecting bridges for the purpose of testing?) Which would otherwise be the best place to document this oddity in your opinion? |
Yeah basically it will mean we won't be able to easily link measurements for this bridge from the past with the future. I don't think it's a huge deal, but it's worth taking note of it.
I think we could maybe add it as a note to the tor ooni/spec document? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM but there are some talking points that I think we should discuss before proceeding with merging. My main concern here has been to double check the algorithms and suggest ways to either improve the code or document how we could improve the code.
Thank you for moving this work forward ❤️ 🥳 🦜
@@ -1,124 +1,101 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you perhaps figure out why we have two distinct files for serving the same input? If so, it may perhaps be useful to document this oddity with (1) an issue and (2) a comment in the generating script referencing the issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know the answer to this. I updated both because I wasn't sure which one is actually used by the backend to provide data to probes.
@FedericoCeratto should know the answer to this.
If one of the two roles is deprecated, I think we should delete it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to do the minimum work of adding a note mentioning that one role is deprecated and the other one is the new role so anyone reading is aware of this oddity and calling it a day, then. (If you could have more precise information that would be better but I don't think we should block this PR on more precise information.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "probe-services" role is legacy and was used for the backend hosts on Greenhost like hkg-ps.ooni.nu
The new backend uses "ooni-backend"
I suggest deleting the legacy role.
@@ -0,0 +1,114 @@ | |||
import json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import json | |
""" | |
This script fetches bridge descriptors from Tor's `moat' bridge API and saves files on | |
disk in locations that are consumed by the OONI API. | |
The output data format is compatible with https://github.com/ooni/spec/blob/master/nettests/ts-023-tor.md#expected-inputs. | |
""" | |
import json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it would be probably worth mentioning in some readme/manual for deployment what one needs to do when their goal is to update the list of tor bridges we use. (Or, is there some plan to perform this kind of update automatically when we're deploying? Would that be safe and desirable?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In an ideal world this would be done automatically and periodically on the deployed host. Since I wasn't sure if we wanted to do this I implemented it for the time being as something that somebody has to manually run.
If we are to require manually running it, we should document it's usage in the backend docs, though I am not sure what is the correct place to do so.
We currently have backend related documentation in the following places:
- https://github.com/ooni/backend/wiki
- https://ooni.github.io/pipeline/
- https://github.com/ooni/pipeline#ooni-backend
It's unclear to me which one is the latest and where I should be putting this documentation.
scripts/make_tor_targets.py
Outdated
bridges = {} | ||
for b in j["obfs4"]: | ||
bridge_id, bd = parse_bridge_line(b) | ||
assert bridge_id not in bridges |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should probably explain with a comment why the policy is to fail the script in case of duplicates. (Other policies may be possible, such as using the first or the last entry, but here you chose explicitly to fail the script if that happens, which likely has some background reasoning. Because it may be that you're not the only future person running this script, I was wondering whether it would be best to explain the policy with a comment and provide additional instructions in the same comment regarding what the operator should do in this case.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's failing because the data format is a dict, so if there is a duplicate key, you will end up overwriting the previous entry without noticing.
I agree this should be added as a comment. I will do so.
res = {} | ||
res.update(dir_auths) | ||
res.update(bridges) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this is probably the right place where to document why we have two files using a comment.
scripts/make_tor_targets.py
Outdated
for line in da_lines: | ||
da = parse_dirauth(line) | ||
|
||
or_address = da["dir_address"].split(":")[0] + ":" + da["or_port"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or_address = da["dir_address"].split(":")[0] + ":" + da["or_port"] | |
# Note: this algorithm only manipulates IPv4 addresses | |
or_address = da["dir_address"].split(":")[0] + ":" + da["or_port"] |
Honestly here I would be happier to use correct parsing of endpoints rather than splitting by :
because that would break when using IPv6. I would recommend borrowing more robust code to perform this kind of splitting from here: https://github.com/bassosimone/websteps-illustrated/blob/ef7a88b15ac5d13fb95525551e9cb078853cfa7b/python/websteps.py#L325
scripts/make_tor_targets.py
Outdated
"protocol": "or_port_dirauth" | ||
} | ||
|
||
assert da["dir_address"] not in dir_auths |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, maybe mention what an operator should do in case of error?
Co-authored-by: Simone Basso <[email protected]>
Sounds good! |
Co-authored-by: Simone Basso <[email protected]>
Co-authored-by: Simone Basso <[email protected]>
Co-authored-by: Simone Basso <[email protected]>
Co-authored-by: Simone Basso <[email protected]>
@@ -0,0 +1,114 @@ | |||
import json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This script should go in the API package with its dedicated .timer and .service
With logging and metrics similar to the API so that we can monitor the run and alarm when needed.
The 2 .json files can be removed from here at a later time. The API runner would start without .json files and download the first copy.
Additionally, the script in the API package could be shipped with some basic unit and integ test as a first step.
Upon manual inspection I noticed that 2 things change (it's hard to see
in the diff because we weren't previously sorting keys):
fa69b2ee7a7c8975af2452ecd566f67a6459d397a4cefc30be86a670675cdc23 to
662218447d396b9d4f01b585457d267735601fedbeb9a19b86b942f238fe4e7b. The
new ID is correct and it's unclear how this ID was determined in the
original PR: Add new default obfs4 bridge. #463.
The second point might create data analysis issues in the future, but I
don't really see a way around it except remembering about this
inconsistency so we can manually remap the in the future.