Tor targets #495

Merged: 10 commits, May 23, 2022

Conversation

@hellais (Member) commented May 16, 2022

Upon manual inspection I noticed that two things changed (it's hard to see
in the diff because we weren't previously sorting keys):

  1. The removal of the smallerRichard bridge, as requested by @meskio in "remove smallerRichard builtin bridge" (#493).
  2. The ID of the bridge 51.222.13.177:80 changed from
     fa69b2ee7a7c8975af2452ecd566f67a6459d397a4cefc30be86a670675cdc23 to
     662218447d396b9d4f01b585457d267735601fedbeb9a19b86b942f238fe4e7b.
     The new ID is correct; it's unclear how the old ID was determined in the
     original PR, "Add new default obfs4 bridge" (#463).

The second point might create data-analysis issues in the future, but I
don't really see a way around it other than remembering this
inconsistency so we can manually remap them in the future.
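
For context, here is a purely illustrative sketch of how a stable target ID can be derived from a bridge line by hashing it. Whether scripts/make_tor_targets.py uses exactly this scheme is an assumption, but it shows why any change in how the line is normalized or hashed produces a completely different ID:

import hashlib

def bridge_id_from_line(bridge_line: str) -> str:
    # Hypothetical derivation: sha256 over the normalized bridge line.
    return hashlib.sha256(bridge_line.strip().encode("utf-8")).hexdigest()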

hellais added 2 commits May 16, 2022 18:37
@bassosimone (Contributor) commented:

> The second point might create data-analysis issues in the future, but I don't really see a way around it other than remembering this inconsistency so we can manually remap them in the future.

Wouldn't the new fingerprint make the bridge appear as a new bridge? (Isn't that the primary key we use for selecting bridges for the purpose of testing?)

Otherwise, what would be the best place to document this oddity, in your opinion?

@hellais (Member, Author) commented May 18, 2022

> Wouldn't the new fingerprint make the bridge appear as a new bridge? (Isn't that the primary key we use for selecting bridges for the purpose of testing?)

Yeah, basically it means we won't be able to easily link past measurements for this bridge with future ones. I don't think it's a huge deal, but it's worth taking note of it.

> Otherwise, what would be the best place to document this oddity, in your opinion?

I think we could maybe add it as a note to the tor ooni/spec document?

@bassosimone (Contributor) left a comment:

Mostly LGTM, but there are some talking points that I think we should discuss before merging. My main concern here has been to double-check the algorithms and to suggest ways to either improve the code or document how we could improve it.

Thank you for moving this work forward ❤️ 🥳 🦜

@@ -1,124 +1,101 @@
{
Contributor:

Did you figure out why we have two distinct files for serving the same input? If so, it may be useful to document this oddity with (1) an issue and (2) a comment in the generating script referencing the issue.

Member (Author):

I don't know the answer to this. I updated both because I wasn't sure which one is actually used by the backend to provide data to probes.
@FedericoCeratto should know the answer to this.

If one of the two roles is deprecated, I think we should delete it.

@bassosimone (Contributor), May 18, 2022:

I suggest doing the minimum work, then: add a note mentioning that one role is deprecated and the other is the new one, so anyone reading is aware of this oddity, and call it a day. (More precise information would be better, but I don't think we should block this PR on it.)

@FedericoCeratto (Contributor), May 18, 2022:

The "probe-services" role is legacy and was used for the backend hosts on Greenhost like hkg-ps.ooni.nu
The new backend uses "ooni-backend"
I suggest deleting the legacy role.

@@ -0,0 +1,114 @@
import json
Contributor:

Suggested change:
-import json
+"""
+This script fetches bridge descriptors from Tor's `moat' bridge API and saves files on
+disk in locations that are consumed by the OONI API.
+The output data format is compatible with https://github.com/ooni/spec/blob/master/nettests/ts-023-tor.md#expected-inputs.
+"""
+import json

Contributor:

Also, it would probably be worth mentioning in some readme/manual for deployment what one needs to do when their goal is to update the list of tor bridges we use. (Or is there some plan to perform this kind of update automatically when we're deploying? Would that be safe and desirable?)

Member (Author):

In an ideal world this would be done automatically and periodically on the deployed host. Since I wasn't sure whether we wanted that, I implemented it for the time being as something that somebody has to run manually.

If we are to require running it manually, we should document its usage in the backend docs, though I am not sure what the correct place for that is.

We currently have backend-related documentation in the following places:

It's unclear to me which one is the latest and where I should be putting this documentation.

cc @FedericoCeratto

scripts/make_tor_targets.py: outdated review threads (resolved, not shown)
bridges = {}
for b in j["obfs4"]:
    bridge_id, bd = parse_bridge_line(b)
    assert bridge_id not in bridges
Contributor:

You should probably explain with a comment why the policy is to fail the script in case of duplicates. (Other policies are possible, such as using the first or the last entry, but here you explicitly chose to fail the script if that happens, which likely has some background reasoning. Since you may not be the only person running this script in the future, it would be best to explain the policy in a comment and, in the same comment, tell the operator what to do in this case.)

Member (Author):

It's failing because the data format is a dict, so if there is a duplicate key, you will end up overwriting the previous entry without noticing.

I agree this should be added as a comment. I will do so.
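
A minimal sketch of the kind of comment being discussed (the wording is illustrative, not the final text):

# The targets end up in a dict keyed by bridge_id, so a duplicate key would
# silently overwrite the previous entry. Fail loudly instead: if this assert
# fires, inspect the moat response and de-duplicate the bridge lines by hand
# before re-running the script.
assert bridge_id not in bridges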

res = {}
res.update(dir_auths)
res.update(bridges)

Contributor:

So, this is probably the right place to document, with a comment, why we have two files.
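
A minimal sketch of what such a comment could look like, based on the role discussion earlier in this thread; the exact wording, and the claim about why there are two files, are assumptions to be double-checked:

# The merged targets are written to two locations because the repository still
# carries both the legacy "probe-services" role and the newer "ooni-backend"
# role (see the discussion in this PR); the legacy copy can be dropped once
# that role is deleted.
res = {}
res.update(dir_auths)
res.update(bridges)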

scripts/make_tor_targets.py: outdated review threads (resolved, not shown)
for line in da_lines:
    da = parse_dirauth(line)

    or_address = da["dir_address"].split(":")[0] + ":" + da["or_port"]
Contributor:

Suggested change:
-or_address = da["dir_address"].split(":")[0] + ":" + da["or_port"]
+# Note: this algorithm only manipulates IPv4 addresses
+or_address = da["dir_address"].split(":")[0] + ":" + da["or_port"]

Honestly, here I would be happier to use correct parsing of endpoints rather than splitting on ":", because splitting would break when using IPv6. I would recommend borrowing more robust code to perform this kind of splitting from here: https://github.com/bassosimone/websteps-illustrated/blob/ef7a88b15ac5d13fb95525551e9cb078853cfa7b/python/websteps.py#L325
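
For illustration, a minimal sketch of an IPv6-safe way to swap the port on an endpoint using only the standard library (this is not the code linked above; the helper name replace_port and its use in the script are assumptions):

from urllib.parse import urlsplit

def replace_port(endpoint: str, new_port: str) -> str:
    # Parse "host:port" or "[v6addr]:port" by treating it as the authority
    # part of a URL, which handles bracketed IPv6 addresses for us.
    parts = urlsplit("//" + endpoint)
    host = parts.hostname
    if host is None:
        raise ValueError(f"cannot parse endpoint: {endpoint!r}")
    if ":" in host:
        host = "[" + host + "]"  # bare IPv6 address: restore the brackets
    return host + ":" + str(new_port)

# Possible use in the script (assumed):
# or_address = replace_port(da["dir_address"], da["or_port"])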

"protocol": "or_port_dirauth"
}

assert da["dir_address"] not in dir_auths
Contributor:

Same as above, maybe mention what an operator should do in case of error?

@bassosimone (Contributor) commented:

> > Wouldn't the new fingerprint make the bridge appear as a new bridge? (Isn't that the primary key we use for selecting bridges for the purpose of testing?)
>
> Yeah, basically it means we won't be able to easily link past measurements for this bridge with future ones. I don't think it's a huge deal, but it's worth taking note of it.
>
> > Otherwise, what would be the best place to document this oddity, in your opinion?
>
> I think we could maybe add it as a note to the tor ooni/spec document?

Sounds good!

@@ -0,0 +1,114 @@
import json
@FedericoCeratto (Contributor), May 18, 2022:

This script should go in the API package with its dedicated .timer and .service, with logging and metrics similar to the API's, so that we can monitor the run and raise alarms when needed.
The two .json files can be removed from here at a later time. The API runner would start without .json files and download the first copy.

Additionally, the script in the API package could be shipped with some basic unit and integration tests as a first step.
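
A minimal sketch of what such a unit test could look like, assuming the script is importable as a module named make_tor_targets and that parse_bridge_line keeps its current (bridge_id, descriptor) return shape; the sample bridge line and the assertions are illustrative assumptions, not taken from the actual script:

import make_tor_targets  # assumed module name once the script is packaged

def test_parse_bridge_line_returns_id_and_descriptor():
    # Hypothetical obfs4 bridge line using a documentation IP address.
    line = (
        "obfs4 192.0.2.1:443 0123456789ABCDEF0123456789ABCDEF01234567 "
        "cert=examplecert iat-mode=0"
    )
    bridge_id, bd = make_tor_targets.parse_bridge_line(line)
    assert isinstance(bridge_id, str) and bridge_id != ""
    assert isinstance(bd, dict) and bd  # descriptor fields depend on the script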
