Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for the HTTP tracker #162

Merged

Conversation

josecelano
Copy link
Member

@josecelano josecelano commented Jan 17, 2023

Add tests for the HTTP tracker.

  • Tests for request authentication.
  • Tests for announce request.
  • Tests for scrape request.

These subtasks require discussion and a formal definition of the solution we will implement. I will create new issues for them:

UPDATE: 01-02-2023

… running

Add a communication channel to wait until the new job is running.
This is specially useful for testing, becuase tests need the HTTP server
up and running before making requests.
@josecelano josecelano linked an issue Jan 17, 2023 that may be closed by this pull request
@josecelano josecelano added the Quality & Assurance Relates to QA, Testing, and CI label Jan 17, 2023
@josecelano josecelano force-pushed the issue-159-add-tests-for-http-tracker branch from 06f5472 to 1ba7f9f Compare January 20, 2023 19:41
@josecelano josecelano force-pushed the issue-159-add-tests-for-http-tracker branch from 1ba7f9f to 1a558d2 Compare January 20, 2023 19:54
If you try to convert a peer Id into a String it returns an empty
String.

```
let id = peer::Id(*b"-qB00000000000000000");
assert_eq!(id.to_string(), "");
```
@josecelano
Copy link
Member Author

josecelano commented Jan 23, 2023

hi @WarmBeer, It seems the HTTP tracker is returning an empty Peer Id in the announce response

I've added a test in a commit:

#[test]
fn should_be_converted_into_string() {
    let id = peer::Id(*b"-qB00000000000000000");
    assert_eq!(id.to_string(), "");
}

If you try to convert the peer Id into a String you get an empty string. I can fix it to return the same value we return in the API endpoint, which is 2d71423030303030303030303030303030303030

{
  "info_hash": "6d8f96b4e4761a04a3193ed729ebb8993ad17f0e",
  "seeders": 1,
  "completed": 0,
  "leechers": 0,
  "peers": [
    {
      "peer_id": {
        "id": "2d7142343431302d78514c65535f7828622e6c53",
        "client": "qBittorrent"
      },
      "peer_addr": "2.137.87.41:17548",
      "updated": 1674482474001,
      "updated_milliseconds_ago": 1674482474001,
      "uploaded": 0,
      "downloaded": 0,
      "left": 0,
      "event": "Started"
    }
  ]
}

but I do not know if that is the correct value for the HTTP tracker specification. I have not found the original specification for the HTTP tracker. Reading these two pages:

It's unclear which format the peer ID should have in the announce response.

I think in the announce GET request:

http://127.0.0.1:7070/announce?info_hash=%9C8B%22%13%E3%0B%FF%21%2B0%C3%60%D2o%9A%02%13d%22&peer_addr=192.168.1.88&downloaded=0&uploaded=0&peer_id=%2DqB00000000000000000&port=17548&left=0&event=completed&compact=0

The info_hash and peer_id are percent encoded byte arrays (binary data).

Do you think we should use the same format: 2d71423030303030303030303030303030303030?

That's the hex representation of the byte array:

-qB00000000000000000
- = 2d
q = 71
B = 42
0 = 30
...

By the way, I think the problem is we are using two different functions to convert the ID into an String:

impl std::fmt::Display for Id {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let mut buffer = [0u8; 20];
        let bytes_out = binascii::bin2hex(&self.0, &mut buffer).ok();
        match bytes_out {
            Some(bytes) => write!(f, "{}", std::str::from_utf8(bytes).unwrap()),
            None => write!(f, ""),
        }
    }
}

and:

pub fn get_id(&self) -> Option<String> {
    let buff_size = self.0.len() * 2;
    let mut tmp: Vec<u8> = vec![0; buff_size];
    binascii::bin2hex(&self.0, &mut tmp).unwrap();

    std::str::from_utf8(&tmp).ok().map(std::string::ToString::to_string)
}

The first one is used in the HTTP tracker (announce_handler). I can fix it using the get_id function in the Display trait.

I just want to confirm that 2d71423030303030303030303030303030303030 is the right format for the announce response.

UPDATE: It's not working on the main branch either.

@josecelano
Copy link
Member Author

The original specification for the HTTP tracker is on the BEP 3.

And the "compact" response format on the BRP 23

@josecelano
Copy link
Member Author

hey, @WarmBeer, it seems the only mandatory params for the GET Announce request are:

  • info_hash
  • peer_id
  • port

See this commit.

The official specification BEP 3 says only ip and event are optional.

I tried with another tracker:

https://tracker.gbitt.info/announce?info_hash=%9c8B%22%13%e3%0b%ff!%2b0%c3%60%d2o%9a%02%13d%22&peer_id=-qB4410-I2d*caHj1O~h&port=17548

and you get the following error if you do not provide, for example, the uploaded param.

d14:failure reason16:missing uploadede

I'm writing tests for the current behaviour; once we migrate to Axum, we can change the behaviour if we want. In this case, I would respond with a concrete error like the "tracker.gbitt.info" tracker response. In fact, it's what the specification says you have to do:

Tracker responses are bencoded dictionaries. If a tracker response has a key failure reason, then that maps to a human readable string which explains why the query failed, and no other keys are required. Otherwise, it must have two keys: interval, which maps to the number of seconds the downloader should wait between regular rerequests, and peers. peers maps to a list of dictionaries corresponding to peers, each of which contains the keys peer id, ip, and port, which map to the peer's self-selected ID, IP address or dns name as a string, and port number, respectively. Note that downloaders may rerequest on nonscheduled times if an event happens or they need more peers.

@mickvandijke
Copy link
Member

mickvandijke commented Jan 24, 2023

Hi @josecelano ,

hi @WarmBeer, It seems the HTTP tracker is returning an empty Peer Id in the announce response

That is not expected behaviour for the non-compact response (for the compact response it is). It is not a breaking bug however, since the Peer Id seems to be completely useless for clients.

The info_hash and peer_id are percent encoded byte arrays (binary data).

Do you think we should use the same format: 2d71423030303030303030303030303030303030?

The Vuze Wiki mentions this:

Screenshot 2023-01-24 at 14 09 37

From this I would think that the Peer Id needs to be returned as a non-hexed string. So in the "-qB0000" format I guess.

@josecelano
Copy link
Member Author

Hi @josecelano ,

hi @WarmBeer, It seems the HTTP tracker is returning an empty Peer Id in the announce response

That is not expected behaviour for the non-compact response (for the compact response it is). It is not a breaking bug however, since the Peer Id seems to be completely useless for clients.

The info_hash and peer_id are percent encoded byte arrays (binary data).

Do you think we should use the same format: 2d71423030303030303030303030303030303030?

The Vuze Wiki mentions this:

Screenshot 2023-01-24 at 14 09 37

From this I would think that the Peer Id needs to be returned as a non-hexed string. So in the "-qB0000" format I guess.

OK, thank you @WarmBeer! The BitTorrent client is using an Id like this peer_id=-qB4410-I2d*caHj1O~h in the query param. I suppose clients use 20 bytes that are also valid ASCII chars. The BEP 3 says:

image

The peer is a string, not a byte array. But in the BEP 23 they say:

image

On the other hand, peer_id is bencoded and the Bencode encoding allows you to use any byte value (including values that are not ASCII or UTF8)

image

I have installed this tracker. I've run it locally, and they are using the UFT8 representation: -qB000000000000000004

Request:

http://localhost:8000/announce?info_hash=%9C8B%22%13%E3%0B%FF%21%2B0%C3%60%D2o%9A%02%13d%22&peer_addr=192.168.1.88&downloaded=0&uploaded=0&peer_id=%2DqB00000000000000000&port=17548&left=0&event=completed&compact=0

Response:

d8:completei1e10:incompletei0e8:intervali600e5:peersld2:ip3:::17:peer id20:-qB000000000000000004:porti17548eeee

I'm going to fix it to also use that format.

@josecelano josecelano force-pushed the issue-159-add-tests-for-http-tracker branch from 30ecd47 to 080f3c4 Compare January 25, 2023 16:37
@josecelano
Copy link
Member Author

josecelano commented Jan 26, 2023

hi @WarmBeer , I've just added new tests for how the tracker assign IP address to peers.

When the tracker is running behind a reverse proxy the tracker assigns the the IP on the X-Forwarded-For header.

The peer_addr function is taking the rightmost IP address:

/// Get `PeerAddress` from `RemoteAddress` or Forwarded
fn peer_addr((on_reverse_proxy, remote_addr, x_forwarded_for): (bool, Option<SocketAddr>, Option<String>)) -> WebResult<IpAddr> {
    if !on_reverse_proxy && remote_addr.is_none() {
        return Err(reject::custom(Error::AddressNotFound));
    }

    if on_reverse_proxy && x_forwarded_for.is_none() {
        return Err(reject::custom(Error::AddressNotFound));
    }

    if on_reverse_proxy {
        let mut x_forwarded_for_raw = x_forwarded_for.unwrap();
        // remove whitespace chars
        x_forwarded_for_raw.retain(|c| !c.is_whitespace());
        // get all forwarded ip's in a vec
        let x_forwarded_ips: Vec<&str> = x_forwarded_for_raw.split(',').collect();
        // set client ip to last forwarded ip
        let x_forwarded_ip = *x_forwarded_ips.last().unwrap();

        IpAddr::from_str(x_forwarded_ip).map_err(|_| reject::custom(Error::AddressNotFound))
    } else {
        Ok(remote_addr.unwrap().ip())
    }
}

Here you can see the test I've added for that beviour.

Shouldn't be the the leftmost IP address @WarmBeer?

I think the leftmost IP address is the original client address and the rightmost IP is the IP of the last proxy in the chain.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Forwarded-For

Maybe It should not even be the first one:

image

@josecelano
Copy link
Member Author

hi @WarmBeer , I've just added new tests for how the tracker assign IP address to peers.

When the tracker is running behind a reverse proxy the tracker assigns the the IP on the X-Forwarded-For header.

The peer_addr function is taking the rightmost IP address:

/// Get `PeerAddress` from `RemoteAddress` or Forwarded
fn peer_addr((on_reverse_proxy, remote_addr, x_forwarded_for): (bool, Option<SocketAddr>, Option<String>)) -> WebResult<IpAddr> {
    if !on_reverse_proxy && remote_addr.is_none() {
        return Err(reject::custom(Error::AddressNotFound));
    }

    if on_reverse_proxy && x_forwarded_for.is_none() {
        return Err(reject::custom(Error::AddressNotFound));
    }

    if on_reverse_proxy {
        let mut x_forwarded_for_raw = x_forwarded_for.unwrap();
        // remove whitespace chars
        x_forwarded_for_raw.retain(|c| !c.is_whitespace());
        // get all forwarded ip's in a vec
        let x_forwarded_ips: Vec<&str> = x_forwarded_for_raw.split(',').collect();
        // set client ip to last forwarded ip
        let x_forwarded_ip = *x_forwarded_ips.last().unwrap();

        IpAddr::from_str(x_forwarded_ip).map_err(|_| reject::custom(Error::AddressNotFound))
    } else {
        Ok(remote_addr.unwrap().ip())
    }
}

Here you can see the test I've added for that beviour.

Shouldn't be the the leftmost IP address @WarmBeer?

I think the leftmost IP address is the original client address and the rightmost IP is the IP of the last proxy in the chain.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Forwarded-For

Maybe It should not even be the first one:

image

hi @WarmBeer, this is an interesting article shared by @da2ce7 :

https://adam-p.ca/blog/2022/03/x-forwarded-for/

I think the reason why we are getting the last IP in the header is that we do not trust any of the IPs in the header. We only trust our own proxy, so we use the IP received by our own proxy. Is that the reasoning behind that decision? If that were the case, it means we would never get the real client IP when there are some other proxies in the middle, even it we potentially trust them, right?

So basically, the config option on_reverse_proxy was not intended to get the real client IP but to replace our proxy IP with the IP received by our proxy, right?

If that is OK, I will document it on the test.

I thought the intended behaviour was just getting the header's originator client IP.

On the other hand, what would be the problem if the client IP is spoofed in the header? In that case, we would simply add a peer with an invalid IP. If we use the last IP, we are going to add a peer that we are sure exists but it could not be the real peer. I suppose adding a random peer is worse that adding a peer that is not a peer but at least has passed a request to the tracker.

@josecelano josecelano force-pushed the issue-159-add-tests-for-http-tracker branch from ebfb1e3 to 8cad64d Compare January 26, 2023 18:06
@josecelano
Copy link
Member Author

There are new Clippy errors not shown in my local version:

https://github.com/torrust/torrust-tracker/actions/runs/4017899158/jobs/6902857248

I'm using: clippy 0.1.66 (06f049a 2022-10-17)

By the way, I've just realized Clippy was the name of:

image

@mickvandijke
Copy link
Member

I think the reason why we are getting the last IP in the header is that we do not trust any of the IPs in the header. We only trust our own proxy, so we use the IP received by our own proxy. Is that the reasoning behind that decision? If that were the case, it means we would never get the real client IP when there are some other proxies in the middle, even it we potentially trust them, right?

So basically, the config option on_reverse_proxy was not intended to get the real client IP but to replace our proxy IP with the IP received by our proxy, right?

This was indeed my reasoning. We can not trust the entire X-Forwarded-For chain, only the last added. This does indeed have the downside that you can not have multiple proxies in between.

If that is OK, I will document it on the test.

I thought the intended behaviour was just getting the header's originator client IP.

On the other hand, what would be the problem if the client IP is spoofed in the header? In that case, we would simply add a peer with an invalid IP. If we use the last IP, we are going to add a peer that we are sure exists but it could not be the real peer. I suppose adding a random peer is worse that adding a peer that is not a peer but at least has passed a request to the tracker.

So we can either choose to be sure that we receive the actual IP of the peer by selecting the last IP in the chain of the X-Forwarded-For header, with the downside that we can only support one proxy.

Or we choose to support multiple proxies by selecting the first IP in the X-Forwarded-For header chain, but with the downside that the IP can be literally anything supplied by the client. We could of course check that the first string set in the X-Forwarded-For chain is an actual IP, but we won't know if this IP address is the actual IP of the client.

I would prefer to know the actual IP of a client, so that a client can't just spam announce requests for the same info-hash using a different spoofed IP every time. What do you think? Maybe we should think of a totally different solution.

@josecelano josecelano force-pushed the issue-159-add-tests-for-http-tracker branch from 8cad64d to badb791 Compare January 27, 2023 08:27
```
peer::Id(*b"-qB00000000000000000").to_string()
```

was always returning an empty string.

It has been changed to return the Hex representations of the byte array.
@josecelano
Copy link
Member Author

josecelano commented Jan 27, 2023

hi @WarmBeer, regarding our previous discussion about the Peer Id format in responses:

I've fixed the Display trait in this commit.

Now:

asser_eq!(peer::Id(*b"-qB00000000000000000").to_string(), "2d71423030303030303030303030303030303030")

it returns the hex string for the byte array.

We also use that format when we serialize the peer id into json.

With that change, the HTTP tracker response will contain that hex string instead of empty string.

I have not implemented what we agreed here yet. We agreed on using the string representation, just mapping each byte to the ASCCII char -qB00000000000000000.

The reason is that I've changed my mind. The HTTP tracker returns bencoded responses. And bencoded responses allow any random byte array (not only well-formed UTF8 string). I think we should avoid converting the peer into a string and then bencode the string. We should just send the encoded 20-byte array.

What do you think?

On the other hand, maybe the Display trait should print the byte array without making any conversion. But I'm not sure. Maybe it could try to convert it to a string, and if it is not possible, it could use the hex string. I suppose all clients use peer ids that can be shown as strings even if the protocol accepts any 20-byte array for some reason.

Conclusions

  • We need to decide if we return the bencoded byte array or the bencoded string of the peer id.
  • We need to decide what it's going to show the Display trait.

I would return the bencoded byte array in responses (since we do not know if some client can be using peer ids with non-valid string chars), and I would Display the byte array. I would use other specific methods when you need alternative representations/formats/types.

If you agree, I can make the changes in this PR. If you think we need to discuss it, I will create a new issue. I would prefer to create a new issue to find out what other trackers and clients are doing.

cc @da2ce7

@josecelano
Copy link
Member Author

josecelano commented Jan 27, 2023

@WarmBeer I made a test with https://github.com/webtorrent/bittorrent-tracker

Peer Id with ASCII values (byte<128)

Announce URL:

http://localhost:8000/announce?info_hash=%3B%24U%04%CF%5F%11%BB%DB%E1%20%1C%EAjk%F4Z%EE%1B%C0&peer_addr=2.137.87.41&downloaded=0&uploaded=0&peer_id=-qB00000000000000001&port=17548&left=0&event=completed&compact=0

The peer id: -qB00000000000000001
Bytes (hex): 2d 7142 3030 3030 3030 3030 3030 3030 3030 3030 30

Response (bytes with the bencoded response):

00000000: 6438 3a63 6f6d 706c 6574 6569 3165 3130  d8:completei1e10
00000010: 3a69 6e63 6f6d 706c 6574 6569 3065 383a  :incompletei0e8:
00000020: 696e 7465 7276 616c 6936 3030 6535 3a70  intervali600e5:p
00000030: 6565 7273 6c64 323a 6970 333a 3a3a 3137  eersld2:ip3:::17
00000040: 3a70 6565 7220 6964 3230 3a2d 7142 3030  :peer id20:-qB00
00000050: 3030 3030 3030 3030 3030 3030 3030 3134  0000000000000014
00000060: 3a70 6f72 7469 3137 3534 3865 6565 65    :porti17548eeee

Peer Id with non ASCII values (byte>128)

Announce URL:

http://localhost:8000/announce?info_hash=%3B%24U%04%CF%5F%11%BB%DB%E1%20%1C%EAjk%F4Z%EE%1B%C0&peer_addr=2.137.87.41&downloaded=0&uploaded=0&peer_id=%81%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00&port=17548&left=0&event=completed&compact=0

The peer id: %E2%82%ACqB0000000000000001
Bytes: [129, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Bytes (hex): 81 0000 0000 0000 0000 0000 0000 0000 0000 0000 00

Response (bytes with the bencoded response):

00000000: 6438 3a63 6f6d 706c 6574 6569 3165 3130  d8:completei1e10
00000010: 3a69 6e63 6f6d 706c 6574 6569 3065 383a  :incompletei0e8:
00000020: 696e 7465 7276 616c 6936 3030 6535 3a70  intervali600e5:p
00000030: 6565 7273 6c64 323a 6970 333a 3a3a 3137  eersld2:ip3:::17
00000040: 3a70 6565 7220 6964 3231 3ac2 8100 0000  :peer id21:.....
00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000060: 343a 706f 7274 6931 3735 3438 6565 6565  4:porti17548eeee

It is strange that it returns 21 bytes:

Bytes (hex): c2 81 0000 0000 0000 0000 0000 0000 0000 0000 0000 00

It's adding an extra byte, c2 (194), at the beginning.

@josecelano josecelano force-pushed the issue-159-add-tests-for-http-tracker branch from eba03e0 to dc304e7 Compare January 30, 2023 20:01
…follow new scrape conventions

Deserialization from bencoded bytes for announce and scrape request is
done in two phases. First using `serde_bencode::from_bytes` and later
with a custom parser. The reason is the `serde_bencode` crate does not
allow direct deserialization for the strcuts we need.

The strcut resulting from the first deserialization done by
`serde_bencode` is the `DeserializedCompact` and the second one just
`Compact`. So the prefix `Deserialized` is used when the bytes in the
reponse body are converted into a struct.
@josecelano
Copy link
Member Author

This is the final list of tests:

the_http_tracker:
    for_all_config_modes:
        receiving_an_announce_request:
            should_assign_to_the_peer_ip_the_remote_client_ip_instead_of_the_peer_address_in_the_request_param
            should_consider_two_peers_to_be_the_same_when_they_have_the_same_peer_id_even_if_the_ip_is_different
            should_fail_when_a_mandatory_field_is_missing
            should_fail_when_the_downloaded_param_is_invalid
            should_fail_when_the_info_hash_param_is_invalid
            should_fail_when_the_left_param_is_invalid
            should_fail_when_the_peer_id_param_is_invalid
            should_fail_when_the_port_param_is_invalid
            should_fail_when_the_request_is_empty
            should_fail_when_the_uploaded_param_is_invalid
            should_increase_the_number_of_tcp4_announce_requests_handled_in_statistics
            should_increase_the_number_of_tcp4_connections_handled_in_statistics
            should_increase_the_number_of_tcp6_announce_requests_handled_in_statistics
            should_increase_the_number_of_tcp6_connections_handled_in_statistics
            should_not_fail_when_the_compact_param_is_invalid
            should_not_fail_when_the_event_param_is_invalid
            should_not_fail_when_the_peer_address_param_is_invalid
            should_not_increase_the_number_of_tcp6_announce_requests_handled_if_the_client_is_not_using_an_ipv6_ip
            should_not_increase_the_number_of_tcp6_connections_handled_if_the_client_is_not_using_an_ipv6_ip
            should_not_return_the_compact_response_by_default
            should_respond_if_only_the_mandatory_fields_are_provided
            should_return_no_peers_if_the_announced_peer_is_the_first_one
            should_return_the_compact_response
            should_return_the_list_of_previously_announced_peers
            when_the_client_ip_is_a_loopback_ipv4_it_should_assign_to_the_peer_ip_the_external_ip_in_the_tracker_configuration
            when_the_client_ip_is_a_loopback_ipv6_it_should_assign_to_the_peer_ip_the_external_ip_in_the_tracker_configuration
            when_the_tracker_is_behind_a_reverse_proxy_it_should_assign_to_the_peer_ip_the_ip_in_the_x_forwarded_for_http_header
            should_accept_multiple_infohashes
            should_fail_when_the_info_hash_param_is_invalid
        receiving_an_scrape_request:
            should_fail_when_the_request_is_empty
            should_increase_the_number_ot_tcp4_scrape_requests_handled_in_statistics
            should_increase_the_number_ot_tcp6_scrape_requests_handled_in_statistics
            should_return_a_file_with_zeroed_values_when_there_are_no_peers
            should_return_the_file_with_the_complete_peer_when_there_is_one_peer_with_no_bytes_pending_to_download
            should_return_the_file_with_the_incomplete_peer_when_there_is_one_peer_with_bytes_pending_to_download
    configured_as_private:
        and_receiving_an_announce_request:
            should_fail_if_the_peer_authentication_key_is_not_valid
            should_fail_if_the_peer_has_not_provided_the_authentication_key
            should_respond_to_authenticated_peers
        and_receiving_an_scrape_request:
            should_return_the_real_file_stats_when_the_client_is_authenticated
            should_return_the_zeroed_file_when_the_authentication_key_provided_by_the_client_is_invalid
            should_return_the_zeroed_file_when_the_client_is_not_authenticated
    configured_as_whitelisted:
        and_receiving_an_announce_request:
            should_allow_announcing_a_whitelisted_torrent
            should_fail_if_the_torrent_is_not_in_the_whitelist
        and_receiving_an_scrape_request:
            should_return_the_file_stats_when_the_requested_file_is_whitelisted
            should_return_the_zeroed_file_when_the_requested_file_is_not_whitelisted

@josecelano josecelano marked this pull request as ready for review February 1, 2023 14:09
@josecelano josecelano requested a review from da2ce7 February 1, 2023 14:09
@josecelano
Copy link
Member Author

ACK c46e417

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Quality & Assurance Relates to QA, Testing, and CI
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Add tests for the HTTP tracker
2 participants