Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: mapping mode #466

Open
PenelopeFudd opened this issue Feb 27, 2023 · 28 comments
Open

Feature request: mapping mode #466

PenelopeFudd opened this issue Feb 27, 2023 · 28 comments

Comments

@PenelopeFudd
Copy link

Hi;

I'm trying to map my network using MTR, and wondered if there was a command line option that would send the minimum number of packets required in order to identify each hop.

For instance, I'm currently using the command line mtr -r -c 5 -j -n -i 0.1 -G 0.1 $host, which sends five requests and exits.

Some hops return ???, which seems to indicate that the router is throttling the number of ICMP responses coming back. One solution is to increase the number of packets sent, but for those hops that sent a response on the first try, the additional packets are a waste. Additionally, some hops won't respond at all, so we should give up after a certain number of tries.

A more advanced mapping algorithm would involve caching, and starting with higher hop numbers first. When you get the ip address of a hop that's in the cache, you could skip the rest of the smaller hops, under the assumption that there is only one route from that hop, and it's already in the cache. For networks that violate that assumption, users wouldn't want to use the cache option.

In my particular case, I'm trying to compare the results produced by using three different VPN servers in different parts of the world, and so I'd use different cache files for each of the maps.

Thanks for considering this, and for producing MTR in general!

@rewolff
Copy link
Collaborator

rewolff commented Feb 27, 2023

I think that this might be done by writing a new application that uses mtr-packet.

On the other hand "if host-is-known, don't send packet" option would be easy to add to mtr.

My first thought was to preload from the cache and then consider those "known" but then you wouldn't find out when to revert to real probing.

Probing in the reverse direction is complex. If you start probing at 32 hops, then you'll find "reached target in 32, 31, 30, 29 ... hops", wasting way more than when you start at the other side.

I get the hint that you're going to automate this and send a LOT of probes. I'd suggest you pass --min-hops (or whatever it is) to skip probing the first few hops once you learn that you'll get the same responses for those. Aim to find an overlap in the first hop (or two) you probe and schedule for retrying with no first hop if you don't get the first hop you expected.

The -c option already limits the amount of probes. If we simply don't probe once we've found a hit, that'd be quite a lot along the way to what you'd want.

@PenelopeFudd
Copy link
Author

Yes, exactly; making the program stop probing when it's found a hit is a straightforward optimization; basically leave the -c option as it is, but add another option to enable this.

I think your idea of preloading from a cache would work: the user would either decide to use a different cache file when they know the network changed, throwing away the cache if it's too old, or the cache would perhaps need to be spot-checked on startup to see if it still describes reality. Actually, rechecking the entire cache would be a useful function too, just to see if the network has been altered since the cache was made, although this starts to wander into "network monitoring" territory. :-)

To reduce wasted packets, it might be possible to examine the TTL on the first "reached target" response packet and use a heuristic to decide how far away the host is, or perhaps do a binary search (1,2,4,8,... or 32, 16, 24, 20, 18...).

@matt-kimball
Copy link
Contributor

If you'd like to write the logic for this process in Python, you might consider using mtr-packet-python: https://github.com/matt-kimball/mtr-packet-python

@rewolff
Copy link
Collaborator

rewolff commented Feb 27, 2023

the option to start at xxx is called: first-ttl

@rewolff
Copy link
Collaborator

rewolff commented Feb 27, 2023

The "cache" doesn't work.

I know for my connection, the first 7 hops leave me in Amsterdam and from there, things usually start to diverge. But that's likely "5 hops" for you, etc. And tracing to say newyork or LA may end up with 10 hops the same as "last trace" (i.e. new york) but then the next one to france (which you may know by hostname: not obvious it is in france) might start to differ at 7 again.

besides, I might every once in a while trace to a host that is hosted on the network of my provider, i.e. it might branch off earlier than those 7 hops.

So... I don't like mtr "guessing" things. It should just be a tool and when it reports something, that should be exactly what it measured.

@PenelopeFudd
Copy link
Author

Oh, I agree; when I usually use mtr, I want it to report what it sees right now, not what it remembers (or imagines). That's what diagnostic tools like mtr are designed for: debugging the problems of the moment.

However, when I'm trying to map my network (using mtr to reach every host I've got, and then using Graphviz to draw a tree diagram), my nearby routers throttle the TTL exceeded messages, and mtr starts showing "???" for the first few hops. In addition, those first hops are placing a significant load on my network but returning the same information, which is a waste.

For long traces that go halfway around the world, any hops that are in "the cloud" are of little interest because they won't be part of the map I'm generating. The routes converge once they leave the cloud at the far datacenter (for example), so it's not a problem if the cloud hops aren't 100% correct.

@rewolff
Copy link
Collaborator

rewolff commented Mar 10, 2023

Then you get to indicate to mtr that you don't need probes sent to hops 1-10.

@PenelopeFudd
Copy link
Author

PenelopeFudd commented Mar 10, 2023

Here's what I came up with. It uses jq and sed, and outputs lines suitable for inclusion in a Graphviz "dot" file:

#!/bin/bash
CACHE=~/.cache/mtr-cached
DEFAULT_MAX=20

if [[ "$1" == "" ]]; then
  echo "Usage: ${0##*/} host"
  echo "  This does a traceroute combined with a cache."
  echo "  The cache is in $CACHE"
  exit 1
fi

if [[ ! -e "$CACHE" ]]; then
  echo "Creating $CACHE"
  mkdir -p "$CACHE"
fi

function do_mtr() {
  if [[ "$EUID" == 0 ]]; then # go faster:
    mtr -r -c 5 -j -n -i 0.1 -G 0.1 "$@"
  else
    mtr -r -c 5 -j -n -i 1 -G 0.1 "$@"
  fi
}

function get_hops() {
  do_mtr "$@" | jq -r '.report.hubs[].host | sub("^[?][?][?]$";"")'
}

function trace_one() {
  DESTINATION="$1"
  route=()
  NEXTHOP="${DESTINATION}"
  for x in $(seq "$DEFAULT_MAX" -1 2); do
    if [[ "$NEXTHOP" != "" && -e "$CACHE/$NEXTHOP" ]]; then
      THISHOP="${NEXTHOP}"
      NEXTHOP=$(cat "$CACHE/$THISHOP")
    else
      hops=( $(get_hops -f $(( x - 1 )) -m $x "$DESTINATION" ) )

      if [[ "${#hops[@]}" == 2 ]]; then
        NEXTHOP="${hops[0]}"
        THISHOP="${hops[1]}"
        echo "${NEXTHOP}" > "$CACHE/$THISHOP"
      fi
    fi
    route[$x]="$THISHOP"
    route[$(( x - 1 ))]="$NEXTHOP"
  done

  routespace=""
  sep=""
  for x in $(seq 1 "$DEFAULT_MAX"); do
    routespace="${routespace}${sep}${route[$x]}"
    sep=" "
    if [[ "${route[$x]}" == "${DESTINATION}" ]]; then break; fi
  done
  # Append destination to last hop:
  routespace="${routespace}(${DESTINATION})"
  # Delete destination if same as last hop:
  routespace=$(echo "$routespace" | sed 's/ \([0-9.][0-9.]*\)[(]\1[)]/ \1/')

  # Turn list of hops into dot syntax:
  echo "${routespace}" | sed 's/^/"/; s/$/";/; s/  */" -> "/g'
}

for n in "$@"; do
  trace_one "$n"
done

Example output:

$ mtr-cached.sh 8.8.8.8

"209.205.110.1" -> "209.205.90.122" -> "142.250.160.222" -> "142.251.229.137" -> "142.251.48.211" -> "8.8.8.8";

@PenelopeFudd
Copy link
Author

It's not the greatest, but it conveys the gist of what I'm trying to achieve.

@yvs2014
Copy link

yvs2014 commented Mar 12, 2023

@PenelopeFudd > nearby routers throttle the TTL exceeded messages, and mtr starts showing "???"

If that's only about to skip first nearby routers, filter '???' hops out, and leave unique ones,
what about in one long line without caching?

$ mtr -f3 -rnC -c5 yahoo.com | tail +2 | cut -d',' -f6 | grep -v '???' | uniq | sed 's/\(.*\)/"\1"/g' | sed ':a; N; $!ba; s/\n/ -> /g; s/$/;/'
"91.210.251.25" -> "91.210.248.77" -> "216.66.93.221" -> "184.104.192.165" -> "72.52.92.166" -> "184.104.197.54" -> "198.32.118.24" -> "209.191.64.155" -> "74.6.227.61" -> "74.6.122.65" -> "74.6.123.242" -> "74.6.98.138" -> "74.6.143.25";

update: plus AS path

$ mtr -y0 -f3 -rnC -c5 yahoo.com | tail +2 | cut -d',' -f7 | grep -v '???' | uniq | sed 's/\(.*\)/"\1"/g' | sed ':a; N; $!ba; s/\n/ -> /g; s/$/;/'
"AS48438" -> "AS6939" -> "AS10310" -> "AS36646";

@PenelopeFudd
Copy link
Author

That's 100% efficient (no hops seen twice) for distant (more than 3 hops away) widely-diverging hosts (i.e. after 3 hops they've got no hosts in common).

I'd like to generalize the solution a bit more, making only the assumption that if the current hop has been seen before, the nearer hops will be identical. It won't be 100% efficient, because you won't know if a hop has been seen before until you see it a second time, but I don't see a better way.

As noted by rewolff, that assumption doesn't always hold true in the cloud, which is one reason why this caching shouldn't be turned on by default. Of course, it doesn't matter if the hops in the cloud change, unless we're mapping the cloud.

The other reason is that diagnostic tools should report what the situation is now, not what the cache says it is (or rather, was). Here, we're trying to use a diagnostic tool as a mapping tool, so that's ok.

@yvs2014
Copy link

yvs2014 commented Mar 12, 2023

if the current hop has been seen before, the nearer hops will be identical

What does it mean identical in this context?
Possible identity (besides ip-address) could be ASN, route-prefix, org-name, or difference of average-delays with some post-processing.
For example, how a path looks like with prefix/asn/org comparing to addresses and delays
2023-03-12_11-25-15

@rewolff
Copy link
Collaborator

rewolff commented Mar 12, 2023

The original assumption in mtr is that you don't know the TTL to the destination, and also want to find out the path a packet will travel. Thus probing from the start of the travels and stopping when you hit the destination is most efficient.

But if you know the TTL to the destination, then you could just as well probe the path in descending order.

So... how about a tool that guesses the TTL to the destination by simply sending it a ping and looking at the TTL of the result? If you guess the starting TTL of the packet, you can calculate the number of hops.

Then we can implement: "If you find a host in the cache, stop probing", and "probe paths starting at the highest TTL".

My guess is that most hosts will nowadays use 64 (IIRC). But we need to check that. So what we'd need is a tool that sends a ping and reports the TTL of the returned packet and then starts mtr with min TTL set to one less than the guessed number and max-TTL set to that number. With the two probes that mtr sends you will often get your answer: Did we guess correctly.

I propose to randomly select an IPV4 address and try this on that host. With all IPV4 addresses having been exhausted, we should have a better than 50% chance of hitting a real host, right? Hmmm.... tested: I pinged 10 random IP addresses and all I got was ONE error. This disputes with 99.9% certainty the hypothesis: the chances are > 50%....
So I don't know how to go about finding "willing" IP4V addresses to participate in our study.

@rewolff
Copy link
Collaborator

rewolff commented Mar 12, 2023

What does it mean identical in this context?

What Charles wants is to make a map of a subset of IPV4 (or -6?) expecting it to be mostly static. What he expects to be static is the info that can be derived from the IP address like the secondary info that MTR can look up for you. He doesn't care about delay times. Those are also assumed to be "mostly static". (e.g. when I trace a server for the domain of a USA college/university I always get first hop .2 ms (local network), then 11 ms to my provider's first host, then a few more ms to amsterdam and possibly 20ms across the ocean. (mit teleported to amsterdam as I was writing this! ... Static my a$$. )

@yvs2014
Copy link

yvs2014 commented Mar 12, 2023

expecting it to be mostly static

Just thought that the more general information, the more static it is and less likely to be changed. Like for the path above:
-- about 15 hops in ip-addresses
-- 8 hops counted in route-prefixes
-- 4 hops in ASNs
-- 3 hops in org-names
-- 2 hops in country-codes

but yeah it doesn't reduce the number of sending packets, so maybe it's not very suitable in this case

@PenelopeFudd
Copy link
Author

Thank you, you've helped me I realize I wasn't thinking about the bigger picture. When I thought of "making a map", I was only considering ipV4 addresses, and ignoring ipV6 addresses, round-trip times, route-prefixes, ASNs, org-names or country-codes. It would be neat to colour the edges or place the nodes based on those values.

Also, when I said "static", I was thinking about how, if you trace the route to the same destination two times, you'd get the same list of hops both times. Another assumption was that, if the routes to two different destinations go through the same hop, then all the nearer hops would be the same as well. These assumptions do not hold true in all cases, but when you're mapping a network that doesn't use multi-path protocols in the parts you're interested in, they work.

There was one more assumption that may or may not make a difference: if you trace a route to a destination, and then trace a route to each of the hops along that route, you'll get the same hops. For example, if the first trace is "A -> B -> C -> D", I'd assume that the route to "C" would be "A -> B -> C", but it may very well be "A -> E -> F -> C". That's why it's probably good to treat destinations and hops differently in the cache.

@rewolff
Copy link
Collaborator

rewolff commented Mar 12, 2023

Why do you think that "destination" vs "hop" would make a difference, but you are assuming dest-A vs dest-B would lead to the same intermediate hops???

You're making (big) assumptions: that the network doesn't change much. This means that if the map is

A-B-C-D-E
.......C-F-G

you're assuming that A->G also goes through B as soon as you see "C" show up.

Then you might as well assume that if you trace to C, it also goes through B.

@PenelopeFudd
Copy link
Author

I'm saying "destination" and "hop" might make a difference because I'm not sure which interface routers might report in their ICMP TTL exceeded message (admin interface?), and because of programs like fakeroute that can twist the answer to be anything. It's an assumption that's probably not important, but I thought I'd throw it out there for completeness.

And yes, I'm assuming the parts of the network I'm interested in don't change at all between when I start using the cache and when I finish, which I'm hoping will be about 5 minutes or less.

The alternative to using a cache is just to retry hops that don't respond a few more times.

@yvs2014
Copy link

yvs2014 commented Mar 13, 2023

@PenelopeFudd

when you're mapping a network that that doesn't use multi-path protocols
not sure which interface routers might report in

traceroute (or mtr) shows ip-addresses of routers from sender's side. It's usually that ip-address of network interface through which traffic is routed to a sender back. So, changes in backward direction can impact to.

The idea to dynamically reduce ttl set in probes to destination during some short period of time (i.e. probing only not responded hops at this time) looks quite reasonable at least in some cases, sorry I haven't got this idea before, thought it's about tracing transit hops directly.

@yvs2014
Copy link

yvs2014 commented Mar 14, 2023

I've tried to add some cache mode inside mtr (to older version so far) with the next algorithm:
if n-th hop is 'up' and seen less than 'cache-timeout' period, then don't ping that hop. Then after 'cache-timeout', probe it again, and so on.
In few tests, looks like it works in this "no ping to known hops" mode. On the picture it is seen in "Snt" column
2023-03-14_17-44-03

@yvs2014
Copy link

yvs2014 commented Mar 15, 2023

adaptation of this cache-mode mentioned in previous message to current mtr
https://github.com/yvs2014/mtr085/tree/mtr20230314-with-cachemode

@PenelopeFudd
Copy link
Author

Looks good, I'll have to give it a try! :-)

@PenelopeFudd
Copy link
Author

Yep, it does what it needs to do!

  • It tries to ping each hop until it gets an answer or runs out of (-c) cycles.
  • When the cache expires for a given hop, it pings again.

The next level of optimization would be caching the same data so that multiple copies of mtr could use it in parallel, or have mtr accept a list of hosts to trace in parallel. We'd still have to come up with a way to decide if any of the cached data contains the answer that you'd get if you pinged a given hop. Not easy. :-/

@yvs2014
Copy link

yvs2014 commented Mar 15, 2023

caching the same data so that multiple copies of mtr could use it in parallel

it's far enough out of scope mtr itself, and a bit complicated because of different context of those copies and their data. I.e. no idea about this kind of synchronization. Maybe someone else will find it out.

@PenelopeFudd
Copy link
Author

I just want to do parallel mtr -j {} :::: inputlist.txt > output.json and not flood our poor router if I can help it. :-)
(BTW, Gnu Parallel rocks!)

@yvs2014
Copy link

yvs2014 commented Mar 16, 2023

not flood our poor router

a bit calculations with the longest imaginable path (30 hops):
-- without cache it's 30 packets per second (30 pps) on permanent basis
-- with cache when 1/6 of hops are silent - 5 pps
both these numbers are not big even for the poorest router in the world, so it's all about how many entries in that inputlist.txt, or something else

On the aim to reduce generated traffic with already available options:   with default 60sec "cache" - it's 1sec for unresponded packets and 60sec for responded ones. If you set '-i 10 --use-cache --cache-timeout 300', it will be 50 times less (0.1pps per target for the above example). And if absolute majority of unresponded packets are designed to nearby routers, then cutting them off from a path can drastically decrease the number of pings (almost to negligible comparing to prenumbers).   I.e. something like '-f3 -i10 --use-cache' would be not too bad option (and a poor router on the hop#1 will be a little less poor, not?)

@PenelopeFudd
Copy link
Author

Oh, I meant that the input list has a couple hundred hosts in it, and starting up that many copies of mtr in parallel would end up getting no response from the first hop for a majority of them. So the -f 3 option would be of great benefit, and I guess I can fix the ??? entries in post-production. :-)

Thanks for the caching changes!

@yvs2014
Copy link

yvs2014 commented Mar 17, 2023

@PenelopeFudd You're welcome
in addition I've combined two cache options in one (-X/--cache SECONDS), maybe it's a bit more common with a short option

@rewolff Could this cache mode be interesting to other people?
In the branch master...yvs2014:mtr085:mtr20230314-with-cachemode
first commit to fix wait-time calculation according first-ttl, other ones are cachemode related

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants