
MaxMind DB for Location #492

Open
JDarzan opened this issue Oct 6, 2023 · 9 comments


@JDarzan

JDarzan commented Oct 6, 2023

Hello everyone,

I've been considering the possibility of enhancing the results with more detailed insights. It might be beneficial to integrate MaxMind's MMDB to get accurate information on IP locations and ASN details.
For instance, we could pass it as a parameter: --mmdb GeoIP2-City.mmdb

Such an integration could offer a more detailed and enriched view of the hops, especially for those wishing to see the location in real-time. Is there any ongoing discussion or plan regarding this?

Thank you in advance for considering this suggestion, and I look forward to community feedback.


@rewolff
Collaborator

rewolff commented Oct 7, 2023

I am open to patches that implement this.

I think your approach of specifying the database to use is the right approach.

On the other hand, having a default that works and just a flag to enable it might be better. Then I can use the MTR_OPTIONS environment variable to set my default geoip provider and enable it with a simple flag, instead of having to type my provider every time I want to use it.
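A minimal sketch of the precedence this implies, assuming two hypothetical options that do not exist in mtr: a `--geoip-db FILE` option that only records which database to use (so it can sit in MTR_OPTIONS), and a bare `--geoip` flag that turns lookups on. The default path is a placeholder, not something mtr defines.

```c
#include <stddef.h>

/* Placeholder default database; an assumption, not a path mtr defines. */
static const char default_geo_db[] = "/usr/share/GeoIP/GeoLite2-City.mmdb";

/* configured_path: from a hypothetical "--geoip-db FILE" option, e.g. kept in MTR_OPTIONS.
 * enabled:         from a hypothetical bare "--geoip" flag typed on the command line.
 * Returns the database file to open, or NULL when geolocation stays off. */
const char *resolve_geo_db(const char *configured_path, int enabled)
{
    if (!enabled)
        return NULL;                  /* configured but not requested: no lookups */
    if (configured_path != NULL && configured_path[0] != '\0')
        return configured_path;       /* the user's own provider/database */
    return default_geo_db;            /* a default that works without configuration */
}
```

With something like `MTR_OPTIONS="--geoip-db /path/to/GeoIP2-City.mmdb"` in the environment, `mtr --geoip host` would use that file while plain `mtr host` would do no geo lookups at all.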

@yvs2014

yvs2014 commented Oct 7, 2023

Just BTW, does GeoIP2 have an online API? (If so, are there restrictions, such as a limit on the number of queries?)
Compared to, say, IP-API, does it provide more data?

P.S. For example, here is some info from there:
                                         mtr-0.85 -fa -y5,2,3,5,6,12,13 yahoo.com: in pause
Keys: hints quit                                                                                    rhome: Sat Oct  7 18:33:14 2023
                                                                                          Packets      Pings
    CC RC City     Zip   AS Name                       Host                               Loss   Snt   Last   Avg  Best  Wrst StDev
15. US NY Lockport 14095 AS26101 Oath Holdings Inc.    media-router-fp73.prod.media.vip.    0%   283    115   115   114   118   0.3

@JDarzan
Author

JDarzan commented Oct 10, 2023

@rewolff
Thank you for your response. I am running some tests, will release a patch, and will let you know when it's ready.

Indeed, your approach of setting the IP-information provider in the MTR_OPTIONS environment variable gives users more control over the queries.

There is only one open issue: since each provider structures its data differently, I need to think about a better approach for handling these cases, such as searching locally in the mmdb file versus querying an external API.
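One way to absorb those differences is to have every backend (local mmdb file, HTTP API, DNS service) fill the same normalized record behind a common interface, so the display code never sees provider-specific formats. A minimal sketch; all struct and function names are hypothetical, and the toy backend uses documentation-range values (192.0.2.1, AS64496) instead of a real data source:

```c
#include <stdio.h>
#include <string.h>

/* Normalized record every backend fills in; the fields are illustrative. */
struct geo_info {
    char country[3];        /* ISO 3166-1 alpha-2 code */
    char city[64];
    unsigned long asn;      /* 0 when unknown */
};

/* One backend. lookup returns 0 on success, -1 if the address is unknown. */
struct geo_provider {
    const char *name;
    int (*lookup)(void *ctx, const char *ip, struct geo_info *out);
    void *ctx;              /* backend state: open file, API key, ... */
};

/* The rest of the program calls only this, whatever the backend is. */
int geo_lookup(const struct geo_provider *p, const char *ip, struct geo_info *out)
{
    memset(out, 0, sizeof *out);          /* unknown fields stay empty/zero */
    return p->lookup(p->ctx, ip, out);
}

/* Toy backend with one hard-coded answer, standing in for a real provider. */
static int toy_lookup(void *ctx, const char *ip, struct geo_info *out)
{
    (void)ctx;
    if (strcmp(ip, "192.0.2.1") != 0)
        return -1;
    snprintf(out->country, sizeof out->country, "%s", "ZZ");
    snprintf(out->city, sizeof out->city, "%s", "Example City");
    out->asn = 64496;
    return 0;
}

static const struct geo_provider toy_provider = { "toy", toy_lookup, NULL };
```

Adding a provider then means writing one `lookup` function that maps its own format (mmdb fields, JSON keys, TXT records) onto `struct geo_info`.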

@JDarzan
Author

JDarzan commented Oct 10, 2023

@yvs2014
Yep... there are limits even for paid accounts!
ipinfo.io, MaxMind (GeoIP2), IP2Location, ipstack, ip-api and others

For example, with my account on ipinfo.io:
500k lookups
$60 per additional 50K lookups
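That quota turns into a simple overage formula. A sketch, assuming the 500k lookups are included in the plan for whatever period it covers and that a partially used 50K block is billed in full (neither detail is stated here); the function name is hypothetical:

```c
/* Overage charge in USD for one billing period under the quoted plan:
 * 500k lookups included, $60 per additional block of 50K.
 * Assumption: a partially used block is billed as a full block. */
unsigned long overage_usd(unsigned long lookups)
{
    const unsigned long included = 500000UL;
    const unsigned long block = 50000UL;
    const unsigned long price_per_block = 60UL;

    if (lookups <= included)
        return 0;
    unsigned long extra = lookups - included;
    return ((extra + block - 1) / block) * price_per_block;  /* round up to whole blocks */
}
```

For example, 520K lookups in one period would use one extra block, i.e. $60.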

The API result on the Business plan has more details:

{
  "ip": "40.107.218.61",
  "city": "San Antonio",
  "region": "Texas",
  "country": "US",
  "loc": "29.4241,-98.4936",
  "postal": "78205",
  "timezone": "America/Chicago",
  "asn": {
    "asn": "AS8075",
    "name": "Microsoft Corporation",
    "domain": "microsoft.com",
    "route": "40.104.0.0/14",
    "type": "business"
  },
  "company": {
    "name": "Microsoft Corporation",
    "domain": "microsoft.com",
    "type": "business"
  },
  "privacy": {
    "vpn": false,
    "proxy": false,
    "tor": false,
    "relay": false,
    "hosting": false,
    "service": ""
  },
  "abuse": {
    "address": "US, WA, Redmond, One Microsoft Way, 98052",
    "country": "US",
    "email": "[email protected]",
    "name": "Microsoft Abuse Contact",
    "network": "40.74.0.0-40.125.127.255",
    "phone": "+1-425-882-8080"
  },
  "domains": {
    "total": 0,
    "domains": []
  }
}

@JDarzan
Author

JDarzan commented Oct 10, 2023

@yvs2014

A good approach would be to maintain a local SQLite database: before querying ipinfo (for example), first check whether the IP is already in your SQLite database; if it isn't, query ipinfo and store the result in the database, keeping it up to date.

That way, if an IP comes up again, you won't need to make a new request.
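The check-first flow can be sketched as follows. To keep it runnable without the library, an in-memory array stands in for the SQLite table, a hypothetical `fetch_remote()` stands in for the ipinfo request, and the 30-day refresh interval is an arbitrary assumption for "keeping it updated". The current time is passed in so the refresh logic is easy to exercise.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

#define CACHE_MAX   256
#define TTL_SECONDS (30L * 24 * 3600)   /* assumed refresh interval: 30 days */

/* One cached answer; 46 bytes fits any textual IPv4/IPv6 address. */
struct cache_entry { char ip[46]; char data[128]; time_t fetched; };

static struct cache_entry cache[CACHE_MAX];  /* stand-in for the SQLite table */
static int cache_len = 0;
static int remote_calls = 0;                 /* counts (paid) API requests */

/* Hypothetical placeholder for the real ipinfo request. */
static void fetch_remote(const char *ip, char *out, size_t outlen)
{
    remote_calls++;
    snprintf(out, outlen, "info-for-%s", ip);
}

/* Look in the local store first; query the API only on a miss or a stale entry. */
const char *lookup_ip(const char *ip, time_t now)
{
    for (int i = 0; i < cache_len; i++) {
        struct cache_entry *e = &cache[i];
        if (strcmp(e->ip, ip) != 0)
            continue;
        if (now - e->fetched > TTL_SECONDS) {    /* stale: refresh in place */
            fetch_remote(ip, e->data, sizeof e->data);
            e->fetched = now;
        }
        return e->data;                          /* hit: no new request */
    }
    if (cache_len == CACHE_MAX) {                /* store full: answer without caching */
        static char scratch[128];
        fetch_remote(ip, scratch, sizeof scratch);
        return scratch;
    }
    struct cache_entry *e = &cache[cache_len++]; /* miss: fetch and remember */
    snprintf(e->ip, sizeof e->ip, "%s", ip);
    fetch_remote(ip, e->data, sizeof e->data);
    e->fetched = now;
    return e->data;
}
```

A repeated hop costs one string comparison per cached entry instead of one API request; a real SQLite version would replace the linear scan with an indexed `SELECT`.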

@yvs2014

yvs2014 commented Oct 11, 2023

@JDarzan
− there aren't that many queries in an ordinary trace
− when using a DNS-based API, the replies are usually cached by the local DNS server
In many cases that's enough.
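With a DNS-based API each lookup is an ordinary DNS query, which is why the local resolver can cache it. As an illustration, assuming Team Cymru's IP-to-ASN DNS service as one well-known example of such an API (the helper name is hypothetical), the TXT query name for an IPv4 address is its octets in reverse order followed by the service zone:

```c
#include <stdio.h>
#include <string.h>

/* Build the origin-ASN TXT query name for a dotted-quad IPv4 address,
 * e.g. "192.0.2.1" -> "1.2.0.192.origin.asn.cymru.com".
 * Returns 0 on success, -1 on malformed input or if out is too small. */
int cymru_qname(const char *ip, char *out, size_t outlen)
{
    unsigned a, b, c, d;
    char trailing;                                   /* rejects junk after the 4th octet */
    if (sscanf(ip, "%u.%u.%u.%u%c", &a, &b, &c, &d, &trailing) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;
    int n = snprintf(out, outlen, "%u.%u.%u.%u.origin.asn.cymru.com", d, c, b, a);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;  /* -1 on truncation */
}
```

Because the same hop always produces the same query name, repeated traces through the same routers can be answered from the resolver's cache for as long as the record's TTL allows.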

@rewolff
Collaborator

rewolff commented Oct 11, 2023

sqlite is "low overhead": just a library that puts things in a file, with a way to access them as if it were a database.
One problem with sqlite is that you need to specify an expected maximum number of entries in the database beforehand. It will handle more, no problem, but it will become slow. Maybe that's not a problem in our case, but for e2fsck it WAS a problem (I fixed the problem before the original fsck would have finished; the fixed version still took around 24h...). For "normal" people, say, 100 hosts in the database might be enough. To avoid being too wasteful, you'd initialize it to 1000. But then someone doing a wide scan will run into the O(N^2) (rather than O(1)) problems of exceeding the initial "max items" estimate....
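To make the O(N^2) remark concrete, here is the cost model for a store built on a chained hash table whose bucket count B is fixed when it is created, which is the kind of fixed-size store described here. It assumes uniform hashing and a duplicate check on every insert. (SQLite itself keeps tables in B-trees, so this models the fixed-bucket case rather than SQLite specifically.)

```latex
% B = bucket count fixed at creation, N = entries inserted, uniform hashing.
% Inserting entry i+1 (with a duplicate check) scans a chain of expected length i/B:
\sum_{i=0}^{N-1} \frac{i}{B} \;=\; \frac{N(N-1)}{2B} \;\approx\; \frac{N^2}{2B}
% B = 1000, N = 100:   about 5 comparisons in total (the "normal" user).
% B = 1000, N = 10^6:  about 5 \times 10^8 comparisons,
%                      versus on the order of N for a table that grows B as it fills.
```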

I'm not impressed by the caching of DNS servers. I get the impression that it often doesn't work somehow. (Not working includes reporting "nope, no such host" while still waiting for a reply from the other end, and then, when the reply does come in, retrying the whole forwarded request and again reporting "nope" before the answer arrives. Stuff like that.)

For MTR, I think adding another dependency is the most important concern. I don't like it.

Having a fallback if sqlite is not available sounds like an option to me:

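```c
/* Plain-text fallback store, one "ip data" pair per line.
 * ipaddr is passed as text here (the pseudocode called format_ip());
 * data must be a single whitespace-free token, as with the original "%s". */
#include <stdio.h>
#include <string.h>

static const char *thedatabasefile = "geo-cache.txt";  /* hypothetical path */

void add_stuff_to_database(const char *ipaddr, const char *data)
{
    FILE *fp = fopen(thedatabasefile, "a");         /* append one record */
    if (fp == NULL)
        return;
    fprintf(fp, "%s %s\n", ipaddr, data);
    fclose(fp);
}

/* Copies the cached data into data[datalen]; returns data, or NULL on a miss. */
char *get_data_from_database(const char *ipaddr, char *data, size_t datalen)
{
    char buf[1024], ipaddr2[64], data2[1024];
    FILE *fp = fopen(thedatabasefile, "r");         /* read, not append */
    if (fp == NULL)
        return NULL;
    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (sscanf(buf, "%63s %1023s", ipaddr2, data2) == 2 &&
            strcmp(ipaddr, ipaddr2) == 0) {         /* compare strings, not pointers */
            snprintf(data, datalen, "%s", data2);
            fclose(fp);
            return data;
        }
    }
    fclose(fp);
    return NULL;
}
```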

Something like this as a fallback if sqlite is not available shouldn't take more than about twice the number of lines here to get it to work.

For the "limited number of queries" cases, preparing for the "huge scan" kind of application would be good. Suppose someone is going to do 100k scans with 10 hops on average. A normal user doing 10 scans of 10 hops is not a problem. But the 1M lookups with more than 90% doubles would cost serious money otherwise.
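Working through those numbers, and borrowing the ipinfo.io quota quoted earlier in the thread (500k lookups included, $60 per additional 50K) as an illustrative price list. The 90% repeat rate is the figure given here; the sketch also assumes the whole batch falls in one billing period and that a cache queries each distinct IP only once:

```latex
\begin{aligned}
L_{\text{total}}    &= 10^5 \text{ scans} \times 10 \text{ hops} = 10^6 \text{ lookups} \\
L_{\text{distinct}} &< (1 - 0.9)\, L_{\text{total}} = 10^5 \\
\text{overage without cache} &= \frac{10^6 - 5 \times 10^5}{5 \times 10^4} \times \$60 = 10 \times \$60 = \$600 \\
\text{overage with cache}    &= \$0 \qquad \text{since } L_{\text{distinct}} < 5 \times 10^5
\end{aligned}
```

The cache cuts paid queries by more than a factor of ten; the actual saving depends on the real repeat rate and plan.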

@yvs2014

yvs2014 commented Oct 11, 2023

> sqlite is "low overhead". Just a library putting things in a file ...

Still, it needs an external database with persistent storage. On the other hand, keeping a couple dozen entries in memory (fetched from a free online source) is usually enough for an ordinary trace (that's my use case).
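For the in-memory case, something as small as a direct-mapped table is enough: each IP hashes to exactly one of a few dozen slots, and a colliding address simply replaces the old entry, so memory use stays fixed and nothing touches the disk. A sketch under those assumptions; the slot count, the names, and the choice of FNV-1a as the hash are illustrative:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SLOTS 32   /* "a couple dozen" entries; any small size works */

struct slot { char ip[46]; char info[96]; };
static struct slot slots[SLOTS];   /* zero-initialized: every slot starts empty */

/* 32-bit FNV-1a string hash. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;                /* FNV offset basis */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;                      /* FNV prime */
    }
    return h;
}

/* Returns the cached info for ip, or NULL if its slot holds something else. */
const char *mem_cache_get(const char *ip)
{
    const struct slot *s = &slots[fnv1a(ip) % SLOTS];
    if (s->ip[0] != '\0' && strcmp(s->ip, ip) == 0)
        return s->info;
    return NULL;
}

/* Stores info for ip, evicting whatever previously occupied that slot. */
void mem_cache_put(const char *ip, const char *info)
{
    struct slot *s = &slots[fnv1a(ip) % SLOTS];
    snprintf(s->ip, sizeof s->ip, "%s", ip);
    snprintf(s->info, sizeof s->info, "%s", info);
}
```

An eviction only costs one extra lookup later, which is cheap when the source is free and a trace has a few dozen hops.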

> A normal user doing 10 scans of 10 hops is not a problem.
> But the 1M lookups with more than 90% doubles would cost serious money otherwise.

i.e., it's not free, and it's not for typical use.

@Jamie-Landeg-Jones

I use the MaxMind API in other projects, and have been meaning to modify mtr to use it too.

However, you appear to be discussing a real-time DNS lookup method. I'm using the downloadable version of the database, which is already stored in its own DB format.

The API to read that is quite simple, and exists in many languages already: https://maxmind.github.io/MaxMind-DB/

Is anyone considering this method of providing location data within mtr, and/or is there any interest in such a patch if I can finally get somewhat through my "TODO" list?

P.S. I use the free version. It's refreshed monthly and is not quite as current as the paid-for version, but I've never noticed any issues with it.
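As a small illustration of how a reader of the downloadable format gets started (not a substitute for libmaxminddb or the other existing readers): the MaxMind DB spec says the metadata section begins right after the last occurrence of the byte sequence `\xab\xcd\xefMaxMind.com`, and that the search can be limited to the last 128 KiB of the file. A sketch of that first step, with a hypothetical function name, operating on a file already loaded or mmap'ed into memory:

```c
#include <stddef.h>
#include <string.h>

/* 14-byte marker that precedes the metadata section (per the MaxMind DB spec). */
static const unsigned char MMDB_META_MARKER[] = "\xab\xcd\xef" "MaxMind.com";
#define MMDB_META_MARKER_LEN    (sizeof MMDB_META_MARKER - 1)
#define MMDB_META_SEARCH_WINDOW (128u * 1024u)   /* only the last 128 KiB is searched */

/* Returns the offset of the first metadata byte, or -1 if no marker is found. */
long mmdb_metadata_offset(const unsigned char *file, size_t len)
{
    if (len < MMDB_META_MARKER_LEN)
        return -1;
    size_t lowest = len > MMDB_META_SEARCH_WINDOW ? len - MMDB_META_SEARCH_WINDOW : 0;
    /* Scan backwards so the first match found is the LAST occurrence. */
    for (size_t i = len - MMDB_META_MARKER_LEN + 1; i-- > lowest; ) {
        if (memcmp(file + i, MMDB_META_MARKER, MMDB_META_MARKER_LEN) == 0)
            return (long)(i + MMDB_META_MARKER_LEN);
    }
    return -1;
}
```

From that offset a reader decodes the metadata map (node count, record size, IP version) and then walks the binary search tree; in practice a patch to mtr would more likely link an existing reader library than reimplement this.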
