Is this still active? #134
Comments
I've seen a few forks of this around but they only seem to contain the commits that have been PRd back to this repo 😞 |
Hmm, the library has some issues that seem to be very common, like Unicode decode errors or timeouts, which have been reported and are probably waiting on a fix in some pull request. |
+1, I've found the same as @cristianocca |
This guy did an amazing job with this package. He defied the mythical whois monster. I'm surprised how little work it took to get it to a level where it yields better results than anything I've ever tried before (and I did try a lot...I think I have at least 1,000,000 sites down so far). |
Is the library dead? Is there any active fork? |
@Ni-Knight yes there is https://github.com/botlabio/pywhois. The package has been somewhat cleaned, slightly refactored and tested with 100,000 sites. As a result of that test there is now a better idea of the actual coverage, how to improve that, etc. |
Hi all, apologies for the long radio silence. I've had a bunch of personal issues to deal with for the past few years, so maintenance of libraries has slipped quite a bit; especially for my Python libraries. As I generally use Node.js for my own projects nowadays, my original intention was to maintain this in parallel with a JS implementation, sharing the parsing ruleset between them; but I've not found the time to get anything done on it. Given the increasing number of registries that just totally shut off WHOIS data access due to the GDPR (even for company data, where this isn't necessary!), I'm unsure about the future of this library. On the one hand it's something I'd like to maintain and I have some ideas for improving on it, but on the other hand it may not be very useful in the future with decreasing WHOIS data access. The persisting encoding issues have also contributed significantly to this library falling by the wayside; there doesn't seem to have ever been a canonical solution for these issues that works in both Python 2 and Python 3 without introducing an additional dependency - and adding a dependency is not something I really like to do considering the rather... lacking and conflict-prone dependency model in Python, which is a big part of why I gave up on Python in the first place. So... I'm not sure how to proceed. A few questions for you all, as users of this library, that'll help me determine how to continue:
|
@joepie91 I guess many will be pleased to hear from you here! GDPR wall sounds bad, I did not know about this yet but was wondering how this will play out. To your questions:
I think that one big thing with this library is that the code needs to be refactored. IMO all special cases should be handled in separate functions that reside in their own files in a /exceptions sub module or something like that. |
With the introduction of the GDPR, it is no longer allowed to process or publish personally identifiable information of EU residents, without either a) a predetermined legal basis for doing so, or b) explicit and voluntary permission to do so (and it's not allowed to require that 'permission' to use a service). For WHOIS data, this basically works out to "you can no longer legally publish registrant data for EU residents". While this does not apply to non-EU residents and organizations (eg. companies), an increasing number of registries are simplifying their implementation by just hiding all registrant data for everybody. For example, if you WHOIS my domain
There is no standardized way for indicating such removal/replacement of PII, so there would need to be special rules for detecting GDPR-related information removals per registry. It also means that you will get less and less data out of registries over time.
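A minimal sketch of what such per-registry rules might look like, assuming each registry's replacement text can be matched with a small set of patterns. The marker phrases, the `example-registry` key, and the function names are illustrative assumptions, not a verified list of what any registry returns.

```python
import re

# Illustrative marker phrases only; real registries vary and would each
# need to be checked individually.
REDACTION_MARKERS = {
    "default": [r"redacted for privacy", r"not disclosed"],
    "example-registry": [r"data withheld"],  # hypothetical registry
}


def is_redacted(value, registry="default"):
    """Return True if the field value matches a known redaction marker."""
    patterns = list(REDACTION_MARKERS["default"])
    if registry != "default":
        patterns += REDACTION_MARKERS.get(registry, [])
    return any(re.search(p, value, re.IGNORECASE) for p in patterns)


def redacted_fields(record, registry="default"):
    """List the keys of a parsed record whose values look redacted."""
    return sorted(k for k, v in record.items() if is_redacted(v, registry))
```

A table like this would have to grow registry by registry, which is exactly the maintenance burden described above.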
I'd prefer avoiding dependencies at all, since Python uses a flat dependency model; if two dependencies in the same project use different (incompatible) versions of an encoding-detection package, it will cause a potentially unresolvable version conflict. This also applies when only supporting Python 3.
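For illustration, a dependency-free fallback could look something like the sketch below. It is a heuristic, not the canonical solution the thread says is missing: `decode_whois` and its default encoding list are assumptions, and a response in some other encoding would decode without an error but show wrong characters.

```python
def decode_whois(raw_bytes, encodings=("utf-8", "iso-8859-1")):
    """Decode a raw WHOIS response using only the standard library.

    Candidate encodings are tried in order. ISO-8859-1 accepts every
    byte value, so with the default list the final fallback is never
    reached; it only matters when a stricter list is passed in.
    """
    for encoding in encodings:
        try:
            return raw_bytes.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep what decodes and mark the rest as U+FFFD.
    return raw_bytes.decode("utf-8", "replace")
```

Because it relies only on `bytes.decode`, the same function should run unchanged on Python 2 and Python 3.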
That was the original plan, but I kept running into weirder and weirder edge cases; and since there's no centralized repository of formats and edge cases, any such architecture would need to be changed over time anyway to accommodate newly discovered kinds of edge cases. That's not to say that things can't be refactored, but it's an ongoing and never-ending process rather than a one-off todo item. |
This "not disclosed" business looks bad. How will this end? Will we be deprived of the joys of mindless parsing of whois records? |
Likely with increasingly smaller amounts of (useful) information being present in WHOIS data over time, hence my uncertainty on how to proceed with this project. |
@joepie91 generally speaking I prefer signals that are consistent across all observations, which as you know was kind of a struggle with WHOIS data to start with. That said, my use-case is quite specific and might not be affected too much here. Actually I'd be ok with just registration date, and I think that's not going to be masked at any point for any reason. Instead of regex, a deep learning model could be used for detecting it which would avoid a lot of the headache. My second use-case is to identify use of whois privacy, which I think could be mostly done in the current scenario. The third case is a more conventional reverse lookup with reg email, org name, etc which is obviously affected (and actually already was, because of wider adoption of whois privacy). But I think there are better ways to do that these days, ones that are much harder to mask. |
This fork works well 👍 |
I guess it's a silly question since the last commit was 3 years ago. But I tried out a few options and this one seems to yield the best results so far. Did this project continue somewhere else?