Commit
Add an asynchronous method so DNS queries can be run asynchronously
Showing 12 changed files with 461 additions and 67 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,6 +18,7 @@ Key features: | |
can display to end-users. | ||
* Checks deliverability (optional): Does the domain name resolve? | ||
(You can override the default DNS resolver to add query caching.) | ||
* Can be called asynchronously with `await`. | ||
* Supports internationalized domain names and internationalized local parts. | ||
* Rejects addresses with unsafe Unicode characters, obsolete email address | ||
syntax that you'd find unexpected, special use domain names like | ||
|
@@ -83,6 +84,9 @@ This validates the address and gives you its normalized form. You should | |
checking if an address is in your database. When using this in a login form, | ||
set `check_deliverability` to `False` to avoid unnecessary DNS queries. | ||
|
||
See below for examples for caching DNS queries and calling the library | ||
asynchronously with `await`. | ||
|
||
Usage | ||
----- | ||
|
||
|
@@ -161,6 +165,30 @@ while True: | |
validate_email(email, dns_resolver=resolver) | ||
``` | ||
|
||
### Asynchronous call | ||
|
||
The library has an alternative, asynchronous method named `validate_email_async` which must be called with `await`. This method uses an [asynchronous DNS resolver](https://dnspython.readthedocs.io/en/latest/async.html) so that multiple DNS-based deliverability checks can be performed in parallel. | ||
|
||
Here is how to use it. In this example, `import ... as` is used to alias the async method to the usual method name `validate_email`. | ||
|
||
```python | ||
from email_validator import validate_email_async as validate_email, \ | ||
    EmailNotValidError, caching_async_resolver | ||
|
||
resolver = caching_async_resolver(timeout=10) | ||
|
||
email = "[email protected]" | ||
try: | ||
    emailinfo = await validate_email(email, dns_resolver=resolver) | ||
    email = emailinfo.normalized | ||
except EmailNotValidError as e: | ||
    print(str(e)) | ||
``` | ||
|
||
Note that to create a caching asynchronous resolver, use `caching_async_resolver`. As with the synchronous version, creating a resolver is optional. | ||
|
||
When processing batches of email addresses, I found that chunking around 25 email addresses at a time (using e.g. `asyncio.gather()`) resulted in the highest performance. I tested on a residential Internet connection with valid addresses. | ||
|
||
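The chunking approach described above can be sketched roughly as follows. This is a minimal illustration, not the library's code: `fake_validate`, `chunked`, and `validate_all` are hypothetical names, and `fake_validate` is a stand-in coroutine used so the sketch runs without DNS access or this commit's `validate_email_async`.

```python
import asyncio
import itertools


async def fake_validate(email):
    # Hypothetical stand-in for an async validator: it only checks for
    # an "@" and yields to the event loop once, as a real lookup would.
    await asyncio.sleep(0)
    if "@" not in email:
        raise ValueError(f"{email!r} is missing an @-sign")
    return email.lower()


def chunked(iterable, size):
    # Yield successive lists of at most `size` items.
    it = iter(iterable)
    while chunk := list(itertools.islice(it, size)):
        yield chunk


async def validate_all(emails, chunk_size=25):
    # Chunks run one after another; the addresses inside a chunk run
    # concurrently. Failures are returned in place, not raised.
    results = []
    for chunk in chunked(emails, chunk_size):
        results += await asyncio.gather(
            *(fake_validate(e) for e in chunk), return_exceptions=True
        )
    return results


if __name__ == "__main__":
    for outcome in asyncio.run(validate_all(["A@example.org", "oops"])):
        print(repr(outcome))
```

With `return_exceptions=True`, one invalid address does not cancel the rest of its chunk, and results keep the input order. The chunk size of 25 is the figure reported above for one connection and may not be the best value elsewhere.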
### Test addresses | ||
|
||
This library rejects email addresses that use the [Special Use Domain Names](https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml) `invalid`, `localhost`, `test`, and some others by raising `EmailSyntaxError`. This is to protect your system from abuse: You probably don't want a user to be able to cause an email to be sent to `localhost` (although they might be able to still do so via a malicious MX record). However, in your non-production test environments you may want to use `@test` or `@myname.test` email addresses. There are three ways you can allow this: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,24 +5,86 @@ | |
# python -m email_validator [email protected] | ||
# python -m email_validator < LIST_OF_ADDRESSES.TXT | ||
# | ||
# Provide email addresses to validate either as a command-line argument | ||
# or in STDIN separated by newlines. Validation errors will be printed for | ||
# invalid email addresses. When passing an email address on the command | ||
# line, if the email address is valid, information about it will be printed. | ||
# When using STDIN, no output will be given for valid email addresses. | ||
# Provide email addresses to validate either as a single command-line argument | ||
# or on STDIN separated by newlines. | ||
# | ||
# When passing an email address on the command line, if the email address | ||
# is valid, information about it will be printed to STDOUT. If the email | ||
# address is invalid, an error message will be printed to STDOUT and | ||
# the exit code will be set to 1. | ||
# | ||
# When passing email addresses on STDIN, validation errors will be printed | ||
# for invalid email addresses. No output is given for valid email addresses. | ||
# Validation errors are preceded by the email address that failed and a tab | ||
# character. It is the user's responsibility to ensure email addresses | ||
# do not contain tab or newline characters. | ||
# | ||
# Keyword arguments to validate_email can be set in environment variables | ||
# of the same name but uppercase (see below). | ||
|
||
import json | ||
import os | ||
import sys | ||
import itertools | ||
|
||
from .validate_email import validate_email | ||
from .deliverability import caching_resolver | ||
from .deliverability import caching_async_resolver | ||
from .exceptions_types import EmailNotValidError | ||
|
||
|
||
def main_command_line(email_address, options, dns_resolver): | ||
    # Validate the email address passed on the command line. | ||
|
||
    from . import validate_email | ||
|
||
    try: | ||
        result = validate_email(email_address, dns_resolver=dns_resolver, **options) | ||
        print(json.dumps(result.as_dict(), indent=2, sort_keys=True, ensure_ascii=False)) | ||
        return True | ||
    except EmailNotValidError as e: | ||
        print(e) | ||
        return False | ||
|
||
|
||
async def main_stdin(options, dns_resolver): | ||
    # Validate the email addresses passed line-by-line on STDIN. | ||
    # Chunk the addresses and call the async version of validate_email | ||
    # for all the addresses in the chunk, and wait for the chunk | ||
    # to complete. | ||
|
||
    import asyncio | ||
|
||
    from . import validate_email_async as validate_email | ||
|
||
    dns_resolver = dns_resolver or caching_async_resolver() | ||
|
||
    # https://stackoverflow.com/a/312467 | ||
    def split_seq(iterable, size): | ||
        it = iter(iterable) | ||
        item = list(itertools.islice(it, size)) | ||
        while item: | ||
            yield item | ||
            item = list(itertools.islice(it, size)) | ||
|
||
    CHUNK_SIZE = 25 | ||
|
||
    async def process_line(line): | ||
        email = line.strip() | ||
        try: | ||
            await validate_email(email, dns_resolver=dns_resolver, **options) | ||
            # If the email was valid, do nothing. | ||
            return None | ||
        except EmailNotValidError as e: | ||
            return (email, e) | ||
|
||
    chunks = split_seq(sys.stdin, CHUNK_SIZE) | ||
    for chunk in chunks: | ||
        awaitables = [process_line(line) for line in chunk] | ||
        errors = await asyncio.gather(*awaitables) | ||
        for error in errors: | ||
            if error is not None: | ||
                print(*error, sep='\t') | ||
|
||
|
||
def main(dns_resolver=None): | ||
# The dns_resolver argument is for tests. | ||
|
||
|
@@ -36,24 +98,14 @@ def main(dns_resolver=None): | |
if varname in os.environ: | ||
options[varname.lower()] = float(os.environ[varname]) | ||
|
||
if len(sys.argv) == 1: | ||
# Validate the email addresses passed line-by-line on STDIN. | ||
dns_resolver = dns_resolver or caching_resolver() | ||
for line in sys.stdin: | ||
email = line.strip() | ||
try: | ||
validate_email(email, dns_resolver=dns_resolver, **options) | ||
except EmailNotValidError as e: | ||
print(f"{email} {e}") | ||
if len(sys.argv) == 2: | ||
return main_command_line(sys.argv[1], options, dns_resolver) | ||
else: | ||
# Validate the email address passed on the command line. | ||
email = sys.argv[1] | ||
try: | ||
result = validate_email(email, dns_resolver=dns_resolver, **options) | ||
print(json.dumps(result.as_dict(), indent=2, sort_keys=True, ensure_ascii=False)) | ||
except EmailNotValidError as e: | ||
print(e) | ||
import asyncio | ||
asyncio.run(main_stdin(options, dns_resolver)) | ||
return True | ||
|
||
|
||
if __name__ == "__main__": | ||
main() | ||
if not main(): | ||
sys.exit(1) |
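The STDIN mode documented in the header comment prints one line per invalid address: the address, a tab character, then the error message. A caller could read that output along these lines. This is a sketch only; `parse_error_lines` is a hypothetical helper that is not part of this commit, and it relies on the stated assumption that addresses contain no tab or newline characters.

```python
def parse_error_lines(output):
    # Map each failed address to its error message. Only the first tab
    # is treated as the separator, so a message may itself contain tabs.
    errors = {}
    for line in output.splitlines():
        if not line:
            continue  # skip blank lines
        address, _, message = line.partition("\t")
        errors[address] = message
    return errors


if __name__ == "__main__":
    sample = "not-an-address\tThe email address is not valid.\n"
    print(parse_error_lines(sample))
```

Valid addresses produce no output in STDIN mode, so any address missing from the returned mapping passed validation.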