Disable DNS cache #2262

Closed
asvetlov opened this issue Sep 11, 2017 · 8 comments

@asvetlov
Member

If servers are hosted on Amazon, the DNS cache keeps only the first IP address returned.
Amazon, in turn, uses round-robin to iterate over all IPs registered in the load balancer.

Therefore, aiohttp with the cache enabled sends all traffic to only one server in the registered group.

A possible solution is to not use the DNS cache for the threaded DNS resolver, and to respect the TTL returned by the server for the aiodns-based implementation.

To implement this, moving DNS caching into the resolver makes sense.

@asvetlov asvetlov added this to the 2.3 milestone Sep 11, 2017
@pfreixes
Contributor

Hi @asvetlov, I was involved in the MR that put in place the current DNS cache mechanism in aiohttp, here [1] and here [2].

The implementation should meet the requirements that you are raising as a showstopper. Indeed, it was designed with the AWS environment in mind, which is where we run our services.

The TTL initially proposed was 60 seconds because this is the TTL proposed by AWS [1]. I think that at some point @fafhrd91 proposed the current default value of 10 seconds [3] (warning: the docstring is invalid [4]).

Also related to that, the current implementation caches all IP addresses returned by the DNS server, choosing at each request the next IP address based on a round-robin strategy.
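
The behavior described above (cache every returned address for a fixed TTL, then rotate through them) can be sketched roughly as follows. This is only an illustration, not aiohttp's actual code: the class name `RoundRobinDNSCache`, its methods, and the host name are hypothetical.

```python
import itertools
import time


class RoundRobinDNSCache:
    """Hypothetical cache: keeps every resolved address per host for a fixed TTL."""

    def __init__(self, ttl=10.0, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # host -> (expires_at, endless iterator over addresses)

    def add(self, host, addresses):
        # Store all addresses, not just the first one returned by the DNS server.
        expires_at = self._clock() + self._ttl
        self._entries[host] = (expires_at, itertools.cycle(addresses))

    def next_address(self, host):
        # Return the next cached address, or None if the entry is missing or expired.
        entry = self._entries.get(host)
        if entry is None:
            return None
        expires_at, addresses = entry
        if self._clock() >= expires_at:
            del self._entries[host]
            return None
        return next(addresses)


cache = RoundRobinDNSCache(ttl=10.0)
cache.add("elb.example.com", ["10.0.0.1", "10.0.0.2"])
picks = [cache.next_address("elb.example.com") for _ in range(4)]
```

With two cached addresses, successive requests alternate between them until the entry expires, at which point the caller would have to resolve the host again.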

Am I missing something?

[1] #1819
[2] #1836
[3] https://github.com/aio-libs/aiohttp/blob/master/aiohttp/connector.py#L570
[4] https://github.com/aio-libs/aiohttp/blob/master/aiohttp/connector.py#L557

@asvetlov
Member Author

Well, I'm not a DNS expert, but @hellysmile could shed some light on the issue.

BTW, what does MR mean? PR is for Pull Request; what is MR for?

P.S.
Thank you very much for the message. Maybe I've missed something; I'd like to have a discussion about DNS cache policy here.

@pfreixes
Contributor

MR = PR; I'm mixing up words that I use in my daily work.

Let's wait for @hellysmile's comments. I'm not aware of any issue right now with the current DNS cache implementation.

@hellysmile
Member

hellysmile commented Sep 21, 2017

I'll try to explain the issues that we have.

http://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html

Request Routing

Before a client sends a request to your load balancer, it resolves the load balancer's domain name using a Domain Name System (DNS) server. The DNS entry is controlled by Amazon, because your load balancers are in the amazonaws.com domain. The Amazon DNS servers return one or more IP addresses to the client, which are the IP addresses of the load balancer nodes for your load balancer. As traffic to your application changes over time, Elastic Load Balancing scales your load balancer and updates the DNS entry. Note that the DNS entry also specifies the time-to-live (TTL) as 60 seconds, which ensures that the IP addresses can be remapped quickly in response to changing traffic.

The client determines which IP address to use to send requests to the load balancer. The load balancer node that receives the request selects a healthy registered target and sends the request to the target using its private IP address.

OS DNS call -> socket.gethostbyname -> OS cache for 60 seconds, according to the response from AWS DNS -> aiohttp can cache results for 60 seconds.

But if you have more than one process or aiohttp.ClientSession, exactly the same call may have been made, for example, 59 seconds earlier from another process or aiohttp.ClientSession. The application then receives a response that is valid for only 1 second, and there is no way to retrieve the TTL. If the application caches DNS responses, it keeps using an out-of-date ELB IP for 59 seconds. And if you are unlucky, ELB removes the cached IP from the balancer and that's it...
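
The timing in this scenario can be worked through as a small calculation. The numbers are the ones from the example above; the variable names are only for illustration, and the sketch assumes the application applies the full 60-second TTL to whatever answer it receives.

```python
RECORD_TTL = 60      # seconds, the TTL AWS sets on the DNS entry (per the quoted docs)
os_cached_at = 0     # another process triggers the lookup; the OS-level cache stores it
app_lookup_at = 59   # this process asks 59 seconds later and gets the cached answer

record_expires_at = os_cached_at + RECORD_TTL            # the record is valid until t=60
remaining_validity = record_expires_at - app_lookup_at   # only 1 second is left

# The application cannot see the remaining validity, so it applies a full TTL again.
app_cache_expires_at = app_lookup_at + RECORD_TTL        # cached until t=119
stale_window = app_cache_expires_at - record_expires_at  # 59 seconds past expiry

print(remaining_validity, stale_window)  # prints "1 59"
```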

I mean that only the OS can cache DNS in this case. Any caching on the application side without knowing the TTL can cause issues.

@pfreixes
Contributor

pfreixes commented Sep 21, 2017

Hi @hellysmile thanks for your comment, much appreciated.

The current implementation was designed around the AWS principles that you describe. The only difference is which TTL is used. On that point, as I already commented in the original PR [1]:

Unfortunately, retrieving the TTL as metadata of a DNS query can end up being a mess: the default implementation of getaddrinfo relies on the C implementation, which implements the most common RFC for POSIX systems and returns a limited structure [1]. I would say that it should be possible by using aiodns, avoiding the naive gethostbyname implementation, and using raw access to c-ares via the query function.

Because of that, we decided to hardcode the TTL. Remember that the value of that TTL, 10 seconds, is less than the value recommended by AWS [2]; in fact, 10 seconds is a heuristic value that was found to be the most effective for getting rid of sporadic network failures [3].
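
The limitation mentioned above, that getaddrinfo returns a structure with no TTL in it, can be seen from Python's standard `socket.getaddrinfo`. A minimal sketch, using an IP literal so the call does not depend on a DNS server (a real host name would yield tuples of the same shape):

```python
import socket

# Each result is a 5-tuple: (family, type, proto, canonname, sockaddr).
# None of the fields carries the TTL of the DNS record.
infos = socket.getaddrinfo(
    "127.0.0.1", 80, family=socket.AF_INET, type=socket.SOCK_STREAM
)

for family, socktype, proto, canonname, sockaddr in infos:
    print(family, socktype, proto, repr(canonname), sockaddr)
```

Since the TTL is not part of this result, a cache built on top of it has to choose its own expiry, which is why a fixed value was used.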

Regarding your comments: by default, the OS does not cache DNS queries, at least in GNU/Linux environments that do not have a DNS service installed and configured.

I do not understand your example of the processes and the DNS cache. Can you elaborate on it a bit more?

[1] #1819 (comment)
[2] http://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-jvm-ttl.html
[3] http://engineering.curalate.com/2016/03/25/elb-and-dns.html

@hellysmile
Member

@pfreixes thank you so much for the detailed response!

It seems I missed #1819; we had serious issues when aiohttp was using deadly caching.

I was fairly sure that there is a built-in DNS cache at the OS level, because I was often forced to purge it on my Mac.

I've found dnsmasq installed and running on AWS. So it was the real issue.

The only feature that could be added in the future is to use the TTL provided by aiodns instead of the hardcoded value.
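
As a rough sketch of that idea, the expiry of a cached answer could be derived from the TTLs carried by the records, falling back to the fixed value when none is available. The `Record` type, its field names, and the `expiry_for` helper are hypothetical stand-ins for whatever the resolver returns; this is not aiodns or aiohttp code.

```python
import time
from collections import namedtuple

# Stand-in for a resolver answer that carries a TTL; the field names are an assumption.
Record = namedtuple("Record", ["host", "ttl"])


def expiry_for(records, now, fallback_ttl=10.0):
    """Return when a cached answer should expire, using the smallest record TTL."""
    ttls = [record.ttl for record in records if record.ttl is not None]
    # Fall back to the hardcoded TTL when the resolver gives no TTL at all.
    ttl = min(ttls) if ttls else fallback_ttl
    return now + ttl


records = [Record("10.0.0.1", 60), Record("10.0.0.2", 42)]
expires_at = expiry_for(records, now=time.monotonic())
```

Taking the smallest TTL is a conservative choice: the whole cached answer is dropped as soon as any one of its records would have expired.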

@pfreixes
Contributor

@asvetlov I believe that this is no longer a stopper for the new release.

@hellysmile providing the TTL from the aiodns response is an option when aiodns is enabled; you can do a PR!

@lock

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there are important points made in this discussion, please include those excerpts in that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019