Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dns: fix resolve failed starts without network #13076

Conversation

XadillaX
Copy link
Contributor

@XadillaX XadillaX commented May 17, 2017

Fix the bug that you start process without network at first, but it connected lately, dns.resolve will stay failed with ECONNREFUSED because c-ares servers fallback to 127.0.0.1 at the very beginning.

If c-ares servers "127.0.0.1" is detected and its not set by user self, and last query is not OK, recreating ares_channel operation will be triggered to reload servers.

You can test with this script:

const dns = require("dns");

function test() {
    dns.resolve("google.com", function(err, ret) {
        console.log(err, ret, dns.getServers());
        setTimeout(test, 10000);
    });
}

test();

Turn off your network at first before starting this script. Then try to open the network and then turn off. Make the steps above in a loop and you will get the result.

Fixes: #1644

Solutaion

DNS server matches follow conditions is considered as a fallback server of c-ares:

  1. caused last query ECONNREFUSED; (the fallback server very likely to have this error)
  2. is exactly [ "127.0.0.1" ]; (the fallback server is exactly this value)
  3. haven't set by users or developers by dns.setServers.

Because users or developers may do dns.setServers([ "127.0.0.1" ]) or theirs /etc/resolv.conf is 127.0.0.1.

As a result, I gathered those 3 conditions.

I created two flags at the process start up.

  • cares_query_last_ok_ indicates that is last dns query succeeded or not;
  • cares_is_servers_use_default_ indicates that whether current dns servers is the fallback 127.0.0.1 that c-ares initialized at the beginning.

If cares_query_last_ok_ is true or cares_is_servers_use_default_ is false, everything will be OK.

If not, the code will check current dns servers to see if it's 127.0.0.1 (maybe c-ares' fallback server). When it's 127.0.0.1, old ares_channel will be released and a new one will be created.

Once one query succeeded or not ECONNREFUSED, cares_query_last_ok_ will be marked as true. And once user set dns servers via dns.setServers, cares_is_servers_use_default_ will be marked as false. And what's more, once the first check found the dns servers is not 127.0.0.1, cares_is_servers_use_default_ also will be marked as false.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines
Affected core subsystem(s)

dns

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. cares Issues and PRs related to the c-ares dependency or the cares_wrap binding. labels May 17, 2017
@XadillaX XadillaX force-pushed the fix/resolve-fails-when-start-without-network branch from 8662937 to 57f04fd Compare May 17, 2017 09:44
@XadillaX
Copy link
Contributor Author

Anyone who has time to review this code?

/cc @trevnorris @indutny @addaleax @danbev

src/env-inl.h Outdated

inline void Environment::cares_servers_use_default(bool is_default) {
cares_is_servers_use_default_ = is_default;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance you could name the setters set_FLAG() and the getters just FLAG() (compare with e.g. using_domains)?

@@ -1306,6 +1373,8 @@ void SetServers(const FunctionCallbackInfo<Value>& args) {

delete[] servers;

if (err == ARES_SUCCESS) env->cares_servers_use_default(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extremely tiny nit: I’d put these on separate lines

@@ -328,10 +386,18 @@ class QueryWrap : public AsyncWrap {
return static_cast<void*>(this);
}

static void AresQuery(Environment* env, const char* name,
int dnsclass, int type, ares_callback callback,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize the code around this doesn’t do it either, but once more, would you mind aligning the arguments vertically? 😄

* The fallback servers of cares is [ "127.0.0.1" ] with no user additional
* setting.
*/
void AresCheckServers(Environment* env) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit: Check sounds a bit like it’s just testing things, maybe Ensure is better?

/* We do the call to ares_init_option for caller. */
int r = ares_init_options(env->cares_channel_ptr(),
&options,
ARES_OPT_FLAGS | ARES_OPT_SOCK_STATE_CB);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(ditto for aligning)

memset(&options, 0, sizeof(options));
options.flags = ARES_FLAG_NOCHECKRESP;
options.sock_state_cb = ares_sockstate_cb;
options.sock_state_cb_data = env;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we somehow share this code with Initialize, to make it more obviously correct? It looks like there’s quite a bit that could be shared

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'v extracted these code to SetupCaresChannel().

@addaleax
Copy link
Member

@XadillaX XadillaX force-pushed the fix/resolve-fails-when-start-without-network branch from 57f04fd to 51caf53 Compare May 18, 2017 03:38
@pmq20
Copy link
Contributor

pmq20 commented May 18, 2017

One of Windows CI Machines failed. I wonder why.

not ok 2 parallel/test-async-wrap-getasyncid
  ---
  duration_ms: 0.248
  severity: fail
  stack: |-
    Mismatched <anonymous> function calls. Expected 1, actual 0.
        at Object.<anonymous> (c:\workspace\node-test-binary-windows\RUN_SUBSET\0\VS_VERSION\vs2015-x86\label\win2008r2\test\parallel\test-async-wrap-getasyncid.js:155:42)
        at Module._compile (module.js:569:30)
        at Object.Module._extensions..js (module.js:580:10)
        at Module.load (module.js:503:32)
        at tryModuleLoad (module.js:466:12)
        at Function.Module._load (module.js:458:3)
        at Function.Module.runMain (module.js:605:10)
        at startup (bootstrap_node.js:144:16)
        at bootstrap_node.js:561:3
    (node:5140) [DEP0064] DeprecationWarning: tls.createSecurePair() is deprecated. Please use tls.Socket instead.
  ...

@XadillaX
Copy link
Contributor Author

I don't know why, maybe it's the bug of CI? I'm not sure. @addaleax

@addaleax
Copy link
Member

Yes, the CI failure is nothing to worry about. 👍 (It’s not a bug in the CI but it’s a known failure that’s not related to this PR.)

@XadillaX XadillaX force-pushed the fix/resolve-fails-when-start-without-network branch from 51caf53 to 490672c Compare May 19, 2017 02:52
@XadillaX
Copy link
Contributor Author

XadillaX commented May 19, 2017

Hey @addaleax I've already rebased for some conflicts with master.

@addaleax
Copy link
Member

@addaleax
Copy link
Member

CI is green. Maybe @bnoordhuis or @silverwind could give this a look, too?

Copy link
Member

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo comments.

ns_c_in,
ns_t_a,
Callback,
GetQueryArg());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fits on one line now.

ns_c_in,
ns_t_aaaa,
Callback,
GetQueryArg());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise. I'll stop pointing it out from here on.

@addaleax
Copy link
Member

addaleax commented May 19, 2017

@XadillaX Actually, one more thing: your author name in this commit is given as “XadillaX”. Is that intended or do you prefer to be listed (changelog, git log, AUTHORS file) with some other name? People typically prefer their full name, but ultimately it’s up to you. (Edit: Just to be clear, that name doesn’t have to be ASCII-only – whatever you prefer, really.)

Fix the bug that you start process without network at first, but it
connected lately, `dns.resolve` will stay failed with ECONNREFUSED
because c-ares servers fallback to 127.0.0.1 at the very beginning.

If c-ares servers "127.0.0.1" is detected and its not set by user self,
and last query is not OK, recreating `ares_channel` operation will be
triggered to reload servers.

Fixes: nodejs#1644
@XadillaX XadillaX force-pushed the fix/resolve-fails-when-start-without-network branch from 490672c to 355a33b Compare May 20, 2017 04:48
@XadillaX
Copy link
Contributor Author

XadillaX commented May 20, 2017

@addaleax I think 'XadillaX - Khaidi Chu' would be fine to me.

@XadillaX
Copy link
Contributor Author

@addaleax I mean I'd like to use XadillaX when in git log, changelog and authors file. And if someday I have the honor of being a collaborator, I'd like to use 'XadillaX - Khaidi Chu' in the list.

@addaleax
Copy link
Member

Landed in 2b54147

@XadillaX I think you’ve already figured out that you can set this information using git config [--global] user.name and … user.email, but if you need any help, just let us know. :)

@addaleax addaleax closed this May 21, 2017
addaleax pushed a commit that referenced this pull request May 21, 2017
Fix the bug that you start process without network at first, but it
connected lately, `dns.resolve` will stay failed with ECONNREFUSED
because c-ares servers fallback to 127.0.0.1 at the very beginning.

If c-ares servers "127.0.0.1" is detected and its not set by user self,
and last query is not OK, recreating `ares_channel` operation will be
triggered to reload servers.

Fixes: #1644
PR-URL: #13076
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
@XadillaX
Copy link
Contributor Author

@addaleax Thanks. I wondered if this PR passed the coverity? Shall I trigger that by myself?

@addaleax
Copy link
Member

@XadillaX I am not sure how to do that. I think the tool gets run periodically without being triggered, but really, I’m not sure.

@gibfahn
Copy link
Member

gibfahn commented May 22, 2017

@addaleax Thanks. I wondered if this PR passed the coverity? Shall I trigger that by myself?

see #13117 (comment)

@MylesBorins
Copy link
Contributor

Should this be backported to v6.x? If so it will require to be done manually

XadillaX added a commit to XadillaX/node that referenced this pull request Jul 23, 2017
Fix the bug that you start process without network at first, but it
connected lately, `dns.resolve` will stay failed with ECONNREFUSED
because c-ares servers fallback to 127.0.0.1 at the very beginning.

If c-ares servers "127.0.0.1" is detected and its not set by user self,
and last query is not OK, recreating `ares_channel` operation will be
triggered to reload servers.

Fixes: nodejs#1644
PR-URL: nodejs#13076
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
@XadillaX
Copy link
Contributor Author

Should this be backported to v6.x? If so it will require to be done manually

#14434

MylesBorins pushed a commit that referenced this pull request Aug 14, 2017
Fix the bug that you start process without network at first, but it
connected lately, `dns.resolve` will stay failed with ECONNREFUSED
because c-ares servers fallback to 127.0.0.1 at the very beginning.

If c-ares servers "127.0.0.1" is detected and its not set by user self,
and last query is not OK, recreating `ares_channel` operation will be
triggered to reload servers.

Fixes: #1644
Backport-PR-URL: #14434
PR-URL: #13076
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
MylesBorins pushed a commit that referenced this pull request Aug 14, 2017
Fix the bug that you start process without network at first, but it
connected lately, `dns.resolve` will stay failed with ECONNREFUSED
because c-ares servers fallback to 127.0.0.1 at the very beginning.

If c-ares servers "127.0.0.1" is detected and its not set by user self,
and last query is not OK, recreating `ares_channel` operation will be
triggered to reload servers.

Fixes: #1644
Backport-PR-URL: #14434
PR-URL: #13076
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
MylesBorins pushed a commit that referenced this pull request Aug 16, 2017
Fix the bug that you start process without network at first, but it
connected lately, `dns.resolve` will stay failed with ECONNREFUSED
because c-ares servers fallback to 127.0.0.1 at the very beginning.

If c-ares servers "127.0.0.1" is detected and its not set by user self,
and last query is not OK, recreating `ares_channel` operation will be
triggered to reload servers.

Fixes: #1644
Backport-PR-URL: #14434
PR-URL: #13076
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
@MylesBorins MylesBorins mentioned this pull request Aug 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. cares Issues and PRs related to the c-ares dependency or the cares_wrap binding.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dns.resolve fails when the io/node process starts without an active network connection
7 participants