Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mongo] make timeout configurable and increase the default #1823

Merged
merged 1 commit into from
Aug 12, 2015

Conversation

benmccann
Copy link
Contributor

We have one MongoDB instance that's misbehaving. It's a bit slow to connect to it because of the problems it's having and so it's difficult to tell what's wrong because we're not getting any metrics from it.

@yannmh yannmh added this to the 5.5.0 milestone Aug 10, 2015
@yannmh yannmh self-assigned this Aug 10, 2015
@yannmh
Copy link
Member

yannmh commented Aug 10, 2015

Thanks for contributing @benmccann !

Yes, let's assess this change for the 5.5.0 agent release. Out of curiosity, do you know long the check takes to connect to your instance ?

@benmccann
Copy link
Contributor Author

Normally it connects very quickly. But it looks like every so often there's some job that's just pounding our instance with unoptimized queries, and so when that happens we're seeing it take about 15s.

@remh
Copy link

remh commented Aug 10, 2015

Can we make it configurable instead ?

@benmccann
Copy link
Contributor Author

Making it configurable sounds reasonable. I'd still like to increase the default since I think it's lower than it needs to be.

Do you have an example of how we'd make it configurable? E.g. is there a certain file we'd edit to set the config value? And then would we read it from init_config or agentConfig?

@remh
Copy link

remh commented Aug 10, 2015

etcd is a good example.
https://github.com/DataDog/dd-agent/blob/5.4.3/conf.d/etcd.yaml.example#L7
https://github.com/DataDog/dd-agent/blob/5.4.3/checks.d/etcd.py#L74

I agree that we can increase the default timeout as well, although 60secs seems a bit excessive.

Does 20 secs seem reasonable ?

Thanks again for the feedback and the PR!

@benmccann
Copy link
Contributor Author

Would there be any problems caused by having the timeout be too high? Seems like it doesn't cost us much to make it higher.

I'd settle for 30 secs ;-) We were taking about 15 secs to connect, which is dangerously close to 20 secs.

@benmccann benmccann force-pushed the mongo-connection-time branch 3 times, most recently from 9b61182 to 9b34c0e Compare August 10, 2015 22:40
@remh
Copy link

remh commented Aug 10, 2015

Having a connection hanging for too long can triggers the watchdog that would kill and restart the agent.
Basically the watchdog kicks off if the overall check loop takes more than 2minutes.

So we are still fine with 30 secs but it's something we have to be cautious with :)

Thanks again for this!

@benmccann benmccann force-pushed the mongo-connection-time branch 2 times, most recently from 08deb81 to 1444eb7 Compare August 12, 2015 17:44
@yannmh
Copy link
Member

yannmh commented Aug 12, 2015

Thanks a lot @benmccann, tests should be passing now 🎉.

Since we can now set this timeout parameter from the YAML configuration file, I think it would make sense to switch back the default value to 10 seconds, so that collector doesn't hang for too long when the instance is unreachable.

Also, we recently introduced some contribution guidelines to keep a nice, homogeneous, and easy to browse git history.
Would you mind rewording the commit title following these rules please ? Again, thank you for your contribution.

@benmccann benmccann force-pushed the mongo-connection-time branch from 1444eb7 to c7e1c91 Compare August 12, 2015 18:23
@benmccann benmccann changed the title Increase Mongo connection timeout [mongo] make timeout configurable and increase the default Aug 12, 2015
@benmccann
Copy link
Contributor Author

Thanks @yannmh. I rebased to get the fix for the network tests and updated the commit message.

Regarding the timeout, I'm not sure 10s is a good default since we regularly go over that and I wouldn't be surprised if others hit the same issue. I understand we want to stay clear of the 2 min mark that @remh mentioned, but he seemed fine with 30s. Are there other issues? Does it hang the collection of your other metrics as well?

@benmccann
Copy link
Contributor Author

Btw, @yannmh @remh I'll let you guys decide the default timeout issue and update the PR accordingly. I do think that 10s is rather low though and setting it to at least 20 would solve most of the problems we've seen. It's not clear to me what negative effects there might be from setting it higher

@benmccann
Copy link
Contributor Author

Yep, all tests are passing now! Thanks guys!

@yannmh
Copy link
Member

yannmh commented Aug 12, 2015

Thanks for the further explanations @benmccann. Let's keep it at 30s then.

Merging 🚀 !

yannmh added a commit that referenced this pull request Aug 12, 2015
[mongo] make timeout configurable and increase the default
@yannmh yannmh merged commit 1d83aab into DataDog:master Aug 12, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants