Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recover from persistent "i/o timeout" or "Closed explicitly" pool errors #69

Merged
merged 1 commit into from
Dec 27, 2017

Conversation

bachue
Copy link

@bachue bachue commented Dec 20, 2017

In current version, if mgo is failed to connnect to mongodb server for a while(longer than 10 seconds)(because of e.g. timeout), then even mongodb connection is recovered, mgo still cannot rebuild the connection. The patch can fix the problem.

@domodwyer
Copy link

domodwyer commented Dec 20, 2017

Hi @bachue

Thanks for this! We've had a couple of tickets that this could be the root cause of - great catch!

I'm away so can't review properly at the moment (back in the new year) but would you mind retargeting this to the development branch (see the contribution guidelines) - thanks!

Dom

@domodwyer domodwyer changed the title try to fix the issue that socket could returns "i/o timeout" or "Closed explicitly" error Recover from persistent "i/o timeout" or "Closed explicitly" pool errors Dec 20, 2017
@bachue bachue force-pushed the fix/io-timeout-issue branch from 1b899db to 38eea2b Compare December 20, 2017 14:03
@bachue bachue changed the base branch from master to development December 20, 2017 14:04
@bachue
Copy link
Author

bachue commented Dec 20, 2017

@domodwyer
Done. And this is my test program: https://github.com/bachue/mgo-test/blob/master/main.go which might be helpful.

@szank
Copy link

szank commented Dec 21, 2017

Looks good. Nice work!

@domodwyer
Copy link

domodwyer commented Dec 27, 2017

Hi @bachue

Great work narrowing this down - it's a brilliant fix!

I initially added the "needs tests" tag, but it's difficult/takes a long time per test run to do in a repeatable, automated way - I've confirmed the fix works manually so happy to merge.

Thanks for the contribution @bachue.

Dom

@domodwyer domodwyer merged commit a104bfb into globalsign:development Dec 27, 2017
@domodwyer domodwyer mentioned this pull request Jan 15, 2018
domodwyer added a commit that referenced this pull request Jan 15, 2018
Includes:

* Reduced memory in bulk operations (#56)
* Native x509 authentication (#55)
* Better connection recovery (#69)
* Example usage (#75 and #78)

Thanks to:

* @bachue
* @csucu 
* @feliixx


---
[Throughput overview](https://user-images.githubusercontent.com/9275968/34954403-3d3253dc-fa18-11e7-8eef-0f2b0f21edc3.png)

Select throughput has increased by ~600 requests/second with slightly increased variance:
```
x => r2017.11.06-select-zipfian-throughput.log 
y => 9acbd68-select-zipfian-throughput.log 

     n   min   max median  average   stddev      p99
x 3600 49246 71368  66542 66517.26 2327.675 70927.01
y 3600 53304 72005  67151 67145.36 2448.534 71630.00

          62000       64000       66000        68000       70000       72000    
 |----------+-----------+-----------+------------+-----------+-----------+-----|
                              +---------+--------+                              
1          -------------------|         |        |--------------------          
                              +---------+--------+                              
                                 +---------+---------+                          
2    ----------------------------|         |         |--------------------      
                                 +---------+---------+                          
Legend: 1=data$x, 2=data$y

At 95% probablitiy:
===> average is statistically significant     (p=0.000000, diff ~628.094444)
===> variance is statistically significant    (p=0.002398)

```

* [insert-latency.txt](https://github.com/globalsign/mgo/files/1632474/insert-latency.txt)
* [insert-throughput.txt](https://github.com/globalsign/mgo/files/1632475/insert-throughput.txt)
* [select-zipfian-latency.txt](https://github.com/globalsign/mgo/files/1632476/select-zipfian-latency.txt)
* [select-zipfian-throughput.txt](https://github.com/globalsign/mgo/files/1632477/select-zipfian-throughput.txt)
* [update-zipfian-latency.txt](https://github.com/globalsign/mgo/files/1632478/update-zipfian-latency.txt)
* [update-zipfian-throughput.txt](https://github.com/globalsign/mgo/files/1632479/update-zipfian-throughput.txt)

Note: latencies are approximations calculated from grouped data
libi pushed a commit to libi/mgo that referenced this pull request Dec 1, 2022
Includes:

* Reduced memory in bulk operations (globalsign#56)
* Native x509 authentication (globalsign#55)
* Better connection recovery (globalsign#69)
* Example usage (globalsign#75 and globalsign#78)

Thanks to:

* @bachue
* @csucu 
* @feliixx


---
[Throughput overview](https://user-images.githubusercontent.com/9275968/34954403-3d3253dc-fa18-11e7-8eef-0f2b0f21edc3.png)

Select throughput has increased by ~600 requests/second with slightly increased variance:
```
x => r2017.11.06-select-zipfian-throughput.log 
y => 9acbd68-select-zipfian-throughput.log 

     n   min   max median  average   stddev      p99
x 3600 49246 71368  66542 66517.26 2327.675 70927.01
y 3600 53304 72005  67151 67145.36 2448.534 71630.00

          62000       64000       66000        68000       70000       72000    
 |----------+-----------+-----------+------------+-----------+-----------+-----|
                              +---------+--------+                              
1          -------------------|         |        |--------------------          
                              +---------+--------+                              
                                 +---------+---------+                          
2    ----------------------------|         |         |--------------------      
                                 +---------+---------+                          
Legend: 1=data$x, 2=data$y

At 95% probablitiy:
===> average is statistically significant     (p=0.000000, diff ~628.094444)
===> variance is statistically significant    (p=0.002398)

```

* [insert-latency.txt](https://github.com/globalsign/mgo/files/1632474/insert-latency.txt)
* [insert-throughput.txt](https://github.com/globalsign/mgo/files/1632475/insert-throughput.txt)
* [select-zipfian-latency.txt](https://github.com/globalsign/mgo/files/1632476/select-zipfian-latency.txt)
* [select-zipfian-throughput.txt](https://github.com/globalsign/mgo/files/1632477/select-zipfian-throughput.txt)
* [update-zipfian-latency.txt](https://github.com/globalsign/mgo/files/1632478/update-zipfian-latency.txt)
* [update-zipfian-throughput.txt](https://github.com/globalsign/mgo/files/1632479/update-zipfian-throughput.txt)

Note: latencies are approximations calculated from grouped data
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants