Out of memory in go, after replicating one million records #20

Closed
brailateo opened this issue Feb 1, 2013 · 14 comments

Comments

@brailateo

I built a CouchDB database with 2 million documents, each containing 8 attributes (2 to 10 characters long) and a "channel" attribute with 2 strings.
I tried to replicate it to sync_gateway. After an hour (Couchbase already reported 2.587.000 records in the data bucket), the Go program crashed with the following message:

...
2013/01/31 22:20:13     Assigning doc "c5e35237e98fc32e21b8be5142097c30" to channels []
2013/01/31 22:20:13     Assigning doc "c5e35237e98fc32e21b8be51420988ab" to channels []
2013/01/31 22:20:13     --> 201 
2013/01/31 22:20:40 HEAD /sync_gateway/
2013/01/31 22:20:42 GET /sync_gateway/
2013/01/31 22:20:42 GET /sync_gateway/_local/eff358b2433202e86fd73599c24b3d84
2013/01/31 22:20:42     --> 404 missing
2013/01/31 22:20:42 GET /sync_gateway/_local/784b528714c810ff80f22b6a202ae4f5
2013/01/31 22:20:42     --> 404 missing
2013/01/31 22:20:42 GET /sync_gateway/_changes?feed=normal&style=all_docs&since=0&heartbeat=10000
2013/01/31 22:20:47 GET /sync_gateway/
...
2013/01/31 22:23:39 GET /sync_gateway/
runtime: memory allocated by OS (0xb326a000) not in usable range [0x18a00000,0x98a00000)
throw: out of memory

goroutine 516 [running]:
reflect.unsafe_NewArray(0x18a7b000, 0x826fff4, 0x8d, 0x81eeb58, 0x176, ...)
    /usr/local/go/src/pkg/runtime/iface.c:702 +0x80
reflect.MakeSlice(0x18a7b000, 0x81eeb58, 0x5e, 0x8d, 0x81eeb58, ...)
    /usr/local/go/src/pkg/reflect/value.go:1650 +0x14a
encoding/json.(*decodeState).array(0x58b430e4, 0x81eeb58, 0x62ac6c84)
    /usr/local/go/src/pkg/encoding/json/decode.go:364 +0x471
encoding/json.(*decodeState).value(0x58b430e4, 0x81eeb58, 0x62ac6c84)
    /usr/local/go/src/pkg/encoding/json/decode.go:246 +0x1ba
encoding/json.(*decodeState).object(0x58b430e4, 0x826b5b0, 0x62ac6c80)
    /usr/local/go/src/pkg/encoding/json/decode.go:555 +0x946
encoding/json.(*decodeState).value(0x58b430e4, 0x81ec4e8, 0x62ac6c80)
    /usr/local/go/src/pkg/encoding/json/decode.go:249 +0x19f
encoding/json.(*decodeState).unmarshal(0x58b430e4, 0x81ec4e0, 0x62ac6c80, 0x0, 0x0, ...)
    /usr/local/go/src/pkg/encoding/json/decode.go:136 +0x121
encoding/json.(*Decoder).Decode(0x58b430d0, 0x81ec4e0, 0x62ac6c80, 0x18aa4820, 0x74a37500, ...)
    /usr/local/go/src/pkg/encoding/json/stream.go:48 +0xff
github.com/couchbaselabs/go-couchbase.(*Bucket).ViewCustom(0x18a6e3c0, 0x82a306c, 0xc, 0x8298e6c, 0x8, ...)
    /home/teo/eclipse/workspace/bd2012/src/github.com/couchbaselabs/go-couchbase/views.go:105 +0x403
github.com/couchbaselabs/sync_gateway/db._func_001(0x1e0cd680, 0x1e0cd678, 0x1e0cd650, 0x1e0cd658, 0x1e0cd668, ...)
    /home/teo/eclipse/workspace/bd2012/src/github.com/couchbaselabs/sync_gateway/db/changes.go:97 +0x244
created by github.com/couchbaselabs/sync_gateway/db.(*Database).ChangesFeed
    /home/teo/eclipse/workspace/bd2012/src/github.com/couchbaselabs/sync_gateway/db/changes.go:163 +0x3d2

Restarting the Go program and trying to resume the replication fails with the same message:

2013/02/01 00:12:17 GET /sync_gateway/
2013/02/01 00:12:19 GET /sync_gateway/_changes?feed=normal&style=all_docs&since=0&heartbeat=10000
2013/02/01 00:12:22 GET /sync_gateway/
2013/02/01 00:12:28 GET /sync_gateway/
2013/02/01 00:12:33 GET /sync_gateway/
runtime: memory allocated by OS (0xb3216000) not in usable range [0x18a00000,0x98a00000)
throw: out of memory

goroutine 12 [running]:
encoding/json.Unmarshal(0x98939d90, 0x66, 0x67, 0x81ec580, 0x9896f270, ...)
    /usr/local/go/src/pkg/encoding/json/decode.go:55 +0x29
github.com/couchbaselabs/sync_gateway/db.RevTree.UnmarshalJSON(0x98937d20, 0x98939d90, 0x66, 0x67, 0x0, ...)
    /home/teo/eclipse/workspace/bd2012/src/github.com/couchbaselabs/sync_gateway/db/revtree.go:77 +0x110
github.com/couchbaselabs/sync_gateway/db.(*RevTree).UnmarshalJSON(0x9896f1cc, 0x98939d90, 0x66, 0x67, 0x0, ...)
    /home/teo/eclipse/workspace/bd2012/src/github.com/couchbaselabs/sync_gateway/db/attachment.go:0 +0x74
encoding/json.(*decodeState).object(0x98873f80, 0x8281368, 0x9896f1cc)
    /usr/local/go/src/pkg/encoding/json/decode.go:415 +0xc7
encoding/json.(*decodeState).value(0x98873f80, 0x8281368, 0x9896f1cc)
    /usr/local/go/src/pkg/encoding/json/decode.go:249 +0x19f
encoding/json.(*decodeState).object(0x98873f80, 0x82772e0, 0x9896f1b0)
    /usr/local/go/src/pkg/encoding/json/decode.go:555 +0x946
encoding/json.(*decodeState).value(0x98873f80, 0x81ec560, 0x9896f1b0)
    /usr/local/go/src/pkg/encoding/json/decode.go:249 +0x19f
encoding/json.(*decodeState).unmarshal(0x98873f80, 0x81ec558, 0x9896f1b0, 0x0, 0x0, ...)
    /usr/local/go/src/pkg/encoding/json/decode.go:136 +0x121
encoding/json.Unmarshal(0x98939d20, 0xd7, 0xd7, 0x81ec558, 0x9896f1b0, ...)
    /usr/local/go/src/pkg/encoding/json/decode.go:65 +0xcd
github.com/couchbaselabs/sync_gateway/db._func_001(0x18b20d78, 0x18b20d70, 0x18b20d48, 0x18b20d50, 0x18b20d60, ...)
    /home/teo/eclipse/workspace/bd2012/src/github.com/couchbaselabs/sync_gateway/db/changes.go:126 +0x85b
created by github.com/couchbaselabs/sync_gateway/db.(*Database).ChangesFeed
    /home/teo/eclipse/workspace/bd2012/src/github.com/couchbaselabs/sync_gateway/db/changes.go:163 +0x3d2

Teo

@snej
Contributor

snej commented Feb 1, 2013

Are you running 32-bit? Apparently the Go 1.0.3 garbage collector has known issues in 32-bit mode that can cause it to leak memory over time. I've seen posts on the Go mailing list saying that this causes HTTP servers to crash after a while. Switching to 64-bit avoids the problem.
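
A quick way to confirm which kind of build is running is to ask the Go runtime itself. The following is a minimal sketch, not part of sync_gateway; the helper name `is32Bit` is made up for illustration. It relies only on `runtime.GOARCH` and `strconv.IntSize` from the standard library.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
)

// is32Bit reports whether this binary was compiled for a platform
// whose int type is 32 bits wide. (Hypothetical helper for illustration.)
func is32Bit() bool {
	return strconv.IntSize == 32
}

func main() {
	// GOARCH is "386" or "arm" for common 32-bit targets and
	// "amd64" for 64-bit x86.
	fmt.Printf("GOOS=%s GOARCH=%s int size=%d bits, 32-bit build: %v\n",
		runtime.GOOS, runtime.GOARCH, strconv.IntSize, is32Bit())
}
```

The addresses in the crash message above (a usable range of `[0x18a00000,0x98a00000)`) are 32-bit values, which is consistent with a 32-bit build.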

@brailateo
Author

Yes! I'm running the 32-bit version! I'll install Go 1.0.3 on a 64-bit machine and come back with more tests!

@brailateo
Author

I installed a fresh copy of Go 1.0.3 on a 64-bit Linux machine with 12 GB of RAM!
This time the gateway didn't die, but the replication triggered through curl failed with this message:

teo@teo:/var/www/identificare$ curl -X POST http://172.16.0.53:5984/_replicate -d '{"target":"http://172.16.0.53:5984/afoni","source":"http://172.16.0.53:4984/sync_gateway"}' -H "Content-Type: application/json"
{"error":"changes_reader_died"}

In changes.go, line 252, func (db *Database) GetChanges:
are you concatenating all changes into a single slice "changes"?
I think this will not scale to the hundreds of millions of records that we now have in a PostgreSQL database and intend to use with Couchbase & sync_gateway!
Teo

@snej
Copy link
Contributor

snej commented Feb 1, 2013

That could be the problem. We definitely haven't done any work on scalability or performance yet! It may be too early to start stress-testing the gateway ... at least unless you're interested in helping improve it :)

@brailateo
Author

Yes, we are interested in improving it! My colleague has already studied the source; he is the one who discovered this, and he says he knows how to make it work! Right now I am restarting the replication from scratch (after deleting the Couchbase bucket), but this time as a continuous replication. So far it has transferred more than 50% of the records and it seems to be stable! The "continuous replication" functions in Go are scalable! I'll keep you informed.

@snej
Contributor

snej commented Feb 1, 2013

That'd be great! The obvious fix for the changes feed is to pass an IO writer to the function instead of having it return an array, so it can write the changes as JSON array elements one at a time.

@brailateo
Author

That's exactly what he plans to do ...

@dustin
Contributor

dustin commented Feb 1, 2013

The obvious fix for the changes feed is to pass an IO writer to the function instead of having it return an array

I haven't been following this closely, but I quite often return a chan X instead of a []X for this reason. It gives the caller almost the same syntax (looping over it) with the same structured data type.

Not sure if it applies here, but if it could, I can help out with it.
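
The channel-based variant described above might be sketched as follows. The names `ChangeEntry` and `changesFeed` are hypothetical, and the producer simply fabricates entries; in the gateway it would read them from the view query instead.

```go
package main

import "fmt"

// ChangeEntry is a hypothetical, simplified stand-in for one row of the feed.
type ChangeEntry struct {
	Seq uint64
	ID  string
}

// changesFeed returns a channel instead of a slice. A goroutine produces
// entries one at a time and closes the channel when it is done, so only
// the entry currently in flight is held in memory.
func changesFeed(total uint64) <-chan ChangeEntry {
	out := make(chan ChangeEntry)
	go func() {
		defer close(out)
		for seq := uint64(1); seq <= total; seq++ {
			out <- ChangeEntry{Seq: seq, ID: fmt.Sprintf("doc%d", seq)}
		}
	}()
	return out
}

func main() {
	count := 0
	// The caller loops over the channel much as it would over a slice.
	for entry := range changesFeed(5) {
		count++
		fmt.Println(entry.Seq, entry.ID)
	}
	fmt.Println("received", count, "entries")
}
```

One caveat of this sketch: with an unbuffered channel, a caller that stops reading early leaves the producer goroutine blocked forever. Real code would likely need a way to signal cancellation, for example a second "done" channel.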

@brailateo
Author

Just a short update for @snej: we are working on replication. We found a lot of bugs, and my friend has solved most of them. We are testing right now with a database of 2 million records; I'll come back with more info when it's ready!

@snej
Contributor

snej commented Feb 4, 2013

Great! Sorry about the bugs; we've definitely been working more on features than correctness, so far. It'll be great to have those fixes!

@brailateo
Author

@snej please get the modified files from https://mega.co.nz/#!jJcEgY4b!CR_h7FhCbLBgL4V6jdwOpxWTQ35ynVQgdF_ikd7wdYw

With these changes, replication now works in both directions, continuous or normal, with or without channels! He made some optimisations to avoid changing __seq too frequently, reserving a chunk of 200 numbers at a time, which gives a 20% boost in replication speed. He also added some missing protocol headers and footers in the JSON responses. You will also find a change for the sequence that comes from Couchbase: it is supposed to be an integer but arrives as a float, and CouchDB throws a "badarg" error when it receives it!
Please take a look at the changes, import them into your source and give it a try. We have run a lot of tests here between CouchDB 1.2.1 and sync_gateway with more than 2.500.000 records in a database! Everything works OK!
If anything there isn't clear, you can contact my friend at v.mitache { at } that big company with G mail ;-)
Teo
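
The idea of reserving sequence numbers in chunks of 200 can be illustrated with the sketch below. It is not the submitted patch: `seqAllocator`, `fakeCounter`, and the `incr` callback are hypothetical names, and the in-memory counter stands in for whatever atomic increment the shared __seq counter actually uses.

```go
package main

import (
	"fmt"
	"sync"
)

// blockSize mirrors the chunk of 200 numbers mentioned in the comment above.
const blockSize = 200

// seqAllocator hands out sequence numbers from a locally reserved block and
// touches the shared counter only once per block. (Hypothetical type.)
type seqAllocator struct {
	mu sync.Mutex
	// incr is assumed to atomically add delta to the shared counter
	// and return the counter's new value.
	incr func(delta uint64) (uint64, error)
	last uint64 // last number handed out
	max  uint64 // last number of the currently reserved block
}

// nextSeq returns the next sequence number, reserving a new block if needed.
func (a *seqAllocator) nextSeq() (uint64, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.last >= a.max {
		newMax, err := a.incr(blockSize)
		if err != nil {
			return 0, err
		}
		a.max = newMax
		a.last = newMax - blockSize
	}
	a.last++
	return a.last, nil
}

// fakeCounter is an in-memory stand-in for the shared counter.
type fakeCounter struct {
	value uint64
	calls int
}

func (c *fakeCounter) incr(delta uint64) (uint64, error) {
	c.value += delta
	c.calls++
	return c.value, nil
}

func main() {
	counter := &fakeCounter{}
	alloc := &seqAllocator{incr: counter.incr}
	var seq uint64
	for i := 0; i < 1000; i++ {
		var err error
		if seq, err = alloc.nextSeq(); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println("last sequence:", seq, "counter updates:", counter.calls)
}
```

The trade-off in a scheme like this is that numbers reserved but not yet used are lost if the process exits, so the sequence can contain gaps.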

@snej
Contributor

snej commented Feb 5, 2013

Sorry, but I was unable to download the file from mega — first it complained that I wasn't using Chrome (I'm on the latest Safari), then it said it had downloaded the file but it didn't show up anywhere in my filesystem. I'm not willing to install a new browser just to download a 16kbyte patch!

The best way to send patches is using a Github pull request. Alternatively, you can email it to me at jens at couchbase dot com.

@brailateo
Author

I just sent them!
Teo

@snej
Contributor

snej commented Feb 6, 2013

FYI, a lot of the conversation has been via email, but the patches have now been committed.
