Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ThreadIO dying when data is too large #16

Closed
wants to merge 1 commit into from
Closed

Fix ThreadIO dying when data is too large #16

wants to merge 1 commit into from

Conversation

gboucher90
Copy link

Max data size in CachedData is not enough.
Indeed Memcached configuration is about the max item size which includes:
key, value and the item_struct sizes.
Checking the value size is not enough. ERR_2BIG can be triggered.

When EER_2BIG is triggered, the operation is marked as completed as soon as the server has answered with the error code.
The write operation is still in the write queue as there are bytes remaining to be written.
A NullPointerException is then throwned since the operation buffer is cleared when it is marked completed.
Finally the ThreadIO dies as it does not catch this type of exception.

Max data size in CachedData is not enough.
Indeed Memcached configuration is about the max item size which includes:
key, value and the item_struct sizes.
Checking the value size is not enough. ERR_2BIG can be triggered.

When EER_2BIG is triggered, the operation is marked as completed as soon as the server has answered with the error code.
The write operation is still in the write queue as there are bytes remaining to be written.
A NullPointerException is then throwned since the operation buffer is cleared when it is marked completed.
Finally the ThreadIO dies as it does not catch this type of exception.
@gboucher90 gboucher90 closed this Oct 27, 2016
lmikkelsen added a commit to lmikkelsen/spymemcached that referenced this pull request Jan 24, 2017
I apologize for using a PR to report an issue, but I'm unable to find a public
issue tracker for spymemcached.

Under certain circumstances `TCPMemcachedNodeImpl.fillWriteBuffer()` will throw
a NullPointerException because the current write operation has transitioned to
the `COMPLETE` state and thus doesn't have a write buffer. This issue has
previously been reported in couchbase#16 and
https://code.google.com/archive/p/spymemcached/issues/304.

I'm able to consistently reproduce the issue using spymemcached 1.12.1 and
memcached 1.4.34 installed using Homebrew on macOS 10.12.2. The daemon is
running with the default options (`memcached -l localhost`).

I'm running the included `Main.java` with the 1.12.1 release JAR:

```
$ javac -classpath spymemcached-2.12.1.jar Main.java
$ java -cp spymemcached-2.12.1.jar:. Main
```

It consistently produces the following output:

```
2017-01-23 20:40:24.678 INFO net.spy.memcached.MemcachedConnection:  Added {QA sa=localhost/127.0.0.1:11211, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
2017-01-23 20:40:25.170 INFO net.spy.memcached.MemcachedConnection:  Reconnecting due to exception on {QA sa=localhost/127.0.0.1:11211, #Rops=0, #Wops=155, #iq=6, topRop=null, topWop=Cmd: 1 Opaque: 81 Key: test Cas: 0 Exp: 60 Flags: 2048 Data Length: 2097152, toWrite=0, interested=5}
java.lang.NullPointerException
	at net.spy.memcached.protocol.TCPMemcachedNodeImpl.fillWriteBuffer(TCPMemcachedNodeImpl.java:206)
	at net.spy.memcached.MemcachedConnection.handleWrites(MemcachedConnection.java:800)
	at net.spy.memcached.MemcachedConnection.handleReadsAndWrites(MemcachedConnection.java:723)
	at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:683)
	at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:436)
	at net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:1446)
2017-01-23 20:40:25.172 WARN net.spy.memcached.MemcachedConnection:  Closing, and reopening {QA sa=localhost/127.0.0.1:11211, #Rops=0, #Wops=155, #iq=9, topRop=null, topWop=Cmd: 1 Opaque: 81 Key: test Cas: 0 Exp: 60 Flags: 2048 Data Length: 2097152, toWrite=0, interested=5}, attempt 0.
2017-01-23 20:40:25.173 INFO net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl:  No buffer for current write op, removing
2017-01-23 20:40:25.173 WARN net.spy.memcached.MemcachedConnection:  Could not redistribute to another node, retrying primary node for test.
2017-01-23 20:40:25.174 WARN net.spy.memcached.MemcachedConnection:  Could not redistribute to another node, retrying primary node for test.
2017-01-23 20:40:25.175 WARN net.spy.memcached.MemcachedConnection:  Could not redistribute to another node, retrying primary node for test.
```

Occasionally the IO thread dies:

```
2017-01-23 20:40:27.116 INFO net.spy.memcached.MemcachedConnection:  Added {QA sa=localhost/127.0.0.1:11211, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
Exception in thread "Memcached IO over {MemcachedConnection to localhost/127.0.0.1:11211}" java.lang.NullPointerException
	at net.spy.memcached.protocol.TCPMemcachedNodeImpl.fillWriteBuffer(TCPMemcachedNodeImpl.java:206)
	at net.spy.memcached.MemcachedConnection.handleWrites(MemcachedConnection.java:800)
	at net.spy.memcached.MemcachedConnection.handleInputQueue(MemcachedConnection.java:582)
	at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:409)
	at net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:1446)
```

As far as I can tell the condition occurs when
`MemcachedConnection.handleWrites()` exits before having written all of the
current operation because the buffer is full. If the operation causes the
daemon to return an error `OperationImpl.finishedPayload()` will transition the
operation to the `COMPLETE` state (which sets the buffer to null). Since the
operation isn't removed from the write queue a NullPointerException is thrown
on the next `handleWrites()`.

I'm using a large (2MB) data object to trigger an `ERR_2BIG` error, but I
believe the condition could happen with any of the `ERR_BUSY`, `ERR_TEMP_FAIL`,
etc. errors as well:

https://github.com/couchbase/spymemcached/blob/2.12.1/src/main/java/net/spy/memcached/protocol/binary/OperationImpl.java#L244-L252
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant