Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NPE in fillWriteBuffer() with binary protocol #17

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lmikkelsen
Copy link

I apologize for using a PR to report an issue, but I'm unable to find a public
issue tracker for spymemcached.

Under certain circumstances TCPMemcachedNodeImpl.fillWriteBuffer() will throw
a NullPointerException because the current write operation has transitioned to
the COMPLETE state and thus doesn't have a write buffer. This issue has
previously been reported in #16 and
https://code.google.com/archive/p/spymemcached/issues/304.

I'm able to consistently reproduce the issue using spymemcached 1.12.1 and
memcached 1.4.34 installed using Homebrew on macOS 10.12.2. The daemon is
running with the default options (memcached -l localhost).

I'm running the included Main.java with the 1.12.1 release JAR:

$ javac -classpath spymemcached-2.12.1.jar Main.java
$ java -cp spymemcached-2.12.1.jar:. Main

It consistently produces the following output:

2017-01-23 20:40:24.678 INFO net.spy.memcached.MemcachedConnection:  Added {QA sa=localhost/127.0.0.1:11211, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
2017-01-23 20:40:25.170 INFO net.spy.memcached.MemcachedConnection:  Reconnecting due to exception on {QA sa=localhost/127.0.0.1:11211, #Rops=0, #Wops=155, #iq=6, topRop=null, topWop=Cmd: 1 Opaque: 81 Key: test Cas: 0 Exp: 60 Flags: 2048 Data Length: 2097152, toWrite=0, interested=5}
java.lang.NullPointerException
	at net.spy.memcached.protocol.TCPMemcachedNodeImpl.fillWriteBuffer(TCPMemcachedNodeImpl.java:206)
	at net.spy.memcached.MemcachedConnection.handleWrites(MemcachedConnection.java:800)
	at net.spy.memcached.MemcachedConnection.handleReadsAndWrites(MemcachedConnection.java:723)
	at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:683)
	at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:436)
	at net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:1446)
2017-01-23 20:40:25.172 WARN net.spy.memcached.MemcachedConnection:  Closing, and reopening {QA sa=localhost/127.0.0.1:11211, #Rops=0, #Wops=155, #iq=9, topRop=null, topWop=Cmd: 1 Opaque: 81 Key: test Cas: 0 Exp: 60 Flags: 2048 Data Length: 2097152, toWrite=0, interested=5}, attempt 0.
2017-01-23 20:40:25.173 INFO net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl:  No buffer for current write op, removing
2017-01-23 20:40:25.173 WARN net.spy.memcached.MemcachedConnection:  Could not redistribute to another node, retrying primary node for test.
2017-01-23 20:40:25.174 WARN net.spy.memcached.MemcachedConnection:  Could not redistribute to another node, retrying primary node for test.
2017-01-23 20:40:25.175 WARN net.spy.memcached.MemcachedConnection:  Could not redistribute to another node, retrying primary node for test.

Occasionally the IO thread dies:

2017-01-23 20:40:27.116 INFO net.spy.memcached.MemcachedConnection:  Added {QA sa=localhost/127.0.0.1:11211, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
Exception in thread "Memcached IO over {MemcachedConnection to localhost/127.0.0.1:11211}" java.lang.NullPointerException
	at net.spy.memcached.protocol.TCPMemcachedNodeImpl.fillWriteBuffer(TCPMemcachedNodeImpl.java:206)
	at net.spy.memcached.MemcachedConnection.handleWrites(MemcachedConnection.java:800)
	at net.spy.memcached.MemcachedConnection.handleInputQueue(MemcachedConnection.java:582)
	at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:409)
	at net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:1446)

As far as I can tell the condition occurs when
MemcachedConnection.handleWrites() exits before having written all of the
current operation because the buffer is full. If the operation causes the
daemon to return an error OperationImpl.finishedPayload() will transition the
operation to the COMPLETE state (which sets the buffer to null). Since the
operation isn't removed from the write queue a NullPointerException is thrown
on the next handleWrites().

I'm using a large (2MB) data object to trigger an ERR_2BIG error, but I
believe the condition could happen with any of the ERR_BUSY, ERR_TEMP_FAIL,
etc. errors as well:

https://github.com/couchbase/spymemcached/blob/2.12.1/src/main/java/net/spy/memcached/protocol/binary/OperationImpl.java#L244-L252

I apologize for using a PR to report an issue, but I'm unable to find a public
issue tracker for spymemcached.

Under certain circumstances `TCPMemcachedNodeImpl.fillWriteBuffer()` will throw
a NullPointerException because the current write operation has transitioned to
the `COMPLETE` state and thus doesn't have a write buffer. This issue has
previously been reported in couchbase#16 and
https://code.google.com/archive/p/spymemcached/issues/304.

I'm able to consistently reproduce the issue using spymemcached 1.12.1 and
memcached 1.4.34 installed using Homebrew on macOS 10.12.2. The daemon is
running with the default options (`memcached -l localhost`).

I'm running the included `Main.java` with the 1.12.1 release JAR:

```
$ javac -classpath spymemcached-2.12.1.jar Main.java
$ java -cp spymemcached-2.12.1.jar:. Main
```

It consistently produces the following output:

```
2017-01-23 20:40:24.678 INFO net.spy.memcached.MemcachedConnection:  Added {QA sa=localhost/127.0.0.1:11211, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
2017-01-23 20:40:25.170 INFO net.spy.memcached.MemcachedConnection:  Reconnecting due to exception on {QA sa=localhost/127.0.0.1:11211, #Rops=0, #Wops=155, #iq=6, topRop=null, topWop=Cmd: 1 Opaque: 81 Key: test Cas: 0 Exp: 60 Flags: 2048 Data Length: 2097152, toWrite=0, interested=5}
java.lang.NullPointerException
	at net.spy.memcached.protocol.TCPMemcachedNodeImpl.fillWriteBuffer(TCPMemcachedNodeImpl.java:206)
	at net.spy.memcached.MemcachedConnection.handleWrites(MemcachedConnection.java:800)
	at net.spy.memcached.MemcachedConnection.handleReadsAndWrites(MemcachedConnection.java:723)
	at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:683)
	at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:436)
	at net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:1446)
2017-01-23 20:40:25.172 WARN net.spy.memcached.MemcachedConnection:  Closing, and reopening {QA sa=localhost/127.0.0.1:11211, #Rops=0, #Wops=155, #iq=9, topRop=null, topWop=Cmd: 1 Opaque: 81 Key: test Cas: 0 Exp: 60 Flags: 2048 Data Length: 2097152, toWrite=0, interested=5}, attempt 0.
2017-01-23 20:40:25.173 INFO net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl:  No buffer for current write op, removing
2017-01-23 20:40:25.173 WARN net.spy.memcached.MemcachedConnection:  Could not redistribute to another node, retrying primary node for test.
2017-01-23 20:40:25.174 WARN net.spy.memcached.MemcachedConnection:  Could not redistribute to another node, retrying primary node for test.
2017-01-23 20:40:25.175 WARN net.spy.memcached.MemcachedConnection:  Could not redistribute to another node, retrying primary node for test.
```

Occasionally the IO thread dies:

```
2017-01-23 20:40:27.116 INFO net.spy.memcached.MemcachedConnection:  Added {QA sa=localhost/127.0.0.1:11211, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
Exception in thread "Memcached IO over {MemcachedConnection to localhost/127.0.0.1:11211}" java.lang.NullPointerException
	at net.spy.memcached.protocol.TCPMemcachedNodeImpl.fillWriteBuffer(TCPMemcachedNodeImpl.java:206)
	at net.spy.memcached.MemcachedConnection.handleWrites(MemcachedConnection.java:800)
	at net.spy.memcached.MemcachedConnection.handleInputQueue(MemcachedConnection.java:582)
	at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:409)
	at net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:1446)
```

As far as I can tell the condition occurs when
`MemcachedConnection.handleWrites()` exits before having written all of the
current operation because the buffer is full. If the operation causes the
daemon to return an error `OperationImpl.finishedPayload()` will transition the
operation to the `COMPLETE` state (which sets the buffer to null). Since the
operation isn't removed from the write queue a NullPointerException is thrown
on the next `handleWrites()`.

I'm using a large (2MB) data object to trigger an `ERR_2BIG` error, but I
believe the condition could happen with any of the `ERR_BUSY`, `ERR_TEMP_FAIL`,
etc. errors as well:

https://github.com/couchbase/spymemcached/blob/2.12.1/src/main/java/net/spy/memcached/protocol/binary/OperationImpl.java#L244-L252
@daschl
Copy link

daschl commented Jan 24, 2017

Hi, thanks for opening the PR here - its fine!

Btw the issue tracker used is here: https://issues.couchbase.com/browse/SPY

I'll look into it and report back

@ingenthr ingenthr assigned ingenthr and unassigned daschl Apr 30, 2018
pbruneton pushed a commit to criteo-forks/spymemcached that referenced this pull request Aug 21, 2018
Related to BES-16032, this patch helps to gather more information about
the issue encountered that kill the Memcached IO thread.

Please note that this patch will not kill the Memcached IO thread which
will continue to run.

Previous known bug reports related to this issue are:
couchbase#7
couchbase#17

Change-Id: If1d9887963e96ac6ee4a8ec0dd0c19626e521346
JIRA:WBSC-3570
@mako-fdc
Copy link

A fix for this problem can be found in PR #38

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants